[HACKERS] pg_stat_transaction patch

2010-05-06 Thread Joel Jacobson
Hi,

I propose a set of new statistics functions and system views.

I need these functions in order to do automated testing of our system,
consisting of hundreds of stored procedures in plpgsql.
My plan is to develop some additional functions for pgTAP, benefiting from
the new system views I've added.

The patch should apply to 9.0beta or HEAD, but I created it using 8.4.3
because that's the version I'm using.

I'm thankful for your feedback.

My apologies if the packaging of the patch does not conform to your
guidelines; feedback on this is also welcome.

-- 
Best regards,

Joel Jacobson
Glue Finance

E: j...@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden


README:

Background
==========
The views pg_stat_user_tables and pg_stat_user_functions show statistics on
tables and functions.
The underlying functions named pg_stat_get_* fetch recent data from the
statistics collector, and return the requested value for the given oid
(i.e. tableid/relationid or functionid).
At the end of each transaction[1], the collected statistics are sent to the
statistics collector[2].

[1] upon COMMIT/ROLLBACK, or a bit later (the report frequency is controlled
by the PGSTAT_STAT_INTERVAL setting, default value is 500 ms)
[2] if you do a "ps aux", it is the process named "postgres: stats collector
process"
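
For reference, the existing interface built on committed statistics can be
queried like this (track_functions must be enabled for the function
counters to be populated):

SELECT relname, seq_scan, n_tup_ins, n_tup_upd, n_tup_del
  FROM pg_stat_user_tables;

SELECT funcname, calls, total_time, self_time
  FROM pg_stat_user_functions;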

Problem
=======
Within the current transaction, there was no way of accessing the internal
data structures which contain the statistics collected so far.
I wanted to check exactly what data changes my functions made and what
functions they called, without having to commit the transaction
and without mixing the statistics data with all the other simultaneously
running transactions.

Solution
========
I have exported accessor (get) methods for the internal data structure
containing the statistics collected so far in the current transaction.

I have also exported the method pgstat_report_stat to make it possible to
force a report and reset of the statistics collected so far.
This was necessary to prevent not-yet-reported statistics from a previous
transaction from affecting the current transaction.

I used the unused_oids script to find unused oids and chose the range
3030-3044 for the new functions.
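
Usage sketch of the new accessors, with the names as posted here (they were
later renamed to pg_stat_get_xact_* when committed, see further down);
myfunc and mytable are placeholders:

BEGIN;
SELECT myfunc();  -- exercise the code under test
-- inspect the statistics collected so far in this transaction:
SELECT pg_stat_get_transaction_function_calls('myfunc()'::regprocedure::oid);
SELECT pg_stat_get_transaction_numscans('mytable'::regclass::oid);
ROLLBACK;  -- nothing is committed, yet the counters could be inspected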

Functions
=========
test=# \df+ pg_catalog.pg_stat_get_transaction_*

                                List of functions
   Schema   |                    Name                     | Result data type | Argument data types | Volatility |                         Description
------------+---------------------------------------------+------------------+---------------------+------------+--------------------------------------------------------------------------
 pg_catalog | pg_stat_get_transaction_blocks_fetched      | bigint           | oid                 | stable     | statistics: number of blocks fetched in current transaction
 pg_catalog | pg_stat_get_transaction_blocks_hit          | bigint           | oid                 | stable     | statistics: number of blocks found in cache in current transaction
 pg_catalog | pg_stat_get_transaction_dead_tuples         | bigint           | oid                 | stable     | statistics: number of dead tuples in current transaction
 pg_catalog | pg_stat_get_transaction_function_calls      | bigint           | oid                 | stable     | statistics: number of function calls in current transaction
 pg_catalog | pg_stat_get_transaction_function_self_time  | bigint           | oid                 | stable     | statistics: self execution time of function in current transaction
 pg_catalog | pg_stat_get_transaction_function_time       | bigint           | oid                 | stable     | statistics: execution time of function in current transaction
 pg_catalog | pg_stat_get_transaction_live_tuples         | bigint           | oid                 | stable     | statistics: number of live tuples in current transaction
 pg_catalog | pg_stat_get_transaction_numscans            | bigint           | oid                 | stable     | statistics: number of scans done for table/index in current transaction
 pg_catalog | pg_stat_get_transaction_tuples_deleted      | bigint           | oid                 | stable     | statistics: number of tuples deleted in current transaction

(all functions have type "normal", owner "joel", language "internal", and
source code equal to the function name; the original output is truncated
after the last row shown)

Re: [HACKERS] pg_stat_transaction patch

2010-05-19 Thread Joel Jacobson
Hajimemashite Takahiro,

Thanks for your feedback.

I applied all the changes on 9.0beta manually and then it compiled without
any assertion failures.

I also changed the oids to a different unused range, since the ones I used
before had been taken in 9.0beta1.

There are still some problems though. I get 0 back from the functions that
are supposed to return the number of inserts/updates for the current
transaction.

I suspect it is because get_tabstat_entry for some reason returns NULL in,
for example, pg_stat_get_transaction_tuples_inserted(PG_FUNCTION_ARGS).

Does the function look valid? If you can find the error in it, the other
functions probably have the same problem.

It is strange, though: the function pg_stat_get_transaction_numscans works
fine, and it looks like it works the same way.

I added run.out843 and run.out90b1, showing the output from both patched
versions.

run.out843 is the intended output, while run.out90b1 gives 0 on the columns
n_tup_ins and n_tup_upd (and probably n_tup_del etc also).

I hope someone can help locating the problem.

Thanks.

Best regards,

Joel

2010/5/7 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp


 Joel Jacobson j...@gluefinance.com wrote:

  I propose a set of new statistics functions and system views.
 
  I need these functions in order to do automated testing of our system,
  consisting of hundreds of stored procedures in plpgsql.
  My plan is to develop some additional functions to pgTAP, benefiting from
  the new system tables I've added.

 I ported your patch into 9.0beta, but it doesn't work well.
 I had two assertion failures from the run.sql:

 TRAP: FailedAssertion(!(entry->trans == ((void *)0)), File: pgstat.c,
 Line: 715)
 TRAP: FailedAssertion(!(tabstat->trans == trans), File: pgstat.c, Line:
 1756)

 Also, pg_stat_transaction_functions returned no rows from the test case
 even after I removed those assertions. There are no rows in your
 test/run.out, either.

 I like your idea itself, but more work is required for the
 implementation.

 Regards,
 ---
 Takahiro Itagaki
 NTT Open Source Software Center




-- 
Best regards,

Joel Jacobson
Glue Finance



pg_stat_transaction-1.31.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] pg_stat_transaction patch

2010-06-08 Thread Joel Jacobson
Hi Takahiro,

Here is an updated version of the patch.

Thanks Magnus H for the help :)

1.4: Ported to head. Updated tests. Removed pg_stat_report.


2010/5/25 Takahiro Itagaki itagaki.takah...@oss.ntt.co.jp


 Joel Jacobson j...@gluefinance.com wrote:

  I applied all the changes on 9.0beta manually and then it compiled
 without
  any assertion failures.
 
  I also changed the oids to a different unused range, since the ones I
 used
  before had been taken in 9.0beta1.

 Thanks, but you still need to test your patch:

  - You need to check your patch with make check, because it requires
   adjustments in the rules test; your pg_stat_transaction_function is the
   longest name in the system catalog.

  - You need to configure postgres with --enable-cassert to enable internal
   validations. The attached test case failed with the following TRAP.
 TRAP: FailedAssertion(!(entry->trans == ((void *)0)), File: pgstat.c,
 Line: 715)
 TRAP: FailedAssertion(!(tabstat->trans == trans), File: pgstat.c, Line:
 1758)

  I suspect it is because get_tabstat_entry for some reason returns NULL,
 in
  for example pg_stat_get_transaction_tuples_inserted(PG_FUNCTION_ARGS).
 
  Does the function look valid? If you can find the error in it, the other
  functions probably have the same problem.

 For the above trap, we can see the comment:
/* Shouldn't have any pending transaction-dependent counts */
 We don't expect to read stats entries during transactions. I'm not sure
 whether accessing transitional stats during transaction is safe or not.

 We might need to go in other directions, for example:
  - Use session stats instead of transaction stats. You can see the same
    information in the difference of counters between before and after the
    transaction.
  - Export pgBufferUsage instead of relation counters. They are
    buffer counters for all relations, but we can obviously export
    them because they are just plain variables.
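
 A sketch of that before/after-delta approach against the existing view
 ('myfunc' is a placeholder):

 -- run once before and once after the transaction under test, giving
 -- PGSTAT_STAT_INTERVAL time for the report to reach the collector;
 -- the difference of the counters is what the transaction did:
 SELECT calls, total_time, self_time
   FROM pg_stat_user_functions
  WHERE funcname = 'myfunc';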

 Regards,
 ---
 Takahiro Itagaki
 NTT Open Source Software Center





-- 
Best regards,

Joel Jacobson
Glue Finance



pg_stat_transaction-1.4.tar.gz
Description: GNU Zip compressed data



Re: [HACKERS] pg_stat_transaction patch

2010-08-09 Thread Joel Jacobson
2010/8/8 Tom Lane t...@sss.pgh.pa.us

 Itagaki Takahiro itagaki.takah...@gmail.com writes:
  Accessor functions to get so far collected statistics for the current
  transaction
  https://commitfest.postgresql.org/action/patch_view?id=301

  The only issue in the patch is the overly long view and function names:
- pg_stat_transaction_user_tables (31 chars)
- pg_stat_get_transaction_tuples_hot_updated (42 chars)
- pg_stat_get_transaction_function_self_time (42 chars)

  Since we've already used _xact_ in some system objects, we could replace
  _transaction_ parts with _xact_. It will save 7 keystrokes per query ;-)

 Applied, with assorted corrections -

 * Renamed *_transaction_* to *_xact_* as suggested by Itagaki-san.

 * Removed functions and view columns for delta live/dead tuple counts.

 * Marked functions as volatile ... they certainly aren't stable.

 * Got rid of use of get_tabstat_entry() to fetch table entries.  That
 function forcibly creates tabstat entries if they weren't there before,
 which was absolutely not what we want here: it'd result in bloating the
 tabstat arrays with entries for tables the current transaction actually
 never touched.  Worse, since you weren't passing the correct isshared
 flag for the particular relation, the entries could be created with the
 wrong isshared setting, leading to misbehavior if they did get used later
 in the transaction.  We have to use a find-don't-create function here.

 * Fixed bogus handling of inserted/updated/deleted counts --- you need to
 add on the pending counts for all open levels of subtransaction.

 * Assorted docs improvement and other minor polishing.

 BTW, I notice that the patch provides pg_stat_get_xact_blocks_fetched()
 and pg_stat_get_xact_blocks_hit(), but doesn't build any views on top of
 them.  Was this intentional?  Providing a full complement of
 pg_statio_xact_* views seems like overkill to me, but maybe that was where
 you were intending to go and forgot.  If the functions are there then
 anyone who needs the functionality can easily build their own views atop
 them, so this might be an intentional compromise position, but I'm not
 sure.  Or maybe we should decide that intratransaction statio numbers
 aren't likely to be of interest to anybody, and drop the functions too.


When I created the views, I just copied the existing pg_stat_user_* views
without knowing if any columns were irrelevant for current transaction
data.
I guess if someone needs blocks_fetched/hit, they could build their
own view.



regards, tom lane




-- 
Best regards,

Joel Jacobson
Glue Finance



[HACKERS] Deadlock bug

2010-08-20 Thread Joel Jacobson
(Magnus and pghackers, I've included you in this email, since it appears to
be PostgreSQL bug. The example below is general, and not specific to Glue
Finance database model. Feel free to share it with anyone.)

I've just tried to replicate the deadlock in 8.4.4 and 9.0b4.
Same problem in both versions.
8.4.4 example: http://screencast.com/t/ZTBlMTBmNTc

 start of comments, specific to Glue Finance database 

(1) Orders.SessionID is not really necessary, we only store it to log what
session created which order. We never use this information, it is merely
saved for logging purposes.
Dropping the foreign key...
orders_sessionid_fkey FOREIGN KEY (sessionid) REFERENCES
sessions(sessionid)
...would mean we risk data integrity problems if the session were deleted
(which it never is); even if it were deleted, we wouldn't really care, since
it is just for logging purposes.

(2) Setting Orders.Heartbeat to now() on each
intended-to-be-most-of-the-times-read-only-until-something-happens-request
(aka Get_Server_Request) is of course a huge performance hit, as it requires
a row exclusive lock, meaning such requests cannot be performed in
parallel.
We will therefore remove the Orders.Heartbeat column entirely.

(3) Making sure Orders is always locked first, before obtaining the Sessions
lock, would, as you suggest, also solve the problem, but requires a larger
rewrite of probably a lot of functions.
Removing the foreign key means we don't have to rewrite the functions.

(4) Fix the PostgreSQL bug.

(1) would effectively solve the deadlock issue, but not the performance
issue, we should therefore do (2) as well.
 end of comments, specific to Glue Finance database 

I think this clearly looks like a bug in PostgreSQL because of the following
observations:

Below are comments on the screencast at http://screencast.com/t/NTk2Y2VhMW

The following example is not specific for Glue Finance database.
Attached, please find the text file with the queries and simple example
schema.

1. Process 1 executes "UPDATE A SET Col1 = 1 WHERE AID = 1;".
We can see it obtains two RowExclusiveLocks on relations a_pkey and a.
This is the expected result.

2. Process 2 then executes "UPDATE B SET Col2 = 1 WHERE BID = 2;".
We can see it obtains two RowExclusiveLocks on relations b_pkey and b.
I don't know if this is expected, since the row in B references the row in A
being updated by process 1.
Because of the foreign key, shouldn't some kind of share lock on A be
obtained by process 2, or some other kind of lock?

3. Process 1 tries to execute "UPDATE B SET Col2 = 1 WHERE BID = 2;" and
will of course have to wait, because process 2 already has a
RowExclusiveLock on the same row in table B.

Process 1 is now waiting...

4. Now, in the other SQL prompt (process 2), we take a look at the vLocks
view.
Unexpected observations:
a) both processes have been granted a RowExclusiveLock on table B. How can
both be granted a RowExclusiveLock on the same table? Since the table only
contains one row, it must be a lock on the same row, which should be
impossible, right?
b) process 1 (which is currently waiting) has been granted a lock of type
"tuple" (page 0, tuple 1, mode ExclusiveLock) on table B. I don't know what
a "tuple" lock is, but what surprises me is that process 1 is granted the
lock, and not process 2 (since process 2 updated B before process 1).
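
The vLocks view itself is not shown in this mail; a rough equivalent over
pg_locks, exposing the columns discussed above, might be:

CREATE VIEW vLocks AS
SELECT l.pid, c.relname, l.locktype, l.mode, l.granted, l.page, l.tuple
  FROM pg_locks l
  LEFT JOIN pg_class c ON c.oid = l.relation
 ORDER BY l.pid;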

Now, while process 1 is waiting, let's execute the same query in process 2:

5. Process 2 tries to execute "UPDATE B SET Col2 = 1 WHERE BID = 2;", which
is exactly the same query as in step 2 above.
Since process 2 already holds a granted RowExclusiveLock on the row in table
B it tries to update, I think this query should be executed instantly
without any problem. Instead, it causes a deadlock in process 2, allowing
process 1 to commit. Very strange.

Could this have any other explanation than a bug (or perhaps feature) in
postgres?

-- 
Best regards,

Joel Jacobson
Glue Finance



deadlock.sql
Description: Binary data



Re: [HACKERS] Deadlock bug

2010-08-20 Thread Joel Jacobson
Hm, in my example, there are no INSERTs in the two conflicting transactions?
The suggestion on adding an ON INSERT trigger would have no effect as far as
I can see.
The comment from trigger.c is also about INSERT, can't see how it affects
us.

I don't understand exactly why this deadlock occurs, but the one thing I
cannot understand is why process 2 is not allowed to update the same row,
which it has already updated in the same transaction.

In general, if a transaction has a write row lock (or whatever it is
called in postgres), i.e., exclusive right to modify the row in the table,
shouldn't that same transaction always be allowed to update the same row in
a later stage? I understand the foreign key is the reason for the conflict,
but process 2 doesn't attempt to modify the foreign key data, it only does
update on table B.

It just doesn't make sense to abort process 2 with a deadlock in my example.

(If it helps, we would be willing to assign a bounty prize to anyone taking
on the task to solve this problem.)

Best regards,

Joel Jacobson
Glue Finance


2010/8/20 Tom Lane t...@sss.pgh.pa.us

 Kevin Grittner kevin.gritt...@wicourts.gov writes:
  The surprising thing is that a particular row is (using the
  identifiers from the attachment):

  Process 2 updates a particular row without blocking.
  Process 1 updates the same row, which blocks.
  Process 2 updates the same row again (with *exactly* the same UPDATE
  statement), which fails with a deadlock.

  I'm not sure I consider that a bug, but it moves the needle on the
  astonishment meter.

 OK, I looked a bit closer.  The first update in process 2 is changing
 a row in B that has an FK reference to an already-modified row in A.
 The only reason that doesn't block is that we optimize away taking a
 sharelock on the referenced row if the update doesn't change the FK
 column(s), as this doesn't.  However, the *second* update doesn't
 get the benefit of that optimization, as per this comment in trigger.c:

         * There is one exception when updating FK tables: if the
         * updated row was inserted by our own transaction and the
         * FK is deferred, we still need to fire the trigger. This
         * is because our UPDATE will invalidate the INSERT so the
         * end-of-transaction INSERT RI trigger will not do
         * anything, so we have to do the check for the UPDATE
         * anyway.

 So it goes and waits for sharelock on the A row, and then you have a
 deadlock because process 1 has exclusive lock on that row and is already
 blocked waiting for process 2.

 The Glue guys aren't the first to complain of this behavior, so it'd
 be nice to improve it.

 If we knew that the already-updated row was one for which we'd been able
 to optimize away the FK check, then we could do so again on the second
 update (assuming it still didn't change the FK columns), but I don't see
 any practical way to know that.  We only have our hands on the current
 update's old and new tuples, not on previous versions; and there's no
 convenient way to find the previous version because the update ctid
 links run the other way.

 [ thinks for awhile... ]  Conceivably we could get around this by
 programming the ON INSERT trigger to chase forward to the latest live
 row version, rather than just doing nothing when the initially inserted
 row has been outdated.  It'd be a pretty ticklish thing to get right,
 though.

regards, tom lane




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [Glue] [HACKERS] Deadlock bug

2010-08-20 Thread Joel Jacobson
In my example,

Process 1:                                    Process 2:
BEGIN;
SELECT pg_backend_pid();
                                              BEGIN;
                                              SELECT pg_backend_pid();
UPDATE A SET Col1 = 1 WHERE AID = 1;
SELECT * FROM vLocks WHERE PID IN (2165,2157);
                                              UPDATE B SET Col2 = 1 WHERE BID = 2;
                                              SELECT * FROM vLocks WHERE PID IN (2165,2157);
UPDATE B SET Col2 = 1 WHERE BID = 2;
SELECT * FROM vLocks WHERE PID IN (2165,2157);
                                              UPDATE B SET Col2 = 1 WHERE BID = 2;
                                              SELECT * FROM vLocks WHERE PID IN (2165,2157);

Process 2 is aborted due to deadlock, while process 1 is allowed to commit.

If the locking logic were modified to let process 2 go through and instead
abort process 1, I understand some other scenarios would of course be
affected, where the situation would then be handled in a less optimal way.

Is there any example of scenarios where it is optimal to handle this kind of
locking situation in this way?

I am totally fine living with a feature that is a problem in some cases and
a benefit in others, as long as the good cases are more common than the
problem cases.

Another question: Tom referred to a comment in
src/backend/commands/trigger.c.
My example does not contain any triggers, nor INSERT commands. Is the
trigger.c comment still relevant, or is it a misunderstanding?

2010/8/20 Josh Berkus j...@agliodbs.com

 On 8/20/10 7:18 AM, Tom Lane wrote:
  It does go through without any deadlock, *if* there is no foreign key
  involved.  You didn't tell us exactly what the FK relationship is, but
  I suspect the reason for the deadlock is that one process is trying to
  update a row that references some row already updated by the other.
  That will require a row-level share lock on the referenced row, so you
  can get a deadlock.

 That's correct. This is the generic example I was talking about earlier
 on -hackers.  I'm not certain it's a bug per spec; I wanted to talk
 through with Kevin what we *should* be doing in this situation.

 This is one example of a set of user-hostile FK-related deadlock
 behavior we have.  I'm just not certain it's logically possible to
 improve it.

 --
  -- Josh Berkus
 PostgreSQL Experts Inc.
 http://www.pgexperts.com




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [HACKERS] [Glue] Deadlock bug

2010-08-20 Thread Joel Jacobson
OK. Thanks for the explanation. It's probably the case in general, but in
all of my tests (10 runs), process 2 always aborted. I don't think it is
arbitrary in this example, or could it be?

2010/8/20 Greg Stark gsst...@mit.edu

 On Fri, Aug 20, 2010 at 7:38 PM, Joel Jacobson j...@gluefinance.com
 wrote:
  If the locking logic would be modified to allow process 2 to go through,
 and
  instead abort process 1, I understand some other scenarios would of
 course
  be affected, where the situation would be handled in a less optimal way.

 Which process dies when there's a deadlock is pretty much arbitrary.
 The first process to notice the deadlock will just throw an error
 itself.  Which one notices first depends on the timing of when the
 blocking locks were taken.

 If the second process to get stuck blocks before the first process
 checks then the first process will notice first. If it does other
 stuff first then the first process will check, not find a deadlock and
 go back to sleep. Then the deadlock won't be detected until the second
 process checks.

 --
 greg




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [HACKERS] Deadlock bug

2010-08-20 Thread Joel Jacobson
Process 1 updates A in its transaction, which is still going on when process
2 updates B, requiring a sharelock on A, which it is granted. But when
process 2 does its second update of B, also of course requiring a sharelock
on A, it is not granted.

I fully agree it must obtain a sharelock on the FK, but I cannot understand
why it is granted it the first time, but not the second time?

2010/8/20 Tom Lane t...@sss.pgh.pa.us

 Joel Jacobson j...@gluefinance.com writes:
  I don't understand exactly why this deadlock occurs, but the one thing I
  cannot understand is why process 2 is not allowed to update the same row,
  which it has already updated in the same transaction.

 It *is* allowed to, and in fact has already done so.  The problem is
 that it now needs a sharelock on the referenced row in order to ensure
 that the FK constraint remains satisfied, ie, nobody deletes the
 referenced row before we commit the update.  In the general case where
 the referencing row is new (or has a new FK value) in the current
 transaction, such a lock is necessary for correctness.  Your case would
 work if we could optimize away the FK check, but with only a limited
 view of what's happened in the current transaction, it's not always
 possible to optimize away the check.

regards, tom lane




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [HACKERS] Deadlock bug

2010-08-20 Thread Joel Jacobson
Optimized away, check, OK, but why? Because there is no new data in the FK
(table A) at the point of the first update of table B in process 2? But when
process 1 updates A, the FK B->A points to new data, which leads to process
2 trying to acquire a sharelock, which is not granted due to the update of A?

2010/8/20 Tom Lane t...@sss.pgh.pa.us

 Joel Jacobson j...@gluefinance.com writes:
  I fully agree it must obtain a sharelock on the FK, but I cannot
 understand
  why it is granted it the first time, but not the second time?

 It *isn't* granted it the first time, because it doesn't try to acquire
 it the first time.  That FK check gets optimized away, while the second
 one doesn't.  Please reread what I said before.

regards, tom lane




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [HACKERS] Deadlock bug

2010-08-26 Thread Joel Jacobson
I thought it would be interesting to see how other databases handle
this peculiar deadlock situation.

I didn't have access to any Oracle or Sybase databases, but for what
it's worth I've tested MySQL.

Results:

1. Process 1 successfully made its update and managed to commit.

2. Process 1's second update did not go straight through, but had to
wait for process 2 to attempt to commit.

3. Process 2 did not manage to commit, all its updates were discarded.

Demo of the test:

http://screencast.com/t/ZGJmMTcxN

/Joel

2010/8/25 Robert Haas robertmh...@gmail.com:
 On Wed, Aug 25, 2010 at 10:02 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, 2010-08-25 at 15:51 +0200, Markus Wanner wrote:
 Simon,

 On 08/25/2010 11:53 AM, Simon Riggs wrote:
  ..we want to ensure that the PK value..

 ..or any other possibly referenced attributes?

 Don't think that's relevant.

 "referenced" meaning by an RI constraint, which only ever refers to
 PKs in other tables.

 That doesn't appear to be correct:

 rhaas=# create table p (a integer primary key, b integer not null,
 unique (b)); NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit
 index "p_pkey" for table "p"
 NOTICE:  CREATE TABLE / UNIQUE will create implicit index "p_b_key"
 for table "p"
 CREATE TABLE
 rhaas=# create table r (b integer not null references p (b));
 CREATE TABLE

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise Postgres Company




-- 
Best regards,

Joel Jacobson
Glue Finance




Re: [HACKERS] Generate call graphs in run-time

2012-01-10 Thread Joel Jacobson
Is this only to print out the stack upon errors? Looks like the stack is in
the variable error_context_stack. Is it always available containing all the
parent functions, even when there is no error? Can I reach it from within
pgstat.c?

2012/1/10 Jim Nasby j...@nasby.net

 On Jan 9, 2012, at 2:08 PM, Joel Jacobson wrote:
  Generates call graphs of function calls within a transaction in run-time.

 Related to this... we had Command Prompt write a function for us that
 would spit out the complete call-graph of the current call stack whenever
 it was called. Alvaro didn't need to add any hooks to the backend to
 accomplish this, so it would seem that the call stack is already available.
 That might simplify what you're doing.

 I don't see this posted on pgFoundry yet, so I've attached it.
 --
 Jim C. Nasby, Database Architect   j...@nasby.net
 512.569.9461 (cell) http://jim.nasby.net




-- 
Joel Jacobson
Trustly
+46703603801
https://trustly.com


Re: [HACKERS] Generate call graphs in run-time

2012-01-17 Thread Joel Jacobson
On Mon, Jan 16, 2012 at 2:23 PM, Martin Pihlak martin.pih...@gmail.com wrote:
 My approach was to add parent oid to the per-backend function stats
 structure - PgStat_BackendFunctionEntry. Also, I changed the hash key
 for that structure to (oid, parent) pair. This means that within the
 backend the function usage is always tracked with the context of
 calling function. This has the nice property that you get the per-parent
 usage stats as well. Also the additional lists for parent tracking are
 avoided.

 During pgstat_report_stat() the call graph (with stats) is output
 to logs and the statistics uploaded to collector -- with the parent oid
 removed.

Since you only care about the parentfuncid in one level, it looks like
you will only be able to get a total call graph of all possible
function calls, and not each unique call graph per transaction.
If you have two separate transactions:
T1: a->b, b->c
T2: b->d
You would have two unique call graphs {a->b, b->c} and {b->d}.
The global call graph, where you only keep track of all unique
parentfuncid->funcid pairs, would be {a->b, b->c, b->d}, which lacks
the information on what different call graphs are actually being
executed per transaction.

Also, why remove the parent oid when uploading the statistics to the collector?
It would be nice to have the statistics for each function per parent,
to see where you have a bottleneck which might only be occurring in a
function when called from a specific parent.
Even more fine-grained would be to have the statistics per unique
call-graph, i.e. the entire tree of functions called in the
transactions.

 There is a patch for this and we do use it in production for occasional
 troubleshooting and dependency analysis. Can't attach immediately
 though -- it has some extra cruft in it that needs to be cleaned up.

I would highly appreciate a patch; don't worry about cleaning up, I
can do that, unless it's some code you can't share for other reasons.

 Indeed. Something like a pg_stat_user_function_details view would be
 very useful. Something along the lines of:

   Column     |  Type  |
 --------------+--------+
  funcid       | oid    |
  parentfuncid | oid    | -- new
  schemaname   | name   |
  funcname     | name   |
  calls        | bigint |
  total_time   | bigint |
  self_time    | bigint |

funcid->parentfuncid might be sufficient for performance
optimizations, but to automatically generate directional graphs of all
unique call graphs in run-time, you would need all the unique pairs of
funcid->parentfuncid as a single column, probably a sorted array of
oids[][], example: [[1,2],[1,3],[2,4],[2,5]] if the call graph would
be {1->2, 1->3, 2->4, 2->5}.
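
Aggregating such a detailed view back to the classic per-function numbers
(as suggested below) could then be a plain GROUP BY; a sketch, assuming the
proposed pg_stat_user_function_details view existed:

SELECT funcid, schemaname, funcname,
       sum(calls)      AS calls,
       sum(total_time) AS total_time,
       sum(self_time)  AS self_time
  FROM pg_stat_user_function_details
 GROUP BY funcid, schemaname, funcname;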


 And then rewrite pg_stat_user_functions by aggregating the detailed
 view. That'd make the individual pg_stat_get_function* functions a
 bit slower, but that is probably a non-issue - at least not if the
 pg_stat_user_functions view is rewritten to use a SRF.

 regards,
 Martin



[HACKERS] Explicitly specifying use of IN/OUT variable in PL/pgSQL functions

2012-03-13 Thread Joel Jacobson
The introduction of custom_variable_classes and #variable_conflict in
9.0 partly solves the problem with mixing IN/OUT variables with column
names.
In 8.4 and before, it defaulted to the IN/OUT variable, if it shared
the name of a column.
In 9.0 the behaviour was changed to raise an error if a variable
shared the same name as a column.
This was an important and great change, as it catches unintentional,
potentially very dangerous mixups of IN/OUT variables and column
names.

But it's still not possible to use the same names for IN/OUT variables
and column names, which is somewhat of a limitation; if nothing else, it
can be argued it's ugly.

In situations when it's natural and makes sense to mix IN/OUT
variables and column names, it would be nice to be able to
explicitly specify that you are referring to the IN or OUT variable
with a specific name.
For lack of better ideas, I propose prefixing a conflicting variable with
IN or OUT, to allow using it even if there is a column with a
conflicting name.

Example:

CREATE OR REPLACE FUNCTION Get_UserID(OUT UserID integer, Username text)
RETURNS INTEGER AS $BODY$
#variable_conflict use_column
BEGIN
    SELECT UserID INTO OUT.UserID FROM Users WHERE Username = IN.Username;
    IF NOT FOUND THEN
        INSERT INTO Users (Username) VALUES (IN.Username)
        RETURNING UserID INTO STRICT OUT.UserID;
    END IF;
    RETURN;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;

While IN is a reserved word, OUT is not, so I guess that's a bit of a
problem with existing code.
Perhaps some other words or symbols can be used.
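
For comparison, a sketch of what already works today without any new
syntax: plpgsql accepts the function name as a qualifier for parameters
(same hypothetical Users table as above):

CREATE OR REPLACE FUNCTION Get_UserID(OUT UserID integer, Username text)
RETURNS INTEGER AS $BODY$
BEGIN
    SELECT u.UserID INTO Get_UserID.UserID
      FROM Users u
     WHERE u.Username = Get_UserID.Username;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;

This resolves the ambiguity, but IN./OUT. would be shorter and would not
break if the function is renamed.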



[HACKERS] UPDATE pg_catalog.pg_proc.prosrc OK?

2010-12-27 Thread Joel Jacobson
I'm working on a tool to simplify updating the source code of database
functions.
To be able to revert, my plan is to store the values of pg_proc.* before
updating, and then restore pg_proc for the given oid if a revert is
necessary.
This raises the question:

Is it safe to do
UPDATE pg_proc SET prosrc = 'new source code' WHERE oid = the function's oid;
instead of using the CREATE OR REPLACE FUNCTION command?

(Using pg_dump -s + pg_restore isn't feasible since it operates on the
entire schema, and not just a subset of functions, which is important in my
case.)

Thanks for any advice.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] UPDATE pg_catalog.pg_proc.prosrc OK?

2010-12-28 Thread Joel Jacobson
2010/12/28 Robert Haas robertmh...@gmail.com

 I'm not immediately sure whether it's safe, but it seems like playing
 with fire, and I don't see any particular advantage to doing it this
 way over using CREATE OR REPLACE FUNCTION.


While I understand some of the SQL commands affecting pg_catalog have side
effects, such as CREATE DATABASE, others seem to lack side effects.

To use CREATE OR REPLACE FUNCTION, I would have to assemble SQL from the
data in pg_proc.* for each function, which is quite a complex task (e.g.,
pg_dump does this, and it's far from trivial, due to differences between
different versions etc).

I understand it's not a good idea to read/write pg_proc between different
databases, but this is not my case.
My plan:
1. Take snapshot of pg_catalog.pg_proc.*
2. Update existing/install new source code of functions
3. Monitor how the live system behaves (might take 30 minutes or something
like that)
4. If problems occur, revert to the old state by removing the new pg_proc
entries and restoring the modified existing ones.

Problems are not expected since the new code has been tested locally in a
database with identical schema, but I've learned you can never be one
hundred percent sure everything always works.
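
A sketch of the snapshot/revert idea (the snapshot table name is made up,
and as discussed this kind of catalog surgery is superuser-only and risky):

-- step 1: snapshot
CREATE TABLE proc_snapshot AS
    SELECT p.oid, p.* FROM pg_catalog.pg_proc p;

-- steps 2-3: deploy the new function versions and monitor the system ...

-- step 4: revert the source of modified functions (removing newly
-- created pg_proc entries is left out of this sketch):
UPDATE pg_catalog.pg_proc p
   SET prosrc = s.prosrc
  FROM proc_snapshot s
 WHERE p.oid = s.oid
   AND p.prosrc IS DISTINCT FROM s.prosrc;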

Until now, I've been creating a revert .sql file manually, which drops the
new functions and restores the replaced functions with their old source
code.
This is quite time consuming and of course prone to human errors.

Alternative approach: It would be good if pg_dump could split a plaintext
schema dump into separate files. That would allow you to only restore the
functions, which would solve part of my problem, but would still cause
problems for functions where you alter the arguments, in which case the
existing function with the same name needs to be dropped first, before
creating the new function with different arguments. For such scenarios, I
would need to drop the new functions first, before restoring the old
functions from the dump.



 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Best regards,

Joel Jacobson
Glue Finance



[HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
Dear fellow hackers,

Problem: A normal diff of two slightly different schema dump files (pg_dump
-s) will not produce a user-friendly diff, as you get all changes in the
same file.

Solution: I propose a new option to pg_dump, --split, which dumps each
object to a separate file in a user friendly directory structure:

[-f filename] : main dump file, imports each split part using \i
[-f filename]-split/[desc]/[tag]/[oid].sql : dump of the oid

Example: If the filename (-f) is pg.dump, the following directory
structure would be created:

$ ./pg_dump -f /crypt/pg.dump --split -F p -s glue
/crypt/pg.dump-split/VIEW/
/crypt/pg.dump-split/TYPE/
/crypt/pg.dump-split/TRIGGER/
/crypt/pg.dump-split/TABLE/
/crypt/pg.dump-split/SEQUENCE/
/crypt/pg.dump-split/SCHEMA/
/crypt/pg.dump-split/PROCEDURAL_LANGUAGE/
/crypt/pg.dump-split/INDEX/
/crypt/pg.dump-split/FUNCTION/
/crypt/pg.dump-split/FK_CONSTRAINT/
/crypt/pg.dump-split/CONSTRAINT/
/crypt/pg.dump-split/AGGREGATE/

In each such directory, one directory per object name is created.
If we had a function foobar with oid 12345, it would be saved to:
/crypt/pg.dump-split/FUNCTION/foobar/12345.sql

In the pg.dump plain text file, the files are linked in using the \i
psql command, e.g.:
\i /crypt/pg.dump-split/FUNCTION/foobar/12345.sql

Potential use-case scenarios:

*) Version control your database schema, by exporting it daily (using
--split) and committing the differences.

*) Compare differences of schema dumps created at different points in time.
Since objects are stored in separate files, it is easier to see what areas
were modified, compared to looking at the diff of two entire schemas.

*) Restore only some objects, based on type (e.g., only the functions) or
name (e.g. only functions of a certain name/names).
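
As a usage sketch, this is the diff workflow the option is meant to enable
(mydb and the dump file names are placeholders):

$ pg_dump -f old.dump --split -F p -s mydb    # before deploying changes
  ... deploy schema changes ...
$ pg_dump -f new.dump --split -F p -s mydb    # after
$ diff -r old.dump-split new.dump-split       # one file per changed object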

I've tested the patch for both the latest HEAD (9.1devel) as well as 8.4.6.

Feedback welcome.

-- 
Best regards,

Joel Jacobson
Glue Finance


pg-dump-split-plain-text-files-8.4.6.patch
Description: Binary data


pg-dump-split-plain-text-files-9.1devel.patch
Description: Binary data



Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/28 Tom Lane t...@sss.pgh.pa.us

 Joel Jacobson j...@gluefinance.com writes:
  Dear fellow hackers,
  Problem: A normal diff of two slightly different schema dump files
 (pg_dump
  -s), will not produce a user-friendly diff, as you get all changes in the
  same file.

  Solution: I propose a new option to pg_dump, --split, which dumps each
  object to a separate file in a user friendly directory structure:

 Um ... how does that solve the claimed problem exactly?


Because then you can do
$ diff -r "old schema dump dir" "new schema dump dir"
instead of
$ diff "old entire schema dump" "new entire schema dump"
which will nicely reveal each individual object modified, as opposed to a
huge global diff of everything.



  [-f filename] : main dump file, imports each splitted part using \i
  [-f filename]-split/[desc]/[tag]/[oid].sql : dump of the oid

 This particular choice seems remarkably *un* friendly, since two dumps
 from different DBs will inevitably not share the same OIDs, making it
 practically impossible to compare them even if they are logically
 identical.  But even without the choice to use OIDs in the filenames
 I'm unconvinced that file-per-object is a good idea in any way shape or
 form.


Good point!

To compare two different database, perhaps it's possible to use a sequence,
1,2,...,n for each file in each directory, i.e., /[desc]/[tag]/[n], and to
sort them by something distinct which will ensure the same numbering between
different databases, such as the arguments for functions, or other
properties for other kind of objects. Any ideas?

(In my case, I didn't need to compare schemas between different database. I
needed to compare two dumps created at different points in time of the same
database, which do share the same oids for objects existing in both
versions.)





regards, tom lane




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/28 Tom Lane t...@sss.pgh.pa.us


 That has at least as many failure modes as the other representation.


I don't follow; what do you mean by failure modes? The oid in the
filename? I suggested using a sequence instead, but you didn't comment on
that. Are there any other failure modes which could cause a diff -r between
two different databases to break?

(This might be a bad idea for some other reason, but I noticed a few other
users requesting the same feature when I googled "pg_dump split".)

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/28 Gurjeet Singh singh.gurj...@gmail.com

 I would suggest the directory structure as:

 /crypt/pg.dump-split/schema-name-1/VIEWS/view-name-1.sql
 /crypt/pg.dump-split/schema-name-1/TABLES/table-name-1.sql
 ...
 /crypt/pg.dump-split/schema-name-2/VIEWS/view-name-1.sql
 /crypt/pg.dump-split/schema-name-2/TABLES/table-name-1.sql

 This might be more amenable to diff'ing the different dumps. Schemas are
 logical groupings of other objects, and hence making that apparent in your
 dump's hierarchy makes more sense.


Thanks Gurjeet and Tom for good feedback!

I've made some changes and attached new patches.
Looks much better now I think!

This is what I've changed,

*) Not using oid anymore in the filename
*) New filename/path structure: [-f
filename]-split/[schema]/[desc]/[tag].sql
*) If two objects share the same name tag for the same [schema]/[desc], -2,
-3, etc is appended to the name. Example:
~/pg.dump-split/public/FUNCTION/foobar.sql
~/pg.dump-split/public/FUNCTION/foobar-2.sql
~/pg.dump-split/public/FUNCTION/barfoo.sql
~/pg.dump-split/public/FUNCTION/barfoo-2.sql
~/pg.dump-split/public/FUNCTION/barfoo-3.sql

I think you are right about functions (and aggregates) being the only
desc-type where two objects can share the same name in the same schema.
This means the problem of dumping objects in different order is a very
limited problem, only affecting overloaded functions.

I didn't include the arguments in the file name, as it would lead to very
long file names unless truncated, and since the problem is very limited, I
think we shouldn't include it. It's cleaner with just the name part of the
tag in the file name.

-- 
Best regards,

Joel Jacobson
Glue Finance


pg-dump-split-plain-text-files-8.4.6.patch
Description: Binary data


pg-dump-split-plain-text-files-9.1devel.patch
Description: Binary data



Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
Sent from my iPhone

On 28 dec 2010, at 21:45, Gurjeet Singh singh.gurj...@gmail.com wrote:

The problem I see with suffixing a sequence id to the objects with name
collision is that one day the dump may name myfunc(int) as myfunc.sql and
after an overloaded version is created, say myfunc(char, int), then the same
myfunc(int) may be dumped in myfunc-2.sql, which again is non-deterministic.


I agree, good point!
Perhaps abbreviations are preferable, e.g., myfunc_i, myfunc_i_c, etc, to
reduce the need to truncate filenames.



Also, it is a project policy that we do not introduce new features in back
branches, so spending time on an 8.4.6 patch may not be the best use of your
time.


My company is using 8.4 and needs this feature, so I'll have to patch it
anyway :)


Regards,
-- 
gurjeet.singh
@ EnterpriseDB - The Enterprise Postgres Company
http://www.EnterpriseDB.com

singh.gurj...@{ gmail | yahoo }.com
Twitter/Skype: singh_gurjeet

Mail sent from my BlackLaptop device


Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/28 Andrew Dunstan and...@dunslane.net


 I think that's just horrible. Does the "i" stand for integer or inet? And it
 will get *really* ugly for type names with spaces in them ...


True, true.

But while "c" is too short, I think "character varying" is too long. Is
there some convenient lookup table to convert between the long names and the
short names?
E.g.,
character varying = varchar
timestamp with time zone = timestamptz
etc.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/29 David Wilson david.t.wil...@gmail.com

 Why not place all overloads of a function within the same file? Then,
 assuming you order them deterministically within that file, we sidestep the
 file naming issue and maintain useful diff capabilities, since a diff of the
 function's file will show additions or removals of various overloaded
 versions.


Good suggestion. I agree, trying to put variations of the same function in
different files simply becomes too ugly and the problem it solves is not big
enough.
Then we just need to make sure pg_dump dumps objects in the same order, and
let diff take care of the rest of the challenge. :)

Brb with a new patch.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/29 Tom Lane t...@sss.pgh.pa.us


 If you've solved the deterministic-ordering problem, then this entire
 patch is quite useless.  You can just run a normal dump and diff it.


No, that's only half true.

Diff will do a good job minimizing the size of the diff output, yes, but
such a diff is still quite useless if you want to quickly grasp the context
of the change.

If you have hundreds of functions, just looking at the changed source code
is not enough to figure out which functions were modified, unless you have
the brain power to memorize every single line of code and are able to figure
out the function name just by looking at the old and new lines of code.

To understand a change to my database functions, I would start by looking at
the top-level, only focusing on the names of the functions
modified/added/removed.
At this stage, you want as little information as possible about each change,
such as only the names of the functions.
To do this (get a list of changed functions), you cannot compare two full
schema plain text dumps using diff, as it would only reveal the lines
changed, not the names of the functions, unless you are lucky enough to get
the name of the function within the (by default) 3 lines of copied context.

While you could increase the number of copied lines of context to a value
which would ensure you would see the name of the function in the diff, that
is not feasible if you want to quickly get a picture of the code areas
modified, since you would then need to read through even more lines of diff
output.

For a less database-centric system where you don't have hundreds of stored
procedures, I would agree it's not an issue to keep track of changes by
diffing entire schema files, but for extremely database-centric systems,
such as the one we have developed at my company, it's not possible to get
the whole picture of a change by analyzing diffs of entire schema dumps.

The patch has been updated:

*) Only split objects with a non-null namespace (schema)
*) Append all objects of the same tag (name), of the same type (desc), of the
same namespace (schema) to the same file (i.e., do not append -2, -3, like
before) (Suggested by David Wilson, thanks.)

I also tried adding ORDER BY pronargs and ORDER BY pronargs DESC to the
queries in getFuncs() in pg_dump.c, but it had no effect on the order in
which functions of the same name but different number of arguments were
dumped.
Perhaps functions are already sorted?
Anyway, it doesn't matter that much, keeping all functions of the same name
in the same file is a fair trade-off I think. The main advantage is the
ability to quickly get a picture of the names of all changed functions,
secondly to optimize the actual diff output.


-- 
Best regards,

Joel Jacobson
Glue Finance



pg-dump-split-plain-text-files-9.1devel.patch
Description: Binary data


pg-dump-split-plain-text-files-9.1alpha2.patch
Description: Binary data


pg-dump-split-plain-text-files-8.4.6.patch
Description: Binary data



Re: [HACKERS] pg_dump --split patch

2010-12-28 Thread Joel Jacobson
2010/12/29 Andrew Dunstan and...@dunslane.net

 try:

  diff -F '^CREATE' ...

 cheers

 andrew


Embarrassing, I'm sure I've done `man diff` before, must have missed that
one. Wish I'd known about that feature before, it would have saved me many
hours! :-) Thanks for the tip!

There are some other real-life use-cases where I think splitting would be
nice and save a lot of time:

a) if you don't have a perfect 1:1 relationship between all the SPs in your
database and your source code repository (handled by your favorite version
control system), i.e. if you suspect some SPs in the database might differ
compared to the source code files in your repo. In this scenario, it might
be simpler to start over and continue developing on a repo built from a
pg_dump --split export. You would lose all history, but it might still be
worth it if the "compare everything in the database against the source code
files in the repo" project would take a lot of man hours.

b) quick branching - perhaps you are a consultant at a company where they
don't even have the SPs stored in separate files, they might have been
magically installed by some consultant before you without any trace. :-) To
get up to speed solving the problem you've been assigned, which in this
example involves a lot of SP coding and modifications of existing functions,
it would save a lot of time if you had all functions in separate files
before you started coding; then you could use git or any other nice version
control system to track your changes and figure out what you've done once
you get everything to work.

c) automatically saving daily snapshots of your production database schema
to your version control system. While the best version control system (git)
does not track individual files, many of the ancient but still very popular
ones like svn do so. If every function in the production database schema
were saved automatically to the VCS, you would be guaranteed to have a
track of all deployed changes affecting each function, which is probably a
lot fewer changes compared to the entire history for each function, assuming
developers commit things while developing and not only when deploying.

d) while pg_dump offers some options to limit the output content, such as -s
for schema only and -t/-T to limit which tables to dump, it lacks options
to export "functions only" or "these functions only". It would require quite
a lot of such options to provide the same flexibility as a split dump, which
highly reduces the need for such options, as you could then compose your own
restore script based on the dump.

Of course, not all of these scenarios are relevant for everybody.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] pg_dump --split patch

2010-12-29 Thread Joel Jacobson
2010/12/29 Aidan Van Dyk ai...@highrise.ca

 On Wed, Dec 29, 2010 at 2:27 AM, Joel Jacobson j...@gluefinance.com
 wrote:

 description of split stuff

 So, how different (or not) is this to the directory format that was
 coming out of the desire of a parallel pg_dump?


Not sure what format you are referring to? Custom, tar or plain text?
I noticed there are two undocumented formats as well, "append" and "file".
I tried both of these undocumented formats, but neither produced any
directory structure of the dumped objects.

Could you please explain how to use the directory format, if such a format
already exists?
I can't find it in the documentation nor the source code of HEAD.




 a.

 --
 Aidan Van Dyk                   Create like a god,
 ai...@highrise.ca               command like a king,
 http://www.highrise.ca/         work like a slave.




-- 
Best regards,

Joel Jacobson
Glue Finance



Re: [HACKERS] pg_dump --split patch

2010-12-29 Thread Joel Jacobson
2010/12/29 Tom Lane t...@sss.pgh.pa.us

 I think they're fundamentally different things, because the previously
 proposed patch is an extension of the machine-readable archive format,
 and has to remain so because of the expectation that people will want
 to use parallel restore with it.  Joel is arguing for a split-up of
 the text dump format.


Yes, exactly.

My patch is of course also a lot smaller :-)
pg_dump-directory.diff.: 112 853 bytes
pg-dump-split-plain-text-files-9.1devel.patch..:   5 579 bytes

I just tried the pg_dump-directory.diff patch.
The only thing it has in common with my patch is that it writes data to
different files; and it's only the data which is split into different files,
the schema appears to go into the single file TOC.

Example, pg_dump-directory.diff:

$ ./pg_dump -f /crypt/dirpatch -F d -s glue
$ ls -la /crypt/dirpatch/
TOC
(1 file)

$ rm -rf /crypt/dirpatch

$ ./pg_dump -f /crypt/dirpatch -F d glue

$ ls /crypt/dirpatch/
6503.dat
6504.dat
...lots of files...
6871.dat
6872.dat
6873.dat
6874.dat
TOC

Example, pg_dump --split patch:

$ pg_dump -f /crypt/splitpatch -F p --split -s glue

$ ls /crypt/splitpatch*
/crypt/splitpatch (file)
/crypt/splitpatch-split: (directory)
myschema1
myschema2
public
$ ls /crypt/splitpatch-split/public/
AGGREGATE
CONSTRAINT
FK_CONSTRAINT
FUNCTION
INDEX
SEQUENCE
TABLE
TRIGGER
TYPE
VIEW

$ ls /crypt/splitpatch-split/public/FUNCTION/
myfunc.sql
otherfunc.sql

$ cat /crypt/splitpatch
--
-- PostgreSQL database dump
--

SET statement_timeout = 0;
SET client_encoding = 'UTF8';
SET standard_conforming_strings = off;
SET check_function_bodies = false;
SET client_min_messages = warning;
SET escape_string_warning = off;
...etc...
\i /crypt/splitpatch-split/public/FUNCTION/myfunc.sql
\i /crypt/splitpatch-split/public/FUNCTION/otherfunc.sql


-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] pg_dump --split patch

2010-12-29 Thread Joel Jacobson
2010/12/29 Dimitri Fontaine dimi...@2ndquadrant.fr

 Please have a look at getddl:

  https://github.com/dimitri/getddl


Nice! Looks like a nifty tool.
When I tried it, ./getddl.py -f -F /crypt/funcs -d glue, I got the error
No such file or directory: 'sql/schemas.sql'.

While the task of splitting objects into separate files could be solved by
an external wrapper tool like yours around pg_dump,
I argue it makes more sense to put the (minimal required) logic into
pg_dump, for a number of reasons, most importantly because it's simpler
and less complex, and thus less error prone.

My patch is only a few lines of code and doesn't add any logic to pg_dump;
it merely reroutes the fwrite() calls based on the TOC entries.

Just the fact that you and others had to create your own tools to do the
splitting shows the feature is important, and I think it should be included
in the normal pg_dump tool.

Happy new year!

Joel Jacobson
Glue Finance


[HACKERS] contrib/snapshot

2010-12-31 Thread Joel Jacobson
Happy new year fellow pgsql-hackers!

This is the first alpha release of a new hopefully quite interesting little
tool, named snapshot.

Feedback welcomed.

-- 
Best regards,

Joel Jacobson
Glue Finance



URL

https://github.com/gluefinance/snapshot



DESCRIPTION

Take a snapshot or rollback all your stored procedures in your PostgreSQL
database.



RATIONALE

Before reading any further, ask yourselves the following questions.

1.  Have you ever,
a)  modified stored procedures in your production database and
b)  thought it went OK because all your tests passed and
c)  later on realized something is wrong and
d)  not being able to find nor fix the bug immediately
leaving you no other option than to do a revert?
If so, go to step 2.
If not, go to step 4.

2.  During the minutes/hours while your malfunctioning patch made a mess
    in the production database, was there any user activity causing
    important writes to the database?
If so, go to step 3.
If not, go to step 4.

3.  Did you enjoy the revert experience in step 1?
If so, go to step 4.
If not, go to step 5.

4. Are any of the following statements TRUE?
a) your application is not very database centric.
b) your users won't stop using your service if you lose their data.
c) your application is read-only.
d) your application does not have a lot of user traffic.
If so, lucky you!
If not, you probably already have a good solution to my problem;
I would highly appreciate it if you wanted to share it with me,
please contact me at j...@gluefinance.com.

5.  This proposed solution might be interesting for you.
I would highly appreciate your feedback on how to improve it,
please contact me at j...@gluefinance.com.



INTRODUCTION

snapshot can take a snapshot of all your database functions and objects
depending on them, such as constraints and views using functions.

snapshot can rollback to a previous snapshot without modifying any of your
data or tables. It will only execute the minimum set of drop/create commands
to carry out the rollback.

snapshot depends on the pgcrypto contrib package.



TERMINOLOGY

object type     objects of the same type are created and dropped the same
                way, i.e. they use the same functions to build proper
                create and drop SQL commands.

object          is of an object type and has a SHA1 hash of its content,
                consisting of two SQL commands: one to create and another
                to drop the object.

revision        has a timestamp of when it was created and a list of objects

snapshot        has a timestamp of when it was taken and has a revision

active snapshot the last snapshot taken

take snapshot   create a new revision of all objects currently live in the
                database, and then create a new snapshot if the revision
                differs from the active snapshot

rollback        restores a previously taken snapshot



SYNOPSIS

-- 1. Take a snapshot.

postgres=# SELECT * FROM snapshot();
 _snapshotid | _revisionid
-+-
   1 |   1
(1 row)


-- 2. Take a snapshot.

postgres=# SELECT * FROM snapshot();
 _snapshotid | _revisionid
-+-
   1 |   1
(1 row)


-- 3. We notice nothing changed between steps 1 and 2.


-- 4. Modify your functions.

postgres=# CREATE FUNCTION myfunc() RETURNS VOID AS $$ $$ LANGUAGE sql;
CREATE FUNCTION
glue=# \df myfunc
 List of functions
 Schema |  Name  | Result data type | Argument data types |  Type
++--+-+
 public | myfunc | void | | normal
(1 row)


-- 5. Take a snapshot.

postgres=# SELECT * FROM snapshot();
 _snapshotid | _revisionid
-+-
   2 |   2
(1 row)


-- 6. Rollback to snapshot 1.

postgres=# SELECT * FROM snapshot(1);
 _snapshotid | _revisionid
-+-
   3 |   1
(1 row)


-- 7. We notice the function we created in step 4 has been dropped.

postgres=# \df myfunc
   List of functions
 Schema | Name | Result data type | Argument data types | Type
+--+--+-+--
(0 rows)


-- 8. Rollback to snapshot 2.

postgres=# SELECT * FROM snapshot(2);
 _snapshotid | _revisionid
-+-
   4 |   2
(1 row)


-- 9. We notice the function we created in step 4 has been created again.

postgres=# \df myfunc
 List of functions
 Schema |  Name  | Result data type | Argument data types |  Type
++--+-+
 public | myfunc | void             |                     | normal
(1 row)

Re: [HACKERS] contrib/snapshot

2010-12-31 Thread Joel Jacobson
2010/12/31 David E. Wheeler da...@kineticode.com

 This looks awesome, Joel! One question: Why the dependence on pg_crypto? If
 it's just for SHA1 support, and you're just using it to to create hashes of
 function bodies, I suspect that you could also use the core MD5() function,
 yes?


Thanks for the fast reply. My guests hate me because I had to escape from the
dinner party, but I simply couldn't wait a whole year to fix this bug.
Commit.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2010-12-31 Thread Joel Jacobson
2010/12/31 Simon Riggs si...@2ndquadrant.com

 Please call it something other than snapshot. There's already about 3
 tools called something similar and a couple of different meanings of the
 term in the world of Postgres.


Thanks, good point.
Renamed to fsnapshot.
Commit.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2011-01-01 Thread Joel Jacobson
2010/12/31 Simon Riggs si...@2ndquadrant.com

 Please call it something other than snapshot. There's already about 3
 tools called something similar and a couple of different meanings of the
 term in the world of Postgres.


Renamed the entire github project as well:
https://github.com/gluefinance/fsnapshot

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2011-01-02 Thread Joel Jacobson
2011/1/2 Jim Nasby j...@nasby.net

  Renamed to fsnapshot.

 Is it actually limited to functions? ISTM this concept would be valuable
 for anything that's not in pg_class (in other words, anything that doesn't
 have user data in it).


My ambition is primarily to support functions. Support for other object
types is merely a necessary side-effect of the function dependencies.

Is there a matrix of all possible object type dependencies?
If not, for functions, is the following list correct?
   Object types which may depend on functions: constraints, views, triggers,
any more?
   Functions may depend on: language, any more?

Instead of limiting the support to functions, perhaps it would make more
sense to limit it to all non-data objects?
Is there a term for the group of object types not carrying any user data?
Which object types do carry user data? I can only think of tables and
sequences, any others?



 Also, I'm not sure why this needs to be in contrib vs pgFoundry.


Good point. It's actually in neither of them right now, it's only at
github.com :) I merely used the prefix contrib/ in the subject line to
indicate it's not a patch to the core.

I do hope though it's possible to get a place for it in contrib/ at some
time in the future, I think there is a chance quite a lot of users would
appreciate a quicker, less error-prone way of handling these things.

This tool must be made extremely reliable, otherwise you won't feel safe
using it in a production environment for deployment and revert purposes,
which is my company's requirement.

I hope to achieve this by keeping a bare minimum approach to features, and
making sure it only fulfills the objective:
1. take a snapshot of all non-data objects
2. deploy code, test new code, or let time pass while other people make a
mess in your database
3. revert to previous snapshot without affecting any of the new data,
generated in step 2

I put my faith in the reliability of system functions, such
as pg_get_functiondef(), pg_get_viewdef() etc, to build proper create/drop
commands for each object.
Even nicer would be if pg_catalog provided functions to generate the SQL
create/drop commands for all non-data object types,
and made sure _everything_ is included in the command, ensuring the
object is recreated exactly the same.
Currently, pg_get_functiondef() does not restore the ownership of the
function, which I had to append manually.
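
For example, this is roughly what I end up doing today (a sketch;
'myfunc()' is a placeholder for the function in question):

select pg_get_functiondef(p.oid)
       || E'\nALTER FUNCTION ' || p.oid::regprocedure
       || ' OWNER TO ' || quote_ident(r.rolname) || ';'
from pg_proc p
join pg_roles r on r.oid = p.proowner
where p.oid = 'myfunc()'::regprocedure;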

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2011-01-02 Thread Joel Jacobson
2011/1/3 Joel Jacobson j...@gluefinance.com

 2011/1/2 Jim Nasby j...@nasby.net

 Is it actually limited to functions? ISTM this concept would be valuable
 for anything that's not in pg_class (in other words, anything that doesn't
 have user data in it).


 Instead of limiting the support to functions, perhaps it would make more
 sense to limit it to all non-data objects?
 Is there a term for the group of object types not carrying any user data?


My bad, I see you already answered both my questions.
So, it does make sense, and the term for non-data object types is therefore
non-pg_class, non-class or perhaps non-relation objects?

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] contrib/snapshot

2011-01-03 Thread Joel Jacobson
2011/1/3 Andrew Dunstan and...@dunslane.net

 contrib in PostgreSQL means a module maintained by the backend
 developers.

But it's not clear to me that there is any particular reason why this should
 be in contrib.


Then I definitely think contrib is the only possible place for this
module.
If the module will not be maintained by the backend developers, no one
(including myself) will trust the module to perform these sensitive tasks in
a mission-critical production database.
Since the module depends on pg_catalog system tables, it must be updated
if they change in future versions of PostgreSQL, and I wouldn't
trust any other team than the backend developers to do it.

I'm happy to continue hacking on the module until it's 100% working,
stable, thoroughly tested and accepted by the backend developers.
It's not working 100% yet, for instance, I'm currently working on making
sure objects are created/dropped in an order not breaking any dependencies.

-- 
Best regards,

Joel Jacobson
Glue Finance


Re: [HACKERS] pg_dump --split patch

2011-01-03 Thread Joel Jacobson
2011/1/3 Robert Haas robertmh...@gmail.com:
 will become confusing for users and hard for us to maintain.  We're
 going to need to agree on something that won't be perfect for
 everyone, but will hopefully be a sufficient improvement for enough
 people to be worth doing.

Good point.
I think we can at least agree the bare minimum is splitting per
namespace, object type and name.

 On the specific issue of overloaded functions, I have a feeling that
 the only feasible option is going to be to put them all in the same
 file.  If you put them in different files, the names will either be
 very long (because they'll have to include the argument types) or
 fairly incomprehensible (if you did something like hash the argument
 types and append 8 hex digits to the function name) or not all that
 static (if you use OIDs; or if you number them sequentially, like
 foo1.sql, foo2.sql, foo3.sql, then foo3.sql might end up as foo2.sql
 on a system where there are only two variants of foo, making diff not
 work very well).

I agree.
Even if the overloaded functions are not written in the same order,
you will quickly and easily note that functions of this particular name
have been changed, which should narrow down your
mind-mapping-change-grasping exercise quite a lot.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump --split patch

2011-01-03 Thread Joel Jacobson
2011/1/3 Tom Lane t...@sss.pgh.pa.us:
 pg_dump from dumping objects in a consistent order ... and once you do
 that, you don't need this patch.
 Yeah, that's exactly it.  I can think of some possible uses for
 splitting up pg_dump output, but frankly to ease diff-ing is not
 one of them.  For that problem, it's nothing but a crude kluge that
 only sort-of helps.  If we're to get anywhere on this, we need a
 better-defined problem statement that everyone can agree is worth
 solving and is well solved with this particular approach.

The problem statement is only partly diffing, I think the two major
other problems with one-single-giant-schema-file is:

1. Automatically version controlling your production database schema
using a file-based approach is not possible.
Splitting the schema would make it very simple to set up a cronjob
which automatically commits the schema changes every night. You would
be able to follow the changes by simply looking at the VCS log,
instead of grepping the log files and trying to figure out what
changed.

2. Splitting is a single option which reduces the need for any other
imaginable exclude/include options, such as only export these
functions or do not export views etc. Tables are of course the most
common thing you want to include/exclude in a dump, but there are
quite a lot of different object types. No single object type
is important enough to motivate a new pg_dump
exclusion/inclusion option, but I think the sum of the need for such
an option across all object types is greater than the extra complexity
introduced by a patch consisting of only a few lines of code.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_dump --split patch

2011-01-03 Thread Joel Jacobson
 Robert Haas robertmh...@gmail.com writes:
 I have to admit I'm a bit unsold on the approach as well.  It seems
 like you could write a short Perl script which would transform a text
 format dump into the proposed format pretty easily, and if you did
 that and published the script, then the next poor shmuck who had the
 same problem could either use the script as-is or hack it up to meet
 some slightly different set of requirements.  Or maybe you'd be better

That's actually what I first did a couple of months ago,
https://github.com/gluefinance/parse_schema/blob/master/parse_schema.pl

My major concern with parsing the schema file is that I would never fully
trust the output from the script; even if the regex is extremely
paranoid and really strict, there is still a risk it contains a bug.

If you cannot trust the output from the schema parse script, it's not
safe to use it to do a partial restore of objects.
Let's say you want to restore only a few functions from your backup
schema. Without --split, you would need to restore them manually
somehow. With the --split option, you could simply restore them from
the individual files, at least for functions where only the source code
has been modified and not the arguments.

2011/1/3 Tom Lane t...@sss.pgh.pa.us:
 The basic objection that I have to this patch is that it proposes to
 institutionalize a pretty dumb implementation.  And, as you mentioned,

What's dumb about it? It's simple and it works.
Please give me an idea of what a smart implementation would be, and I
will see if I can rewrite the patch.

 once it's in there it'll be more or less set in stone because we aren't
 going to want to support umpteen variants.

Yes, that's why it should be a bare minimum solution to the problem,
which is a lot better than no solution at all.

What variants, apart from the already-discussed path name structure, can
you think of?

 I like the idea of a postprocessing script a lot better --- it seems
 like it wouldn't get in the way of people making their own variants.
 And as you say it'd likely be pretty trivial to do.

I'm quite sure such a postprocessing script has been written hundreds
of times over the past years by different postgres users, without even
submitting a question to the mailing list, since it's a quite
manageable task for anyone with some regex and scripting experience.

Why not just provide a simple bare-minimum solution to the problem,
and let the remaining x % of the users who have a very specific need
write their own specialized script if they need to? It would save a
lot of time for all future users who would otherwise have to write their
own script, or google until they find my script or some other
script solving the same problem. Multiple almost identical solutions
to exactly the same problem are kind of a waste of time.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Implementing pg_dump_sort.c topological sorting in sql/plpgsql/plperl?

2011-01-04 Thread Joel Jacobson
Hi hackers,

The project I'm currently working on, fsnapshot[1], is written in
plain plpgsql, and I need to sort all the oids in their
creatable/droppable order.
This has already been properly implemented in pg_dump_sort.c using
Knuth's algorithm for topological sorting, with some special magic to
find and break dependency loops.

It's not possible to use a plain recursive query to do the trick (due
to 'i' bidirectional dependencies and dependency loops).

I need a general approach, only making use of pg_depend.

The function should take no input arguments, and the output
should be oid[], containing a list of the oids in a
creatable/droppable order.
It doesn't matter if it is left-to-right, least number of edges first.
Any valid topological sort will do.

I'm sure it's possible to implement it in plpgsql or plperl, but I
wanted to check first if anyone has already made such a function to
hopefully save some time?
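
For what it's worth, this is the kind of naive Kahn-style sketch I have in
mind. It keys on objid alone (strictly, objects are only unique per
classid+objid) and ignores the 'a'/'i' deptype complications, so it is a
starting point rather than a solution:

CREATE OR REPLACE FUNCTION naive_creation_order() RETURNS oid[] AS $$
DECLARE
    sorted oid[] := '{}';
    n oid;
BEGIN
    -- objid depends on refobjid, i.e. refobjid must be created first
    CREATE TEMP TABLE tmp_edges AS
        SELECT DISTINCT objid, refobjid
        FROM pg_depend
        WHERE deptype <> 'p' AND objid <> refobjid;
    CREATE TEMP TABLE tmp_nodes AS
        SELECT objid AS id FROM tmp_edges
        UNION
        SELECT refobjid FROM tmp_edges;
    LOOP
        -- pick any node with no unmet dependencies left
        SELECT id INTO n
        FROM tmp_nodes
        WHERE NOT EXISTS
              (SELECT 1 FROM tmp_edges e WHERE e.objid = tmp_nodes.id)
        LIMIT 1;
        EXIT WHEN n IS NULL;
        sorted := sorted || n;
        DELETE FROM tmp_nodes WHERE id = n;
        -- n is now "created", so edges requiring it are satisfied
        DELETE FROM tmp_edges WHERE refobjid = n;
    END LOOP;
    -- anything left in tmp_edges is part of a dependency loop
    IF EXISTS (SELECT 1 FROM tmp_edges) THEN
        RAISE EXCEPTION 'dependency loop detected';
    END IF;
    DROP TABLE tmp_nodes;
    DROP TABLE tmp_edges;
    RETURN sorted;
END;
$$ LANGUAGE plpgsql;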

Thanks a lot!

[1] https://github.com/gluefinance/fsnapshot

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pov and tsort

2011-01-06 Thread Joel Jacobson
  tsort
--

 DESCRIPTION
 tsort - return a tree's nodes in topological order


 SYNOPSIS
 tsort(edges text, delimiter text, debug integer, algorithm text,
 selection text, operator text, nodes text, direction text);

 OUTPUT
 nodes   text[]  Array of nodes in topologically sorted order

 INPUT PARAMETERS
 Parameter   Type    Regex   Description
 ---------   ----    -----   -----------

 edges       text    ^.+$    Node pairs, separated by [delimiter].

 delimiter   text    ^.*$    Node separator in [edges],
                             default is ' ', i.e. single blank space.

 debug       integer         Print debug information using RAISE DEBUG:
                             0   no debug (default)
                             1   some debug
                             2   verbose debug

 algorithm   text            Sorting algorithm:
                             DFS  depth-first (default)
                                  explores as far as possible along each
                                  branch before backtracking.
                             BFS  breadth-first
                                  explores all the neighboring nodes,
                                  then for each of those nearest nodes,
                                  it explores their unexplored neighbor
                                  nodes, and so on.
                             ^sub subsort using a perl subroutine.
                                  examples:
                                  # sort numerically ascending
                                  sub {$a <=> $b}

                                  # sort numerically descending
                                  sub {$b <=> $a}

                                  # sort lexically ascending
                                  sub {$a cmp $b}

                                  # sort lexically descending
                                  sub {$b cmp $a}

                                  # sort case-insensitively
                                  sub {uc($a) cmp uc($b)}

                                  For more examples, please go to:
                                  http://perldoc.perl.org/functions/sort.html

 The following options will not affect the order of the nodes in the result,
 they only control which nodes are included in the result:

 Parameter   Type    Regex   Description
 ---------   ----    -----   -----------

 selection   text            Selection of nodes, used by [operator]:
                             ALL         select all nodes (default)
                             ISOLATED    select nodes without any
                                         successors or predecessors
                             SOURCE      select nodes with successors
                                         but no predecessors
                             SINK        select nodes with predecessors
                                         but no successors
                             CONN_INCL   select nodes connected to [nodes],
                                         including [nodes]
                             CONN_EXCL   select nodes connected to [nodes],
                                         excluding [nodes],
                                         separated by [delimiter]

 operator    text            Include or exclude nodes in [selection]:
                             INCLUDE     include nodes (default)
                             EXCLUDE     exclude nodes

 The following options are only applicable if
 [selection] is CONN_INCL or CONN_EXCL

 Parameter   Type    Regex   Description
 ---------   ----    -----   -----------

 nodes       text            select nodes connected to [nodes]
                             NULL    not applicable (default)
                             [nodes] the start nodes, separated by [delimiter]

 direction   text            direction to look for connected nodes
                             BOTH    traverse both successors and
                                     predecessors (default)
                             UP      only traverse predecessors
                             DOWN    only traverse successors
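
 For example, a call spelling out every parameter might look like this
 (my sketch of a full call; the expected output follows from the
 semantics above):

 SELECT tsort('a b b c', ' ', 0, 'DFS', 'ALL', 'INCLUDE', NULL, 'BOTH');
 -- edges (a,b) and (b,c), so the expected result is {a,b,c}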








-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] obj_unique_identifier(oid)

2011-01-07 Thread Joel Jacobson
Hi all!

I was a bit frustrated there was no function to generate a unique
identifier for any oid.
Instead of complaining, I decided to automate the process as far as possible. :)

The result is a simple perl script that automatically generates, for each
regclass, a function able to produce a unique text identifier.

The function obj_unique_identifier(oid) will return a unique name for _any_ oid.

I have looked at the unique constraints for each system catalog to
make sure all identifiers are unique.

Source code:
perl script to generate .sql file:
https://github.com/gluefinance/pov/blob/master/sql/schema/pov/functions/obj_unique_identifier.pl
output from perl script:
https://github.com/gluefinance/pov/blob/master/sql/schema/pov/functions/obj_unique_identifier.sql

I would highly appreciate feedback on the structure of the identifier.
It must be composed in a way which will guarantee uniqueness.

Example:

glue=# select obj_unique_identifier(refobjid) from pg_depend order by
random() limit 10;
   obj_unique_identifier

 pg_proc.pg_catalog.iso8859_1_to_utf8(integer, integer, cstring,
internal, integer)
 pg_operator.pg_catalog.float8.pg_catalog.float8.pg_catalog.-
 pg_operator.pg_catalog.money.pg_catalog.int4.pg_catalog.*
 pg_amproc.gin.pg_catalog.array_ops.pg_catalog._time.pg_catalog._time.4
 pg_operator.pg_catalog.int2.pg_catalog.int4.pg_catalog.-
 pg_class.pg_catalog.pg_statio_sys_sequences
 pg_amproc.gin.pg_catalog.array_ops.pg_catalog._bool.pg_catalog._bool.1
 pg_class.pg_catalog.pg_stat_all_indexes
 pg_class.pg_catalog.pg_type
 pg_proc.pg_catalog.pg_stat_get_function_time(oid)
(10 rows)



-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-07 Thread Joel Jacobson
Sent from my iPhone

On 7 jan 2011, at 20:46, Tom Lane t...@sss.pgh.pa.us wrote:

 Joel Jacobson j...@gluefinance.com writes:
 The function obj_unique_identifier(oid) will return a unique name for _any_ 
 oid.

 Surely this is broken by design?  You can *not* assume that the same OID
 isn't in use for different things in different system catalogs.  They're
 only guaranteed unique within a catalog.  That's the main reason why
 pg_depend has to include the classid.

regards, tom lane

Correct. That is why the regclass name (classid) is included in the unique name.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-07 Thread Joel Jacobson
The function should take both classid and oid as input. I'll fix.

Sent from my iPhone

On 7 jan 2011, at 20:59, Joel Jacobson j...@gluefinance.com wrote:

 Sent from my iPhone

 On 7 jan 2011, at 20:46, Tom Lane t...@sss.pgh.pa.us wrote:

 Joel Jacobson j...@gluefinance.com writes:
 The function obj_unique_identifier(oid) will return a unique name for _any_ 
 oid.

 Surely this is broken by design?  You can *not* assume that the same OID
 isn't in use for different things in different system catalogs.  They're
 only guaranteed unique within a catalog.  That's the main reason why
 pg_depend has to include the classid.

   regards, tom lane

 Correct. That is why the regclass name (classid) is included in the unique 
 name.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-07 Thread Joel Jacobson
2011/1/7 Jim Nasby j...@nasby.net:
 BTW, if you're looking at making pg_depend easier to use, see 
 http://archives.postgresql.org/message-id/129774-sup-2...@alvh.no-ip.org

I guess there is more than one way to do it: C, sql, plperl, plpgsql. :)
I guess at least one of the methods should be provided in the vanilla distro. :)

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-08 Thread Joel Jacobson
2011/1/8 Dimitri Fontaine dimi...@2ndquadrant.fr:
  http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=6cc2deb86e9183262493a6537700ee305fb3e096

Nice! Has the patch been accepted, and will it be made available in future
versions of pg?

Also, why return NULL for pinned objects? They can also be described
using a unique identifier.
(+   /* for pinned items in pg_depend, return null */)

It is useful to describe such objects in order to diff different
versions of pg, i.e. comparing which pinned objects exist; doing so
can tell you the odds of an application that depends on certain pinned
objects being compatible with a specific version of the database.
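
For reference, the pinned objects are the deptype 'p' rows in pg_depend,
where only the referenced side is meaningful:

select refclassid::regclass AS catalog, count(*)
from pg_depend
where deptype = 'p'
group by refclassid;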

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-08 Thread Joel Jacobson
2011/1/8 Magnus Hagander mag...@hagander.net:
 Yes. Once things are committed to the main repository, they are only
 backed out if someone finds a major issue with them that is not
 fixable (ina reasonable timeframe). That almost never happens. We
 don't keep unapproved patches or development branches in the main
 repository - those are all in the personal repositories of the
 developers.

Thanks for clarifying.

I found a bug in the pg_catalog.pg_describe_object function.

The query below should not return any rows, because if it does, then
there are oids with non-unique descriptions.
While the description is good enough for a human to interpret, it
cannot be used in an application as a unique identifier unless it is
really unique.

WITH
all_objects AS (
SELECT classid, objid, objsubid FROM pg_depend
UNION
SELECT refclassid, refobjid, refobjsubid FROM pg_depend
)
SELECT pg_catalog.pg_describe_object(classid,objid,objsubid)
FROM all_objects
GROUP BY pg_catalog.pg_describe_object(classid,objid,objsubid)
HAVING COUNT(*) > 1


pg_describe_object

 function 2 ginarrayextract(anyarray,internal) of operator family
array_ops for access method gin
 function 4 
ginarrayconsistent(internal,smallint,anyarray,integer,internal,internal)
of operator family array_ops for access method gin
 function 3 ginqueryarrayextract(anyarray,internal,smallint,internal,internal)
of operator family array_ops for access method gin
 function 1 network_cmp(inet,inet) of operator family array_ops for
access method gin
 function 1 bttextcmp(text,text) of operator family array_ops for
access method gin
(5 rows)

There are 94 such objects:

 classid | objid | objsubid |
obj_unique_identifier |
   pg_describe_object
-+---+--+--+
2603 | 10606 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._text.pg_catalog._text.1
  | function 1 bttextcmp(text,text) of operator family
array_ops for access method gin
2603 | 10610 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._varchar.pg_catalog._varchar.1
| function 1 bttextcmp(text,text) of operator family array_ops
for access method gin
2603 | 10650 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._inet.pg_catalog._inet.1
  | function 1 network_cmp(inet,inet) of operator family
array_ops for access method gin
2603 | 10654 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._cidr.pg_catalog._cidr.1
  | function 1 network_cmp(inet,inet) of operator family
array_ops for access method gin
2603 | 10631 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._bytea.pg_catalog._bytea.2
| function 2 ginarrayextract(anyarray,internal) of
operator family array_ops for access method gin
2603 | 10671 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._macaddr.pg_catalog._macaddr.2
| function 2 ginarrayextract(anyarray,internal) of operator
family array_ops for access method gin
2603 | 10667 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._interval.pg_catalog._interval.2
  | function 2 ginarrayextract(anyarray,internal) of operator
family array_ops for access method gin
2603 | 10675 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._name.pg_catalog._name.2
  | function 2 ginarrayextract(anyarray,internal) of
operator family array_ops for access method gin
2603 | 10719 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._tinterval.pg_catalog._tinterval.2
| function 2 ginarrayextract(anyarray,internal) of operator family
array_ops for access method gin
2603 | 10607 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._text.pg_catalog._text.2
  | function 2 ginarrayextract(anyarray,internal) of
operator family array_ops for access method gin
2603 | 10611 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._varchar.pg_catalog._varchar.2
| function 2 ginarrayextract(anyarray,internal) of operator
family array_ops for access method gin
2603 | 10655 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._cidr.pg_catalog._cidr.2
  | function 2 ginarrayextract(anyarray,internal) of
operator family array_ops for access method gin
2603 | 10707 |0 |
pg_amproc.gin.pg_catalog.array_ops.pg_catalog._timestamp.pg_catalog._timestamp.2
| function 2 ginarrayextract(anyarray,internal) of operator family
array_ops for access method gin
2603 | 10711 |0 |

Re: [HACKERS] obj_unique_identifier(oid)

2011-01-08 Thread Joel Jacobson
2011/1/8 Robert Haas robertmh...@gmail.com:
 I don't think your analysis is correct.  Each entry in pg_depend
 represents the fact that one object depends on another object, and an
 object could easily depend on more than one other object, or be
 depended upon by more than one other object, or depend on one object
 and be depended on by another.

What does that have to do with this?

Two different oids represent two different objects, right?
Two different objects should have two different descriptions, right?
Otherwise I cannot see how one can argue the description is unique.

pg_describe_object returns unique descriptions for all object
types, except for the 5 cases I unexpectedly found.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-08 Thread Joel Jacobson
2011/1/9 Andreas Karlsson andr...@proxel.se:
 Here is a patch, but I am not sure I am not sure if I like my idea for
 format. What do you think?

 SELECT pg_describe_object('pg_amproc'::regclass,oid,0)
    FROM pg_amproc WHERE oid IN (10608,10612);
                                                        pg_describe_object
 --
  function 1 bttextcmp(text,text) of operator family array_ops for access 
 method gin for (text[],text[])
  function 1 bttextcmp(text,text) of operator family array_ops for access 
 method gin for (character varying[],character varying[])
 (2 rows)

Looks great! Many thanks for fixing the bug!


 Andreas





-- 
Best regards,

Joel Jacobson
Glue Finance

E: j...@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] obj_unique_identifier(oid)

2011-01-08 Thread Joel Jacobson
2011/1/9 Robert Haas robertmh...@gmail.com:
 Oops.  I misread your query.  I thought the duplicates were because
 you were feeding pg_describe_object the same classoid, objoid,
 objsubid pair more than once, but I see now that's not the case (UNION
 != UNION ALL).

Ah, I see. Yes, the query should indeed be UNION; it would produce
the same result, but perhaps be a bit faster.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object (was: Re: [HACKERS] obj_unique_identifier(oid))

2011-01-10 Thread Joel Jacobson
2011/1/10 Andreas Karlsson andr...@proxel.se:
 Here is the bug-fix patch again with a description of the context so I
 can add it to the commit fest.

Many thanks for fixing the bug!

I also implemented pg_describe_object in pure SQL, for those of us
who have not yet switched to PostgreSQL 9 in production. Very
helpful function indeed!

https://github.com/gluefinance/pov/blob/master/sql/schema/pov/functions/pg_describe_object.sql

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object (was: Re: [HACKERS] obj_unique_identifier(oid))

2011-01-10 Thread Joel Jacobson
2011/1/10 Tom Lane t...@sss.pgh.pa.us:
 There was never any intention that that code produce a guaranteed-unique
 identifier; it's only meant to be a humanly useful identifer, and this
 patch seems to me to mostly add noise.

For all objects except those in the pg_amproc regclass, the function
does already generate unique strings. They are guaranteed to be unique
because every component of the unique constraints in all pg_*
tables is included in the unique text identifier.

It makes a lot more sense to fix the function to return a unique
string for pg_amproc as well, than to introduce an entirely new function
which returns a unique string identifier. It would hardly break
anything, and I think you exaggerate the noise factor.

I can think of numerous reasons why it is absolutely necessary to
provide a function generating unique identifiers for objects:

a) To allow comparing all objects in two different databases, by
comparing objects with the same identifier. This cannot be done using
the oids, since they naturally differ between databases.

b) To draw nice human-readable digraphs in the .dot format, instead
of drawing relation digraphs of classid.objid.objsubid.

c) OIDs are probably misused in a lot of applications, due to
misunderstandings of what they are and are not. I for one didn't know
they are not necessarily unique, but only within their regclass. It
would be better to encourage users to use a text string if they need
to refer to a unique object in their application, than to force them
to use OIDs (or OIDs in combination with the regclass, almost as bad), for
lack of something better.
While you could build your own query to generate a unique string,
based on all the columns defining the unique constraint for each
class, doing so is very cumbersome and requires a lot of
postgres-guru-knowledge.
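
For instance, just for functions, one ends up writing something like the
following (using lower(text) as an example), and then has to repeat the
exercise for every catalog:

select 'pg_proc.' || n.nspname || '.' || p.proname
       || '(' || pg_get_function_identity_arguments(p.oid) || ')'
from pg_proc p
join pg_namespace n on n.oid = p.pronamespace
where p.oid = 'lower(text)'::regprocedure;
-- gives: pg_proc.pg_catalog.lower(text)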

I think it would be a big improvement, and would increase the number of
possible use cases of the existing pg_describe_object function, if the
documentation said the returned text is guaranteed to be unique
for each object.

-- 
Best regards,

Joel Jacobson
Glue Finance

E: j...@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Add function dependencies

2011-01-10 Thread Joel Jacobson
When a function is created, the system validates the syntax and
complains if any function the created function attempts to call is
missing.
I think this is really good, since it traps typos and warns you if you
have forgotten to install any functions your function depends on.

It would be equally useful if it warned you when trying to drop a
function other functions might depend on.

Currently, I do something like SELECT * FROM pg_proc WHERE prosrc LIKE
'%myfunc%' to verify nothing is using the function I'm about to drop.
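
Spelled out, it is roughly the following (a heuristic only: it also matches
comments and similarly named identifiers, and misses dynamic SQL):

select n.nspname, p.proname
from pg_proc p
join pg_namespace n on n.oid = p.pronamespace
where p.prosrc like '%myfunc%'
  and n.nspname not in ('pg_catalog', 'information_schema');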

Just like you can disable the creation check by setting
check_function_bodies to false,
I would suggest a similar option to disable the suggested dependency
check when dropping functions.

Additionally, if pg_depend revealed function dependencies, it
would be trivial to automatically generate function call digraphs in
.dot format, showing a nice call tree of your entire system.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-10 Thread Joel Jacobson
2011/1/11 Tom Lane t...@sss.pgh.pa.us:
 It would make dependency error messages significantly longer and less
 readable.  Quite aside from the point at hand here, we elide schema
 names in many cases (and it looks like there are some code paths where
 getObjectDescription never bothers to print them at all).  Another issue
 that might make it interesting to try to use the output for purposes
 other than human-readable descriptions is that we localize all the
 phrases involved.

 My point is that this isn't a bug fix, it's more like moving the
 goalposts on what getObjectDescription is supposed to do.  And I'm not
 even very sure where they're being moved to.  I haven't seen a
 specification for an intended use of pg_describe_object for which its
 existing behavior would be unsatisfactory.

Thanks for some good arguments. I now agree with you that it would be a bit
counterproductive to change the existing pg_describe_object.
Due to the localization of the phrases and the lack of mandatory
namespace inclusion, you lose the comparison ability anyway.

I instead propose we introduce a new function named
pg_get_object_unique_identifier( classid oid, objid oid, objsubid
integer ) returns text.

The name would make sense since we already have a
pg_get_function_identity_arguments( func_oid ), for a similar purpose
but solely for functions.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Joel Jacobson
2011/1/11 Robert Haas robertmh...@gmail.com:
 I don't get it.  If two different items that exist in the system out
 of the box have the same description, it seems clear that relevant
 piece of disambiguating information exists nowhere in the description
 string.

I guess it is a question of prioritization.
If backwards compatibility is to be guaranteed, even for functions
returning text intended to be read by humans, then the function cannot
be modified without violating that golden rule, if such a rule exists
within the PostgreSQL development project?

If it's not a golden rule, then it's a totally different story, and
there is no excuse for returning the same description for two
different objects.
Any other reasoning is just silly.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object

2011-01-11 Thread Joel Jacobson
2011/1/11 Tom Lane t...@sss.pgh.pa.us:
 Seems like concatenating the OIDs would accomplish that.  (If you
 think not, well, you still haven't explained what problem you're trying
 to solve...)

The OIDs can be different in two databases sharing the same
original schema but of two different versions.
In this case it is better to compare textual strings describing the
objects than to compare based on oids.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
Has anyone written an in-depth description of how to traverse the pg_depend tree?
The 'a' and 'i' deptypes really make it hard to figure out the
dependency order; a topological sort does not work.

My latest attempt involved trying to group by all objects connected to
each other via deptype 'a' or 'i', and replacing all such nodes in the
tree with the source node, i.e. the node which according to the
topological order could be created first.

Am I on the right path, trying to fuse the internal/auto objects
together, replacing them with the topmost object in the tree?
Or is there a simpler way to figure out the order in which objects
can be created?

I need a general solution, not a custom-made query for each regclass,
which would be quite trivial but feels far from bulletproof. I want
something relying only on pg_depend, since that should be the safest
method of them all.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
2011/1/11 Tom Lane t...@sss.pgh.pa.us:
 Try reading the code in src/backend/catalog/dependency.c.

I've tried, but failed to figure it out anyway. The focus in
dependency.c is on finding the dependencies of a given object.
What I want to do is something slightly different.
I need to figure out the order of creation of all objects, not just
the dependencies for a single object.

Basically, I want to do order by oid, but since you cannot trust the
oid order (as one could pre-7.3), I need to get the sorted list
using pg_depend somehow.

I guess I could make findDependentObjects() callable from sql and call
it for each and every object, but that's a quite big project, I was
hoping it was possible to do it in plain sql, or at least only by
relying on plpgsql/plperl.

I've implemented tsort() in plperl, but that didn't really help me
much, since you need to jump around in the tree when you encounter
'internal' and 'auto' nodes.

My last resort is to sort by oid, but I really don't want to do that
since it would render the entire solution unreliable and I would never
feel comfortable using it in the production environment.

This is the last component I need in order to complete the work on the
pov-project http://www.github.com/gluefinance/pov

I would highly appreciate your help, I feel a bit lame since I've
spent over two days working on this. It's not difficult if you are
allowed to build specialized queries for each class, but my goal is a
general query only relying on pg_depend.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-11 Thread Joel Jacobson
2011/1/11 Florian Pflug f...@phlo.org:
 Could you give an example of the kind of trouble you're experiencing trying
 to use a topological sort?

Let's say you have a table t and a view v.
The view v is defined as select * from t;
If we put all objects in a tree, with the public schema as the root,
both v and t will be directly under the root, but in reality v cannot be
created before t.
This is the reason why a normal topological sort doesn't work.
You have to look at the deptype and sort nodes having internal edges
between them differently.
The pg_dump source code of course contains all the logic necessary to
do the trick, but it's not that easy to follow.
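
In SQL terms the example is simply:

CREATE TABLE t (a int);
CREATE VIEW v AS SELECT * FROM t;
-- in pg_depend, the view's _RETURN rule depends normally ('n') on t
-- and internally ('i') on v itself, so the required v-after-t ordering
-- is only visible through the rule, not as a direct edge from v to t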

I guess it's time for plan B, sorting based on oid, no biggie, it will
work for my purpose, but it's damn ugly.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-12 Thread Joel Jacobson
2011/1/12 Florian Pflug f...@phlo.org:
 I suggest you try to node-folding strategy and see how far it gets you.

Good suggestion! :-) That's exactly what I've been trying to do, but
failed miserably :-(

I have written a thorough description of my problem and put it on my github:

https://github.com/gluefinance/pov/tree/master/doc

In the example, pg_depend.sql, we want to reach the following
conclusions in an algorithmic way, using only pg_depend as input:

table t1, function f1, sequence s1 have no dependencies, other than
the public schema
table t2 depends on t1 and table t3 depends on f1
view v1 depends on t1 and view v2 depends on t2
view v3 depends on both v1 and v2
view v4 depends on v3 and f1

The topological tree automatically generated from the data in
pg_depend is presented in pg_depend.dot, pg_depend.svg and
pg_depend.png.

The actual topological tree (possible order of creation) created
manually is presented in pg_depend_actual.dot, pg_depend_actual.svg
and pg_depend_actual.png.

The automatically created objects, such as primary key indexes,
constraints and triggers, have been ignored in this graph, as they are
implicitly created when creating the base objects.

Objective:

Define a general algorithm taking ONLY the pg_depend data as input,
generating a valid topological directional graph, including at least
the nodes in pg_depend_actual.dot. It might as well include all
nodes; although that is not necessary, it certainly won't hurt.

It will be necessary to look not only at the nodes (objects) and edges
(obj-refobj), but also at the deptype of each edge.

List of files:

pg_depend.sql : A small but sufficient database model of tables,
sequences, functions and views.
pg_depend.dot : Digraph in DOT language (plaintext format), generated
automatically using only data from pg_depend.
pg_depend.png : PNG generated using GraphViz with pg_depend.dot as input
pg_depend.svg : SVG generated using GraphViz with pg_depend.dot as input

pg_depend_actual.dot : Digraph in DOT language (plaintext format),
generated manually, shows the actual possible order of creation, only
including the base objects.
pg_depend_actual.png : PNG generated using GraphViz with
pg_depend_actual.dot as input
pg_depend_actual.svg : SVG generated using GraphViz with
pg_depend_actual.dot as input


-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-12 Thread Joel Jacobson
2011/1/12 Alvaro Herrera alvhe...@commandprompt.com:
 FWIW this idea fails when you consider stuff such as circular foreign
 keys (and I suppose there are other, more common cases).  If you really
 want something general you need to break those apart.  (This is the
 explanation for the “break the loop” code in pg_dump I imagine)

Good point.
Yes, the algorithm must be able to break the loop, i.e. remove the
edges causing the loops.
This has already been properly solved in pg_dump, but I frankly don't
really understand exactly how this is done, at least not the general
rule on how to do it, even after spending hours studying the source
code.

Also, circular dependencies seem impossible for some object classes,
such as functions, views, constraints and triggers. None of them can
possibly refer to themselves. If you only need to create/drop objects
of these classes (like in my case within the pov-project), it's not
important to break the loop in a clever way, i.e. simply removing
any of the edges, not necessarily the best suited one, will do for my
purpose.

I have updated the example with two circular relations:

-- Circular dependencies:
CREATE TABLE tselfref  ( id int not null PRIMARY KEY, parentid int not
null REFERENCES tselfref(id) );
CREATE TABLE tcircular ( id int not null PRIMARY KEY, id2  int not
null REFERENCES tselfref(id) );
ALTER TABLE  tselfref ADD COLUMN id2 int not null REFERENCES tcircular ( id );

I have also updated pg_depend.[sql|dot|png|svg] and
pg_depend_actual.[dot|png|svg] with the circular references.

The dotted edges in pg_depend_actual are the edges which must be
removed to break the loop.

Any ideas on how to design an algorithm to transform the digraph
pg_depend into pg_depend_actual are highly appreciated.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-12 Thread Joel Jacobson
2011/1/12 Tom Lane t...@sss.pgh.pa.us:
 This isn't particularly *useful*, maybe, but it's hardly impossible.
 And if we analyzed function dependencies in any detail, circular
 dependencies among functions would be possible (and useful).

Thanks Tom for clarifying. This makes me even more motivated to
implement the creation-order algorithm using only sql/plpgsql and
pg_depend.
If you have any ideas on how to do this, in addition to reading the
dependency.c and pg_dump_sort.c source code, they would be highly
appreciated.

Any tips on articles on graph algorithms which take not only edges
(node-node) as input, but also an edge type (i.e. the
deptype in pg_depend), would also be very useful. I have only found
algorithms for sorting normal directional graphs, where all
edges are treated the same.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-12 Thread Joel Jacobson
(sorry for top posting, iPhone + drunk)

pg_depend_before is a select * from pg_depend before creating the test db model
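
In other words, something like:

CREATE TABLE pg_depend_before AS
    SELECT * FROM pg_depend;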

Sent from my iPhone

On 12 jan 2011, at 20:36, David Fetter da...@fetter.org wrote:

 On Wed, Jan 12, 2011 at 08:06:24PM +0100, Joel Jacobson wrote:
 2011/1/12 Tom Lane t...@sss.pgh.pa.us:
 I've sometimes found it useful to think of internal dependencies as
 acting like normal dependencies pointing in the other direction.
 I'm not sure that would do much to solve your problem, but it might
 be worth trying.

 Tom, you are a genius! No, seriously, I mean it, this is awesome, it
 worked! YES! You totally saved my day! Thank you! Finally! I'm so
 happy! :-) :-) :-)

 This was the little piece of code:

 CASE WHEN DepType ~ '^(a|ni|in|an|na)$' THEN
--- Swap edges
 ELSE
-- Do not swap edges
 END

 Look at the attached svg graph how beautiful the automatically
 generated graph look like now! :-)

 The tsort of the objects now sorts all the normal objects in a creatable 
 order!

 Here is the result of the tsort (only including the normal objects
 (the one I care about (I don't have to create the internal/auto
 objects, nor drop them))):

 The query below can both produce a DOT-format graph and a tsort of the
 creatable order of objects:

 WITH
 NewObjectOids AS (
SELECT * FROM pg_depend WHERE deptype <> 'p'
EXCEPT
SELECT * FROM pg_depend_before

 To what does pg_depend_before refer?

 Cheers,
 David.
 --
 David Fetter da...@fetter.org http://fetter.org/
 Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
 Skype: davidfetter  XMPP: david.fet...@gmail.com
 iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

 Remember to vote!
 Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Bug in pg_dump

2011-01-13 Thread Joel Jacobson
The example from Tom Lane below results in a database which is not
possible to dump correctly using pg_dump.

The view v1 strangely becomes a table in the dump output?!

It's probably a quite useless database to dump in the first place, but
that is no excuse to generate an invalid dump; it would be better to
throw an exception complaining that the database cannot be dumped
correctly, refusing to dump, or something like that.

regression=# \d
List of relations
 Schema | Name | Type  |  Owner
+--+---+--
 public | tt   | table | postgres
 public | v1   | view  | postgres
 public | v2   | view  | postgres
(3 rows)

ubuntu@ubuntu:/crypt/postgresql-9.1alpha3/src/bin/pg_dump$ ./pg_dump
regression | grep -v -E '^--' | grep -E '^.+$' | grep -v SET
CREATE OR REPLACE PROCEDURAL LANGUAGE plpgsql;
ALTER PROCEDURAL LANGUAGE plpgsql OWNER TO ubuntu;
CREATE TABLE tt (
f1 integer,
f2 integer
);
ALTER TABLE public.tt OWNER TO postgres;
CREATE TABLE v1 (
f1 integer,
f2 integer
);
ALTER TABLE public.v1 OWNER TO postgres;
CREATE VIEW v2 AS
SELECT v1.f1, v1.f2 FROM v1;
ALTER TABLE public.v2 OWNER TO postgres;
COPY tt (f1, f2) FROM stdin;
\.
CREATE RULE _RETURN AS ON SELECT TO v1 DO INSTEAD SELECT v2.f1, v2.f2 FROM v2;
REVOKE ALL ON SCHEMA public FROM PUBLIC;
REVOKE ALL ON SCHEMA public FROM ubuntu;
GRANT ALL ON SCHEMA public TO ubuntu;
GRANT ALL ON SCHEMA public TO PUBLIC;



2011/1/12 Tom Lane t...@sss.pgh.pa.us:
 regression=# create table tt(f1 int, f2 int);
 CREATE TABLE
 regression=# create view v1 as select * from tt;
 CREATE VIEW
 regression=# create view v2 as select * from v1;
 CREATE VIEW
 regression=# create or replace view v1 as select * from v2;
 CREATE VIEW
 regression=# drop view v1;
 ERROR:  cannot drop view v1 because other objects depend on it
 DETAIL:  view v2 depends on view v1
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.
 regression=# drop view v2;
 ERROR:  cannot drop view v2 because other objects depend on it
 DETAIL:  view v1 depends on view v2
 HINT:  Use DROP ... CASCADE to drop the dependent objects too.

 This isn't particularly *useful*, maybe, but it's hardly impossible.
 And if we analyzed function dependencies in any detail, circular
 dependencies among functions would be possible (and useful).

                        regards, tom lane


-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Possible bug in pg_settings/pg_depend

2011-01-13 Thread Joel Jacobson
Are multiple identical entries in pg_depend possible?
If so, how do they occur, and what is the purpose of representing
exactly the same dependency two times in pg_depend?

I expected the following query not to return any rows, but it did:

glue=# select count(*), * from pg_depend group by classid, objid,
objsubid, refclassid, refobjid, refobjsubid, deptype having count(*) > 1;
 count | classid | objid | objsubid | refclassid | refobjid |
refobjsubid | deptype
---+-+---+--++--+-+-
 2 |2618 | 11015 |0 |   1259 |11012 |   1 | n
(1 row)


Debug data:


glue=# select version();
   version
-
 PostgreSQL 9.1alpha3 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(Ubuntu 4.4.3-4ubuntu5) 4.4.3, 64-bit
(1 row)

glue=# select pg_describe_object(2618,11015,0);
pg_describe_object

 rule pg_settings_u on view pg_settings
(1 row)

glue=# select pg_describe_object(1259,11012,1);
  pg_describe_object
--
 view pg_settings column name
(1 row)



-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Add function dependencies

2011-01-13 Thread Joel Jacobson
Thanks to the new pg_stat_xact_user_functions and
pg_stat_xact_user_tables views in 9.1, it will be possible to
automatically sample which functions use which functions/tables and
generate a nice directional graph of the dependency tree, based on
recent real-life activity, excluding any relations/functions that are
no longer in use. Excluding these is actually a feature, as they make
the graph a lot more complicated.

If you want a graph of the activity during Mondays between 2:30pm and
2:31pm, such a graph could easily be generated; or if you want one
covering 30 days (which would probably include a lot more edges in the
graph), it can also be generated. :-)

It would be quite easy to automatically inject small code snippets at
the top and bottom of each user function, to get the diff of select *
from pg_stat_xact_user_functions and pg_stat_xact_user_tables between
the entry point and the exit point of each function.
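As a sketch of what those injected snippets could do (the temp table
name entry_stats is just an example of mine):

-- Entry point: snapshot the transaction-local function statistics.
CREATE TEMP TABLE entry_stats AS
SELECT funcid, funcname, calls
FROM pg_stat_xact_user_functions;

-- ... the function under test runs here ...

-- Exit point: the diff shows which functions were called in between.
SELECT cur.funcname,
       cur.calls - COALESCE(prev.calls, 0) AS calls_made
FROM pg_stat_xact_user_functions AS cur
LEFT JOIN entry_stats AS prev USING (funcid)
WHERE cur.calls - COALESCE(prev.calls, 0) > 0;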

It would be a lot nicer if it would be possible to automatically let
PostgreSQL sample such data for you, providing nice system views with
information on the sampled data per function, allowing you to query it
and ask,

 - What functions has function public.myfunc(int) called, and what
tables has it inserted into/selected from/updated/deleted from, since
the last time I reset the statistics?
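Purely as an illustration, querying such sampled data might look
something like this (the view name and columns are invented; nothing
like it exists yet):

-- Hypothetical view and columns, invented for illustration only.
SELECT callee, tables_touched, n_calls
FROM pg_stat_sampled_function_calls
WHERE caller = 'public.myfunc(int)'
  AND sampled_at > now() - interval '30 days';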

Just an idea...

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Possible bug in pg_settings/pg_depend

2011-01-13 Thread Joel Jacobson
2011/1/13 Tom Lane t...@sss.pgh.pa.us:
 Yes, probably.  It's certainly possible to have the same linkage occur
 with different deptypes.  We don't try hard to avoid dups because they
 don't matter.

With different deptypes, yes, but in this case there were two
linkages of the same deptype.

It just seems a bit strange that I found only one such case in the
entire database; it smells like some kind of bug. It might not be, I
don't know; I just thought it was worth investigating a bit, but if
you're sure about it, I of course trust you.

-- 
Best regards,

Joel Jacobson
Glue Finance

E: j...@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_depend explained

2011-01-13 Thread Joel Jacobson
2011/1/13 David Fetter da...@fetter.org:
 Please put a self-contained example on the snippets page, and please
 also to check that it actually runs before doing so.  You'd mangled
 some aliases in the query you sent, which leads me to believe you
 hadn't actually tried running it.

I actually hadn't solved the problem at the time I wrote my last
email; it turned out I had to do things a bit differently to avoid
running into problems with corner cases.

I will put together a self-contained example like you suggested and
get back shortly :-)

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Do magic using pg_depend

2011-01-13 Thread Joel Jacobson
https://github.com/gluefinance/pov/blob/master/doc/example_database.sql

Please feel free to put any of this on the PostgreSQL wiki, as
suggested by David Fetter.

This is an example of some functionality provided and used by pov
(PostgreSQL Object version control system).
Most, if not all, of this stuff has already been implemented in
pg_dump, but since pov is an SQL-based system it was necessary to
implement the same functionality using only sql/plpgsql/plperl.

Author: Joel Jacobson, Glue Finance AB, Sweden, j...@gluefinance.com
Datestamp: 2011-01-13 23:42 Europe/Stockholm
License: MIT (http://www.opensource.org/licenses/mit-license.php)

We will learn how to do a lot of PostgreSQL magic using only the
nice system table pg_depend.
Today we will:
a) create nice directional graphs of all object dependencies,
b) sort all objects topologically, in an order in which they can
be created,
c) show create/drop commands for most of the objects.

Let the show begin!

Installation:

$ git clone g...@github.com:gluefinance/pov.git
$ cd pov
$ sh install_example_database.sh

a) Generate directional graph in DOT-format.
COPY (SELECT diagraph FROM pov.pg_depend_dot) TO '/tmp/example_database.dot';

Then use dot (http://www.graphviz.org/) to generate graphs in svg,
png, or any other supported format.
dot -oexample_database.png -Tpng example_database.dot
dot -oexample_database.svg -Tsvg example_database.dot

Or view it in the SQL prompt:

test=# select * from pov.pg_depend_dot;
                                  diagraph
----------------------------------------------------------------------------
 digraph pg_depend {
 function plpgsql_call_handler() 1255.11599.0 -> language
plpgsql 2612.11602.0 [color=black label=n]
 function plpgsql_inline_handler(internal) 1255.11600.0 ->
language plpgsql 2612.11602.0 [color=black label=n]
 function plpgsql_validator(oid) 1255.11601.0 -> language
plpgsql 2612.11602.0 [color=black label=n]
 function plperl_call_handler() 1255.23562.0 -> language plperl
2612.23565.0 [color=black label=n]
 function plperl_inline_handler(internal) 1255.23563.0 ->
language plperl 2612.23565.0 [color=black label=n]
 function plperl_validator(oid) 1255.23564.0 -> language plperl
2612.23565.0 [color=black label=n]
 function f1(integer) 1255.23656.0 -> view v4 1259.23688.0
[color=black label=n]
 function f1(integer) 1255.23656.0 -> constraint t3_id_check on
table t3 2606.23673.0 [color=black label=n]
 table t1 1259.23651.0 -> table t1 column id 1259.23651.1
[color=yellow label=an]
 table t1 column id 1259.23651.1 -> view v1 1259.23676.0
[color=black label=n]
 table t1 column id 1259.23651.1 -> constraint t1_pkey on table
t1 2606.23655.0 [color=blue label=a]
 table t1 column id 1259.23651.1 -> constraint t2_id_fkey on
table t2 2606.23664.0 [color=black label=n]
 sequence s1 1259.23657.0 -> default for table t3 column id
2604.23672.0 [color=black label=n]
 table t2 1259.23659.0 -> table t2 column id 1259.23659.1
[color=yellow label=an]
 table t2 column id 1259.23659.1 -> view v2 1259.23680.0
[color=black label=n]
 table t2 column id 1259.23659.1 -> constraint t2_pkey on table
t2 2606.23663.0 [color=blue label=a]
 table t2 column id 1259.23659.1 -> constraint t2_id_fkey on
table t2 2606.23664.0 [color=blue label=a]
 table t3 1259.23669.0 -> table t3 column id 1259.23669.1
[color=yellow label=an]
 table t3 column id 1259.23669.1 -> default for table t3 column
id 2604.23672.0 [color=blue label=a]
 table t3 column id 1259.23669.1 -> constraint t3_id_check on
table t3 2606.23673.0 [color=yellow label=na]
 table t3 column id 1259.23669.1 -> constraint t3_pkey on table
t3 2606.23675.0 [color=blue label=a]
 view v1 1259.23676.0 -> view v1 column id 1259.23676.1
[color=black label=n]
 view v1 column id 1259.23676.1 -> view v3 1259.23684.0
[color=black label=n]
 view v2 1259.23680.0 -> view v2 column id 1259.23680.1
[color=black label=n]
 view v2 column id 1259.23680.1 -> view v3 1259.23684.0
[color=black label=n]
 view v3 1259.23684.0 -> view v3 column id1 1259.23684.1
[color=black label=n]
 view v3 1259.23684.0 -> view v3 column id2 1259.23684.2
[color=black label=n]
 view v3 column id1 1259.23684.1 -> view v4 1259.23688.0
[color=black label=n]
 view v3 column id2 1259.23684.2 -> view v4 1259.23688.0
[color=black label=n]
 constraint t1_pkey on table t1 2606.23655.0 -> constraint
t2_id_fkey on table t2 2606.23664.0 [color=black label=n]
 schema public 2615.2200.0 -> function f1(integer)
1255.23656.0 [color=black label=n]
 schema public 2615.2200.0 -> table t1 1259.23651.0
[color=black label=n]
 schema public 2615.2200.0 -> sequence s1 1259.23657.0
[color=black label=n]
 schema public 2615.2200.0 -> table t2 1259.23659.0
[color=black label=n]
 schema public 2615.2200.0 -> table t3 1259.23669.0
[color=black label=n]
 schema public 2615.2200.0 -> view v1

Re: [HACKERS] pg_depend explained

2011-01-14 Thread Joel Jacobson
2011/1/12 Alvaro Herrera alvhe...@commandprompt.com:
 I think this code should live in the Wiki somewhere:
 http://wiki.postgresql.org/wiki/Snippets

This file contains only the relevant remapping of pg_depend, folding
the internal linkages properly:

https://github.com/gluefinance/pov/blob/master/sql/schema/pov/views/pg_depend_remapped.sql

It can be tested stand-alone and does not require any other components
from the pov project.

Can I create a wiki snippet myself or do I need some kind of admin access?

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] pov 1.0 is released, testers with huge schemas needed

2011-01-14 Thread Joel Jacobson
Dear pghackers,

During the last two weeks, I have enjoyed a total of 113 hours of
development of pov - PostgreSQL Object Version control system.

Version 1.0 is now finally released!

POV will not touch any of your data (tables, sequences, indexes) -
it will only keep track of your logic (functions, constraints,
defaults, triggers, aggregates and views).

The possibilities are hopefully endless; imagine a world where you
never have to worry about cumbersome reverting.
If you screw up your logic, you can simply travel back in time
without losing any of your data.

I just managed to start out with a clean database, load pov, take a
snapshot: select pov(), import my company's entire production database
(consisting of 2159 objects), take a new snapshot: select pov(), and
then restore the original state: select pov(1), i.e. removing all the
functions/constraints/defaults/triggers/aggregates/views, and finally
restore the database again: select pov(2), all without losing any of
the data or touching any of the data containers (i.e. tables,
sequences, indexes).

5 minute teaser demo video: http://screencast.com/t/mYFlvcI5H38X

To install:
git clone g...@github.com:gluefinance/pov.git
cd pov
psql -f $PGSRC/postgresql-9.1alpha3/contrib/pgcrypto/pgcrypto.sql
psql -f sql/install.sql

To take a snapshot:
postgres=# select pov();

To restore a snapshot:
postgres=# select pov(1); -- replace 1 with the snapshot #

Potential future features: merging schemas, diffing, etc...

I would highly appreciate it if you hackers could play around
with this, using real-life database schemas, to detect any bugs.

To all of you, have a great weekend! Happy hacking!

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pov 1.0 is released, testers with huge schemas needed

2011-01-15 Thread Joel Jacobson
2011/1/14 Joel Jacobson j...@gluefinance.com:
 To install:
 git clone g...@github.com:gluefinance/pov.git

Oops, typo; to do this without my ssh key, you need to do:

git clone git://github.com/gluefinance/pov.git

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object, patch v2

2011-01-15 Thread Joel Jacobson
2011/1/15 Tom Lane t...@sss.pgh.pa.us:
 But I can read the handwriting on the wall: if I want this done right,
 I'm going to have to do it myself.

                        regards, tom lane

Excellently put! I will with pride steal that phrase and use it
whenever I run into the same situation myself.
Quite often a proposed good enough solution introduces unnecessary
complexity and awkwardness, while the proper solution requires a
disproportionate amount of resources to accomplish.
The model most people adopt is a mix of both: spend time on details
when it is absolutely necessary, and accept good enough solutions
until someone (not necessarily yourself, like you suggested) gets
motivated enough to implement the proper solution.

I don't think, and I certainly hope, that it's not just one guy in the
universe who can get it done right.

So, instead of fighting like war maniacs over this
god-damn-two-lines-of-code patch, could Tom please describe to the
patch submitter what he means by get it done right? Perhaps the
patch submitter will then be motivated enough to invest a few hundred
or thousand hours of his time to solve the problem the way Tom hinted.

-- 
Best regards,

Joel Jacobson
Glue Finance

E: j...@gluefinance.com
T: +46 70 360 38 01

Postal address:
Glue Finance AB
Box  549
114 11  Stockholm
Sweden

Visiting address:
Glue Finance AB
Birger Jarlsgatan 14
114 34 Stockholm
Sweden

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object, patch v2

2011-01-17 Thread Joel Jacobson
2011/1/16 Tom Lane t...@sss.pgh.pa.us:
 Comments?

I think it's great you undertook the challenge of solving this problem
the proper way.
I think your desire to achieve perfection in every little detail is admirable.

Your patch is, according to me, not far from perfect, but could be
improved in a few ways:

a) pg_describe_object should always include the schema in the name,
even for objects in public and pg_catalog. Otherwise it's not
explicitly stated whether an object resides in pg_catalog or public.
If you create a function in the public schema in an 8.3 database,
with a name which does not exist in pg_catalog, then when converting
to version 9 in the future, which provides a few more functions in the
pg_catalog schema than 8.3, there is a risk your function would
conflict with a new function of the same name in version 9. The
pg_catalog function would then be selected by default, unless
explicitly called with the fully qualified name. This might or
might not be what you want. Let's say you are unfortunate and unaware
that it's bad to name functions pg_* in the public schema; since it's
possible, it could lead to unexpected results. Now, let's go
back to the discussion on what pg_describe_object(oid,oid,int) should
return. I strongly believe it's wiser to include the schema in the
description, even though it's not important nor interesting in most
cases. It's good to explicitly inform the end-user (whose
understanding of SQL or PostgreSQL might be limited) that there is a
distinct difference between public.myfunc and pg_catalog.myfunc, and
it's better to always include the schema than to auto-detect whether
there are two functions with the same name, because it's highly
confusing if the description depends on your search_path setting.
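A quick illustration of the shadowing hazard described above (the
function public.length is a deliberately bad name, created only for
this demo):

-- Demo only: a public function shadowing a pg_catalog one.
CREATE FUNCTION public.length(text) RETURNS integer
    LANGUAGE sql AS $$ SELECT 0 $$;

-- With pg_catalog listed explicitly after public, the public version wins:
SET search_path TO public, pg_catalog;
SELECT length('abc');  -- returns 0, not 3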

Conclusions in summary:
*) It would be a lot more user-friendly if pg_describe_object always
returned the same text string, independent of your search_path
setting.
*) It would be a lot more programmer-friendly if pg_describe_object
returned the same text string, independent of your search_path
setting, and at the same time always included all the unique columns
of the object (including the schema would fix 99% of the cases, and
your patch would fix the remaining 1%).
*) Alternatively, it could be better to introduce a new function,
returning a text string composed using only the unique columns (plus
some constants to format the nice text string, resolved using the
unique columns), if it's not feasible to append the schema name to the
description, considering people might (unlikely, but not impossibly)
already rely programmatically on pg_describe_object and parse its text
string, which would break a lot of things.
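For what it's worth, a search_path-independent identifier for a
function can already be composed from the catalogs; a minimal sketch
(the example function is arbitrary):

-- Schema-qualified, search_path-independent identifier for a function.
SELECT n.nspname || '.' || p.proname
       || '(' || pg_get_function_identity_arguments(p.oid) || ')'
       AS identifier
FROM pg_proc p
JOIN pg_namespace n ON n.oid = p.pronamespace
WHERE p.oid = 'pg_catalog.length(text)'::regprocedure;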

Just some thoughts...


-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object, patch v2

2011-01-17 Thread Joel Jacobson
2011/1/17 Tom Lane t...@sss.pgh.pa.us:
 Joel Jacobson j...@gluefinance.com writes:
 a) pg_describe_object should always include the schema in the name,
 even for object in public and pg_catalog.

 I knew you were going to demand that next, as soon as you figured out
 that it was an obstacle for using pg_describe_object output as a
 globally unique identifier.  But I'm going to reply, once again, that
 pg_describe_object is not meant to guarantee that and we're not going to
 make it so.

I knew you were going to say that, but I'm going to ask, once again:
is it possible to introduce another function, possibly returning a
differently formatted string, perhaps in JSON or some other structured
format, suitable to be parsed by an application, and whose keys are
sorted in a way which guarantees uniqueness? Perhaps it could be named
pg_get_object_identifier or something like that.

It's a huge limitation when designing the type of application I'm
building: you need to invent your own solution to the problem to avoid
depending on oids, which cannot be used across different databases.

Perhaps I'm the only one working on a project where a unique
identifier for each object is an absolute requirement, but that sounds
unlikely; quite a lot of developers must have thought about these
things before.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Bug in pg_describe_object, patch v2

2011-01-23 Thread Joel Jacobson
2011/1/23 Dimitri Fontaine dimi...@2ndquadrant.fr:
 Tom Lane t...@sss.pgh.pa.us writes:
   But anyway, this patch has now officially
 been discussed to death.

 +1 :)

+∞ :)

In the aftermath, I realized I was almost about to feel a bit ashamed
that my original forum post probably gave birth to the longest-lived
discussion in the history of PostgreSQL. Having realized this, I
realized a second, even more important fact, namely the importance of
details, which makes the whole difference between a sloppy software
project and a highly successful project with the ambition of achieving
perfection in every little detail. I'm proud we can conclude
PostgreSQL is apparently a project of the second category.

-- 
Best regards,

Joel Jacobson
Glue Finance

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/PgSQL STRICT

2012-12-21 Thread Joel Jacobson
On Fri, Dec 21, 2012 at 4:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 What about before it, ie

 STRICT UPDATE ...

+1 from me too.
This feature would be awesome.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] PL/PgSQL STRICT

2012-12-21 Thread Joel Jacobson
On Fri, Dec 21, 2012 at 4:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 That was my first reaction too, but Marko's followon examples seem to
 make a reasonable case for it.  There are many situations where you
 expect an UPDATE or DELETE to hit exactly one row.  Often, programmers
 won't bother to add code to check that it did ... but if a one-word
 addition to the command can provide such a check, it seems more likely
 that they would add the check.

Very true.

When I was a PL/PgSQL beginner a few years ago I did exactly that: I
didn't check if the update actually updated any row, because I didn't
know it could fail, and I felt extremely worried and stupid when I
realised this. I spent an entire day going through all functions,
fixing this problem in all places. The fix was not beautiful, and it
bugged me that there was no prettier way to do it.
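For reference, a minimal sketch of the manual check the proposed
STRICT keyword would make unnecessary (the table and function names
are made up):

-- Made-up names; only the GET DIAGNOSTICS pattern matters here.
CREATE OR REPLACE FUNCTION set_balance(_id int, _balance numeric)
RETURNS void AS $$
DECLARE
    _rowcount int;
BEGIN
    UPDATE accounts SET balance = _balance WHERE id = _id;
    GET DIAGNOSTICS _rowcount = ROW_COUNT;
    -- Boilerplate that STRICT UPDATE would make implicit:
    IF _rowcount <> 1 THEN
        RAISE EXCEPTION 'expected exactly 1 updated row, got %', _rowcount;
    END IF;
END;
$$ LANGUAGE plpgsql;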


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-04 Thread Joel Jacobson
I have received positive feedback on the pg_dump --split option I suggested,
but it depends on pg_dump dumping objects in a deterministic order.

I'm committed to fixing this. The first problem I've spotted is overloaded
functions.

This patch adds a new element to DumpableObject: char *proargs.
This is set to the output of pg_get_function_identity_arguments(oid)
for all functions, and to NULL for all other objects.

sortDumpableObjectsByTypeName calls DOTypeNameCompare, which in addition
to sorting objects by type, namespace and name, now also sorts by
the function identity arguments.

This makes overloaded functions dump in the same order, regardless of
the order in which they were created.

Are there any other object types where the order isn't deterministic?
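As a sanity check from SQL, this is roughly the tie-breaking key the
patch sorts on (the function name is an arbitrary example):

-- The identity-argument string that disambiguates overloads.
SELECT p.proname,
       pg_get_function_identity_arguments(p.oid) AS identity_args
FROM pg_proc p
WHERE p.proname = 'lower'
ORDER BY p.proname, identity_args;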


pg_dump_deterministic_order.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-04 Thread Joel Jacobson
I renamed the new element of DumpableObject from proargs to the more
general name sortkey.

This way the element can be used by any object type in the future
which might require sorting by additional information beyond type,
namespace and name.

Currently it's only set for functions/aggregates though; it's NULL for
all other object types.

It felt less ugly to add a new element with a general name than one
specific to functions.

I also moved the check to the last part of DOTypeNameCompare, just before
sorting by OIDs as a last resort.

Feedback on the implementation is welcomed.

If this can be achieved without adding a new element to DumpableObject,
that is of course much better, but I couldn't find a way of doing it.


pg_dump_deterministic_order_v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 2:38 AM, Robert Haas robertmh...@gmail.com wrote:

 My vote is - when there's an overloaded function, put each version in
 its own file.  And name the files something like
 functionname_something.sql.  And just document that something may not
 be entirely stable.


I would agree that's better if the dump order isn't deterministic.

However, it looks like an easy fix to make the dump order deterministic:
http://archives.postgresql.org/pgsql-hackers/2012-07/msg00232.php

If the dump order is deterministic, I think it's cleaner to put all versions
in the same file.

Benefits:
+ Pretty looking filename
+ Same file structure for all object types, no special exception for
functions


Re: [HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-05 Thread Joel Jacobson
New version; I made a typo in the last one.


pg_dump_deterministic_order_v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
Maybe it could be made an option to pg_dump?

Some users and their systems might not even have overloaded functions,
and these users will of course prefer a nice looking filename, i.e. all
functions having the same name kept in the same file, which for them
will mean only one function per file anyway.

pg_dump --split --overloaded-functions-to-same-file

Other users and their systems might have a lot of overloaded functions,
like the equal() example mentioned; they will of course prefer to keep
all functions in separate files.

pg_dump --split --overloaded-functions-to-separate-files

Then one can discuss which should be the default behavior for --split;
I would prefer the same-file variant, and I think most other users
would too, except those with a lot of overloaded functions.


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 4:46 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Ick.  Then we have to deal with all the downsides of *both* methods.

 pg_dump is already a bloated, nearly unmaintainable mess.  The very
 last thing it needs is even more options.


When you say bloated, are you referring to the code or the command line
interface?

If you are referring to the code, I don't think that's a good argument
against implementing good new features.
The important ratio is the value of a feature compared to the increase
in complexity.
In this case, it's very simple to implement both the --split option and
the fix for the dump order.
I'm not even a C coder and managed to implement it in less than an hour
of effective coding.
We are talking ~100 lines of code, with comments and everything.

If you are referring to the command line interface and think it is
bloated, maybe these options should be hidden from the normal --help.
We could create a new --help-advanced text, where we could put these
options and all other existing less common options.
I think this is quite a common and good way to handle the situation for
UNIX command line tools.


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 5:17 PM, Alvaro Herrera
alvhe...@commandprompt.comwrote:

 Agreed.

 However I am also against what seems to be the flow.  Normally, you
 don't write overloaded plpgsql functions such as equal.  Case in
 point, the equality functions in core have funny names like int4eq and
 so on.  Instead, at least in my experience, the overloaded functions
 people seem to have in their databases are like do_stuff_to_foobars()
 and you have one version for foos and another one for bars.

 If you're doing lots of equality functions, surely it would make more
 sense to package them up as an extension anyway along with all the other
 thingies you need for the type you're supposedly writing.  So it's a
 completely different market than what we're aiming at here.


True, very true, I didn't think about that; you are right, I fully agree.
My vote is therefore on the put all overloaded functions in the same
file variant.


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 5:59 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 FWIW, I'm attracted to the all-similarly-named-functions-together
 method, mainly because it dodges the problem of how to encode a
 function's argument list into a filename.  However, we're being
 short-sighted to only think of functions here.  What about operators?
 Or casts?  Those don't have simple names either.


Someone suggested urlencoding them. I think that's quite a good solution.

Personally, I don't have any user-defined operators or casts. I don't
know how common they are in general, but it must of course work for
these as well.


Re: [HACKERS] Schema version management

2012-07-05 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 6:09 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Well, to be perfectly frank, I already doubt that this entire feature
 passes the complexity-versus-value test, because pg_dump is not a
 substitute for an SCM --- people who have got enough functions to need
 this sort of thing need to be keeping them somewhere else than in dump
 files.  Complicating things more by supporting multiple ways of doing it
 will make that worse.  I think you need to pick one design and stick
 with it, not try to paint the bikeshed every color suggested by anybody.


I agree it should be one option only, and again I think the one file
variant is best.

This is indeed not a substitute for an SCM, but a nice complement.

Personally, I already use this feature to commit the schema of all
versions of my databases (production, test, development) into a git
repo every minute.
It only commits if something has changed.

This makes it super easy to compare the schema of the actual production
database between different points in time.

This would not be possible if I only committed stuff manually to the
normal git repo, where I also have all the functions, which I modify
when developing and testing.

pg_dump -> git means you can be 100% certain version X of the schema
was active in the production database at date/time T.


Re: [HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-05 Thread Joel Jacobson
I agree, good suggestion; I just didn't know how to implement it without
a new field. I'll make a new attempt to get it right.

On Thursday, July 5, 2012, Tom Lane wrote:

 Joel Jacobson j...@trustly.com writes:
  New version, made a typo in last one.

 I'm not particularly happy with the idea of adding a sortkey field to
 DumpableObject as such, when most object types don't need it.  That just
 bloats the code and pg_dump's memory consumption.  It would be better to
 modify the already-existing object-type-specific special cases in
 DOTypeNameCompare to take additional information into account as needed.

 BTW, I see no reason to be adding extra calls of
 pg_get_function_identity_arguments.  What is wrong with the funcsig or
 aggsig strings that the code already computes?

 regards, tom lane



Re: [HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-05 Thread Joel Jacobson
Roger that. I'm on it.

On Thursday, July 5, 2012, Tom Lane wrote:

 Joel Jacobson j...@trustly.com writes:
 You may in fact need a new field --- I'm just saying it should be in the
 object-type-specific struct, eg FuncInfo, not DumpableObject.

 regards, tom lane



Re: [HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-06 Thread Joel Jacobson
On Thu, Jul 5, 2012 at 10:33 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 You may in fact need a new field --- I'm just saying it should be in the
 object-type-specific struct, eg FuncInfo, not DumpableObject.


I suggest adding char *funcsig to FuncInfo, and moving the funcsig =
format_function_arguments(finfo, funciargs) code from dumpFunc to getFuncs.

Because dumpFunc is called after sortDumpableObjectsByTypeName, setting
funcsig in the FuncInfo struct in dumpFunc wouldn't work, as the value
needs to be available when entering sortDumpableObjectsByTypeName.

What do you think?


Re: [HACKERS] Schema version management

2012-07-08 Thread Joel Jacobson
On Saturday, July 7, 2012, Tom Lane wrote:

 If we think that operators outside of extensions will be an infrequent
 special case, what about just dumping all of them into a single file
 named operators?  And similarly for casts?

 regards, tom lane



+1


Re: [HACKERS] Schema version management

2012-07-11 Thread Joel Jacobson
On Wed, Jul 11, 2012 at 12:24 AM, Merlin Moncure mmonc...@gmail.com wrote:

 What does your patch do that you can't already do with pg_restore?


Please read through the entire thread; this question has already been
answered.


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Wed, Jul 11, 2012 at 8:01 PM, Merlin Moncure mmonc...@gmail.com wrote:

 After extensive off-list discussion with Joel it became clear that
 per-object dumping ability really belongs in pg_restore.


The only benefit I could see in putting it in pg_restore is that you
would then be able to do a --split on already existing historical dumps.

On the other hand, it would require you to use both pg_dump and
pg_restore instead of only pg_dump, which makes it a bit less
user-friendly.

I haven't looked at how it could be implemented in pg_restore; if it's
even just a little more complex, it's probably better to let pg_dump
handle the task.


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Wed, Jul 11, 2012 at 11:20 PM, Alvaro Herrera alvhe...@commandprompt.com
 wrote:

 operator_!___numeric.sql (postfix, name does not need escape)
 operator_%7C%2F_integer__.sql (prefix)
 operator_%3C_bit_varying__bit_varying.sql (type name with spaces,
 changed to _)


I think the directory structure [schema]/[type]/[name] should be the
same for all object types. I don't like operator being part of the
filename; it should be the directory name.


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Thursday, July 12, 2012, Tom Lane wrote:

 What are you going to do with objects that don't have schemas?
 (Including, but not restricted to, the schemas themselves.)


Good question. Maybe something like this?

For objects without schema:
/global/[type]/[name].sql

For objects with schema:
/schema/[schema]/[type]/[name].sql


Re: [HACKERS] Schema version management

2012-07-12 Thread Joel Jacobson
On Thu, Jul 12, 2012 at 4:01 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 FWIW, I think you could save a level of naming if you were willing to
 put the type first, since the type would imply whether the object
 lives in a schema or not:

 [type]/[name].sql
 [type]/[schema]/[name].sql


Could work. But I think it's more relevant and useful to keep all
objects in a schema in its own directory.

That way it's easier to get an overview of what's in a schema,
simply by looking at the file structure of the schema directory.

I think it's more common to want to show all objects within schema X
than all objects of type X.

PS.

I was thinking: the guys back in the 70s must have spent a lot of time
thinking about the UNIX directory structure before they decided upon it.

I did some googling and found this explanation, which was quite
amusing to say the least :-)

http://lists.busybox.net/pipermail/busybox/2010-December/074114.html


Re: [HACKERS] Schema version management

2012-07-14 Thread Joel Jacobson
On Fri, Jul 13, 2012 at 9:41 PM, Peter Eisentraut pete...@gmx.net wrote:

 Personally, I hate this proposed nested directory structure.  I would
 like to have all objects in one directory.

 But there is a lot of personally in this thread, of course.


Why do you hate it?

It's a bit like saying:
 - I hate database normalization; better to keep all rows in one single
table.
or even:
 - I hate directories.

I have thousands of objects; it would be a total mess to keep them all
in a single directory.

Using a normalized directory structure makes sense for the SCM use-case.
I haven't seen any projects where all the files are kept in one
directory.


Re: [HACKERS] Schema version management

2012-07-14 Thread Joel Jacobson
On Sat, Jul 14, 2012 at 11:25 AM, Peter Eisentraut pete...@gmx.net wrote:
 Well, of course everyone uses directories in moderation.  But you might
 want to take a look at the gcc source code.  You'll love it. ;-)

Yes, but GCC was also created by someone who picks stuff from his bare
foot and eats it. ;-)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Schema version management

2012-07-14 Thread Joel Jacobson
On Sat, Jul 14, 2012 at 12:34 PM, Joel Jacobson j...@trustly.com wrote:
 On Sat, Jul 14, 2012 at 11:25 AM, Peter Eisentraut pete...@gmx.net wrote:
 Well, of course everyone uses directories in moderation.  But you might
 want to take a look at the gcc source code.  You'll love it. ;-)

[505][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$ find . -type d | wc -l
   41895
[506][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$ find . -type f | wc -l
  167183
[507][joel.Joel-Jacobsons-iMac-2: /Users/joel/gcc]$

Not that bad actually, only 4 files per directory on average.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] pg_dump: Sort overloaded functions in deterministic order

2012-07-19 Thread Joel Jacobson
Makes pg_dump sort overloaded functions in deterministic order.

The field proiargs has been added to FuncInfo and is set by getFuncs()
and getAggregates() for all functions and aggregates.

DOTypeNameCompare uses this field to break ties when the name and number
of arguments are the same. This avoids having to fall back on OID sorting.

This patch is independent from the ongoing discussion of the pg_dump --split
option. Even if we can't agree on how to do the splitting of objects into
files, it still makes sense to fix the sort order of overloaded functions.


pg_dump_deterministic_order_v4.patch
Description: Binary data


pg_dump_deterministic_order.t
Description: Troff document

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Unexpected plperl difference between 8.4 and 9.1

2012-08-19 Thread Joel Jacobson
After upgrading from 8.4 to 9.1, one of my plperl functions stopped working
properly.

For some reason, when matching a string using a regex, the $1 variable
cannot be returned directly using return_next(); it must be copied to
a local variable first.
If returned directly, it appears to be cached in some strange way,
returning the same value for all 10 rows in the example below.

In 8.4, these two functions return the same thing: 10 rows of random
numbers.

Is this a feature or a bug?

CREATE OR REPLACE FUNCTION test1() RETURNS SETOF NUMERIC AS $BODY$
use strict;
use warnings;
for(my $i=0 ; $i<10; $i++) {
my $rand = rand();
$rand =~ m/(.*)/;
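# Returning the magic $1 directly: under 9.1 every row comes back
# identical (see the output below).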
return_next($1);
}
return;
$BODY$ LANGUAGE plperl;

joel=# select * from test1();
       test1
-------------------
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
 0.482287904847535
(10 rows)

CREATE OR REPLACE FUNCTION test2() RETURNS SETOF NUMERIC AS $BODY$
use strict;
use warnings;
for(my $i=0 ; $i<10; $i++) {
my $rand = rand();
$rand =~ m/(.*)/;
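# Workaround: copy $1 into a plain scalar before returning it.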
my $val = $1;
return_next($val);
}
return;
$BODY$ LANGUAGE plperl;


joel=# select * from test2();
        test2
--------------------
  0.504361211998972
  0.015757483449562
  0.154422531777254
  0.383329383899088
  0.578318997407354
 0.0022126436077059
  0.970868502733449
  0.465566753133679
  0.215372148522395
  0.390036490131536
(10 rows)

