Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello. The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment's locale, although it appears to work.

+REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo snip.

Hrm, that's quite cute. I dunno if there is a more canonical way of doing the above -- but it seems to work. I'm not sure this regression test is worth it. I'm thinking maybe we should just remove the regression test instead.

I agree. That is the fundamental question. I coded it just for fun, but I don't see much significance in doing it. We might omit the test, since this is non-critical and a corner case. I'll leave it to your decision whether to do that.

There is a minor issue with the patch where sql/plperl_lc_sql_ascii.sql contains the text plperl_lc.sql. After copying sql/plperl_lc.sql to sql/plperl_lc_sql_ascii.sql everything worked as described.

Ah. That is what used to be a symbolic link. I made the patch doubting whether a symlink could be encoded into a diff, saw the dubious result, and left it as it was :-p. I learned that symbolic links cannot be used meaningfully in a source tree managed by git.

-- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.

-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Andres Freund Sent: Thursday, June 21, 2012 5:11 PM

4.) Log enough information in the walstream to make decoding possible using only the walstream.

What I understood is that the "enough information" is the catalog data. Is that right, or is it something else?

Advantages: * Decoding can optionally be done on the master * No catalog syncing/access required * its possible to make this architecture independent

Disadvantages: * high to very high implementation overhead depending on efficiency aims * high space overhead in the wal because at least all the catalog information needs to be logged in a transactional manner repeatedly

In the multiversion approach too, there will be a space overhead to maintain multiple versions, irrespective of which approach you use.

Why does it need to be logged repeatedly? Once we log the catalog information in WAL, we can disallow/block other DDLs during that time, and afterwards changes to the catalog information can be retrieved from WAL only.

* misuses wal far more than other methods

What is the misuse in this? I believe it can later be used for log-mining purposes as well.

* significant new complexity in somewhat critical code paths (heapam.c) * insanely high space overhead if the decoding should be possible architecture independent

Option 4 seems better than the others w.r.t. the top-level approach to solving the problem. Some other databases also use a similar approach for use cases similar to what you have described.

With Regards, Amit Kapila.
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 2.) Keep the decoding site up2date by replicating the catalog via normal HS recovery mechanisms. Advantages: * most of the technology is already there * minimal overhead (space, performance) * no danger of out of sync catalogs * no support for command triggers required that can keep a catalog in sync, including oids Disadvantages: * driving the catalog recovery that way requires some somewhat intricate code as it needs to be done in lockstep with decoding the wal-stream * requires an additional feature to guarantee HS always has enough information to be queryable after a crash/shutdown * some complex logic/low-level fudging required to keep the transactional behaviour sensible when querying the catalog * full version/architecture compatibility required * the decoding site will only ever be readable

My initial reaction was "this won't work", but that requires qualification since this is a complex topic: You can use this approach as long as you realise that the catalog it gives can never be rewound. So the generic approach of "construct me a catalog as of this LSN" would need to start with a base backup of the catalog and then roll forward to the appropriate LSN. Which means a generic user of this approach would need to be able to construct an initial catalog using a PITR.

Constructing a decoding site requires you to a) take a partial base backup of the catalog, and b) apply WAL records to bring that forward to the correct LSN, which would require some alteration of the recovery code to skip the files missing in a).

So taking the approach of a decoding site means we have to modify recovery code, and even when we do that we still end up with an option that is difficult to deploy in the real world. Difficult to deploy because we need a whole new instance of Postgres, plus we need all of the WAL files, which could easily be impractical. The overall approach is good, but the architecture is wrong. 
What we need is a catalog base backup and a means of rolling forward to the appropriate LSN. Rolling forward using WAL is too bulky, so we need a separate log of DDL changes to the catalog. So what we need is a catalog base backup plus a ddl-log. And we need to be able to reconstruct the correct catalog on the target server directly.

To translate the WAL we maintain a secondary set of catalog tables, which only exist for logical replication. These are normal, non-versioned tables, but held in a new schema, pg_logical or similar. One reason why this must be a secondary catalog is to allow the translation to take place on the target server, and to allow translation of WAL from a prior version of the catalog - so we can allow online upgrades across different catalog versions (and possibly major versions). The secondary catalog is used in place of the normal catalog during InitCatalogCache() in the apply process. All the normal caches exist, they just point to secondary relations rather than the normal ones.

When we initialise replication we take a copy of the appropriate tables, columns and rows in a catalog-base-backup, using something like pg_dump. Overall, this is much smaller than the normal catalog, since it avoids temp tables and anything not related to WAL translation. On each non-temp change to the database we record the changes as SQL in the ddl-log, together with the LSN of the change. When the number of changes in the ddl-log hits a limit, we take a new catalog-base-backup. This process is similar to a checkpoint, but much less frequent; let's call it a ddl-checkpoint.

When we start to read WAL logs to translate them, we start by truncating/re-bootstrapping and reloading the secondary catalog from the base backup. We then apply all changes from the ddl-log (which is just a sequence of SQL statements) up until the LSN at the start of WAL. The secondary catalog is then an exact copy of the catalog as of that LSN. 
As we read through WAL we apply further changes to the secondary catalog so it stays in lockstep with the WAL we are currently reading. Having the ddl-base-backup and ddl-log allows reconstruction of the catalog without needing to put the whole catalog into WAL at each checkpoint. We can truncate old WAL segments and yet still recreate the DDL needed to translate current WAL data. As a result, ddl-checkpoints are much less frequent, perhaps weekly or monthly, rather than every few minutes. The whole process is similar in algorithm to recovery, but uses just normal userspace tables and SQL.

Constructing the correct catalog seems to be the heart of this problem, so it is likely to take a while and look complex. Getting the HS initial state was around 50% of the effort in making it all work, so I guess it's similar here.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Hello. Renaming ret to quoted and str to ret as in the attached patch might make it more readable.

I think I'm going to refrain from this because it will be more painful to backpatch.

I hesitated to do so, too. The new patch does indeed avoid the leaks, although they do not last permanently. This preserves more heap for future work but is not strictly necessary. On the other hand, the return value of DatumGetTextP() may also be (and in this case 'should' be) palloc'ed but is not pfree'd, like other parts of the PostgreSQL source (as far as I saw), because of, I suppose, the nature of these functions: it is difficult or impossible to predict whether the returned memory block is palloc'ed or not. And such code is painful to maintain and not robust against future modification. From that point of view, it might be good to refrain from backporting this.

The attached ugly patch does it. It seems we should put NO_LOCALE=1 on the 'make check' command line for encodings not compatible with the environment's locale, although it appears to work.

The idea of separating the test into its own file has its merit; but instead of having two different tests, I'm going to have a single test and two expected files. That seems simpler than messing around in the makefile.

Yes, you're right. But it was easier to add pairs of .sql and .out to do that. Plus, as I wrote in another message, although I wrote it I'm reluctant to push it :-(

-- Kyotaro Horiguchi NTT Open Source Software Center == My e-mail address has been changed since Apr. 1, 2012.
Re: [HACKERS] SP-GiST for ranges based on 2d-mapping and quad-tree
On Thu, 2012-06-14 at 02:56 +0400, Alexander Korotkov wrote: Hackers, attached patch implements quad-tree on ranges. Some performance results in comparison with current GiST indexing. Index creation is slightly slower. Probably, it needs some investigation. Search queries on SP-GiST use many more pages. However, this comparison may not be entirely fair, because SP-GiST can pin the same buffer several times during one scan. In CPU terms, search queries on SP-GiST seem to be slightly faster. The dramatic difference in the "column @> const" query is thanks to the 2d-mapping.

Looking at this patch now. I see that it fails the opr_sanity test (on master); can you look into that? It looks very promising from a performance standpoint. I think the "col @> const" query will be a common one, and I also think that pattern will be useful to restrict a large table down to something more manageable.

In the bounds_connected function, it might make more sense to use the word "adjacent", which I already used for ordinary ranges, rather than introducing the new word "connected". Also, I'm getting a little confused switching between thinking in terms of X and Y versus lower and upper (particularly since "lower" and "upper" can be confused with < or >). I don't have a suggestion yet how to clarify that, but it might be good to use the spatial terminology in more places and avoid lower/upper except where needed.

Please excuse the slow review, I'm catching up on the SP-GiST API.

Regards, Jeff Davis
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote: On 20 June 2012 14:38, Andres Freund and...@2ndquadrant.com wrote: It incurs a rather high performance overhead due to added memory allocations and added pointer indirections. That's fine for most of the current users of the List interface, but certainly not for all. In other places you cannot even have memory allocations because the list lives in shared memory.

Yes, in general lists interact horribly with the memory hierarchy. I think I pointed out to you once a rant of mine on -hackers a while back in which I made various points about just how badly they do these days.

Yes, but how is that relevant? It's still the best data structure for many use-cases. Removing one of the two indirections is still a good idea, hence this patch ;)

On modern architectures, with many layers of cache, the cost of the linear search to get an insertion point is very large. So this:

/*
 * removes a node from a list
 * Attention: O(n)
 */
static inline void
ilist_s_remove(ilist_s_head *head, ilist_s_node *node)

is actually even worse than you might suspect.

O(n) is O(n), the constant is irrelevant. Anybody who uses arbitrary node removal in a singly linked list in the fast path deserves the pain ;) Several of the pieces of code I pointed out in a previous email use an open-coded list implementation exactly to prevent those problems.

Interesting. So, it seems like this list implementation could be described as a minimal embeddable list implementation that requires the user to do all the memory allocation, and offers a doubly-linked list too. Not an unreasonable idea. I do think that the constraints you have are not well served by any existing implementation, including List and Dllist.

Yep. Note though that you normally wouldn't do extra/manual memory allocation, because you just use the already allocated memory of the struct you embedded the list element into. 
Are you planning on just overhauling the Dllist interface in your next iteration? It needs to be unified.

Not yet sure whether it's better to just remove Dllist or morph my code into it.

All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC. So my position is that I'm in broad agreement that we should freely allow the use of inline without macro hacks, since we generally resist using macro hacks if that makes code ugly, which USE_INLINE certainly does, and for a benefit that is indistinguishable from zero, at least to me.

Tom already pointed out that not all compilers pretend to be gcc. I agree though that we should try to make all supported compilers support USE_INLINE. I think with some ugliness that should be possible, at least for aCC. Will respond to Tom on that.

Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right?

That should probably be removed, yes. I did it that way so that it could be tested independently of casserts, because the list-checking code turns some linear algorithms into quadratic ones, which is noticeable even when --enable-cassert is defined.

Why have you done this:

#ifdef __GNUC__
#define unused_attr __attribute__((unused))
#else
#define unused_attr
#endif

and then gone on to use this unused_attr macro all over the place? Firstly, that isn't going to suppress the warnings on many platforms that we support, and we do make an effort to build warning-free on at least 3 compilers these days - GCC, Clang and MSVC. Secondly, compilers give these warnings because it doesn't make any sense to have an unused parameter - so why have you used one? At the very least, if you require this exact interface, use compatibility macros. I can't imagine why that would be important though. 
And even if you did want a standard unused_attr facility, you'd do that in c.h, where a lot of that stuff lives.

If you look at the places it's mostly used, they are functions like:

/*
 * adds a node at the beginning of the list
 */
static inline void
ilist_d_push_front(ilist_d_head *head, ilist_d_node *node)
{
    node->next = head->head.next;
    node->prev = &head->head;
    node->next->prev = node;
    head->head.next = node;
    ilist_d_check(head);
}

Where ilist_d_check doesn't do anything if assertions aren't enabled, which gcc unfortunately groks and warns about. The other case is functions like:

static inline void
ilist_s_add_after(unused_attr ilist_s_head *head, ilist_s_node *after, ilist_s_node *node)
{
    node->next = after->next;
    after->next = node;
}

Where it makes sense for the api to get the head element for consistency reasons. It very well would be possible to add a checking
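The embedded-list idea discussed above can be sketched in a self-contained way (the names dnode, dlist_head, and item here are hypothetical simplifications, not the identifiers from Andres' patch): the node lives inside the caller's struct, so insertion and removal need no allocation, removal from the doubly-linked variant is O(1), and the containing struct is recovered with offsetof arithmetic.

```c
#include <assert.h>
#include <stddef.h>

/* list node embedded directly in user structs; the head is a sentinel
 * node whose next/prev point at itself when the list is empty */
typedef struct dnode { struct dnode *next, *prev; } dnode;
typedef struct { dnode head; } dlist_head;

static inline void dlist_init(dlist_head *h)
{
    h->head.next = h->head.prev = &h->head;
}

static inline void dlist_push_front(dlist_head *h, dnode *n)
{
    n->next = h->head.next;
    n->prev = &h->head;
    n->next->prev = n;
    h->head.next = n;
}

/* O(1) removal - the payoff of the doubly-linked embedded variant */
static inline void dlist_remove(dnode *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
}

/* recover the containing struct from an embedded node */
#define dlist_container(type, field, ptr) \
    ((type *) ((char *) (ptr) - offsetof(type, field)))

/* example user struct with the list node embedded - no separate
 * allocation for list membership */
typedef struct { int value; dnode link; } item;
```

The same embedding works in shared memory, since the list never allocates on its own.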
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 02:04:02 AM Tom Lane wrote: Peter Geoghegan pe...@2ndquadrant.com writes: All of the less popular compilers we support we support precisely because they pretend to be GCC, with the sole exception, as always, of the Microsoft product, in this case MSVC.

This is nonsense. There are at least three buildfarm machines running compilers that do not pretend to be gcc (at least, configure recognizes them as not gcc) and are not MSVC either. We ought to have more IMO, because software monocultures are dangerous. Of those three, two pass the quiet inline test and one --- the newest of the three if I guess correctly --- does not. So it is not the case that !USE_INLINE is dead code, even if you adopt the position that we don't care about any compiler not represented in the buildfarm.

I think you can make HP-UX's aCC do the right thing with some trickery, though. I don't have access to HP-UX anymore, so I can't test it. Should there be no other trick - I think there is, though - we could just specify -W2177 as an alternative parameter to test in the 'quiet static inline' test.

I definitely do not want to bar any sensible compiler from compiling postgres, but the keyword here is 'sensible'. If it requires some modest force/trickery to behave sensibly, that's OK, but I don't find it OK if we need to ship huge unreadable crufty macros just to support them.

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
Based on the discussion and suggestions in this mail chain, the following features can be implemented:

1. Compute the max LSN in data pages, based on user input as to whether it is wanted for an individual file, a particular directory, or the whole database.
2a. Search the available WAL files for the latest checkpoint record and print its value.
2b. Search the available WAL files for the latest checkpoint record and recreate a pg_control file pointing at that checkpoint.

I have kept both options to address different kinds of corruption scenarios:

1. The WAL files are on a separate partition which is not corrupt; only the partition holding the data files and pg_control is corrupt. In this case users can use options 2a or 2b to proceed.
2. pg_control, data, and WAL are all on the same disk partition, which got corrupt. In this case the user can use options 1 and 2a to decide the next-LSN for pg_control and proceed.

Suggestions? If there is agreement on these features, I can send a proposal for which options to add to an existing or new utility.

With Regards, Amit Kapila.
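Feature 1 above (finding the max LSN across data pages) could look roughly like the sketch below. It assumes 8kB blocks and that the page LSN occupies the first 8 bytes of each page header as two 32-bit fields (xlogid, xrecoff), as in PageHeaderData of that era; the buffer-based interface and function names are illustrative, not from any proposed utility, and native little-endian byte order is assumed.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192          /* assumed BLCKSZ */

/* Extract the page LSN from the first 8 bytes of a page header.
 * Assumes the layout { uint32 xlogid; uint32 xrecoff; } in the
 * server's native (here: little-endian) byte order. */
static uint64_t page_lsn(const unsigned char *page)
{
    uint32_t xlogid, xrecoff;

    memcpy(&xlogid, page, sizeof(xlogid));
    memcpy(&xrecoff, page + 4, sizeof(xrecoff));
    return ((uint64_t) xlogid << 32) | xrecoff;
}

/* Scan a buffer of consecutive pages (e.g. one relation segment read
 * from disk) and return the highest page LSN found.  All-zero,
 * never-initialised pages contribute LSN 0 and are thus ignored. */
static uint64_t max_lsn_in_buffer(const unsigned char *buf, size_t len)
{
    uint64_t max = 0;

    for (size_t off = 0; off + PAGE_SIZE <= len; off += PAGE_SIZE)
    {
        uint64_t lsn = page_lsn(buf + off);

        if (lsn > max)
            max = lsn;
    }
    return max;
}
```

A real tool would also validate each page header before trusting its LSN, since the whole point is that the files may be corrupt.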
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
Hi,

On Friday, June 22, 2012 08:48:41 AM Simon Riggs wrote: On 21 June 2012 12:41, Andres Freund and...@2ndquadrant.com wrote: 2.) Keep the decoding site up2date by replicating the catalog via normal HS recovery mechanisms. Advantages: * most of the technology is already there * minimal overhead (space, performance) * no danger of out of sync catalogs * no support for command triggers required that can keep a catalog in sync, including oids Disadvantages: * driving the catalog recovery that way requires some somewhat intricate code as it needs to be done in lockstep with decoding the wal-stream * requires an additional feature to guarantee HS always has enough information to be queryable after a crash/shutdown * some complex logic/low-level fudging required to keep the transactional behaviour sensible when querying the catalog * full version/architecture compatibility required * the decoding site will only ever be readable

My initial reaction was "this won't work", but that requires qualification since this is a complex topic: You can use this approach as long as you realise that the catalog it gives can never be rewound.

Well, only as far as the min recovery point has been advanced. That is advanced less frequently than xlog records are applied.

So the generic approach of "construct me a catalog as of this LSN" would need to start with a base backup of the catalog and then roll forward to the appropriate LSN. Which means a generic user of this approach would need to be able to construct an initial catalog using a PITR. Constructing a decoding site requires you to a) take a partial base backup of the catalog, and b) apply WAL records to bring that forward to the correct LSN, which would require some alteration of the recovery code to skip the files missing in a). So taking the approach of a decoding site means we have to modify recovery code, and even when we do that we still end up with an option that is difficult to deploy in the real world. 
Difficult to deploy because we need a whole new instance of Postgres, plus we need all of the WAL files, which could easily be impractical. The overall approach is good, but the architecture is wrong. What we need is a catalog base backup and a means of rolling forward to the appropriate LSN. Rolling forward using WAL is too bulky, so we need a separate log of DDL changes to the catalog. So what we need is a catalog base backup plus a ddl-log.

The idea was to store the applycache to disk every time UpdateMinRecoveryPoint is called. That way you wouldn't have to scroll back, even if the database crashes or is stopped hard. But I agree, I don't like the architecture that much either.

To translate the WAL we maintain a secondary set of catalog tables, which only exist for logical replication. These are normal, non-versioned tables, but held in a new schema, pg_logical or similar. One reason why this must be a secondary catalog is to allow the translation to take place on the target server, and to allow translation of WAL from a prior version of the catalog - so we can allow online upgrades across different catalog versions (and possibly major versions). The secondary catalog is used in place of the normal catalog during InitCatalogCache() in the apply process. All the normal caches exist, they just point to secondary relations rather than the normal ones. When we initialise replication we take a copy of the appropriate tables, columns and rows in a catalog-base-backup, using something like pg_dump. Overall, this is much smaller than the normal catalog, since it avoids temp tables and anything not related to WAL translation. On each non-temp change to the database we record the changes as SQL in the ddl-log, together with the LSN of the change. When the number of changes in the ddl-log hits a limit, we take a new catalog-base-backup. This process is similar to a checkpoint, but much less frequent; let's call it a ddl-checkpoint. 
When we start to read WAL logs to translate them, we start by truncating/re-bootstrapping and reloading the secondary catalog from the base backup. We then apply all changes from the ddl-log (which is just a sequence of SQL statements) up until the LSN at the start of WAL. The secondary catalog is then an exact copy of the catalog as of that LSN. As we read through WAL we apply further changes to the secondary catalog so it stays in lockstep with the WAL we currently read.

I can't see how that's going to fly, because the *_out functions use the syscache and also plain access to catalog tables. We would have to completely map oids to the alternative catalog. For one, I think that mapping would involve far too many places (shared catalogs/relmapper, smgr, fd, syscache, ...). For another, you need to access those tables in a completely normal fashion from non-recovery backends, which means that we cannot just have duplicated oids hidden away somewhere. Constructing the
Re: [HACKERS] COMMUTATOR doesn't seem to work
On Jun22, 2012, at 06:32 , D'Arcy Cain wrote: So I have my type working now, but I had to create a new C function that takes the opposite argument order. Seems redundant, but I could not see a better way.

There isn't. Postgres itself contains a huge number of such functions, e.g. for every *lt() (less-than) there's a *gt() (greater-than).

best regards, Florian Pflug
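The pattern Florian describes - a commutator function that is simply the original with the argument order swapped - looks like this in miniature (hypothetical type and function names, not PostgreSQL's actual V1 fmgr calling convention):

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical pair type with a lexicographic ordering */
typedef struct { int x, y; } mytype;

/* the "real" comparison: a < b */
static bool mytype_lt(mytype a, mytype b)
{
    return (a.x < b.x) || (a.x == b.x && a.y < b.y);
}

/* the commutator: a > b is exactly b < a with arguments reversed,
 * so the new function is a one-line wrapper, redundant but necessary */
static bool mytype_gt(mytype a, mytype b)
{
    return mytype_lt(b, a);
}
```

Declaring COMMUTATOR on the operators then tells the planner the two are interchangeable with swapped operands, but each operator still needs its own underlying function.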
Re: [HACKERS] Transactions over pathological TCP connections
Leon Smith leon.p.sm...@gmail.com writes: It's not clear to me that this is even a solvable problem without modifying the schema to include both a taken and a finished processing state, and then letting elements be re-delivered after a period of time.

You maybe should have a look at PGQ from Skytools before reinventing it.

Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Re: [HACKERS] [PATCH] Add some more documentation for array indexes/operators
On Wed, Jun 20, 2012 at 9:23 AM, Ryan Kelly rpkell...@gmail.com wrote: I had trouble finding what operators arrays supported or which ones had index support or even determining that arrays could be indexed from the documentation from the array data type. So, patch.

Yeah, I agree that the method should be mentioned there. I edited this down a bit so that it doesn't duplicate quite as much material, and committed it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Too frequent message of pgbench -i?
On Thu, Jun 21, 2012 at 3:44 AM, Tatsuo Ishii is...@postgresql.org wrote: On Wed, Jun 20, 2012 at 4:04 AM, Tatsuo Ishii is...@postgresql.org wrote: Currently pgbench -i prints the following message every 10k tuples created:

fprintf(stderr, "%d tuples done.\n", j);

The frequency probably seemed appropriate a long time ago, but computers have become so fast that a message every 10k tuples now seems too frequent to me. Can we change the frequency from 10k to 100k?

+1 for doing it that way. I have an old patch lying around for that. It's one line, which seems to be about as much work as the problem justifies.

+1. Your patch looks good.

Thanks, committed.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
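The change under discussion amounts to one line in the table-population loop; a rough sketch (the loop shape and function name are illustrative, not the committed pgbench code, and the function returns the number of progress lines emitted so the frequency is easy to check):

```c
#include <stdio.h>

/* Sketch of the pgbench -i progress message after the change:
 * report every 100k tuples instead of every 10k. */
static int fill_accounts(int naccounts)
{
    int reported = 0;

    for (int j = 1; j <= naccounts; j++)
    {
        /* ... INSERT the j-th row here ... */
        if (j % 100000 == 0)
        {
            fprintf(stderr, "%d tuples done.\n", j);
            reported++;
        }
    }
    return reported;
}
```

With the default pgbench scale of 100000 accounts per scale factor, this prints one line per scale unit rather than ten.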
Re: [HACKERS] Pruning the TODO list
On Thu, Jun 21, 2012 at 10:25 AM, Simon Riggs si...@2ndquadrant.com wrote: On 21 June 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote: Simon Riggs si...@2ndquadrant.com writes: On 21 June 2012 08:30, Peter Eisentraut pete...@gmx.net wrote: Nonetheless, it would be a good idea to prune the TODO list regularly, such as after a release.

We used to do that a bit, not so much lately, perhaps. But everyone is invited to contribute to that.

The idea is to remove contentious issues from the list, to avoid the waste of time.

The thing is, a lot of stuff gets punted to the TODO list *because* it's contentious, i.e. there's not consensus on what to do. If there were consensus we might've just done it already. I'm not sure we want to remove such entries, though perhaps somehow marking them as debatable would be a good thing. There may well be stuff on the list that is no longer very relevant in today's world, but somebody would have to go through it item by item to decide which ones those are. I'm not volunteering.

*smiles* Understood. I'll have a play. Maybe I should just go with the idea of "Simon's TODO List" - stuff I personally think is worth working on - and leave it at that.

+1 for that approach. I have a page on the wiki which is irregularly updated and contains a somewhat random list of things that I think are worth doing. I think it would be great to have similar lists for other developers, even if they're not 100% up-to-date or accurate. I wouldn't be averse to pruning 10% of the TODO list, maybe even 20%, but I think there's a lot of stuff on there that's actually worth doing, even if much of it needs discussion before it's implemented. Novices - and even experienced developers, sometimes - tend to make the mistake of deciding to implement X, as if it were self-evident that X is a good thing. The TODO list can feed that misapprehension, but isn't really the source of it. Rather, we all like to believe that our own ideas are awesome. 
This is frequently true, but not so frequently as we like to believe.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Thursday, June 21, 2012 05:40:08 PM Andres Freund wrote: On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it.

The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash.

Thats a very good point. Not sure how I forgot that. It think it might be possible to reconstruct a sensible combocid mapping from the walstream. Let me think about it for a while...

I have a very, very preliminary thing which seems to work somewhat. I just log (cmin, cmax) additionally for every modified catalog tuple into the wal and so far that seems to be enough. Do you happen to have suggestions for other problematic things to look into before I put more time into it?

Andres

-- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] Pruning the TODO list
On 22 June 2012 14:15, Robert Haas robertmh...@gmail.com wrote: Rather, we all like to believe that our own ideas are awesome. This is frequently true, but not so frequently as we like to believe.

Hmm, for me, awesome has nothing to do with it. I strive to produce useful features that address real problems in the simplest way. I think most of my proposals are fairly obvious to database users. If I find a solution, I push it, but not because I found it, or because I think it's awesome. The idea that I'm scratching my own itches is mostly wrong. For me, this is about working on the features that Postgres needs and then doing them, with a sense of urgency that seems to be slightly ahead of the curve. It's not just reviewers that work hard on tasks they may not be interested in yet believe are for the common good.

-- Simon Riggs http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services
Re: [HACKERS] proposal and patch : support INSERT INTO...RETURNING with partitioned table using rule
On Wed, Jun 20, 2012 at 12:24 PM, John Lumby johnlu...@hotmail.com wrote: An INSERT which has a RETURNING clause and which is to be rewritten based on a rule will be accepted if the rule is an unconditional DO INSTEAD. In general I believe unconditional means no WHERE clause, but in practice if the rule is of the form

CREATE RULE insert_part_history as ON INSERT to history
    DO INSTEAD SELECT history_insert_partitioned(NEW) returning NEW.id

this is treated as conditional and the query is rejected.

This isn't rejected because the query is treated as conditional; it's rejected because it's not valid syntax. A SELECT query can't have a RETURNING clause, because the target list (i.e. the part that immediately follows the SELECT) already serves that purpose. The fact that it's in a CREATE RULE statement is irrelevant:

rhaas=# select 4 returning 3;
ERROR:  syntax error at or near "returning"
LINE 1: select 4 returning 3;
                 ^

I propose to extend the rule system to recognize cases where the INSERT query specifies RETURNING and the rule promises to return a row, and to then permit this query to run and return the expected row. In effect, to widen the definition of unconditional to handle cases such as my testcase.

That already (kind of) works:

rhaas=# create table history (id bigserial, name text);
NOTICE:  CREATE TABLE will create implicit sequence "history_id_seq" for serial column "history.id"
CREATE TABLE
rhaas=# create table history1 () inherits (history);
CREATE TABLE
rhaas=# create rule history_insert as on insert to history do instead
        insert into history1 (id, name) values (NEW.id, NEW.name || ' is awesome!')
        returning 17::bigint, 'cheeze whiz'::text;
CREATE RULE
rhaas=# insert into history (name) values ('Linus') returning id, name;
 id |    name
----+-------------
 17 | cheeze whiz
(1 row)

INSERT 0 1
rhaas=# select * from history;
 id |       name
----+-------------------
  1 | Linus is awesome!
(1 row)

I do notice that the RETURNING clause of the INSERT can't reference NEW, which seems like a restriction that we probably ought to lift, but it doesn't seem to have much to do with your patch. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] initdb and fsync
On Sat, Feb 4, 2012 at 8:18 PM, Noah Misch n...@leadboat.com wrote: On Sat, Feb 04, 2012 at 03:41:27PM -0800, Jeff Davis wrote: On Sat, 2012-01-28 at 13:18 -0500, Tom Lane wrote: Yeah. Personally I would be sad if initdb got noticeably slower, and I've never seen or heard of a failure that this would fix. I worked up a patch, and it looks like it does about 6 file fsync's and a 7th for the PGDATA directory. That degrades the time from about 1.1s to 1.4s on my workstation. So, is it worth it? Should we make it an option that can be specified? If we add fsync calls to the initdb process, they should cover the entire data directory tree. This patch syncs files that initdb.c writes, but we ought to also sync files that bootstrap-mode backends had written. An optimization like the pg_flush_data() call in copy_file() may reduce the speed penalty. initdb should do these syncs by default and offer an option to disable them. This may be a stupid question, but why is it initdb's job to fsync the files the server creates, rather than the server's job? Normally we rely on the server to make its own writes persistent. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed
On Tue, Jun 19, 2012 at 5:41 AM, Etsuro Fujita fujita.ets...@lab.ntt.co.jp wrote: I'm confused by this remark, because surely the query planner does it this way only if there's no LIMIT. When there is a LIMIT, we choose based on the startup cost plus the estimated fraction of the total cost we expect to pay based on dividing the LIMIT by the overall row count estimate. Or is this not what you're talking about? I think that Ants is pointing at the way costs are estimated in choose_hashed_grouping()/choose_hashed_distinct(), i.e. cost_agg() for cheapest_path + hashagg, where the costs are calculated based on the total cost only of cheapest_path. I think that it might be good to do cost_agg() for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED strategy. Well, Ants already made some adjustments to those functions; not sure if this means they need some more adjustment, but I don't see that there's a general problem with the costing algorithm around LIMIT. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote: Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right? That should probably be removed, yes. I did it that way so that it could be tested independently of casserts, because the list checking code turns some linear algorithms into quadratic ones, which is noticeable even when --enable-cassert is defined. As far as that goes, I wonder whether the list-checking code hasn't long since served its purpose. Neil Conway put it in when he redid the List API to help catch places that were using no-longer-supported hacks; but it's been years since I've seen it catch anything. I suggest that we might want to either remove it, or enable it via something other than USE_ASSERT_CHECKING (and not enable it by default). regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Friday, June 22, 2012 04:09:59 PM Robert Haas wrote: I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. It adds one if() if nothing was fsynced. If something was written and fsynced inside XLogInsert, some kill() calls are surely not the problem. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. As soon as you have significant concurrency, by the time the XLogFlush in RecordTransactionCommit() is reached another backend or the wal writer may have already fsynced the WAL up to the requested point. In that case no wakeup will be performed by the committing backend at all. 9.2 improved the likelihood of that as you know.
Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 04:18:35 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 12:23:57 AM Peter Geoghegan wrote: Why are you using the stdlib's assert.h? Why have you used the NDEBUG macro rather than USE_ASSERT_CHECKING? This might make sense if the header was intended to live in port, but it isn't, right? That should probably be removed, yes. I did it that way so that it could be tested independently of casserts, because the list checking code turns some linear algorithms into quadratic ones, which is noticeable even when --enable-cassert is defined. As far as that goes, I wonder whether the list-checking code hasn't long since served its purpose. Neil Conway put it in when he redid the List API to help catch places that were using no-longer-supported hacks; but it's been years since I've seen it catch anything. I suggest that we might want to either remove it, or enable it via something other than USE_ASSERT_CHECKING (and not enable it by default). Oh, Peter and I weren't talking about the pg_list.h stuff, it was about my 'embedded list' implementation which started this subthread. The pg_list.h/list.c stuff isn't problematic as far as I have seen in profiles; its checks are pretty simple so I do not find that surprising. We might want to disable it by default anyway. In my code the list checking stuff iterates over the complete list after modifications and checks that all prev/next pointers are correct, so it's linear in itself... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Fri, Jun 22, 2012 at 10:19 AM, Andres Freund and...@2ndquadrant.com wrote: On Friday, June 22, 2012 04:09:59 PM Robert Haas wrote: I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. It adds one if() if nothing was fsynced. If something was written and fsynced inside XLogInsert, some kill() calls are surely not the problem. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. As soon as you have significant concurrency, by the time the XLogFlush in RecordTransactionCommit() is reached another backend or the wal writer may have already fsynced the WAL up to the requested point. In that case no wakeup will be performed by the committing backend at all. 9.2 improved the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. If we lose a few WAL records for an uncommitted transaction, who cares? 
That transaction is gone anyway. As an implementation detail, I suggest rewriting WalSndWakeupRequest and WalSndWakeupProcess as macros. The old code does an in-line test for max_wal_senders > 0, which suggests that somebody thought the function call overhead might be enough to matter here. Perhaps they were wrong, but it shouldn't hurt anything to keep it that way. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Andres Freund and...@2ndquadrant.com writes: Oh, I and Peter weren't talking about the pg_list.h stuff, it was about my 'embedded list' implementation which started this subthread. The pg_list.h/list.c stuff isn't problematic as far as I have seen in profiles; its checks are pretty simple so I do not find that surprising. We might want to disable it by default anyway. In my code the list checking stuff iterates over the complete list after modifications and checks that all prev/next pointers are correct so its linear in itself... Well, so does list.c, so I'd expect the performance risks to be similar. Possibly you're testing on longer lists than are typical in the backend. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Friday, June 22, 2012 04:34:33 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 10:19 AM, Andres Freund and...@2ndquadrant.com wrote: On Friday, June 22, 2012 04:09:59 PM Robert Haas wrote: I am not convinced that it's a good idea to wake up every walsender every time we do XLogInsert(). XLogInsert() is a super-hot code path, and adding more overhead there doesn't seem warranted. We need to replicate commit, commit prepared, etc. quickly, but why do we need to worry about a short delay in replicating heap_insert/update/delete, for example? They don't really matter until the commit arrives. 7 seconds might be a bit long, but that could be fixed by decreasing the polling interval for walsender to, say, a second. It's not woken up every XLogInsert call. It's only woken up if there was an actual disk write + fsync in there. That's exactly the point of the patch. Sure, but it's still adding cycles to XLogInsert. I'm not sure that XLogBackgroundFlush() is the right place to be doing this, but at least it's in the background rather than the foreground. It adds one if() if nothing was fsynced. If something was written and fsynced inside XLogInsert, some kill() calls are surely not the problem. The wakeup rate is actually lower for synchronous_commit=on than before because then it unconditionally did a wakeup for every commit (and similar) and now only does that if something has been written + fsynced. I'm a bit confused by this, because surely if there's been a commit, then WAL has been written and fsync'd, but the reverse is not true. As soon as you have significant concurrency, by the time the XLogFlush in RecordTransactionCommit() is reached another backend or the wal writer may have already fsynced the WAL up to the requested point. In that case no wakeup will be performed by the committing backend at all. 9.2 improved the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. 
If we lose a few WAL records for an uncommitted transaction, who cares? That transaction is gone anyway. Well, before the simple fix Simon applied after my initial complaint you didn't get wakeups *at all* in the synchronous_commit=off case. Now, with the additional changes, the walsender is woken exactly when data is available to send and not always when a commit happens. I played around with various scenarios and it always was a win. One reason is that the walreceiver often is a bottleneck because it fsyncs the received data immediately, so a less blocky transfer pattern reduces that problem a bit. As an implementation detail, I suggest rewriting WalSndWakeupRequest and WalSndWakeupProcess as macros. The old code does an in-line test for max_wal_senders > 0, which suggests that somebody thought the function call overhead might be enough to matter here. Perhaps they were wrong, but it shouldn't hurt anything to keep it that way. True. Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
On Friday, June 22, 2012 04:41:20 PM Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: Oh, Peter and I weren't talking about the pg_list.h stuff, it was about my 'embedded list' implementation which started this subthread. The pg_list.h/list.c stuff isn't problematic as far as I have seen in profiles; its checks are pretty simple so I do not find that surprising. We might want to disable it by default anyway. In my code the list checking stuff iterates over the complete list after modifications and checks that all prev/next pointers are correct, so it's linear in itself... Well, so does list.c, so I'd expect the performance risks to be similar. Possibly you're testing on longer lists than are typical in the backend. I don't think list.c does so:

static void
check_list_invariants(const List *list)
{
    if (list == NIL)
        return;

    Assert(list->length > 0);
    Assert(list->head != NULL);
    Assert(list->tail != NULL);

    Assert(list->type == T_List ||
           list->type == T_IntList ||
           list->type == T_OidList);

    if (list->length == 1)
        Assert(list->head == list->tail);
    if (list->length == 2)
        Assert(list->head->next == list->tail);
    Assert(list->tail->next == NULL);
}

But yes, the lists I deal with are significantly longer, so replacing O(n) by O(n^2) is rather painful there... Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases
Kyotaro HORIGUCHI horiguchi.kyot...@lab.ntt.co.jp writes: +REGRESS_LC0 = $(subst .sql,,$(shell cd sql; ls plperl_lc_$(shell echo Hrm, that's quite cute. I dunno if there is a more canonical way of doing the above -- but it seems to work. I'm not sure this regression test is worth it. I'm thinking maybe we should just remove the regression test instead. I agree. That is the fundamental question. I've coded it just for fun, but I don't see much significance in doing that. We might omit the test for this, which is non-critical and covers corner cases. We need these tests to work on Windows too, so fancy gmake tricks are probably not the way to deal with varying results. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)
Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 04:41:20 PM Tom Lane wrote: Well, so does list.c, so I'd expect the performance risks to be similar. I don't think list.c does so: Huh, OK. I seem to remember that the original version actually chased down the whole list and verified that the length matched. We must've soon decided that that was insupportable in practice. There might be a lesson here for your checks. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Fri, Jun 22, 2012 at 10:45 AM, Andres Freund and...@2ndquadrant.com wrote: 9.2 improved the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. If we lose a few WAL records for an uncommitted transaction, who cares? That transaction is gone anyway. Well, before the simple fix Simon applied after my initial complaint you didn't get wakeups *at all* in the synchronous_commit=off case. Now, with the additional changes, the walsender is woken exactly when data is available to send and not always when a commit happens. I played around with various scenarios and it always was a win. Can you elaborate on that a bit? What scenarios did you play around with, and what does "win" mean in this context? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 07:11 AM, Florian Pflug wrote: On Jun22, 2012, at 06:32 , D'Arcy Cain wrote: So I have my type working now but I had to create a new C function that takes the opposite argument order. Seems redundant but I could not see a better way. There isn't. Postgres itself contains a huge number of such functions, e.g. for every *lt() (less-than) there's a *gt() (greater-than). Right, but that's not the same thing. Assuming you meant lt/gte and lte/gt, those still are not self-commutating (SC). For example, '=' on two ints is SC. The issue here is that the operator is SC but the args are different types. It would be nice if there were a way to automatically generate code that reverses arguments. Maybe such a thing belongs in the CREATE FUNCTION command. Or, I guess this works and keeps from creating a second C function:

CREATE OR REPLACE FUNCTION eq(chkpass, text) RETURNS bool STRICT
    AS 'chkpass.so', 'chkpass_eq'
    LANGUAGE 'c' RETURNS NULL ON NULL INPUT;

CREATE OR REPLACE FUNCTION eq(text, chkpass) RETURNS bool
    AS 'select eq($2, $1);'
    LANGUAGE SQL RETURNS NULL ON NULL INPUT;

-- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082)(eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata
On Fri, Jun 22, 2012 at 5:25 AM, Amit Kapila amit.kap...@huawei.com wrote: Based on the discussion and suggestions in this mail chain, following features can be implemented: 1. To compute the value of max LSN in data pages based on user input whether he wants it for an individual file, a particular directory or whole database. 2a. To search the available WAL files for the latest checkpoint record and prints the value. 2b. To search the available WAL files for the latest checkpoint record and recreates a pg_control file pointing at that checkpoint. I have kept both options to address different kind of corruption scenarios. I think I can see all of those things being potentially useful. There are a couple of pending patches that will revise the WAL format slightly; not sure how much those are likely to interfere with any development you might do on (2) in the meantime. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: ... The issue here is that the operator is SC but the args are different types. Well, that's a weird way of defining self-commutating, but ... It would be nice if there was a way to automatically generate code that reverses arguments. Maybe such a thing belongs in the CREATE FUNCTION command. Or, I guess this works and keeps from creating a second C function: CREATE OR REPLACE FUNCTION eq(chkpass, text) RETURNS bool STRICT AS 'chkpass.so', 'chkpass_eq' LANGUAGE 'c' RETURNS NULL ON NULL INPUT; CREATE OR REPLACE FUNCTION eq(text, chkpass) RETURNS bool AS 'select eq($2, $1);' LANGUAGE SQL RETURNS NULL ON NULL INPUT; The thing is that either of those approaches is hugely more expensive than just providing a second C function. It costs probably thousands of cycles to inline that SQL function, each time it's used in a query. I doubt that an auto reverse the arguments facility would be very much cheaper. You could maybe argue that the aggregated maintenance and space costs of all the commutator-pair functions are enough to justify having some such solution instead, but I'm doubtful --- and even if true, getting from here to there would be painful. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Pruning the TODO list
On 06/22/2012 09:45 AM, Simon Riggs wrote: On 22 June 2012 14:15, Robert Haas robertmh...@gmail.com wrote: Rather, we all like to believe that our own ideas are awesome. This is frequently true, but not so frequently as we like to believe. Hmm, for me, awesome has nothing to do with it. I strive to produce useful features that address real problems in the simplest way. I think most of my proposals are fairly obvious to database users. If I find a solution, I push it, but not because I found it, or I think it's awesome. The idea that I'm scratching my own itches is mostly wrong. For me, this is about working on the features that Postgres needs and then doing them, with a sense of urgency that seems to be slightly ahead of the curve. It's not just reviewers that work hard on tasks they may not be interested in, yet believe are for the common good. That's true of many developers. I think the real problem with the TODO list is that some people see it as some sort of official roadmap, and it really isn't. Neither is there anything else that is. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers reduced, v1
On Wed, Jun 20, 2012 at 4:36 PM, Dimitri Fontaine dimi...@2ndquadrant.fr wrote: Robert Haas robertmh...@gmail.com writes: 1. I still think we ought to get rid of the notion of BEFORE or AFTER (i.e. pg_event_trigger.evttype) and just make that detail part of the event name (e.g. pg_event_trigger.evtevent). Many easily foreseeable event types will be more like during rather than before or after, and for those that do have a notion of before and after, we can have two different event names and include the word before or after there. I am otherwise satisfied with the schema you've chosen. It's not before/after anymore, but rather addon/replace if you will. I kept the INSTEAD OF keyword for the replace semantics, which you've been asking me to keep IIRC, with security policy plugins as a use case. Now we can of course keep those semantics and embed them in the event name we provide users; I thought that maybe a documentation matrix of which events support which mode would be cleaner to document. We might as well find a clean way to implement both modes for most of the commands, I don't know yet. So, are you sure you want to embed that part of the event trigger semantics in the event name itself? Yeah, pretty sure. I think that for regular triggers, BEFORE, AFTER, and INSTEAD-OF are the firing-point specification. But event triggers will have more than three firing points, probably eventually quite a lot more. So we need something more flexible. But we don't need that more flexible thing AND ALSO the before/after/instead-of specification, which I think in most cases won't be meaningful anyway. It happens to be somewhat sensible for this initial firing point, but I think for most of them there will be just one place, and in many cases it will be neither before, nor after, nor instead-of. 2. I think it's important to be able to add new types of event triggers without creating excessive parser bloat. 
I've been trying to do that, yes, as you can see with the event_name and event_trigger_variable rules. I've been re-using as many existing keywords as I could because I believe that's not causing any measurable bloat; I'll kindly reconsider if necessary, even if sadly. The issue is that the size of the parser tables grows with the square of the number of states. This will introduce lots of new states that we don't really need; and every new kind of event trigger that we want to add will introduce more. 3. The event trigger cache seems to be a few bricks shy of a load. I wouldn't be that surprised, mind you. I didn't have nearly as much time as I wanted to work on that project. First, event_trigger_cache_is_stalled is mis-named; I think you mean stale, not stalled. Stale. Right. Edited. Second, instead of setting that flag and then rebuilding the cache when you see the flag set, how about just blowing away the cache contents whenever you would have set the flag? I've been doing that at first, but that meant several full rebuilds in a row in the regression tests, which add new event triggers and then use them. I thought lazily maintaining the cache would be better. Well, AFAICS, you're still doing full rebuilds whenever something changes; you're just keeping the (useless, dead) cache around until you decide to rebuild it. Might as well free the memory once you know that the next access will rebuild it anyway, and as a bonus it saves you a flag. I'm not that fond of psql commands, but I don't think it's going to fly not to have one for event triggers. I could buy \dy. Yeah, I think people are going to want to have one. I really despise the \dwhatever syntax, but it's not 100% clear what a better one would look like. 
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 11:36 AM, Tom Lane wrote: D'Arcy Cain da...@druid.net writes: The thing is that either of those approaches is hugely more expensive than just providing a second C function. It costs probably thousands of cycles to inline that SQL function, each time it's used in a query. I assumed it would be more expensive but didn't know it would be that much more. I doubt that an auto reverse the arguments facility would be very much cheaper. You could maybe argue that the aggregated maintenance and space costs of all the commutator-pair functions are enough to justify having some such solution instead, but I'm doubtful --- and even if true, getting from here to there would be painful. And it would only apply to a very specific type of function. The other idea I had was to just have the second C function call the first but that didn't work. Here is what I tried.

PG_FUNCTION_INFO_V1(chkpass_eq);
Datum
chkpass_eq(PG_FUNCTION_ARGS)
{
    chkpass    *a1 = (chkpass *) PG_GETARG_POINTER(0);
    text       *a2 = (text *) PG_GETARG_TEXT_P(1);
    char        str[9];

    strlcpy(str, a2->vl_dat, sizeof(str));
    PG_RETURN_BOOL(strcmp(a1->password, crypt(str, a1->password)) == 0);
}

PG_FUNCTION_INFO_V1(chkpass_eq2);
Datum
chkpass_eq2(PG_FUNCTION_ARGS)
{
    return chkpass_eq(PG_GETARG_POINTER(1), PG_GETARG_TEXT_P(0));
}

Now in this specific case the function is trivial and writing it twice is no big deal but in general I hate writing the same code twice. I suppose I could extract the actual operation out to a third function and call it from the others. I may do that anyway just for the value of the example. Or is there a way to do what I tried above? -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. IM: da...@vex.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling
On Friday, June 22, 2012 04:59:45 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 10:45 AM, Andres Freund and...@2ndquadrant.com wrote: the likelihood of that as you know. Hmm, well, I guess. I'm still not sure I really understand what benefit we're getting out of this. If we lose a few WAL records for an uncommitted transaction, who cares? That transaction is gone anyway. Well, before the simple fix Simon applied after my initial complaint you didn't get wakeups *at all* in the synchronous_commit=off case. Now, with the additional changes, the walsender is woken exactly when data is available to send and not always when a commit happens. I played around with various scenarios and it always was a win. Can you elaborate on that a bit? What scenarios did you play around with, and what does win mean in this context? I had two machines connected locally and set up HS and my prototype between them (not at once, obviously). The patch reduced the average latency between both nodes (measured by 'ticker' rows arriving in a table on the standby), the jitter in latency, and the amount of load I had to put on the master before the standby couldn't keep up anymore. I played with different loads: * multiple concurrent ~50MB COPYs * multiple concurrent ~50MB COPYs, pgbench * pgbench All three had a ticker running concurrently with synchronous_commit=off (because it shouldn't cause any difference in the replication pattern itself). The difference in average lag and cutoff were smallest with just pgbench running alone and biggest with COPY running alone. High jitter was most visible with just pgbench running alone, but that's likely just because the average lag was smaller. It's not that surprising, imo. On workloads that have a high WAL throughput like all of the above, XLogInsert frequently has to write out data itself. If that happens the walsender might not get woken up in the current setup, so the walsender/receiver pair is inactive and starts to work like crazy afterwards to catch up. 
During that period of higher activity it does fsync's of MAX_SEND_SIZE (16 * XLOG_BLKSZ) in a high rate which reduces the throughput of apply... Greetings, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] libpq compression
On 20-06-2012 17:40, Marko Kreen wrote: On Wed, Jun 20, 2012 at 10:05 PM, Florian Pflug f...@phlo.org wrote: I'm starting to think that relying on SSL/TLS for compression of unencrypted connections might not be such a good idea after all. We'd be using the protocol in a way it quite clearly never was intended to be used... Maybe, but what is the argument that we should avoid encryption+compression at the same time? AES is quite lightweight compared to compression, so it should be no problem in situations where you care about compression. If we could solve the compression problem without AES, that would make things easier. Compression-only via encryption is a weird way to solve the problem from the user's POV. RSA is noticeable, but only for short connections. Thus easily solvable with connection pooling. RSA overhead is not the main problem. SSL/TLS setup is. And for really special compression needs you can always create a UDF that does custom compression for you. You have to own the code to modify it; that is not always an option. So what exactly is the situation we need to solve with postgres-specific protocol compression? Compression-only support. Why do I need to set up SSL/TLS just for compression? IMHO SSL/TLS use is no different from relying on another library to handle compression for the protocol and, moreover, it is compression-specific. That way, we could implement other algorithms in such a library without needing to modify libpq code. Using SSL/TLS you are bound by what the SSL/TLS software products decide to use as compression algorithms. I'll be happy to maintain the code if it is postgres-specific or even as close as possible to core. -- Euler Taveira de Oliveira - Timbira http://www.timbira.com.br/ PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] random failing builds on spoonbill - backends not exiting...
It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly and it was in fact still running - I have attached gdb to the still running backend:

(gdb) bt
#0 0x000208eb5928 in poll () from /usr/lib/libc.so.62.0
#1 0x00020a972b88 in _thread_kern_poll (wait_reqd=Variable wait_reqd is not available.) at /usr/src/lib/libpthread/uthread/uthread_kern.c:784
#2 0x00020a973d04 in _thread_kern_sched (scp=0x0) at /usr/src/lib/libpthread/uthread/uthread_kern.c:384
#3 0x00020a96c080 in select (numfds=Variable numfds is not available.) at /usr/src/lib/libpthread/uthread/uthread_select.c:170
#4 0x003a2894 in ServerLoop () at postmaster.c:1321
#5 0x003a45ac in PostmasterMain (argc=Variable argc is not available.) at postmaster.c:1121
#6 0x00326df8 in main (argc=6, argv=0x14f8) at main.c:199
(gdb) print Shutdown
$2 = 2
(gdb) print pmState
$3 = PM_WAIT_BACKENDS
(gdb) p *(Backend *) (BackendList->dll_head)
Cannot access memory at address 0x0
(gdb) p *BackendList
$9 = {dll_head = 0x0, dll_tail = 0x0}

all processes are still running:

pgbuild 18020 0.0 1.2 5952 12408 ?? I  Wed04AM 0:03.98 /home/pgbuild/pgbuildfarm/HEAD/pgsql.5709/src/interfaces/ecpg/test/./tmp_check/install//home/pgbuild/pgbuildfarm/HEAD/inst/bin/postgres -D /
pgbuild 21483 0.0 0.7 6088  7296 ?? Is Wed04AM 0:00.68 postgres: checkpointer process (postgres)
pgbuild 12480 0.0 0.4 5952  4464 ?? Ss Wed04AM 0:06.88 postgres: writer process (postgres)
pgbuild  9841 0.0 0.5 5952  4936 ?? Ss Wed04AM 0:06.92 postgres: wal writer process (postgres)
pgbuild   623 0.1 0.6 7424  6288 ?? Ss Wed04AM 4:16.76 postgres: autovacuum launcher process (postgres)
pgbuild 30949 0.0 0.4 6280  3896 ?? Ss Wed04AM 0:40.94 postgres: stats collector process (postgres)

sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly panther has been showing similar postmaster-does-not-shut-down failures every so often, though IIRC always in the IsolationCheck step not ECPG. I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. But we don't use threading ... Still, panther is NetBSD so there may be some general BSD flavor to whatever's going on here. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 02:34 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly panther has been showing similar postmaster-does-not-shut-down failures every so often, though IIRC always in the IsolationCheck step not ECPG. brolga (Cygwin) has had similar sporadic failures on ecpg checks, so much so that, since I have been unable to debug it in the time I have available, for the time being I have disabled the ecpg checks. cheers andrew -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On Fri, Jun 22, 2012 at 2:16 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 2:16 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. You cannot block sigkill. Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 08:34 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: It has now happened at least twice that builds on spoonbill started to fail after it failed during ECPGcheck: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-06-19%2023%3A00%3A04 the first failure was: http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=spoonbill&dt=2012-05-24%2023%3A00%3A05 so in both cases the postmaster was not shutting down properly panther has been showing similar postmaster-does-not-shut-down failures every so often, though IIRC always in the IsolationCheck step not ECPG. hmm I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64 which has a new threading implementation compared to older OpenBSD versions. But we don't use threading ... Still, panther is NetBSD so there may be some general BSD flavor to whatever's going on here. yeah the threading reference was mostly because all backtraces contain references to threading libs and because the threading tests are the last ones done by the ECPG checks... Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Catalog/Metadata consistency during changeset extraction from wal
On Friday, June 22, 2012 03:22:03 PM Andres Freund wrote: On Thursday, June 21, 2012 05:40:08 PM Andres Freund wrote: On Thursday, June 21, 2012 03:56:54 PM Florian Pflug wrote: On Jun21, 2012, at 13:41 , Andres Freund wrote: 3b) Ensure that enough information in the catalog remains by fudging the xmin horizon. Then reassemble an appropriate snapshot to read the catalog as the tuple in question has seen it. The ComboCID machinery makes that quite a bit harder, I fear. If a tuple is updated multiple times by the same transaction, you cannot decide whether a tuple was visible in a certain snapshot unless you have access to the updating backend's ComboCID hash. That's a very good point. Not sure how I forgot that. I think it might be possible to reconstruct a sensible combocid mapping from the walstream. Let me think about it for a while... I have a very, very preliminary thing which seems to work somewhat. I just log (cmin, cmax) additionally for every modified catalog tuple into the wal and so far that seems to be enough. Do you happen to have suggestions for other problematic things to look into before I put more time into it? I'm continuing to play around with this. The tricky bit so far is subtransaction handling in transactions which modify the catalog (+ possibly tables which are marked as being required for decoding, like a pg_enum equivalent). Would somebody fundamentally object to one of the following things:

1. replace

    #define IsMVCCSnapshot(snapshot) \
        ((snapshot)->satisfies == HeapTupleSatisfiesMVCC)

with something like

    #define IsMVCCSnapshot(snapshot) \
        ((snapshot)->satisfies == HeapTupleSatisfiesMVCC || \
         (snapshot)->satisfies == HeapTupleSatisfiesMVCCDuringDecode)

The define is only used sparingly and none of the code paths look so hot that this could make a difference.

2. Set SnapshotNowData.satisfies to HeapTupleSatisfiesNowDuringRecovery while reading the catalog for decoding.

It's possible to go on without both but the faking up of data gets quite a bit more complex. The problem making replacement of SnapshotNow.satisfies useful is that there is no convenient way to represent subtransactions of the current transaction which already have committed according to the TransactionLog but aren't yet visible at the current lsn because they only started afterwards. It's relatively easy to fake this in an mvcc snapshot but way harder for SnapshotNow because you cannot mark transactions as in-progress. Thanks, Andres -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/22/2012 08:34 PM, Tom Lane wrote: Still, panther is NetBSD so there may be some general BSD flavor to whatever's going on here. yeah the threading reference was mostly because all backtraces contain references to threading libs and because the threading tests are the last ones done by the ECPG changes... It is weird that this seems to be happening only in the context of the ecpg and isolation tests, because it's not clear why client-side activity would have anything to do with it. Your gdb investigations confirm that all the actual client-serving backends are gone, and the postmaster knows it. But the background service processes haven't shut down. AFAICS the postmaster could not have reached PM_WAIT_BACKENDS state without signaling them, so why aren't they shutting down, and why does it matter which set of tests we'd been running? My first thought about it was that maybe a signal got missed, but it's hard to credit that bgwriter, walwriter, and autovac would all have missed signals concurrently. (checkpointer and stats collector don't get signaled yet, so it's not surprising those are still around.) I wonder whether signal_child() could have failed? It logs about such failures, but only at debug3 which seems overly taciturn. I wonder if we should crank that up to LOG level. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. You cannot block sigkill. sigterm is at issue, not sigkill. But I don't care for the signals-blocked theory either, at least not in three different children at the same time. (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Event Triggers reduced, v1
Robert Haas robertmh...@gmail.com writes: It's not before/after anymore, but rather addon/replace if you will. I kept the INSTEAD OF keyword for the replace semantics, that you've been asking me to keep IIRC, with security policy plugins as a use case. Now we can of course keep those semantics and embed them in the event name we provide users, I thought that maybe a documentation matrix of which events support which mode would be cleaner to document. We might as well find a clean way to implement both modes for most of the commands, I don't know yet. So, are you sure you want to embed that part of the event trigger semantics in the event name itself? Yeah, pretty sure. I think that for regular triggers, BEFORE, AFTER, and INSTEAD-OF are the firing-point specification. But event triggers will have more than three firing points, probably eventually quite a lot more. So we need something more flexible. But we don't need that more flexible thing AND ALSO the before/after/instead-of specification, which I think in most cases won't be meaningful anyway. It happens to be somewhat sensible for this initial firing point, but I think for most of them there will be just one place, and in many cases it will be neither before, nor after, nor instead-of. I agree with using the event name as the specification for the firing point, and that we should prefer documenting the ordering of those rather than offering a fuzzy idea of BEFORE and AFTER steps in there. The AFTER step is better expressed as BEFORE the next one. Now, I still think there's an important discrepancy between adding a new behaviour that adds up to whatever the backend currently implements and providing a replacement behaviour with a user defined function that gets called instead of the backend code. And I still don't think that the event name should be carrying alone that semantic discrepancy. Now, I also want the patch to get in, so I won't insist very much if I'm alone in that position. 
Anyone else interested enough to chime in? The user visible difference would be between those variants: create event trigger foo at 'before_security_check' ... create event trigger foo at 'replace_security_check' ... create event trigger foo before 'security_check' ... create event trigger foo instead of 'security_check' ... Note that in this version the INSTEAD OF variant is not supported, we only intend to offer it in some very narrow cases, or at least that is my understanding. The issue is that the size of the parser tables grow with the square of the number of states. This will introduce lots of new states that we don't really need; and every new kind of event trigger that we want to add will introduce more. It's a little sad not being able to reuse command tag keywords, but it's even more sad to impact the rest of the query parsing. IIRC you had some performance test patch with a split of the main parser into queries and dml on the one hand, and utility commands on the other hand. Would that help here? (I mean more as a general solution against that bloat problem than for this very patch here). I prefer the solution of using 'ALTER TABLE' rather than ALTER TABLE, even if code wise we're not gaining anything in complexity: the parser bloat gets replaced by a big series of if branches. Of course you only exercise it when you need to. I will change that for next patch. 3. The event trigger cache seems to be a few bricks shy of a load. Well, AFAICS, you're still doing full rebuilds whenever something changes; you're just keeping the (useless, dead) cache around until you decide to rebuild it. Might as well free the memory once you know that the next access will rebuild it anyway, and for a bonus it saves you a flag. I'm just done rewriting the cache management with a catalog cache for event triggers and a Syscache Callback that calls into a new module called src/backend/utils/cache/evtcache.c that mimics attoptcache.c. No more cache stale variable. 
And a proper composite hash key. I still have some more work here before being able to send a new release of the patch, as I said I won't have enough time to make that happen until within next week. The git repository is updated, though. https://github.com/dimitri/postgres/tree/evt_trig_v1 https://github.com/dimitri/postgres/compare/913091de51...861eb038d0 Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_prewarm
Robert Haas robertmh...@gmail.com writes: 73%? I think it's got about 15% overlap. 83.7% of stats are wrong. This one included. Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 09:39 PM, Tom Lane wrote: Andres Freund and...@2ndquadrant.com writes: On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen. You cannot block sigkill. sigterm is at issue, not sigkill. But I don't care for the signals-blocked theory either, at least not in three different children at the same time. (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) with help from RhodiumToad on IRC:

# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 12480
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
12480 20004004 34084005 c942b002 fffefeff postgres: writer process (postgres)
# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 9841
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process (postgres)

Stefan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] pg_dump and dependencies and --section ... it's a mess
I wrote: I believe the right fix for both of these issues is to add knowledge of the section concept to the topological sort logic, so that an ordering that puts POST_DATA before DATA or PRE_DATA after DATA is considered to be a dependency-ordering violation. One way to do that is to add dummy fencepost objects to the sort, representing the start and end of the DATA section. However, these objects would need explicit dependency links to every other DumpableObject, so that doesn't sound very good from a performance standpoint. What I'm going to go look at is whether we can mark DumpableObjects with their SECTION codes at creation time (rather than adding that information at ArchiveEntry() time) and then have the topo sort logic take that marking into account in addition to the explicit dependency links. I gave up on putting any fancy hacks into the topological sort code; it would have made it extremely ugly, and besides every solution I could think of wanted to have at least one extra auxiliary array with an entry per DumpableObject. Which would have eaten about as much space as the extra dependency links. Furthermore, it turns out that the post-data dependencies need to be changeable, since rules and constraints should only be forced to be post-data *if* we've decided to dump them separately from their parent tables/views. (My original try at this ended up forcing every rule constraint to be dumped separately, which is not what we want.) Putting knowledge of that into the core topological sort code seemed right out. So the attached draft patch does it the straightforward way, actually creating two dummy boundary objects and setting up explicit dependency links with them. I did some simple performance tests and found that this adds a measurable but pretty negligible cost to pg_dump's runtime. 
For instance, dumping several thousand empty tables went from 9.34 to 9.57 seconds of pg_dump CPU time, compared to multiple minutes of CPU time spent on the backend side (even with the recent lockmanager fixes). So I no longer feel any strong need to optimize the code. A disadvantage of representing the dependencies explicitly is that the ones attached to DATA and POST_DATA objects show up in the output archive. I'm not particularly worried about this so far as HEAD and 9.2 are concerned, because the other patch to fix emitted dependencies will make them go away again. But as I mentioned, I'm not big on back-patching that one into 9.1. We could hack something simpler to directly suppress dependencies on the boundary objects only, or we could just write it off as not mattering much. I'd barely have noticed it except I was testing whether I got an exact match to the archive produced by an unpatched pg_dump (in cases not involving the view-vs-constraint bug). Anyway, the attached patch does seem to fix the constraint bug. A possible objection to it is that there are now three different ways in which the pg_dump code knows which DO_XXX object types go in which dump section: the new addBoundaryDependencies() function knows this, the SECTION_xxx arguments to ArchiveEntry calls know it, and the sort ordering constants in pg_dump_sort.c have to agree too. My original idea was to add an explicit section field to DumpableObject to reduce the number of places that know this, but that would increase pg_dump's memory consumption still more, and yet still not give us a single point of knowledge. Has anybody got a better idea? Barring objections or better ideas, I'll push forward with applying this patch and the dependency-fixup patch. 
regards, tom lane

diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 5fde18921ac3c20f878fbe5ad22c60fabc13a916..71cc3416bb9fd9dcf482c7e8c3a99d01acd7f238 100644
*** a/src/bin/pg_dump/pg_dump.c
--- b/src/bin/pg_dump/pg_dump.c
*** static void dumpACL(Archive *fout, Catal
*** 210,215
--- 210,220
  			const char *acls);
  static void getDependencies(Archive *fout);
+
+ static DumpableObject *createBoundaryObjects(void);
+ static void addBoundaryDependencies(DumpableObject **dobjs, int numObjs,
+ 					DumpableObject *boundaryObjs);
+
  static void getDomainConstraints(Archive *fout, TypeInfo *tyinfo);
  static void getTableData(TableInfo *tblinfo, int numTables, bool oids);
  static void makeTableDataInfo(TableInfo *tbinfo, bool oids);
*** main(int argc, char **argv)
*** 270,275
--- 275,281
  	int			numTables;
  	DumpableObject **dobjs;
  	int			numObjs;
+ 	DumpableObject *boundaryObjs;
  	int			i;
  	enum trivalue prompt_password = TRI_DEFAULT;
  	int			compressLevel = -1;
*** main(int argc, char **argv)
*** 691,696
--- 697,713
  	 */
  	getDependencies(fout);
+ 	/* Lastly, create dummy objects to represent the section boundaries */
+ 	boundaryObjs = createBoundaryObjects();
+
+ 	/* Get pointers to all the known DumpableObjects */
+
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/22/2012 09:39 PM, Tom Lane wrote: (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) with help from RhodiumToad on IRC: # ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 12480 PID PENDING CAUGHT IGNORED BLOCKED COMMAND 12480 20004004 34084005 c942b002 fffefeff postgres: writer process (postgres) # ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 9841 PID PENDING CAUGHT IGNORED BLOCKED COMMAND 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process (postgres) Well, the nonzero PENDING masks sure look like a smoking gun, but why are there multiple pending signals? And I'm not sure I know OpenBSD's signal numbers by heart. Could you convert those masks into text signal name lists for us? regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On 06/22/2012 11:02 PM, Tom Lane wrote: Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes: On 06/22/2012 09:39 PM, Tom Lane wrote: (Hey Stefan, is there a way on BSD to check a process's signals-blocked state from outside? If so, next time this happens you should try to determine the children's signal state.) with help from RhodiumToad on IRC:

# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 12480
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
12480 20004004 34084005 c942b002 fffefeff postgres: writer process    (postgres)
# ps -o pid,sig,sigcatch,sigignore,sigmask,command -p 9841
  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process    (postgres)

Well, the nonzero PENDING masks sure look like a smoking gun, but why are there multiple pending signals? And I'm not sure I know OpenBSD's signal numbers by heart. Could you convert those masks into text signal name lists for us?

this seems to be SIGUSR1, SIGTERM and SIGQUIT

Stefan
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:

  PID  PENDING   CAUGHT  IGNORED  BLOCKED COMMAND
12480 20004004 34084005 c942b002 fffefeff postgres: writer process
 9841 20004004 34084007 c942b000 fffefeff postgres: wal writer process

this seems to be SIGUSR1, SIGTERM and SIGQUIT

OK, I looked up OpenBSD's signal numbers on the web. It looks to me like these two processes have everything blocked except KILL and STOP (which are unblockable of course). I do not see any place in the PG code that could possibly set such a mask (note that BlockSig should have more holes in it than that). So I'm thinking these must be blocked inside some system function that's installed a restrictive signal mask, or some such function forgot to restore the mask on exit. Could you gdb each of these processes and get a stack trace?

regards, tom lane
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
oh, and just for comparison's sake, what do the postmaster's signal masks look like? regards, tom lane
Re: [HACKERS] random failing builds on spoonbill - backends not exiting...
On Fri, Jun 22, 2012 at 2:57 PM, Andres Freund and...@2ndquadrant.com wrote: On Friday, June 22, 2012 08:51:55 PM Robert Haas wrote: On Fri, Jun 22, 2012 at 2:16 PM, Stefan Kaltenbrunner ste...@kaltenbrunner.cc wrote: sending a manual kill -15 to either of them does not seem to make them exit either... I did some further investigations with Robert on IM but I don't think he has any further ideas other than that I have a weird OS :) It seems worth noticing that this is OpenBSD 5.1 on Sparc64, which has a new threading implementation compared to older OpenBSD versions.

I remarked to Stefan that the symptoms seem consistent with the idea that the children have signals blocked. But I don't know how that could happen.

You cannot block SIGKILL.

Obviously. The issue is: the postmaster apparently sent SIGTERM (15) to all of these children, and yet they're not dead; and a manual SIGTERM doesn't kill them either. I'm sure SIGKILL (9) would do the trick, but then it's not a clean shutdown.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COMMUTATOR doesn't seem to work
On Fri, Jun 22, 2012 at 12:28 PM, D'Arcy Cain da...@druid.net wrote: I doubt that an auto reverse the arguments facility would be very much cheaper. You could maybe argue that the aggregated maintenance and space costs of all the commutator-pair functions are enough to justify having some such solution instead, but I'm doubtful --- and even if true, getting from here to there would be painful. And it would only apply to a very specific type of function.

The other idea I had was to just have the second C function call the first but that didn't work. Here is what I tried.

PG_FUNCTION_INFO_V1(chkpass_eq);
Datum
chkpass_eq(PG_FUNCTION_ARGS)
{
    chkpass    *a1 = (chkpass *) PG_GETARG_POINTER(0);
    text       *a2 = (text *) PG_GETARG_TEXT_P(1);
    char        str[9];

    strlcpy(str, a2->vl_dat, sizeof(str));
    PG_RETURN_BOOL(strcmp(a1->password, crypt(str, a1->password)) == 0);
}

PG_FUNCTION_INFO_V1(chkpass_eq2);
Datum
chkpass_eq2(PG_FUNCTION_ARGS)
{
    return chkpass_eq(PG_GETARG_POINTER(1), PG_GETARG_TEXT_P(0));
}

Now in this specific case the function is trivial and writing it twice is no big deal but in general I hate writing the same code twice. I suppose I could extract the actual operation out to a third function and call it from the others. I may do that anyway just for the value of the example. Or is there a way to do what I tried above?

I think DirectionFunctionCall2 is what you want.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] pg_prewarm
The biggest problem with pgfincore from my point of view is that it only works under Linux, whereas I use a MacOS X machine for my development, and there is also Windows to think about. Even if that were fixed, though, I feel we ought to have something in the core distribution. This patch got more +1s than 95% of what gets proposed on hackers. Fincore is only a blocker to this patch if we think pgfincore is ready to be proposed for the core distribution. Do we? -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
[HACKERS] A good illustration of why we need user-friendly system views
http://pgolub.wordpress.com/2012/06/22/backward-compatibility-never-heard-of-it If we had stable system views for all database objects (stable as in we just append to them), then refactoring our system tables wouldn't break things for our users. Just sayin'. (and don't tell me about information_schema, which is fairly useless for anything except tables and columns) -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
Re: [HACKERS] initdb and fsync
On Fri, Jun 22, 2012 at 10:04:23AM -0400, Robert Haas wrote: On Sat, Feb 4, 2012 at 8:18 PM, Noah Misch n...@leadboat.com wrote: If we add fsync calls to the initdb process, they should cover the entire data directory tree. This patch syncs files that initdb.c writes, but we ought to also sync files that bootstrap-mode backends had written. An optimization like the pg_flush_data() call in copy_file() may reduce the speed penalty. initdb should do these syncs by default and offer an option to disable them.

This may be a stupid question, but why is it initdb's job to fsync the files the server creates, rather than the server's job? Normally we rely on the server to make its own writes persistent.

Modularity would dictate having the server fsync its own work product, but I expect that approach to perform materially worse. initdb runs many single-user server instances, and each would fsync independently. When N initdb steps change one file, it would see N fsyncs. Using sync_file_range to queue all writes is best for the typical interactive or quasi-interactive initdb user. It's not always a win for server fsyncs, so we would need to either define special cases such that it's used during initdb or forgo the optimization. On the other hand, the server could skip some files, like fsm forks, in a principled manner. Overall, I think it will be hard to improve modularity while retaining the performance Jeff's approach achieves through exploiting initdb's big-picture perspective. So I favor how Jeff has implemented it.

nm
Re: [HACKERS] A good illustration of why we need user-friendly system views
On Fri, Jun 22, 2012 at 9:30 PM, Josh Berkus j...@agliodbs.com wrote: http://pgolub.wordpress.com/2012/06/22/backward-compatibility-never-heard-of-it If we had stable system views for all database objects (stable as in we just append to them), then refactoring our system tables wouldn't break things for our users. Just sayin'.

This has been discussed before, and I'm still not buying it. I mean, suppose you wrote code that depended on anything stated in a constraint always being true. Then we added deferrable constraints. Oops. But would you rather NOT have that feature? Appending columns doesn't help in that case. Or suppose you wrote code that depended on pg_stat_user_functions.total_time being an integer. Well, we could append a new column with a different datatype, but now you've got two columns with the same information, which is a confusing mess.

I still remember the first time my application code got broken by a system catalog change. Some moron added pg_attribute.attisdropped, and boy was I annoyed. However, in between my annoyance, I realized that (1) adapting my code wasn't really going to be that hard and (2) being able to drop columns was a pretty good feature. Granted, a compatibility view would have worked in this case, but only if I'd been using the compatibility view rather than the underlying table, and I am not sure I would have been that smart.

The compatibility breaks that really bother me are the ones that affect a lot of people: standard_conforming_strings, 8.3's implicit casting changes, and Tom's PL/pgSQL lexer stuff that made a bunch of things no longer usable as unquoted variable names. That stuff breaks application code, sometimes quite a lot of it. System catalog changes have a pretty small impact by comparison, although of course (as in this case) it's not perfect.
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] COMMUTATOR doesn't seem to work
On 12-06-22 07:09 PM, Robert Haas wrote: I think DirectionFunctionCall2 is what you want. Can you elaborate? I could not find a single hit in Google or the documentation search on the PG site and it does not appear anywhere in the source distribution. -- D'Arcy J.M. Cain da...@druid.net | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. IM: da...@vex.net
Re: [HACKERS] COMMUTATOR doesn't seem to work
D'Arcy Cain da...@druid.net writes: On 12-06-22 07:09 PM, Robert Haas wrote: I think DirectionFunctionCall2 is what you want. Can you elaborate? I could not find a single hit in Google or the documentation search on the PG site and it does not appear anywhere in the source distribution. He meant DirectFunctionCall2 regards, tom lane
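In the chkpass case above, the commutator wrapper built on DirectFunctionCall2 would swap the two argument Datums and delegate through fmgr. A sketch, reusing the chkpass_eq name from D'Arcy's earlier message (untested against the real chkpass module):

```c
/* Sketch only: swap the arguments and re-enter chkpass_eq through fmgr
 * via DirectFunctionCall2, so its PG_GETARG_* macros see a properly
 * constructed call.  Calling chkpass_eq() directly with the raw values,
 * as tried earlier in the thread, fails because a V1 function takes a
 * FunctionCallInfo, not its arguments. */
PG_FUNCTION_INFO_V1(chkpass_eq2);
Datum
chkpass_eq2(PG_FUNCTION_ARGS)
{
    return DirectFunctionCall2(chkpass_eq,
                               PG_GETARG_DATUM(1),
                               PG_GETARG_DATUM(0));
}
```

One caveat: DirectFunctionCall* passes no flinfo, so functions that rely on fn_extra or collation state cannot be called this way; a simple comparison like this one is fine.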
Re: [HACKERS] A good illustration of why we need user-friendly system views
On Fri, Jun 22, 2012 at 9:05 PM, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 22, 2012 at 9:30 PM, Josh Berkus j...@agliodbs.com wrote: http://pgolub.wordpress.com/2012/06/22/backward-compatibility-never-heard-of-it If we had stable system views for all database objects (stable as in we just append to them), then refactoring our system tables wouldn't break things for our users. Just sayin'. This has been discussed before, and I'm still not buying it. I mean, suppose you wrote code that depended on anything stated in a constraint always being true. Then we added deferrable constraints. Oops. But would you rather NOT have that feature? Appending columns doesn't help in that case. Also, the incompatibility in the case of tablespaces was a good one... I saw cases where the link was manually moved to another place... and don't ask, I don't know why they did this... so something reading spclocation would have been misled. -- Jaime Casanova www.2ndQuadrant.com Professional PostgreSQL: Soporte 24x7 y capacitación