Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Monday, June 18, 2012 11:51:27 PM Daniel Farina wrote:
 What's the cost of going a lot higher?  Because if one makes enough
 numerical space available, one can assign node identities without a
 coordinator, a massive decrease in complexity.

 It would increase the size of every wal record. We just have 16bit left there
 by chance...

Every WAL record?  Why in heck would you attach it to every record?
Surely putting it in WAL page headers would be sufficient.  We could
easily afford to burn a page switch (if not a whole segment switch)
when changing masters.

I'm against the idea of eating any spare space we have in WAL record
headers for this purpose, anyway; there are likely to be more pressing
needs in future.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Resource Owner reassign Locks

2012-06-19 Thread Amit Kapila
 And it doesn't seem right for ResourceOwnerRemember/ForgetLock to have to
accept a NULL owner.
  I am not sure if it can ever enter this flow without a resowner, as
mentioned in Jeff's comments for session-level locks. If it cannot, then it
is okay.

 Please take a look to see if I broke something.
  In your previous mail you agreed with ERROR as the level for the elog
message in ResourceOwnerForgetLock(); but in your modification you have used
PANIC. Is there any specific reason for it?

With Regards,
Amit Kapila.




Re: [HACKERS] Resource Owner reassign Locks

2012-06-19 Thread Heikki Linnakangas

On 19.06.2012 09:02, Amit Kapila wrote:

Please take a look to see if I broke something.

   In your previous mail you agreed with ERROR as the level for the elog
message in ResourceOwnerForgetLock(); but in your modification you have used
PANIC. Is there any specific reason for it?


Ah, sorry, that was just a debugging aid. Before posting the patch, I 
had a bug (now fixed) where I got that error, and I changed it 
temporarily to PANIC to get a core dump, but forgot to change it back 
before posting the patch. It should indeed be ERROR, not PANIC.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 There might be something to the idea of demoting a few of the things
 we've traditionally had as NOTICEs, though.  IME, the following two
 messages account for a huge percentage of the chatter:

 NOTICE:  CREATE TABLE will create implicit sequence foo_a_seq for
 serial column foo.a
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 foo_pkey for table foo

Personally, I'd have no problem with flat-out dropping (not demoting)
both of those two specific messages.  I seem to recall that Bruce has
lobbied for them heavily in the past, though.

regards, tom lane



Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Peter Eisentraut
On mån, 2012-06-18 at 17:57 -0400, James Cloos wrote:
  JB == Josh Berkus j...@agliodbs.com writes:
 
 JB Can you check the collations of the two databases?  I'm wondering if 9.1
 JB is in C collation and 9.2 is something else.
 
 Thanks!
 
 pg_dump -C tells me these two differences:
 
  -SET client_encoding = 'SQL_ASCII';
  +SET client_encoding = 'UTF8';
 
  -CREATE DATABASE dbm WITH TEMPLATE = template0 ENCODING = 'SQL_ASCII' 
 LC_COLLATE = 'C' LC_CTYPE = 'C';
  +CREATE DATABASE dbm WITH TEMPLATE = template0 ENCODING = 'UTF8' LC_COLLATE 
 = 'C' LC_CTYPE = 'en_US.UTF-8';
 
 for every db in the clusters.
 
 I presume that lc_ctype is the significant difference?

It certainly makes some difference, but it's a bit shocking that it makes
things that much slower.

 LC_CTYPE *is* specified as 'C' in the dump from which I created the 9.2
 cluster, so it must have been overridden by pg_restore.  I see that my
 dist's /etc rc script now sets LC_CTYPE.  Would that explain why lc_ctype
 changed between the two clusters?

It's possible, depending on how exactly the start up script maze is set
up on your particular OS.

 Is there any way to alter a db's lc_ctype w/o dumping and restoring?  I
 want to preserve some of the changes made since I copied the 9.1 cluster.
 Alter database reports that lc_ctype cannot be changed.

Not really, but in practice you can probably just update pg_database
directly.  If you don't have any case-insensitive indexes, nothing
should change.  Worst case, reindex everything.





Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2012-06-18 at 17:57 -0400, James Cloos wrote:
 I presume that lc_ctype is the significant difference?

 It certainly makes some difference, but it's a bit shocking that makes
 things that much slower.

If James is testing text-comparison-heavy operations, it doesn't seem
shocking in the least.  strcoll() in most non-C locales is a pig.

 LC_CTYPE *is* specified as 'C' in the dump from which I created the 9.2
 cluster, so it must have been overridden by pg_restore.  I see that my
 dist's /etc rc script now sets LC_CTYPE.  Would that explain why lc_ctype
 changed between the two clusters?

 It's possible, depending on how exactly the start up script maze is set
 up on your particular OS.

pg_dumpall should generate a script that correctly restores database
locales.  However, pg_dump+pg_restore is dependent on user creation
of the specific database, which is likely to be environment sensitive.
We really oughta do something about that ...

regards, tom lane



Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Martijn van Oosterhout
On Mon, Jun 18, 2012 at 09:30:14PM -0400, Robert Haas wrote:
 There might be something to the idea of demoting a few of the things
 we've traditionally had as NOTICEs, though.  IME, the following two
 messages account for a huge percentage of the chatter:
 
 NOTICE:  CREATE TABLE will create implicit sequence foo_a_seq for
 serial column foo.a
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 foo_pkey for table foo

+1

Absolutely. And if you also suppress the output of the setvals
produced by pg_dump, that would make a successful restore of a dump
produce barely any output at all with -q.  That would make errors
significantly more visible.

Not sure how to go about that though.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer




Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Peter Eisentraut
On tis, 2012-06-19 at 02:38 -0400, Tom Lane wrote:
 Peter Eisentraut pete...@gmx.net writes:
  On mån, 2012-06-18 at 17:57 -0400, James Cloos wrote:
  I presume that lc_ctype is the significant difference?
 
  It certainly makes some difference, but it's a bit shocking that it
  makes things that much slower.
 
 If James is testing text-comparison-heavy operations, it doesn't seem
 shocking in the least.  strcoll() in most non-C locales is a pig. 

Ah yes, of course, having lc_ctype != C also selects strcoll instead of
strcmp.




Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 12:47:54 AM Christopher Browne wrote:
 On Mon, Jun 18, 2012 at 11:50 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  Hi Simon,
  
  On Monday, June 18, 2012 05:35:40 PM Simon Riggs wrote:
  On 13 June 2012 19:28, Andres Freund and...@2ndquadrant.com wrote:
   This adds a new configuration parameter multimaster_node_id which
   determines the id used for wal originating in one cluster.
  
  Looks good and it seems this aspect at least is commitable in this CF.
  
  I think we need to agree on the parameter name. It currently is
  'multimaster_node_id'. In the discussion with Steve we got to
  replication_node_id. I don't particularly like either.
  
  Other suggestions?
 I wonder if it should be origin_node_id?  That is the term Slony uses.
I think in the Slony context it's clear that that's related to replication. Less 
so in a general Postgres GUC. So maybe replication_origin_node_id?

  * Size of field. 16 bits is enough for 32,000 master nodes, which is
  quite a lot. Do we need that many? I think we may have need for a few
  flag bits, so I'd like to reserve at least 4 bits for flag bits, maybe
  8 bits. Even if we don't need them in this release, I'd like to have
  them. If they remain unused after a few releases, we may choose to
  redeploy some of them as additional nodeids in future. I don't foresee
  complaints that 256 master nodes is too few anytime soon, so we can
  defer that decision.
  
  I have wished we had some flag bits available before as well. I find 256
  nodes a pretty low value to start with, though; 4096 sounds better, so I
  would be happy with 4 flag bits. I think for cascading setups and such
  you want to add node ids for every node, not only masters...
 Even though the number of nodes that can reasonably participate in
 replication is likely to be not too terribly large, it might be good
 to allow larger values, in case someone is keen on encoding something
 descriptive in the node number.
Well, having a large number space makes the wal records bigger, which is not 
something I can see us getting away with. People have gone to considerable 
lengths to avoid that.
Perhaps we can have a mapping system catalog at some point that attaches 
additional information to each node id, like a name and where it is at wrt 
replication...

 I recall the Slony-II project having a notion of attaching a permanent
 UUID-based node ID to each node.  As long as there is somewhere decent
 to find a symbolically significant node name, I like the idea of the
 ID *not* being in a tiny range, and being UUID/OID-like...
I think adding 14 bytes (16 bytes of a UUID - 2 bytes available) is out of the 
question...

  Any opinions from others on this?
  
  * Do we want origin_id as a parameter or as a setting in pgcontrol?
  IIRC we go to a lot of trouble elsewhere to avoid problems with
  changing on/off parameter values. I think we need some discussion to
  validate where that should live.
  
  Hm. I don't really foresee any need to have it in pg_control. What do you
  want to protect against with that?
  It would need to be changeable anyway, because otherwise it would need to
  become a parameter for initdb which would suck for anybody migrating to
  use replication at some point.
  
  Do you want to protect against problems in replication setups after
  changing the value?
 In Slony, changing the node ID is Not Something That Is Done.  The ID
 is captured in *way* too many places to be able to have any hope of
 updating it in a coordinated way.  I should be surprised if it wasn't
 similarly troublesome here.
If you update it you will need to reset consuming nodes, yes. IMO that's still 
something other than disallowing changing the parameter entirely. Requiring 
initdb for that seems like it would make experimentation too hard.

We need to allow at least changing the setting from no node id to an initial 
one.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] psql tab completion for GRANT role

2012-06-19 Thread Peter Eisentraut
On tor, 2012-06-14 at 13:38 -0400, Robert Haas wrote:
 On Sun, Jan 8, 2012 at 3:48 PM, Peter Eisentraut pete...@gmx.net wrote:
  psql tab completion currently only supports the form GRANT privilege ON
  something TO someone (and the analogous REVOKE), but not the form GRANT
  role TO someone.  Here is a patch that attempts to implement the latter.
 
 This seems to have fallen through the cracks.

No, it was committed in January.





Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
Hi,

On Tuesday, June 19, 2012 08:03:04 AM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Monday, June 18, 2012 11:51:27 PM Daniel Farina wrote:
  What's the cost of going a lot higher?  Because if one makes enough
  numerical space available, one can assign node identities without a
  coordinator, a massive decrease in complexity.
  
  It would increase the size of every wal record. We just have 16bit left
  there by chance...
 
 Every WAL record?  Why in heck would you attach it to every record?
 Surely putting it in WAL page headers would be sufficient.  We could
 easily afford to burn a page switch (if not a whole segment switch)
 when changing masters.
The idea is that you can have cascading, circular and whatever replication 
topologies if you include the logical origin of the action causing the wal 
in it.
That is, if you have nodes A(1) and B(2) and an insert happens on A, the wal 
records generated by that will get xl_origin_id = 1, and when they are 
decoded and replayed on B they will *also* get the id 1. Only when a change 
originally is generated on B will it get xl_origin_id = 2.
That way you can easily have circular or hierarchical replication topologies 
including diamonds.

 I'm against the idea of eating any spare space we have in WAL record
 headers for this purpose, anyway; there are likely to be more pressing
 needs in future.
Every other solution to allowing this seems to be far more complicated than 
this; that's why I arrived at the conclusion that it's a good idea.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 04:12:47 AM Steve Singer wrote:
 On 12-06-18 07:30 AM, Andres Freund wrote:
  Hrmpf #666. I will go through the series commit-by-commit again
  to make sure everything compiles again. Reordering this late definitely
  wasn't a good idea...
  
  I pushed a rebased version with all those fixups (and removal of the
  zeroRecPtr patch).
 
 Where did you push that rebased version to? I don't see an attachment,
 or an updated patch in the commitfest app and your repo at
 http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summary
 hasn't been updated in 5 days.
To the 2ndquadrant internal repo. Which strangely doesn't help you. 
*Headdesk*. Pushed to the correct repo and manually verified.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2012-06-19 Thread Kyotaro HORIGUCHI
Thank you.

 What happens if the server skips an end-of-recovery checkpoint,
 is promoted to the master, runs some write transactions,
 crashes and restarts automatically before it completes
 checkpoint? In this case, the server needs to do crash recovery
 from the last checkpoint record with old timeline ID to the
 latest WAL record with new timeline ID. How does crash recovery
 do recovery beyond timeline?

Basically the same as archive recovery as far as I saw. It is
already implemented to work in that way.

After this patch is applied, StartupXLOG() gets its
recoveryTargetTLI from the new field latestTLI in the control
file instead of from the latest checkpoint. The latest checkpoint
record still carries its TLI and WAL location as before, but its TLI
no longer has a significant meaning in the recovery sequence.

Consider the following case,

        | seg 1  | seg 2  |
TLI 1   |....C...|....P   |
TLI 2            |....P..X|

* C - checkpoint, P - promotion, X - crash just after here

This shows the situation where the latest checkpoint (restartpoint)
has been taken at TLI=1/SEG=1/OFF=4, promotion happened at
TLI=1/SEG=2/OFF=5, and the crash occurred just after TLI=2/SEG=2/OFF=8.
Promotion itself inserts no wal records but creates a copy of the
current segment for the new TLI. The file for TLI=2/SEG=1 should not
exist. (Who would create it?)

The control file will look as follows

latest checkpoint : TLI=1/SEG=1/OFF=4
latest TLI: 2

So the crash recovery sequence starts from SEG=1/LOC=4.
expectedTLIs will be (2, 1) so 1 will naturally be selected as
the TLI for SEG1 and 2 for SEG2 in XLogFileReadAnyTLI().

In a closer view, startup constructs expectedTLIs by reading the
timeline history file corresponding to the recoveryTargetTLI, then
runs the recovery sequence from the redo point of the latest
checkpoint using WALs with the largest TLI - which is
distinguished by the file name, not the header - within the
expectedTLIs in XLogPageRead(). The only difference from archive
recovery is that XLogFileReadAnyTLI() reads only the WAL files that
already sit in the pg_xlog directory, and does not reach the archive.
The pages with the new TLI will naturally be picked up in this
sequence as mentioned above, and recovery will then stop at the last
readable record.

The latestTLI field in the control file is updated just after the TLI
is incremented and the new WAL file with the new TLI is created. So
the crash recovery sequence won't stop before reaching the WAL with
the new TLI designated in the control file.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.



Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Fabien COELHO



There might be something to the idea of demoting a few of the things
we've traditionally had as NOTICEs, though.  IME, the following two
messages account for a huge percentage of the chatter:

NOTICE:  CREATE TABLE will create implicit sequence foo_a_seq for
serial column foo.a
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
foo_pkey for table foo


You can also add:

NOTICE:  CREATE TABLE / UNIQUE will create implicit index foo_name_key 
for table foo


I agree that these account for most of the noise.

As CREATE TABLE does create other objects, I could understand that someone 
wants to hear about that. Maybe demote them to INFO? Otherwise, changing 
the default message level seems reasonable to me. What we really care about 
when loading an SQL script is WARNING & ERROR, so that should be what is 
activated.



I'm not going to claim that nobody in the history of the world has
ever benefited from those notices ... but I would be willing to bet
that a large majority of the people, in a large majority of the cases,
do not care.  And getting rid of them would surely make warnings and
notices that might actually be of interest to the user a lot more
visible.


--
Fabien.



Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Dave Page
On Tue, Jun 19, 2012 at 4:43 AM, Robert Haas robertmh...@gmail.com wrote:


 Please add this patch here so that it doesn't get lost in the shuffle:

 https://commitfest.postgresql.org/action/commitfest_view/open

Hmm, that raises an interesting question (though maybe I've just
missed this happening in the past). This is a bug fix that we'll want
in the next back-branch updates, not just 9.3. Sticking it in the open
commit fest means it may well not get looked at for 2-3 months or so.
How do we prevent that happening?

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-19 Thread Boszormenyi Zoltan

On 2012-06-18 19:46, Alvaro Herrera wrote:

Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012:

Hi,

another rebasing and applied the GIT changes in
ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the
Windows implementation of PGSemaphoreTimedLock.

Hi,

I gave the framework patch a look.  One thing I'm not sure about is the
way you've defined the API.  It seems a bit strange to have a nice and
clean, generic interface in timeout.h; and then have the internal
implementation of the API cluttered with details such as what to do when
the deadlock timeout expires.  Wouldn't it be much better if we could
keep the details of CheckDeadLock itself within proc.c, for example?


Do you mean adding a callback function argument to enable_timeout()
would be better?


Same for the other specific Check*Timeout functions.  It seems to me
that the only timeout.c specific requirement is to be able to poke
base_timeouts[].indicator and fin_time.  Maybe that can get done in
timeout.c and then have the generic checker call the module-specific
checker.


Or instead of static functions, Check* functions can be external
to timeout.c. It seemed to be a good idea to move all the timeout
related functions to timeout.c.


   ... I see that you have things to do before and after setting
indicator.  Maybe you could split the module-specific Check functions
in two and have timeout.c call each in turn.  Other than that I don't
see that this should pose any difficulty.

Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe
more than once per signal if you have multiple timeouts enabled.


Actually, GetCurrentTimestamp() is not called multiple times,
because only the first timeout source in the queue can get triggered.


   Maybe
it'd be better to call it once in handle_sig_alarm and then pass the
value down, if necessary (AFAICS if you restructure the code as I
suggest above, you don't need to get the value down to the
module-specific code).


But yes, this way it can be cleaner.



As for the Init*Timeout() and Destroy*Timeout() functions, they seem
a bit pointless -- I mean if they just always call the generic
InitTimeout and DestroyTimeout functions, why not just define the struct
in a way that makes the specific functions unnecessary?  Maybe you even
already have all that's necessary ... I think you just change
base_timeouts[i].timeout_destroy(false); to DestroyTimeout(i, false) and
so on.


OK, I will experiment with it.


Minor nitpick: the Interface: comment in timeout.c seems useless.
That kind of thing tends to get overlooked and obsolete over time.  We
have examples of such things all over the place.  I'd just rip it and
have timeout.h be the interface documentation.


OK.


BTW it doesn't seem that datatype/timestamp.h is really necessary in
timeout.h.


You are wrong. get_timeout_start() returns TimestampTz and it's
defined in datatype/timestamp.h.

Thanks for the review, I will send the new code soon.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/




Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-19 Thread Amit Kapila
 I'm almost inclined to suggest that we not get next-LSN from WAL, but
 by scanning all the pages in the main data store and computing the max
 observed LSN.  This is clearly not very attractive from a performance
 standpoint, but it would avoid the obvious failure mode where you lost
 some recent WAL segments along with pg_control.

If we follow this approach, what should the handling be in case the next-LSN
is greater than the last checkpoint record location read from the WAL files?
Currently I can see StartupXLOG throw a PANIC error in such a situation.
I think this can happen in the case of missing some recent WAL segments.

With Regards,
Amit Kapila.




Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-19 Thread Etsuro Fujita
Hi,

 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Tuesday, June 19, 2012 3:12 AM
 To: Ants Aasma
 Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois
 Deliege
 Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
 needed

 On Fri, Jun 15, 2012 at 9:22 AM, Ants Aasma a...@cybertec.at wrote:
  Exactly. I think the first question for this patch should be whether
  this use-case is worth the complexity of the patch. I can't imagine
  any really compelling use cases that need an arbitrary distinct subset
  of results.

 Me neither.

  The original complaint on -performance [1], didn't specify a real
  world use case, but it seemed to be a case of an ORM generating
  suboptimal queries. On the other hand, the patch itself is in my
  opinion rather simple, so it might be worth it.

 Yeah.

  It has one outstanding issue, query_planner chooses the cheapest path
  based on total cost. This can be suboptimal when that path happens to
  have high startup cost. It seems to me that enabling the query_planner
  to find the cheapest unsorted path returning a limited amount of
  tuples would require some major surgery to the planner. To be clear,
  this is only a case of missed optimization, not a regression.

 I'm confused by this remark, because surely the query planner does it this
 way only if there's no LIMIT.  When there is a LIMIT, we choose based on
 the startup cost plus the estimated fraction of the total cost we expect
 to pay based on dividing the LIMIT by the overall row count estimate.  Or
 is this not what you're talking about?

I think that Ants is pointing at the way costs are estimated in
choose_hashed_grouping()/choose_hashed_distinct(), ie cost_agg() for
cheapest_path + hashagg, where the costs are calculated based only on the
total cost of cheapest_path.  I think that it might be good to do cost_agg()
for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
strategy.

  It won't help set returning functions because the tuplestore for those
  is fully materialized when the first row is fetched.

 That's surely a problem for another day.

OK

Best regards,
Etsuro Fujita






Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Alex Shulgin

Dave Page dp...@pgadmin.org writes:

 On Tue, Jun 19, 2012 at 4:43 AM, Robert Haas robertmh...@gmail.com wrote:


 Please add this patch here so that it doesn't get lost in the shuffle:

 https://commitfest.postgresql.org/action/commitfest_view/open

 Hmm, that raises an interesting question (though maybe I've just
 missed this happening in the past). This is a bug fix, that we'll want
 in the next back branch updates, not just 9.3. Sticking it in the open
 commit fest means it may well not get looked at for 2-3 months or so.
 How do we prevent that happening?

In a real bug-tracking system we would create a new bug/ticket and set
its target version to 'candidate for next minor release' or something
like that.  This way, if we don't release unless all targeted bugs are
resolved, this would be taken care of (hopefully).



Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Dave Page
On Tue, Jun 19, 2012 at 11:04 AM, Alex Shulgin a...@commandprompt.com wrote:

 Dave Page dp...@pgadmin.org writes:

 On Tue, Jun 19, 2012 at 4:43 AM, Robert Haas robertmh...@gmail.com wrote:


 Please add this patch here so that it doesn't get lost in the shuffle:

 https://commitfest.postgresql.org/action/commitfest_view/open

 Hmm, that raises an interesting question (though maybe I've just
 missed this happening in the past). This is a bug fix, that we'll want
 in the next back branch updates, not just 9.3. Sticking it in the open
 commit fest means it may well not get looked at for 2-3 months or so.
 How do we prevent that happening?

 In a real bug-tracking system we would create a new bug/ticket and set
 its target version to 'candidate for next minor release' or something
 like that.  This way, if we don't release unless all targeted bugs are
 resolved, this would be taken care of (hopefully).

Well yes, but the point is that that is not how the project works. I'm
asking how we handle this problem at the moment, because I realised
I haven't seen this happen in the past (largely because I haven't been
paying attention).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Alex Shulgin

Dave Page dp...@pgadmin.org writes:

 On Tue, Jun 19, 2012 at 11:04 AM, Alex Shulgin a...@commandprompt.com wrote:

 In a real bug-tracking system we would create a new bug/ticket and set
 its target version to 'candidate for next minor release' or something
 like that.  This way, if we don't release unless all targeted bugs are
 resolved, this would be taken care of (hopefully).

 Well yes, but the point is that that is not how the project works. I'm
 asking how we do handle this problem at the moment, because I realised
 I haven't seen this happen in the past (largely because I haven't been
 paying attention).

It only works as long as there is some mechanism to ensure that the bugs
which were not submitted to the commitfest are looked at.  If there is none,
then it doesn't actually work.

So is there a real and reliable mechanism for that?

--
Alex



Re: [HACKERS] return values of backend sub-main functions

2012-06-19 Thread Peter Eisentraut
On Wed, 2012-01-18 at 21:21 +0200, Peter Eisentraut wrote:
 On Sat, 2012-01-07 at 16:41 -0500, Tom Lane wrote:
  Peter Eisentraut pete...@gmx.net writes:
   I suggest that we change PostgresMain(), PostmasterMain(), BackendRun(),
   WalSenderMain(), and WalSndLoop() to return void as well.
  
  I agree this code is not very consistent or useful, but one question:
  what should the callers do if one of these functions *does* return?
 
 I was thinking of a two-pronged approach:  First, add
 __attribute__((noreturn)) to the functions.  This will cause a suitable
 compiler to verify on a source-code level that nothing actually returns
 from the function.  And second, at the call site, put an abort();  /*
 not reached */.  Together, this will make the code cleaner and more
 consistent, and will also help the compiler out a bit about the control
 flow.

Patch for 9.3 attached.

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c7d48e9..33c5a0a 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -173,7 +173,7 @@
 
 #ifdef EXEC_BACKEND
 	if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
-		exit(SubPostmasterMain(argc, argv));
+		SubPostmasterMain(argc, argv); /* does not return */
 #endif
 
 #ifdef WIN32
@@ -189,14 +189,13 @@
 
 	if (argc > 1 && strcmp(argv[1], "--boot") == 0)
 		AuxiliaryProcessMain(argc, argv);		/* does not return */
-
-	if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)
-		exit(GucInfoMain());
-
-	if (argc > 1 && strcmp(argv[1], "--single") == 0)
-		exit(PostgresMain(argc, argv, get_current_username(progname)));
-
-	exit(PostmasterMain(argc, argv));
+	else if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)
+		GucInfoMain();			/* does not return */
+	else if (argc > 1 && strcmp(argv[1], "--single") == 0)
+		PostgresMain(argc, argv, get_current_username(progname)); /* does not return */
+	else
+		PostmasterMain(argc, argv); /* does not return */
+	abort();		/* should not get here */
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eeea933..0f8af3a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -343,8 +343,8 @@ static void LogChildExit(int lev, const char *procname,
 			 int pid, int exitstatus);
 static void PostmasterStateMachine(void);
 static void BackendInitialize(Port *port);
-static int	BackendRun(Port *port);
-static void ExitPostmaster(int status);
+static void	BackendRun(Port *port) __attribute__((noreturn));
+static void ExitPostmaster(int status) __attribute__((noreturn));
 static int	ServerLoop(void);
 static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool SSLdone);
@@ -491,7 +491,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 /*
  * Postmaster main entry point
  */
-int
+void
 PostmasterMain(int argc, char *argv[])
 {
 	int			opt;
@@ -1125,7 +1125,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 	 */
 	ExitPostmaster(status != STATUS_OK);
 
-	return 0;	/* not reached */
+	abort();	/* not reached */
 }
 
 
@@ -3295,7 +3295,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 		BackendInitialize(port);
 
 		/* And run the backend */
-		proc_exit(BackendRun(port));
+		BackendRun(port);
 	}
 #endif   /* EXEC_BACKEND */
 
@@ -3539,7 +3539,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
  *		Shouldn't return at all.
  *		If PostgresMain() fails, return status.
  */
-static int
+static void
 BackendRun(Port *port)
 {
 	char	  **av;
@@ -3610,7 +3610,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 	 */
 	MemoryContextSwitchTo(TopMemoryContext);
 
-	return (PostgresMain(ac, av, port->user_name));
+	PostgresMain(ac, av, port->user_name);
 }
 
 
@@ -3960,7 +3960,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
  * have been inherited by fork() on Unix.  Remaining arguments go to the
  * subprocess FooMain() routine.
  */
-int
+void
 SubPostmasterMain(int argc, char *argv[])
 {
 	Port		port;
@@ -4195,7 +4195,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 		proc_exit(0);
 	}
 
-	return 1;	/* shouldn't get here */
+	abort();	/* shouldn't get here */
 }
 #endif   /* EXEC_BACKEND */
 
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 45a3b2e..a9a8689 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -121,7 +121,7 @@
 
 /* Prototypes for private functions */
 static bool HandleReplicationCommand(const char *cmd_string);
-static int	WalSndLoop(void);
+static void WalSndLoop(void) __attribute__((noreturn));
 static void InitWalSnd(void);
 static void WalSndHandshake(void);
 static void WalSndKill(int code, Datum arg);
@@ -136,7 +136,7 @@
 
 
 /* Main entry point for walsender process */
-int
+void
 WalSenderMain(void)
 {
 	MemoryContext 

Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Dickson S. Guedes
2012/6/18 Merlin Moncure mmonc...@gmail.com:
 I can't help but wonder (having been down the contrib/core/extension
 road myself) if it isn't better to improve the facilities to register
 and search for qualified extensions (like Perl CPAN) so that people
 looking for code to improve their backends can find it.  That way,
 you're free to release/do xyz/abandon your project as you see fit
 without having to go through -hackers.  This should also remove a lot
 of the stigma with not being in core since if stock postgres
 installations can access the necessary modules via CREATE EXTENSION, I
 think it will make it easier for projects like this to get used with
 the additional benefit of decentralizing project management.

What about PGXN?

BTW, I'm with Robert about we want to have at least one FDW in
core that actually talks to some other database server, rather than
just to a file, and it seems like pgsql_fdw is the obvious choice.

We have dblink as contrib, why not pgsql_fdw too?

Other FDWs could be available at PGXN, pgFoundry, Github, etc.

[]s
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br



[HACKERS] use of int4/int32 in C code

2012-06-19 Thread Peter Eisentraut
What is the latest theory on using int4 vs. int32 in C code?

(equivalently int2, int16)

I had the idea that using int4 was sort of deprecated, and most code
uses int32, but I've come across several uses of int4 lately that looked
odd to me.

I think the main reason that we define int4 in C is for the
src/include/catalog/ files, but that won't work anymore if we ever want
to have an int8 column there.

Ideas:

* Leave it be.

* Replace int4 by int32 everywhere except in the catalog files.  Hope it
stays that way.

* Mark int4 as deprecated, change catalog files to use int32, adjust the
bki generation scripts.  (I think removing int4 might not be wise, as
external modules might be using it.)

While we're at it, how do we feel about using C standard types like
int32_t instead of (or initially in addition to) our own definitions?
These are well established by now and would help modernize our code and
the code of extensions a bit.
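For reference, the correspondence in question can be sketched like this. This is illustrative only: c.h actually builds these typedefs from plain C types ("signed short", "signed int", with int64 picked by configure), but on all supported platforms they line up with the C99 fixed-width types, which is what makes a migration to int32_t conceivable at all.

```c
#include <stdint.h>

/* Sketch of the mapping between project-style typedefs and C99 types. */
typedef int16_t int16;
typedef int32_t int32;
typedef int64_t int64;
```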





Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Tue, 2012-06-19 at 02:38 -0400, Tom Lane wrote:
 If James is testing text-comparison-heavy operations, it doesn't seem
 shocking in the least.  strcoll() in most non-C locales is a pig. 

 Ah yes, of course, having lc_ctype != C also selects strcoll instead of
 strcmp.

Come to think of it, another possible factor is that LIKE can't use
ordinary indexes on text if the locale isn't C.

regards, tom lane



Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 What is the latest theory on using int4 vs. int32 in C code?
 (equivalently int2, int16)

I thought the general idea was to use int32 most places, but int4 in
catalog declarations.  I don't think it's tremendously important if
somebody uses the other though.

 While we're at it, how do we feel about using C standard types like
 int32_t instead of (or initially in addition to) our own definitions?

Can't get very excited about this either.  The most likely outcome of
a campaign to substitute the standard types is that back-patching would
become a truly painful activity.  IMO, anything that is going to result
in tens of thousands of diffs had better have a more-than-cosmetic
reason.  (That wouldn't apply if we only used int32_t in new code ...
but then, instead of two approved ways to do it, there would be three.
Which doesn't seem like it improves matters.)

regards, tom lane



Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Merlin Moncure
On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I can't help but wonder (having been down the contrib/core/extension
 road myself) if it isn't better to improve the facilities to register
 and search for qualified extensions (like Perl CPAN) so that people
 looking for code to improve their backends can find it.  That way,
 you're free to release/do xyz/abandon your project as you see fit
 without having to go through -hackers.  This should also remove a lot
 of the stigma with not being in core since if stock postgres
 installations can access the necessary modules via CREATE EXTENSION, I
 think it will make it easier for projects like this to get used with
 the additional benefit of decentralizing project management.

 Well, if you're the type that likes to install everything via RPMs
 (and I am) then you wouldn't want this behavior, especially not
 automagically.  It seems to open up a host of security risks, too: I
 believe Tom has previously stated that Red Hat (and other distro)
 packaging guidelines forbid packages from installing software in
 places where they can then turn around and run it.  I suppose CPAN
 must have some sort of exception to this policy, though, so maybe it's
 not ironclad.

Huh? CPAN is just one example -- PyPM for python, npm for node etc
etc.  There is some distinction to that rule that is being missed so
that it doesn't apply to cases like this -- probably that it is
transacted by the user and/or requires su.

I think your points are supporting mine: the vast majority of postgres
administrators deal only with o/s packaged rpms and therefore with the
status quo are unable to utilize any extension that is not packaged
with contrib.  This means that there are two possible outcomes if you
want to write an extension:

1) get accepted into core
2) never get used

Given that running the gauntlet to #1 is not a particularly attractive
option (or even possible) in many cases for various reasons it tends
to discourage module development by 3rd parties.  There are several
very high quality extensions that could get a lot more exposure...for
example pl/sh.

But anyways, if you're happy with pgsql_fdw (aside: i looked at it,
and it's great!) in core, then by all means...

merlin



Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-19 Thread Fujii Masao
On Tue, Jun 19, 2012 at 2:44 AM, Amit Kapila amit.kap...@huawei.com wrote:
 AFAIR you can create pg_control from scratch already with pg_resetxlog.
 The hard part is coming up with values for the counters, such as the
 next WAL location.  Some of them such as next OID are pretty harmless
 if you don't guess right, but I'm worried that wrong next WAL could
 make things worse not better.

 I believe if WAL files are proper as mentioned in Alvaro's mail, the proposed 
 logic should generate
 correct values.
 Do you see any problem in the logic proposed in my original mail?
 Can I resume my work on this feature?

Maybe I'm missing your point, but... why don't you just use PITR to
recover from the corruption of pg_control?

Regards,

-- 
Fujii Masao



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Tuesday, June 19, 2012 08:03:04 AM Tom Lane wrote:
 Every WAL record?  Why in heck would you attach it to every record?
 Surely putting it in WAL page headers would be sufficient.

 The idea is that you can have cascading, circular and whatever replication 
 topologies if you include the logical origin of a wal causing action into 
 it.
 That is, if you have nodes A(1) and B(2) and an insert happens on A, the wal 
 records generated by that will get an xl_origin_id = 1 and when it is 
 decoded and replayed on B it will *also* get the id 1. Only when a change 
 originally is generated on B will it get xl_origin_id = 2.

None of this explains to me why each individual WAL record would need to
carry a separate copy of the indicator.  We don't have multiple masters
putting records into the same WAL file, nor does it seem to me that it
could possibly be workable to merge WAL streams.  (If you are thinking
of something sufficiently high-level that merging could possibly work,
then it's not WAL, and we shouldn't be trying to make the WAL
representation cater for it.)

regards, tom lane



[HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Andres Freund
Hi,

There are 70+ calls of malloc in the backend in the form of

type* foo = malloc(sizeof(...));
if (!foo)
   elog(ERROR, "could not allocate memory");

which is a bit annoying to write at times. Would somebody argue against 
introducing a function that does the above, named xmalloc() or malloc_or_die()?
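A minimal sketch of such a helper, with exit() standing in for elog(ERROR, ...) so the example is self-contained outside the backend (the name malloc_or_die is just one of the proposed spellings):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Hypothetical malloc_or_die(): allocate or bail out.  In backend code the
 * failure branch would be elog(ERROR, "could not allocate memory"); here a
 * plain exit() stands in so the sketch compiles on its own.
 */
static void *
malloc_or_die(size_t size)
{
	void	   *result = malloc(size);

	if (result == NULL)
	{
		fprintf(stderr, "could not allocate memory\n");
		exit(EXIT_FAILURE);
	}
	return result;
}
```

Callers then drop the repeated NULL check entirely: `foo = malloc_or_die(sizeof(*foo));`.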

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 04:17:01 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Tuesday, June 19, 2012 08:03:04 AM Tom Lane wrote:
  Every WAL record?  Why in heck would you attach it to every record?
  Surely putting it in WAL page headers would be sufficient.
  
  The idea is that you can have cascading, circular and whatever
  replication topologies if you include the logical origin of a wal
  causing action into it.
  That is, if you have nodes A(1) and B(2) and an insert happens on A the
  wal records generated by that will get an xl_origin_id = 1 and when it
  will be decoded and replayed on B it will *also* get the id 1. Only when
  a change originally is generated on B will it get xl_origin_id = 2.
 
 None of this explains to me why each individual WAL record would need to
 carry a separate copy of the indicator.  We don't have multiple masters
 putting records into the same WAL file, nor does it seem to me that it
 could possibly be workable to merge WAL streams.  (If you are thinking
 of something sufficiently high-level that merging could possibly work,
 then it's not WAL, and we shouldn't be trying to make the WAL
 representation cater for it.)
The idea is that if you're replaying changes on node A originating from node B 
you set the origin to *B* in the wal records that are generated during that. 
So when B, in a bidirectional setup, replays the changes that A has made it 
can simply ignore all changes which originated on itself.

That works rather simply and performantly if you have a conflict-avoidance scheme.

For many scenarios you need to be able to separate locally generated changes 
and changes that have been replayed from another node. Without support for 
something like this, that is really hard to achieve.
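The filtering rule being described can be sketched as follows (all type and function names here are hypothetical; the patch itself only reserves the 16-bit field in the record header):

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch: each decoded change carries the id of the node where
 * it originally happened.  A node applying a remote stream skips changes
 * whose origin is itself, which is what breaks the loop in a bidirectional
 * (A <-> B) setup.
 */
typedef uint16_t NodeId;		/* fits the 16 spare header bits */

typedef struct DecodedChange
{
	NodeId		origin_id;		/* node where the change first happened */
	/* ... payload ... */
} DecodedChange;

static bool
should_apply(const DecodedChange *change, NodeId local_id)
{
	/* ignore changes that originated here and came back around */
	return change->origin_id != local_id;
}
```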

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Tuesday, June 19, 2012 04:17:01 PM Tom Lane wrote:
 ...  (If you are thinking
 of something sufficiently high-level that merging could possibly work,
 then it's not WAL, and we shouldn't be trying to make the WAL
 representation cater for it.)

 The idea is that if you're replaying changes on node A originating from node B
 you set the origin to *B* in the wal records that are generated during that. 
 So when B, in a bidirectional setup, replays the changes that A has made it 
 can simply ignore all changes which originated on itself.

This is most certainly not possible at the level of WAL.  As I said
above, we shouldn't be trying to shoehorn high level logical-replication
commands into WAL streams.  No good can come of confusing those concepts.

regards, tom lane



Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 There are 70+ calls of malloc in the backend in the form of

 type* foo = malloc(sizeof(...));
 if (!foo)
    elog(ERROR, "could not allocate memory");

 which is a bit annoying to write at times. Would somebody argue against 
 introducing a function that does the above named xmalloc() or malloc_or_die()?

99% of the time, you should be using palloc if that's the behavior you
want.  I think most of the malloc calls are in places where we want a
bit more control over the error response.

regards, tom lane



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
So, just to give a bit more weight to my argument that we should
recognise that equivalent strings ought to be treated identically, I
direct your attention to conformance requirement C9 of Unicode 3.0:

http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0

This requires that canonically equivalent text be treated
identically. See also C10 and several other places in The Unicode
Standard.

This page, DETERMINISTIC SORTING, was also interesting:

http://www.unicode.org/notes/tn9/

The same hack that we use in varstr_cmp() appears as pseudo-code,
under Forcing Deterministic Comparisons. I'm not sure in what
general sense what *they'd* call a non-deterministic comparison (i.e.
plain old strcoll()) is actually non-deterministic. Clearly a
non-deterministic comparison is deterministic to the extent that
given the same two binary strings and encoding and collation, the
comparison function will always give the same answer.

The article goes on to describe how we could bolt-on a strxfrm() type
value to a string, to ensure deterministic comparison (i.e.
identical behaviour to Postgres master) with pre-processing. Tom did
think that this might still be worth it. I'd rather avoid it.

The article goes on to say of so-called deterministic comparisons:
"However, a deterministic comparison is generally not best practice.
First, it has a certain performance cost in comparison, and a quite
substantial impact on sort key size. (For example, [ICU]
language-sensitive sort keys are generally about the size of the
original string, so appending a copy of the original string generally
doubles the size of the sort key.)  A database using these sort keys
can have substantially increased disk footprint and memory footprint,
and consequently reduced performance."

I note also that The Single UNIX Specification, Version 2 says of
strcoll(): "The strxfrm() and strcmp() functions should be used for
sorting large lists." Why has it taken us this long to research this?
I am aware that Greg Stark suggested it back in 2005, but I doubt that
that was the first time.
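The strxfrm() approach amounts to paying the collation cost once per string instead of once per comparison. A rough sketch (names are illustrative, error handling mostly elided; the check below runs under the default C locale so the result is deterministic):

```c
#include <stdlib.h>
#include <string.h>

/*
 * Sketch: precompute a strxfrm() key per string, then sort with plain
 * strcmp() on the keys.  strcmp() on two keys orders the same way as
 * strcoll() on the original strings in the current locale.
 */
typedef struct SortItem
{
	const char *orig;
	char	   *key;			/* strxfrm() blob */
} SortItem;

static char *
make_sort_key(const char *s)
{
	size_t		len = strxfrm(NULL, s, 0) + 1;
	char	   *key = malloc(len);

	if (key != NULL)
		strxfrm(key, s, len);
	return key;
}

static int
sort_item_cmp(const void *a, const void *b)
{
	return strcmp(((const SortItem *) a)->key,
				  ((const SortItem *) b)->key);
}
```

The space cost the tech note warns about shows up in make_sort_key(): the transformed blob is typically several times the original string's length in language-sensitive locales.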

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Tuesday, June 19, 2012 04:17:01 PM Tom Lane wrote:
  ...  (If you are thinking
  of something sufficiently high-level that merging could possibly work,
  then it's not WAL, and we shouldn't be trying to make the WAL
  representation cater for it.)
  
  The idea is that if youre replaying changes on node A originating from
  node B you set the origin to *B* in the wal records that are generated
  during that. So when B, in a bidirectional setup, replays the changes
  that A has made it can simply ignore all changes which originated on
  itself.
 
 This is most certainly not possible at the level of WAL. 
Huh? This isn't used during normal crash-recovery replay. The information is 
used when decoding the wal into logical changes and applying those. It's just a 
common piece of information that's needed for a large number of records. 

Alternatively it could be added to all the records that need it, but that 
would smear the necessary logic - which is currently trivial - over more of 
the backend. And it would increase the actual size of wal which this one did 
not.

 As I said above, we shouldn't be trying to shoehorn high level logical-
 replication commands into WAL streams.  No good can come of confusing those
 concepts.
It's not doing anything high-level in there. All that patch does is embed 
one single piece of information in previously unused space.

I can follow the argument that you do not want *any* logical information in 
the wal. But as I said in the patchset-introducing email: I don't really see 
an alternative. Otherwise we would just duplicate all the locking/scalability 
issues of xlog as well as the amount of writes.
This is, besides logging some more information when wal_level = logical in 
some particular records (HEAP_UPDATE|DELETE) and ensuring fpw's don't remove 
the record data in HEAP_(INSERT|UPDATE|DELETE) in patch 07/16, the only change 
that I really foresee being needed for doing the logical stuff.

Do you really see this as such a big problem?

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 04:38:56 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  There are 70+ calls of malloc in the backend in the form of
  
  type* foo = malloc(sizeof(...));
  if (!foo)
     elog(ERROR, "could not allocate memory");
  
  which is a bit annoying to write at times. Would somebody argue against
  introducing a function that does the above named xmalloc() or
  malloc_or_die()?
 
 99% of the time, you should be using palloc if that's the behavior you
 want.  I think most of the malloc calls are in places where we want a
 bit more control over the error response.
There are surprisingly many calls that just have the above logic without a 
more elaborate error message. It's mostly cases which allocate memory just once 
and never release it again, to avoid having huge static data around for 
processes not using that part of the code.
True enough, most of those could use TopMemoryContext; it's a rather 
established pattern not to do so in those cases, though...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-19 Thread Alvaro Herrera

Excerpts from Boszormenyi Zoltan's message of Tue Jun 19 04:44:35 -0400 2012:
 
 On 2012-06-18 19:46, Alvaro Herrera wrote:
  Excerpts from Boszormenyi Zoltan's message of Fri May 11 03:54:13 -0400 
  2012:
  Hi,
 
  another rebasing and applied the GIT changes in
  ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the
  Windows implementation of PGSemaphoreTimedLock.
  Hi,
 
  I gave the framework patch a look.  One thing I'm not sure about is the
  way you've defined the API.  It seems a bit strange to have a nice and
  clean, generic interface in timeout.h; and then have the internal
  implementation of the API cluttered with details such as what to do when
  the deadlock timeout expires.  Wouldn't it be much better if we could
  keep the details of CheckDeadLock itself within proc.c, for example?
 
 Do you mean adding a callback function argument to for enable_timeout()
 would be better?

Well, that's not exactly what I was thinking, but maybe your idea is
better than what I had in mind.

  Same for the other specific Check*Timeout functions.  It seems to me
  that the only timeout.c specific requirement is to be able to poke
  base_timeouts[].indicator and fin_time.  Maybe that can get done in
  timeout.c and then have the generic checker call the module-specific
  checker.
 
 Or instead of static functions, Check* functions can be external
 to timeout.c. It seemed to be a good idea to move all the timeout
 related functions to timeout.c.

Yeah, except that they play with members of the timeout_params struct,
which hopefully does not need to be exported.  So if you can just move
(that is, put back in their original places) the portions that do not
touch that struct, it'd be great.

  Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe
  more than once per signal if you have multiple timeouts enabled.
 
 Actually, GetCurrentTimestamp() is not called multiple times,
 because only the first timeout source in the queue can get triggered.

I see.

  BTW it doesn't seem that datatype/timestamp.h is really necessary in
  timeout.h.
 
 You are wrong. get_timeout_start() returns TimestampTz and it's
 defined in datatype/timestamp.h.

Oh, right.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Kohei KaiGai
2012/6/19 Merlin Moncure mmonc...@gmail.com:
 On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure mmonc...@gmail.com wrote:
 I can't help but wonder (having been down the contrib/core/extension
 road myself) if it isn't better to improve the facilities to register
 and search for qualified extensions (like Perl CPAN) so that people
 looking for code to improve their backends can find it.  That way,
 you're free to release/do xyz/abandon your project as you see fit
 without having to go through -hackers.  This should also remove a lot
 of the stigma with not being in core since if stock postgres
 installations can access the necessary modules via CREATE EXTENSION, I
 think it will make it easier for projects like this to get used with
 the additional benefit of decentralizing project management.

 Well, if you're the type that likes to install everything via RPMs
 (and I am) then you wouldn't want this behavior, especially not
 automagically.  It seems to open up a host of security risks, too: I
 believe Tom has previously stated that Red Hat (and other distro)
 packaging guidelines forbid packages from installing software in
 places where they can then turn around and run it.  I suppose CPAN
 must have some sort of exception to this policy, though, so maybe it's
 not ironclad.

 Huh? CPAN is just one example -- PyPM for python, npm for node etc
 etc.  There is some distinction to that rule that is being missed so
 that it doesn't apply to cases like this -- probably that it is
 transacted by the user and/or requires su.

 I think your points are supporting mine: the vast majority of postgres
 administrators deal only with o/s packaged rpms and therefore with the
 status quo are unable to utilize any extension that is not packaged
 with contrib.  This means that there are two possible outcomes if you
 want to write an extension:

 1) get accepted into core
 2) never get used

 Given that running the gauntlet to #1 is not a particularly attractive
 option (or even possible) in many cases for various reasons it tends
 to discourage module development by 3rd parties.  There are several
 very high quality extensions that could get a lot more exposure...for
 example pl/sh.

 But anyways, if you're happy with pgsql_fdw (aside: i looked at it,
 and it's great!) in core, then by all means...

Let me push for pgsql_fdw in core from a different perspective.

Right now, FDW is a feature that will take many enhancements in
the near future, like join-pushdown, writable APIs and so on.
If we did not have an FDW extension in core that communicates
with an actual RDBMS, instead of flat files, it would make our
development efforts more complex. Please assume a scenario where
people always try to submit two patches; one for the core, and the
other for a particular out-of-tree extension.
I believe it is a reasonable choice to have an FDW for PostgreSQL
in core, rather than MySQL or Oracle. We can use this extension
to show how a new feature works, whenever we try to extend FDW
APIs.

BTW, Hanada-san, it seems to me the expected result of the regression
test was not included in this patch. Please submit it again.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote:
 
 So, just to give a bit more weight to my argument that we should
 recognise that equivalent strings ought to be treated identically
 
Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a while: in some other
systems I've used, the tie-breaker comparison for equivalent
values comes after equivalence sorting on *all* sort keys, rather
than *each* sort key.  For example, this much makes sense with
lc_collate = 'en_US.UTF-8':
 
test=# create table c (last_name text not null, first_name text);
CREATE TABLE
test=# insert into c values ('smith', 'bob'), ('smith', 'peter'),
('SMITH', 'EDWARD');
INSERT 0 3
test=# select * from c order by 2;
 last_name | first_name 
-----------+------------
 smith     | bob
 SMITH     | EDWARD
 smith     | peter
(3 rows)
 
This seems completely wrong:
 
test=# select * from c order by 1,2;
 last_name | first_name 
-----------+------------
 smith     | bob
 smith     | peter
 SMITH     | EDWARD
(3 rows)
 
I have seen other databases which get it in the order I would expect
-- where the C compare only matters within groups of equivalent
rows.  It seems that PostgreSQL effectively orders by:
 
  last_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'en_US.UTF-8'
  first_name using collation 'C'
 
while some other products order by:
 
  last_name using collation 'en_US.UTF-8'
  first_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'C'
 
I'm sure the latter is harder to do and slower to execute; but the
former just doesn't seem defensible as correct.
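The two strategies can be sketched like this (a Python sketch; case-insensitive comparison stands in for collation-level equivalence and bytewise comparison for the strcmp() tie-breaker, so the exact case ordering of a real en_US locale will differ):

```python
from functools import cmp_to_key

rows = [("smith", "bob"), ("smith", "peter"), ("SMITH", "EDWARD")]

def equiv(a, b):
    # stand-in for the collation ("equivalence") comparison
    x, y = a.lower(), b.lower()
    return (x > y) - (x < y)

def bytewise(a, b):
    # stand-in for the strcmp() tie-breaker
    return (a > b) - (a < b)

def per_key(r1, r2):
    # PostgreSQL-style: tie-break each key before moving to the next key
    for k1, k2 in zip(r1, r2):
        c = equiv(k1, k2) or bytewise(k1, k2)
        if c:
            return c
    return 0

def all_keys_then_tiebreak(r1, r2):
    # other products: equivalence comparison on *all* keys first,
    # then the tie-breaker over all keys
    for k1, k2 in zip(r1, r2):
        c = equiv(k1, k2)
        if c:
            return c
    for k1, k2 in zip(r1, r2):
        c = bytewise(k1, k2)
        if c:
            return c
    return 0

print(sorted(rows, key=cmp_to_key(per_key)))
print(sorted(rows, key=cmp_to_key(all_keys_then_tiebreak)))
```

The second comparator keeps SMITH/EDWARD between the two equivalent smith rows; the per-key comparator splits the equivalent last names apart by the tie-breaker.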
 
-Kevin



Re: [HACKERS] initdb and fsync

2012-06-19 Thread David Fetter
On Mon, Jun 18, 2012 at 09:34:30PM +0300, Peter Eisentraut wrote:
 On mån, 2012-06-18 at 18:05 +0200, Andres Freund wrote:
  - defaulting to initdb -N in the regression suite is not a good idea imo,
  because that way the buildfarm won't catch problems in that area...
  
 The regression test suite also starts postgres with fsync off.  This is
 good, and I don't want to have more overhead in the tests.  It's not
 like we already have awesome test coverage for OS interaction.

What sorts of coverage would we want?

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] libpq compression

2012-06-19 Thread Robert Haas
On Mon, Jun 18, 2012 at 1:42 PM, Martijn van Oosterhout
klep...@svana.org wrote:
 On Sun, Jun 17, 2012 at 12:29:53PM -0400, Tom Lane wrote:
 The fly in the ointment with any of these ideas is that the configure
 list is not a list of exact cipher names, as per Magnus' comment that
 the current default includes tests like !aNULL.  I am not sure that
 we know how to evaluate such conditions if we are applying an
 after-the-fact check on the selected cipher.  Does OpenSSL expose any
 API for evaluating whether a selected cipher meets such a test?

 I'm not sure whether there's an API for it, but you can certainly check
 manually with openssl ciphers -v, for example:

 $ openssl ciphers -v 'ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP'
 NULL-SHA                SSLv3 Kx=RSA      Au=RSA  Enc=None      Mac=SHA1
 NULL-MD5                SSLv3 Kx=RSA      Au=RSA  Enc=None      Mac=MD5

 ...etc...

 So unless the openssl includes the code twice there must be a way to
 extract the list from the library.

There doubtless is, but I'd be willing to wager that you won't be
able to figure out the exact method without reading the source code
for 'openssl ciphers' to see how it was done there, and most likely
you'll find that at least one of the functions they use has no man
page.  Documentation isn't their strong point.
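For what it's worth, OpenSSL's cipher-list parser can be reached without reading its source, at least via Python's ssl wrapper (a sketch only; it shows that an expression with tests like !aNULL can be expanded after the fact, not that libpq could use this path directly — the cipher list shown is just an example):

```python
import ssl

# Let OpenSSL itself expand a cipher-list expression, "!aNULL" tests included.
ctx = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
ctx.set_ciphers("HIGH:!aNULL:!eNULL")   # raises ssl.SSLError if nothing matches
enabled = [c["name"] for c in ctx.get_ciphers()]

# An after-the-fact check on a negotiated cipher is then a membership test.
def cipher_allowed(name):
    return name in enabled

print(len(enabled), "ciphers enabled")
```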

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Robert Haas
On Mon, Jun 18, 2012 at 3:30 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:

 Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
 and SvPVUTF8() when turning a perl string into a cstring.

 Hmm, this patch belongs into back branches too, right?  Not just the
 current development tree?

It seems like a bug fix to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WAL format changes

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote:
 On 18.06.2012 21:08, Heikki Linnakangas wrote:
  On 18.06.2012 21:00, Robert Haas wrote:
  On Thu, Jun 14, 2012 at 5:58 PM, Andres Freund and...@2ndquadrant.com
  
  wrote:
  1. Use a 64-bit segment number, instead of the log/seg combination.
  And don't waste the last segment on each logical 4 GB log file. The
  concept of a logical log file is now completely gone. XLogRecPtr is
  unchanged,
  but it should now be understood as a plain 64-bit value, just split
  into
  two 32-bit integers for historical reasons. On disk, this means that
  there will be log files ending in FF, those were skipped before.
  
  What's the reason for keeping that awkward split now? There aren't
  that many users of xlogid/xrecoff and many of those would be better
  served by using helper macros.
  
  I wondered that, too. There may be a good reason for keeping it split
  up that way, but we at least oughta think about it a bit.
  
  The page header contains an XLogRecPtr (LSN), so if we change it we'll
  have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
  around as the on-disk representation, and convert between the 64-bit
  integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
  many xlog calculations would admittedly be simpler if it was an uint64.
 
 Well, that was easier than I thought. Attached is a patch to make
 XLogRecPtr a uint64, on top of my other WAL format patches. I think we
 should go ahead with this.
Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure 
if having two representations which just have a constant factor in between 
really makes sense.
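The relationship between the representations is just fixed-width arithmetic; roughly (a Python sketch, assuming the default 16 MB segment size):

```python
XLOG_SEG_SIZE = 16 * 1024 * 1024   # assumed default WAL segment size

def recptr_to_u64(xlogid, xrecoff):
    # old split (xlogid, xrecoff) form -> plain 64-bit LSN
    return (xlogid << 32) | xrecoff

def u64_to_recptr(lsn):
    # plain 64-bit LSN -> old split form
    return lsn >> 32, lsn & 0xFFFFFFFF

def seg_no(lsn):
    # with the skipped FF segment gone, the segment number is a plain division
    return lsn // XLOG_SEG_SIZE

lsn = recptr_to_u64(0x1, 0x02345678)
print(hex(lsn), u64_to_recptr(lsn), seg_no(lsn))
```

This is exactly the "constant factor" relationship between XLogRecPtr and XLogSegNo referred to above.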

 The LSNs on pages are still stored in the old format, to avoid changing
 the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored in the
 control file and WAL are changed, however, so an initdb (or at least
 pg_resetxlog) is required.
Sounds sensible.

 Should we keep the old representation in the replication protocol
 messages? That would make it simpler to write a client that works with
 different server versions (like pg_receivexlog). Or, while we're at it,
 perhaps we should mandate network-byte order for all the integer and
 XLogRecPtr fields in the replication protocol.
The replication protocol uses pq_sendint for integers which should take care 
of converting to big endian already. I don't think anything but the wal itself 
is otherwise transported in a binary fashion? So I don't think there is any 
such architecture dependency in the protocol currently?
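The big-endian framing pq_sendint provides amounts to the following on the wire (a Python sketch using struct; this is an illustration, not libpq code):

```python
import struct

def send_uint64(v):
    # network (big-endian) byte order, as pq_sendint-style routines produce
    return struct.pack("!Q", v)

def recv_uint64(buf):
    return struct.unpack("!Q", buf)[0]

wire = send_uint64(0x0102030405060708)
print(wire.hex())
```

Because the byte order is fixed by the protocol, a client on any architecture decodes the same value back.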

I don't really see a point in keeping around a backward-compatible 
representation just for the sake of running such tools on multiple versions. I 
might not be pragmatic enough, but: Why would you want to do that *at the 
moment*? Many of the other tools are already version specific, so...
When the protocol starts to be used by more tools, maybe, but imo we're not 
there yet.

But then it's not hard to convert to the old representation for that.

 I kept the %X/%X representation in error messages etc. I'm quite used to
 that output, so reluctant to change it, although it's a bit silly now
 that it represents just 64-bit value. Using UINT64_FORMAT would also
 make the messages harder to translate.
No opinion on that. It's easier for me to see whether two values are exactly 
the same or very similar with the 64-bit representation, but it's harder to 
gauge bigger differences. So ...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 2:15 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 There might be something to the idea of demoting a few of the things
 we've traditionally had as NOTICEs, though.  IME, the following two
 messages account for a huge percentage of the chatter:

 NOTICE:  CREATE TABLE will create implicit sequence foo_a_seq for
 serial column foo.a
 NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
 foo_pkey for table foo

 Personally, I'd have no problem with flat-out dropping (not demoting)
 both of those two specific messages.  I seem to recall that Bruce has
 lobbied for them heavily in the past, though.

My vote would be to make 'em DEBUG1, and similarly with the UNIQUE
message that Fabien mentions downthread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] WAL format changes

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 Well, that was easier than I thought. Attached is a patch to make XLogRecPtr
 a uint64, on top of my other WAL format patches. I think we should go ahead
 with this.

+1.

 The LSNs on pages are still stored in the old format, to avoid changing the
 on-disk format and breaking pg_upgrade. The XLogRecPtrs stored in the control
 file and WAL are changed, however, so an initdb (or at least pg_resetxlog)
 is required.

Seems fine.

 Should we keep the old representation in the replication protocol messages?
 That would make it simpler to write a client that works with different
 server versions (like pg_receivexlog). Or, while we're at it, perhaps we
 should mandate network-byte order for all the integer and XLogRecPtr fields
 in the replication protocol. That would make it easier to write a client
 that works across different architectures, in = 9.3. The contents of the
 WAL would of course be architecture-dependent, but it would be nice if
 pg_receivexlog and similar tools could nevertheless be
 architecture-independent.

I share Andres' question about how we're doing this already.  I think
if we're going to break this, I'd rather do it in 9.3 than 5 years
from now.  At this point it's just a minor annoyance, but it'll
probably get worse as people write more tools that understand WAL.

 I kept the %X/%X representation in error messages etc. I'm quite used to
 that output, so reluctant to change it, although it's a bit silly now that
 it represents just 64-bit value. Using UINT64_FORMAT would also make the
 messages harder to translate.

I could go either way on this one, but I have no problem with the way
you did it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2012-06-19 Thread Fujii Masao
On Tue, Jun 19, 2012 at 5:30 PM, Kyotaro HORIGUCHI
horiguchi.kyot...@lab.ntt.co.jp wrote:
 Thank you.

 What happens if the server skips an end-of-recovery checkpoint,
 is promoted to the master, runs some write transactions,
 crashes and restarts automatically before it completes
 checkpoint? In this case, the server needs to do crash recovery
 from the last checkpoint record with old timeline ID to the
 latest WAL record with new timeline ID. How does crash recovery
 do recovery beyond timeline?

 Basically the same as archive recovery as far as I saw. It is
 already implemented to work in that way.

 After this patch applied, StartupXLOG() gets its
 recoveryTargetTLI from the new field latestTLI in the control
 file instead of the latest checkpoint. And the latest checkpoint
 record informs its TLI and WAL location as before, but its TLI
 does not have a significant meaning in the recovery sequence.

 Consider the following case:

      |seg 1     | seg 2    |
 TLI 1 |...c..|00|
          C           P  X
 TLI 2            |00|

 * C - checkpoint, P - promotion, X - crash just after here

 This shows the situation where the latest checkpoint (restartpoint)
 has been taken at TLI=1/SEG=1/OFF=4, promotion happened at
 TLI=1/SEG=2/OFF=5, and then a crash occurred just after TLI=2/SEG=2/OFF=8.
 Promotion itself inserts no WAL records but creates a copy of the
 current segment for the new TLI. The file for TLI=2/SEG=1 should not
 exist. (Who would create it?)

 The control file will look as follows

 latest checkpoint : TLI=1/SEG=1/OFF=4
 latest TLI        : 2

 So the crash recovery sequence starts from SEG=1/LOC=4.
 expectedTLIs will be (2, 1) so 1 will naturally be selected as
 the TLI for SEG1 and 2 for SEG2 in XLogFileReadAnyTLI().

 On closer view, startup constructs expectedTLIs by reading the
 timeline history file that corresponds to the recoveryTargetTLI. Then
 it runs the recovery sequence from the redo point of the latest
 checkpoint using WALs with the largest TLI - which is
 distinguished by its file name, not its header - within the
 expectedTLIs in XLogPageRead(). The only difference from archive
 recovery is that XLogFileReadAnyTLI() reads only the WAL files already
 sitting in the pg_xlog directory, and does not reach the archive. The pages
 with the new TLI will be naturally picked up as mentioned above in
 this sequence, and recovery will then stop at the last readable record.

 The latestTLI field in the control file is updated just after the TLI
 is incremented and the new WAL files with the new TLI are
 created. So the crash recovery sequence won't stop before
 reaching the WAL with the new TLI designated in the control file.
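The selection rule described above - prefer the newest timeline in expectedTLIs for which a segment file exists in pg_xlog - can be sketched like this (Python; file naming follows the usual %08X%08X%08X TLI/log/seg convention, and the file sets here are hypothetical):

```python
def wal_file_name(tli, log, seg):
    # TLI, log id, and segment id, each as 8 hex digits
    return "%08X%08X%08X" % (tli, log, seg)

def read_any_tli(log, seg, expected_tlis, files_in_pg_xlog):
    # like XLogFileReadAnyTLI restricted to pg_xlog (no archive):
    # try timelines newest-first and take the first file that exists
    for tli in sorted(expected_tlis, reverse=True):
        name = wal_file_name(tli, log, seg)
        if name in files_in_pg_xlog:
            return name
    return None

pg_xlog = {"000000010000000000000001",
           "000000010000000000000002",
           "000000020000000000000002"}
print(read_any_tli(0, 1, (1, 2), pg_xlog))  # segment 1 exists only on TLI 1
print(read_any_tli(0, 2, (1, 2), pg_xlog))  # segment 2: the TLI 2 copy wins
```

This matches the scenario above: SEG=1 is read from TLI 1 and SEG=2 from TLI 2, with no file for TLI=2/SEG=1 required.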

Is it guaranteed that all the files (e.g., the latest timeline history file)
required for such crash recovery exist in pg_xlog? If yes, your
approach might work well.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Transactions over pathological TCP connections

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 The transaction would be committed before a command success report is
 delivered to the client, so I don't think delivered-and-not-marked is
 possible.

...unless you have configured synchronous_commit=off, or fsync=off.

Or unless your disk melts into a heap of slag and you have to restore
from backup.  You can protect against that last case using synchronous
replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
 ...  (If you are thinking
 of something sufficiently high-level that merging could possibly work,
 then it's not WAL, and we shouldn't be trying to make the WAL
 representation cater for it.)

 Do you really see this as such a big problem?

It looks suspiciously like I have a hammer, therefore every problem
must be a nail.  I don't like the design concept of cramming logical
replication records into WAL in the first place.

However, if we're dead set on doing it that way, let us put information
that is only relevant to logical replication records into only the
logical replication records.  Saving a couple bytes in each such record
is penny-wise and pound-foolish, I'm afraid; especially when you're
nailing down hard, unexpansible limits at the very beginning of the
development process in order to save those bytes.

regards, tom lane



Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Merlin Moncure
On Tue, Jun 19, 2012 at 10:15 AM, Kohei KaiGai kai...@kaigai.gr.jp wrote:
 Let me push the pgsql_fdw in core from different perspective.

 Right now, FDW is a feature that will take many enhancements in
 the near future, like join-pushdown, writable APIs and so on.
 If we did not have an FDW extension in core that communicates
 with an actual RDBMS, instead of flat files, it would make our development
 efforts more complex. Please assume a scenario where people always
 have to submit two patches: one for the core, and the other for
 a particular out-of-tree extension.

yeah. plus, it's nice to have a very high quality fdw implementation
in core for others to crib from.  I have no objection -- I just took a
convenient opportunity to hijack the thread for some hand-wavy ideas
about an extension repository. :-).

merlin



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 16:17, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Peter Geoghegan pe...@2ndquadrant.com wrote:

 So, just to give a bit more weight to my argument that we should
 recognise that equivalent strings ought to be treated identically

 Since we appear to be questioning everything in this area, I'll
 raise something which has been bugging me for a while: in some other
 systems I've used, the tie-breaker comparison for equivalent
 values comes after equivalence sorting on *all* sort keys, rather
 than *each* sort key.

Are you sure that they actually have a tie-breaker, and don't just
make the distinction between equality and equivalence (if only
internally)? I would have checked that myself already, but I don't
have access to any other RDBMS that I'd expect to care about these
kinds of distinctions. They make sense for ensuring that the text
comparator's notion of equality is consistent with text's general
notion (if that's bitwise equality, which I suspect it is in these
other products too for the same reasons it is for us). I don't see why
you'd want a tie-breaker across multiple keys. I mean, you could, I
just don't see any reason to.

 test=# select * from c order by 2;
  last_name | first_name
 -----------+------------
  smith     | bob
  SMITH     | EDWARD
  smith     | peter
 (3 rows)

 This seems completely wrong:

 test=# select * from c order by 1,2;
  last_name | first_name
 -----------+------------
  smith     | bob
  smith     | peter
  SMITH     | EDWARD
 (3 rows)

Agreed. Definitely a POLA violation.

 I'm sure the latter is harder to do and slower to execute; but the
 former just doesn't seem defensible as correct.

This same gripe is held by the author of that sorting document I
linked to from the Unicode consortium, with a very similar example. So
it seems like this could be a win from several perspectives, as it
would enable the strxfrm() optimisation. I'm pretty sure that
pg_upgrade wouldn't be very happy about this, so we'd have to have a
legacy compatibility mode.
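The strxfrm() optimisation mentioned here relies on a property guaranteed by the C standard: comparing the transforms bytewise agrees with strcoll(). Sketched in Python (which locale the environment actually provides will vary):

```python
import locale
from functools import cmp_to_key

# strxfrm(s) is a transform such that comparing transforms with a plain
# bytewise comparison agrees with strcoll(); so transform each value once,
# then sort with cheap comparisons instead of repeated collation calls.
locale.setlocale(locale.LC_COLLATE, "")  # whatever locale is configured

names = ["smith", "peter", "SMITH", "EDWARD", "bob"]
by_transform = sorted(names, key=locale.strxfrm)
by_strcoll = sorted(names, key=cmp_to_key(locale.strcoll))
print(by_transform)
assert by_transform == by_strcoll  # the property the optimisation depends on
```

The pg_upgrade and legacy-compatibility concerns are about on-disk index order, which this sketch does not model.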

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-19 Thread Kohei KaiGai
Hi Fujita-san,

Could you rebase this patch on top of the latest tree?
I was unable to apply the patch cleanly.

I looked over the patch, then noticed a few points.

At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
If so, cstate->convert_binary is not a suitable flag to check for a
redundant option. It seems to me cstate->convert_selectively
is the more suitable flag to check it.

+   else if (strcmp(defel->defname, "convert_binary") == 0)
+   {
+       if (cstate->convert_binary)
+           ereport(ERROR,
+                   (errcode(ERRCODE_SYNTAX_ERROR),
+                    errmsg("conflicting or redundant options")));
+       cstate->convert_selectively = true;
+       if (defel->arg == NULL || IsA(defel->arg, List))
+           cstate->convert_binary = (List *) defel->arg;
+       else
+           ereport(ERROR,
+                   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                    errmsg("argument to option \"%s\" must be a list of column names",
+                           defel->defname)));
+   }


At NextCopyFrom(), this routine computes default values if configured.
In cases where these values are not referenced, it might be possible to
skip the unnecessary calculations.
Would it be possible to add logic to avoid constructing cstate->defmap
for unreferenced columns at BeginCopyFrom()?
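The suggestion amounts to building the default-value map only over columns the query actually references; as a shape sketch (Python, not the real C structures — column names and the referenced set are hypothetical):

```python
def build_defmap(all_columns, referenced, defaults):
    # skip the default-value machinery for columns nothing will read
    return {c: defaults[c] for c in all_columns
            if c in referenced and c in defaults}

defaults = {"a": 0, "b": "n/a", "c": None}
print(build_defmap(["a", "b", "c"], referenced={"a", "c"}, defaults=defaults))
```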

Thanks,

2012/5/11 Etsuro Fujita fujita.ets...@lab.ntt.co.jp:
 -Original Message-
 From: Robert Haas [mailto:robertmh...@gmail.com]
 Sent: Friday, May 11, 2012 1:36 AM
 To: Etsuro Fujita
 Cc: pgsql-hackers@postgresql.org
 Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
 foreign tables

 On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
 fujita.ets...@lab.ntt.co.jp
 wrote:
  I would like to propose to improve parsing efficiency of
  contrib/file_fdw by selective parsing proposed by Alagiannis et
  al.[1], which means that for a CSV/TEXT file foreign table, file_fdw
  performs binary conversion only for the columns needed for query
  processing.  Attached is a WIP patch implementing the feature.

 Can you add this to the next CommitFest?  Looks interesting.

 Done.

 Best regards,
 Etsuro Fujita

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
 Company







-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
Hi,

On Tuesday, June 19, 2012 06:11:20 PM Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
  ...  (If you are thinking
  of something sufficiently high-level that merging could possibly work,
  then it's not WAL, and we shouldn't be trying to make the WAL
  representation cater for it.)
  
  Do you really see this as such a big problem?
 It looks suspiciously like I have a hammer, therefore every problem
 must be a nail.  I don't like the design concept of cramming logical
 replication records into WAL in the first place.
There are - so far - no specific logical replication records. It's a 
relatively minor amount of additional data under wal_level=logical for 
existing records. HEAP_UPDATE gets the old primary key on updates changing the 
pkey and HEAP_DELETE always has the pkey. HEAP_INSERT|UPDATE|
DELETE,HEAP2_MULTI_INSERT put their information in another XLogRecData block 
than the page to handle full page writes. Thats it.

I can definitely understand hesitation about that, but I simply see no 
realistic way to solve the issues of existing replication solutions otherwise.
Do you have a better idea to solve those than the above? Without significant 
complications of the backend code and without loads of additional writes going 
on?
I *really* would like to hear them if you do.

 However, if we're dead set on doing it that way, let us put information
 that is only relevant to logical replication records into only the
 logical replication records. 
I found, and still find, the idea of having the origin_id in there rather 
elegant. If people prefer adding the same block to all of the above xlog 
records: I can live with that and will then do so. It makes some things more 
complicated, but it's not too bad.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 Since we appear to be questioning everything in this area, I'll
 raise something which has been bugging me for a while: in some
 other systems I've used, the tie-breaker comparison for
 equivalent values comes after equivalence sorting on *all* sort
 keys, rather than *each* sort key.
 
 Are you sure that they actually have a tie-breaker, and don't just
 make the distinction between equality and equivalence (if only
 internally)?
 
I'm pretty sure that when I was using Sybase ASE the order for
non-equal values was always predictable, and it behaved in the
manner I describe below.  I'm less sure about any other product.
 
 I don't see why you'd want a tie-breaker across multiple keys. I
 mean, you could, I just don't see any reason to.
 
So that you can have entirely repeatable results across multiple
runs?
 
 test=# select * from c order by 2;
  last_name | first_name
 -----------+------------
  smith     | bob
  SMITH     | EDWARD
  smith     | peter
 (3 rows)

 This seems completely wrong:

 test=# select * from c order by 1,2;
  last_name | first_name
 -----------+------------
  smith     | bob
  smith     | peter
  SMITH     | EDWARD
 (3 rows)
 
 Agreed. Definitely a POLA violation.
 
 I'm sure the latter is harder to do and slower to execute; but
 the former just doesn't seem defensible as correct.
 
 This same gripe is held by the author of that sorting document I
 linked to from the Unicode consortium, with a very similar
 example. So it seems like this could be a win from several
 perspectives, as it would enable the strxfrm() optimisation. I'm
 pretty sure that pg_upgrade wouldn't be very happy about this, so
 we'd have to have a legacy compatibility mode.
 
At a minimum, it could require rebuilding indexes with multiple
columns where any indexed value before the last index column can be
equivalent but not equal.
 
-Kevin



[HACKERS] Near-duplicate RI NO ACTION and RESTRICT triggers

2012-06-19 Thread Tom Lane
Our current interpretation of the difference between foreign keys with
ON UPDATE/DELETE NO ACTION and those with ON UPDATE/DELETE RESTRICT
is that they mean the same thing but RESTRICT checks are not deferrable.
It follows from this that the trigger code ought to be the same for
NO ACTION and RESTRICT cases, and so it occurred to me that we could
get rid of a few hundred lines in ri_triggers.c if we removed the
duplicated code.

Comparing the actual code in the different functions, though, there is
a difference: the NO ACTION triggers call ri_Check_Pk_Match to see if
another PK row has been inserted/modified to provide the same key values
that the trigger subject row no longer does.  (ri_Check_Pk_Match also
makes some checks for NULL-key cases, but these are redundant with tests
made in the main trigger functions, so they ought to go away.)  The
RESTRICT triggers do not do this.  It's fairly easy to construct a case
where it makes a difference:

create temp table pp (f1 int primary key);
create temp table cc (f1 int references pp on update no action);
insert into pp values(12);
insert into pp values(11);
update pp set f1=f1+1; -- now we have 13 and 12
insert into cc values(13); -- ok
update pp set f1=f1+1; -- now we have 14 and 13, FK is still OK
update pp set f1=f1+1; -- would result in 15 and 14, so FK fails

If you change the foreign key to be ON UPDATE RESTRICT, the second
UPDATE fails because the RESTRICT trigger doesn't notice that another
PK row has been modified to provide the key required by the FK row.

I think that the argument for having the RESTRICT triggers behave
like this is that the SQL spec envisions the RESTRICT check occurring
immediately when the individual PK row is updated/deleted, and so there
would be no opportunity for another PK row to be updated into its place.
(Or, in plainer English, RESTRICT should mean you can't modify this
row's keys at all if it has dependents.)  Because we implement RESTRICT
through an AFTER trigger that can't run earlier than end-of-statement,
we can't exactly match the spec's semantics, but we can get fairly
close so long as you don't think about what would be seen by
e.g. user-written triggers executing during the statement.

I'm happy with continuing to have this behavioral difference between
the two sets of triggers, but wanted to throw it up for discussion:
does anyone think it'd be better to apply ri_Check_Pk_Match in the
RESTRICT triggers too?

regards, tom lane



Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-19 Thread Alvaro Herrera

Excerpts from Boszormenyi Zoltan's message of mar jun 19 12:44:04 -0400 2012:

 OK, all 4 Check* functions are now moved back into proc.c,
 nothing outside of timeout.c touches anything in it. New patches
 are attached.

Yeah, I like this one better, thanks.

It seems to me that the check functions are not really checks anymore,
right?  I mean, they don't check whether the time has expired.  It can
be argued that CheckDeadLock is well named, because it does check
whether there is a deadlock before doing anything else; but
CheckStandbyTimeout is no longer a check -- it just sends a signal.
Do we need to rename these functions?

Why are you using the deadlock timeout for authentication?  Wouldn't it
make more sense to have a separate TimeoutName, just to keep things
clean?

The NB: comment here doesn't seem to be useful anymore:
+ /*
+  * Init, Destroy and Check functions for different timeouts.
+  *
+  * NB: all Check* functions are run inside a signal handler, so be very wary
+  * about what is done in them or in called routines.
+  */


In base_timeouts you don't initialize fin_time for any of the members.
This is probably unimportant but then why initialize start_time?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 17:45, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 Peter Geoghegan pe...@2ndquadrant.com wrote:
 Are you sure that they actually have a tie-breaker, and don't just
 make the distinction between equality and equivalence (if only
 internally)?

 I'm pretty sure that when I was using Sybase ASE the order for
 non-equal values was always predictable, and it behaved in the
 manner I describe below.  I'm less sure about any other product.

Maybe it used a physical row identifier as a tie-breaker? Note that we
use ItemPointers as a tie-breaker for sorting index tuples.

I imagine that it was at least predictable among columns being sorted,
if only because en_US.UTF-8 doesn't have any notion of equivalence
(that is, it just so happens that there are no two strings that are
equivalent but not bitwise equal). It would surely be impractical to
do a comparison for the entire row, as that could be really expensive.

 I don't see why you'd want a tie-breaker across multiple keys. I
 mean, you could, I just don't see any reason to.

 So that you can have entirely repeatable results across multiple
 runs?

I suppose that isn't an unreasonable use-case, but it would be
unreasonable to require that everyone pay for it, particularly if
it implied another strcmp(). You don't need me to tell you that we
generally discourage the idea that the order that tuples are returned
in is defined in any way beyond that asked for by the user.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] initdb and fsync

2012-06-19 Thread Jeff Davis
On Mon, 2012-06-18 at 21:41 +0200, Andres Freund wrote:
 It calls pg_flush_data inside of copy_file which does the posix_fadvise... So 
 maybe just put the sync_file_range in pg_flush_data?

The functions in fd.c aren't linked to initdb, so it's a challenge to
share that code (I remember now: that's why I didn't use pg_flush_data
originally). I don't see a clean way to resolve that, do you?

Also, it seems that sync_file_range() doesn't help with creating a
database much (on my machine), so it doesn't seem important to use it
there.

Right now I'm inclined to leave the patch as-is.

Regards,
Jeff Davis




Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Andres Freund and...@2ndquadrant.com writes:
 On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
 ...  (If you are thinking
 of something sufficiently high-level that merging could possibly work,
 then it's not WAL, and we shouldn't be trying to make the WAL
 representation cater for it.)

 Do you really see this as such a big problem?

 It looks suspiciously like I have a hammer, therefore every problem
 must be a nail.  I don't like the design concept of cramming logical
 replication records into WAL in the first place.

Me, neither.  I think it's necessary to try to find a way of
generating logical replication records from WAL.  But once generated,
I think those records should form their own stream, independent of
WAL.  If you take the contrary position that they should be included
in WAL, then when you filter the WAL stream down to just the records
of interest to logical replication, you end up with a WAL stream with
holes in it, which is one of the things that Andres listed as an
unresolved design problem in his original email.

Moreover, this isn't necessary at all for single-master replication,
or even multi-source replication where each table has a single master.
 It's only necessary for full multi-master replication, which we have
no consensus to include in core, and even if we did have a consensus
to include it in core, it certainly shouldn't be the first feature we
design.

 However, if we're dead set on doing it that way, let us put information
 that is only relevant to logical replication records into only the
 logical replication records.

Right.  If we decide we need this, and if we did decide to conflate
the WAL stream, both of which I disagree with as noted above, then we
still don't need it on every record.  It would probably be sufficient
for local transactions to do nothing at all (and we can implicitly
assume that they have master node ID = local node ID) and transactions
which are replaying remote changes to emit one record per XID per
checkpoint cycle containing the remote node ID.

 Saving a couple bytes in each such record
 is penny-wise and pound-foolish, I'm afraid; especially when you're
 nailing down hard, unexpansible limits at the very beginning of the
 development process in order to save those bytes.

I completely agree.  I think that, as Dan said upthread, having a 64
or 128 bit ID so that it can be generated automatically rather than
configured by an administrator who must be careful not to duplicate
node IDs in any pair of systems that could ever end up talking to each
other would be a vast usability improvement.  Perhaps systems A, B,
and C are replicating to each other today, as are systems D and E.
But now suppose that someone decides they want to replicate one table
between A and D.  Suddenly the node IDs have to be distinct where they
didn't before, and now there's potentially a problem to hassle with
that wouldn't have been an issue if the node IDs had been wide enough
to begin with.  It is not unusual for people to decide after-the-fact
to begin replicating between machines where this wasn't originally
anticipated and which may even be under different
administrative control.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] initdb and fsync

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 07:22:02 PM Jeff Davis wrote:
 On Mon, 2012-06-18 at 21:41 +0200, Andres Freund wrote:
  It calls pg_flush_data inside of copy_file which does the
  posix_fadvise... So maybe just put the sync_file_range in pg_flush_data?
 
 The functions in fd.c aren't linked to initdb, so it's a challenge to
 share that code (I remember now: that's why I didn't use pg_flush_data
 originally). I don't see a clean way to resolve that, do you?
 
 Also, it seems that sync_file_range() doesn't help with creating a
 database much (on my machine), so it doesn't seem important to use it
 there.
 
 Right now I'm inclined to leave the patch as-is.
Fine with that, I wanted to bring it up and see it documented.

I have marked it with ready for committer. That committer needs to decide on -
N in the regression tests or not, but that shouldn't be much of a problem ;)

Cool!

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Alex Hunsaker
On Mon, Jun 18, 2012 at 1:30 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:

 Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:

 Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
 and SvPVUTF8() when turning a perl string into a cstring.

 Hmm, this patch belongs into back branches too, right?  Not just the
 current development tree?

(Have been out of town, sorry for the late reply)

Yeah, back to 9.1.



Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 10:17 AM, Andres Freund and...@2ndquadrant.com wrote:
 There are 70+ calls of malloc in the backend in the form of

 type* foo = malloc(sizeof(...));
 if(!foo)
   elog(ERROR, "could not allocate memory");

 which is a bit annoying to write at times. Would somebody argue against
 introducing a function that does the above named xmalloc() or malloc_or_die()?

I can't even find 70 malloc calls in the entire backend, let alone 70
with that pattern.  Still, I don't think malloc_or_error (not die)
would be a bad idea.

But the error should definitely be written as:

ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));

...not elog.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 07:35:53 PM Robert Haas wrote:
 On Tue, Jun 19, 2012 at 10:17 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  There are 70+ calls of malloc in the backend in the form of
  
  type* foo = malloc(sizeof(...));
  if(!foo)
    elog(ERROR, "could not allocate memory");
  
  which is a bit annoying to write at times. Would somebody argue against
  introducing a function that does the above named xmalloc() or
  malloc_or_die()?
 
 I can't even find 70 malloc calls in the entire backend, let alone 70
 with that pattern.  Still, I don't think malloc_or_error (not die)
 would be a bad idea.
$ ack '\bmalloc\s*\(' src/backend/|wc -l
70

10-15 or so of those are comments.

The 70+ came from me running on some development branch with commitfest 
patches applied...

 But the error should definitely be written as:
 
 ereport(ERROR,
 (errcode(ERRCODE_OUT_OF_MEMORY),
  errmsg("out of memory")));
 
 ...not elog.
Yes, definitely. Currently some of those locations (e.g. in xlog.c) are only 
protected by Asserts... 

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 I'm pretty sure that when I was using Sybase ASE the order for
 non-equal values was always predictable, and it behaved in the
 manner I describe below.  I'm less sure about any other product.
 
 Maybe it used a physical row identifier as a tie-breaker? Note
 that we use ItemPointers as a tie-breaker for sorting index
 tuples.
 
 I imagine that it was at least predictable among columns being
 sorted, if only because en_US.UTF-8 doesn't have any notion of
 equivalence (that is, it just so happens that there are no two
 strings that are equivalent but not bitwise equal). It would
 surely be impractical to do a comparison for the entire row, as
 that could be really expensive.
 
We weren't using en_US.UTF-8 collation (or any other proper
collation) on Sybase -- I'm not sure whether they even supported
proper collation sequences on the versions we used.  I'm thinking of
when we were using their case insensitive sorting.  I don't know
the implementation details, but the behavior was consistent with
including each character-based column twice: once in the requested
position in the ORDER BY clause but folded to a consistent case, and
again after all the columns in the ORDER BY clause in original form,
with C collation.
 
I wasn't aware that en_US.UTF-8 doesn't have equivalence without
equality.  I guess that surprising result in my last post is just
plain inevitable with that collation then.  Bummer.  Is there
actually anyone who finds that to be a useful behavior?  For a
collation which considered upper-case and lower-case to be
equivalent, would PostgreSQL sort as I wanted, or is it doing some
tie-break per column within equivalent values?
 
-Kevin



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-19 Thread Andres Freund
Hi,

The most important part, even for people not following my discussion with 
Robert is at the bottom where the possible wal decoding strategies are laid 
out.

On Tuesday, June 19, 2012 03:20:58 AM Robert Haas wrote:
 On Sat, Jun 16, 2012 at 7:43 AM, Andres Freund and...@2ndquadrant.com 
wrote:
   Hm. Yes, you could do that. But I have to say I don't really see a
   point. Maybe the fact that I do envision multimaster systems at some
   point is clouding my judgement though as its far less easy in that
   case.
  
  Why?  I don't think that particularly changes anything.
  
  Because it makes conflict detection very hard. I also don't think its a
  feature worth supporting. Whats the use-case of updating records you
  cannot properly identify?
 Don't ask me; I just work here.  I think it's something that some
 people want, though.  I mean, if you don't support replicating a table
 without a primary key, then you can't even run pgbench in a
 replication environment.
Well, I have no problem with INSERT-only tables not having a PK. And
pgbench_history is the only pgbench table that doesn't have a pkey? And that's
only truncated...

 Is that an important workload?  Well,
 objectively, no.  But I guarantee you that other people with more
 realistic workloads than that will complain if we don't have it.
 Absolutely required on day one?  Probably not.  Completely useless
 appendage that no one wants?  Not that, either.
Maybe. And I don't really care, so if others see that as important I am happy 
to appease them ;). 

  In my view, a logical replication solution is precisely one in which
  the catalogs don't need to be in sync.  If the catalogs have to be in
  sync, it's not logical replication.  ISTM that what you're talking
  about is sort of a hybrid between physical replication (pages) and
  logical replication (tuples) - you want to ship around raw binary
  tuple data, but not entire pages.
  Ok, that's a valid point. Simon argued at the cluster summit that
  everything thats not physical is logical. Which has some appeal because
  it seems hard to agree what exactly logical rep is. So definition by
  exclusion makes kind of sense ;)
 Well, the words are fuzzy, but I would define logical replication to
 be something which is independent of the binary format in which stuff
 gets stored on disk.  If it's not independent of the disk format, then
 you can't do heterogenous replication (between versions, or between
 products).  That precise limitation is the main thing that drives
 people to use anything other than SR in the first place, IME.
Not in mine. The main limitation I see is that you cannot write anything on 
the standby. Which sucks majorly for many things. Its pretty much impossible 
to fix that for SR outside of very limited cases.
While many scenarios don't need multimaster *many* need to write outside of 
the standby's replication set.

  I think what you categorized as hybrid logical/physical rep solves an
  important use-case thats very hard to solve at the moment. Before my
  2ndquadrant days I had several clients which had huge problems using the
  trigger-based solutions because their overhead simply was too big a
  burden on the master. They couldn't use SR either because every
  consuming database kept loads of local data.
  I think such scenarios are getting more and more common.

 I think this is to some extent true, but I also think you're
 conflating two different things.  Change extraction via triggers
 introduces overhead that can be eliminated by reconstructing tuples
 from WAL in the background rather than forcing them to be inserted
 into a shadow table (and re-WAL-logged!) in the foreground.  I will
 grant that shipping the tuple as a binary blob rather than as text
 eliminates additional overhead on both ends, but it also closes off a
 lot of important use cases.  As I noted in my previous email, I think
 that ought to be a performance optimization that we do, if at all,
 when it's been proven safe, not a baked-in part of the design.  Even a
 solution that decodes WAL to text tuples and ships those around and
 reinserts the via pure SQL should be significantly faster than the
 replication solutions we have today; if it isn't, something's wrong.
It's not only the logging side which is a limitation in today's replication 
scenarios. The apply side scales even worse because it's *very* hard to 
distribute it between multiple backends.

  The problem with that is it's going to be tough to make robust.  Users
  could easily end up with answers that are total nonsense, or probably
  even crash the server.
  Why?
 Because the routines that decode tuples don't include enough sanity
 checks to prevent running off the end of the block, or even the end of
 memory completely.  Consider a corrupt TOAST pointer that indicates
 that there is a gigabyte of data stored in an 8kB block.  One of the
 common symptoms of corruption IME is TOAST requests for -3 bytes of
 memory.
Yes, but we need to put 

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 18:57, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 We weren't using en_US.UTF-8 collation (or any other proper
 collation) on Sybase -- I'm not sure whether they even supported
 proper collation sequences on the versions we used.  I'm thinking of
 when we were using their case insensitive sorting.  I don't know
 the implementation details, but the behavior was consistent with
 including each character-based column twice: once in the requested
 position in the ORDER BY clause but folded to a consistent case, and
 again after all the columns in the ORDER BY clause in original form,
 with C collation.

 I wasn't aware that en_US.UTF-8 doesn't have equivalence without
 equality.

Not that many do. The underlying cause of the problem back in 2005 was
the tacit assumption that none do, which presumably we got away with
for a while. I mentioned Hungarian, but it happens a bit in Swedish
too.

PostgreSQL supported Unicode before 2005, when the tie-breaker was
introduced. I know at least one Swede who used Postgres95. I just took
a look at the REL6_4 branch, and it looks much the same in 1999 as it
did in 2005, in that there is no tie-breaker after the strcoll(). Now,
that being the case, and Hungarian in particular having a whole bunch
of these equivalencies, I have to wonder if the original complainant's
problem really was diagnosed correctly. It could have had something to
do with the fact that texteq() was confused about whether it reported
equality or equivalency - it may have taken that long for the (len1 !=
len2) fastpath thing (only holds for equality, not equivalence,
despite the fact that the 2005-era strcoll() call checks equivalence
within texteq()) to trip someone up, because texteq() would have
thereby given inconsistent answers in a very subtle way, that were not
correct either according to the Hungarian locale, nor according to
simple bitwise equality. That's mostly speculation, but the question
must be asked.

 I guess that surprising result in my last post is just
 plain inevitable with that collation then.  Bummer.  Is there
 actually anyone who finds that to be a useful behavior?

Looking at the details again and assuming a US locale, yeah, it is.
The substance of your complaint holds, though.

 For a collation which considered upper-case and lower-case to be
 equivalent, would PostgreSQL sort as I wanted, or is it doing some
 tie-break per column within equivalent values?

You could do that, and some people do use custom collations for
various reasons. That's obviously very much of minority interest
though. Most people will just use citext or something. However, since
citext is itself a client of varstr_cmp(), this won't help you.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Event Triggers reduced, v1

2012-06-19 Thread Robert Haas
On Fri, Jun 15, 2012 at 4:27 PM, Dimitri Fontaine
dfonta...@hi-media.com wrote:
 Allow me to open the new season of the DML trigger series, named
 pg_event_trigger. This first episode is all about setting up the drama,
 so that next ones make perfect sense.

Comments:

1. I still think we ought to get rid of the notion of BEFORE or AFTER
(i.e. pg_event_trigger.evttype) and just make that detail part of the
event name (e.g. pg_event_trigger.evtevent).  Many easily forseeable
event types will be more like "during" rather than "before" or
"after", and for those that do have a notion of before and after, we
can have two different event names and include the word "before" or
"after" there.  I am otherwise satisfied with the schema you've
chosen.

2. I think it's important to be able to add new types of event
triggers without creating excessive parser bloat.  I think it's
important to use some kind of generic syntax here which will be able
to apply to all types of triggers we may want to add, now or in the
future.  The easiest way to do that is to use literal syntax for the
list of command tags, rather than writing them out as key words: e.g.
'ALTER TABLE' rather than ALTER TABLE.  It's not quite as pretty, but
the savings in parser bloat and future code change seem well worth it.
 Or, alternatively, we could use identifier style, e.g. alter_table,
as I previously suggested.

3. The event trigger cache seems to be a few bricks shy of a load.
First, event_trigger_cache_is_stalled is mis-named; I think you mean
"stale", not "stalled".  Second, instead of setting that flag and then
rebuilding the cache when you see the flag set, how about just blowing
away the cache contents whenever you would have set the flag?  That
seems a whole lot simpler and cleaner, and removes the need for a
force_rebuild flag on BuildEventTriggerCache().  Third, ISTM that this
isn't going to work correctly if backend A performs an event after
backend B has built its cache.  To fix this, I think you need to rip
out all the places where you force a rebuild locally and instead use
something like CacheRegisterSyscacheCallback() to blow away the cache
whenever something changes; you might find it helpful to look at
attoptcache.c.

4. The documentation doesn't build.

openjade:reference.sgml:44:4:W: cannot generate system identifier for
general entity alterEventTrigger
openjade:reference.sgml:44:4:E: general entity alterEventTrigger not
defined and no default entity
openjade:reference.sgml:44:21:E: reference to entity
alterEventTrigger for which no system identifier could be generated
openjade:reference.sgml:44:3: entity was defined here
openjade:reference.sgml:86:4:W: cannot generate system identifier for
general entity createEventTrigger
openjade:reference.sgml:86:4:E: general entity createEventTrigger
not defined and no default entity
openjade:reference.sgml:86:22:E: reference to entity
createEventTrigger for which no system identifier could be generated
openjade:reference.sgml:86:3: entity was defined here
openjade:reference.sgml:125:4:W: cannot generate system identifier for
general entity dropEventTrigger
openjade:reference.sgml:125:4:E: general entity dropEventTrigger not
defined and no default entity
openjade:reference.sgml:125:20:E: reference to entity
dropEventTrigger for which no system identifier could be generated
openjade:reference.sgml:125:3: entity was defined here
openjade:catalogs.sgml:1868:35:E: character _ is not allowed in the
value of attribute ZONE
openjade:catalogs.sgml:1868:19:X: reference to non-existent ID
CATALOG-PG-EVENT_TRIGGER
openjade:trigger.sgml:43:47:X: reference to non-existent ID
SQL-CREATECOMMANDTRIGGER

5. In terms of a psql command, I think that \dev is both not very
mnemonic and, as you mentioned in the comment, possibly confusable
with SQL/MED.  If we're going to have a \d command for this, I suggest
something like \dy, which is not very mnemonic either but at least
seems unlikely to be confused with anything else.  Other things that
are unused include \dh, \dj, \dk, \dq, \dw, and \dz, if any of those
grab you, or a somewhat broader range of things (but still nothing
very memorable) if we include capital letters.  Or we could branch out
into punctuation, like \d -- or things that don't begin with the
letter d, but that doesn't seem like a particularly good idea.

6. In objectaddress.c, I think that get_object_address_event_trigger
should be eliminated in favor of an additional case in
get_object_address_unqualified.

7. There are many references to command triggers that still need to be
cleaned up.  trigger.sgml still makes reference to the name command
triggers.  plpgsql.sgml also contains vestiges of the command trigger
notation, and references to some TG_* variables that I don't think
exist in this version of the patch.  event_trigger.c is identified
(twice) as cmdtrigger.c in the file header comment.  The header
comment for InsertEventTriggerTuple makes reference to pg_cmdtrigger,
as does a line comment in 

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 I wasn't aware that en_US.UTF-8 doesn't have equivalence without
 equality.  I guess that surprising result in my last post is just
 plain inevitable with that collation then.  Bummer.  Is there
 actually anyone who finds that to be a useful behavior?  For a
 collation which considered upper-case and lower-case to be
 equivalent, would PostgreSQL sort as I wanted, or is it doing some
 tie-break per column within equivalent values?

Well, there are two different questions there.

As far as the overall structure of an ORDER BY or btree index is
concerned, all it knows is what the datatype comparator functions
tell it.  There is no such thing as a second pass to reconsider
values found to be equal; that's all the info there is.

As far as the behavior of the comparator function for text is concerned,
we choose to break ties reported by strcoll via strcmp.  But that's not
visible from outside the comparator, so there's no notion of an
equivalence behavior different from equality.

I'm not exactly convinced that we could have a second-pass tiebreak
mechanism without creating serious problems for btree indexes; in
particular it seems like ordering considering only some leading keys
might not match the actual index ordering.

regards, tom lane



Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 19:44, Peter Geoghegan pe...@2ndquadrant.com wrote:
 You could do that, and some people do use custom collations for
 various reasons. That's obviously very much of minority interest
 though. Most people will just use citext or something. However, since
 citext is itself a client of varstr_cmp(), this won't help you.

I spoke too soon - that would work fine in the U.S., since the text is
converted to lower case on-the-fly in a locale and collation aware
fashion.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar jun 19 11:36:41 -0400 2012:
 
 On Mon, Jun 18, 2012 at 3:30 PM, Alvaro Herrera
 alvhe...@commandprompt.com wrote:
  Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
 
  Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
  and SvPVUTF8() when turning a perl string into a cstring.
 
  Hmm, this patch belongs into back branches too, right?  Not just the
  current development tree?
 
 It seems like a bug fix to me.

That's what I thought.  I will commit to both branches soon, then.
Mind you, this should have been an open item, not a commitfest item.
(Actually not even an open item.  We should have committed it right
away.)

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 9:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 What is the latest theory on using int4 vs. int32 in C code?
 (equivalently int2, int16)

 I thought the general idea was to use int32 most places, but int4 in
 catalog declarations.  I don't think it's tremendously important if
 somebody uses the other though.

I concur with Peter that TMTOWTDI is not the right way to do this.  I
think we ought to get rid of int4 in code and use int32 everywhere.

 While we're at it, how do we feel about using C standard types like
 int32_t instead of (or initially in addition to) our own definitions?

 Can't get very excited about this either.  The most likely outcome of
 a campaign to substitute the standard types is that back-patching would
 become a truly painful activity.  IMO, anything that is going to result
 in tens of thousands of diffs had better have a more-than-cosmetic
 reason.  (That wouldn't apply if we only used int32_t in new code ...
 but then, instead of two approved ways to do it, there would be three.
 Which doesn't seem like it improves matters.)

On this one, I agree with you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 Adds a single and a double linked list which can easily embedded into other
 datastructures and can be used without any additional allocations.

dllist.h advertises that it's embeddable.  Can you use that instead,
or enhance it slightly to support what you want to do?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 07:24:13 PM Robert Haas wrote:
 On Tue, Jun 19, 2012 at 12:11 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Andres Freund and...@2ndquadrant.com writes:
  On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
  ...  (If you are thinking
  of something sufficiently high-level that merging could possibly work,
  then it's not WAL, and we shouldn't be trying to make the WAL
  representation cater for it.)
  
  Do you really see this as such a big problem?
  
  It looks suspiciously like I have a hammer, therefore every problem
  must be a nail.  I don't like the design concept of cramming logical
  replication records into WAL in the first place.
 
 Me, neither.  I think it's necessary to try to find a way of
 generating logical replication records from WAL.  But once generated,
 I think those records should form their own stream, independent of
 WAL.  If you take the contrary position that they should be included
 in WAL, then when you filter the WAL stream down to just the records
 of interest to logical replication, you end up with a WAL stream with
 holes in it, which is one of the things that Andres listed as an
 unresolved design problem in his original email.
Yes.

 Moreover, this isn't necessary at all for single-master replication,
 or even multi-source replication where each table has a single master.
  It's only necessary for full multi-master replication, which we have
 no consensus to include in core, and even if we did have a consensus
 to include it in core, it certainly shouldn't be the first feature we
 design.
Well, you can't blame a patch/prototype for trying to implement what it claims to 
implement ;)

More seriously: Even if we don't put MM in core I think putting the basis for 
it in core so that somebody can build such a solution reusing the existing 
infrastructure is a sensible idea. Imo the only thing that requires explicit 
support which is hard to add outside of core is prevention of loops (aka this 
patch). Everything else should be doable reusing the hopefully modular pieces.

  However, if we're dead set on doing it that way, let us put information
  that is only relevant to logical replication records into only the
  logical replication records.
 Right.  If we decide we need this, and if we did decide to conflate
 the WAL stream, both of which I disagree with as noted above, then we
 still don't need it on every record.  It would probably be sufficient
 for local transactions to do nothing at all (and we can implicitly
 assume that they have master node ID = local node ID) and transactions
 which are replaying remote changes to emit one record per XID per
 checkpoint cycle containing the remote node ID.
You've gone from a pretty trivial 150-line patch without any runtime/space 
overhead to something *considerably* more complex in that case though. 

You suddenly need relatively complex logic to remember which 
information you got for a certain xid (and forget that information afterwards), 
and whether you already logged that xid, and you need to decide about 
logging that information at multiple places.

Btw, what do you mean by conflating the stream? I don't really see that 
being proposed.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
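[Editorial sketch] For illustration, the per-XID bookkeeping being discussed could be modelled as a small set of already-logged XIDs that gets cleared at every checkpoint. This is only a toy sketch (fixed-size open-addressed table, no eviction handling, invented names), not anything from the patch:

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: remember which XIDs already carried an origin
 * record in the current checkpoint cycle.  Open addressing with linear
 * probing and no eviction: a real implementation would have to handle
 * a full table and use real TransactionId/checkpoint hooks. */
#define LOGGED_XID_SLOTS 1024	/* power of two */

static uint32_t logged_xids[LOGGED_XID_SLOTS];	/* 0 = empty slot */

/* Returns true the first time an XID is seen this cycle, i.e. the
 * caller must emit the per-XID origin record now. */
static bool
origin_record_needed(uint32_t xid)
{
	uint32_t	slot = xid & (LOGGED_XID_SLOTS - 1);

	while (logged_xids[slot] != 0)
	{
		if (logged_xids[slot] == xid)
			return false;		/* already logged this cycle */
		slot = (slot + 1) & (LOGGED_XID_SLOTS - 1);
	}
	logged_xids[slot] = xid;
	return true;
}

/* Called when a checkpoint completes: forget everything. */
static void
origin_record_reset(void)
{
	memset(logged_xids, 0, sizeof(logged_xids));
}
```

Even this toy version shows where the complexity creeps in: the set must be consulted at every potential logging site and reset exactly once per checkpoint cycle.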



Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 20:11, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 9:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Eisentraut pete...@gmx.net writes:
 What is the latest theory on using int4 vs. int32 in C code?
 (equivalently int2, int16)

 I thought the general idea was to use int32 most places, but int4 in
 catalog declarations.  I don't think it's tremendously important if
 somebody uses the other though.

 I concur with Peter that TMTOWTDI is not the right way to do this.  I
 think we ought to get rid of int4 in code and use int32 everywhere.

 While we're at it, how do we feel about using C standard types like
 int32_t instead of (or initially in addition to) our own definitions?

 Can't get very excited about this either.  The most likely outcome of
 a campaign to substitute the standard types is that back-patching would
 become a truly painful activity.  IMO, anything that is going to result
 in tens of thousands of diffs had better have a more-than-cosmetic
 reason.  (That wouldn't apply if we only used int32_t in new code ...
 but then, instead of two approved ways to do it, there would be three.
 Which doesn't seem like it improves matters.)

 On this one, I agree with you.

Yeah. I find pgindent changes annoying when doing a git blame myself.
Now, granted, you can mostly take care of that by having the tool
ignore whitespace changes, but that doesn't always work perfectly, and
I haven't been able to figure out a better way of managing that.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 1:43 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Amit Kapila amit.kap...@huawei.com writes:
 AFAIR you can create pg_control from scratch already with pg_resetxlog.
 The hard part is coming up with values for the counters, such as the
 next WAL location.  Some of them such as next OID are pretty harmless
 if you don't guess right, but I'm worried that wrong next WAL could
 make things worse not better.

 I believe that if the WAL files are intact, as mentioned in Alvaro's mail, the
 proposed logic should generate correct values.

 I've got a problem with the assumption that, when pg_control is trash,
 megabytes or gigabytes of WAL can still be relied on completely.

 I'm almost inclined to suggest that we not get next-LSN from WAL, but
 by scanning all the pages in the main data store and computing the max
 observed LSN.  This is clearly not very attractive from a performance
 standpoint, but it would avoid the obvious failure mode where you lost
 some recent WAL segments along with pg_control.

I think it could be useful to have a tool that scans all the blocks
and computes that value, but I'd want it to just print the value out
and let me decide what to do about it.  There are cases where you
don't necessarily want to clobber pg_control, but you do have future
LSNs in your data file pages.  This can be either because the disk ate
your WAL, or because you didn't create recovery.conf, or because your
disk corrupted the LSNs on the data file pages.  I'd want a tool that
could be either run on an individual file, or recursively on a
directory.
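[Editorial sketch] The scanning idea could be reduced to a helper like the following, which computes the highest page-header LSN over a buffer of whole pages. It assumes 8 kB pages whose header begins with an 8-byte LSN stored as two little-endian 32-bit words (xlogid, then xrecoff); a real tool would use the server's actual page definitions and walk files and directories:

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 8192

/* Return the highest page-header LSN found in a buffer of whole pages.
 * The LSN is assumed to be the first 8 bytes of each page: xlogid then
 * xrecoff, each a little-endian uint32 (byte order and header layout
 * are platform- and version-dependent in reality). */
static uint64_t
max_page_lsn(const uint8_t *buf, size_t len)
{
	uint64_t	max_lsn = 0;

	for (size_t off = 0; off + PAGE_SIZE <= len; off += PAGE_SIZE)
	{
		const uint8_t *p = buf + off;
		uint32_t	xlogid = (uint32_t) p[0] | ((uint32_t) p[1] << 8) |
			((uint32_t) p[2] << 16) | ((uint32_t) p[3] << 24);
		uint32_t	xrecoff = (uint32_t) p[4] | ((uint32_t) p[5] << 8) |
			((uint32_t) p[6] << 16) | ((uint32_t) p[7] << 24);
		uint64_t	lsn = ((uint64_t) xlogid << 32) | xrecoff;

		if (lsn > max_lsn)
			max_lsn = lsn;
	}
	return max_lsn;
}
```

A standalone tool built on this would simply print the maximum, leaving the decision of what to do with it to the operator, as suggested above.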

In terms of the TODO item, I haven't yet heard anyone clearly state "I
wanted to use pg_controldata but it couldn't because X", so therefore we
need this patch.  Alvaro mentioned the case where pg_control is
missing altogether, but:

[rhaas pgsql]$ rm ~/pgdata/global/pg_control
[rhaas pgsql]$ postgres
postgres: could not find the database system
Expected to find it in the directory /Users/rhaas/pgdata,
but could not open file /Users/rhaas/pgdata/global/pg_control: No
such file or directory
[rhaas pgsql]$ pg_resetxlog ~/pgdata
pg_resetxlog: could not open file global/pg_control for reading: No
such file or directory
If you are sure the data directory path is correct, execute
  touch global/pg_control
and try again.
[rhaas pgsql]$ touch ~/pgdata/global/pg_control
[rhaas pgsql]$ pg_resetxlog ~/pgdata
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it
Guessed pg_control values:

First log file ID after reset:0
First log file segment after reset:   69
pg_control version number:922
Catalog version number:   201206141
Database system identifier:   5755831325641078488
Latest checkpoint's TimeLineID:   1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:  0/3
Latest checkpoint's NextOID:  1
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:3
Latest checkpoint's oldestXID's DB:   0
Latest checkpoint's oldestActiveXID:  0
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value

If these values seem acceptable, use -f to force reset.
[rhaas pgsql]$ pg_resetxlog -f ~/pgdata
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it
Transaction log reset
[rhaas pgsql]$ postgres
LOG:  database system was shut down at 2012-06-19 15:25:28 EDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

So I still don't understand what problem we're solving here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 3:03 PM, Alvaro Herrera
alvhe...@commandprompt.com wrote:
 That's what I thought.  I will commit to both branches soon, then.

I think there might be three branches involved.

 Mind you, this should have been an open item, not a commitfest item.
 (Actually not even an open item.  We should have committed it right
 away.)

True, but it's not a perfect world, and I didn't feel qualified to
commit it myself.  Adding it to the CommitFest at least prevented it from
getting lost forever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] return values of backend sub-main functions

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 7:31 AM, Peter Eisentraut pete...@gmx.net wrote:
 On ons, 2012-01-18 at 21:21 +0200, Peter Eisentraut wrote:
 On lör, 2012-01-07 at 16:41 -0500, Tom Lane wrote:
  Peter Eisentraut pete...@gmx.net writes:
   I suggest that we change PostgresMain(), PostmasterMain(), BackendRun(),
   WalSenderMain(), and WalSndLoop() to return void as well.
 
  I agree this code is not very consistent or useful, but one question:
  what should the callers do if one of these functions *does* return?

 I was thinking of a two-pronged approach:  First, add
 __attribute__((noreturn)) to the functions.  This will cause a suitable
 compiler to verify on a source-code level that nothing actually returns
 from the function.  And second, at the call site, put an abort();  /*
 not reached */.  Together, this will make the code cleaner and more
 consistent, and will also help the compiler out a bit about the control
 flow.

 Patch for 9.3 attached.

Seems reasonable on a quick read-through.
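[Editorial sketch] The two-pronged pattern described above might look roughly like this (GCC/Clang attribute syntax; the function names are made up, and longjmp stands in for the real never-returning main loop so the pattern can be demonstrated):

```c
#include <setjmp.h>
#include <stdlib.h>

/* Sketch of the two-pronged pattern: the callee is annotated noreturn
 * so the compiler can verify nothing falls off the end, and the call
 * site still guards with abort() in case the annotation is ever wrong. */

static jmp_buf mainloop_env;

__attribute__((noreturn)) static void
SubsystemMain(void)
{
	/* ... would run forever; escape via longjmp for this demo ... */
	longjmp(mainloop_env, 1);
}

static int
run(void)
{
	if (setjmp(mainloop_env) == 0)
	{
		SubsystemMain();
		abort();				/* not reached */
	}
	return 1;					/* reached only via longjmp */
}
```

The abort() call costs nothing in practice but keeps the control flow well-defined even if a future change accidentally makes the annotated function return.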

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 09:16:41 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  Adds a singly and a doubly linked list which can easily be embedded into
  other data structures and can be used without any additional allocations.
 
 dllist.h advertises that it's embeddable.  Can you use that instead,
 or enhance it slightly to support what you want to do?
Oh, wow. Didn't know that existed. I had $subject lying around from a 
play-around project, so I didn't look too hard.
Why is that code not used more widely? Quite a bit of our list usage should be 
replaced by embedding list elements in larger structs imo. There are also 
open-coded inline list manipulations around (check aset.c for example).

*looks*

Not really that happy with it though:

1. dllist.h has double the element overhead by having an inline value pointer 
(which is not needed when embedding) and a pointer to the list (which I have a 
hard time seeing as being useful)
2. only a doubly linked list; mine provides singly and doubly linked ones
3. missing macros to use when embedded in a larger struct (containerof() 
wrappers and for(...) support basically)
4. most things are external function calls...
5. far more branches/complexity in most of the functions. My 
implementation doesn't use any branches for the typical easy modifications 
(push, pop, remove element somewhere) and only one for the typical tests 
(empty, has-next, ...)

The performance and memory aspects were crucial for the aforementioned toy 
project (slab allocator for postgres). It's not that crucial for the applycache 
where the lists currently are mostly used, although it's also relatively 
performance sensitive and obviously does a lot of list manipulation/iteration.

If I had to decide I would add dllist.h's missing API to my 
implementation and then remove it. It's barely used - and only in an embedded 
fashion - as far as I can see.
I can understand though if that argument is met with doubt by others ;). If 
that's the way it has to go I would add some more convenience support for 
embedding data to dllist.h and settle for that.

Greetings,

Andres


-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
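[Editorial sketch] For readers unfamiliar with the embedding style under discussion, here is a minimal sketch of a circular doubly linked list with a sentinel head and a containerof()-style accessor. The names are illustrative only, not the patch's actual API; note how removal needs no branches because the sentinel guarantees every node has real neighbours:

```c
#include <stddef.h>

/* An embeddable doubly linked list node: placed inside the containing
 * struct, so no separate allocation is needed per element. */
typedef struct ilist_node
{
	struct ilist_node *prev;
	struct ilist_node *next;
} ilist_node;

/* Circular list with a sentinel head: empty means head points to itself. */
static void
ilist_init(ilist_node *head)
{
	head->prev = head->next = head;
}

static void
ilist_push_head(ilist_node *head, ilist_node *node)
{
	node->next = head->next;
	node->prev = head;
	head->next->prev = node;
	head->next = node;
}

/* Branch-free removal, possible because of the sentinel. */
static void
ilist_remove(ilist_node *node)
{
	node->prev->next = node->next;
	node->next->prev = node->prev;
}

/* Recover the containing struct from the embedded node. */
#define ilist_container(type, member, ptr) \
	((type *) ((char *) (ptr) - offsetof(type, member)))

typedef struct Item
{
	int			value;
	ilist_node	node;			/* embedded list membership */
} Item;
```

This illustrates points 1, 3 and 5 above: no value pointer or list back-pointer per node, containerof() support for embedding, and no branches on the modification paths.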



Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-19 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 From: Andres Freund and...@anarazel.de

 The previous coding could miss xlog writeouts at several places. E.g. when wal
 was written out by the background writer or even after a commit if
 synchronous_commit=off.
 This could lead to delays in sending data to the standby of up to 7 seconds.

 To fix this, move the responsibility for notification to the layer where the
 necessary information is actually present. We take some care not to do the
 notification while we hold contended locks like WALInsertLock or WALWriteLock.

I am not convinced that it's a good idea to wake up every walsender
every time we do XLogInsert().  XLogInsert() is a super-hot code path,
and adding more overhead there doesn't seem warranted.  We need to
replicate commit, commit prepared, etc. quickly, but why do we need to
worry about a short delay in replicating heap_insert/update/delete,
for example?  They don't really matter until the commit arrives.  7
seconds might be a bit long, but that could be fixed by decreasing the
polling interval for walsender to, say, a second.

When I was doing some testing recently, the case that was sort of
confusing was the replay of AccessExclusiveLocks.  The lock didn't show
up on the standby for many seconds after it had showed up on the
master.  But that's more a feature than a bug when you really think
about it - postponing the lock on the slave for as long as possible
just reduces the user impact of having to take them there at all.

Parenthetically, I find it difficult to extract inline patches.  No
matter whether I try using Gmail's show original or the web
site, something always seems to get garbled.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 3:48 PM, Andres Freund and...@2ndquadrant.com wrote:
 Why is that code not used more widely? Quite a bit of our list usage should be
 replaced by embedding list elements in larger structs imo. There are also 
 open-coded inline list manipulations around (check aset.c for example).

Because we've got a million+ lines of code and nobody knows where all
the bodies are buried.

 1. dllist.h has double the element overhead by having an inline value pointer
 (which is not needed when embedding) and a pointer to the list (which I have a
 hard time seing as being useful)
 2. only double linked list, mine provided single and double linked ones
 3. missing macros to use when embedded in a larger struct (containerof()
 wrappers and for(...) support basically)
 4. most things are external function calls...
 5. way much more branches/complexity in most of the functions. My
 implementation doesn't use any branches for the typical easy modifications
 (push, pop, remove element somewhere) and only one for the typical tests
 (empty, has-next, ...)

 The performance and memory aspects were crucial for the aforementioned toy
 project (slab allocator for postgres). Its not that crucial for the applycache
 where the lists currently are mostly used although its also relatively
 performance sensitive and obviously does a lot of list manipulation/iteration.

 If I had to decide I would add the missing api in dllist.h to my
 implementation and then remove it. Its barely used - and only in an embedded
 fashion - as far as I can see.
 I can understand though if that argument is met with doubt by others ;). If
 thats the way it has to go I would add some more convenience support for
 embedding data to dllist.h and settle for that.

I think it might be simpler to leave the name as Dllist and just
overhaul the implementation along the lines you suggest, rather than
replacing it with something completely different.  Mostly, I don't
want to add a third thing if we can avoid it, given that Dllist as it
exists today is used only lightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Marko Kreen
On Wed, Jun 13, 2012 at 2:28 PM, Andres Freund and...@2ndquadrant.com wrote:
 +/*
 + * removes a node from a list
 + * Attention: O(n)
 + */
 +static inline void ilist_s_remove(ilist_s_head *head,
 +                                  ilist_s_node *node)
 +{
 +       ilist_s_node *last = head->head;
 +       ilist_s_node *cur;
 +#ifndef NDEBUG
 +       bool found = false;
 +#endif
 +       while ((cur = last->next))
 +       {
 +               if (cur == node)
 +               {
 +                       last->next = cur->next;
 +#ifndef NDEBUG
 +                       found = true;
 +#endif
 +                       break;
 +               }
 +               last = cur;
 +       }
 +       assert(found);
 +}

This looks weird.

In a cyclic list, removal is:

  node->prev->next = node->next;
  node->next->prev = node->prev;

And thats it.

-- 
marko



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Andres Freund
Hi,

On Tuesday, June 19, 2012 09:59:48 PM Marko Kreen wrote:
 On Wed, Jun 13, 2012 at 2:28 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  +/*
  + * removes a node from a list
  + * Attention: O(n)
  + */
  +static inline void ilist_s_remove(ilist_s_head *head,
  +  ilist_s_node *node)
  +{
  +   ilist_s_node *last = head->head;
  +   ilist_s_node *cur;
  +#ifndef NDEBUG
  +   bool found = false;
  +#endif
  +   while ((cur = last->next))
  +   {
  +   if (cur == node)
  +   {
  +   last->next = cur->next;
  +#ifndef NDEBUG
  +   found = true;
  +#endif
  +   break;
  +   }
  +   last = cur;
  +   }
  +   assert(found);
  +}
 
 This looks weird.
 
  In a cyclic list, removal is:
 
   node->prev->next = node->next;
   node->next->prev = node->prev;
 
 And thats it.
That's the singly linked list, not the doubly linked one. That's why it has an 
O(n) warning tacked on...

The doubly linked one is just as you said:
/*
 * removes a node from a list
 */
static inline void ilist_d_remove(unused_attr ilist_d_head *head, ilist_d_node 
*node)
{
ilist_d_check(head);
node->prev->next = node->next;
node->next->prev = node->prev;
ilist_d_check(head);
}

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 09:55:30 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  From: Andres Freund and...@anarazel.de
  
  The previous coding could miss xlog writeouts at several places. E.g.
  when wal was written out by the background writer or even after a commit
  if synchronous_commit=off.
  This could lead to delays in sending data to the standby of up to 7
  seconds.
  
  To fix this, move the responsibility for notification to the layer where
  the necessary information is actually present. We take some care not to
  do the notification while we hold contended locks like WALInsertLock or
  WALWriteLock.
 
 I am not convinced that it's a good idea to wake up every walsender
 every time we do XLogInsert().  XLogInsert() is a super-hot code path,
 and adding more overhead there doesn't seem warranted.  We need to
  replicate commit, commit prepared, etc. quickly, but why do we need to
 worry about a short delay in replicating heap_insert/update/delete,
 for example?  They don't really matter until the commit arrives.  7
 seconds might be a bit long, but that could be fixed by decreasing the
 polling interval for walsender to, say, a second.
It's not woken up on every XLogInsert() call. It's only woken up if there was an 
actual disk write + fsync in there. That's exactly the point of the patch.
The wakeup rate is actually lower for synchronous_commit=on than before 
because then it unconditionally did a wakeup for every commit (and similar) 
and now only does that if something has been written + fsynced.

 Parenthetically, I find it difficult to extract inline patches.  No
  matter whether I try using Gmail's show original or the web
 site, something always seems to get garbled.
Will use git send-email --attach next time... Btw, git am should be able to 
extract the patches for you.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Marko Kreen
On Tue, Jun 19, 2012 at 11:02 PM, Andres Freund and...@2ndquadrant.com wrote:
 On Tuesday, June 19, 2012 09:59:48 PM Marko Kreen wrote:
 On Wed, Jun 13, 2012 at 2:28 PM, Andres Freund and...@2ndquadrant.com
 wrote:
  +/*
  + * removes a node from a list
  + * Attention: O(n)
  + */
  +static inline void ilist_s_remove(ilist_s_head *head,
  +                                  ilist_s_node *node)
  +{
  +       ilist_s_node *last = head->head;
  +       ilist_s_node *cur;
  +#ifndef NDEBUG
  +       bool found = false;
  +#endif
  +       while ((cur = last->next))
  +       {
  +               if (cur == node)
  +               {
  +                       last->next = cur->next;
  +#ifndef NDEBUG
  +                       found = true;
  +#endif
  +                       break;
  +               }
  +               last = cur;
  +       }
  +       assert(found);
  +}

 This looks weird.

 In a cyclic list, removal is:

   node->prev->next = node->next;
   node->next->prev = node->prev;

 And thats it.
 That's the singly linked list, not the doubly linked one. That's why it has an
 O(n) warning tacked on...

Oh, you have several list implementations there.
Sorry, I was just browsing and it caught my eye.

-- 
marko



Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Andres Freund
On Tuesday, June 19, 2012 09:58:43 PM Robert Haas wrote:
 On Tue, Jun 19, 2012 at 3:48 PM, Andres Freund and...@2ndquadrant.com 
wrote:
  Why is that code not used more widely? Quite a bit of our list usage
  should be replaced by embedding list elements in larger structs imo. There
  are also open-coded inline list manipulations around (check aset.c for
  example).
 Because we've got a million+ lines of code and nobody knows where all
 the bodies are buried.
Heh, yes ;). I've hit them several times, as you uncovered at least twice ;)

  1. dllist.h has double the element overhead by having an inline value
  pointer (which is not needed when embedding) and a pointer to the list
  (which I have a hard time seeing as being useful)
  2. only a doubly linked list; mine provides singly and doubly linked ones
  3. missing macros to use when embedded in a larger struct (containerof()
  wrappers and for(...) support basically)
  4. most things are external function calls...
  5. far more branches/complexity in most of the functions. My
  implementation doesn't use any branches for the typical easy
  modifications (push, pop, remove element somewhere) and only one for the
  typical tests (empty, has-next, ...)
  
  The performance and memory aspects were crucial for the aforementioned
  toy project (slab allocator for postgres). It's not that crucial for the
  applycache where the lists currently are mostly used, although it's also
  relatively performance sensitive and obviously does a lot of list
  manipulation/iteration.
  
  If I had to decide I would add dllist.h's missing API to my
  implementation and then remove it. It's barely used - and only in an
  embedded fashion - as far as I can see.
  I can understand though if that argument is met with doubt by others ;).
  If that's the way it has to go I would add some more convenience support
  for embedding data to dllist.h and settle for that.
 
 I think it might be simpler to leave the name as Dllist and just
 overhaul the implementation along the lines you suggest, rather than
 replacing it with something completely different.  Mostly, I don't
 want to add a third thing if we can avoid it, given that Dllist as it
 exists today is used only lightly.
Well, if it's the name, I have no problem with changing it, but I don't see how 
you can keep the API as it currently is and address my points.

If there is some buy-in I can try to go either way (keep the existing name, 
change the API, and adjust the callers; or just adjust the callers and throw 
away the old implementation). I just don't want to get into that only to find 
somebody isn't agreeing with the fundamental idea.

The most contentious point is probably relying on USE_INLINE being available 
everywhere, which I believe to be safe now that we have gotten rid of some 
platforms.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 Robert Haas wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:
 
 However, if we're dead set on doing it that way, let us put
 information that is only relevant to logical replication records
 into only the logical replication records.
 Right.  If we decide we need this, and if we did decide to
 conflate the WAL stream, both of which I disagree with as noted
 above, then we still don't need it on every record.  It would
 probably be sufficient for local transactions to do nothing at
 all (and we can implicitly assume that they have master node ID =
 local node ID) and transactions which are replaying remote
 changes to emit one record per XID per checkpoint cycle
 containing the remote node ID.
  You've gone from a pretty trivial 150-line patch without any
 runtime/space overhead to something *considerably* more complex in
 that case though. 
 
I think it might be worth it.  I've done a lot of MM replication,
and so far have not had to use a topology which allowed loops.  Not
only would you be reserving space in the WAL stream which was not
useful for those not using MM replication, you would be reserving it
when even many MM configurations would not need it.  Now you could
argue that the 16 bits you want to use are already there and are not
yet used for anything; but there are two counter-arguments to that:
you lose the opportunity to use them for something else, and you
might want more than 16 bits.
 
-Kevin



Re: [HACKERS] performance regression in 9.2 when loading lots of small tables

2012-06-19 Thread Robert Haas
On Mon, Jun 18, 2012 at 8:42 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 There was a regression introduced in 9.2 that affects the creation and
 loading of lots of small tables in a single transaction.

 It affects the loading of a pg_dump file which has a large number of
 small tables (10,000 schemas, one table per schema, 10 rows per
 table).  I did not test other schema configurations, so these
 specifics might not be needed to invoke the problem.

 It causes the loading of a dump with psql -1 -f  to run at half the
 previous speed.  Speed of loading without the -1 is not changed.

I tried to figure out why this was happening.  I tried it out on the
IBM POWER7 box after building from
f5297bdfe4c4a47376c41b96161fb55c2294a0b1 and it ran for about 14
minutes.  'time' reports that psql used 38 seconds of user time and 11
seconds of system time.  The remaining time was presumably spent
waiting for the backend and/or the kernel.

I ran the backend under strace -cf and it had this to say:

% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 45.31   39.888639  10   3836379   write
 18.80   16.553878   3   4929697   lseek
 13.51   11.890826   6   1906179 12819 read
 10.249.011690   7   1372354  5995 recv
  6.275.517929   5   1107766   brk
  2.071.818345  91 20068   fsync
  1.461.281999  15 87260   send
  1.151.015621  24 42527 11334 open

I did a separate run using perf and it had this to say:

Events: 1M cycles
+  13.18% postgres  postgres   [.] FlushRelationBuffers
+   9.65% postgres  postgres   [.] comparetup_index_btree
+   9.34%  swapper  [kernel.kallsyms]  [k] .pseries_dedicated_idle_sleep
+   4.41% postgres  postgres   [.] CopyReadLine
+   4.23% postgres  postgres   [.] tuplesort_heap_siftup
+   4.17% postgres  postgres   [.] LockReassignCurrentOwner
+   3.88% postgres  postgres   [.] pg_mbstrlen_with_len
+   2.74% postgres  postgres   [.] pg_verify_mbstr_len
+   2.46% postgres  postgres   [.] NextCopyFromRawFields
+   2.44% postgres  postgres   [.] btint4cmp
+   2.08% postgres  libc-2.14.90.so    [.] memcpy
+   2.06% postgres  postgres   [.] hash_seq_search
+   1.55% postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
+   1.29% postgres  libc-2.14.90.so    [.] __GI_strtoll_l_internal
+   1.16% postgres  postgres   [.] pg_utf_mblen

There are certainly opportunities for optimization there but it's sure
not obvious to me what it has to do with either of the commits you
mention.  I haven't actually verified that there's a regression on
this box as result of either of those commits, though: I just profiled
master.  Maybe I better go check that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Marko Kreen
On Mon, Jun 18, 2012 at 6:35 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 June 2012 19:28, Andres Freund and...@2ndquadrant.com wrote:
 This adds a new configuration parameter multimaster_node_id which determines
 the id used for wal originating in one cluster.

 Looks good and it seems this aspect at least is commitable in this CF.

 Design decisions I think we need to review are

 * Naming of field. I think origin is the right term, borrowing from Slony.

I have not read too deeply here, so maybe I am missing
some important detail, but the idea that users need
to coordinate an integer config parameter globally does not
sound too attractive to me.

Why not limit integers to local storage only and map
them to string idents on config, UI and transport?

-- 
marko



Re: [HACKERS] Near-duplicate RI NO ACTION and RESTRICT triggers

2012-06-19 Thread Dean Rasheed
On 19 June 2012 17:48, Tom Lane t...@sss.pgh.pa.us wrote:
 I think that the argument for having the RESTRICT triggers behave
 like this is that the SQL spec envisions the RESTRICT check occurring
 immediately when the individual PK row is updated/deleted, and so there
 would be no opportunity for another PK row to be updated into its place.
 (Or, in plainer English, RESTRICT should mean you can't modify this
 row's keys at all if it has dependents.)  Because we implement RESTRICT
 through an AFTER trigger that can't run earlier than end-of-statement,
 we can't exactly match the spec's semantics, but we can get fairly
 close so long as you don't think about what would be seen by
 e.g. user-written triggers executing during the statement.

 I'm happy with continuing to have this behavioral difference between
 the two sets of triggers, but wanted to throw it up for discussion:
 does anyone think it'd be better to apply ri_Check_Pk_Match in the
 RESTRICT triggers too?


In SQL:2008 they've re-worded the descriptions of these actions and
added an explicit note to clarify the intended difference:


— ON UPDATE RESTRICT: any change to a referenced column in the
referenced table is prohibited if there is a matching row.
— ON UPDATE NO ACTION (the default): there is no referential update
action; the referential constraint only specifies a constraint check.

NOTE 38 — Even if constraint checking is not deferred, ON UPDATE
RESTRICT is a stricter condition than ON UPDATE NO ACTION. ON UPDATE
RESTRICT prohibits an update to a particular row if there are any
matching rows; ON UPDATE NO ACTION does not perform its constraint
check until the entire set of rows to be updated has been processed.


and there's a similar note in the DELETE case. So I think the current
behaviour is correct, and there are probably genuine use cases for
both types of check.
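The difference is easy to see in a toy model (illustrative Python, not PostgreSQL code): RESTRICT checks each referenced row as it is changed, while NO ACTION checks only after the whole set of rows has been updated, so a key "swap" that leaves every referenced value in place passes NO ACTION but fails RESTRICT.

```python
def update_pk(pk_rows, fk_refs, new_key, mode):
    """Apply new_key(k) to every PK value under the given FK action mode.

    pk_rows: set of primary-key values; fk_refs: set of values some FK
    row references. Returns the updated set, or raises on a violation.
    """
    if mode == 'RESTRICT':
        # Any change to a key that has dependents is prohibited outright,
        # as if checked immediately when the individual row is updated.
        for k in pk_rows:
            if new_key(k) != k and k in fk_refs:
                raise ValueError('RESTRICT violation on key %r' % k)
    updated = {new_key(k) for k in pk_rows}
    if mode == 'NO ACTION':
        # The constraint check runs only after the entire statement.
        if not fk_refs <= updated:
            raise ValueError('NO ACTION violation: dangling reference')
    return updated
```

With PK values {1, 2} and an FK row referencing 1, the update k -> 3 - k swaps the two keys: NO ACTION succeeds because 1 still exists afterwards, while RESTRICT rejects the change to the referenced row.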

Regards,
Dean



Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Peter Eisentraut
On tis, 2012-06-19 at 09:33 -0400, Tom Lane wrote:
 Come to think of it, another possible factor is that LIKE can't use
 ordinary indexes on text if the locale isn't C.

But he reported that the plans are the same.




Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-19 Thread Kevin Grittner
Andres Freund and...@2ndquadrant.com wrote:
 
 The problem is just that to support basically arbitrary decoding
 requirements you need to provide at least those pieces of
 information in a transactionally consistent manner:
 * the data
 * table names
 * column names
 * type information
 * replication configuration
 
I'm not sure that the last one needs to be in scope for the WAL
stream, but the others I definitely agree eventually need to be
available to a logical transaction stream consumer.  You lay out the
alternative ways to get all of this pretty clearly, and I don't know
what the best answer is; it seems likely that there is not one best
answer.  In the long run, more than one of those options might need
to be supported, to support different environments.
 
As an initial implementation, I'm leaning toward the position that
requiring a hot standby or a catalog-only proxy is acceptable.  I
think that should allow an application to be written which emits
everything except the replication configuration.  That will allow us
to hook up everything we need at our shop.
 
-Kevin



Re: [HACKERS] performance regression in 9.2 when loading lots of small tables

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 4:33 PM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, Jun 18, 2012 at 8:42 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 There was a regression introduced in 9.2 that affects the creation and
 loading of lots of small tables in a single transaction.

 It affects the loading of a pg_dump file which has a large number of
 small tables (10,000 schemas, one table per schema, 10 rows per
 table).  I did not test other schema configurations, so these
 specifics might not be needed to invoke the problem.

 It causes the loading of a dump with psql -1 -f  to run at half the
 previous speed.  Speed of loading without the -1 is not changed.

 I tried to figure out why this was happening.  I tried it out on the
 IBM POWER7 box after building from
 f5297bdfe4c4a47376c41b96161fb55c2294a0b1 and it ran for about 14
 minutes.  'time' reports that psql used 38 seconds of user time and 11
 seconds of system time.  The remaining time was presumably spent
 waiting for the backend and/or the kernel.

 I ran the backend under strace -cf and it had this to say:

 % time     seconds  usecs/call     calls    errors syscall
 -- --- --- - - 
  45.31   39.888639          10   3836379           write
  18.80   16.553878           3   4929697           lseek
  13.51   11.890826           6   1906179     12819 read
  10.24    9.011690           7   1372354      5995 recv
  6.27    5.517929           5   1107766           brk
  2.07    1.818345          91     20068           fsync
  1.46    1.281999          15     87260           send
  1.15    1.015621          24     42527     11334 open

 I did a separate run using perf and it had this to say:

 Events: 1M cycles
 +  13.18%         postgres  postgres               [.] FlushRelationBuffers
 +   9.65%         postgres  postgres               [.] comparetup_index_btree
 +   9.34%          swapper  [kernel.kallsyms]      [k] .pseries_dedicated_idle_sleep
 +   4.41%         postgres  postgres               [.] CopyReadLine
 +   4.23%         postgres  postgres               [.] tuplesort_heap_siftup
 +   4.17%         postgres  postgres               [.] LockReassignCurrentOwner
 +   3.88%         postgres  postgres               [.] pg_mbstrlen_with_len
 +   2.74%         postgres  postgres               [.] pg_verify_mbstr_len
 +   2.46%         postgres  postgres               [.] NextCopyFromRawFields
 +   2.44%         postgres  postgres               [.] btint4cmp
 +   2.08%         postgres  libc-2.14.90.so        [.] memcpy
 +   2.06%         postgres  postgres               [.] hash_seq_search
 +   1.55%         postgres  [kernel.kallsyms]      [k] .__copy_tofrom_user
 +   1.29%         postgres  libc-2.14.90.so        [.] __GI_strtoll_l_internal
 +   1.16%         postgres  postgres               [.] pg_utf_mblen

 There are certainly opportunities for optimization there but it's sure
 not obvious to me what it has to do with either of the commits you
 mention.  I haven't actually verified that there's a regression on
 this box as result of either of those commits, though: I just profiled
 master.  Maybe I better go check that.

I built REL9_1_STABLE from commit
1643031e5fe02d2de9ae6b8f86ef9ffd09fe7d3f and it took 19m45.250s, even
slower than master.  So something is different in my environment
versus yours.

I ran this build with perf also, and got this:

38.77% postgres  postgres   [.] FlushRelationBuffers
 6.80%  swapper  [kernel.kallsyms]  [k] .pseries_dedicated_idle_sleep
 4.84% postgres  postgres   [.] comparetup_index_btree
 3.23% postgres  postgres   [.] CopyReadLine
 2.24% postgres  postgres   [.] tuplesort_heap_siftup
 2.23% postgres  postgres   [.] pg_mbstrlen_with_len
 2.08% postgres  postgres   [.] LockReassignCurrentOwner
 2.01% postgres  postgres   [.] pg_verify_mbstr_len
 1.90% postgres  postgres   [.] NextCopyFromRawFields
 1.62% postgres  postgres   [.] btint4cmp
 1.41% postgres  libc-2.14.90.so[.] memcpy
 1.39% postgres  postgres   [.] LWLockAcquire
 1.22% postgres  postgres   [.] LWLockRelease
 1.19% postgres  postgres   [.] pg_utf_mblen
 1.05% postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Simon Riggs
On 20 June 2012 04:58, Marko Kreen mark...@gmail.com wrote:
 On Mon, Jun 18, 2012 at 6:35 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On 13 June 2012 19:28, Andres Freund and...@2ndquadrant.com wrote:
 This adds a new configuration parameter multimaster_node_id which determines
 the id used for wal originating in one cluster.

 Looks good and it seems this aspect at least is commitable in this CF.

 Design decisions I think we need to review are

 * Naming of field. I think origin is the right term, borrowing from Slony.

 I have not read too deeply here, so maybe I am missing
 some important detail, but the idea that users need
 to coordinate an integer config parameter globally does not
 sound too attractive to me.

 Why not limit integers to local storage only and map
 them to string idents on config, UI and transport?

Yes, that can be done. This is simply how the numbers are used
internally, similar to oids.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Backport of fsync queue compaction

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith g...@2ndquadrant.com wrote:
 In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
 to try and compact the fsync queue when clients find it full.  There's no
 visible behavior change, just a substantial performance boost possible in
 the rare but extremely bad situations where the background writer stops
 doing fsync absorption.  I've been running that in production at multiple
 locations since practically the day it hit this mailing list, with backports
 all the way to 8.3 being common (and straightforward to construct).  I've
 never seen a hint of a problem with this new code.

I've been in favor of back-porting this for a while, so you'll get no
argument from me.

Anyone disagree?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, Jun 19, 2012 at 9:47 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I thought the general idea was to use int32 most places, but int4 in
 catalog declarations.  I don't think it's tremendously important if
 somebody uses the other though.

 I concur with Peter that TMTOWTDI is not the right way to do this.  I
 think we ought to get rid of int4 in code and use int32 everywhere.

I have not looked to see how many places do that.  If it's a reasonably
small number of places, I'm OK with getting rid of int4 at the C level.
(int2/int8 the same of course.)

If we are going to do that, though, we need to actually remove those
typedefs.  Leaving them around on the grounds that third-party code
might be using them will just allow cases to creep back in.

regards, tom lane



Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Robert Haas
On Tue, Jun 19, 2012 at 3:18 PM, Andres Freund and...@2ndquadrant.com wrote:
 More seriously: Even if we don't put MM in core I think putting the basis for
 it in core so that somebody can build such a solution reusing the existing
 infrastructure is a sensible idea. Imo the only thing that requires explicit
 support which is hard to add outside of core is prevention of loops (aka this
 patch). Everything else should be doable reusing the hopefully modular pieces.

I don't think prevention of loops is hard to do outside of core
either, unless you insist on tying logical replication so closely to
WAL that anyone doing MMR is necessarily getting the change stream
from WAL.  In fact, I'd go so far as to say that the ONLY part of this
that's hard to do outside of core is change extraction.  Even
low-level apply can be implemented as a loadable module.

 Right.  If we decide we need this, and if we did decide to conflate
 the WAL stream, both of which I disagree with as noted above, then we
 still don't need it on every record.  It would probably be sufficient
 for local transactions to do nothing at all (and we can implicitly
 assume that they have master node ID = local node ID) and transactions
 which are replaying remote changes to emit one record per XID per
 checkpoint cycle containing the remote node ID.
 You've gone from a pretty trivial 150-line patch without any runtime/space
 overhead to something *considerably* more complex in that case though.

 You suddenly need relatively complex logic to remember which
 information you got for a certain xid (and to forget it afterwards),
 whether you already logged that xid, and you need to decide about
 logging that information in multiple places.

You need a backend-local hash table inside the wal reader process, and
that hash table needs to map XIDs to node IDs.  And you occasionally
need to prune it, so that it doesn't eat too much memory.  None of
that sounds very hard.
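The map Robert describes can be sketched as follows (Python for illustration only; an actual implementation would be a dynahash table in the backend, and all names here are invented):

```python
class XidOriginMap:
    """Backend-local map from replayed XIDs to the node ID they
    originated on, with periodic pruning so it doesn't eat too much
    memory.

    Local transactions are never entered; lookups for unknown XIDs
    implicitly return the local node ID, matching the idea that local
    transactions need not be tagged at all.
    """

    LOCAL_NODE_ID = 0

    def __init__(self):
        self._map = {}  # xid -> origin node id

    def remember(self, xid, node_id):
        self._map[xid] = node_id

    def origin_of(self, xid):
        # Anything we never heard about is assumed to be local.
        return self._map.get(xid, self.LOCAL_NODE_ID)

    def prune(self, oldest_interesting_xid):
        # Forget entries for XIDs older than the oldest one that can
        # still appear in the change stream (e.g. once per checkpoint
        # cycle).
        self._map = {x: n for x, n in self._map.items()
                     if x >= oldest_interesting_xid}
```

Pruning once per checkpoint cycle keeps the table bounded by the number of remote XIDs in flight between checkpoints.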

 Btw, what do you mean with conflating the stream? I don't really see that
 being proposed.

It seems to me that you are intent on using the WAL stream as the
logical change stream.  I think that's a bad design.  Instead, you
should extract changes from WAL and then ship them around in a format
that is specific to logical replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Backport of fsync queue compaction

2012-06-19 Thread Christopher Browne
On Tue, Jun 19, 2012 at 5:39 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith g...@2ndquadrant.com wrote:
 In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
 to try and compact the fsync queue when clients find it full.  There's no
 visible behavior change, just a substantial performance boost possible in
 the rare but extremely bad situations where the background writer stops
 doing fsync absorption.  I've been running that in production at multiple
 locations since practically the day it hit this mailing list, with backports
 all the way to 8.3 being common (and straightforward to construct).  I've
 never seen a hint of a problem with this new code.

 I've been in favor of back-porting this for a while, so you'll get no
 argument from me.

 Anyone disagree?

I recall reviewing that; it seemed like quite a good change.  Me likes.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, How would the Lone Ranger handle this?



Re: [HACKERS] Backport of fsync queue compaction

2012-06-19 Thread Alvaro Herrera

Excerpts from Robert Haas's message of mar jun 19 17:39:46 -0400 2012:
 On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith g...@2ndquadrant.com wrote:
  In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
  to try and compact the fsync queue when clients find it full.  There's no
  visible behavior change, just a substantial performance boost possible in
  the rare but extremely bad situations where the background writer stops
  doing fsync absorption.  I've been running that in production at multiple
  locations since practically the day it hit this mailing list, with backports
  all the way to 8.3 being common (and straightforward to construct).  I've
  never seen a hint of a problem with this new code.
 
 I've been in favor of back-porting this for a while, so you'll get no
 argument from me.

+1.  I even thought we had already backported it and was surprised to
discover we hadn't, when we had this problem at a customer, not long
ago.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Transactions over pathological TCP connections

2012-06-19 Thread Leon Smith
On Tue, Jun 19, 2012 at 11:59 AM, Robert Haas robertmh...@gmail.com wrote:

 On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  The transaction would be committed before a command success report is
  delivered to the client, so I don't think delivered-and-not-marked is
  possible.

 ...unless you have configured synchronous_commit=off, or fsync=off.

 Or unless your disk melts into a heap of slag and you have to restore
 from backup.  You can protect against that last case using synchronous
 replication.



But hard disk failure isn't in the failure model I was concerned about. =)
To be perfectly honest, I'm not too concerned with either hard drive
failure or network failure, as we are deploying on a RAID 1+0 database
server talking to the client over a redundant LAN, and using asynchronous
(Slony) replication to an identical database server just in case.  No
single point of failure is a key philosophy of this app from top to bottom.
Like I said, this is mostly idle curiosity.

But I'm also accustomed to trying to get work done on shockingly unreliable
internet connections.   As a result, network failure is something I think
about quite a lot when writing networked applications.   So this is not
entirely idle curiosity either.

And thinking about this a bit more, it's clear that the database has to
commit before the result is sent, on the off chance that the transaction
fails and needs to be retried.  And an explicit transaction block isn't
really a solution either, because a BEGIN; SELECT dequeue_row() would get
the row to the client without marking it as taken, but the pathological
TCP disconnect could then attack the following COMMIT;, leaving the
client to think that the row has not actually been taken when it in
fact has.

It's not clear to me that this is even a solvable problem without
modifying the schema to include both a taken and a finished
processing state, and then letting elements be re-delivered after a
period of time.  But then a pathological demon with the power to cause
TCP disconnects could have a single element delivered and processed
multiple times.

In any case, thanks for the responses...

Best,
Leon


Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Simon Riggs
On 19 June 2012 14:03, Tom Lane t...@sss.pgh.pa.us wrote:

 Every WAL record?  Why in heck would you attach it to every record?
 Surely putting it in WAL page headers would be sufficient.  We could
 easily afford to burn a page switch (if not a whole segment switch)
 when changing masters.

This does appear to be a reasonable idea at first glance, since it
seems that each node has just a single node id, but that is not the
case.

As we pass changes around we maintain the same origin id for a change,
so there is a mix of origin node ids at the WAL record level, not the
page level. The concept of originating node id is essentially same as
that used in Slony.
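Simon's point can be made concrete with a tiny model (illustrative Python, invented names): once changes from several origins are replayed concurrently alongside local activity, records carrying different origin ids land on the same WAL page, so a single per-page origin field cannot describe them.

```python
WAL_PAGE_CAPACITY = 4  # records per page; tiny, for illustration only

def origins_per_page(records):
    """Split a stream of (origin_id, payload) records into pages and
    return the set of distinct origin ids appearing on each page."""
    pages = [records[i:i + WAL_PAGE_CAPACITY]
             for i in range(0, len(records), WAL_PAGE_CAPACITY)]
    return [{origin for origin, _ in page} for page in pages]

# Changes replayed from nodes 1 and 2 interleaved with local (node 0)
# activity, as happens when changes are passed around a multi-node setup:
stream = [(1, 'insert'), (0, 'update'), (2, 'insert'), (1, 'delete'),
          (2, 'update'), (0, 'insert')]
```

Any page containing more than one distinct origin id is one that a page-header field could not label correctly.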

 I'm against the idea of eating any spare space we have in WAL record
 headers for this purpose, anyway; there are likely to be more pressing
 needs in future.

Not sure what those pressing needs are, but I can't see any. What we
are doing here is fairly important, just not as important as crash
recovery. But then that has worked pretty much unchanged for some time
now.

I raised the possibility of having variable length headers, but there
is no requirement for that yet.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 I have not looked to see how many places do that.  If it's a
 reasonably small number of places, I'm OK with getting rid of int4
 at the C level.  (int2/int8 the same of course.)
 
$ find -name '*.h' -or -name '*.c' | egrep -v '/tmp_check/' | xargs cat \
  | egrep -c '\bint2\b'
178
  | egrep -c '\bint4\b'
570
  | egrep -c '\bint8\b'
212
 
-Kevin


