date:20120619

Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Martijn van Oosterhout

On Mon, Jun 18, 2012 at 09:30:14PM -0400, Robert Haas wrote:
> There might be something to the idea of demoting a few of the things
> we've traditionally had as NOTICEs, though.  IME, the following two
> messages account for a huge percentage of the chatter:
> 
> NOTICE:  CREATE TABLE will create implicit sequence "foo_a_seq" for
> serial column "foo.a"
> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
> "foo_pkey" for table "foo"

+1

Absolutely. And if you also suppress the output of the setval's
produced by pg_dump that would make a successful restore of a dump
produce barely any output at all with -q.  That would make errors
significantly more visible.

Not sure how to go about that though.

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> He who writes carelessly confesses thereby at the very outset that he does
> not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature

Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Peter Eisentraut

On tis, 2012-06-19 at 02:38 -0400, Tom Lane wrote:
> Peter Eisentraut  writes:
> > On mån, 2012-06-18 at 17:57 -0400, James Cloos wrote:
> >> I presume that lc_ctype is the significant difference?
> 
> > It certainly makes some difference, but it's a bit shocking that
> makes
> > things that much slower.
> 
> If James is testing text-comparison-heavy operations, it doesn't seem
> shocking in the least.  strcoll() in most non-C locales is a pig. 

Ah yes, of course, having lc_ctype != C also selects strcoll instead of
strcmp.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 12:47:54 AM Christopher Browne wrote:
> On Mon, Jun 18, 2012 at 11:50 AM, Andres Freund  
wrote:
> > Hi Simon,
> > 
> > On Monday, June 18, 2012 05:35:40 PM Simon Riggs wrote:
> >> On 13 June 2012 19:28, Andres Freund  wrote:
> >> > This adds a new configuration parameter multimaster_node_id which
> >> > determines the id used for wal originating in one cluster.
> >> 
> >> Looks good and it seems this aspect at least is commitable in this CF.
> > 
> > I think we need to agree on the parameter name. It currently is
> > 'multimaster_node_id'. In the discussion with Steve we got to
> > "replication_node_id". I don't particularly like either.
> > 
> > Other suggestions?
> I wonder if it should be origin_node_id?  That is the term Slony uses.
I think in the slony context its clear that thats related to replication. Less 
so in a general postgres guc. So maybe replication_origin_node_id?

> >> * Size of field. 16 bits is enough for 32,000 master nodes, which is
> >> quite a lot. Do we need that many? I think we may have need for a few
> >> flag bits, so I'd like to reserve at least 4 bits for flag bits, maybe
> >> 8 bits. Even if we don't need them in this release, I'd like to have
> >> them. If they remain unused after a few releases, we may choose to
> >> redeploy some of them as additional nodeids in future. I don't foresee
> >> complaints that 256 master nodes is too few anytime soon, so we can
> >> defer that decision.
> > 
> > I wished we had some flag bits available before as well. I find 256 nodes
> > a pretty low value to start with though, 4096 sounds better though, so I
> > would be happy with 4 flag bits. I think for cascading setups and such
> > you want to add node ids for every node, not only masters...
> Even though the number of nodes that can reasonably participate in
> replication is likely to be not too terribly large, it might be good
> to allow larger values, in case someone is keen on encoding something
> descriptive in the node number.
Well, having a large number space makes the wal records bigger which is not 
something I can see us getting through with. People have gone through 
considerable length to avoid that.
Perhaps we can have a mapping system catalog at some point that includes 
additional infromation to each node id like a name and where its at wrt 
replication...

> I recall the Slony-II project having a notion of attaching a permanent
> UUID-based node ID to each node.  As long as there is somewhere decent
> to find a symbolically significant node "name," I like the idea of the
> ID *not* being in a tiny range, and being UUID/OID-like...
I think adding 14 bytes (16bytes of an ooid - 2 bytes available) is out of 
question...

> > Any opinions from others on this?
> > 
> >> * Do we want origin_id as a parameter or as a setting in pgcontrol?
> >> IIRC we go to a lot of trouble elsewhere to avoid problems with
> >> changing on/off parameter values. I think we need some discussion to
> >> validate where that should live.
> > 
> > Hm. I don't really forsee any need to have it in pg_control. What do you
> > want to protect against with that?
> > It would need to be changeable anyway, because otherwise it would need to
> > become a parameter for initdb which would suck for anybody migrating to
> > use replication at some point.
> > 
> > Do you want to protect against problems in replication setups after
> > changing the value?
> In Slony, changing the node ID is Not Something That Is Done.  The ID
> is captured in *way* too many places to be able to have any hope of
> updating it in a coordinated way.  I should be surprised if it wasn't
> similarly troublesome here.
If you update you will need to reset consuming nodes, yes. Imo thats still 
something else than disallowing changing the parameter entirely. Requiring 
initdb for that seems like it would make experimentation too hard.

We need to allow at least changing the setting from no node id to an initial 
one.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] psql tab completion for GRANT role

2012-06-19 Thread Peter Eisentraut

On tor, 2012-06-14 at 13:38 -0400, Robert Haas wrote:
> On Sun, Jan 8, 2012 at 3:48 PM, Peter Eisentraut  wrote:
> > psql tab completion currently only supports the form GRANT privilege ON
> > something TO someone (and the analogous REVOKE), but not the form GRANT
> > role TO someone.  Here is a patch that attempts to implement the latter.
> 
> This seems to have fallen through the cracks.

No, it was committed in January.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

Hi,

On Tuesday, June 19, 2012 08:03:04 AM Tom Lane wrote:
> Andres Freund  writes:
> > On Monday, June 18, 2012 11:51:27 PM Daniel Farina wrote:
> >> What's the cost of going a lot higher?  Because if one makes enough
> >> numerical space available, one can assign node identities without a
> >> coordinator, a massive decrease in complexity.
> > 
> > It would increase the size of every wal record. We just have 16bit left
> > there by chance...
> 
> "Every WAL record"?  Why in heck would you attach it to every record?
> Surely putting it in WAL page headers would be sufficient.  We could
> easily afford to burn a page switch (if not a whole segment switch)
> when changing masters.
The idea is that you can have cascading, circular and whatever replication 
topologies if you include the "logical origin" of a wal causing action into 
it.
That is, if you have nodes A(1) and B(2) and a insert happens on A the wal 
records generated by that will get an xl_origin_id = 1 and when it will be 
decoded and replayed on B it will *also* get the id 1. Only when a change 
originally is generated on Bit will get xl_origin_id = 2.
That way you can easily have circular or hierarchical replication topologies 
including diamonds.

> I'm against the idea of eating any spare space we have in WAL record
> headers for this purpose, anyway; there are likely to be more pressing
> needs in future.
Every other solution to allowing this seems to be far more complicated than 
this, thats why I arrived at the conclusion that its a good idea.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 04:12:47 AM Steve Singer wrote:
> On 12-06-18 07:30 AM, Andres Freund wrote:
> > Hrmpf #666. I will go through through the series commit-by-commit again
> > to make sure everything compiles again. Reordinging this late definitely
> > wasn't a good idea...
> > 
> > I pushed a rebased version with all those fixups (and removal of the
> > zeroRecPtr patch).
> 
> Where did you push that rebased version to? I don't see an attachment,
> or an updated patch in the commitfest app and your repo at
> http://git.postgresql.org/gitweb/?p=users/andresfreund/postgres.git;a=summa
> ry hasn't been updated in 5 days.
To the 2ndquadrant internal repo. Which strangely doesn't help you. 
*Headdesk*. Pushed to the correct repo and manually verified.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2012-06-19 Thread Kyotaro HORIGUCHI

Thank you.

> What happens if the server skips an end-of-recovery checkpoint,
> is promoted to the master, runs some write transactions,
> crashes and restarts automatically before it completes
> checkpoint? In this case, the server needs to do crash recovery
> from the last checkpoint record with old timeline ID to the
> latest WAL record with new timeline ID. How does crash recovery
> do recovery beyond timeline?

Basically the same as archive recovery as far as I saw. It is
already implemented to work in that way.

After this patch applied, StartupXLOG() gets its
recoveryTargetTLI from the new field lastestTLI in the control
file instead of the latest checkpoint. And the latest checkpoint
record informs its TLI and WAL location as before, but its TLI
does not have a significant meaning in the recovery sequence.

Suggest the case following,

  |seg 1 | seg 2|
TLI 1 |...c..|00|
  C   P  X
TLI 2|00|

* C - checkpoint, P - promotion, X - crash just after here

This shows the situation that the latest checkpoint(restartpoint)
has been taken at TLI=1/SEG=1/OFF=4 and promoted at
TLI=1/SEG=2/OFF=5, then crashed just after TLI=2/SEG=2/OFF=8.
Promotion itself inserts no wal records but creates a copy of the
current segment for new TLI. the file for TLI=2/SEG=1 should not
exist. (Who will create it?)

The control file will looks as follows

latest checkpoint : TLI=1/SEG=1/OFF=4
latest TLI: 2

So the crash recovery sequence starts from SEG=1/LOC=4.
expectedTLIs will be (2, 1) so 1 will naturally be selected as
the TLI for SEG1 and 2 for SEG2 in XLogFileReadAnyTLI().

In the closer view, startup constructs expectedTLIs reading the
timeline hisotry file corresponds to the recoveryTargetTLI. Then
runs the recovery sequence from the redo point of the latest
checkpoint using WALs with the largest TLI - which is
distinguised by its file name, not header - within the
expectedTLIs in XLogPageRead(). The only difference to archive
recovery is XLogFileReadAnyTLI() reads only the WAL files already
sit in pg_xlog directory, and not reaches archive. The pages with
the new TLI will be naturally picked up as mentioned above in
this sequence and then will stop at the last readable record.

latestTLI field in the control file is updated just after the TLI
was incremented and the new WAL files with the new TLI was
created. So the crash recovery sequence won't stop before
reaching the WAL with new TLI disignated in the control file.


regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

== My e-mail address has been changed since Apr. 1, 2012.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Fabien COELHO




There might be something to the idea of demoting a few of the things
we've traditionally had as NOTICEs, though.  IME, the following two
messages account for a huge percentage of the chatter:

NOTICE:  CREATE TABLE will create implicit sequence "foo_a_seq" for
serial column "foo.a"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
"foo_pkey" for table "foo"


You can also add:

NOTICE:  CREATE TABLE / UNIQUE will create implicit index "foo_name_key" 
for table "foo"


I agree that these amount for most of the noise.

As create table does create other objects, I could understand that someone 
wants to hear about that. Maybe move them as "info" ? Otherwise, changing 
the default message level seems reasonable to me. What we really case when 
loading an SQL script is WARNING & ERROR, so that should be what is 
activated.



I'm not going to claim that nobody in the history of the world has
ever benefited from those notices ... but I would be willing to bet
that a large majority of the people, in a large majority of the cases,
do not care.  And getting rid of them would surely make warnings and
notices that might actually be of interest to the user a lot more
visible.


--
Fabien.

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Dave Page

On Tue, Jun 19, 2012 at 4:43 AM, Robert Haas  wrote:
>
>
> Please add this patch here so that it doesn't get lost in the shuffle:
>
> https://commitfest.postgresql.org/action/commitfest_view/open

Hmm, that raises an interesting question (though maybe I've just
missed this happening in the past). This is a bug fix, that we'll want
in the next back branch updates, not just 9.3. Sticking it in the open
commit fest means it may well not get looked at for 2-3 months or so.
How do we prevent that happening?

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-19 Thread Boszormenyi Zoltan


2012-06-18 19:46 keltezéssel, Alvaro Herrera írta:

Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 2012:

Hi,

another rebasing and applied the GIT changes in
ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the
Windows implementation of PGSemaphoreTimedLock.

Hi,

I gave the framework patch a look.  One thing I'm not sure about is the
way you've defined the API.  It seems a bit strange to have a nice and
clean, generic interface in timeout.h; and then have the internal
implementation of the API cluttered with details such as what to do when
the deadlock timeout expires.  Wouldn't it be much better if we could
keep the details of CheckDeadLock itself within proc.c, for example?


Do you mean adding a callback function argument to for enable_timeout()
would be better?


Same for the other specific Check*Timeout functions.  It seems to me
that the only timeout.c specific requirement is to be able to poke
base_timeouts[].indicator and fin_time.  Maybe that can get done in
timeout.c and then have the generic checker call the module-specific
checker.


Or instead of static functions, Check* functions can be external
to timeout.c. It seemed to be a good idea to move all the timeout
related functions to timeout.c.


   ... I see that you have things to do before and after setting
"indicator".  Maybe you could split the module-specific Check functions
in two and have timeout.c call each in turn.  Other than that I don't
see that this should pose any difficulty.

Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe
more than once per signal if you have multiple timeouts enabled.


Actually, GetCurrentTimestamp() is not called multiple times,
because only the first timeout source in the queue can get triggered.


   Maybe
it'd be better to call it in once handle_sig_alarm and then pass the
value down, if necessary (AFAICS if you restructure the code as I
suggest above, you don't need to get the value down the the
module-specific code).


But yes, this way it can be cleaner.



As for the Init*Timeout() and Destroy*Timeout() functions, they seem
a bit pointless -- I mean if they just always call the generic
InitTimeout and DestroyTimeout functions, why not just define the struct
in a way that makes the specific functions unnecessary?  Maybe you even
already have all that's necessary ... I think you just change
base_timeouts[i].timeout_destroy(false); to DestroyTimeout(i, false) and
so on.


OK, I will experiment with it.


Minor nitpick: the "Interface:" comment in timeout.c seems useless.
That kind of thing tends to get overlooked and obsolete over time.  We
have examples of such things all over the place.  I'd just rip it and
have timeout.h be the interface documentation.


OK.


BTW it doesn't seem that datatype/timestamp.h is really necessary in
timeout.h.


You are wrong. get_timeout_start() returns TimestampTz and it's
defined in datatype/timestamp.h.

Thanks for the review, I will send the new code soon.

Best regards,
Zoltán Böszörményi

--
--
Zoltán Böszörményi
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt, Austria
Web: http://www.postgresql-support.de
 http://www.postgresql.at/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-19 Thread Amit Kapila

> I'm almost inclined to suggest that we not get next-LSN from WAL, but
> by scanning all the pages in the main data store and computing the max
> observed LSN.  This is clearly not very attractive from a performance
> standpoint, but it would avoid the obvious failure mode where you lost
> some recent WAL segments along with pg_control.

If we follow this approach, what should be handling in case next-LSN is greater 
than
last checkpoint record location read from WAL files. Currently I can see 
StratUpXLOG throws
PANIC error in such situation.
I think this can happen in case of missing some recent WAL segments.

With Regards,
Amit Kapila.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is needed

2012-06-19 Thread Etsuro Fujita

Hi,

> -Original Message-
> From: Robert Haas [mailto:robertmh...@gmail.com]
> Sent: Tuesday, June 19, 2012 3:12 AM
> To: Ants Aasma
> Cc: Etsuro Fujita; Jay Levitt; Tom Lane; PostgreSQL-development; Francois
> Deliege
> Subject: Re: [HACKERS] [PATCH] Lazy hashaggregate when no aggregation is
> needed
>
> On Fri, Jun 15, 2012 at 9:22 AM, Ants Aasma  wrote:
> > Exactly. I think the first question for this patch should be whether
> > this use-case is worth the complexity of the patch. I can't imagine
> > any really compelling use cases that need an arbitrary distinct subset
> > of results.
>
> Me neither.
>
> > The original complaint on -performance [1], didn't specify a real
> > world use case, but it seemed to be a case of an ORM generating
> > suboptimal queries. On the other hand, the patch itself is in my
> > opinion rather simple, so it might be worth it.
>
> Yeah.
>
> > It has one outstanding issue, query_planner chooses the cheapest path
> > based on total cost. This can be suboptimal when that path happens to
> > have high startup cost. It seems to me that enabling the query_planner
> > to find the cheapest unsorted path returning a limited amount of
> > tuples would require some major surgery to the planner. To be clear,
> > this is only a case of missed optimization, not a regression.
>
> I'm confused by this remark, because surely the query planner does it this
> way only if there's no LIMIT.  When there is a LIMIT, we choose based on
> the startup cost plus the estimated fraction of the total cost we expect
> to pay based on dividing the LIMIT by the overall row count estimate.  Or
> is this not what you're talking about?

I think that Ants is pointing the way of estimating costs in
choose_hashed_grouping()/choose_hashed_distinct(), ie cost_agg() for
cheapest_path + hashagg, where the costs are calculated based on the total
cost only of cheapest_path.  I think that it might be good to do cost_agg()
for the discussed case with the AGG_SORTED strategy, not the AGG_HASHED
strategy.

> > It won't help set returning functions because the tuplestore for those
> > is fully materialized when the first row is fetched.
>
> That's surely a problem for another day.

OK

Best regards,
Etsuro Fujita




-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Alex Shulgin


Dave Page  writes:

> On Tue, Jun 19, 2012 at 4:43 AM, Robert Haas  wrote:
>>
>>
>> Please add this patch here so that it doesn't get lost in the shuffle:
>>
>> https://commitfest.postgresql.org/action/commitfest_view/open
>
> Hmm, that raises an interesting question (though maybe I've just
> missed this happening in the past). This is a bug fix, that we'll want
> in the next back branch updates, not just 9.3. Sticking it in the open
> commit fest means it may well not get looked at for 2-3 months or so.
> How do we prevent that happening?

In a real bug-tracking system we would create a new bug/ticket and set
it's target version to 'candidate for next minor release' or something
like that.  This way, if we don't release unless all targeted bugs are
resolved, this would be taken care of (hopefully.)

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Dave Page

On Tue, Jun 19, 2012 at 11:04 AM, Alex Shulgin  wrote:
>
> Dave Page  writes:
>
>> On Tue, Jun 19, 2012 at 4:43 AM, Robert Haas  wrote:
>>>
>>>
>>> Please add this patch here so that it doesn't get lost in the shuffle:
>>>
>>> https://commitfest.postgresql.org/action/commitfest_view/open
>>
>> Hmm, that raises an interesting question (though maybe I've just
>> missed this happening in the past). This is a bug fix, that we'll want
>> in the next back branch updates, not just 9.3. Sticking it in the open
>> commit fest means it may well not get looked at for 2-3 months or so.
>> How do we prevent that happening?
>
> In a real bug-tracking system we would create a new bug/ticket and set
> it's target version to 'candidate for next minor release' or something
> like that.  This way, if we don't release unless all targeted bugs are
> resolved, this would be taken care of (hopefully.)

Well yes, but the point is that that is not how the project works. I'm
asking how we do handle this problem at the moment, because I realised
I haven't seen this happen in the past (largely because I haven't been
paying attention).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Libxml2 load error on Windows

2012-06-19 Thread Alex Shulgin


Dave Page  writes:

> On Tue, Jun 19, 2012 at 11:04 AM, Alex Shulgin  wrote:
>>
>> In a real bug-tracking system we would create a new bug/ticket and set
>> it's target version to 'candidate for next minor release' or something
>> like that.  This way, if we don't release unless all targeted bugs are
>> resolved, this would be taken care of (hopefully.)
>
> Well yes, but the point is that that is not how the project works. I'm
> asking how we do handle this problem at the moment, because I realised
> I haven't seen this happen in the past (largely because I haven't been
> paying attention).

It only works as long as there is some mechanism to ensure that the bugs
which were not submitted to commitfest are looked at.  If there is no,
then it doesn't work actually.

So is there a real and reliable mechanism for that?

--
Alex

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] return values of backend sub-main functions

2012-06-19 Thread Peter Eisentraut

On ons, 2012-01-18 at 21:21 +0200, Peter Eisentraut wrote:
> On lör, 2012-01-07 at 16:41 -0500, Tom Lane wrote:
> > Peter Eisentraut  writes:
> > > I suggest that we change PostgresMain(), PostmasterMain(), BackendRun(),
> > > WalSenderMain(), and WalSndLoop() to return void as well.
> > 
> > I agree this code is not very consistent or useful, but one question:
> > what should the callers do if one of these functions *does* return?
> 
> I was thinking of a two-pronged approach:  First, add
> __attribute__((noreturn)) to the functions.  This will cause a suitable
> compiler to verify on a source-code level that nothing actually returns
> from the function.  And second, at the call site, put an abort();  /*
> not reached */.  Together, this will make the code cleaner and more
> consistent, and will also help the compiler out a bit about the control
> flow.

Patch for 9.3 attached.

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index c7d48e9..33c5a0a 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -173,7 +173,7 @@
 
 #ifdef EXEC_BACKEND
 	if (argc > 1 && strncmp(argv[1], "--fork", 6) == 0)
-		exit(SubPostmasterMain(argc, argv));
+		SubPostmasterMain(argc, argv); /* does not return */
 #endif
 
 #ifdef WIN32
@@ -189,14 +189,13 @@
 
 	if (argc > 1 && strcmp(argv[1], "--boot") == 0)
 		AuxiliaryProcessMain(argc, argv);		/* does not return */
-
-	if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)
-		exit(GucInfoMain());
-
-	if (argc > 1 && strcmp(argv[1], "--single") == 0)
-		exit(PostgresMain(argc, argv, get_current_username(progname)));
-
-	exit(PostmasterMain(argc, argv));
+	else if (argc > 1 && strcmp(argv[1], "--describe-config") == 0)
+		GucInfoMain();			/* does not return */
+	else if (argc > 1 && strcmp(argv[1], "--single") == 0)
+		PostgresMain(argc, argv, get_current_username(progname)); /* does not return */
+	else
+		PostmasterMain(argc, argv); /* does not return */
+	abort();		/* should not get here */
 }
 
 
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eeea933..0f8af3a 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -343,8 +343,8 @@ static void LogChildExit(int lev, const char *procname,
 			 int pid, int exitstatus);
 static void PostmasterStateMachine(void);
 static void BackendInitialize(Port *port);
-static int	BackendRun(Port *port);
-static void ExitPostmaster(int status);
+static void	BackendRun(Port *port) __attribute__((noreturn));
+static void ExitPostmaster(int status) __attribute__((noreturn));
 static int	ServerLoop(void);
 static int	BackendStartup(Port *port);
 static int	ProcessStartupPacket(Port *port, bool SSLdone);
@@ -491,7 +491,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 /*
  * Postmaster main entry point
  */
-int
+void
 PostmasterMain(int argc, char *argv[])
 {
 	int			opt;
@@ -1125,7 +1125,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 	 */
 	ExitPostmaster(status != STATUS_OK);
 
-	return 0;	/* not reached */
+	abort();	/* not reached */
 }
 
 
@@ -3295,7 +3295,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 		BackendInitialize(port);
 
 		/* And run the backend */
-		proc_exit(BackendRun(port));
+		BackendRun(port);
 	}
 #endif   /* EXEC_BACKEND */
 
@@ -3539,7 +3539,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
  *		Shouldn't return at all.
  *		If PostgresMain() fails, return status.
  */
-static int
+static void
 BackendRun(Port *port)
 {
 	char	  **av;
@@ -3610,7 +3610,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 	 */
 	MemoryContextSwitchTo(TopMemoryContext);
 
-	return (PostgresMain(ac, av, port->user_name));
+	PostgresMain(ac, av, port->user_name);
 }
 
 
@@ -3960,7 +3960,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
  * have been inherited by fork() on Unix.  Remaining arguments go to the
  * subprocess FooMain() routine.
  */
-int
+void
 SubPostmasterMain(int argc, char *argv[])
 {
 	Port		port;
@@ -4195,7 +4195,7 @@ static bool save_backend_variables(BackendParameters *param, Port *port,
 		proc_exit(0);
 	}
 
-	return 1;	/* shouldn't get here */
+	abort();	/* shouldn't get here */
 }
 #endif   /* EXEC_BACKEND */
 
diff --git a/src/backend/replication/walsender.c b/src/backend/replication/walsender.c
index 45a3b2e..a9a8689 100644
--- a/src/backend/replication/walsender.c
+++ b/src/backend/replication/walsender.c
@@ -121,7 +121,7 @@
 
 /* Prototypes for private functions */
 static bool HandleReplicationCommand(const char *cmd_string);
-static int	WalSndLoop(void);
+static void WalSndLoop(void) __attribute__((noreturn));
 static void InitWalSnd(void);
 static void WalSndHandshake(void);
 static void WalSndKill(int code, Datum arg);
@@ -136,7 +136,7 @@
 
 
 /* Main entry point for walsender process */
-int
+void
 Wa

Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Dickson S. Guedes

2012/6/18 Merlin Moncure :
> I can't help but wonder (having been down the contrib/core/extension
> road myself) if it isn't better to improve the facilities to register
> and search for qualified extensions (like Perl CPAN) so that people
> looking for code to improve their backends can find it.  That way,
> you're free to release/do xyz/abandon your project as you see fit
> without having to go through -hackers.  This should also remove a lot
> of the stigma with not being in core since if stock postgres
> installations can access the necessary modules via CREATE EXTENSION, I
> think it will make it easier for projects like this to get used with
> the additional benefit of decentralizing project management.

What about PGXN?

BTW, I'm with Robert about "we want to have at least one FDW in
core that actually talks to some other database server, rather than
just to a file, and it seems like pgsql_fdw is the obvious choice".

We have dblink as contrib, why not pgsql_fdw too?

Other FDWs could be available at PGXN, pgFoundry, Github, etc.

[]s
-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] use of int4/int32 in C code

2012-06-19 Thread Peter Eisentraut

What is the latest theory on using int4 vs. int32 in C code?

(equivalently int2, int16)

I had the idea that using int4 was sort of deprecated, and most code
uses int32, but I've come across several uses of int4 lately that looked
odd to me.

I think the main reason that we define int4 in C is for the
src/include/catalog/ files, but that won't work anymore if we ever want
to have an int8 column there.

Ideas:

* Leave it be.

* Replace int4 by int32 everywhere except in the catalog files.  Hope it
stays that way.

* Mark int4 as deprecated, change catalog files to use int32, adjust the
bki generation scripts.  (I think removing int4 might not be wise, as
external modules might be using it.)

While we're at it, how do we feel about using C standard types like
int32_t instead of (or initially in addition to) our own definitions?
These are well established by now and would help modernize our code and
the code of extensions a bit.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Tom Lane

Peter Eisentraut  writes:
> On tis, 2012-06-19 at 02:38 -0400, Tom Lane wrote:
>> If James is testing text-comparison-heavy operations, it doesn't seem
>> shocking in the least.  strcoll() in most non-C locales is a pig. 

> Ah yes, of course, having lc_ctype != C also selects strcoll instead of
> strcmp.

Come to think of it, another possible factor is that LIKE can't use
ordinary indexes on text if the locale isn't C.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Tom Lane

Peter Eisentraut  writes:
> What is the latest theory on using int4 vs. int32 in C code?
> (equivalently int2, int16)

I thought the general idea was to use int32 most places, but int4 in
catalog declarations.  I don't think it's tremendously important if
somebody uses the other though.

> While we're at it, how do we feel about using C standard types like
> int32_t instead of (or initially in addition to) our own definitions?

Can't get very excited about this either.  The most likely outcome of
a campaign to substitute the standard types is that back-patching would
become a truly painful activity.  IMO, anything that is going to result
in tens of thousands of diffs had better have a more-than-cosmetic
reason.  (That wouldn't apply if we only used int32_t in new code ...
but then, instead of two approved ways to do it, there would be three.
Which doesn't seem like it improves matters.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Merlin Moncure

On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas  wrote:
> On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure  wrote:
>> I can't help but wonder (having been down the contrib/core/extension
>> road myself) if it isn't better to improve the facilities to register
>> and search for qualified extensions (like Perl CPAN) so that people
>> looking for code to improve their backends can find it.  That way,
>> you're free to release/do xyz/abandon your project as you see fit
>> without having to go through -hackers.  This should also remove a lot
>> of the stigma with not being in core since if stock postgres
>> installations can access the necessary modules via CREATE EXTENSION, I
>> think it will make it easier for projects like this to get used with
>> the additional benefit of decentralizing project management.
>
> Well, if you're the type that likes to install everything via RPMs
> (and I am) then you wouldn't want this behavior, especially not
> automagically.  It seems to open up a host of security risks, too: I
> believe Tom has previously stated that Red Hat (and other distro)
> packaging guidelines forbid packages from installing software in
> places where they can then turn around and run it.  I suppose CPAN
> must have some sort of exception to this policy, though, so maybe it's
> not ironclad.

Huh? CPAN is just one example -- PyPM for python, npm for node etc
etc.  There is some distinction to that rule that is being missed so
that it doesn't apply to cases like this -- probably that it is
transacted by the user and/or requires su.

I think your points are supporting mine: the vast majority of postgres
administrators deal only with o/s packaged rpms and therefore with the
status quo are unable to utilize any extension that is not packaged
with contrib.  This means that there are two possible outcomes if you
want to write an extension:

1) get accepted into core
2) never get used

Given that running the gauntlet to #1 is not a particularly attractive
option (or even possible) in many cases for various reasons it tends
to discourage module development by 3rd parties.  There are several
very high quality extensions that could get a lot more exposure...for
example pl/sh.

But anyways, if you're happy with pgsql_fdw (aside: i looked at it,
and it's great!) in core, then by all means...

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-19 Thread Fujii Masao

On Tue, Jun 19, 2012 at 2:44 AM, Amit Kapila  wrote:
>> AFAIR you can create pg_control from scratch already with pg_resetxlog.
>> The hard part is coming up with values for the counters, such as the
>> next WAL location.  Some of them such as next OID are pretty harmless
>> if you don't guess right, but I'm worried that wrong next WAL could
>> make things worse not better.
>
> I believe if WAL files are proper as mentioned in Alvaro's mail, the purposed 
> logic should generate
> correct values.
> Do you see any problem in logic purposed in my original mail.
> Can I resume my work on this feature?

Maybe I'm missing your point, but... why don't you just use PITR to
recover from the corruption of pg_control?

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane

Andres Freund  writes:
> On Tuesday, June 19, 2012 08:03:04 AM Tom Lane wrote:
>> "Every WAL record"?  Why in heck would you attach it to every record?
>> Surely putting it in WAL page headers would be sufficient.

> The idea is that you can have cascading, circular and whatever replication 
> topologies if you include the "logical origin" of a wal causing action into 
> it.
> That is, if you have nodes A(1) and B(2) and a insert happens on A the wal 
> records generated by that will get an xl_origin_id = 1 and when it will be 
> decoded and replayed on B it will *also* get the id 1. Only when a change 
> originally is generated on Bit will get xl_origin_id = 2.

None of this explains to me why each individual WAL record would need to
carry a separate copy of the indicator.  We don't have multiple masters
putting records into the same WAL file, nor does it seem to me that it
could possibly be workable to merge WAL streams.  (If you are thinking
of something sufficiently high-level that merging could possibly work,
then it's not WAL, and we shouldn't be trying to make the WAL
representation cater for it.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Andres Freund

Hi,

There are 70+ calls of malloc in the backend in the form of

type* foo = malloc(sizeof(...));
if(!foo)
   elog(ERROR, "could not allocate memory");

which is a bit annoying to write at times. Would somebody argue against 
introducing a function that does the above named xmalloc() or malloc_or_die()?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 04:17:01 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Tuesday, June 19, 2012 08:03:04 AM Tom Lane wrote:
> >> "Every WAL record"?  Why in heck would you attach it to every record?
> >> Surely putting it in WAL page headers would be sufficient.
> > 
> > The idea is that you can have cascading, circular and whatever
> > replication topologies if you include the "logical origin" of a wal
> > causing action into it.
> > That is, if you have nodes A(1) and B(2) and a insert happens on A the
> > wal records generated by that will get an xl_origin_id = 1 and when it
> > will be decoded and replayed on B it will *also* get the id 1. Only when
> > a change originally is generated on Bit will get xl_origin_id = 2.
> 
> None of this explains to me why each individual WAL record would need to
> carry a separate copy of the indicator.  We don't have multiple masters
> putting records into the same WAL file, nor does it seem to me that it
> could possibly be workable to merge WAL streams.  (If you are thinking
> of something sufficiently high-level that merging could possibly work,
> then it's not WAL, and we shouldn't be trying to make the WAL
> representation cater for it.)
The idea is that if youre replaying changes on node A originating from node B 
you set the origin to *B* in the wal records that are generated during that. 
So when B, in a bidirectional setup, replays the changes that A has made it 
can simply ignore all changes which originated on itself.

That works rather simple & performant if you have a conflict avoidance scheme.

For many scenarios you need to be able to separate locally generated changes 
and changes that have been replayed from another node. Without support of 
something like this this is really hard to achieve.

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane

Andres Freund  writes:
> On Tuesday, June 19, 2012 04:17:01 PM Tom Lane wrote:
>> ...  (If you are thinking
>> of something sufficiently high-level that merging could possibly work,
>> then it's not WAL, and we shouldn't be trying to make the WAL
>> representation cater for it.)

> The idea is that if youre replaying changes on node A originating from node B
> you set the origin to *B* in the wal records that are generated during that. 
> So when B, in a bidirectional setup, replays the changes that A has made it 
> can simply ignore all changes which originated on itself.

This is most certainly not possible at the level of WAL.  As I said
above, we shouldn't be trying to shoehorn high level logical-replication
commands into WAL streams.  No good can come of confusing those concepts.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Tom Lane

Andres Freund  writes:
> There are 70+ calls of malloc in the backend in the form of

> type* foo = malloc(sizeof(...));
> if(!foo)
>elog(ERROR, "could not allocate memory");

> which is a bit annoying to write at times. Would somebody argue against 
> introducing a function that does the above named xmalloc() or malloc_or_die()?

99% of the time, you should be using palloc if that's the behavior you
want.  I think most of the malloc calls are in places where we want a
bit more control over the error response.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan

So, just to give a bit more weight to my argument that we should
recognise that equivalent strings ought to be treated identically, I
direct your attention to conformance requirement C9 of Unicode 3.0:

http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0

This "requires that canonically equivalent text be treated
identically. See also C10 and several other places in The Unicode
Standard."

This page, "DETERMINISTIC SORTING", was also interesting:

http://www.unicode.org/notes/tn9/

The same hack that we use in varstr_cmp() appears as psuedo-code,
under "Forcing Deterministic Comparisons". I'm not sure in what
general sense what *they'd* call a non-deterministic comparison (i.e.
plain old strcoll()) is actually non-deterministic. Clearly a
"non-deterministic comparison" is deterministic to the extent that
given the same two binary strings and encoding and collation, the
comparison function will always give the same answer.

The article goes on to describe how we could bolt-on a strxfrm() type
value to a string, to ensure "deterministic comparison" (i.e.
identical behaviour to Postgres master) with pre-processing. Tom did
think that this might still be worth it. I'd rather avoid it.

The article goes on to say of so-called deterministic comparisons:
"However, a deterministic comparison is generally not best practice.
First, it has a certain performance cost in comparison, and a quite
substantial impact on sort key size. (For example, [ICU]
language-sensitive sort keys are generally about the size of the
original string, so appending a copy of the original string generally
doubles the size of the sort key.)  A database using these sort keys
can have substantially increased disk footprint and memory footprint,
and consequently reduced performance."

I note also that the The Single UNIX Specification, Version 2 says of
strcoll(): "The strxfrm() and strcmp() functions should be used for
sorting large lists". Why has it taken us this long to research this?
I am aware that Greg Stark suggested it back in 2005, but I doubt that
that was the first time.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Tuesday, June 19, 2012 04:17:01 PM Tom Lane wrote:
> >> ...  (If you are thinking
> >> of something sufficiently high-level that merging could possibly work,
> >> then it's not WAL, and we shouldn't be trying to make the WAL
> >> representation cater for it.)
> > 
> > The idea is that if youre replaying changes on node A originating from
> > node B you set the origin to *B* in the wal records that are generated
> > during that. So when B, in a bidirectional setup, replays the changes
> > that A has made it can simply ignore all changes which originated on
> > itself.
> 
> This is most certainly not possible at the level of WAL. 
Huh? This isn't used during normal crash-recovery replay. The information is 
used when decoding the wal into logical changes and applying those. Its just a 
common piece of information thats needed for a large number of records. 

Alternatively it could be added to all the records that need it, but that 
would smear the necessary logic - which is currently trivial - over more of 
the backend. And it would increase the actual size of wal which this one did 
not.

> As I said above, we shouldn't be trying to shoehorn high level logical-
> replication commands into WAL streams.  No good can come of confusing those
> concepts.
Its not doing anything high-level in there? All that patch does is embedding 
one single piece of information in previously unused space.

I can follow the argument that you do not want *any* logical information in 
the wal. But as I said in the patchset-introducing email: I don't really see 
an alternative. Otherwise we would just duplicate all the locking/scalability 
issues of xlog as well as the amount of writes.
This is, besides logging some more informations when wal_level = logical in 
some particular records (HEAP_UPDATE|DELETE and ensuring fpw's don't remove 
the record data in HEAP_(INSERT|UPDATE|DELETE) in patch 07/16 the only change 
that I really forsee being needed for doing the logical stuff.

Do you really see this as such a big problem?

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 04:38:56 PM Tom Lane wrote:
> Andres Freund  writes:
> > There are 70+ calls of malloc in the backend in the form of
> > 
> > type* foo = malloc(sizeof(...));
> > if(!foo)
> > 
> >elog(ERROR, "could not allocate memory");
> > 
> > which is a bit annoying to write at times. Would somebody argue against
> > introducing a function that does the above named xmalloc() or
> > malloc_or_die()?
> 
> 99% of the time, you should be using palloc if that's the behavior you
> want.  I think most of the malloc calls are in places where we want a
> bit more control over the error response.
There are surprisingly many calls that just have the above logic without a 
more elaborate error message. Its mostly cases which allocate memory just once 
and never release it again to avoid having huge static data around for 
processes not using that part of the code.
True enough, most of those could use TopMemoryContext, its a rather 
established pattern not to do so in those cases though...

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-19 Thread Alvaro Herrera


Excerpts from Boszormenyi Zoltan's message of mar jun 19 04:44:35 -0400 2012:
> 
> 2012-06-18 19:46 keltezéssel, Alvaro Herrera írta:
> > Excerpts from Boszormenyi Zoltan's message of vie may 11 03:54:13 -0400 
> > 2012:
> >> Hi,
> >>
> >> another rebasing and applied the GIT changes in
> >> ada8fa08fc6cf5f199b6df935b4d0a730aaa4fec to the
> >> Windows implementation of PGSemaphoreTimedLock.
> > Hi,
> >
> > I gave the framework patch a look.  One thing I'm not sure about is the
> > way you've defined the API.  It seems a bit strange to have a nice and
> > clean, generic interface in timeout.h; and then have the internal
> > implementation of the API cluttered with details such as what to do when
> > the deadlock timeout expires.  Wouldn't it be much better if we could
> > keep the details of CheckDeadLock itself within proc.c, for example?
> 
> Do you mean adding a callback function argument to for enable_timeout()
> would be better?

Well, that's not exactly what I was thinking, but maybe your idea is
better than what I had in mind.

> > Same for the other specific Check*Timeout functions.  It seems to me
> > that the only timeout.c specific requirement is to be able to poke
> > base_timeouts[].indicator and fin_time.  Maybe that can get done in
> > timeout.c and then have the generic checker call the module-specific
> > checker.
> 
> Or instead of static functions, Check* functions can be external
> to timeout.c. It seemed to be a good idea to move all the timeout
> related functions to timeout.c.

Yeah, except that they play with members of the timeout_params struct,
which hopefully does not need to be exported.  So if you can just move
(that is, put back in their original places) the portions that do not
touch that strcut, it'd be great.

> > Also, I see you're calling GetCurrentTimestamp() in the checkers; maybe
> > more than once per signal if you have multiple timeouts enabled.
> 
> Actually, GetCurrentTimestamp() is not called multiple times,
> because only the first timeout source in the queue can get triggered.

I see.

> > BTW it doesn't seem that datatype/timestamp.h is really necessary in
> > timeout.h.
> 
> You are wrong. get_timeout_start() returns TimestampTz and it's
> defined in datatype/timestamp.h.

Oh, right.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Kohei KaiGai

2012/6/19 Merlin Moncure :
> On Mon, Jun 18, 2012 at 11:01 PM, Robert Haas  wrote:
>> On Mon, Jun 18, 2012 at 12:24 PM, Merlin Moncure  wrote:
>>> I can't help but wonder (having been down the contrib/core/extension
>>> road myself) if it isn't better to improve the facilities to register
>>> and search for qualified extensions (like Perl CPAN) so that people
>>> looking for code to improve their backends can find it.  That way,
>>> you're free to release/do xyz/abandon your project as you see fit
>>> without having to go through -hackers.  This should also remove a lot
>>> of the stigma with not being in core since if stock postgres
>>> installations can access the necessary modules via CREATE EXTENSION, I
>>> think it will make it easier for projects like this to get used with
>>> the additional benefit of decentralizing project management.
>>
>> Well, if you're the type that likes to install everything via RPMs
>> (and I am) then you wouldn't want this behavior, especially not
>> automagically.  It seems to open up a host of security risks, too: I
>> believe Tom has previously stated that Red Hat (and other distro)
>> packaging guidelines forbid packages from installing software in
>> places where they can then turn around and run it.  I suppose CPAN
>> must have some sort of exception to this policy, though, so maybe it's
>> not ironclad.
>
> Huh? CPAN is just one example -- PyPM for python, npm for node etc
> etc.  There is some distinction to that rule that is being missed so
> that it doesn't apply to cases like this -- probably that it is
> transacted by the user and/or requires su.
>
> I think your points are supporting mine: the vast majority of postgres
> administrators deal only with o/s packaged rpms and therefore with the
> status quo are unable to utilize any extension that is not packaged
> with contrib.  This means that there are two possible outcomes if you
> want to write an extension:
>
> 1) get accepted into core
> 2) never get used
>
> Given that running the gauntlet to #1 is not a particularly attractive
> option (or even possible) in many cases for various reasons it tends
> to discourage module development by 3rd parties.  There are several
> very high quality extensions that could get a lot more exposure...for
> example pl/sh.
>
> But anyways, if you're happy with pgsql_fdw (aside: i looked at it,
> and it's great!) in core, then by all means...
>
Let me push the pgsql_fdw in core from different perspective.

Right now, FDW is a feature that will take many enhancement in
the near future like join-pushdown, writable APIs and so on.
If we would not have a FDW extension in core that communicate
with actual RDBMS, instead of flat files, it makes our development
efforts more complex. Please assume a scenario people alwats
tries to submit two patches; one for the core, and the other for
a particular out-of-tree extension.
I believe it is reasonable choice to have a FDW for PostgreSQL
in core, rather than MySQL or Oracle. We can use this extension
to show how new feature works, whenever we try to extend FDW
APIs.

BTW, Hanada-san-
It seems to me the expected result of regression test was not
included in this patch. Please submit it again-

Thanks,
-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner

Peter Geoghegan  wrote:
 
> So, just to give a bit more weight to my argument that we should
> recognise that equivalent strings ought to be treated identically
 
Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a while: in some other
systems I've used, the "tie-breaker" comparison for equivalent
values comes after equivalence sorting on *all* sort keys, rather
than *each* sort key.  For example, this much makes sense with
lc_collate = 'en_US.UTF-8':
 
test=# create table c (last_name text not null, first_name text);
CREATE TABLE
test=# insert into c values ('smith', 'bob'), ('smith', 'peter'),
('SMITH', 'EDWARD');
INSERT 0 3
test=# select * from c order by 2;
 last_name | first_name 
---+
 smith | bob
 SMITH | EDWARD
 smith | peter
(3 rows)
 
This seems completely wrong:
 
test=# select * from c order by 1,2;
 last_name | first_name 
---+
 smith | bob
 smith | peter
 SMITH | EDWARD
(3 rows)
 
I have seen other databases which get it in the order I would expect
-- where the C compare only matters within groups of equivalent
rows.  It seems that PostgreSQL effectively orders by:
 
  last_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'en_US.UTF-8'
  first_name using collation 'C'
 
while some other products order by:
 
  last_name using collation 'en_US.UTF-8'
  first_name using collation 'en_US.UTF-8'
  last_name using collation 'C'
  first_name using collation 'C'
 
I'm sure the latter is harder to do and slower to execute; but the
former just doesn't seem defensible as correct.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] initdb and fsync

2012-06-19 Thread David Fetter

On Mon, Jun 18, 2012 at 09:34:30PM +0300, Peter Eisentraut wrote:
> On mån, 2012-06-18 at 18:05 +0200, Andres Freund wrote:
> > - defaulting to initdb -N in the regression suite is not a good imo,
> > because that way the buildfarm won't catch problems in that area...
> > 
> The regression test suite also starts postgres with fsync off.  This is
> good, and I don't want to have more overhead in the tests.  It's not
> like we already have awesome test coverage for OS interaction.

What sorts of coverage would we want?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] libpq compression

2012-06-19 Thread Robert Haas

On Mon, Jun 18, 2012 at 1:42 PM, Martijn van Oosterhout
 wrote:
> On Sun, Jun 17, 2012 at 12:29:53PM -0400, Tom Lane wrote:
>> The fly in the ointment with any of these ideas is that the "configure
>> list" is not a list of exact cipher names, as per Magnus' comment that
>> the current default includes tests like "!aNULL".  I am not sure that
>> we know how to evaluate such conditions if we are applying an
>> after-the-fact check on the selected cipher.  Does OpenSSL expose any
>> API for evaluating whether a selected cipher meets such a test?
>
> I'm not sure whether there's an API for it, but you can certainly check
> manually with "openssl ciphers -v", for example:
>
> $ openssl ciphers -v 'ALL:!ADH:RC4+RSA:+HIGH:+MEDIUM:+LOW:+SSLv2:+EXP'
> NULL-SHA                SSLv3 Kx=RSA      Au=RSA  Enc=None      Mac=SHA1
> NULL-MD5                SSLv3 Kx=RSA      Au=RSA  Enc=None      Mac=MD5
>
> ...etc...
>
> So unless the openssl includes the code twice there must be a way to
> extract the list from the library.

There doubtless is, but I'd being willing to wager that you won't be
able to figure out the exact method without reading the source code
for 'opennssl ciphers' to see how it was done there, and most likely
you'll find that at least one of the functions they use has no man
page.  Documentation isn't their strong point.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Robert Haas

On Mon, Jun 18, 2012 at 3:30 PM, Alvaro Herrera
 wrote:
> Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
>
>> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
>> and SvPVUTF8() when turning a perl string into a cstring.
>
> Hmm, this patch belongs into back branches too, right?  Not just the
> current development tree?

It seems like a bug fix to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format changes

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 10:14:08 AM Heikki Linnakangas wrote:
> On 18.06.2012 21:08, Heikki Linnakangas wrote:
> > On 18.06.2012 21:00, Robert Haas wrote:
> >> On Thu, Jun 14, 2012 at 5:58 PM, Andres Freund
> >> 
> >> wrote:
>  1. Use a 64-bit segment number, instead of the log/seg combination.
>  And don't waste the last segment on each logical 4 GB log file. The
>  concept of a "logical log file" is now completely gone. XLogRecPtr is
>  unchanged,
>  but it should now be understood as a plain 64-bit value, just split
>  into
>  two 32-bit integers for historical reasons. On disk, this means that
>  there will be log files ending in FF, those were skipped before.
> >>> 
> >>> Whats the reason for keeping that awkward split now? There aren't
> >>> that many
> >>> users of xlogid/xcrecoff and many of those would be better served by
> >>> using
> >>> helper macros.
> >> 
> >> I wondered that, too. There may be a good reason for keeping it split
> >> up that way, but we at least oughta think about it a bit.
> > 
> > The page header contains an XLogRecPtr (LSN), so if we change it we'll
> > have to deal with pg_upgrade. I guess we could still keep XLogRecPtr
> > around as the on-disk representation, and convert between the 64-bit
> > integer and XLogRecPtr in PageGetLSN/PageSetLSN. I can try that out -
> > many xlog calculations would admittedly be simpler if it was an uint64.
> 
> Well, that was easier than I thought. Attached is a patch to make
> XLogRecPtr a uint64, on top of my other WAL format patches. I think we
> should go ahead with this.
Cool. You plan to merge XLogSegNo with XLogRecPtr in that case? I am not sure 
if having two representations which just have a constant factor inbetween 
really makes sense.

> The LSNs on pages are still stored in the old format, to avoid changing
> the on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the
> control file and WAL are changed, however, so an initdb (or at least
> pg_resetxlog) is required.
Sounds sensible.

> Should we keep the old representation in the replication protocol
> messages? That would make it simpler to write a client that works with
> different server versions (like pg_receivexlog). Or, while we're at it,
> perhaps we should mandate network-byte order for all the integer and
> XLogRecPtr fields in the replication protocol.
The replication protocol uses pq_sendint for integers which should take care 
of converting to big endian already. I don't think anything but the wal itself 
is otherwise transported in a binary fashion? So I don't think there is any 
such architecture dependency in the protocol currently?

I don't really see a point in keeping around a backward-compatible 
representation just for the sake of running such tools on multiple versions. I 
might not be pragmatic enough, but: Why would you want to do that *at the 
moment*? Many of the other tools are already version specific, so...
When the protocol starts to be used by more tools, maybe, but imo were not 
there yet.

But then its not hard to convert to the old representation for that.

> I kept the %X/%X representation in error messages etc. I'm quite used to
> that output, so reluctant to change it, although it's a bit silly now
> that it represents just 64-bit value. Using UINT64_FORMAT would also
> make the messages harder to translate.
No opinion on that. Its easier to see for me whether two values are exactly 
the same or very similar with the 64bit representation but its harder to gauge 
bigger differences. So ...

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Pg default's verbosity?

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 2:15 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> There might be something to the idea of demoting a few of the things
>> we've traditionally had as NOTICEs, though.  IME, the following two
>> messages account for a huge percentage of the chatter:
>
>> NOTICE:  CREATE TABLE will create implicit sequence "foo_a_seq" for
>> serial column "foo.a"
>> NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index
>> "foo_pkey" for table "foo"
>
> Personally, I'd have no problem with flat-out dropping (not demoting)
> both of those two specific messages.  I seem to recall that Bruce has
> lobbied for them heavily in the past, though.

My vote would be to make 'em DEBUG1, and similarly with the UNIQUE
message that Fabian mentions downthread.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WAL format changes

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 4:14 AM, Heikki Linnakangas
 wrote:
> Well, that was easier than I thought. Attached is a patch to make XLogRecPtr
> a uint64, on top of my other WAL format patches. I think we should go ahead
> with this.

+1.

> The LSNs on pages are still stored in the old format, to avoid changing the
> on-disk format and breaking pg_upgrade. The XLogRecPtrs stored the control
> file and WAL are changed, however, so an initdb (or at least pg_resetxlog)
> is required.

Seems fine.

> Should we keep the old representation in the replication protocol messages?
> That would make it simpler to write a client that works with different
> server versions (like pg_receivexlog). Or, while we're at it, perhaps we
> should mandate network-byte order for all the integer and XLogRecPtr fields
> in the replication protocol. That would make it easier to write a client
> that works across different architectures, in >= 9.3. The contents of the
> WAL would of course be architecture-dependent, but it would be nice if
> pg_receivexlog and similar tools could nevertheless be
> architecture-independent.

I share Andres' question about how we're doing this already.  I think
if we're going to break this, I'd rather do it in 9.3 than 5 years
from now.  At this point it's just a minor annoyance, but it'll
probably get worse as people write more tools that understand WAL.

> I kept the %X/%X representation in error messages etc. I'm quite used to
> that output, so reluctant to change it, although it's a bit silly now that
> it represents just 64-bit value. Using UINT64_FORMAT would also make the
> messages harder to translate.

I could go either way on this one, but I have no problem with the way
you did it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Skip checkpoint on promoting from streaming replication

2012-06-19 Thread Fujii Masao

On Tue, Jun 19, 2012 at 5:30 PM, Kyotaro HORIGUCHI
 wrote:
> Thank you.
>
>> What happens if the server skips an end-of-recovery checkpoint,
>> is promoted to the master, runs some write transactions,
>> crashes and restarts automatically before it completes
>> checkpoint? In this case, the server needs to do crash recovery
>> from the last checkpoint record with old timeline ID to the
>> latest WAL record with new timeline ID. How does crash recovery
>> do recovery beyond timeline?
>
> Basically the same as archive recovery as far as I saw. It is
> already implemented to work in that way.
>
> After this patch applied, StartupXLOG() gets its
> recoveryTargetTLI from the new field lastestTLI in the control
> file instead of the latest checkpoint. And the latest checkpoint
> record informs its TLI and WAL location as before, but its TLI
> does not have a significant meaning in the recovery sequence.
>
> Suggest the case following,
>
>      |seg 1     | seg 2    |
> TLI 1 |...c..|00|
>          C           P  X
> TLI 2            |00|
>
> * C - checkpoint, P - promotion, X - crash just after here
>
> This shows the situation that the latest checkpoint(restartpoint)
> has been taken at TLI=1/SEG=1/OFF=4 and promoted at
> TLI=1/SEG=2/OFF=5, then crashed just after TLI=2/SEG=2/OFF=8.
> Promotion itself inserts no wal records but creates a copy of the
> current segment for new TLI. the file for TLI=2/SEG=1 should not
> exist. (Who will create it?)
>
> The control file will looks as follows
>
> latest checkpoint : TLI=1/SEG=1/OFF=4
> latest TLI        : 2
>
> So the crash recovery sequence starts from SEG=1/LOC=4.
> expectedTLIs will be (2, 1) so 1 will naturally be selected as
> the TLI for SEG1 and 2 for SEG2 in XLogFileReadAnyTLI().
>
> In the closer view, startup constructs expectedTLIs reading the
> timeline hisotry file corresponds to the recoveryTargetTLI. Then
> runs the recovery sequence from the redo point of the latest
> checkpoint using WALs with the largest TLI - which is
> distinguised by its file name, not header - within the
> expectedTLIs in XLogPageRead(). The only difference to archive
> recovery is XLogFileReadAnyTLI() reads only the WAL files already
> sit in pg_xlog directory, and not reaches archive. The pages with
> the new TLI will be naturally picked up as mentioned above in
> this sequence and then will stop at the last readable record.
>
> latestTLI field in the control file is updated just after the TLI
> was incremented and the new WAL files with the new TLI was
> created. So the crash recovery sequence won't stop before
> reaching the WAL with new TLI disignated in the control file.

Is it guaranteed that all the files (e.g., the latest timeline history file)
required for such crash recovery exist in pg_xlog? If yes, your
approach might work well.

Regards,

-- 
Fujii Masao

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions over pathological TCP connections

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane  wrote:
> The transaction would be committed before a command success report is
> delivered to the client, so I don't think delivered-and-not-marked is
> possible.

...unless you have configured synchronous_commit=off, or fsync=off.

Or unless your disk melts into a heap of slag and you have to restore
from backup.  You can protect against that last case using synchronous
replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Tom Lane

Andres Freund  writes:
> On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
>>> ...  (If you are thinking
>>> of something sufficiently high-level that merging could possibly work,
>>> then it's not WAL, and we shouldn't be trying to make the WAL
>>> representation cater for it.)

> Do you really see this as such a big problem?

It looks suspiciously like "I have a hammer, therefore every problem
must be a nail".  I don't like the design concept of cramming logical
replication records into WAL in the first place.

However, if we're dead set on doing it that way, let us put information
that is only relevant to logical replication records into only the
logical replication records.  Saving a couple bytes in each such record
is penny-wise and pound-foolish, I'm afraid; especially when you're
nailing down hard, unexpansible limits at the very beginning of the
development process in order to save those bytes.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pgsql_fdw in contrib

2012-06-19 Thread Merlin Moncure

On Tue, Jun 19, 2012 at 10:15 AM, Kohei KaiGai  wrote:
> Let me push the pgsql_fdw in core from different perspective.
>
> Right now, FDW is a feature that will take many enhancement in
> the near future like join-pushdown, writable APIs and so on.
> If we would not have a FDW extension in core that communicate
> with actual RDBMS, instead of flat files, it makes our development
> efforts more complex. Please assume a scenario people alwats
> tries to submit two patches; one for the core, and the other for
> a particular out-of-tree extension.

yeah. plus, it's nice to have a very high quality fdw implementation
in core for others to crib from.  I have no objection -- I just took a
convenient opportunity to hijack the thread for some hand-wavy ideas
about an extension repository. :-).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan

On 19 June 2012 16:17, Kevin Grittner  wrote:
> Peter Geoghegan  wrote:
>
>> So, just to give a bit more weight to my argument that we should
>> recognise that equivalent strings ought to be treated identically
>
> Since we appear to be questioning everything in this area, I'll
> raise something which has been bugging me for a while: in some other
> systems I've used, the "tie-breaker" comparison for equivalent
> values comes after equivalence sorting on *all* sort keys, rather
> than *each* sort key.

Are you sure that they actually have a tie-breaker, and don't just
make the distinction between equality and equivalence (if only
internally)? I would have checked that myself already, but I don't
have access to any other RDBMS that I'd expect to care about these
kinds of distinctions. They make sense for ensuring that the text
comparator's notion of equality is consistent with text's general
notion (if that's bitwise equality, which I suspect it is in these
other products too for the same reasons it is for us). I don't see why
you'd want a tie-breaker across multiple keys. I mean, you could, I
just don't see any reason to.

> test=# select * from c order by 2;
>  last_name | first_name
> ---+
>  smith     | bob
>  SMITH     | EDWARD
>  smith     | peter
> (3 rows)
>
> This seems completely wrong:
>
> test=# select * from c order by 1,2;
>  last_name | first_name
> ---+
>  smith     | bob
>  smith     | peter
>  SMITH     | EDWARD
> (3 rows)

Agreed. Definitely a POLA violation.

> I'm sure the latter is harder to do and slower to execute; but the
> former just doesn't seem defensible as correct.

This same gripe is held by the author of that sorting document I
linked to from the Unicode consortium, with a very similar example. So
it seems like this could be a win from several perspectives, as it
would enable the strxfrm() optimisation. I'm pretty sure that
pg_upgrade wouldn't be very happy about this, so we'd have to have a
legacy compatibility mode.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file foreign tables

2012-06-19 Thread Kohei KaiGai

Hi Fujita-san,

Could you rebase this patch towards the latest tree?
It was unavailable to apply the patch cleanly.

I looked over the patch, then noticed a few points.

At ProcessCopyOptions(), defel->arg can be NIL, isn't it?
If so, cstate->convert_binary is not a suitable flag to check
redundant option. It seems to me cstate->convert_selectively
is more suitable flag to check it.

+   else if (strcmp(defel->defname, "convert_binary") == 0)
+   {
+   if (cstate->convert_binary)
+   ereport(ERROR,
+   (errcode(ERRCODE_SYNTAX_ERROR),
+errmsg("conflicting or redundant options")));
+   cstate->convert_selectively = true;
+   if (defel->arg == NULL || IsA(defel->arg, List))
+   cstate->convert_binary = (List *) defel->arg;
+   else
+   ereport(ERROR,
+   (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+errmsg("argument to option \"%s\" must be a
list of column names",
+   defel->defname)));
+   }


At NextCopyFrom(), this routine computes default values if configured.
In case when these values are not referenced, it might be possible to
skip unnecessary calculations.
Is it unavailable to add logic to avoid to construct cstate->defmap on
unreferenced columns at  BeginCopyFrom()?

Thanks,

2012/5/11 Etsuro Fujita :
>> -Original Message-
>> From: Robert Haas [mailto:robertmh...@gmail.com]
>> Sent: Friday, May 11, 2012 1:36 AM
>> To: Etsuro Fujita
>> Cc: pgsql-hackers@postgresql.org
>> Subject: Re: [HACKERS] WIP Patch: Selective binary conversion of CSV file
>> foreign tables
>>
>> On Tue, May 8, 2012 at 7:26 AM, Etsuro Fujita
> 
>> wrote:
>> > I would like to propose to improve parsing efficiency of
>> > contrib/file_fdw by selective parsing proposed by Alagiannis et
>> > al.[1], which means that for a CSV/TEXT file foreign table, file_fdw
>> > performs binary conversion only for the columns needed for query
>> > processing.  Attached is a WIP patch implementing the feature.
>>
>> Can you add this to the next CommitFest?  Looks interesting.
>
> Done.
>
> Best regards,
> Etsuro Fujita
>
>> --
>> Robert Haas
>> EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL
>> Company
>>
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers



-- 
KaiGai Kohei 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

Hi,

On Tuesday, June 19, 2012 06:11:20 PM Tom Lane wrote:
> Andres Freund  writes:
> > On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
> >>> ...  (If you are thinking
> >>> of something sufficiently high-level that merging could possibly work,
> >>> then it's not WAL, and we shouldn't be trying to make the WAL
> >>> representation cater for it.)
> > 
> > Do you really see this as such a big problem?
> It looks suspiciously like "I have a hammer, therefore every problem
> must be a nail".  I don't like the design concept of cramming logical
> replication records into WAL in the first place.
There are - so far - no specific "logical replication records". Its a 
relatively minor amount of additional data under wal_level=logical for 
existing records. HEAP_UPDATE gets the old primary key on updates changing the 
pkey and HEAP_DELETE always has the pkey. HEAP_INSERT|UPDATE|
DELETE,HEAP2_MULTI_INSERT put their information in another XLogRecData block 
than the page to handle full page writes. Thats it.

I can definitely understand hesitation about that, but I simply see no 
realistic way to solve the issues of existing replication solutions otherwise.
Do you have a better idea to solve those than the above? Without significant 
complications of the backend code and without loads of additional writes going 
on?
I *really* would like to hear them if you do.

> However, if we're dead set on doing it that way, let us put information
> that is only relevant to logical replication records into only the
> logical replication records. 
I found, and still do, the idea of having the origin_id in there rather 
elegant. If people prefer adding the same block to all of the above xlog 
records: I can live with that and will then do so. It makes some things more 
complicated, but its not too bad.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner

Peter Geoghegan  wrote:
> Kevin Grittner  wrote:
 
>> Since we appear to be questioning everything in this area, I'll
>> raise something which has been bugging me for a while: in some
>> other systems I've used, the "tie-breaker" comparison for
>> equivalent values comes after equivalence sorting on *all* sort
>> keys, rather than *each* sort key.
> 
> Are you sure that they actually have a tie-breaker, and don't just
> make the distinction between equality and equivalence (if only
> internally)?
 
I'm pretty sure that when I was using Sybase ASE the order for
non-equal values was always predictable, and it behaved in the
manner I describe below.  I'm less sure about any other product.
 
> I don't see why you'd want a tie-breaker across multiple keys. I
> mean, you could, I just don't see any reason to.
 
So that you can have entirely repeatable results across multiple
runs?
 
>> test=# select * from c order by 2;
>>  last_name | first_name
>> ---+
>>  smith | bob
>>  SMITH | EDWARD
>>  smith | peter
>> (3 rows)
>>
>> This seems completely wrong:
>>
>> test=# select * from c order by 1,2;
>>  last_name | first_name
>> ---+
>>  smith | bob
>>  smith | peter
>>  SMITH | EDWARD
>> (3 rows)
> 
> Agreed. Definitely a POLA violation.
> 
>> I'm sure the latter is harder to do and slower to execute; but
>> the former just doesn't seem defensible as correct.
> 
> This same gripe is held by the author of that sorting document I
> linked to from the Unicode consortium, with a very similar
> example. So it seems like this could be a win from several
> perspectives, as it would enable the strxfrm() optimisation. I'm
> pretty sure that pg_upgrade wouldn't be very happy about this, so
> we'd have to have a legacy compatibility mode.
 
At a minimum, it could require rebuilding indexes with multiple
columns where any indexed value before the last index column can be
equivalent but not equal.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

[HACKERS] Near-duplicate RI NO ACTION and RESTRICT triggers

2012-06-19 Thread Tom Lane

Our current interpretation of the difference between foreign keys with
ON UPDATE/DELETE NO ACTION and those with ON UPDATE/DELETE RESTRICT
is that they mean the same thing but RESTRICT checks are not deferrable.
It follows from this that the trigger code ought to be the same for
NO ACTION and RESTRICT cases, and so it occurred to me that we could
get rid of a few hundred lines in ri_triggers.c if we removed the
duplicated code.

Comparing the actual code in the different functions, though, there is
a difference: the NO ACTION triggers call ri_Check_Pk_Match to see if
another PK row has been inserted/modified to provide the same key values
that the trigger subject row no longer does.  (ri_Check_Pk_Match also
makes some checks for NULL-key cases, but these are redundant with tests
made in the main trigger functions, so they ought to go away.)  The
RESTRICT triggers do not do this.  It's fairly easy to construct a case
where it makes a difference:

create temp table pp (f1 int primary key);
create temp table cc (f1 int references pp on update no action);
insert into pp values(12);
insert into pp values(11);
update pp set f1=f1+1; -- now we have 13 and 12
insert into cc values(13); -- ok
update pp set f1=f1+1; -- now we have 14 and 13, FK is still OK
update pp set f1=f1+1; -- would result in 15 and 14, so FK fails

If you change the foreign key to be ON UPDATE RESTRICT, the second
UPDATE fails because the RESTRICT trigger doesn't notice that another
PK row has been modified to provide the key required by the FK row.

I think that the argument for having the RESTRICT triggers behave
like this is that the SQL spec envisions the RESTRICT check occurring
immediately when the individual PK row is updated/deleted, and so there
would be no opportunity for another PK row to be updated into its place.
(Or, in plainer English, RESTRICT should mean "you can't modify this
row's keys at all if it has dependents".)  Because we implement RESTRICT
through an AFTER trigger that can't run earlier than end-of-statement,
we can't exactly match the spec's semantics, but we can get fairly
close so long as you don't think about what would be seen by
e.g. user-written triggers executing during the statement.

I'm happy with continuing to have this behavioral difference between
the two sets of triggers, but wanted to throw it up for discussion:
does anyone think it'd be better to apply ri_Check_Pk_Match in the
RESTRICT triggers too?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH] lock_timeout and common SIGALRM framework

2012-06-19 Thread Alvaro Herrera


Excerpts from Boszormenyi Zoltan's message of mar jun 19 12:44:04 -0400 2012:

> OK, all 4 Check* functions are now moved back into proc.c,
> nothing outside of timeout.c touches anything in it. New patches
> are attached.

Yeah, I like this one better, thanks.

It seems to me that the "check" functions are no longer "check" anymore,
right?  I mean, they don't check whether the time has expired.  It can
be argued that CheckDeadLock is well named, because it does check
whether there is a deadlock before doing anything else; but
CheckStandbyTimeout is no longer a check -- it just sends a signal.
Do we need to rename these functions?

Why are you using the deadlock timeout for authentication?  Wouldn't it
make more sense to have a separate TimeoutName, just to keep things
clean?

The "NB:" comment here doesn't seem to be useful anymore:
+ /*
+  * Init, Destroy and Check functions for different timeouts.
+  *
+  * NB: all Check* functions are run inside a signal handler, so be very wary
+  * about what is done in them or in called routines.
+  
*/


In base_timeouts you don't initialize fin_time for any of the members.
This is probably unimportant but then why initialize start_time?

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan

On 19 June 2012 17:45, Kevin Grittner  wrote:
> Peter Geoghegan  wrote:
>> Are you sure that they actually have a tie-breaker, and don't just
>> make the distinction between equality and equivalence (if only
>> internally)?
>
> I'm pretty sure that when I was using Sybase ASE the order for
> non-equal values was always predictable, and it behaved in the
> manner I describe below.  I'm less sure about any other product.

Maybe it used a physical row identifier as a tie-breaker? Note that we
use ItemPointers as a tie-breaker for sorting index tuples.

I imagine that it was at least predictable among columns being sorted,
if only because en_US.UTF-8 doesn't have any notion of equivalence
(that is, it just so happens that there are no two strings that are
equivalent but not bitwise equal). It would surely be impractical to
do a comparison for the entire row, as that could be really expensive.

>> I don't see why you'd want a tie-breaker across multiple keys. I
>> mean, you could, I just don't see any reason to.
>
> So that you can have entirely repeatable results across multiple
> runs?

I suppose that isn't an unreasonable use-case, but it would be
unreasonable to require that everyone pay for it if, particularly if
it implied another strcmp(). You don't need me to tell you that we
generally discourage the idea that the order that tuples are returned
in is defined in any way beyond that asked for by the user.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] initdb and fsync

2012-06-19 Thread Jeff Davis

On Mon, 2012-06-18 at 21:41 +0200, Andres Freund wrote:
> It calls pg_flush_data inside of copy_file which does the posix_fadvise... So 
> maybe just put the sync_file_range in pg_flush_data?

The functions in fd.c aren't linked to initdb, so it's a challenge to
share that code (I remember now: that's why I didn't use pg_flush_data
originally). I don't see a clean way to resolve that, do you?

Also, it seems that sync_file_range() doesn't help with creating a
database much (on my machine), so it doesn't seem important to use it
there.

Right now I'm inclined to leave the patch as-is.

Regards,
Jeff Davis

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 12:11 PM, Tom Lane  wrote:
> Andres Freund  writes:
>> On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
 ...  (If you are thinking
 of something sufficiently high-level that merging could possibly work,
 then it's not WAL, and we shouldn't be trying to make the WAL
 representation cater for it.)
>
>> Do you really see this as such a big problem?
>
> It looks suspiciously like "I have a hammer, therefore every problem
> must be a nail".  I don't like the design concept of cramming logical
> replication records into WAL in the first place.

Me, neither.  I think it's necessary to try to find a way of
generating logical replication records from WAL.  But once generated,
I think those records should form their own stream, independent of
WAL.  If you take the contrary position that they should be included
in WAL, then when you filter the WAL stream down to just the records
of interest to logical replication, you end up with a WAL stream with
holes in it, which is one of the things that Andres listed as an
unresolved design problem in his original email.

Moreover, this isn't necessary at all for single-master replication,
or even multi-source replication where each table has a single master.
 It's only necessary for full multi-master replication, which we have
no consensus to include in core, and even if we did have a consensus
to include it in core, it certainly shouldn't be the first feature we
design.

> However, if we're dead set on doing it that way, let us put information
> that is only relevant to logical replication records into only the
> logical replication records.

Right.  If we decide we need this, and if we did decide to conflate
the WAL stream, both of which I disagree with as noted above, then we
still don't need it on every record.  It would probably be sufficient
for local transactions to do nothing at all (and we can implicitly
assume that they have master node ID = local node ID) and transactions
which are replaying remote changes to emit one record per XID per
checkpoint cycle containing the remote node ID.

> Saving a couple bytes in each such record
> is penny-wise and pound-foolish, I'm afraid; especially when you're
> nailing down hard, unexpansible limits at the very beginning of the
> development process in order to save those bytes.

I completely agree.  I think that, as Dan said upthread, having a 64
or 128 bit ID so that it can be generated automatically rather than
configured by an administrator who must be careful not to duplicate
node IDs in any pair of systems that could ever end up talking to each
other would be a vast usability improvement.  Perhaps systems A, B,
and C are replicating to each other today, as are systems D and E.
But now suppose that someone decides they want to replicate one table
between A and D.  Suddenly the node IDs have to be distinct where they
didn't before, and now there's potentially a problem to hassle with
that wouldn't have been an issue if the node IDs had been wide enough
to begin with.  It is not unusual for people to decide after-the-fact
to begin replicating between machines where this wasn't originally
anticipated and which may even be even be under different
administrative control.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] initdb and fsync

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 07:22:02 PM Jeff Davis wrote:
> On Mon, 2012-06-18 at 21:41 +0200, Andres Freund wrote:
> > It calls pg_flush_data inside of copy_file which does the
> > posix_fadvise... So maybe just put the sync_file_range in pg_flush_data?
> 
> The functions in fd.c aren't linked to initdb, so it's a challenge to
> share that code (I remember now: that's why I didn't use pg_flush_data
> originally). I don't see a clean way to resolve that, do you?
> 
> Also, it seems that sync_file_range() doesn't help with creating a
> database much (on my machine), so it doesn't seem important to use it
> there.
> 
> Right now I'm inclined to leave the patch as-is.
Fine with that, I wanted to bring it up and see it documented.

I have marked it with ready for committer. That committer needs to decide on -
N in the regression tests or not, but that shouldn't be much of a problem ;)

Cool!

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Alex Hunsaker

On Mon, Jun 18, 2012 at 1:30 PM, Alvaro Herrera
 wrote:
>
> Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
>
>> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
>> and SvPVUTF8() when turning a perl string into a cstring.
>
> Hmm, this patch belongs into back branches too, right?  Not just the
> current development tree?

(Have been out of town, sorry for the late reply)

Yeah, back to 9.1.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 10:17 AM, Andres Freund  wrote:
> There are 70+ calls of malloc in the backend in the form of
>
> type* foo = malloc(sizeof(...));
> if(!foo)
>   elog(ERROR, "could not allocate memory");
>
> which is a bit annoying to write at times. Would somebody argue against
> introducing a function that does the above named xmalloc() or malloc_or_die()?

I can't even find 70 malloc calls in the entire backend, let alone 70
with that pattern.  Still, I don't think malloc_or_error (not die)
would be a bad idea.

But the error should definitely be written as:

ereport(ERROR,
(errcode(ERRCODE_OUT_OF_MEMORY),
 errmsg("out of memory")));

...not elog.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Do we want a xmalloc or similar function in the Backend?

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 07:35:53 PM Robert Haas wrote:
> On Tue, Jun 19, 2012 at 10:17 AM, Andres Freund  
wrote:
> > There are 70+ calls of malloc in the backend in the form of
> > 
> > type* foo = malloc(sizeof(...));
> > if(!foo)
> >   elog(ERROR, "could not allocate memory");
> > 
> > which is a bit annoying to write at times. Would somebody argue against
> > introducing a function that does the above named xmalloc() or
> > malloc_or_die()?
> 
> I can't even find 70 malloc calls in the entire backend, let alone 70
> with that pattern.  Still, I don't think malloc_or_error (not die)
> would be a bad idea.
$ ack '\bmalloc\s*\(' src/backend/|wc -l
70

10-15 or so of those are comments.

The 70+ came from me running on some development branch with commitfest 
patches applied...

> But the error should definitely be written as:
> 
> ereport(ERROR,
> (errcode(ERRCODE_OUT_OF_MEMORY),
>  errmsg("out of memory")));
> 
> ...not elog.
Yes, definitely. Currently some of those locations (e.g. in xlog.c) are only 
protected by Asserts... 

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner

Peter Geoghegan  wrote:
> Kevin Grittner  wrote:

>> I'm pretty sure that when I was using Sybase ASE the order for
>> non-equal values was always predictable, and it behaved in the
>> manner I describe below.  I'm less sure about any other product.
> 
> Maybe it used a physical row identifier as a tie-breaker? Note
> that we use ItemPointers as a tie-breaker for sorting index
> tuples.
> 
> I imagine that it was at least predictable among columns being
> sorted, if only because en_US.UTF-8 doesn't have any notion of
> equivalence (that is, it just so happens that there are no two
> strings that are equivalent but not bitwise equal). It would
> surely be impractical to do a comparison for the entire row, as
> that could be really expensive.

We weren't using en_US.UTF-8 collation (or any other "proper"
collation) on Sybase -- I'm not sure whether they even supported
proper collation sequences on the versions we used.  I'm thinking of
when we were using their "case insensitive" sorting.  I don't know
the implementation details, but the behavior was consistent with
including each character-based column twice: once in the requested
position in the ORDER BY clause but folded to a consistent case, and
again after all the columns in the ORDER BY clause in original form,
with C collation.

I wasn't aware that en_US.UTF-8 doesn't have equivalence without
equality.  I guess that surprising result in my last post is just
plain inevitable with that collation then.  Bummer.  Is there
actually anyone who finds that to be a useful behavior?  For a
collation which considered upper-case and lower-case to be
equivalent, would PostgreSQL sort as I wanted, or is it doing some
tie-break per column within equivalent values?

-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-19 Thread Andres Freund

Hi,

The most important part, even for people not following my discussion with 
Robert is at the bottom where the possible wal decoding strategies are laid 
out.

On Tuesday, June 19, 2012 03:20:58 AM Robert Haas wrote:
> On Sat, Jun 16, 2012 at 7:43 AM, Andres Freund  
wrote:
> >> > Hm. Yes, you could do that. But I have to say I don't really see a
> >> > point. Maybe the fact that I do envision multimaster systems at some
> >> > point is clouding my judgement though as its far less easy in that
> >> > case.
> >> 
> >> Why?  I don't think that particularly changes anything.
> > 
> > Because it makes conflict detection very hard. I also don't think its a
> > feature worth supporting. Whats the use-case of updating records you
> > cannot properly identify?
> Don't ask me; I just work here.  I think it's something that some
> people want, though.  I mean, if you don't support replicating a table
> without a primary key, then you can't even run pgbench in a
> replication environment.
Well, I have no problem with INSERT only tables not having a PK. And 
pgbench_history is the only pgbench table that doesn't have a pkey? And thats 
only truncated...

> Is that an important workload?  Well,
> objectively, no.  But I guarantee you that other people with more
> realistic workloads than that will complain if we don't have it.
> Absolutely required on day one?  Probably not.  Completely useless
> appendage that no one wants?  Not that, either.
Maybe. And I don't really care, so if others see that as important I am happy 
to appease them ;). 

> >> In my view, a logical replication solution is precisely one in which
> >> the catalogs don't need to be in sync.  If the catalogs have to be in
> >> sync, it's not logical replication.  ISTM that what you're talking
> >> about is sort of a hybrid between physical replication (pages) and
> >> logical replication (tuples) - you want to ship around raw binary
> >> tuple data, but not entire pages.
> > Ok, thats a valid point. Simon argued at the cluster summit that
> > everything thats not physical is logical. Which has some appeal because
> > it seems hard to agree what exactly logical rep is. So definition by
> > exclusion makes kind of sense ;)
> Well, the words are fuzzy, but I would define logical replication to
> be something which is independent of the binary format in which stuff
> gets stored on disk.  If it's not independent of the disk format, then
> you can't do heterogenous replication (between versions, or between
> products).  That precise limitation is the main thing that drives
> people to use anything other than SR in the first place, IME.
Not in mine. The main limitation I see is that you cannot write anything on 
the standby. Which sucks majorly for many things. Its pretty much impossible 
to "fix" that for SR outside of very limited cases.
While many scenarios don't need multimaster *many* need to write outside of 
the standby's replication set.

> > I think what you categorized as "hybrid logical/physical" rep solves an
> > important use-case thats very hard to solve at the moment. Before my
> > 2ndquadrant days I had several client which had huge problemsing the
> > trigger based solutions because their overhead simply was to big a
> > burden on the master. They couldn't use SR either because every
> > consuming database kept loads of local data.
> > I think such scenarios are getting more and more common.

> I think this is to some extent true, but I also think you're
> conflating two different things.  Change extraction via triggers
> introduces overhead that can be eliminated by reconstructing tuples
> from WAL in the background rather than forcing them to be inserted
> into a shadow table (and re-WAL-logged!) in the foreground.  I will
> grant that shipping the tuple as a binary blob rather than as text
> eliminates additional overehead on both ends, but it also closes off a
> lot of important use cases.  As I noted in my previous email, I think
> that ought to be a performance optimization that we do, if at all,
> when it's been proven safe, not a baked-in part of the design.  Even a
> solution that decodes WAL to text tuples and ships those around and
> reinserts the via pure SQL should be significantly faster than the
> replication solutions we have today; if it isn't, something's wrong.
Its not only the logging side which is a limitation in todays replication 
scenarios. The apply side scales even worse because its *very* hard to 
distribute it between multiple backends.

> >> The problem with that is it's going to be tough to make robust.  Users
> >> could easily end up with answers that are total nonsense, or probably
> >> even crash the server.
> > Why?
> Because the routines that decode tuples don't include enough sanity
> checks to prevent running off the end of the block, or even the end of
> memory completely.  Consider a corrupt TOAST pointer that indicates
> that there is a gigabyte of data stored in an 8kB block.  One of the
> common s

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan

On 19 June 2012 18:57, Kevin Grittner  wrote:
> We weren't using en_US.UTF-8 collation (or any other "proper"
> collation) on Sybase -- I'm not sure whether they even supported
> proper collation sequences on the versions we used.  I'm thinking of
> when we were using their "case insensitive" sorting.  I don't know
> the implementation details, but the behavior was consistent with
> including each character-based column twice: once in the requested
> position in the ORDER BY clause but folded to a consistent case, and
> again after all the columns in the ORDER BY clause in original form,
> with C collation.
>
> I wasn't aware that en_US.UTF-8 doesn't have equivalence without
> equality.

Not that many do. The underlying cause of the problem back in 2005 was
the tacit assumption that none do, which presumably we got away with
for a while. I mentioned Hungarian, but it happens a bit in Swedish
too.

PostgreSQL supported Unicode before 2005, when the tie-breaker was
introduced. I know at least one Swede who used Postgres95. I just took
a look at the REL6_4 branch, and it looks much the same in 1999 as it
did in 2005, in that there is no tie-breaker after the strcoll(). Now,
that being the case, and Hungarian in particular having a whole bunch
of these equivalencies, I have to wonder if the original complainant's
problem really was diagnosed correctly. It could of had something to
do with the fact that texteq() was confused about whether it reported
equality or equivalency - it may have taken that long for the (len1 !=
len2) fastpath thing (only holds for equality, not equivalence,
despite the fact that the 2005-era strcoll() call checks equivalence
within texteq() ) to trip someone out, because texteq() would have
thereby given inconsistent answers in a very subtle way, that were not
correct either according to the Hungarian locale, nor according to
simple bitwise equality. That's mostly speculation, but the question
must be asked.

> I guess that surprising result in my last post is just
> plain inevitable with that collation then.  Bummer.  Is there
> actually anyone who finds that to be a useful behavior?

Looking at the details again and assuming a US locale, yeah, it is.
The substance of your complain holds though.

> For a collation which considered upper-case and lower-case to be
> equivalent, would PostgreSQL sort as I wanted, or is it doing some
> tie-break per column within equivalent values?

You could do that, and some people do use custom collations for
various reasons. That's obviously very much of minority interest
though. Most people will just use citext or something. However, since
citext is itself a client of varstr_cmp(), this won't help you.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Event Triggers reduced, v1

2012-06-19 Thread Robert Haas

On Fri, Jun 15, 2012 at 4:27 PM, Dimitri Fontaine
 wrote:
> Allow me to open the new season of the DML trigger series, named
> pg_event_trigger. This first episode is all about setting up the drama,
> so that next ones make perfect sense.

Comments:

1. I still think we ought to get rid of the notion of BEFORE or AFTER
(i.e. pg_event_trigger.evttype) and just make that detail part of the
event name (e.g. pg_event_trigger.evtevent).  Many easily forseeable
event types will be more like "during" rather than "before" or
"after", and for those that do have a notion of before and after, we
can have two different event names and include the word "before" or
"after" there.  I am otherwise satisfied with the schema you've
chosen.

2. I think it's important to be able to add new types of event
triggers without creating excessive parser bloat.  I think it's
important to use some kind of generic syntax here which will be able
to apply to all types of triggers we may want to add, now or in the
future.  The easiest way to do that is to use literal syntax for the
list of command tags, rather than writing them out as key words: e.g.
'ALTER TABLE' rather than ALTER TABLE.  It's not quite as pretty, but
the savings in parser bloat and future code change seem well worth it.
 Or, alternatively, we could use identifier style, e.g. alter_table,
as I previously suggested.

3. The event trigger cache seems to be a few bricks shy of a load.
First, event_trigger_cache_is_stalled is mis-named; I think you mean
"stale", not "stalled".  Second, instead of setting that flag and then
rebuilding the cache when you see the flag set, how about just blowing
away the cache contents whenever you would have set the flag?  That
seems a whole lot simpler and cleaner, and removes the need for a
force_rebuild flag on BuildEventTriggerCache().  Third, ISTM that this
isn't going to work correctly if backend A performs an event after
backend B has built its cache.  To fix this, I think you need to rip
out all the places where you force a rebuild locally and instead use
something like CacheRegisterSyscacheCallback() to blow away the cache
whenever something changes; you might find it helpful to look at
attoptcache.c.

4. The documentation doesn't build.

openjade:reference.sgml:44:4:W: cannot generate system identifier for
general entity "alterEventTrigger"
openjade:reference.sgml:44:4:E: general entity "alterEventTrigger" not
defined and no default entity
openjade:reference.sgml:44:21:E: reference to entity
"alterEventTrigger" for which no system identifier could be generated
openjade:reference.sgml:44:3: entity was defined here
openjade:reference.sgml:86:4:W: cannot generate system identifier for
general entity "createEventTrigger"
openjade:reference.sgml:86:4:E: general entity "createEventTrigger"
not defined and no default entity
openjade:reference.sgml:86:22:E: reference to entity
"createEventTrigger" for which no system identifier could be generated
openjade:reference.sgml:86:3: entity was defined here
openjade:reference.sgml:125:4:W: cannot generate system identifier for
general entity "dropEventTrigger"
openjade:reference.sgml:125:4:E: general entity "dropEventTrigger" not
defined and no default entity
openjade:reference.sgml:125:20:E: reference to entity
"dropEventTrigger" for which no system identifier could be generated
openjade:reference.sgml:125:3: entity was defined here
openjade:catalogs.sgml:1868:35:E: character "_" is not allowed in the
value of attribute "ZONE"
openjade:catalogs.sgml:1868:19:X: reference to non-existent ID
"CATALOG-PG-EVENT_TRIGGER"
openjade:trigger.sgml:43:47:X: reference to non-existent ID
"SQL-CREATECOMMANDTRIGGER"

5. In terms of a psql command, I think that \dev is both not very
mnemonic and, as you mentioned in the comment, possibly confusable
with SQL/MED.  If we're going to have a \d command for this, I suggest
something like \dy, which is not very mnemonic either but at least
seems unlikely to be confused with anything else.  Other things that
are unused include \dh, \dj, \dk, \dq, \dw, and \dz, if any of those
grab you, or a somewhat broader range of things (but still nothing
very memorable) if we include capital letters.  Or we could branch out
into punctuation, like \d& -- or things that don't begin with the
letter d, but that doesn't seem like a particularly good idea.

6. In objectaddress.c, I think that get_object_address_event_trigger
should be eliminated in favor of an additional case in
get_object_address_unqualified.

7. There are many references to command triggers that still need to be
cleaned up.  trigger.sgml still makes reference to the name command
triggers.  plpgsql.sgml also contains vestiges of the command trigger
notation, and references to some TG_* variables that I don't think
exist in this version of the patch.  event_trigger.c is identified
(twice) as cmdtrigger.c in the file header comment.  The header
comment for InsertEventTriggerTuple makes reference to pg_cmdtrigger,
as does a l

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Tom Lane

"Kevin Grittner"  writes:
> I wasn't aware that en_US.UTF-8 doesn't have equivalence without
> equality.  I guess that surprising result in my last post is just
> plain inevitable with that collation then.  Bummer.  Is there
> actually anyone who finds that to be a useful behavior?  For a
> collation which considered upper-case and lower-case to be
> equivalent, would PostgreSQL sort as I wanted, or is it doing some
> tie-break per column within equivalent values?

Well, there are two different questions there.

As far as the overall structure of an ORDER BY or btree index is
concerned, all it knows is what the datatype comparator functions
tell it.  There is no such thing as a second pass to reconsider
values found to be "equal"; that's all the info there is.

As far as the behavior of the comparator function for text is concerned,
we choose to break ties reported by strcoll via strcmp.  But that's not
visible from outside the comparator, so there's no notion of an
"equivalence" behavior different from "equality".

I'm not exactly convinced that we could have a second-pass tiebreak
mechanism without creating serious problems for btree indexes; in
particular it seems like ordering considering only some leading keys
might not match the actual index ordering.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan

On 19 June 2012 19:44, Peter Geoghegan  wrote:
> You could do that, and some people do use custom collations for
> various reasons. That's obviously very much of minority interest
> though. Most people will just use citext or something. However, since
> citext is itself a client of varstr_cmp(), this won't help you.

I spoke too soon - that would work fine in the U.S, since the text is
converted to lower case on-the-fly in a locale and collation aware
fashion.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Alvaro Herrera


Excerpts from Robert Haas's message of mar jun 19 11:36:41 -0400 2012:
> 
> On Mon, Jun 18, 2012 at 3:30 PM, Alvaro Herrera
>  wrote:
> > Excerpts from Alex Hunsaker's message of vie feb 10 16:53:05 -0300 2012:
> >
> >> Seems like we missed the fact that we still did SvUTF8_on() in sv2cstr
> >> and SvPVUTF8() when turning a perl string into a cstring.
> >
> > Hmm, this patch belongs into back branches too, right?  Not just the
> > current development tree?
> 
> It seems like a bug fix to me.

That's what I thought.  I will commit to both branches soon, then.
Mind you, this should have been an "open item", not a commitfest item.
(Actually not even an open item.  We should have committed it right
away.)

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 9:47 AM, Tom Lane  wrote:
> Peter Eisentraut  writes:
>> What is the latest theory on using int4 vs. int32 in C code?
>> (equivalently int2, int16)
>
> I thought the general idea was to use int32 most places, but int4 in
> catalog declarations.  I don't think it's tremendously important if
> somebody uses the other though.

I concur with Peter that TMTOWTDI is not the right way to do this.  I
think we ought to get rid of int4 in code and use int32 everywhere.

>> While we're at it, how do we feel about using C standard types like
>> int32_t instead of (or initially in addition to) our own definitions?
>
> Can't get very excited about this either.  The most likely outcome of
> a campaign to substitute the standard types is that back-patching would
> become a truly painful activity.  IMO, anything that is going to result
> in tens of thousands of diffs had better have a more-than-cosmetic
> reason.  (That wouldn't apply if we only used int32_t in new code ...
> but then, instead of two approved ways to do it, there would be three.
> Which doesn't seem like it improves matters.)

On this one, I agree with you.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Robert Haas

On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund  wrote:
> Adds a single and a double linked list which can easily embedded into other
> datastructures and can be used without any additional allocations.

dllist.h advertises that it's embeddable.  Can you use that instead,
or enhance it slightly to support what you want to do?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 07:24:13 PM Robert Haas wrote:
> On Tue, Jun 19, 2012 at 12:11 PM, Tom Lane  wrote:
> > Andres Freund  writes:
> >> On Tuesday, June 19, 2012 04:30:59 PM Tom Lane wrote:
>  ...  (If you are thinking
>  of something sufficiently high-level that merging could possibly work,
>  then it's not WAL, and we shouldn't be trying to make the WAL
>  representation cater for it.)
> >> 
> >> Do you really see this as such a big problem?
> > 
> > It looks suspiciously like "I have a hammer, therefore every problem
> > must be a nail".  I don't like the design concept of cramming logical
> > replication records into WAL in the first place.
> 
> Me, neither.  I think it's necessary to try to find a way of
> generating logical replication records from WAL.  But once generated,
> I think those records should form their own stream, independent of
> WAL.  If you take the contrary position that they should be included
> in WAL, then when you filter the WAL stream down to just the records
> of interest to logical replication, you end up with a WAL stream with
> holes in it, which is one of the things that Andres listed as an
> unresolved design problem in his original email.
Yes.

> Moreover, this isn't necessary at all for single-master replication,
> or even multi-source replication where each table has a single master.
>  It's only necessary for full multi-master replication, which we have
> no consensus to include in core, and even if we did have a consensus
> to include it in core, it certainly shouldn't be the first feature we
> design.
Well, you can't blame a patch/prototype trying to implement what it claims to 
implement ;)

More seriously: Even if we don't put MM in core I think putting the basis for 
it in core so that somebody can build such a solution reusing the existing 
infrastructure is a sensible idea. Imo the only thing that requires explicit 
support which is hard to add outside of core is prevention of loops (aka this 
patch). Everything else should be doable reusing the hopefully modular pieces.

> > However, if we're dead set on doing it that way, let us put information
> > that is only relevant to logical replication records into only the
> > logical replication records.
> Right.  If we decide we need this, and if we did decide to conflate
> the WAL stream, both of which I disagree with as noted above, then we
> still don't need it on every record.  It would probably be sufficient
> for local transactions to do nothing at all (and we can implicitly
> assume that they have master node ID = local node ID) and transactions
> which are replaying remote changes to emit one record per XID per
> checkpoint cycle containing the remote node ID.
Youve gone from a pretty trivial 150 line patch without any runtime/space 
overhead to something *considerably* more complex in that case though. 

You suddently need to have relatively complex logic to remember which 
information you got for a certain xid (and forget that information afterwards) 
and whether you already logged that xid and you need to have to decide about 
logging that information at multiple places.

Btw, what do you mean with "conflating" the stream? I don't really see that 
being proposed.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Peter Geoghegan

On 19 June 2012 20:11, Robert Haas  wrote:
> On Tue, Jun 19, 2012 at 9:47 AM, Tom Lane  wrote:
>> Peter Eisentraut  writes:
>>> What is the latest theory on using int4 vs. int32 in C code?
>>> (equivalently int2, int16)
>>
>> I thought the general idea was to use int32 most places, but int4 in
>> catalog declarations.  I don't think it's tremendously important if
>> somebody uses the other though.
>
> I concur with Peter that TMTOWTDI is not the right way to do this.  I
> think we ought to get rid of int4 in code and use int32 everywhere.
>
>>> While we're at it, how do we feel about using C standard types like
>>> int32_t instead of (or initially in addition to) our own definitions?
>>
>> Can't get very excited about this either.  The most likely outcome of
>> a campaign to substitute the standard types is that back-patching would
>> become a truly painful activity.  IMO, anything that is going to result
>> in tens of thousands of diffs had better have a more-than-cosmetic
>> reason.  (That wouldn't apply if we only used int32_t in new code ...
>> but then, instead of two approved ways to do it, there would be three.
>> Which doesn't seem like it improves matters.)
>
> On this one, I agree with you.

Yeah. I find pgindent changes annoying when doing a git blame myself.
Now, granted, you can mostly take care of that by having the tool
ignore whitespace changes, but that doesn't always work perfectly, and
I haven't been able to figure out a better way of managing that.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 1:43 AM, Tom Lane  wrote:
> Amit Kapila  writes:
>>> AFAIR you can create pg_control from scratch already with pg_resetxlog.
>>> The hard part is coming up with values for the counters, such as the
>>> next WAL location.  Some of them such as next OID are pretty harmless
>>> if you don't guess right, but I'm worried that wrong next WAL could
>>> make things worse not better.
>
>> I believe if WAL files are proper as mentioned in Alvaro's mail, the
>> purposed logic should generate correct values.
>
> I've got a problem with the assumption that, when pg_control is trash,
> megabytes or gigabytes of WAL can still be relied on completely.
>
> I'm almost inclined to suggest that we not get next-LSN from WAL, but
> by scanning all the pages in the main data store and computing the max
> observed LSN.  This is clearly not very attractive from a performance
> standpoint, but it would avoid the obvious failure mode where you lost
> some recent WAL segments along with pg_control.

I think it could be useful to have a tool that scans all the blocks
and computes that value, but I'd want it to just print the value out
and let me decide what to do about it.  There are cases where you
don't necessarily want to clobber pg_control, but you do have future
LSNs in your data file pages.  This can be either because the disk ate
your WAL, or because you didn't create recovery.conf, or because your
disk corrupted the LSNs on the data file pages.  I'd want a tool that
could be either run on an individual file, or recursively on a
directory.

In terms of the TODO item, I haven't yet heard anyone clearly state "I
wanted to use pg_controldata but it couldn't because X so therefore we
need this patch".  Alvaro mentioned the case where pg_control is
missing altogether, but:

[rhaas pgsql]$ rm ~/pgdata/global/pg_control
[rhaas pgsql]$ postgres
postgres: could not find the database system
Expected to find it in the directory "/Users/rhaas/pgdata",
but could not open file "/Users/rhaas/pgdata/global/pg_control": No
such file or directory
[rhaas pgsql]$ pg_resetxlog ~/pgdata
pg_resetxlog: could not open file "global/pg_control" for reading: No
such file or directory
If you are sure the data directory path is correct, execute
  touch global/pg_control
and try again.
[rhaas pgsql]$ touch ~/pgdata/global/pg_control
[rhaas pgsql]$ pg_resetxlog ~/pgdata
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it
Guessed pg_control values:

First log file ID after reset:0
First log file segment after reset:   69
pg_control version number:922
Catalog version number:   201206141
Database system identifier:   5755831325641078488
Latest checkpoint's TimeLineID:   1
Latest checkpoint's full_page_writes: off
Latest checkpoint's NextXID:  0/3
Latest checkpoint's NextOID:  1
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:3
Latest checkpoint's oldestXID's DB:   0
Latest checkpoint's oldestActiveXID:  0
Maximum data alignment:   8
Database block size:  8192
Blocks per segment of large relation: 131072
WAL block size:   8192
Bytes per WAL segment:16777216
Maximum length of identifiers:64
Maximum columns in an index:  32
Maximum size of a TOAST chunk:1996
Date/time type storage:   64-bit integers
Float4 argument passing:  by value
Float8 argument passing:  by value

If these values seem acceptable, use -f to force reset.
[rhaas pgsql]$ pg_resetxlog -f ~/pgdata
pg_resetxlog: pg_control exists but is broken or unknown version; ignoring it
Transaction log reset
[rhaas pgsql]$ postgres
LOG:  database system was shut down at 2012-06-19 15:25:28 EDT
LOG:  database system is ready to accept connections
LOG:  autovacuum launcher started

So I still don't understand what problem we're solving here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pl/perl and utf-8 in sql_ascii databases

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 3:03 PM, Alvaro Herrera
 wrote:
> That's what I thought.  I will commit to both branches soon, then.

I think there might be three branches involved.

> Mind you, this should have been an "open item", not a commitfest item.
> (Actually not even an open item.  We should have committed it right
> away.)

True, but it's not a perfect world, and I didn't feel qualified to
commit it myself.  Adding it the CommitFest at least prevented it from
getting lost forever.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] return values of backend sub-main functions

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 7:31 AM, Peter Eisentraut  wrote:
> On ons, 2012-01-18 at 21:21 +0200, Peter Eisentraut wrote:
>> On lör, 2012-01-07 at 16:41 -0500, Tom Lane wrote:
>> > Peter Eisentraut  writes:
>> > > I suggest that we change PostgresMain(), PostmasterMain(), BackendRun(),
>> > > WalSenderMain(), and WalSndLoop() to return void as well.
>> >
>> > I agree this code is not very consistent or useful, but one question:
>> > what should the callers do if one of these functions *does* return?
>>
>> I was thinking of a two-pronged approach:  First, add
>> __attribute__((noreturn)) to the functions.  This will cause a suitable
>> compiler to verify on a source-code level that nothing actually returns
>> from the function.  And second, at the call site, put an abort();  /*
>> not reached */.  Together, this will make the code cleaner and more
>> consistent, and will also help the compiler out a bit about the control
>> flow.
>
> Patch for 9.3 attached.

Seems reasonable on a quick read-through.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 09:16:41 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund  
wrote:
> > Adds a single and a double linked list which can easily embedded into
> > other datastructures and can be used without any additional allocations.
> 
> dllist.h advertises that it's embeddable.  Can you use that instead,
> or enhance it slightly to support what you want to do?
Oh, wow. Didn't know that existed. I had $subject lying around from a play-
around project, so I didn't look too hard.
Why is that code not used more widely? Quite a bit of our list usage should be 
replaced embedding list element in larger structs imo. There are also open-
coded inline list manipulations around (check aset.c for example).

*looks*

Not really that happy with it though:

1. dllist.h has double the element overhead by having an inline value pointer 
(which is not needed when embedding) and a pointer to the list (which I have a 
hard time seing as being useful)
2. only double linked list, mine provided single and double linked ones
3. missing macros to use when embedded in a larger struct (containerof() 
wrappers and for(...) support basically)
4. most things are external function calls...
5. way much more branches/complexity in most of the functions. My 
implementation doesn't use any branches for the typical easy modifications 
(push, pop, remove element somewhere) and only one for the typical tests 
(empty, has-next, ...)

The performance and memory aspects were crucial for the aforementioned toy 
project (slab allocator for postgres). Its not that crucial for the applycache 
where the lists currently are mostly used although its also relatively 
performance sensitive and obviously does a lot of list manipulation/iteration.

If I had to decide I would add the missing api in dllist.h to my 
implementation and then remove it. Its barely used - and only in an embedded 
fashion - as far as I can see.
I can understand though if that argument is met with doubt by others ;). If 
thats the way it has to go I would add some more convenience support for 
embedding data to dllist.h and settle for that.

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-19 Thread Robert Haas

On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund  wrote:
> From: Andres Freund 
>
> The previous coding could miss xlog writeouts at several places. E.g. when wal
> was written out by the background writer or even after a commit if
> synchronous_commit=off.
> This could lead to delays in sending data to the standby of up to 7 seconds.
>
> To fix this move the responsibility of notification to the layer where the
> neccessary information is actually present. We take some care not to do the
> notification while we hold conteded locks like WALInsertLock or WalWriteLock
> locks.

I am not convinced that it's a good idea to wake up every walsender
every time we do XLogInsert().  XLogInsert() is a super-hot code path,
and adding more overhead there doesn't seem warranted.  We need to
replicate commit, commit prepared, etc. quickly, by why do we need to
worry about a short delay in replicating heap_insert/update/delete,
for example?  They don't really matter until the commit arrives.  7
seconds might be a bit long, but that could be fixed by decreasing the
polling interval for walsender to, say, a second.

When I was doing some testing recently, the case that was sort of
confusing was the replay of AccessExcusiveLocks.  The lock didn't show
up on the standby for many seconds after it had showed up on the
master.  But that's more a feature than a bug when you really thinking
about it - postponing the lock on the slave for as long as possible
just reduces the user impact of having to take them there at all.

Parenthetically, I find it difficult to extract inline patches.  No
matter whether I try to use it using Gmail + show original or the web
site, something always seems to get garbled.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 3:48 PM, Andres Freund  wrote:
> Why is that code not used more widely? Quite a bit of our list usage should be
> replaced embedding list element in larger structs imo. There are also open-
> coded inline list manipulations around (check aset.c for example).

Because we've got a million+ lines of code and nobody knows where all
the bodies are buried.

> 1. dllist.h has double the element overhead by having an inline value pointer
> (which is not needed when embedding) and a pointer to the list (which I have a
> hard time seing as being useful)
> 2. only double linked list, mine provided single and double linked ones
> 3. missing macros to use when embedded in a larger struct (containerof()
> wrappers and for(...) support basically)
> 4. most things are external function calls...
> 5. way much more branches/complexity in most of the functions. My
> implementation doesn't use any branches for the typical easy modifications
> (push, pop, remove element somewhere) and only one for the typical tests
> (empty, has-next, ...)
>
> The performance and memory aspects were crucial for the aforementioned toy
> project (slab allocator for postgres). Its not that crucial for the applycache
> where the lists currently are mostly used although its also relatively
> performance sensitive and obviously does a lot of list manipulation/iteration.
>
> If I had to decide I would add the missing api in dllist.h to my
> implementation and then remove it. Its barely used - and only in an embedded
> fashion - as far as I can see.
> I can understand though if that argument is met with doubt by others ;). If
> thats the way it has to go I would add some more convenience support for
> embedding data to dllist.h and settle for that.

I think it might be simpler to leave the name as Dllist and just
overhaul the implementation along the lines you suggest, rather than
replacing it with something completely different.  Mostly, I don't
want to add a third thing if we can avoid it, given that Dllist as it
exists today is used only lightly.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Marko Kreen

On Wed, Jun 13, 2012 at 2:28 PM, Andres Freund  wrote:
> +/*
> + * removes a node from a list
> + * Attention: O(n)
> + */
> +static inline void ilist_s_remove(ilist_s_head *head,
> +                                  ilist_s_node *node)
> +{
> +       ilist_s_node *last = &head->head;
> +       ilist_s_node *cur;
> +#ifndef NDEBUG
> +       bool found = false;
> +#endif
> +       while ((cur = last->next))
> +       {
> +               if (cur == node)
> +               {
> +                       last->next = cur->next;
> +#ifndef NDEBUG
> +                       found = true;
> +#endif
> +                       break;
> +               }
> +               last = cur;
> +       }
> +       assert(found);
> +}

This looks weird.

In cyclic list removal is:

  node->prev->next = node->next;
  node->next->prev = node->prev;

And thats it.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Andres Freund

Hi,

On Tuesday, June 19, 2012 09:59:48 PM Marko Kreen wrote:
> On Wed, Jun 13, 2012 at 2:28 PM, Andres Freund  
wrote:
> > +/*
> > + * removes a node from a list
> > + * Attention: O(n)
> > + */
> > +static inline void ilist_s_remove(ilist_s_head *head,
> > +  ilist_s_node *node)
> > +{
> > +   ilist_s_node *last = &head->head;
> > +   ilist_s_node *cur;
> > +#ifndef NDEBUG
> > +   bool found = false;
> > +#endif
> > +   while ((cur = last->next))
> > +   {
> > +   if (cur == node)
> > +   {
> > +   last->next = cur->next;
> > +#ifndef NDEBUG
> > +   found = true;
> > +#endif
> > +   break;
> > +   }
> > +   last = cur;
> > +   }
> > +   assert(found);
> > +}
> 
> This looks weird.
> 
> In cyclic list removal is:
> 
>   node->prev->next = node->next;
>   node->next->prev = node->prev;
> 
> And thats it.
Thats the single linked list, not the double linked one. Thats why it has a 
O(n) warning tacked on...

The double linked one is just as you said:
/*
 * removes a node from a list
 */
static inline void ilist_d_remove(unused_attr ilist_d_head *head, ilist_d_node 
*node)
{
ilist_d_check(head);
node->prev->next = node->next;
node->next->prev = node->prev;
ilist_d_check(head);
}

Greetings,

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 01/16] Overhaul walsender wakeup handling

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 09:55:30 PM Robert Haas wrote:
> On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund  
wrote:
> > From: Andres Freund 
> > 
> > The previous coding could miss xlog writeouts at several places. E.g.
> > when wal was written out by the background writer or even after a commit
> > if synchronous_commit=off.
> > This could lead to delays in sending data to the standby of up to 7
> > seconds.
> > 
> > To fix this move the responsibility of notification to the layer where
> > the neccessary information is actually present. We take some care not to
> > do the notification while we hold conteded locks like WALInsertLock or
> > WalWriteLock locks.
> 
> I am not convinced that it's a good idea to wake up every walsender
> every time we do XLogInsert().  XLogInsert() is a super-hot code path,
> and adding more overhead there doesn't seem warranted.  We need to
> replicate commit, commit prepared, etc. quickly, by why do we need to
> worry about a short delay in replicating heap_insert/update/delete,
> for example?  They don't really matter until the commit arrives.  7
> seconds might be a bit long, but that could be fixed by decreasing the
> polling interval for walsender to, say, a second.
Its not woken up every XLogInsert call. Its only woken up if there was an 
actual disk write + fsync in there. Thats exactly the point of the patch.
The wakeup rate is actually lower for synchronous_commit=on than before 
because then it unconditionally did a wakeup for every commit (and similar) 
and now only does that if something has been written + fsynced.

> Parenthetically, I find it difficult to extract inline patches.  No
> matter whether I try to use it using Gmail + show original or the web
> site, something always seems to get garbled.
Will use git send-mail --attach next time... Btw, git am should be able to 
extract the patches for you.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Marko Kreen

On Tue, Jun 19, 2012 at 11:02 PM, Andres Freund  wrote:
> On Tuesday, June 19, 2012 09:59:48 PM Marko Kreen wrote:
>> On Wed, Jun 13, 2012 at 2:28 PM, Andres Freund 
> wrote:
>> > +/*
>> > + * removes a node from a list
>> > + * Attention: O(n)
>> > + */
>> > +static inline void ilist_s_remove(ilist_s_head *head,
>> > +                                  ilist_s_node *node)
>> > +{
>> > +       ilist_s_node *last = &head->head;
>> > +       ilist_s_node *cur;
>> > +#ifndef NDEBUG
>> > +       bool found = false;
>> > +#endif
>> > +       while ((cur = last->next))
>> > +       {
>> > +               if (cur == node)
>> > +               {
>> > +                       last->next = cur->next;
>> > +#ifndef NDEBUG
>> > +                       found = true;
>> > +#endif
>> > +                       break;
>> > +               }
>> > +               last = cur;
>> > +       }
>> > +       assert(found);
>> > +}
>>
>> This looks weird.
>>
>> In cyclic list removal is:
>>
>>   node->prev->next = node->next;
>>   node->next->prev = node->prev;
>>
>> And thats it.
> Thats the single linked list, not the double linked one. Thats why it has a
> O(n) warning tacked on...

Oh, you have several list implementations there.
Sorry, I was just browsing and it caught my eye.

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 04/16] Add embedded list interface (header only)

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 09:58:43 PM Robert Haas wrote:
> On Tue, Jun 19, 2012 at 3:48 PM, Andres Freund  
wrote:
> > Why is that code not used more widely? Quite a bit of our list usage
> > should be replaced embedding list element in larger structs imo. There
> > are also open- coded inline list manipulations around (check aset.c for
> > example).
> Because we've got a million+ lines of code and nobody knows where all
> the bodies are buried.
Heh, yes ;). Ive hit several times as you uncovered at least twice ;)

> > 1. dllist.h has double the element overhead by having an inline value
> > pointer (which is not needed when embedding) and a pointer to the list
> > (which I have a hard time seing as being useful)
> > 2. only double linked list, mine provided single and double linked ones
> > 3. missing macros to use when embedded in a larger struct (containerof()
> > wrappers and for(...) support basically)
> > 4. most things are external function calls...
> > 5. way much more branches/complexity in most of the functions. My
> > implementation doesn't use any branches for the typical easy
> > modifications (push, pop, remove element somewhere) and only one for the
> > typical tests (empty, has-next, ...)
> > 
> > The performance and memory aspects were crucial for the aforementioned
> > toy project (slab allocator for postgres). Its not that crucial for the
> > applycache where the lists currently are mostly used although its also
> > relatively performance sensitive and obviously does a lot of list
> > manipulation/iteration.
> > 
> > If I had to decide I would add the missing api in dllist.h to my
> > implementation and then remove it. Its barely used - and only in an
> > embedded fashion - as far as I can see.
> > I can understand though if that argument is met with doubt by others ;).
> > If thats the way it has to go I would add some more convenience support
> > for embedding data to dllist.h and settle for that.
> 
> I think it might be simpler to leave the name as Dllist and just
> overhaul the implementation along the lines you suggest, rather than
> replacing it with something completely different.  Mostly, I don't
> want to add a third thing if we can avoid it, given that Dllist as it
> exists today is used only lightly.
Well, if its the name, I have no problem with changing it, but I don't see how 
you can keep the api as it currently is and address my points.

If there is some buyin I can try to go either way (keeping the existing name, 
changing the api, adjusting the callers or just adjust the callers, throw away 
the old implementation) I just don't want to get into that just to see 
somebody isn't agreeing with the fundamental idea.

The most contentious point is probably relying on USE_INLINE being available 
anywhere. Which I believe to be the point now that we have gotten rid of some 
platforms.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Kevin Grittner

Andres Freund  wrote:
> Robert Haas wrote:
>> Tom Lane  wrote:

>>> However, if we're dead set on doing it that way, let us put
>>> information that is only relevant to logical replication records
>>> into only the logical replication records.
>> Right.  If we decide we need this, and if we did decide to
>> conflate the WAL stream, both of which I disagree with as noted
>> above, then we still don't need it on every record.  It would
>> probably be sufficient for local transactions to do nothing at
>> all (and we can implicitly assume that they have master node ID =
>> local node ID) and transactions which are replaying remote
>> changes to emit one record per XID per checkpoint cycle
>> containing the remote node ID.
> Youve gone from a pretty trivial 150 line patch without any
> runtime/space overhead to something *considerably* more complex in
> that case though. 

I think it might be worth it.  I've done a lot of MM replication,
and so far have not had to use a topology which allowed loops.  Not
only would you be reserving space in the WAL stream which was not
useful for those not using MM replication, you would be reserving it
when even many MM configurations would not need it.  Now you could
argue that the 16 bits you want to use are already there and are not
yet used for anything; but there are two counter-arguments to that:
you lose the opportunity to use them for something else, and you
might want more than 16 bits.

-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] performance regression in 9.2 when loading lots of small tables

2012-06-19 Thread Robert Haas

On Mon, Jun 18, 2012 at 8:42 PM, Jeff Janes  wrote:
> There was a regression introduced in 9.2 that effects the creation and
> loading of lots of small tables in a single transaction.
>
> It affects the loading of a pg_dump file which has a large number of
> small tables (10,000 schemas, one table per schema, 10 rows per
> table).  I did not test other schema configurations, so these
> specifics might not be needed to invoke the problem.
>
> It causes the loading of a dump with "psql -1 -f " to run at half the
> previous speed.  Speed of loading without the -1 is not changed.

I tried to figure out why this was happening.  I tried it out on the
IBM POWER7 box after building from
f5297bdfe4c4a47376c41b96161fb55c2294a0b1 and it ran for about 14
minutes.  'time' reports that psql used 38 seconds of user time and 11
seconds of system time.  The remaining time was presumably spent
waiting for the backend and/or the kernel.

I ran the backend under strace -cf and it had this to say:

% time seconds  usecs/call callserrors syscall
-- --- --- - - 
 45.31   39.888639  10   3836379   write
 18.80   16.553878   3   4929697   lseek
 13.51   11.890826   6   1906179 12819 read
 10.249.011690   7   1372354  5995 recv
  6.275.517929   5   1107766   brk
  2.071.818345  91 20068   fsync
  1.461.281999  15 87260   send
  1.151.015621  24 42527 11334 open

I did a separate run using perf and it had this to say:

Events: 1M cycles
+  13.18% postgres  postgres   [.] FlushRelationBuffers
+   9.65% postgres  postgres   [.] comparetup_index_btree
+   9.34%  swapper  [kernel.kallsyms]  [k]
.pseries_dedicated_idle_sleep
+   4.41% postgres  postgres   [.] CopyReadLine
+   4.23% postgres  postgres   [.] tuplesort_heap_siftup
+   4.17% postgres  postgres   [.] LockReassignCurrentOwner
+   3.88% postgres  postgres   [.] pg_mbstrlen_with_len
+   2.74% postgres  postgres   [.] pg_verify_mbstr_len
+   2.46% postgres  postgres   [.] NextCopyFromRawFields
+   2.44% postgres  postgres   [.] btint4cmp
+   2.08% postgres  libc-2.14.90.so[.] memcpy
+   2.06% postgres  postgres   [.] hash_seq_search
+   1.55% postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user
+   1.29% postgres  libc-2.14.90.so[.]
__GI_strtoll_l_internal
+   1.16% postgres  postgres   [.] pg_utf_mblen

There are certainly opportunities for optimization there but it's sure
not obvious to me what it has to do with either of the commits you
mention.  I haven't actually verified that there's a regression on
this box as result of either of those commits, though: I just profiled
master.  Maybe I better go check that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Marko Kreen

On Mon, Jun 18, 2012 at 6:35 PM, Simon Riggs  wrote:
> On 13 June 2012 19:28, Andres Freund  wrote:
>> This adds a new configuration parameter multimaster_node_id which determines
>> the id used for wal originating in one cluster.
>
> Looks good and it seems this aspect at least is commitable in this CF.
>
> Design decisions I think we need to review are
>
> * Naming of field. I think origin is the right term, borrowing from Slony.

I have not read too deeply here, so maybe I am missing
some important detail here, but idea that users need
to coordinate a integer config parameter globally does not
sound too attractive to me.

Why not limit integers to local storage only and map
them to string idents on config, UI and transport?

-- 
marko

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Near-duplicate RI NO ACTION and RESTRICT triggers

2012-06-19 Thread Dean Rasheed

On 19 June 2012 17:48, Tom Lane  wrote:
> I think that the argument for having the RESTRICT triggers behave
> like this is that the SQL spec envisions the RESTRICT check occurring
> immediately when the individual PK row is updated/deleted, and so there
> would be no opportunity for another PK row to be updated into its place.
> (Or, in plainer English, RESTRICT should mean "you can't modify this
> row's keys at all if it has dependents".)  Because we implement RESTRICT
> through an AFTER trigger that can't run earlier than end-of-statement,
> we can't exactly match the spec's semantics, but we can get fairly
> close so long as you don't think about what would be seen by
> e.g. user-written triggers executing during the statement.
>
> I'm happy with continuing to have this behavioral difference between
> the two sets of triggers, but wanted to throw it up for discussion:
> does anyone think it'd be better to apply ri_Check_Pk_Match in the
> RESTRICT triggers too?
>

In SQL:2008 they've re-worded the descriptions of these actions and
added an explicit note to clarify the intended difference:

"""
— ON UPDATE RESTRICT: any change to a referenced column in the
referenced table is prohibited if there
is a matching row.
— ON UPDATE NO ACTION (the default): there is no referential update
action; the referential constraint
only specifies a constraint check.

NOTE 38 — Even if constraint checking is not deferred, ON UPDATE
RESTRICT is a stricter condition than ON UPDATE NO
ACTION. ON UPDATE RESTRICT prohibits an update to a particular row if
there are any matching rows; ON UPDATE NO
ACTION does not perform its constraint check until the entire set of
rows to be updated has been processed.
"""

and there's a similar note in the DELETE case. So I think the current
behaviour is correct, and there are probably genuine use cases for
both types of check.

Regards,
Dean

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Testing 9.2 in ~production environment

2012-06-19 Thread Peter Eisentraut

On tis, 2012-06-19 at 09:33 -0400, Tom Lane wrote:
> Come to think of it, another possible factor is that LIKE can't use
> ordinary indexes on text if the locale isn't C.

But he reported that the plans are the same.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-19 Thread Kevin Grittner

Andres Freund  wrote:

> The problem is just that to support basically arbitrary decoding
> requirements you need to provide at least those pieces of
> information in a transactionally consistent manner:
> * the data
> * table names
> * column names
> * type information
> * replication configuration

I'm not sure that the last one needs to be in scope for the WAL
stream, but the others I definitely agree eventually need to be
available to a logical transaction stream consumer.  You lay out the
alternative ways to get all of this pretty clearly, and I don't know
what the best answer is; it seems likely that there is not one best
answer.  In the long run, more than one of those options might need
to be supported, to support different environments.

As an initial implementation, I'm leaning toward the position that
requiring a hot standby or a catalog-only proxy is acceptable.  I
think that should allow an application to be written which emits
everything except the replication configuration.  That will allow us
to hook up everything we need at our shop.

-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] performance regression in 9.2 when loading lots of small tables

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 4:33 PM, Robert Haas  wrote:
> On Mon, Jun 18, 2012 at 8:42 PM, Jeff Janes  wrote:
>> There was a regression introduced in 9.2 that effects the creation and
>> loading of lots of small tables in a single transaction.
>>
>> It affects the loading of a pg_dump file which has a large number of
>> small tables (10,000 schemas, one table per schema, 10 rows per
>> table).  I did not test other schema configurations, so these
>> specifics might not be needed to invoke the problem.
>>
>> It causes the loading of a dump with "psql -1 -f " to run at half the
>> previous speed.  Speed of loading without the -1 is not changed.
>
> I tried to figure out why this was happening.  I tried it out on the
> IBM POWER7 box after building from
> f5297bdfe4c4a47376c41b96161fb55c2294a0b1 and it ran for about 14
> minutes.  'time' reports that psql used 38 seconds of user time and 11
> seconds of system time.  The remaining time was presumably spent
> waiting for the backend and/or the kernel.
>
> I ran the backend under strace -cf and it had this to say:
>
> % time     seconds  usecs/call     calls    errors syscall
> -- --- --- - - 
>  45.31   39.888639          10   3836379           write
>  18.80   16.553878           3   4929697           lseek
>  13.51   11.890826           6   1906179     12819 read
>  10.24    9.011690           7   1372354      5995 recv
>  6.27    5.517929           5   1107766           brk
>  2.07    1.818345          91     20068           fsync
>  1.46    1.281999          15     87260           send
>  1.15    1.015621          24     42527     11334 open
>
> I did a separate run using perf and it had this to say:
>
> Events: 1M cycles
> +  13.18%         postgres  postgres               [.] FlushRelationBuffers
> +   9.65%         postgres  postgres               [.] comparetup_index_btree
> +   9.34%          swapper  [kernel.kallsyms]      [k]
> .pseries_dedicated_idle_sleep
> +   4.41%         postgres  postgres               [.] CopyReadLine
> +   4.23%         postgres  postgres               [.] tuplesort_heap_siftup
> +   4.17%         postgres  postgres               [.] 
> LockReassignCurrentOwner
> +   3.88%         postgres  postgres               [.] pg_mbstrlen_with_len
> +   2.74%         postgres  postgres               [.] pg_verify_mbstr_len
> +   2.46%         postgres  postgres               [.] NextCopyFromRawFields
> +   2.44%         postgres  postgres               [.] btint4cmp
> +   2.08%         postgres  libc-2.14.90.so        [.] memcpy
> +   2.06%         postgres  postgres               [.] hash_seq_search
> +   1.55%         postgres  [kernel.kallsyms]      [k] .__copy_tofrom_user
> +   1.29%         postgres  libc-2.14.90.so        [.]
> __GI_strtoll_l_internal
> +   1.16%         postgres  postgres               [.] pg_utf_mblen
>
> There are certainly opportunities for optimization there but it's sure
> not obvious to me what it has to do with either of the commits you
> mention.  I haven't actually verified that there's a regression on
> this box as result of either of those commits, though: I just profiled
> master.  Maybe I better go check that.

I built REL9_1_STABLE from commit
1643031e5fe02d2de9ae6b8f86ef9ffd09fe7d3f and it took 19m45.250s, even
slower than master.  So something is different in my environment
versus yours.

I ran this build with perf also, and got this:

38.77% postgres  postgres   [.] FlushRelationBuffers
 6.80%  swapper  [kernel.kallsyms]  [k]
.pseries_dedicated_idle_sleep
 4.84% postgres  postgres   [.] comparetup_index_btree
 3.23% postgres  postgres   [.] CopyReadLine
 2.24% postgres  postgres   [.] tuplesort_heap_siftup
 2.23% postgres  postgres   [.] pg_mbstrlen_with_len
 2.08% postgres  postgres   [.] LockReassignCurrentOwner
 2.01% postgres  postgres   [.] pg_verify_mbstr_len
 1.90% postgres  postgres   [.] NextCopyFromRawFields
 1.62% postgres  postgres   [.] btint4cmp
 1.41% postgres  libc-2.14.90.so[.] memcpy
 1.39% postgres  postgres   [.] LWLockAcquire
 1.22% postgres  postgres   [.] LWLockRelease
 1.19% postgres  postgres   [.] pg_utf_mblen
 1.05% postgres  [kernel.kallsyms]  [k] .__copy_tofrom_user

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Simon Riggs

On 20 June 2012 04:58, Marko Kreen  wrote:
> On Mon, Jun 18, 2012 at 6:35 PM, Simon Riggs  wrote:
>> On 13 June 2012 19:28, Andres Freund  wrote:
>>> This adds a new configuration parameter multimaster_node_id which determines
>>> the id used for wal originating in one cluster.
>>
>> Looks good and it seems this aspect at least is commitable in this CF.
>>
>> Design decisions I think we need to review are
>>
>> * Naming of field. I think origin is the right term, borrowing from Slony.
>
> I have not read too deeply here, so maybe I am missing
> some important detail here, but idea that users need
> to coordinate a integer config parameter globally does not
> sound too attractive to me.
>
> Why not limit integers to local storage only and map
> them to string idents on config, UI and transport?

Yes, that can be done. This is simply how the numbers are used
internally, similar to oids.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backport of fsync queue compaction

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith  wrote:
> In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
> to try and compact the fsync queue when clients find it full.  There's no
> visible behavior change, just a substantial performance boost possible in
> the rare but extremely bad situations where the background writer stops
> doing fsync absorption.  I've been running that in production at multiple
> locations since practically the day it hit this mailing list, with backports
> all the way to 8.3 being common (and straightforward to construct).  I've
> never seen a hint of a problem with this new code.

I've been in favor of back-porting this for a while, so you'll get no
argument from me.

Anyone disagree?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Tom Lane

Robert Haas  writes:
> On Tue, Jun 19, 2012 at 9:47 AM, Tom Lane  wrote:
>> I thought the general idea was to use int32 most places, but int4 in
>> catalog declarations.  I don't think it's tremendously important if
>> somebody uses the other though.

> I concur with Peter that TMTOWTDI is not the right way to do this.  I
> think we ought to get rid of int4 in code and use int32 everywhere.

I have not looked to see how many places do that.  If it's a reasonably
small number of places, I'm OK with getting rid of int4 at the C level.
(int2/int8 the same of course.)

If we are going to do that, though, we need to actually remove those
typedefs.  Leaving them around on the grounds that third-party code
might be using them will just allow cases to creep back in.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Robert Haas

On Tue, Jun 19, 2012 at 3:18 PM, Andres Freund  wrote:
> More seriously: Even if we don't put MM in core I think putting the basis for
> it in core so that somebody can build such a solution reusing the existing
> infrastructure is a sensible idea. Imo the only thing that requires explicit
> support which is hard to add outside of core is prevention of loops (aka this
> patch). Everything else should be doable reusing the hopefully modular pieces.

I don't think prevention of loops is hard to do outside of core
either, unless you insist on tying "logical" replication so closely to
WAL that anyone doing MMR is necessarily getting the change stream
from WAL.  In fact, I'd go so far as to say that the ONLY part of this
that's hard to do outside of core is change extraction.  Even
low-level apply can be implemented as a loadable module.

>> Right.  If we decide we need this, and if we did decide to conflate
>> the WAL stream, both of which I disagree with as noted above, then we
>> still don't need it on every record.  It would probably be sufficient
>> for local transactions to do nothing at all (and we can implicitly
>> assume that they have master node ID = local node ID) and transactions
>> which are replaying remote changes to emit one record per XID per
>> checkpoint cycle containing the remote node ID.
> Youve gone from a pretty trivial 150 line patch without any runtime/space
> overhead to something *considerably* more complex in that case though.
>
> You suddently need to have relatively complex logic to remember which
> information you got for a certain xid (and forget that information afterwards)
> and whether you already logged that xid and you need to have to decide about
> logging that information at multiple places.

You need a backend-local hash table inside the wal reader process, and
that hash table needs to map XIDs to node IDs.  And you occasionally
need to prune it, so that it doesn't eat too much memory.  None of
that sounds very hard.

> Btw, what do you mean with "conflating" the stream? I don't really see that
> being proposed.

It seems to me that you are intent on using the WAL stream as the
logical change stream.  I think that's a bad design.  Instead, you
should extract changes from WAL and then ship them around in a format
that is specific to logical replication.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backport of fsync queue compaction

2012-06-19 Thread Christopher Browne

On Tue, Jun 19, 2012 at 5:39 PM, Robert Haas  wrote:
> On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith  wrote:
>> In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
>> to try and compact the fsync queue when clients find it full.  There's no
>> visible behavior change, just a substantial performance boost possible in
>> the rare but extremely bad situations where the background writer stops
>> doing fsync absorption.  I've been running that in production at multiple
>> locations since practically the day it hit this mailing list, with backports
>> all the way to 8.3 being common (and straightforward to construct).  I've
>> never seen a hint of a problem with this new code.
>
> I've been in favor of back-porting this for a while, so you'll get no
> argument from me.
>
> Anyone disagree?

I recall reviewing that; it seemed like quite a good change.  Me likes.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Backport of fsync queue compaction

2012-06-19 Thread Alvaro Herrera


Excerpts from Robert Haas's message of mar jun 19 17:39:46 -0400 2012:
> On Tue, Jun 19, 2012 at 5:33 PM, Greg Smith  wrote:
> > In January of 2011 Robert committed 7f242d880b5b5d9642675517466d31373961cf98
> > to try and compact the fsync queue when clients find it full.  There's no
> > visible behavior change, just a substantial performance boost possible in
> > the rare but extremely bad situations where the background writer stops
> > doing fsync absorption.  I've been running that in production at multiple
> > locations since practically the day it hit this mailing list, with backports
> > all the way to 8.3 being common (and straightforward to construct).  I've
> > never seen a hint of a problem with this new code.
> 
> I've been in favor of back-porting this for a while, so you'll get no
> argument from me.

+1.  I even thought we had already backported it and was surprised to
discover we hadn't, when we had this problem at a customer, not long
ago.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Transactions over pathological TCP connections

2012-06-19 Thread Leon Smith

On Tue, Jun 19, 2012 at 11:59 AM, Robert Haas  wrote:

> On Tue, Jun 19, 2012 at 1:56 AM, Tom Lane  wrote:
> > The transaction would be committed before a command success report is
> > delivered to the client, so I don't think delivered-and-not-marked is
> > possible.
>
> ...unless you have configured synchronous_commit=off, or fsync=off.
>
> Or unless your disk melts into a heap of slag and you have to restore
> from backup.  You can protect against that last case using synchronous
> replication.
>

But hard disk failure isn't in the failure model I was concerned about.
=)To be perfectly honest,  I'm not too concerned with either hard drive
failure or network failure,   as we are deploying on Raid 1+0 database
server talking to the client over a redundant LAN,  and using asynchronous
(Slony) replication to an identical database server just in case.   No
single point of failure is a key philosophy of this app from top to bottom.
Like I said,  this is mostly idle curiosity.

But I'm also accustomed to trying to get work done on shockingly unreliable
internet connections.   As a result, network failure is something I think
about quite a lot when writing networked applications.   So this is not
entirely idle curiosity either.

And thinking about this a bit more,   it's clear that the database has to
commit before the result is sent,  on the off chance that the transaction
fails and needs to be retried.   And that an explicit transaction block
isn't really a solution either,  because   a "BEGIN;  SELECT dequeue_row()"
 would get the row to the client without marking it as taken,   but the
pathological TCP disconnect could then attack the  following "COMMIT;",
 leaving the client to think that the row has not been actually taken when
it in fact has.

It's not clear to me that this is even a solvable problem without modifying
the schema to include both a "taken" and a "finished processing" state,
 and then letting elements be re-delievered after a period of time.   But
this would then allow a pathological demon with the power to cause TCP
connects have a single element delivered and processed multiple times.

In any case, thanks for the responses...

Best,
Leon

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Simon Riggs

On 19 June 2012 14:03, Tom Lane  wrote:

> "Every WAL record"?  Why in heck would you attach it to every record?
> Surely putting it in WAL page headers would be sufficient.  We could
> easily afford to burn a page switch (if not a whole segment switch)
> when changing masters.

This does appear to be a reasonable idea at first glance, since it
seems that each node has just a single node id, but that is not the
case.

As we pass changes around we maintain the same origin id for a change,
so there is a mix of origin node ids at the WAL record level, not the
page level. The concept of originating node id is essentially same as
that used in Slony.

> I'm against the idea of eating any spare space we have in WAL record
> headers for this purpose, anyway; there are likely to be more pressing
> needs in future.

Not sure what those pressing needs are, but I can't see any. What we
are doing here is fairly important, just not as important as crash
recovery. But then that has worked pretty much unchanged for some time
now.

I raised the possibility of having variable length headers, but there
is no requirement for that yet.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Kevin Grittner

Tom Lane  wrote:
 
> I have not looked to see how many places do that.  If it's a
reasonably
> small number of places, I'm OK with getting rid of int4 at the C
level.
> (int2/int8 the same of course.)
 
$ find -name '*.h' -or -name '*.c' | egrep -v '/tmp_check/' | xargs cat
\

  | egrep -c '\bint2\b'
178

  | egrep -c '\bint4\b'
570

  | egrep -c '\bint8\b'
212
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] pg_restore logging inconsistency

2012-06-19 Thread Alvaro Herrera


Excerpts from Josh Kupershmidt's message of mié may 30 14:55:12 -0400 2012:
> Hi all,
> 
> Bosco Rama recently complained[1] about not seeing a message printed
> by pg_restore for each LO to be restored. The culprit seems to be the
> different level passed to ahlog() for this status message:
> 
> pg_backup_archiver.c:ahlog(AH, 2, "restoring large object with OID %u\n", 
> oid);
> pg_backup_tar.c:ahlog(AH, 1, "restoring large object OID 
> %u\n", oid);
> 
> depending on whether one is restoring a tar-format or custom-format
> dump. I think these messages should be logged at the same level, to
> avoid this inconsistency. The attached patch logs them both with
> level=1, and makes the message texts identical. Note, as of 9.0 there
> is already a line like this printed for each LO:
> 
> pg_restore: executing BLOB 135004
> 
> so I could see the argument for instead wanting to hide the "restoring
> large object" messages. However, the OP was interested in seeing
> something like a status indicator for the lo_write() calls which may
> take a long time, and the above message isn't really helpful for that
> purpose as it is printed earlier in the restore process. Plus it seems
> reasonable to make verbose mode, well, verbose.

I applied this patch all the way back to 8.3.  Thanks.

-- 
Álvaro Herrera 
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Christopher Browne

On Tue, Jun 19, 2012 at 5:46 PM, Robert Haas  wrote:
>> Btw, what do you mean with "conflating" the stream? I don't really see that
>> being proposed.
>
> It seems to me that you are intent on using the WAL stream as the
> logical change stream.  I think that's a bad design.  Instead, you
> should extract changes from WAL and then ship them around in a format
> that is specific to logical replication.

Yeah, that seems worth elaborating on.

What has been said several times is that it's pretty necessary to
capture the logical changes into WAL.  That seems pretty needful, in
order that the replication data gets fsync()ed avidly, and so that we
don't add in the race condition of needing to fsync() something *else*
almost exactly as avidly as is the case for WAL today..

But it's undesirable to pull *all* the bulk of contents of WAL around
if it's only part of the data that is going to get applied.  On a
"physical streaming" replica, any logical data that gets captured will
be useless.  And on a "logical replica," they "physical" bits of WAL
will be useless.

What I *want* you to mean is that there would be:
a) WAL readers that pull the "physical bits", and
b) WAL readers that just pull "logical bits."

I expect it would be fine to have a tool that pulls LCRs out of WAL to
prepare that to be sent to remote locations.  Is that what you have in
mind?  Or are you feeling that the "logical bits" shouldn't get
captured in WAL altogether, so we need to fsync() them into a
different stream of files?
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Simon Riggs

On 20 June 2012 05:46, Robert Haas  wrote:

> It seems to me that you are intent on using the WAL stream as the
> logical change stream.  I think that's a bad design.  Instead, you
> should extract changes from WAL and then ship them around in a format
> that is specific to logical replication.

The proposal is to read the WAL in order to generate LCRs. The
information needs to be in the original data in order to allow it to
be passed onwards.

In a multi-master config there will be WAL records with many origin
node ids at any time, so this information must be held at WAL record
level not page level.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Andres Freund

On Tuesday, June 19, 2012 10:58:44 PM Marko Kreen wrote:
> On Mon, Jun 18, 2012 at 6:35 PM, Simon Riggs  wrote:
> > On 13 June 2012 19:28, Andres Freund  wrote:
> >> This adds a new configuration parameter multimaster_node_id which
> >> determines the id used for wal originating in one cluster.
> > 
> > Looks good and it seems this aspect at least is commitable in this CF.
> > 
> > Design decisions I think we need to review are
> > 
> > * Naming of field. I think origin is the right term, borrowing from
> > Slony.
> 
> I have not read too deeply here, so maybe I am missing
> some important detail here, but idea that users need
> to coordinate a integer config parameter globally does not
> sound too attractive to me.
> 
> Why not limit integers to local storage only and map
> them to string idents on config, UI and transport?
That should be possible. I don't want to go there till we have the base stuff 
in but it shouldn't be too hard.

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] use of int4/int32 in C code

2012-06-19 Thread Tom Lane

"Kevin Grittner"  writes:
> Tom Lane  wrote:
>> I have not looked to see how many places do that.  If it's a reasonably
>> small number of places, I'm OK with getting rid of int4 at the C level.
>> (int2/int8 the same of course.)

> $ find -name '*.h' -or -name '*.c' | egrep -v '/tmp_check/' | xargs cat
>   | egrep -c '\bint4\b'
> 570

That one looks like a lot, but a quick eyeball check suggests that many
of them are in comments and/or embedded SQL fragments where they
actually represent references to the SQL datatype, and so should be
left alone.  The impression I get is that the actual typedef uses are
concentrated in a couple of contrib modules and some of the tsearch
stuff, where they'd probably not be that painful to fix.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] [PATCH 10/16] Introduce the concept that wal has a 'origin' node

2012-06-19 Thread Simon Riggs

On 20 June 2012 04:31, Kevin Grittner  wrote:

> I've done a lot of MM replication,
> and so far have not had to use a topology which allowed loops.

The proposal is to use WAL to generate the logical change stream. That
has been shown in testing to be around x4 faster than having a
separate change stream, which must also be WAL logged (as Jan noted).

If we use WAL in this way, multi-master implies that the data will
*always* be in a loop. So in any configuration we must be able to tell
difference between changes made by one node and another.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

1 2 >

1 - 100 of 130 matches

Mail list logo