Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Ronan Dunklau

> Sorry, I might call it something like primary key, instead of 'tupleid'.
> Apart from whether we can uniquely identify a particular remote record with
> an attribute, what FDW needs here is "something to identify a remote
> record". So, we were talking about the same concept with different names.

Ah, that makes sense: I was understanding tupleid as a synonym for ctid.


> >> Does the incomplete tuple mean a tuple image where some of the columns
> >> are replaced with NULL due to optimization, as postgres_fdw is doing?
> >> If so, a solution is to force the FDW driver to fetch all the columns if
> >> the foreign table it manages has row-level triggers for update / delete.
> > 
> > What I meant is that a DELETE statement, for example, does not build a
> > scan- stage reltargetlist consisting of the whole set of attributes: the
> > only attributes that are fetched are the ones built by
> > addForeignUpdateTargets, and whatever additional attributes are involved
> > in the DELETE statement (in the WHERE clause, for example).
> 
> A DELETE statement indeed does not (need to) construct a complete tuple
> image at the scan stage; however, that does not prevent the FDW driver from
> keeping the latest record fetched from the remote server.
> The FDW driver can easily know beforehand whether the relation has row-level
> triggers, so there is no problem with keeping a complete image internally
> if needed.
> Or, it is perhaps possible to add the additional target entries needed to
> construct a complete tuple using the AddForeignUpdateTargets() callback.

I think that the additional target-entries should be added in core: requiring
each FDW driver to do it would mean additional work for FDW authors, and the
code would be duplicated amongst all FDW drivers that wish to support
triggers.


> > I like this idea, but I don't really see all the implications of such a
> > design.
> > Where would this tuple be stored ?
> > From what I understand, after-triggers are queued for later execution,
> > using the old/new tupleid for later retrieval (in the
> > AfterTriggerEventData struct). This would need a major change to work
> > with foreign tables. Would that involve a special path for executing
> > those triggers ASAP ?
> 
> Oops, I overlooked that. AfterTriggerSaveEvent() indeed saves only the ctid
> of tuples in regular tables, because it can re-construct a complete tuple
> image from a particular ctid at any time.
> On the other hand, that is not available for foreign tables due to their
> nature. All we can do, probably, is to save a copy of the before/after tuple
> image in local memory, even though it consumes more memory than for
> regular tables.

Or, as suggested above, add a special code path for foreign tables which would 
execute the trigger as soon as possible instead of queuing it.  

> The feature we need here is becoming a bit clearer, little by little.
> A complete tuple image shall be fetched at the scan stage if the foreign
> table has row-level triggers. The planSlot makes sense if it contains
> (junk) attributes sufficient to re-construct an old tuple image.
> The target list of a DELETE statement contains a junk ctid attribute; there
> is no reason why we cannot also add junk attributes that reference each
> regular attribute, is there? Also, the target list of an UPDATE statement
> contains the new tuple image, but it is possible to add junk attributes
> that just reference the old values, as a carrier of old values from the
> scan stage to the modify stage.
> I want core PostgreSQL to support part of these jobs. For example, it
> could add junk attributes to re-construct the old tuple image in
> rewriteTargetListUD(), if the target relation has row-level triggers. Also,
> it could extract these old values from planSlot to construct an old tuple
> image in GetTupleForTrigger(), if the relation is a foreign table.

So, if I understand you, the target list would be built as follows:


 - core builds the target list, including every attribute
 - this target list is updated by the FDW to add any junk attributes it wishes 
to use 
 - in the case of an UPDATE, core duplicates the whole list of attributes 
(including the added junk attributes), as another set of junk attributes. 
Maybe we could duplicate only the updated attributes.
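
As a rough illustration of the junk-attribute approach described above, here
is a minimal sketch modelled on what postgres_fdw's AddForeignUpdateTargets
callback already does today (adding a whole-row "wholerow" junk attribute);
calling something like this from the rewriter for trigger support is only the
idea under discussion, not existing behaviour:

/*
 * Sketch only (makeVar/makeTargetEntry come from nodes/makefuncs.h):
 * add a whole-row junk attribute to the query's target list so the old
 * tuple image is carried from the scan stage to the modify stage.
 */
static void
add_wholerow_junk_attribute(Query *parsetree)
{
	Var		   *var;
	TargetEntry *tle;

	/* whole-row reference to the result relation (varattno = 0) */
	var = makeVar(parsetree->resultRelation,
				  0,
				  RECORDOID,
				  -1,
				  InvalidOid,
				  0);

	tle = makeTargetEntry((Expr *) var,
						  list_length(parsetree->targetList) + 1,
						  pstrdup("wholerow"),
						  true);		/* resjunk */

	parsetree->targetList = lappend(parsetree->targetList, tle);
}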
 


> Unfortunately, I have no special idea for handling old/new remote tuples
> in AfterTriggerSaveEvent(), other than keeping a copy, but that is
> straightforward.

Maybe avoid it altogether in the case of a trigger on a remote foreign table ?

> 
> >> And, I also want some comments from committers, not only from mine.
> > 
> > +1
> 
> +1
> 
> Thanks,



Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Ronan Dunklau
Le mardi 15 octobre 2013 09:47:31 Robert Haas a écrit :
> On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai  wrote:
> >>> And, I also want some comments from committers, not only from mine.
> >> 
> >> +1
> > 
> > +1
> 
> /me pokes head up.  I know I'm going to annoy people with this
> comment, but I feel like it's going to have to be made at some point
> by somebody, so here goes: I don't see the point of this feature.  If
> you want a trigger on a table, why not set it on the remote side?  A
> trigger on the foreign table won't be enforced consistently; it'll
> only work when the update is routed through the foreign table, not
> when people access the underlying table on the remote side through any
> other mechanism.  The number of useful things you can do this way
> seems fairly small.  Perhaps you could use a row-level trigger for
> RLS, to allow only certain rows on the foreign side to be updated, but
> even that seems like a slightly strange design: generally it'd be
> better to enforce the security as close to the target object as
> possible.
> 
> There's another issue that concerns me here also: performance.  IIUC,
> an update of N tuples on the remote side currently requires N+1 server
> round-trips.  That is unspeakably awful, and we really oughta be
> looking for ways to make that number go down, by pushing the whole
> update to the remote side.  But obviously that won't be possible if
> there's a per-row trigger that has to be evaluated on the local side.
> Now, assuming somebody comes up with code that implements that
> optimization, we can just disable it when there are local-side
> triggers.  But, then you're back to having terrible performance.  So
> even if the use case for this seemed really broad, I tend to think
> performance concerns would sink most of the possible real-world uses.
> 
> I could, of course, be all wet

Even if the row-level trigger functionality is controversial in terms of
provided features vs. performance, the original patch I submitted enables
statement-level triggers.

Although my goal was to implement row-level triggers and I hit a wall pretty
quickly, would it make sense to have statement-level triggers without their
row counterparts?

I also think that pushing the whole update to the remote side will not be 
possible in all cases, and like Kohei wrote, it may be an acceptable loss to 
not be able to push it when a trigger is present. If we want to push the whole 
update, we will have to cope with local joins and function evaluations in all 
cases. IMO, triggers are just a special case of these limiting factors.



Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
On Tue, Oct 15, 2013 at 4:51 PM, Andres Freund wrote:

>
>
> I think you're over-interpreting it.


I think you are right. Someone who understands the replication code very
well advised us to use that log message as a way to measure how much time
it takes to send all the missing WAL to a remote standby on a slow WAN
link. While it worked well for all measurements, when we used a middleware
which caches a lot of traffic on the sender side, this log message was very
counter-intuitive. It took several more minutes for the standby to actually
receive all the WAL files and catch up after the message was displayed on
the master side. But then, as you said, maybe relying on the message was
not the best way to measure the time.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Compression of full-page-writes

2013-10-15 Thread KONDO Mitsumasa

(2013/10/15 22:01), k...@rice.edu wrote:

Google's lz4 is also a very nice algorithm with 33% better compression
performance than snappy and 2X the decompression performance in some
benchmarks also with a bsd license:

https://code.google.com/p/lz4/

If we judge only by performance, we will select lz4. However, we should also
consider other important factors such as software robustness, track record,
bug-fix history, and so on. If we hit unknown bugs, can we fix them or improve
the algorithm? That seems very difficult, because we would only be using the
code without understanding the algorithms. Therefore, I think we had better
select software that is robust and has a larger user base.

Regards,
--
Mitsumasa KONDO
NTT Open Source Software


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 2:25 PM, Andres Freund  wrote:
> On 2013-10-15 10:53:35 -0700, Peter Geoghegan wrote:
>> On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund  
>> wrote:
>> > I think anything that only works by breaking visibility rules that way
>> > is a nonstarter. Doing that from the C level is one thing, exposing it
>> > this way seems a bad idea.
>>
>> What visibility rule is that?
>
> The early return you added to HTSMVCC.
>
> At the very least it opens you to lots of halloween problem like
> scenarios.

The term "visibility rule" as you've used it here is suggestive of
some authoritative rule that should obviously never even be bent. I'd
suggest that what Postgres does isn't very useful as an authority on
this matter, because Postgres doesn't have upsert. Besides, today
Postgres doesn't just bend the rules (that is, some kind of classic
notion of MVCC as described in "Concurrency Control in Distributed
Database Systems" or something), it totally breaks them, at least in
READ COMMITTED mode (and what I've proposed here just occurs in RC
mode).

It is not actually in evidence that this approach introduces Halloween
problems. In order for HTSMVCC to controversially indicate visibility
under my scheme, it is not sufficient for the row version to just be
exclusive locked by our xact without otherwise being visible - it must
also *not be updated*. Now, I'll freely admit that this could still be
problematic - there might have been a subtlety I missed. But since an
actual example of where this is problematic hasn't been forthcoming, I
take it that it isn't obvious to either yourself or Robert that it
actually is. Any scheme that involves playing cute tricks with
visibility (which is to say, any credible upsert implementation) needs
very careful thought.

-- 
Peter Geoghegan




Re: [HACKERS] [PATCH] pg_sleep(interval)

2013-10-15 Thread Vik Fearing
On 09/30/2013 01:47 PM, Vik Fearing wrote:
> Yes, I understand you are trying to help, and I appreciate it!  My
> opinion, and that of others as well from the original thread, is that
> this patch should either go in as is and break that one case, or not go
> in at all.  I'm fine with either (although clearly I would prefer it
> went in otherwise I wouldn't have written the patch).

I see this is marked as rejected in the commitfest app, but I don't see
any note about who did it or why.  I don't believe there is consensus
for rejection on this list. In fact I think the opposite is true.

May we have an explanation please from the person who rejected this
without comment?

-- 
Vik





Re: [HACKERS] buildfarm failures on smew and anole

2013-10-15 Thread Peter Eisentraut
On Mon, 2013-10-14 at 18:14 -0400, Robert Haas wrote:
> > I cleaned the semaphores on smew, but they came back.  Whatever is
> > crashing is leaving the semaphores lying around.
> 
> Ugh.  When did you do that exactly?  I thought I fixed the problem
> that was causing that days ago, and the last 4 days worth of runs all
> show the "too many clients" error.

I did it a few times over the weekend.  At least twice less than 4 days
ago.  There are currently no semaphores left around, so whatever
happened in the last run cleaned it up.





Re: [HACKERS] Improve setup for documentation building with FOP

2013-10-15 Thread Peter Eisentraut

On 9/16/13 12:19 PM, Alvaro Herrera wrote:
> The FOP-based build works fine for me.  I gave the output a look.  I
> like that text formatted with fixed-width font wraps at the right
> margin, instead of continuing beyond it; there are some strange
> artifacts about it (such as addition of hyphens in some places, say in
> the middle of a configure option); but I think it's nicer than the
> output of the other toolchain nevertheless.  This willingness to break
> in the middle of  formatted elements looks weird in some places;
> for example table 17.2 SSL Server File Usage; old output puts the option
> name on first line, file path on second line; new output puts option
> name on top, file name is split in half.

The wrapping and hyphenation is a complex issue.  I think we agree that
it's generally better to do suboptimal wrapping than to have the content
run off the page.  Everything between that is, AFAICT, a matter of
fine-tuning various parameters and attributes.

> In the US size PDF, look at page 386; below the "gmake install-world"
> there's the bottom-of-page horizontal line, but the paragraph continues
> below that.  Bug?

Yeah, no idea how that can happen.

> Section 17.8 "Encryption Options" is formatted differently; previous
> method used indentation for the paragraph after each item, new one uses
> a table.  I like the old method better.  This is notoriously bad in the
> "superuser_reserved_connections" entry in "18.3.1. Connection Settings",
> and others.  Also, the synopsis in PQconnectStartParams entry in 31.1
> "Database Connection Control Functions" looks a lot worse.  This
> manifests in many places, and while it's an improvement in a minority of
> them, I think it's a net loss overall.

Yep.  Fixed by setting an option that turns it back to the old way.

> I like that the new output has horizontal lines at top and bottom of the
> text area of the page.  In the old method, the page heading (above that
> line) contains the chapter number and title, while in FOP it only has
> title (no number).  I find that number useful.  Also, limiting the space
> for the title and wrapping if the title is too long seems pointless; see
> example of this in chapter "High Availability, Load Balancing and
> Replication", where even the word Bal-ancing has been hyphenated.

Made a note to fix that later.

> Formatting of ,  and such seems a lot better in FOP.  For
> , the old method used a surrounding box; the new one just has a
> "Warning" title and indented paragraph, just like for  et al.
> Doesn't look like a problem to me.  It'd be nice to have pretty icons
> for these areas.

Could be fine-tuned later.

>  renders as grey-background box in FOP, white-background in old
> tool.  Not sure about this; it looks like the only place with grey
> background in the whole manual.  We only have one  in the
> entire SGML source.

Hmm, maybe it's there to keep the printers honest. ;-)  It would be easy
to change this if desired.

> Example 19.1 "Example pg_hba.conf Entries" is completely broken (no page
> break); renders normally in old method.  There are other cases of this,
> including libpq sample programs and ECPG.

I can't reproduce that.  What version of FOP are you using?  I have 1.1.

>  renders differently: it used to produce a footnote.  FOP instead
> puts the link text inline.  Not really sure about this.

Changed back to footnote.

> The table 25.1 "High Availability, Load Balancing, and Replication
> Feature Matrix" contains a lot of "[bull]", which look odd.  Presumably
> an image of a bull head would be better?

This is an artifact of the SGML to XML conversion.  Already fixed in
separate commit.

> Tables 27-13 and 27-14 are misformatted in both implementations.  Surely
> we can do better than overlaying the contents of cells ...

This also needs to be addressed by adjusting the wrapping and
hyphenation rules.  Or we could manually tweak it by adding some "zero
width space" characters.  It would need to be checked how that would
affect other output formats, however.

> The START_REPLICATION stuff in the Frontend/Backend Protocol chapter is
> completely broken.  Maybe wrong markup?  Also in StartupMessage.

Same issue as with "Encryption Options" etc. above.

> Seems  resulted in nothing in the old toolchain, but now it does
> print the name of the author.  There are only two authors mentioned, in
> NLS and GEQO, though.  In the GEQO case it's a bit funny because the
> author is now mentioned twice.

I propose to remove those two author sections.  In the GEQO chapter, the
information is already present, and in the NLS case, well, I don't care.

> Speaking of GEQO: the links in its "53.4 Further Reading" section don't
> look well in the FOP.  And the bibliography looks completely
> different.

I see.  That is customizable, but maybe not the way it looks in the old
output.  We might need to reassess the entire bibliography at some point
anyway.  It's not really well-maintained.

> Oh, the index at the e

Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Amit Kapila
On Wed, Oct 16, 2013 at 2:04 AM, Robert Haas  wrote:
> On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila  wrote:
>> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander  wrote:
>>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas  wrote:
 On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane  wrote:
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all,
> except perhaps document that people should avoid long tablespace
> path names on an unknown set of Windows versions.  We should not
> be in the business of working around any and every bug coming out
> of Redmond.

 It's sort of incomprehensible to me that Microsoft has a bug like this
 and apparently hasn't fixed it.  But I think I still favor trying to
 work around it.  When people try to use a long data directory name and
 it freezes the system, some of them will blame us rather than
 Microsoft.  We've certainly gone to considerable lengths to work
 around extremely strange bugs in various compiler toolchains, even
 relatively obscure ones.  I don't particularly see why we shouldn't do
 the same here.
>>>
>>> I agree we'll probably want to work around it in the end, but I still
>>> think it should be put to Microsoft PSS if we can. The usual - have we
>>> actually produced a self-contained example that does just this (and
>>> doesn't include the full postgres support) and submitted it to
>>> *microsoft* for comments?
>>
>>   I have written a self contained win32 console application with which
>> the issue can be reproduced.
>>   The application project is attached with this mail.
>>
>>   Here is brief description of the project:
>>   This project is created using MSVC 2010, but even if somebody
>> doesn't have this version of VC, functions in file long_path.cpp can
>> be copied and
>>   used in new project.
>>   In project settings, I have changed Character Set to "Use Multi-Byte
>> Character Set" which is what Postgres uses.
>>
>>   It takes 3 parameters as input:
>>   existingpath - path for which link will be created. this path should
>> be an already
>>   existing path with one level less than actual
>> path. For example,
>>   if we want to create a link for path
>> "E:/PG_Patch/Long_Path/path_dir/version_dir",
>>   then this should be "E:/PG_Patch/Long_Path/path_dir".
>>   newpath  - path where link needs to be created. it should be
>> non-absolute path
>>   of format "linked_path_dir/test_version"
>>   curpath   - path to set as current working directory path, it
>> should be the
>>   location to prepend to newpath
>>
>>  Currently I have used input parameters as
>>  E:/PG_Patch/Long_Path/path_dir
>>  linked_path_dir/test_version
>> E:/PG_Patch/Long_Path/
>>
>>  Long path is much less than 260 char limit on windows, I have
>> observed this problem with path length > 130 (approx.)
>
> And this reliably reproduces the hang?

   Yes, it reliably produces the hang whenever the length of the 'curpath'
parameter is greater than 130 (approx.). In the above example, I used a
curpath of length 159.

> On which Windows version(s)?

   I used Windows 7 64-bit to reproduce it. However, the original user
reported this issue on Windows 2008 64-bit, so this application should hang
on other Windows 2008 64-bit systems as well.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 7:09 AM, Robert Haas  wrote:
> Let's try that again.
>
> User: Wow, that sounds great.  How do I use it?
> Hacker: Well, currently, the output gets dumped as a series of text
> files that are designed to be parsed using a scripting language.  We
> have sample parsers written in Perl and Python that you can use as-is
> or hack up to meet your needs.

Have you heard of multicorn? Plugin authors can write a wrapper that
spits out JSON or whatever other thing they like, which can be
consumed by non C-hackers.

> Now, some users are still going to head for the hills.  But at least
> from where I sit it sounds a hell of a lot better than the first
> answer.  We're not going to solve all of the tooling problems around
> this technology in one release, for sure.  But as far as 95% of our
> users are concerned, a C API might as well not exist at all.  People
> WILL try to machine parse the output of whatever demo plugins we
> provide; so I think we should try hard to provide at least one such
> plugin that is designed to make that as easy as possible.

I agree that this is important, but I wouldn't like to weigh it too heavily.

-- 
Peter Geoghegan




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread David Fetter
On Tue, Oct 15, 2013 at 10:09:05AM -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund  wrote:
> User: So, what's new in PostgreSQL 9.4?
> Hacker: Well, now we have logical replication!
> User: Why is that cool?
> Hacker: Well, streaming replication is awesome for HA, but it has
> significant limitations.  And trigger-based systems are very mature,
> but the overhead is high and their lack of core integration makes them
> hard to use.  With this technology, you can build systems that will
> replicate individual tables or even parts of tables, multi-master
> systems, and lots of other cool stuff.
> User: Wow, that sounds great.  How do I use it?
> Hacker: Well, first you write an output plugin in C using a special API.
> User: Hey, do you know whether the MongoDB guys came to this conference?
> 
> Let's try that again.
> 
> User: Wow, that sounds great.  How do I use it?
> Hacker: Well, currently, the output gets dumped as a series of text
> files that are designed to be parsed using a scripting language.  We
> have sample parsers written in Perl and Python that you can use as-is
> or hack up to meet your needs.

My version:

Hacker: the output gets dumped as a series of JSON files.  We have
docs for this rev of the format and examples of consumers in Perl and
Python you can use as-is or hack up to meet your needs.
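
Purely as an illustration of the kind of record such a series of JSON files
might contain (no such format existed at the time; the field names here are
invented):

{"xid": 612405, "change": "INSERT", "schema": "public", "table": "accounts",
 "new": {"id": 42, "balance": 100.00}}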

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] PSQL return coder

2013-10-15 Thread James Sewell
I was avoiding ON_ERROR_STOP because I was using ON_ERROR_ROLLBACK, but
have just realised that if I encase my SQL in a transaction then rollback
will still happen.

Perfect!
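
A minimal sketch of that combination (the script and database names are
placeholders): with ON_ERROR_STOP set, psql exits with status 3 when a
statement in the script fails, and --single-transaction wraps the whole file
so the earlier statements are rolled back as well.

    psql --set ON_ERROR_STOP=on --single-transaction -f script.sql mydb
    echo $?    # 0 on success, 3 if an error occurred in the script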


James Sewell,
PostgreSQL Team Lead / Solutions Architect

Level 2, 50 Queen St, Melbourne VIC 3000
P (+61) 3 8370 8000   W www.lisasoft.com   F (+61) 3 8370 8099



On Fri, Oct 11, 2013 at 12:25 AM, Merlin Moncure  wrote:

> On Thu, Oct 10, 2013 at 1:52 AM, Tom Lane  wrote:
> > James Sewell  writes:
> >> My question is in a rollback scenario is it possible to get PSQL to
> return
> >> a non 0 exit status?
> >
> > Maybe you could use -c instead of -f?
> >
> > $ psql -c 'select 1; select 1/0' regression
> > ERROR:  division by zero
> > $ echo $?
> > 1
> >
> > You won't need explicit BEGIN/END because this is already a single
> > transaction.
>
> According to the man page,
> "EXIT STATUS
>    psql returns 0 to the shell if it finished normally, 1 if a fatal error
>    of its own (out of memory, file not found) occurs, 2 if the connection
>    to the server went bad and the session was not interactive, and 3 if an
>    error occurred in a script and the variable ON_ERROR_STOP was set."
>
> So for a longer script ON_ERROR_STOP might be the ticket (which is
> usually a good idea anyways).
>
> merlin
>



Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Gibheer
On Mon, 14 Oct 2013 11:52:57 +0530
Amit Kapila  wrote:

> On Sun, Oct 13, 2013 at 2:08 PM, Gibheer 
> wrote:
> > On Sun, 13 Oct 2013 11:38:17 +0530
> > Amit Kapila  wrote:
> >
> >> On Thu, Oct 10, 2013 at 3:17 AM, Gibheer
> >>  wrote:
> >> > On Mon, 7 Oct 2013 11:39:55 +0530
> >> > Amit Kapila  wrote:
> >> >> Robert Haas wrote:
> >> >> On Mon, Aug 5, 2013 at 2:04 AM, Andres Freund
> >> >>  wrote:
> >> >> >>> Hmm.  It seems like this match is making MaxConnections no
> >> >> >>> longer mean the maximum number of connections, but rather
> >> >> >>> the maximum number of non-replication connections.  I don't
> >> >> >>> think I support that definitional change, and I'm kinda
> >> >> >>> surprised if this is sufficient to implement it anyway
> >> >> >>> (e.g. see InitProcGlobal()).
> >> >> >
> >> >> >> I don't think the implementation is correct, but why don't
> >> >> >> you like the definitional change? The set of things you can
> >> >> >> do from replication connections are completely different
> >> >> >> from a normal connection. So using separate "pools" for them
> >> >> >> seems to make sense. That they end up allocating similar
> >> >> >> internal data seems to be an implementation detail to me.
> >> >>
> >> >> > Because replication connections are still "connections".  If I
> >> >> > tell the system I want to allow 100 connections to the server,
> >> >> > it should allow 100 connections, not 110 or 95 or any other
> >> >> > number.
> >> >>
> >> >> I think that to reserve connections for replication, mechanism
> >> >> similar to superuser_reserved_connections be used rather than
> >> >> auto vacuum workers or background workers.
> >> >> This won't change the definition of MaxConnections. Another
> >> >> thing is that rather than introducing new parameter for
> >> >> replication reserved connections, it is better to use
> >> >> max_wal_senders as it can serve the purpose.
> >> >>
> >> >> Review for replication_reserved_connections-v2.patch,
> >> >> considering we are going to use mechanism similar to
> >> >> superuser_reserved_connections and won't allow definition of
> >> >> MaxConnections to change.
> >> >>
> >>
> >> >
> >> > Hi,
> >> >
> >> > I took the time and reworked the patch with the feedback till
> >> > now. Thank you very much Amit!
> >> >
> >> > So this patch uses max_wal_senders together with the idea of the
> >> > first patch I sent. The error messages are also adjusted to make
> >> > it obvious, how it is supposed to be and all checks work, as far
> >> > as I could tell.
> >>
> >> If I understand correctly, now the patch has implementation such
> >> that a. if the number of connections left are (ReservedBackends +
> >> max_wal_senders), then only superusers or replication connection's
> >> will be allowed
> >> b. if the number of connections left are ReservedBackend, then only
> >> superuser connections will be allowed.
> >
> > That is correct.
> >
> >> So it will ensure that max_wal_senders is used for reserving
> >> connection slots from being used by non-super user connections. I
> >> find new usage of max_wal_senders acceptable, if anyone else thinks
> >> otherwise, please let us know.
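
For illustration, a minimal sketch of the behaviour described in (a) and (b)
above, written in the style of the existing superuser_reserved_connections
check in InitPostgres(); this only restates the described semantics and is
not the patch itself:

	/* keep max_wal_senders slots on top of the superuser-reserved ones */
	if (!am_superuser && !am_walsender &&
		(ReservedBackends + max_wal_senders) > 0 &&
		!HaveNFreeProcs(ReservedBackends + max_wal_senders))
		ereport(FATAL,
				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
				 errmsg("remaining connection slots are reserved for superuser and replication connections")));

	/* the innermost slots stay reserved for superusers only */
	if (!am_superuser &&
		ReservedBackends > 0 &&
		!HaveNFreeProcs(ReservedBackends))
		ereport(FATAL,
				(errcode(ERRCODE_TOO_MANY_CONNECTIONS),
				 errmsg("remaining connection slots are reserved for superuser connections")));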
> >>
> >>
> >> 1.
> >> +superuser_reserved_connections
> >> +max_wal_senders only superuser and WAL
> >> connections
> >> +are allowed.
> >>
> >> Here minus seems to be missing before max_wal_senders and I think
> >> it will be better to use replication connections rather than WAL
> >> connections.
> >
> > This is fixed.
> >
> >> 2.
> >> -new replication connections will be accepted.
> >> +new WAL or other connections will be accepted.
> >>
> >> I think as per new implementation, we don't need to change this
> >> line.
> >
> > I reverted that change.
> >
> >> 3.
> >> + * reserved slots from max_connections for wal senders. If the
> >> number of free
> >> + * slots (max_connections - max_wal_senders) is depleted.
> >>
> >>  Above calculation (max_connections - max_wal_senders) needs to
> >> include super user reserved connections.
> >
> > My first thought was, that I would not add it here. When superuser
> > reserved connections are not set, then only max_wal_senders would
> > count.
> > But you are right, it has to be set, as 3 connections are reserved
> > by default for superusers.
> 
> + * slots (max_connections - superuser_reserved_connections -
> max_wal_senders) here it should be ReservedBackends rather than
> superuser_reserved_connections.

fixed

> >> 4.
> >> + /*
> >> + * Although replication connections currently require superuser
> >> privileges, we
> >> + * don't allow them to consume the superuser reserved slots, which
> >> are
> >> + * intended for interactive use.
> >>   */
> >>   if ((!am_superuser || am_walsender) &&
> >>   ReservedBackends > 0 &&
> >>   !HaveNFreeProcs(ReservedBackends))
> >>   ereport(FATAL,
> >>   (errcode(ERRCODE_TOO_MANY_CONNECTIONS),
> >> - errmsg("remaining connection slots are reserved for
> >> non-replication superuser conn

Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-15 Thread Mark Kirkwood

On 11/10/13 17:33, Jaime Casanova wrote:
also the name pgstattuple2, doesn't convince me... maybe you can use 
pgstattuple() if you use a second argument (percentage of the sample) 
to overload the function 


+1, that seems much nicer.
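
From the user's side, the suggested overload would look something like this
(illustrative only; the two-argument form does not exist, and the second
argument as a sampling percentage is just the suggestion above):

    SELECT * FROM pgstattuple('pgbench_accounts', 5);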

Regards

Mark




Re: [HACKERS] [PoC] pgstattuple2: block sampling to reduce physical read

2013-10-15 Thread Mark Kirkwood
On 11/10/13 17:49, Mark Kirkwood wrote:
> On 11/10/13 17:08, Satoshi Nagayasu wrote:
>> (2013/10/11 7:32), Mark Kirkwood wrote:
>>> On 11/10/13 11:09, Mark Kirkwood wrote:
 On 16/09/13 16:20, Satoshi Nagayasu wrote:
> (2013/09/15 11:07), Peter Eisentraut wrote:
>> On Sat, 2013-09-14 at 16:18 +0900, Satoshi Nagayasu wrote:
>>> I'm looking forward to seeing more feedback on this approach,
>>> in terms of design and performance improvement.
>>> So, I have submitted this for the next CF.
>> Your patch fails to build:
>>
>> pgstattuple.c: In function ‘pgstat_heap_sample’:
>> pgstattuple.c:737:13: error: ‘SnapshotNow’ undeclared (first use in
>> this function)
>> pgstattuple.c:737:13: note: each undeclared identifier is reported
>> only once for each function it appears in
> Thanks for checking. Fixed to eliminate SnapshotNow.
>
 This seems like a cool idea! I took a quick look, and initally
 replicated the sort of improvement you saw:


 bench=# explain analyze select * from pgstattuple('pgbench_accounts');
 QUERY PLAN

 
 Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
 time=786.368..786.369 rows=1 loops=1)
 Total runtime: 786.384 ms
 (2 rows)

 bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
 NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
 dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
 QUERY PLAN

 
 Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
 time=12.004..12.005 rows=1 loops=1)
 Total runtime: 12.019 ms
 (2 rows)



 I wondered what sort of difference eliminating caching would make:

 $ sudo sysctl -w vm.drop_caches=3

 Repeating the above queries:


 bench=# explain analyze select * from pgstattuple('pgbench_accounts');
 QUERY PLAN

 
 Function Scan on pgstattuple (cost=0.00..0.01 rows=1 width=72) (actual
 time=9503.774..9503.776 rows=1 loops=1)
 Total runtime: 9504.523 ms
 (2 rows)

 bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
 NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
 dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
 QUERY PLAN

 
 Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
 time=12330.630..12330.631 rows=1 loops=1)
 Total runtime: 12331.353 ms
 (2 rows)


 So the sampling code seems *slower* when the cache is completely cold -
 is that expected? (I have not looked at how the code works yet - I'll
 dive in later if I get a chance)!
>> Thanks for testing that. It would be very helpful to improve the
>> performance.
>>
>>> Quietly replying to myself - looking at the code the sampler does 3000
>>> random page reads... I guess this is slower than 163935 (number of pages
>>> in pgbench_accounts) sequential page reads thanks to os readahead on my
>>> type of disk (WD Velociraptor). Tweaking the number of random reads (i.e
>>> the sample size) down helps - but obviously that can impact estimation
>>> accuracy.
>>>
>>> Thinking about this a bit more, I guess the elapsed runtime is not the
>>> *only* thing to consider - the sampling code will cause way less
>>> disruption to the os page cache (3000 pages vs possibly lots more than
>>> 3000 for reading an entire relation).
>>>
>>> Thoughts?
>> I think it could be improved by sorting sample block numbers
>> *before* physical block reads in order to eliminate random access
>> on the disk.
>>
>> pseudo code:
>> --
>> for (i=0 ; i < number_of_sample_blocks ; i++)
>> {
>>   sample_block[i] = random();
>> }
>>
>> qsort(sample_block);
>>
>> for (i=0 ; i < number_of_sample_blocks ; i++)
>> {
>>   buf = ReadBuffer(rel, sample_block[i]);
>>
>>   do_some_stats_stuff(buf);
>> }
>> --
>>
>> I guess it would be helpful for reducing random access thing.
>>
>> Any comments?
> Ah yes - that's a good idea (rough patch to your patch attached)!
>
> bench=# explain analyze select * from pgstattuple2('pgbench_accounts');
> NOTICE: pgstattuple2: SE tuple_count 0.00, tuple_len 0.00,
> dead_tuple_count 0.00, dead_tuple_len 0.00, free_space 0.00
> QUERY PLAN
>
> 
> Function Scan on pgstattuple2 (cost=0.00..0.01 rows=1 width=72) (actual
> time=9968.318..9968.319 rows=1 loops=1)
> Total runtime: 9968.443 ms
> (2 rows)
>



Actually - correcting my compare function to sort the blocks in
*increasing* order (doh), gets a better 

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 02:31 PM, Andres Freund wrote:
> On 2013-10-15 11:55:06 -0700, Josh Berkus wrote:
>> Also, because you can't INDEX CONCURRENTLY a PK, I've been building a
>> lot of databases which have no PKs, only UNIQUE indexes.
> 
> You know that you can add prebuilt primary keys using ALTER TABLE
> ... ADD CONSTRAINT ... PRIMARY KEY (...) USING indexname?

That still requires an ACCESS EXCLUSIVE lock, and then can't be dropped
using DROP INDEX CONCURRENTLY.


-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Josh Berkus
On 10/15/2013 07:56 AM, Andres Freund wrote:
>>> Well, just providing the C API + an example in a first step didn't work
>>> > > out too badly for FDWs. I am pretty sure that once released there will
>>> > > soon be extensions for it on PGXN or whatever for special usecases.
>> > 
>> > I suspect so, too.  But I also think that if that's the only thing
>> > available in the first release, a lot of users will get a poor initial
>> > impression.
> I think lots of people will expect a builtin logical replication
> solution :/. Which seems a tad unlikely to arrive in 9.4.

Well, last I checked the Slony team is hard at work on building
something which will be based on logical changesets.  So there will
likely be at least one tool available shortly after 9.4 is released.

A good and flexible API is, IMHO, more important than having any
finished solution.  The whole reason why logical replication was outside
core PG for so long is that replication systems have differing and
mutually incompatible goals.  A good API can support all of those goals;
a user-level tool, no matter how good, can't.

And, frankly, once the API is built, how hard will it be to write a
script which does the simplest replication approach (replay all
statements on slave)?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 11:55:06 -0700, Josh Berkus wrote:
> Also, because you can't INDEX CONCURRENTLY a PK, I've been building a
> lot of databases which have no PKs, only UNIQUE indexes.

You know that you can add prebuilt primary keys using ALTER TABLE
... ADD CONSTRAINT ... PRIMARY KEY (...) USING indexname?
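
For reference, that looks like the following (table and index names are made
up); note that the ALTER TABLE step itself still needs an ACCESS EXCLUSIVE
lock, as Josh points out in his reply:

    CREATE UNIQUE INDEX CONCURRENTLY accounts_pkey_idx ON accounts (id);
    ALTER TABLE accounts
        ADD CONSTRAINT accounts_pkey PRIMARY KEY USING INDEX accounts_pkey_idx;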

> Postgres doesn't distinguish between UNIQUE indexes
> and PRIMARY KEYs -- as, indeed, it shouldn't, since they're both keys,
> adn the whole concept of a "primary key" is a legacy of index-organized
> databases, which PostgreSQL is not.

There are some other differences: for one, primary keys are automatically
picked up by foreign keys if the referenced columns aren't specified;
for another, we do not yet automatically recognize NOT NULL UNIQUE
columns in GROUP BY.

> However, it does seem like the new syntax could be extended with an
> optional "USING unique_index_name" in the future (9.5), no?

Yes.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 11:23:44 -0700, Josh Berkus wrote:
> (although, AFAICT it doesn't allow for the implementation of one of my
> personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases
> where updates are expected to occur 95% of the time, but that's another
> topic. Unless "rejects" for an Update could be the leftover rows, but
> then we're getting into full MERGE.).

FWIW I can't see the above syntax as something working very well - you
fundamentally have to SET every column and it only makes sense in
UPDATEs that provably affect only one row.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 10:53:35 -0700, Peter Geoghegan wrote:
> On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund  
> wrote:
> > I think anything that only works by breaking visibility rules that way
> > is a nonstarter. Doing that from the C level is one thing, exposing it
> > this way seems a bad idea.
> 
> What visibility rule is that?

The early return you added to HTSMVCC.

At the very least it opens you to lots of halloween problem like
scenarios.

> Upsert *has* to do effectively the same thing as what I've proposed -
> there is no getting away from it. So maybe the visibility rulebook
> (which as far as I can tell is "the way things work today") needs to
> be updated. If we did, say, INSERT...ON DUPLICATE KEY UPDATE, we'd
> have to update a row with potentially no visible-to-snapshot version
> *at all*, and make a new version of that visible. That's just what it
> takes. What's the difference between that and just locking? If the
> only difference is that it isn't necessary to modify tqual.c because
> you're passing a tid directly, that isn't a user-visible difference -
> the "rule" has been broken just the same.  Arguably, it's even more of
> a hack, since it's a special, out-of-band visibility exception.

No, doing it in special case code is fundamentally different since those
locations deal only with one row at a time. There's no scans that can
pass over that row.
That's why I think exposing the "on conflict lock" logic to anything but
C isn't going to fly btw.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 4:14 PM, Amit Kapila  wrote:
> On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander  wrote:
>> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas  wrote:
>>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane  wrote:
> Well, that sucks.  So it's a Windows bug.

 It's not clear to me that we should do anything about this at all,
 except perhaps document that people should avoid long tablespace
 path names on an unknown set of Windows versions.  We should not
 be in the business of working around any and every bug coming out
 of Redmond.
>>>
>>> It's sort of incomprehensible to me that Microsoft has a bug like this
>>> and apparently hasn't fixed it.  But I think I still favor trying to
>>> work around it.  When people try to use a long data directory name and
>>> it freezes the system, some of them will blame us rather than
>>> Microsoft.  We've certainly gone to considerable lengths to work
>>> around extremely strange bugs in various compiler toolchains, even
>>> relatively obscure ones.  I don't particularly see why we shouldn't do
>>> the same here.
>>
>> I agree we'll probably want to work around it in the end, but I still
>> think it should be put to Microsoft PSS if we can. The usual - have we
>> actually produced a self-contained example that does just this (and
>> doesn't include the full postgres support) and submitted it to
>> *microsoft* for comments?
>
>   I have written a self contained win32 console application with which
> the issue can be reproduced.
>   The application project is attached with this mail.
>
>   Here is brief description of the project:
>   This project is created using MSVC 2010, but even if somebody
> doesn't have this version of VC, functions in file long_path.cpp can
> be copied and
>   used in new project.
>   In project settings, I have changed Character Set to "Use Multi-Byte
> Character Set" which is what Postgres uses.
>
>   It takes 3 parameters as input:
>   existingpath - path for which link will be created. this path should
> be an already
>   existing path with one level less than actual
> path. For example,
>   if we want to create a link for path
> "E:/PG_Patch/Long_Path/path_dir/version_dir",
>   then this should be "E:/PG_Patch/Long_Path/path_dir".
>   newpath  - path where link needs to be created. it should be
> non-absolute path
>   of format "linked_path_dir/test_version"
>   curpath   - path to set as current working directory path, it
> should be the
>   location to prepend to newpath
>
>  Currently I have used input parameters as
>  E:/PG_Patch/Long_Path/path_dir
>  linked_path_dir/test_version
> E:/PG_Patch/Long_Path/
>
>  Long path is much less than 260 char limit on windows, I have
> observed this problem with path length > 130 (approx.)

And this reliably reproduces the hang?  On which Windows version(s)?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Kohei KaiGai
2013/10/15 Robert Haas :
> On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai  wrote:
 And, I also want some comments from committers, not only from mine.
>>>
>>> +1
>>>
>> +1
>
> /me pokes head up.  I know I'm going to annoy people with this
> comment, but I feel like it's going to have to be made at some point
> by somebody, so here goes: I don't see the point of this feature.  If
> you want a trigger on a table, why not set it on the remote side?  A
> trigger on the foreign table won't be enforced consistently; it'll
> only work when the update is routed through the foreign table, not
> when people access the underlying table on the remote side through any
> other mechanism.  The number of useful things you can do this way
> seems fairly small.  Perhaps you could use a row-level trigger for
> RLS, to allow only certain rows on the foreign side to be updated, but
> even that seems like a slightly strange design: generally it'd be
> better to enforce the security as close to the target object as
> possible.
>
One reason we should support local triggers is that not all FDW data sources
support remote triggers. That would require FDW drivers to have an RDBMS as
their backend, which is not a realistic assumption.
For example, file_fdw has no way to implement remote triggers.

One thing I'd like to know is where the goal of the FDW feature lies.
It seems to me that FDW is becoming a feature to manage external data
sets as if they were regular tables. If that understanding is right, the
things we try to support on foreign tables are the things we already
support on regular tables, such as triggers.

> There's another issue that concerns me here also: performance.  IIUC,
> an update of N tuples on the remote side currently requires N+1 server
> round-trips.  That is unspeakably awful, and we really oughta be
> looking for ways to make that number go down, by pushing the whole
> update to the remote side.  But obviously that won't be possible if
> there's a per-row trigger that has to be evaluated on the local side.
> Now, assuming somebody comes up with code that implements that
> optimization, we can just disable it when there are local-side
> triggers.  But, then you're back to having terrible performance.  So
> even if the use case for this seemed really broad, I tend to think
> performance concerns would sink most of the possible real-world uses.
>
We often have cases where we cannot apply a fully optimized path for
various reasons, such as a view having a security barrier, or a qualifier
containing volatile functions, and so on.
A trigger may be one more factor that prevents a fully optimized path;
however, it depends on the situation which should be prioritized,
performance or functionality.

Thanks,
-- 
KaiGai Kohei 




Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Amit Kapila
On Tue, Oct 15, 2013 at 6:28 PM, Magnus Hagander  wrote:
> On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas  wrote:
>> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane  wrote:
 Well, that sucks.  So it's a Windows bug.
>>>
>>> It's not clear to me that we should do anything about this at all,
>>> except perhaps document that people should avoid long tablespace
>>> path names on an unknown set of Windows versions.  We should not
>>> be in the business of working around any and every bug coming out
>>> of Redmond.
>>
>> It's sort of incomprehensible to me that Microsoft has a bug like this
>> and apparently hasn't fixed it.  But I think I still favor trying to
>> work around it.  When people try to use a long data directory name and
>> it freezes the system, some of them will blame us rather than
>> Microsoft.  We've certainly gone to considerable lengths to work
>> around extremely strange bugs in various compiler toolchains, even
>> relatively obscure ones.  I don't particularly see why we shouldn't do
>> the same here.
>
> I agree we'll probably want to work around it in the end, but I still
> think it should be put to Microsoft PSS if we can. The usual - have we
> actually produced a self-contained example that does just this (and
> doesn't include the full postgres support) and submitted it to
> *microsoft* for comments?

  I have written a self contained win32 console application with which
the issue can be reproduced.
  The application project is attached with this mail.

  Here is a brief description of the project:
  The project is created using MSVC 2010, but even if somebody doesn't have
this version of VC, the functions in the file long_path.cpp can be copied
and used in a new project.
  In the project settings, I have changed Character Set to "Use Multi-Byte
Character Set", which is what Postgres uses.

  It takes 3 parameters as input:
  existingpath - path for which the link will be created. This should be an
                 already existing path, one level less than the actual path.
                 For example, if we want to create a link for the path
                 "E:/PG_Patch/Long_Path/path_dir/version_dir", then this
                 should be "E:/PG_Patch/Long_Path/path_dir".
  newpath      - path where the link needs to be created. It should be a
                 non-absolute path of the form "linked_path_dir/test_version".
  curpath      - path to set as the current working directory; it should be
                 the location to prepend to newpath.

 Currently I have used these input parameters:
 E:/PG_Patch/Long_Path/path_dir
 linked_path_dir/test_version
 E:/PG_Patch/Long_Path/

 The long path is much less than the 260-char limit on Windows; I have
observed this problem with path lengths > 130 (approx.)
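
 For illustration, assuming the console application is built as
long_path.exe (the executable name is hypothetical), the invocation with the
parameters above would be:

 long_path.exe "E:/PG_Patch/Long_Path/path_dir" "linked_path_dir/test_version" "E:/PG_Patch/Long_Path/"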


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


long_path.rar
Description: application/rar



Re: [HACKERS] background workers, round three

2013-10-15 Thread Kohei KaiGai
2013/10/14 Robert Haas :
>> * ephemeral-precious-v1.patch
>> AtEOXact_BackgroundWorker() is located around other AtEOXact_*
>> routines. Doesn't it makes resource management complicated?
>> In case when main process goes into error handler but worker
>> process is still running in health, it may continue to calculate
>> something and put its results on shared memory segment, even
>> though main process suggest postmaster to kill it.
>
> Since I wrote this patch set, I've been thinking a lot more about
> error recovery.  Obviously, one of the big problems as we think about
> parallel query is that you've now got multiple backends floating
> around, and if the transaction aborts (in any backend), the other
> backends don't automatically know that; they need some way to know
> that they, too, short abort processing.  There are a lot of details to
> get right here, and the time I've spent on it so far convinces me that
> the problem is anything but easy.
>
> Having said that, I'm not too concerned about the particular issue
> that you raise here.  The resources that need to be cleaned up during
> transaction abort are backend-private resources.  If, for example, the
> user backend detaches a dynamic shared memory segment that is being
> used for a parallel computation, they're not actually *destroying* the
> segment; they are just detaching it *from their address space*.  The
> last process to detach it will also destroy it.  So the ordering in
> which the various processes detach it doesn't matter much.
>
> One of the things I do this is necessary is a set of on_dsm_detach
> callbacks that work pretty much the way that on_shmem_exit callbacks
> work today.  Just as we can't detach from the main shared memory
> segment without releasing locks and buffer pins and lwlocks and our
> PGXACT, we can't release from a dynamic shared memory segment without
> performing any similar cleanup that is needed.  I'm currently working
> on a patch for that.
>
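
For illustration, a hedged sketch of what registering such a callback might
look like, by analogy with on_shmem_exit(); the on_dsm_detach() name and its
signature are assumptions here, since the patch was still being written at
the time:

/*
 * Sketch only: run release_segment_resources() when this backend detaches
 * from the dynamic shared memory segment.
 */
static void
release_segment_resources(dsm_segment *seg, Datum arg)
{
	/*
	 * Drop any backend-local state (locks, pins, pointers into the mapping)
	 * that still references the segment before it is unmapped.
	 */
}

static void
setup_parallel_segment(dsm_segment *seg)
{
	on_dsm_detach(seg, release_segment_resources, (Datum) 0);
}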
Hmm. It probably allows cleaning up a smaller fraction of the data structures
constructed on the dynamic shared memory segment, if we map / unmap it
for each transaction.

>> All the ResourceOwnerRelease() callbacks are located prior to
>> AtEOXact_BackgroundWorker(), it is hard to release resources
>> being in use by background worker, because they are healthy
>> running until it receives termination signal, but sent later.
>> In addition, it makes implementation complicated if we need to
>> design background workers to release resources if and when it
>> is terminated. I don't think it is a good coding style, if we need
>> to release resources in different location depending on context.
>
> Which specific resources are you concerned about?
>
Because of my own use case, I was assuming smaller chunks allocated on a
static or dynamic shared memory segment, used to communicate between the
main process and the worker processes.
When we move a chunk of data to a co-processor using an asynchronous
DMA transfer, the API requires the source buffer to be mlock()'ed to avoid
an unintentional swap-out during the transfer. On the other hand, the
cost of the mlock() operation is not negligible, so it may be a reasonable
design to lock a shared memory segment at start-up time and then
continue to use it, without unmapping it.
So, I wondered how to handle the situation where an extension tries
to manage a resource with a smaller granularity than the one
managed by the PostgreSQL core.

>> So, I'd like to propose to add a new invocation point of
>> ResourceOwnerRelease() after all AtEOXact_* jobs, with
>> new label something like RESOURCE_RELEASE_FINAL.
>>
>> In addition, AtEOXact_BackgroundWorker() does not synchronize
>> termination of background worker processes being killed.
>> Of course it depends on situation, I think it is good idea to wait
>> for completion of worker processes to be terminated, to ensure
>> resource to be released is backed to the main process if above
>> ResourceOwnerRelease() do the job.
>
> Again, which resources are we talking about here?  I tend to think
> it's an essential property of the system that we *shouldn't* have to
> care about the order in which processes are terminated.  First, that
> will be difficult to control; if an ERROR or FATAL condition has
> occurred and we need to terminate, then there are real limits to what
> guarantees we can provide after that point.  Second, it's also
> *expensive*.  The point of parallelism is to make things faster; any
> steps we add that involve waiting for other processes to do things
> will eat away at the available gains.  For a query that'll run for an
> hour that hardly matters, but for short queries it's important to
> avoid unnecessary overhead.
>
Indeed, you are right. The error path has to terminate quickly.
If I implement it, the ResourceOwnerRelease() callback will probably need
some way to inform a still-running worker process that the transaction got
aborted, so there is no need to return its calculation result.
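
As a sketch of what I have in mind using the existing API, an extension
could register a release callback and use the last phase to signal its
workers. RegisterResourceReleaseCallback() and the phase values are the
existing resowner.h interface; MyExtensionState and signal_my_workers()
are hypothetical placeholders:

#include "postgres.h"
#include "utils/resowner.h"

typedef struct MyExtensionState MyExtensionState;          /* hypothetical */
extern void signal_my_workers(MyExtensionState *state);    /* hypothetical */

static void
my_release_callback(ResourceReleasePhase phase, bool isCommit,
                    bool isTopLevel, void *arg)
{
    /* act only once, late in the release sequence, on top-level abort */
    if (phase != RESOURCE_RELEASE_AFTER_LOCKS || isCommit || !isTopLevel)
        return;

    /* tell still-running workers the transaction is gone, so they can
     * stop and hand their shared chunks back */
    signal_my_workers((MyExtensionState *) arg);
}

void
my_extension_init(MyExtensionState *state)
{
    RegisterResourceReleaseCallback(my_release_callback, state);
}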

Thanks,
-- 
KaiGai Kohei 


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 12:03 PM, Peter Geoghegan wrote:
> On Tue, Oct 15, 2013 at 11:55 AM, Josh Berkus  wrote:
>> However, it does seem like the new syntax could be extended with an
>> optional "USING unique_index_name" in the future (9.5), no?
> 
> There is no reason why we couldn't do that and just consider that one
> unique index. Whether we should is another question - 

What's the "shouldn't" argument, if any?

> I certainly
> think that mandating it would be very bad.

Agreed.  If there is a PK, we should allow the user to use it implicitly.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 11:55 AM, Josh Berkus  wrote:
> However, it does seem like the new syntax could be extended with an
> optional "USING unique_index_name" in the future (9.5), no?

There is no reason why we couldn't do that and just consider that one
unique index. Whether we should is another question - I certainly
think that mandating it would be very bad.

> I'm just checking that we're not painting ourselves into a corner with
> this particular implementation.  It's OK if it doesn't implement most
> things now; it's bad if it is impossible to build on and we have to
> support it forever.

I don't believe it does. In essence this simply inserts a row and,
rather than throwing a unique constraint violation, locks the row that
prevented insertion from proceeding, for any proposed tuple whose
insertion does not go through. That's all. You can build
lots of things with it that you can't today. Or you can not use it at
all. So that covers semantics, I'd say.

As for implementation: I believe that the implementation is by far the
most forward thinking (in terms of building infrastructure for a
proper MERGE) of any proposal to date.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 11:38 AM, Peter Geoghegan wrote:
> We thought about prioritizing where to look (mostly as a performance
> optimization), but right now no. It works with amcanunique methods,
> which in practice means btrees. There is no such thing as a GiST
> unique index, so I guess you're referring to an exclusion constraint
> on an equality operator. That doesn't work with this, but why would
> you want it to? As for generalizing this to work with exclusion
> constraints, which I guess you might have also meant, that's a much
> more difficult and much less compelling proposition, in my opinion.

Yeah, that was one thing I was thinking of.

Also, because you can't INDEX CONCURRENTLY a PK, I've been building a
lot of databases which have no PKs, only UNIQUE indexes.  Historically,
this hasn't been an issue because aside from wonky annoyances (like the
CONCURRENTLY case), Postgres doesn't distinguish between UNIQUE indexes
and PRIMARY KEYs -- as, indeed, it shouldn't, since they're both keys,
and the whole concept of a "primary key" is a legacy of index-organized
databases, which PostgreSQL is not.

However, it does seem like the new syntax could be extended with an
optional "USING unique_index_name" in the future (9.5), no?

I'm just checking that we're not painting ourselves into a corner with
this particular implementation.  It's OK if it doesn't implement most
things now; it's bad if it is impossible to build on and we have to
support it forever.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 11:23 AM, Josh Berkus  wrote:
> (although, AFAICT it doesn't allow for the implementation of one of my
> personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases
> where updates are expected to occur 95% of the time, but that's another
> topic. Unless "rejects" for an Update could be the leftover rows, but
> then we're getting into full MERGE.).

This isn't really all that inefficient for that case. Certainly, the
balance in cost between mostly-insert cases and mostly-update cases is
a strength of my basic approach over others.

> Does this version make a distinction between PRIMARY KEY constraints and
> UNIQUE indexes?  If not, how does it pick among keys?  If so, what about
> tables with no PRIMARY KEY for various reasons (like unique GiST indexes?)

We thought about prioritizing where to look (mostly as a performance
optimization), but right now no. It works with amcanunique methods,
which in practice means btrees. There is no such thing as a GiST
unique index, so I guess you're referring to an exclusion constraint
on an equality operator. That doesn't work with this, but why would
you want it to? As for generalizing this to work with exclusion
constraints, which I guess you might have also meant, that's a much
more difficult and much less compelling proposition, in my opinion.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
Peter,

> Note also that this doesn't preclude a variant with a more direct
> update part (not that I think that's all that compelling). Doing
> things this way was motivated by:

I can see the value in the CTE format for this for existing PostgreSQL
users.

(although, AFAICT it doesn't allow for the implementation of one of my
personal desires, which is UPDATE ... ON NOT FOUND INSERT, for cases
where updates are expected to occur 95% of the time, but that's another
topic. Unless "rejects" for an Update could be the leftover rows, but
then we're getting into full MERGE.).

I'm just pointing out that this doesn't do much for the MySQL migration
case; the rewrite is too complex to automate.  I'd been assuming that we
had some plans to implement a MySQL-friendly syntax for 9.5, and this
version was a stepping stone to that.

Does this version make a distinction between PRIMARY KEY constraints and
UNIQUE indexes?  If not, how does it pick among keys?  If so, what about
tables with no PRIMARY KEY for various reasons (like unique GiST indexes?)

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 11:05 AM, Peter Geoghegan  wrote:
> See the original e-mail in the thread for what I imagine idiomatic
> usage will look like.
>
> http://www.postgresql.org/message-id/cam3swzthwrktvurf1awaih8qthgnmzafydcnw8qju7pqhk5...@mail.gmail.com


Note also that this doesn't preclude a variant with a more direct
update part (not that I think that's all that compelling). Doing
things this way was motivated by:

1) Serving the needs of logical changeset generation plugins, even if
Andres doesn't think that needs to be exposed through SQL. He and I
both want something that does this with low overhead (in particular,
no subtransactions).

2) Getting something effective into the next release. MERGE-like
flexibility seems like a very desirable thing. And the
implementation's infrastructure can be used by an eventual MERGE
implementation.

3) Being simple enough that huge bike shedding over syntax might not
be necessary. Making insert statements grow an update tumor is likely
to get messy fast. I know because I tried it myself.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 10:58 AM, Josh Berkus  wrote:
> Hmmm.  Is the plan NOT to eventually get to a single-statement upsert?
> If not, then I'm not that keen on this feature.

See the original e-mail in the thread for what I imagine idiomatic
usage will look like.

http://www.postgresql.org/message-id/cam3swzthwrktvurf1awaih8qthgnmzafydcnw8qju7pqhk5...@mail.gmail.com

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Josh Berkus
On 10/15/2013 08:11 AM, Robert Haas wrote:
> I'm not saying "go implement MERGE".  I'm saying, make the
> insert-or-update operation a single statement, using some syntax TBD,
> instead of requiring the use of a new insert statement that makes
> invisible rows visible as a side effect, so that you can wrap that in
> a CTE and feed it to an update statement.  That's complex and, AFAICS,
> unlike how any other database product handles this.

Hmmm.  Is the plan NOT to eventually get to a single-statement upsert?
If not, then I'm not that keen on this feature.  I can't say that
anybody I know who's migrating from MySQL would use a 2-statement
version of upsert; if they were prepared for that, then they'd be
prepared to just rewrite their stuff as proper insert/updates anyway.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 10:29 AM, Andres Freund  wrote:
> I think anything that only works by breaking visibility rules that way
> is a nonstarter. Doing that from the C level is one thing, exposing it
> this way seems a bad idea.

What visibility rule is that? Upsert *has* to do effectively the same
thing as what I've proposed - there is no getting away from it. So
maybe the visibility rulebook (which as far as I can tell is "the way
things work today") needs to be updated. If we did, say, INSERT...ON
DUPLICATE KEY UPDATE, we'd have to update a row with potentially no
visible-to-snapshot version *at all*, and make a new version of that
visible. That's just what it takes. What's the difference between that
and just locking? If the only difference is that it isn't necessary to
modify tqual.c because you're passing a tid directly, that isn't a
user-visible difference - the "rule" has been broken just the same.
Arguably, it's even more of a hack, since it's a special, out-of-band
visibility exception. I'm happy to have total scrutiny of changes to
tqual.c, but I'm surprised that the mere fact of it having been
modified is being weighed so heavily.

Another thing that I'm not clear on is how an update can be backed out
of if the row is modified by another xact. As I think I've already
illustrated, the row locking that takes place has to be kind of
opportunistic. I'm sure you could do it, but it would probably be
quite invasive.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 7:32 PM, Andres Freund  wrote:
> On 2013-10-15 19:29:50 +0200, Magnus Hagander wrote:
>> On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund  
>> wrote:
>> > On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
>> >> On 10/15/2013 05:52 AM, Magnus Hagander wrote:
>> >> > But the argument about being friendly for new users should definitely
>> >> > have us change wal_level and max_wal_senders.
>> >>
>> >> +1 for having replication supported out-of-the-box aside from pg_hba.conf.
>> >>
>> >> To put it another way: users are more likely to care about replication
>> >> than they are about IO overhead on a non-replicated server.  And for the
>> >> users who care about IO overhead, they are more likely to muck about in
>> >> pg.conf *anyway* in order to set a slew of performance-tuning settings.
>> >
>> > But it will hurt people restoring backups using pg_restore -j. I think
>> > people might be rather disappointed if that slows down by a factor of
>> > three.
>> >
>> > I think we really need to get to the point where we increase the wal
>> > level on demand...
>>
>> Yeah, there are really two things.
>>
>> If we can increase wal_level on demand, that would solve one of them.
>> Turning that into a SIGHUP parameter would be great. I have no idea
>> how hard it would be. In theory, couldn't we let it be sighup and then
>> just have do_pg_start_backup() block until all backends have
>> acknowledged that they are on the new WAL level somehow? (Yes, I
>> realize this might be a big simplification, but I'm allowed to hope,
>> no?)
>
> Depends on what you want to support. For basebackups, that should be
> doable with some pullups.
> It's unfortunately more complex than that for streaming rep - we really
> need persistent standby registration there. Otherwise the wal_level will
> fall back to minimal when the standby disconnects which will obviously
> break the standby.

I was actually thinking the easier step might be not to do it
dynamically as the standby registers, but just to allow it to be a SIGHUP
parameter. So you'd still change it in postgresql.conf, but it would
be ok with a reload rather than a restart.

Yes, fully dynamic would be better, so if we could combine those two,
that would make us "require nothing for pg_basebackup, and just a
reload for replication slaves". The point being that we wouldn't need
a *restart* at any point - and that alone would be a big improvement.


>> The other problem is max_wal_senders. I think that's a much smaller
>> problem - setting that one to 5 or so by default shouldn't have a big
>> impact. But without the wal_level changes, it would also be mostly
>> pointless...
>
> Well, you currently cannot even set it when the wal_level isn't set
> appropriately, but that should be easy enough to change.

Yes, it would be a trivial change to allow that parameter to be set
and then just give an error if you try to initiate streaming in that
case.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Josh Berkus
On 10/15/2013 07:36 AM, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund  
> wrote:
>> Josh said we should treat replication connections in a separate "pool"
>> from normal database connections, right? So you withdraw your earlier
>> objection to that?
> 
> I don't think that's what he said.  Here's what I was referring to:

To clarify: I do, indeed, support the idea of treating replication
connections as a pool outside of max_connections.  Here's why:

FATAL: connection limit exceeded for non-superusers

SHOW max_connections;

100

SELECT COUNT(*) FROM pg_stat_activity;

94

SHOW superuser_reserved_connections;

3



... search around quite a bit,  eventually figure out that you have
three replication connections open.  We've already set up an illogical
and hard-to-troubleshoot situation where replication connections do not
appear in pg_stat_activity, yet they are counted against max_connections.

You could argue that the same is true of superuser_reserved_connections,
but there's a couple reasons why it isn't:

1) if superusers are actually connected, that shows up in
pg_stat_activity (and given how many of our users run their apps as
superuser, they max out max_connections anyway).

2) the error message spells out that there may be superuser connections
available.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Andres Freund
On 2013-10-15 19:29:50 +0200, Magnus Hagander wrote:
> On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund  wrote:
> > On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
> >> On 10/15/2013 05:52 AM, Magnus Hagander wrote:
> >> > But the argument about being friendly for new users should definitely
> >> > have us change wal_level and max_wal_senders.
> >>
> >> +1 for having replication supported out-of-the-box aside from pg_hba.conf.
> >>
> >> To put it another way: users are more likely to care about replication
> >> than they are about IO overhead on a non-replicated server.  And for the
> >> users who care about IO overhead, they are more likely to muck about in
> >> pg.conf *anyway* in order to set a slew of performance-tuning settings.
> >
> > But it will hurt people restoring backups using pg_restore -j. I think
> > people might be rather disappointed if that slows down by a factor of
> > three.
> >
> > I think we really need to get to the point where we increase the wal
> > level on demand...
> 
> Yeah, there are really two things.
> 
> If we can increase wal_level on demand, that would solve one of them.
> Turning that into a SIGHUP parameter would be great. I have no idea
> how hard it would be. In theory, couldn't we let it be sighup and then
> just have do_pg_start_backup() block until all backends have
> acknowledged that they are on the new WAL level somehow? (Yes, I
> realize this might be a big simplification, but I'm allowed to hope,
> no?)

Depends on what you want to support. For basebackups, that should be
doable with some pullups.
It's unfortunately more complex than that for streaming rep - we really
need persistent standby registration there. Otherwise the wal_level will
fall back to minimal when the standby disconnects which will obviously
break the standby.

> The other problem is max_wal_senders. I think that's a much smaller
> problem - setting that one to 5 or so by default shouldn't have a big
> impact. But without the wal_level changes, it would also be mostly
> pointless...

Well, you currently cannot even set it when the wal_level isn't set
appropriately, but that should be easy enough to change.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 10:19:17 -0700, Peter Geoghegan wrote:
> On Tue, Oct 15, 2013 at 9:56 AM, Robert Haas  wrote:
> > Well, I don't know that any of us can claim to have a lock on what the
> > syntax should look like.
> 
> Sure. But it's not just syntax. We're talking about functional
> differences too, since you're talking about mandating an update, which
> is a not the same as an "update locked row only conditionally", or a
> delete.

I think anything that only works by breaking visibility rules that way
is a nonstarter. Doing that from the C level is one thing, exposing it
this way seems a bad idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 7:26 PM, Andres Freund  wrote:
> On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
>> On 10/15/2013 05:52 AM, Magnus Hagander wrote:
>> > But the argument about being friendly for new users should definitely
>> > have us change wal_level and max_wal_senders.
>>
>> +1 for having replication supported out-of-the-box aside from pg_hba.conf.
>>
>> To put it another way: users are more likely to care about replication
>> than they are about IO overhead on a non-replicated server.  And for the
>> users who care about IO overhead, they are more likely to muck about in
>> pg.conf *anyway* in order to set a slew of performance-tuning settings.
>
> But it will hurt people restoring backups using pg_restore -j. I think
> people might be rather disappointed if that slows down by a factor of
> three.
>
> I think we really need to get to the point where we increase the wal
> level on demand...

Yeah, there are really two things.

If we can increase wal_level on demand, that would solve one of them.
Turning that into a SIGHUP parameter would be great. I have no idea
how hard it would be. In theory, couldn't we let it be sighup and then
just have do_pg_start_backup() block until all backends have
acknowledged that they are on the new WAL level somehow? (Yes, I
realize this might be a big simplification, but I'm allowed to hope,
no?)

The other problem is max_wal_senders. I think that's a much smaller
problem - setting that one to 5 or so by default shouldn't have a big
impact. But without the wal_level changes, it would also be mostly
pointless...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Andres Freund
On 2013-10-15 10:19:06 -0700, Josh Berkus wrote:
> On 10/15/2013 05:52 AM, Magnus Hagander wrote:
> > But the argument about being friendly for new users should definitely
> > have us change wal_level and max_wal_senders.
> 
> +1 for having replication supported out-of-the-box aside from pg_hba.conf.
> 
> To put it another way: users are more likely to care about replication
> than they are about IO overhead on a non-replicated server.  And for the
> users who care about IO overhead, they are more likely to muck about in
> pg.conf *anyway* in order to set a slew of performance-tuning settings.

But it will hurt people restoring backups using pg_restore -j. I think
people might be rather disappointed if that slows down by a factor of
three.

I think we really need to get to the point where we increase the wal
level on demand...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Josh Berkus
On 10/15/2013 05:52 AM, Magnus Hagander wrote:
> But the argument about being friendly for new users should definitely
> have us change wal_level and max_wal_senders.

+1 for having replication supported out-of-the-box aside from pg_hba.conf.

To put it another way: users are more likely to care about replication
than they are about IO overhead on a non-replicated server.  And for the
users who care about IO overhead, they are more likely to muck about in
pg.conf *anyway* in order to set a slew of performance-tuning settings.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:48 AM, Andres Freund  wrote:
>> > What about columns like:
>> > * action B|I|U|D|C
>>
>> BEGIN and COMMIT?
>
> That's B and C, yes. You'd rather not have them? When would you replay
> the commit without an explicit message telling you to?

No, BEGIN and COMMIT sounds good, actually.  Just wanted to make sure
I understood.

>> Repeating the column names for every row strikes me as a nonstarter.
>> [...]
>> Sure, some people may want JSON or XML
>> output that reiterates the labels every time, but for a lot of people
>> that's going to greatly increase the size of the output and be
>> undesirable for that reason.
>
> But I argue that most simpler users - which are exactly the ones a
> generic output plugin is aimed at - will want all column names since it
> makes replay far easier.

Meh, maybe.

>> If the plugin interface isn't rich enough to provide a convenient way
>> to avoid that, then it needs to be fixed so that it is, because it
>> will be a common requirement.
>
> Oh, it surely is possible to avoid repeating it. The output plugin
> interface simply gives you a relcache entry that contains everything
> necessary.
> The output plugin would need to keep track of whether it has output data
> for a specific relation and it would need to check whether the table
> definition has changed, but I don't see how we could avoid that?

Well, it might be nice if there were a callback for, hey, schema has
changed!  Seems like a lot of plugins will want to know that for one
reason or another, and rechecking for every tuple sounds expensive.

>> > What still need to be determined is:
>> > * how do we separate and escape multiple values in one CSV column
>> > * how do we represent NULLs
>>
>> I consider the escaping a key design decision.  Ideally, it should be
>> something that's easy to reverse from a scripting language; ideally
>> also, it should be something similar to how we handle COPY.  These
>> goals may be in conflict; we'll have to pick something.
>
> Note that parsing COPYs is a major PITA from most languages...
>
> Perhaps we should make the default output json instead? With every
> action terminated by a nullbyte?
> That's probably easier to parse from various scripting languages than
> anything else.

I could go for that.  It's not quite as compact as I might hope, but
JSON does seem to make people awfully happy.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 8:07 AM, Peter Geoghegan  wrote:
> Naturally we all want MERGE. It seems self-defeating to insist on
> something significantly harder that there is significantly less demand
> for, though.

I hasten to add: which is not to imply that you're insisting rather
than expressing a sentiment.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 11:02:39 -0400, Robert Haas wrote:
> >> If the plugin interface isn't rich enough to provide a convenient way
> >> to avoid that, then it needs to be fixed so that it is, because it
> >> will be a common requirement.
> >
> > Oh, it surely is possible to avoid repeating it. The output plugin
> > interface simply gives you a relcache entry that contains everything
> > necessary.
> > The output plugin would need to keep track of whether it has output data
> > for a specific relation and it would need to check whether the table
> > definition has changed, but I don't see how we could avoid that?
> 
> Well, it might be nice if there were a callback for, hey, schema has
> changed!  Seems like a lot of plugins will want to know that for one
> reason or another, and rechecking for every tuple sounds expensive.

I don't really see how we could provide that in any useful manner. We
could provide a callback that is called whenever another transaction has
changed the schema, but there's nothing that can easily be done about schema
changes by the replayed transaction itself. And those are the only ones
where meaningful schema changes can happen since the locks the source
transaction has held will prevent most other schema changes.

As much as I hate such code, I guess checking (and possibly storing) the
ctid||xmin of the pg_class row is the easiest thing we could do :(.
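
For what it's worth, a sketch of that check as an output plugin might do
it; the SchemaVersion cache entry (and where the plugin keeps it, say a
hash table keyed by relation OID) is an assumption of mine, while the
syscache and tuple macros are the existing ones:

#include "postgres.h"

#include "access/htup_details.h"
#include "storage/itemptr.h"
#include "utils/rel.h"
#include "utils/syscache.h"

/* Cached "schema version" the plugin could keep per relation. */
typedef struct SchemaVersion
{
    ItemPointerData pg_class_tid;   /* ctid of the pg_class row last seen */
    TransactionId   pg_class_xmin;  /* xmin of that row */
} SchemaVersion;

/*
 * Sketch of the ctid||xmin idea: return true (and refresh the cache) when
 * the relation's pg_class row is not the one we saw last time.
 */
static bool
schema_changed(Relation rel, SchemaVersion *cached)
{
    HeapTuple   tup;
    bool        changed;

    tup = SearchSysCache1(RELOID, ObjectIdGetDatum(RelationGetRelid(rel)));
    if (!HeapTupleIsValid(tup))
        elog(ERROR, "cache lookup failed for relation %u",
             RelationGetRelid(rel));

    changed = !ItemPointerEquals(&tup->t_self, &cached->pg_class_tid) ||
        HeapTupleHeaderGetXmin(tup->t_data) != cached->pg_class_xmin;

    cached->pg_class_tid = tup->t_self;
    cached->pg_class_xmin = HeapTupleHeaderGetXmin(tup->t_data);

    ReleaseSysCache(tup);
    return changed;
}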

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] CF 2013-09 Wrap Up

2013-10-15 Thread Mike Blackwell
On Tue, Oct 15, 2013 at 1:39 AM, Noah Misch  wrote:

> On Mon, Oct 14, 2013 at 01:56:42PM -0500, Mike Blackwell wrote:
> > Any patches marked Needs Review will be automatically moved to the next CF.
> > We will try to make sure that all patches in the current CF have
> > received at least one review.
>
> The combined effect of those two statements is not clear to me.  Does that
> mean you'll retain never-reviewed patches and automatically move patches
> that have received at least one review?
>

Yes on the latter part.  We will try to get a quick review for
not-yet-reviewed patches and move or return them based on the result of
that review.  If we fail to find a reviewer, the patches will get moved to
the next CF.

For those following along, here are the patches still needing a first look.
 They are for the most part performance or internals patches and could use
the eye of someone more experienced.  Please consider a quick review of one
of them if you fit that description.  We'd like everyone to get a fair
shake here. ^_^

HStore Gin Speedup
Performance Improvement by reducing WAL for Update Operation
[PoC] pgstattuple2: block sampling to reduce physical read
ECPG cursor readahead


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Andres Freund
On 2013-10-15 11:11:24 -0400, Robert Haas wrote:
> I'm not saying "go implement MERGE".  I'm saying, make the
> insert-or-update operation a single statement, using some syntax TBD,
> instead of requiring the use of a new insert statement that makes
> invisible rows visible as a side effect, so that you can wrap that in
> a CTE and feed it to an update statement.  That's complex and, AFAICS,
> unlike how any other database product handles this.

I think we most definitely should provide a single statement
variant. That's the one users yearn for.

I also would like a variant where I can lock a row on conflict, for
multimaster scenarios, but that doesn't necessarily have to be exposed
to SQL.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread k...@rice.edu
On Tue, Oct 15, 2013 at 11:02:39AM -0400, Robert Haas wrote:
> >> goals may be in conflict; we'll have to pick something.
> >
> > Note that parsing COPYs is a major PITA from most languages...
> >
> > Perhaps we should make the default output json instead? With every
> > action terminated by a nullbyte?
> > That's probably easier to parse from various scripting languages than
> > anything else.
> 
> I could go for that.  It's not quite as compact as I might hope, but
> JSON does seem to make people awfully happy.
> 
> -- 
> Robert Haas
> EnterpriseDB: http://www.enterprisedb.com
> The Enterprise PostgreSQL Company
> 

Feeding such a JSON stream into a compression algorithm like lz4 or
snappy should result in a pretty compact stream. The latest lz4 updates
also have the ability to use a pre-existing dictionary, which would really
help remove the redundant pieces.
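
As a small sketch of that idea, assuming a reasonably recent liblz4 with
the streaming/dictionary API (LZ4_loadDict() and
LZ4_compress_fast_continue()); the dictionary here is just the boilerplate
shared by every change record:

#include <stdio.h>
#include <string.h>
#include <lz4.h>

int
main(void)
{
    /* dictionary: the redundant framing every action repeats */
    const char *dict =
        "{\"action\":\"I\",\"relation\":\"public.tbl\",\"columns\":";
    const char *change =
        "{\"action\":\"I\",\"relation\":\"public.tbl\","
        "\"columns\":{\"id\":1,\"val\":\"foo\"}}";
    char        out[256];
    LZ4_stream_t *stream = LZ4_createStream();
    int         n;

    /* prime the stream so the shared framing compresses to references */
    LZ4_loadDict(stream, dict, (int) strlen(dict));
    n = LZ4_compress_fast_continue(stream, change, out,
                                   (int) strlen(change),
                                   (int) sizeof(out), 1);
    printf("%zu bytes of JSON -> %d compressed bytes\n",
           strlen(change), n);

    LZ4_freeStream(stream);
    return 0;
}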

Regards,
Ken


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 11:07 AM, Peter Geoghegan  wrote:
> On Tue, Oct 15, 2013 at 5:15 AM, Robert Haas  wrote:
>> Well, the SQL standard way of doing this type of operation is MERGE.
>> The alternative we know exists in other databases is REPLACE; there's
>> also INSERT .. ON DUPLICATE KEY update.  In all of those cases,
>> whatever weirdness exists around MVCC is confined to that one command.
>>  I tend to think we should do similarly, with the goal that
>> HeapTupleSatisfiesMVCC need not change at all.
>
> I don't think that it's very pragmatic to define success in terms of
> not modifying a single visibility function. I feel it would be more
> useful to define it as providing acceptable, non-surprising semantics,
> while not regressing performance in other areas.
>
> The fact remains that you're going to have to create a new snapshot
> type even for this special case, so I don't see any win as regards
> managing invasiveness here. Quite the contrary, in fact.

Well, we might have to agree to disagree.

>> I don't have the only vote here, of course, but my feeling is that
>> that's more likely to be a good route.
>
> Naturally we all want MERGE. It seems self-defeating to insist on
> something significantly harder that there is significantly less demand
> for, though. I thought that there was at least informal agreement that
> this sort of approach was preferable to MERGE in its full generality,
> based on feedback at the 2012 developer meeting. I really don't think
> that what I've done here is any worse than INSERT...ON DUPLICATE KEY
> UPDATE in any of the areas you express concern about here. REPLACE has
> some serious problems, and I just don't see it as a viable alternative
> at all - just ask any MySQL user.
>
> MERGE is of course more flexible than what I have here in some ways, but
> actually less flexible in other ways. I think that the real point of
> MERGE is that it's defined in a way that serves data warehousing use
> cases very well: the semantics constrain things such that the executor
> only has to execute a single ModifyTable node that does inserts,
> updates and deletes in a single scan. That's great, but what if it's
> useful to do that CRUD (yes, this can include selects) to entirely
> different tables? Or what if the relevant DML will only come in a
> later statement in the same transaction?

I'm not saying "go implement MERGE".  I'm saying, make the
insert-or-update operation a single statement, using some syntax TBD,
instead of requiring the use of a new insert statement that makes
invisible rows visible as a side effect, so that you can wrap that in
a CTE and feed it to an update statement.  That's complex and, AFAICS,
unlike how any other database product handles this.

Again, other people can have different opinions on this, and that's
fine.  I'm just giving you mine.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:56 AM, Andres Freund  wrote:
>> > That means you allow trivial remote code execution since you could try
>> > to load system() or something else that's available in every shared
>> > object. Now you can argue that that's OK since we have special checks
>> > for replication connections, but I'd rather not go there.
>>
>> Well, obviously you can't let somebody load any library they want.
>> But that's pretty much true anyway; LOAD had better be confined to
>> superusers unless there is something (like a pg_proc entry) that
>> provides "prior authorization" for that specific load.
>
> Currently you can create users that have permissions for replication but
> which are not superusers. I think we should strive to provide that
> capability for changeset extraction as well.

I agree.

>> >> Perhaps this ought to work similarly.  Create a function in pg_proc
>> >> which returns the structure containing the function pointers.  Then,
>> >> when that output plugin is selected, it'll automatically trigger
>> >> loading the correct shared library if that's needed; and the shared
>> >> library name may (but need not) match the output plugin name.
>> >
>> > I'd like to avoid relying on inserting stuff into pg_proc because that
>> > makes it harder to extract WAL from a HS standby. Requiring that to be
>> > configured on the primary in order to extract data on the standby seems
>> > confusing to me.
>> >
>> > But perhaps that's the correct solution :/
>>
>> That's a reasonable concern.  I don't have another idea at the moment,
>> unless we want to allow replication connections to issue LOAD
>> commands.  Then you can LOAD the library, so that the plug-in is
>> registered under the well-known name you expect it to have, and then
>> use that name to start replication.
>
> But what's the advantage of that over the current situation or one where
> PG_load_output_plugin() is called? The current and related
> implementations allow you to only load libraries in some designated
> postgres directories and it doesn't allow you to call any arbitrary
> functions in there.

Well, I've already said why I don't like conflating the library name
and the plugin name.  It rules out core plugins and libraries that
provide multiple plugins.  I don't have anything to add to that.

> Would you be content with a symbol "PG_load_output_plugin" being called
> that fills out the actual callbacks?

Well, it doesn't fix the muddling of library names with output plugin
names, but I suppose I'd find it a modest improvement.

>> > Well, just providing the C API + an example in a first step didn't work
>> > out too badly for FDWs. I am pretty sure that once released there will
>> > soon be extensions for it on PGXN or whatever for special usecases.
>>
>> I suspect so, too.  But I also think that if that's the only thing
>> available in the first release, a lot of users will get a poor initial
>> impression.
>
> I think lots of people will expect a builtin logical replication
> solution :/. Which seems a tad unlikely to arrive in 9.4.

Yep.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Peter Geoghegan
On Tue, Oct 15, 2013 at 5:15 AM, Robert Haas  wrote:
> Well, the SQL standard way of doing this type of operation is MERGE.
> The alternative we know exists in other databases is REPLACE; there's
> also INSERT .. ON DUPLICATE KEY update.  In all of those cases,
> whatever weirdness exists around MVCC is confined to that one command.
>  I tend to think we should do similarly, with the goal that
> HeapTupleSatisfiesMVCC need not change at all.

I don't think that it's very pragmatic to define success in terms of
not modifying a single visibility function. I feel it would be more
useful to define it as providing acceptable, non-surprising semantics,
while not regressing performance in other areas.

The fact remains that you're going to have to create a new snapshot
type even for this special case, so I don't see any win as regards
managing invasiveness here. Quite the contrary, in fact.

> I don't have the only vote here, of course, but my feeling is that
> that's more likely to be a good route.

Naturally we all want MERGE. It seems self-defeating to insist on
something significantly harder that there is significantly less demand
for, though. I thought that there was at least informal agreement that
this sort of approach was preferable to MERGE in its full generality,
based on feedback at the 2012 developer meeting. I really don't think
that what I've done here is any worse than INSERT...ON DUPLICATE KEY
UPDATE in any of the areas you express concern about here. REPLACE has
some serious problems, and I just don't see it as a viable alternative
at all - just ask any MySQL user.

MERGE is of course more flexible than what I have here in some ways, but
actually less flexible in other ways. I think that the real point of
MERGE is that it's defined in a way that serves data warehousing use
cases very well: the semantics constrain things such that the executor
only has to execute a single ModifyTable node that does inserts,
updates and deletes in a single scan. That's great, but what if it's
useful to do that CRUD (yes, this can include selects) to entirely
different tables? Or what if the relevant DML will only come in a
later statement in the same transaction?

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:34:53 -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 10:27 AM, Andres Freund  
> wrote:
> >> I think part of the problem may be that you're using the library name
> >> to identify the output plugin.  I'm not excited about that design.
> >> For functions, you give the function a name and that is a pointer to
> >> where to actually find the function, which may be a 2-tuple
> >> <library name, function name>, or perhaps just a 1-tuple
> >> <function name>, or maybe the whole text of a PL/pgsql
> >> procedure that should be compiled.
> >
> > That means you allow trivial remote code execution since you could try
> > to load system() or something else that's available in every shared
> > object. Now you can argue that that's OK since we have special checks
> > for replication connections, but I'd rather not go there.
> 
> Well, obviously you can't let somebody load any library they want.
> But that's pretty much true anyway; LOAD had better be confined to
> superusers unless there is something (like a pg_proc entry) that
> provides "prior authorization" for that specific load.

Currently you can create users that have permissions for replication but
which are not superusers. I think we should strive to provide that
capability for changeset extraction as well.

> >> Perhaps this ought to work similarly.  Create a function in pg_proc
> >> which returns the structure containing the function pointers.  Then,
> >> when that output plugin is selected, it'll automatically trigger
> >> loading the correct shared library if that's needed; and the shared
> >> library name may (but need not) match the output plugin name.
> >
> > I'd like to avoid relying on inserting stuff into pg_proc because that
> > makes it harder to extract WAL from a HS standby. Requiring that to be
> > configured on the primary in order to extract data on the standby seems
> > confusing to me.
> >
> > But perhaps that's the correct solution :/
> 
> That's a reasonable concern.  I don't have another idea at the moment,
> unless we want to allow replication connections to issue LOAD
> commands.  Then you can LOAD the library, so that the plug-in is
> registered under the well-known name you expect it to have, and then
> use that name to start replication.

But what's the advantage of that over the current situation or one where
PG_load_output_plugin() is called? The current and related
implementations allow you to only load libraries in some designated
postgres directories and it doesn't allow you to call any arbitrary
functions in there.

Would you be content with a symbol "PG_load_output_plugin" being called
that fills out the actual callbacks?
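
Purely as an illustration of that shape (the struct and member names below
are invented for this example, not the callbacks from the v6.2 patch):

typedef struct OutputPluginCallbacks
{
    void    (*begin_cb) (void *ctx);                 /* transaction start  */
    void    (*change_cb) (void *ctx, void *change);  /* one row change     */
    void    (*commit_cb) (void *ctx);                /* transaction commit */
} OutputPluginCallbacks;

static void my_begin(void *ctx)                 { /* emit BEGIN */ }
static void my_change(void *ctx, void *change)  { /* emit the row */ }
static void my_commit(void *ctx)                { /* emit COMMIT */ }

/* The server dlsym()s this well-known symbol in the shared object and
 * calls it once to obtain the callbacks. */
void
PG_load_output_plugin(OutputPluginCallbacks *cb)
{
    cb->begin_cb = my_begin;
    cb->change_cb = my_change;
    cb->commit_cb = my_commit;
}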

> >> Now, some users are still going to head for the hills.  But at least
> >> from where I sit it sounds a hell of a lot better than the first
> >> answer.  We're not going to solve all of the tooling problems around
> >> this technology in one release, for sure.  But as far as 95% of our
> >> users are concerned, a C API might as well not exist at all.  People
> >> WILL try to machine parse the output of whatever demo plugins we
> >> provide; so I think we should try hard to provide at least one such
> >> plugin that is designed to make that as easy as possible.
> >
> > Well, just providing the C API + an example in a first step didn't work
> > out too badly for FDWs. I am pretty sure that once released there will
> > soon be extensions for it on PGXN or whatever for special usecases.
> 
> I suspect so, too.  But I also think that if that's the only thing
> available in the first release, a lot of users will get a poor initial
> impression.

I think lots of people will expect a builtin logical replication
solution :/. Which seems a tad unlikely to arrive in 9.4.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:15:14 -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund  wrote:
> > On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
> >> If we go for CSV I think we should put the entire primary key as one
> >> column (containing all the columns) and the entire row another.
> >
> > What about columns like:
> > * action B|I|U|D|C
> 
> BEGIN and COMMIT?

That's B and C, yes. You'd rather not have them? When would you replay
the commit without an explicit message telling you to?

> Repeating the column names for every row strikes me as a nonstarter.
> [...]
> Sure, some people may want JSON or XML
> output that reiterates the labels every time, but for a lot of people
> that's going to greatly increase the size of the output and be
> undesirable for that reason.

But I argue that most simpler users - which are exactly the ones a
generic output plugin is aimed at - will want all column names since it
makes replay far easier.

> If the plugin interface isn't rich enough to provide a convenient way
> to avoid that, then it needs to be fixed so that it is, because it
> will be a common requirement.

Oh, it surely is possible to avoid repeating it. The output plugin
interface simply gives you a relcache entry that contains everything
necessary.
The output plugin would need to keep track of whether it has output data
for a specific relation and it would need to check whether the table
definition has changed, but I don't see how we could avoid that?

> > What still need to be determined is:
> > * how do we separate and escape multiple values in one CSV column
> > * how do we represent NULLs
> 
> I consider the escaping a key design decision.  Ideally, it should be
> something that's easy to reverse from a scripting language; ideally
> also, it should be something similar to how we handle COPY.  These
> goals may be in conflict; we'll have to pick something.

Note that parsing COPYs is a major PITA from most languages...

Perhaps we should make the default output json instead? With every
action terminated by a nullbyte?
That's probably easier to parse from various scripting languages than
anything else.
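
Roughly like this, from the framing side (plain C just to show the idea,
not the output plugin interface): each action is one JSON object followed
by a null byte, so a consumer can split the stream on '\0' before handing
each piece to its JSON parser:

#include <stdio.h>

static void
emit_action(FILE *out, const char *json)
{
    fputs(json, out);
    fputc('\0', out);           /* terminator between actions */
}

int
main(void)
{
    emit_action(stdout, "{\"action\":\"B\",\"xid\":1234}");
    emit_action(stdout, "{\"action\":\"I\",\"relation\":\"public.tbl\","
                        "\"newtuple\":{\"id\":1,\"val\":\"foo\"}}");
    emit_action(stdout, "{\"action\":\"C\",\"xid\":1234}");
    return 0;
}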

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Doc Patch: Subquery section to say that subqueries can't modify data

2013-10-15 Thread Vik Fearing
On 08/06/2013 11:03 PM, Karl O. Pinc wrote:
> The attached documentation patch, doc-subqueries-v1.patch,
> applies against head.
>
> I wanted to document that subqueries can't modify data.
> This is mentioned in the documentation for SELECT and
> implied elsewhere but I was looking for something more
> than an 'in-passing' mention.  
>
> (I wrote a bad query,
> modifying data in a subquery, couldn't recall where
> it was documented that you can't do this, and couldn't
> find the answer from the TOC or the index.  Now that
> there's lots of statements with RETURNING clauses
> it's natural to want to use them in subqueries.)

Hello, I am (finally) reviewing this patch.

After reading your reasoning, David's rebuttal, and the patch itself;
I'm wondering if this is needed or wanted at all.

Supposing it is wanted, it creates more questions than it answers.  The
two biggies are:

* In what other contexts can tabular subqueries be used?
* What are other ways of integrating data returned by data modification
statements?

On a superficial level I find the number of commas a bit clunky, and
"parentheses" is misspelled.

> The last 2 sentences of the first paragraph are
> something in the way of helpful hints and may not
> be appropriate, or even accurate.  I've left them in 
> for review.

I think the last sentence (of the first paragraph) is a bit much, but
the penultimate seems fine.

I'm attaching an updated patch that I think is an improvement but it's
still at a draft level and needs more copyediting.  This new patch does
not attempt to answer the two questions above.

-- 
Vik

diff --git a/doc/src/sgml/queries.sgml b/doc/src/sgml/queries.sgml
index c32c857..b134b66 100644
--- a/doc/src/sgml/queries.sgml
+++ b/doc/src/sgml/queries.sgml
@@ -549,7 +549,7 @@ SELECT * FROM my_table AS m WHERE my_table.a > 5;-- wrong
 SELECT * FROM people AS mother JOIN people AS child ON mother.id = child.mother_id;
 
  Additionally, an alias is required if the table reference is a
- subquery (see ).
+ subquery (see ).
 
 
 
@@ -590,10 +590,10 @@ SELECT a.* FROM (my_table AS a JOIN your_table AS b ON ...) AS c
 

 
-   
-Subqueries
+   
+Subquery Derived Tables
 
-
+
  subquery
 
 
@@ -1315,6 +1315,44 @@ SELECT DISTINCT ON (expression , 
 
 
+ 
+  Subqueries
+
+  
+   subquery
+  
+
+  
+   sub-select
+  
+
+  
+   Subqueries, also called sub-selects, are queries written within
+   parentheses in the text of larger queries.  The values produced by
+   subqueries may be scalar, or tabular.  Scalar subqueries are used within expressions as described
+   in .
+   Tabular subqueries may substitute for tables as described
+   in , generate array
+   content as described
+   in , have their
+   result content tested within expressions as described
+   in , or be used in other
+   contexts.  Often either joins or subqueries can be used to produce
+   different query plans yielding identical output.
+  
+
+  
+   Subqueries may not modify database
+   content.  Common Table
+   Expressions are one way to integrate data returned by data
+   modification statements,
+   i.e. INSERT/UPDATE/DELETE
+   statements with RETURNING clauses, into larger
+   queries.
+  
+ 
+
+
  
   Combining Queries
 

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Andres Freund
On 2013-10-15 10:36:41 -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund  
> wrote:
> >> But I also agree that making max_wal_senders act as both a minimum and
> >> a maximum is no good.  +1 to everything Josh Berkus said.
> >
> > Josh said we should treat replication connections in a separate "pool"
> > from normal database connections, right? So you withdraw your earlier
> > objection to that?
> 
> I don't think that's what he said.  Here's what I was referring to:
> 
> $ Changing max_wal_senders requires a restart.  As such, we currently
> $ advise users to set the setting generously: "as many replication
> $ connections as you think you'll ever need, plus two".  If
> $ max_wal_senders is a reservation which could cause the user to run out
> $ of other connections sooner than expected, then the user is faced with a
> $ new "hard to set" parameter: they don't want to set it too high *or* too
> $ low.
> $
$ This would result in a lot of user frustration as they try to get their
> $ connection configuration right and have to restart the server multiple
> $ times.  I find few new features worth making it *harder* to configure
> $ PostgreSQL, and reserved replication connections certainly don't qualify.
> $
> $ If it's worth having reserved replication connections (and I can see
> $ some reasons to want it), then we need a new GUC for this:
> $ "reserved_walsender_connections"

I am referring to
http://archives.postgresql.org/message-id/525C31D3.3010006%40agliodbs.com
:
> On 10/14/2013 10:51 AM, Andres Freund wrote:
> > Imo the complications around this prove my (way earlier) point that it'd
> > be much better to treat replication connections as something entirely
> > different to normal SQL connections. There's really not much overlap
> > here and while there's some philosophical point to be made about it all
> > being connections, from a practical POV treating them separately seems
> > better.
> 
> Given that replication connections don't even appear in pg_stat_activity
> now, I'd agree with you.


I still fail to see what the point is in treating those classes of
connections together. It only serves to confuse users and makes
considerations much more complicated. There's no need for reserved
replication connections or anything like it if we separate the pools.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:34 AM, Andres Freund  wrote:
>> But I also agree that making max_wal_senders act as both a minimum and
>> a maximum is no good.  +1 to everything Josh Berkus said.
>
> Josh said we should treat replication connections in a separate "pool"
> from normal database connections, right? So you withdraw your earlier
> objection to that?

I don't think that's what he said.  Here's what I was referring to:

$ Changing max_wal_senders requires a restart.  As such, we currently
$ advise users to set the setting generously: "as many replication
$ connections as you think you'll ever need, plus two".  If
$ max_wal_senders is a reservation which could cause the user to run out
$ of other connections sooner than expected, then the user is faced with a
$ new "hard to set" parameter: they don't want to set it too high *or* too
$ low.
$
$ This would result in a lot of user frustration as they try to get their
$ connection configuration right and have to restart the server multiple
$ times.  I find few new features worth making it *harder* to configure
$ PostgreSQL, and reserved replication connections certainly don't qualify.
$
$ If it's worth having reserved replication connections (and I can see
$ some reasons to want it), then we need a new GUC for this:
$ "reserved_walsender_connections"

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Release note fix for timeline item

2013-10-15 Thread Bruce Momjian
On Tue, Oct 15, 2013 at 02:32:47PM +0900, KONDO Mitsumasa wrote:
> Sorry for my reply late...
> 
> (2013/10/08 23:26), Bruce Momjian wrote:
> > First, I want to apologize for not completing the release notes earlier
> > so that others could review them.  I started working on the release
> > notes on Friday, but my unfamiliarity with the process and fear of
> > making a mistake caused many delays.  I have improved the documentation
> > on the process which will hopefully help next time.
> There isn't anything in particular that I was dissatisfied with.
> 
> >You are right that there are a lot of details skipped in the release note
> >text.  I have developed the attached patch which I think does a better
> >job.  Is it OK?
> Yes, of course!
> Thanks for your sincere action!

Thanks, patch applied back through 9.1.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:27 AM, Andres Freund  wrote:
>> I think part of the problem may be that you're using the library name
>> to identify the output plugin.  I'm not excited about that design.
>> For functions, you give the function a name and that is a pointer to
> where to actually find the function, which may be a 2-tuple
> <library name, function name>, or perhaps just a 1-tuple
> <function name>, or maybe the whole text of a PL/pgsql
>> procedure that should be compiled.
>
> That means you allow trivial remote code execution since you could try
> to load system() or something else that's available in every shared
> object. Now you can argue that that's OK since we have special checks
> for replication connections, but I'd rather not go there.

Well, obviously you can't let somebody load any library they want.
But that's pretty much true anyway; LOAD had better be confined to
superusers unless there is something (like a pg_proc entry) that
provides "prior authorization" for that specific load.

>> Perhaps this ought to work similarly.  Create a function in pg_proc
>> which returns the structure containing the function pointers.  Then,
>> when that output plugin is selected, it'll automatically trigger
>> loading the correct shared library if that's needed; and the shared
>> library name may (but need not) match the output plugin name.
>
> I'd like to avoid relying on inserting stuff into pg_proc because that
> makes it harder to extract WAL from a HS standby. Requiring to configure
> that on the primary to extract data on the standby seems confusing to
> me.
>
> But perhaps that's the correct solution :/

That's a reasonable concern.  I don't have another idea at the moment,
unless we want to allow replication connections to issue LOAD
commands.  Then you can LOAD the library, so that the plug-in is
registered under the well-known name you expect it to have, and then
use that name to start replication.

>> Now, some users are still going to head for the hills.  But at least
>> from where I sit it sounds a hell of a lot better than the first
>> answer.  We're not going to solve all of the tooling problems around
>> this technology in one release, for sure.  But as far as 95% of our
>> users are concerned, a C API might as well not exist at all.  People
>> WILL try to machine parse the output of whatever demo plugins we
>> provide; so I think we should try hard to provide at least one such
>> plugin that is designed to make that as easy as possible.
>
> Well, just providing the C API + an example in a first step didn't work
> out too badly for FDWs. I am pretty sure that once released there will
> soon be extensions for it on PGXN or whatever for special usecases.

I suspect so, too.  But I also think that if that's the only thing
available in the first release, a lot of users will get a poor initial
impression.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Andres Freund
On 2013-10-15 10:29:58 -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 12:13 AM, Amit Kapila  wrote:
> > If we think this way, then maybe we should have max_user_connections
> > instead of max_connections and then max_wal_connections. But still
> > there are others like pg_basebackup that need connections, and
> > tomorrow there can be new such entities which need connections.
> > Also we might need to have different infrastructure in code to make
> > these options available to users.
> > I think having different parameters to configure maximum connections
> > for different entities can complicate both the code and the user's job.
> 
> Renaming max_connections is far too big a compatibility break to
> consider without far more benefit than what this patch is aiming at.
> I'm not prepared to endure the number of beatings I'd have to take if
> we did that.

+many

> But I also agree that making max_wal_senders act as both a minimum and
> a maximum is no good.  +1 to everything Josh Berkus said.

Josh said we should treat replication connections in a separate "pool"
from normal database connections, right? So you withdraw your earlier
objection to that?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Patch for reserved connections for replication users

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 12:13 AM, Amit Kapila  wrote:
> If we think this way, then maybe we should have max_user_connections
> instead of max_connections and then max_wal_connections. But still
> there are others like pg_basebackup that need connections, and
> tomorrow there can be new such entities which need connections.
> Also we might need to have different infrastructure in code to make
> these options available to users.
> I think having different parameters to configure maximum connections
> for different entities can complicate both the code and the user's job.

Renaming max_connections is far too big a compatibility break to
consider without far more benefit than what this patch is aiming at.
I'm not prepared to endure the number of beatings I'd have to take if
we did that.

But I also agree that making max_wal_senders act as both a minimum and
a maximum is no good.  +1 to everything Josh Berkus said.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:09:05 -0400, Robert Haas wrote:
> On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund  wrote:
> > It allows you to use the shared library both as a normal extension loaded
> > via shared_preload_libraries or ad hoc and as an output plugin, which seems
> > like a sensible goal.
> > We could have a single _PG_init_output_plugin() symbol that fills in
> > such a struct which would then not conflict with using the .so
> > independently. If you prefer that I'll change things around.
> 
> I think part of the problem may be that you're using the library name
> to identify the output plugin.  I'm not excited about that design.
> For functions, you give the function a name and that is a pointer to
> where to actually find the function, which may be a 2-tuple
> <library name, function name>, or perhaps just a 1-tuple
> <function name>, or maybe the whole text of a PL/pgsql
> procedure that should be compiled.

That means you allow trivial remote code execution since you could try
to load system() or something else that's available in every shared
object. Now you can argue that that's OK since we have special checks
for replication connections, but I'd rather not go there.

> Perhaps this ought to work similarly.  Create a function in pg_proc
> which returns the structure containing the function pointers.  Then,
> when that output plugin is selected, it'll automatically trigger
> loading the correct shared library if that's needed; and the shared
> library name may (but need not) match the output plugin name.

I'd like to avoid relying on inserting stuff into pg_proc because that
makes it harder to extract WAL from a HS standby. Requiring to configure
that on the primary to extract data on the standby seems confusing to
me.

But perhaps that's the correct solution :/

> Now, some users are still going to head for the hills.  But at least
> from where I sit it sounds a hell of a lot better than the first
> answer.  We're not going to solve all of the tooling problems around
> this technology in one release, for sure.  But as far as 95% of our
> users are concerned, a C API might as well not exist at all.  People
> WILL try to machine parse the output of whatever demo plugins we
> provide; so I think we should try hard to provide at least one such
> plugin that is designed to make that as easy as possible.

Well, just providing the C API + an example in a first step didn't work
out too badly for FDWs. I am pretty sure that once released there will
soon be extensions for it on PGXN or whatever for special usecases.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 10:20:55 -0400, Robert Haas wrote:
> > For multi-master / conflict resolution you may also want all old
> > values to make sure that they have not changed on target.
> 
> The patch as proposed doesn't make that information available.  If you
> want that to be an option, now would be the right time to argue for
> it.

I don't think you necessarily want it for most MM solutions, but I agree
it will be useful for some scenarios.

I think the ReorderBufferChange struct needs a better way to distinguish
between old-key and old-tuple now, but I'd rather implement the
facility for logging the full old tuple in a separate patch. The
patchset is big enough as it is; let's not tack on more features.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 10:09 AM, Hannu Krosing  wrote:
>> I don't see how you can fail to know what "all" is.
> We instinctively know what "all" is - as in the famous case of the Buddhist
> ordering a hamburger - "Make me All wit Everything" :) - but the
> requirements of different replication systems vary wildly.

That's true to some degree, but let's not exaggerate the degree to
which it is true.

> For multi-master / conflict resolution you may also want all old
> values to make sure that they have not changed on target.

The patch as proposed doesn't make that information available.  If you
want that to be an option, now would be the right time to argue for
it.

> for some forms of conflict resolution we may even want to know
> the database user who initiated the operation, and possibly even
> some session variables like "very_important=yes".

Well, if you have requirements like logging very_important=yes, then
you're definitely into the territory where you need your own output
plugin.  I have no problem telling people who want that sort of thing
that they've got to go write C code.  What I'm trying to do, as Larry
Wall once said, is to make simple things simple and hard things
possible.  The output plugin interface accomplishes the latter, but,
by itself, not the former.

>> And I think we can facilitate that without too much work.
> just provide a to-csv or to-json plugin and the crappy perl guys
> are happy.

Yep, that's exactly what I'm advocating for.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 9:47 AM, Andres Freund  wrote:
> On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
>> If we go for CSV I think we should put the entire primary key as one
>> column (containing all the columns) and the entire row another.
>
> What about columns like:
> * action B|I|U|D|C

BEGIN and COMMIT?

> * xid
> * timestamp
>
> * tablename
>
> * key name
> * key column names
> * key column types
>
> * new key column values
>
> * column names
> * column types
> * column values
>
> * candidate_key_changed?
> * old key column values

Repeating the column names for every row strikes me as a nonstarter.
If the plugin interface isn't rich enough to provide a convenient way
to avoid that, then it needs to be fixed so that it is, because it
will be a common requirement.  Sure, some people may want JSON or XML
output that reiterates the labels every time, but for a lot of people
that's going to greatly increase the size of the output and be
undesirable for that reason.

> What still needs to be determined is:
> * how do we separate and escape multiple values in one CSV column
> * how do we represent NULLs

I consider the escaping a key design decision.  Ideally, it should be
something that's easy to reverse from a scripting language; ideally
also, it should be something similar to how we handle COPY.  These
goals may be in conflict; we'll have to pick something.
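
For reference, COPY's default text format already deals with a very similar
problem: fields are tab-separated, NULL is spelled \N, and a tab, newline or
backslash embedded in a value is escaped as \t, \n or \\, so a one-line
split-and-unescape in any scripting language gets the original data back.
A couple of made-up sample rows (with <TAB> standing in for a literal tab):

1<TAB>alice<TAB>\N
2<TAB>line one\nline two<TAB>\\server\share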

I'm not sure that having multiple values in one column is a good plan,
because now you need multiple levels of parsing to unpack the row.
I'd rather just have a flat column list with a key somewhere
explaining how to interpret the data.  But I'm prepared to give in on
that point so long as we can demonstrate that the format can be easily
parsed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Hannu Krosing
On 10/15/2013 02:47 PM, Andres Freund wrote:
> On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
>> If we go for CSV I think we should put the entire primary key as one
>> column (containing all the columns) and the entire row another.
just use JSON :)
>> What about columns like:
>> * action B|I|U|D|C
>>
>> * xid
>> * timestamp
>>
>> * tablename
>>
>> * key name
>> * key column names
>> * key column types
>>
>> * new key column values
>>
>> * column names
>> * column types
>> * column values
>>
>> * candidate_key_changed?
>> * old key column values
>>
>> And have output plugin options
>> * include-column-types
>> * include-column-names
>> * include-primary-key
>>
>> If something isn't included it's simply left out.
>>
>> What still needs to be determined is:
>> * how do we separate and escape multiple values in one CSV column
>> * how do we represent NULLs

or borrow whatever is possible from pg_dump, as it has
needed to solve most of the same problems already, and
consistency is good in general
>
> Greetings,
>
> Andres Freund
>





Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Hannu Krosing
On 10/15/2013 01:42 PM, Robert Haas wrote:
> On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund  wrote:
>>> Well, I just think relying on specific symbol names in the .so file is
>>> kind of unfortunate.  It means that, for example, you can't have
>>> multiple output plugins provided by a single .so.  And in general I
>>> think it's something that we've tried to minimize.
>> But that's not really different when you rely on _PG_init doing it's
>> thing, right?
> Sure, that's true.  But in general I think magic symbol names aren't a
> particularly good design.
>
>>> But there's only so much information available here.  Why not just
>>> have a format that logs it all?
>> Because we do not know what "all" is? Also, how would we handle
>> replication sets and such that all of the existing replication solutions
>> have generically?
> I don't see how you can fail to know what "all" is.  
We instinctively know what "all" is - as in the famous case of the Buddhist
ordering a hamburger - "Make me All wit Everything" :) - but the
requirements of different replication systems vary wildly.

> ...
> What people
> are going to want is the operation performed (insert, update, or
> delete), all the values in the new tuple, the key values from the old
> tuple, 
For multi-master / conflict resolution you may also want all old
values to make sure that they have not changed on target.

the difference in WAL volume can be really significant, especially
in the case of DELETE, where there are no new columns.

for some forms of conflict resolution we may even want to know
the database user who initiated the operation, and possibly even
some session variables like "very_important=yes".

> ... The point, for me anyway, is that someone can write a
> crappy Perl script to apply changes from a file like this in a day.
> My contention is that there are a lot of people who will want to do
> just that, for one reason or another.  The plugin interface has
> awesome power and flexibility, and really high-performance replication
> solutions will really benefit from that.  But regular people don't
> want to write C code; they just want to write a crappy Perl script.
> And I think we can facilitate that without too much work.
just provide a to-csv or to-json plugin and the crappy perl guys
are happy.





Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Tue, Oct 15, 2013 at 9:17 AM, Andres Freund  wrote:
> It allows you to use the shared library both as a normal extension loaded
> via shared_preload_libraries or ad hoc and as an output plugin, which seems
> like a sensible goal.
> We could have a single _PG_init_output_plugin() symbol that fills in
> such a struct which would then not conflict with using the .so
> independently. If you prefer that I'll change things around.

I think part of the problem may be that you're using the library name
to identify the output plugin.  I'm not excited about that design.
For functions, you give the function a name and that is a pointer to
where to actually find the function, which may be a 2-tuple
<library name, function name>, or perhaps just a 1-tuple
<function name>, or maybe the whole text of a PL/pgsql
procedure that should be compiled.

Perhaps this ought to work similarly.  Create a function in pg_proc
which returns the structure containing the function pointers.  Then,
when that output plugin is selected, it'll automatically trigger
loading the correct shared library if that's needed; and the shared
library name may (but need not) match the output plugin name.

>> What I'd probably do is
>> emit the data in CSV format, with the first column of each line being
>> a single character indicating what sort of row this is: H means a
>> header row, defining the format of subsequent rows
>> (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk;
>> a new header row is emitted only when the column list changes); I, U,
>> or D means an insert, update, or delete, with column 2 being the
>> transaction ID, column 3 being the table name, and the remaining
>> columns matching the last header row emitted for that table, T
>> means meta-information about a transaction, whatever we have (e.g.
>> T,txn_id,commit_time).
>
> There's two issues I have with this:
> a) CSV seems like a bad format for this. If a transaction inserts into
> multiple tables the number of columns will constantly change. Many CSV
> parsers don't deal with that all too gracefully. E.g. you can't even
> load the data into another postgres database as an audit log.

We can pick some other separator.  I don't think ragged CSV is a big
problem; I'm actually more worried about having an easy way to handle
embedded commas and newlines and so on.  But I'd be fine with
tab-separated data or something too, if you think that's better.  What
I want is something that someone can parse with a script that can be
written in a reasonable amount of time in their favorite scripting
language.  I predict that if we provide something like this we'll
vastly expand the number of users who can make use of this new
functionality.

User: So, what's new in PostgreSQL 9.4?
Hacker: Well, now we have logical replication!
User: Why is that cool?
Hacker: Well, streaming replication is awesome for HA, but it has
significant limitations.  And trigger-based systems are very mature,
but the overhead is high and their lack of core integration makes them
hard to use.  With this technology, you can build systems that will
replicate individual tables or even parts of tables, multi-master
systems, and lots of other cool stuff.
User: Wow, that sounds great.  How do I use it?
Hacker: Well, first you write an output plugin in C using a special API.
User: Hey, do you know whether the MongoDB guys came to this conference?

Let's try that again.

User: Wow, that sounds great.  How do I use it?
Hacker: Well, currently, the output gets dumped as a series of text
files that are designed to be parsed using a scripting language.  We
have sample parsers written in Perl and Python that you can use as-is
or hack up to meet your needs.

Now, some users are still going to head for the hills.  But at least
from where I sit it sounds a hell of a lot better than the first
answer.  We're not going to solve all of the tooling problems around
this technology in one release, for sure.  But as far as 95% of our
users are concerned, a C API might as well not exist at all.  People
WILL try to machine parse the output of whatever demo plugins we
provide; so I think we should try hard to provide at least one such
plugin that is designed to make that as easy as possible.

> If we go for CSV I think we should put the entire primary key as one
> column (containing all the columns) and the entire row another.
>
> We also don't have any nice facilities for actually writing CSV - so
> we'll need to start extracting escaping code from COPY. In the end all
> that will make the output plugin very hard to use as an example because
> the code will get more complicated.
>
> b) Emitting new row descriptors every time the schema changes will
> require keeping track of the schema. I think that won't be trivial. It
> also makes consumption of the data more complicated in comparison to
> including the description with every row.
>
> Both are even more true once we extend the format to support streaming
> of transactions while they are performed.

All fair points.

Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Stephen Frost
Robert,

* Robert Haas (robertmh...@gmail.com) wrote:
> /me pokes head up.  I know I'm going to annoy people with this
> comment, but I feel like it's going to have to be made at some point

Perhaps some folks will be annoyed- I'm not annoyed, but I don't
really agree. :)

> by somebody, so here goes: I don't see the point of this feature.  If
> you want a trigger on a table, why not set it on the remote side?  A
> trigger on the foreign table won't be enforced consistently; it'll
> only work when the update is routed through the foreign table, not
> when people access the underlying table on the remote side through any
> other mechanism.  The number of useful things you can do this way
> seems fairly small.  Perhaps you could use a row-level trigger for
> RLS, to allow only certain rows on the foreign side to be updated, but
> even that seems like a slightly strange design: generally it'd be
> better to enforce the security as close to the target object as
> possible.

I can certainly see use-cases for this, a very simple one being a way to
keep track of what's been updated/inserted/whatever through this
particular foreign table (essentially, an auditing table).  The *remote*
side might not be ideal for tracking that information and you might want
the info locally and remotely anyway.

> There's another issue that concerns me here also: performance.  IIUC,
> an update of N tuples on the remote side currently requires N+1 server
> round-trips.  That is unspeakably awful, and we really oughta be
> looking for ways to make that number go down, by pushing the whole
> update to the remote side.  But obviously that won't be possible if
> there's a per-row trigger that has to be evaluated on the local side.
> Now, assuming somebody comes up with code that implements that
> optimization, we can just disable it when there are local-side
> triggers.  But, then you're back to having terrible performance.  So
> even if the use case for this seemed really broad, I tend to think
> performance concerns would sink most of the possible real-world uses.

Performance, while a concern, should probably be secondary when there
are valid use-cases for this where the performance wouldn't be a problem
for users.

Thanks,

Stephen




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 15:17:58 +0200, Andres Freund wrote:
> If we go for CSV I think we should put the entire primary key as one
> column (containing all the columns) and the entire row another.

What about columns like:
* action B|I|U|D|C

* xid
* timestamp

* tablename

* key name
* key column names
* key column types

* new key column values

* column names
* column types
* column values

* candidate_key_changed?
* old key column values

And have output plugin options
* include-column-types
* include-column-names
* include-primary-key

If something isn't included it's simply left out.

What still needs to be determined is:
* how do we separate and escape multiple values in one CSV column
* how do we represent NULLs
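
Purely to make the shape concrete - the values, the intra-column separator
and the NULL spelling below are invented, since those are exactly the open
questions - a key-changing UPDATE might come out roughly as:

U,816,2013-10-15 14:21:07,public.accounts,accounts_pkey,aid,int4,18,"aid;abalance","int4;int4","18;1250",t,17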

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Triggers on foreign tables

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 5:24 PM, Kohei KaiGai  wrote:
>>> And, I also want some comments from committers, not only from mine.
>>
>> +1
>>
> +1

/me pokes head up.  I know I'm going to annoy people with this
comment, but I feel like it's going to have to be made at some point
by somebody, so here goes: I don't see the point of this feature.  If
you want a trigger on a table, why not set it on the remote side?  A
trigger on the foreign table won't be enforced consistently; it'll
only work when the update is routed through the foreign table, not
when people access the underlying table on the remote side through any
other mechanism.  The number of useful things you can do this way
seems fairly small.  Perhaps you could use a row-level trigger for
RLS, to allow only certain rows on the foreign side to be updated, but
even that seems like a slightly strange design: generally it'd be
better to enforce the security as close to the target object as
possible.

There's another issue that concerns me here also: performance.  IIUC,
an update of N tuples on the remote side currently requires N+1 server
round-trips.  That is unspeakably awful, and we really oughta be
looking for ways to make that number go down, by pushing the whole
update to the remote side.  But obviously that won't be possible if
there's a per-row trigger that has to be evaluated on the local side.
Now, assuming somebody comes up with code that implements that
optimization, we can just disable it when there are local-side
triggers.  But, then you're back to having terrible performance.  So
even if the use case for this seemed really broad, I tend to think
performance concerns would sink most of the possible real-world uses.

I could, of course, be all wet.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 08:49:26 -0400, Robert Haas wrote:
> On Mon, Oct 14, 2013 at 5:07 PM, Andres Freund  wrote:
> > So, see the attached benchmark script. I've always done a disk-bound
> > and a memory-bound (using eatmydata, preventing fsyncs) run.
> >
> > * unpatched run, wal_level = hot_standby, eatmydata
> > * unpatched run, wal_level = hot_standby
> >
> > * patched run, wal_level = hot_standby, eatmydata
> > * patched run, wal_level = hot_standby
> >
> > * patched run, wal_level = logical, eatmydata
> > * patched run, wal_level = logical
> >
> > Based on those results, there's no difference above noise for
> > wal_level=hot_standby, with or without the patch. With wal_level=logical
> > there's a measurable increase in wal traffic (~12-17%), but no
> > performance decrease above noise.
> >
> > From my POV that's ok, those are really crazy catalog workloads.
> 
> Any increase in WAL traffic will translate into a performance hit once
> the I/O channel becomes saturated, but I agree those numbers don't
> sound terrible for that fairly brutal test case.

Well, the parallel workloads were fsync-saturated, although probably not
throughput-saturated; that's why I added them. But yes, it's not the same as
a throughput-saturated IO channel.
Probably the worst case real-world workload is one that uses lots and
lots of ON COMMIT DROP temporary tables.

> Actually, I was more concerned about the hit on non-catalog workloads.  
> pgbench isn't a
> good test because the key column is so narrow; but suppose we have a
> table like (a text, b integer, c text) where (a, c) is the primary key
> and those strings are typically pretty long - say just short enough
> that we can still index the column.  It'd be worth testing both
> workloads where the primary key doesn't change (so the only overhead
> is figuring out that we need not log it) and those where it does
> (where we're double-logging most of the tuple).  I assume the latter
> has to produce a significant hit to WAL volume, and I don't think
> there's much we can do about that; but the former had better be nearly
> free.

Ah, ok. Then I misunderstood you.

Is there a specific overhead you are "afraid" of in the
pkey-doesn't-change scenario? The changed wal logging (buffer in a
separate rdata entry) or the check whether the primary key has changed?

The only way I have been able to measure differences in that scenario
was to load a table with a low fillfactor and wide tuples, checkpoint,
and then update lots of rows. On wal_level=logical that will result in
full-page-images and tuple data being logged which can be noticeable if
you have really large tuples, even if the pkey doesn't change.

We could optimize that by not actually logging the tuple data in that
case but just including the tid, so we could extract things from the Bkp
block ourselves. But that will complicate the code and doesn't yet seem
warranted.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Andres Freund
On 2013-10-15 08:42:20 -0400, Robert Haas wrote:
> On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund  wrote:
> >> Well, I just think relying on specific symbol names in the .so file is
> >> kind of unfortunate.  It means that, for example, you can't have
> >> multiple output plugins provided by a single .so.  And in general I
> >> think it's something that we've tried to minimize.
> >
> > But that's not really different when you rely on _PG_init doing it's
> > thing, right?
> 
> Sure, that's true.  But in general I think magic symbol names aren't a
> particularly good design.

It allows you to use the shared library both as a normal extension loaded
via shared_preload_libraries or ad hoc and as an output plugin, which seems
like a sensible goal.
We could have a single _PG_init_output_plugin() symbol that fills in
such a struct which would then not conflict with using the .so
independently. If you prefer that I'll change things around.

We can't do something like 'output_plugin_in_progress' before calling
_PG_init() because _PG_init() won't be called again if the shared object
is already loaded...
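
A minimal sketch of what that could look like - the struct name and members
are invented here purely for illustration, nothing about this is settled API:

/* hypothetical callback struct; the real field list is up for discussion */
typedef struct OutputPluginCallbacks
{
    void    (*begin_txn_cb) (void *ctx);
    void    (*change_cb) (void *ctx);       /* one insert/update/delete */
    void    (*commit_txn_cb) (void *ctx);
} OutputPluginCallbacks;

static void my_begin(void *ctx)  { /* ... */ }
static void my_change(void *ctx) { /* ... */ }
static void my_commit(void *ctx) { /* ... */ }

/*
 * The single well-known symbol the walsender would look up; _PG_init()
 * stays free for whatever the .so does when loaded as a plain extension.
 */
void
_PG_init_output_plugin(OutputPluginCallbacks *cb)
{
    cb->begin_txn_cb = my_begin;
    cb->change_cb = my_change;
    cb->commit_txn_cb = my_commit;
}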

> >> But there's only so much information available here.  Why not just
> >> have a format that logs it all?
> >
> > Because we do not know what "all" is? Also, how would we handle
> > replication sets and such that all of the existing replication solutions
> > have generically?
> 
> I don't see how you can fail to know what "all" is.  There's only a
> certain set of facts available.  I mean you could log irrelevant crap
> like a random number that you just picked or the sum of all numeric
> values in the column, but nobody's likely to want that.  What people
> are going to want is the operation performed (insert, update, or
> delete), all the values in the new tuple, the key values from the old
> tuple, the transaction ID, and maybe some meta-information about the
> transaction (such as the commit timestamp).

Some will want all column names included because that makes replication
into different schemas/databases easier, others won't because it makes
replicating the data more complicated and expensive.
Lots will want the primary key as a separate set of columns even for
inserts, others not.
There's also the question of datatypes for the values and how to represent NULLs.

> What I'd probably do is
> emit the data in CSV format, with the first column of each line being
> a single character indicating what sort of row this is: H means a
> header row, defining the format of subsequent rows
> (H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk;
> a new header row is emitted only when the column list changes); I, U,
> or D means an insert, update, or delete, with column 2 being the
> transaction ID, column 3 being the table name, and the remaining
> columns matching the last header row emitted for that table, T
> means meta-information about a transaction, whatever we have (e.g.
> T,txn_id,commit_time).

There's two issues I have with this:
a) CSV seems like a bad format for this. If a transaction inserts into
multiple tables the number of columns will constantly change. Many CSV
parsers don't deal with that all too gracefully. E.g. you can't even
load the data into another postgres database as an audit log.

If we go for CSV I think we should put the entire primary key as one
column (containing all the columns) and the entire row another.

We also don't have any nice facilities for actually writing CSV - so
we'll need to start extracting escaping code from COPY. In the end all
that will make the output plugin very hard to use as an example because
the code will get more complicated.

b) Emitting new row descriptors every time the schema changes will
require keeping track of the schema. I think that won't be trivial. It
also makes consumption of the data more complicated in comparison to
including the description with every row.

Both are even more true once we extend the format to support streaming
of transactions while they are performed.

> But regular people don't want to write C code; they just want to write
> a crappy Perl script.  And I think we can facilitate that without too
> much work.

I think the generic output plugin should be a separate one from the
example one (which is the one included in the patchset).

> >> Oh, yuck.  So that means you have to write an extra WAL record for
> >> EVERY heap insert, update, or delete to a catalog table?  OUCH.
> >
> > Yes. We could integrate it into the main record without too many
> > problems, but it didn't seem like an important optimization and it would
> > have higher chances of slowing down wal_level < logical.
> 
> Hmm.  I don't know whether that's an important optimization or not.

Based on the benchmark I'd say no. If we discover we need to go there we
can do so later. I don't foresee this being really problematic.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: [HACKERS] Compression of full-page-writes

2013-10-15 Thread k...@rice.edu
On Tue, Oct 15, 2013 at 03:11:22PM +0900, KONDO Mitsumasa wrote:
> (2013/10/15 13:33), Amit Kapila wrote:
> >Snappy is good mainly for un-compressible data, see the link below:
> >http://www.postgresql.org/message-id/CAAZKuFZCOCHsswQM60ioDO_hk12tA7OG3YcJA8v=4yebmoa...@mail.gmail.com
> This result was obtained on an ARM architecture, which is not a typical CPU.
> Please see the detailed document.
> http://www.reddit.com/r/programming/comments/1aim6s/lz4_extremely_fast_compression_algorithm/c8y0ew9
> 
> I found a compression algorithm test for HBase. I haven't read the details,
> but it indicates the snappy algorithm gets the best performance.
> http://blog.erdemagaoglu.com/post/4605524309/lzo-vs-snappy-vs-lzf-vs-zlib-a-comparison-of
> 
> In fact, most modern NoSQL storage systems use snappy, because it has
> good performance and a good license (BSD).
> 
> >I think it is a bit difficult to prove that any one algorithm is best
> >for all kinds of loads.
> I think it is better to make a best effort as a community than for me
> to make the best choice with a strict test.
> 
> Regards,
> -- 
> Mitsumasa KONDO
> NTT Open Source Software Center
> 

lz4 is also a very nice algorithm, with 33% better compression
performance than snappy and 2x the decompression performance in some
benchmarks, and it also has a BSD license:

https://code.google.com/p/lz4/

Regards,
Ken




Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 2:55 PM, Robert Haas  wrote:
> On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane  wrote:
>>> Well, that sucks.  So it's a Windows bug.
>>
>> It's not clear to me that we should do anything about this at all,
>> except perhaps document that people should avoid long tablespace
>> path names on an unknown set of Windows versions.  We should not
>> be in the business of working around any and every bug coming out
>> of Redmond.
>
> It's sort of incomprehensible to me that Microsoft has a bug like this
> and apparently hasn't fixed it.  But I think I still favor trying to
> work around it.  When people try to use a long data directory name and
> it freezes the system, some of them will blame us rather than
> Microsoft.  We've certainly gone to considerable lengths to work
> around extremely strange bugs in various compiler toolchains, even
> relatively obscure ones.  I don't particularly see why we shouldn't do
> the same here.

I agree we'll probably want to work around it in the end, but I still
think it should be put to Microsoft PSS if we can. The usual - have we
actually produced a self-contained example that does just this (and
doesn't include the full postgres support) and submitted it to
*microsoft* for comments? Not talking about their end user forums, but
the actual microsoft support services? (AFAIK at least EDB, and
probably other pg companies as well, have agreements with MS that lets
you get access to their "real" support. I know I used to have it at my
last job, and used it a number of times during the initial porting
work. The people backing that one are generally pretty good)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 1:30 PM, Tom Lane  wrote:
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all,
> except perhaps document that people should avoid long tablespace
> path names on an unknown set of Windows versions.  We should not
> be in the business of working around any and every bug coming out
> of Redmond.

It's sort of incomprehensible to me that Microsoft has a bug like this
and apparently hasn't fixed it.  But I think I still favor trying to
work around it.  When people try to use a long data directory name and
it freezes the system, some of them will blame us rather than
Microsoft.  We've certainly gone to considerable lengths to work
around extremely strange bugs in various compiler toolchains, even
relatively obscure ones.  I don't particularly see why we shouldn't do
the same here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Magnus Hagander
On Tue, Oct 15, 2013 at 2:47 PM, MauMau  wrote:
> From: "Dimitri Fontaine" 
>
>> The reason why that parameter default has changed from 5 to 0 is that
>> some people would mistakenly use a prepared transaction without a
>> transaction manager. So few people are actually using a transaction
>> manager that it's better to have them have to configure it in PostgreSQL.
>
>
> I guess this problem is not unique to PostgreSQL.  I think PostgreSQL can be
> more friendly for normal users (who use an external transaction manager), and
> does not need to be too conservative because of people who do irregular
> things.

I would say *using* an external transaction manager *is* the irregular
thing. The current default *is* friendly for normal users, for example
see the comments from Andres about what happens if you make a mistake.
So I definitely agree with your sentiment that we should be more
friendly for normal users - but in this case we are.

If I look through all the customers I've worked with, only a handful
have actually used a transaction manager. And of those, at least half
of them were using it even though they didn't need it, because they
didn't know what it was.

But the argument about being friendly for new users should definitely
have us change wal_level and max_wal_senders.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 5:07 PM, Andres Freund  wrote:
> So, see the attached benchmark script. I've always done a disk-bound
> and a memory-bound (using eatmydata, preventing fsyncs) run.
>
> * unpatched run, wal_level = hot_standby, eatmydata
> * unpatched run, wal_level = hot_standby
>
> * patched run, wal_level = hot_standby, eatmydata
> * patched run, wal_level = hot_standby
>
> * patched run, wal_level = logical, eatmydata
> * patched run, wal_level = logical
>
> Based on those results, there's no difference above noise for
> wal_level=hot_standby, with or without the patch. With wal_level=logical
> there's a measurable increase in wal traffic (~12-17%), but no
> performance decrease above noise.
>
> From my POV that's ok, those are really crazy catalog workloads.

Any increase in WAL traffic will translate into a performance hit once
the I/O channel becomes saturated, but I agree those numbers don't
sound terrible for that fairly brutal test case. Actually, I was more
concerned about the hit on non-catalog workloads.  pgbench isn't a
good test because the key column is so narrow; but suppose we have a
table like (a text, b integer, c text) where (a, c) is the primary key
and those strings are typically pretty long - say just short enough
that we can still index the column.  It'd be worth testing both
workloads where the primary key doesn't change (so the only overhead
is figuring out that we need not log it) and those where it does
(where we're double-logging most of the tuple).  I assume the latter
has to produce a significant hit to WAL volume, and I don't think
there's much we can do about that; but the former had better be nearly
free.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread MauMau

From: "Dimitri Fontaine" 

The reason why that parameter default has changed from 5 to 0 is that
some people would mistakenly use a prepared transaction without a
transaction manager. So few people are actually using a transaction
manager that it's better to have them have to configure it in PostgreSQL.


I guess this problem is not unique to PostgreSQL.  I think PostgreSQL can be 
more friendly for normal users (who use an external transaction manager), and
does not need to be too conservative because of people who do irregular 
things.


Regards
MauMau





Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread Andres Freund
On 2013-10-15 21:41:18 +0900, MauMau wrote:
> Likewise, non-zero max_prepared_transactions would improve the
> impression of PostgreSQL (for a limited number of users, though), and it
> wouldn't do any harm.

I've seen several sites shutting down because of forgotten prepared
transactions causing bloat and anti-wraparound shutdowns.

A big, big -1 for changing that default.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] logical changeset generation v6.2

2013-10-15 Thread Robert Haas
On Mon, Oct 14, 2013 at 9:51 AM, Andres Freund  wrote:
>> Well, I just think relying on specific symbol names in the .so file is
>> kind of unfortunate.  It means that, for example, you can't have
>> multiple output plugins provided by a single .so.  And in general I
>> think it's something that we've tried to minimize.
>
> But that's not really different when you rely on _PG_init doing it's
> thing, right?

Sure, that's true.  But in general I think magic symbol names aren't a
particularly good design.

>> But there's only so much information available here.  Why not just
>> have a format that logs it all?
>
> Because we do not know what "all" is? Also, how would we handle
> replication sets and such that all of the existing replication solutions
> have generically?

I don't see how you can fail to know what "all" is.  There's only a
certain set of facts available.  I mean you could log irrelevant crap
like a random number that you just picked or the sum of all numeric
values in the column, but nobody's likely to want that.  What people
are going to want is the operation performed (insert, update, or
delete), all the values in the new tuple, the key values from the old
tuple, the transaction ID, and maybe some meta-information about the
transaction (such as the commit timestamp).  What I'd probably do is
emit the data in CSV format, with the first column of each line being
a single character indicating what sort of row this is: H means a
header row, defining the format of subsequent rows
(H,table_name,new_column1,...,new_columnj,old_key_column1,...,old_key_columnk;
a new header row is emitted only when the column list changes); I, U,
or D means an insert, update, or delete, with column 2 being the
transaction ID, column 3 being the table name, and the remaining
columns matching the last header row emitted for that table, T
means meta-information about a transaction, whatever we have (e.g.
T,txn_id,commit_time).  There's probably some further tweaking of that
that could be done, and I might be overlooking some salient details,
like maybe we want to indicate the column types as well as their
names, but the range of things that someone can want to do here is not
unlimited.  The point, for me anyway, is that someone can write a
crappy Perl script to apply changes from a file like this in a day.
My contention is that there are a lot of people who will want to do
just that, for one reason or another.  The plugin interface has
awesome power and flexibility, and really high-performance replication
solutions will really benefit from that.  But regular people don't
want to write C code; they just want to write a crappy Perl script.
And I think we can facilitate that without too much work.
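
To illustrate (the table and the values are made up, and the exact column
set is of course still up for grabs), a short stretch of such a file might
look like:

H,users,id,email,id
I,815,users,1,alice@example.com,
U,816,users,1,alice@example.org,1
D,817,users,,,1
T,817,2013-10-15 14:21:07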

>> Oh, yuck.  So that means you have to write an extra WAL record for
>> EVERY heap insert, update, or delete to a catalog table?  OUCH.
>
> Yes. We could integrate it into the main record without too many
> problems, but it didn't seem like an important optimization and it would
> have higher chances of slowing down wal_level < logical.

Hmm.  I don't know whether that's an important optimization or not.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Auto-tuning work_mem and maintenance_work_mem

2013-10-15 Thread MauMau

From: "Magnus Hagander" 

On Oct 12, 2013 2:13 AM, "MauMau"  wrote:

I'm not sure if many use XA features, but I saw the questions and answer
a few times, IIRC.  In the trouble situation, PostgreSQL outputs an
intuitive message like "increase max_prepared_transactions", so many users
might possibly have been able to change the setting and solve the problem
themselves without asking for help, feeling stress like "Why do I have to
set this?"  For example, max_prepared_transactions is called "hideous
creature" in the following page:


https://community.jboss.org/wiki/InstallPostgreSQLOnFedora?_sscc=t


Anybody who follows that page is screwed anyway. I notice they recommend
running regular VACUUM FULL across the whole database, so it's obvious they
know nothing about postgresql. There's nothing we can do about what people
write on random pages around the Internet.


Regular VACUUM FULL is certainly overkill.  Apart from that, having to set 
max_prepared_transactions seems to make PostgreSQL difficult for people with 
that level of knowledge, doesn't it?  I wonder if there are other major 
DBMSs which require parameter configuration and server restart to use 
distributed transactions.
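
For anyone following along, the change under discussion is a one-line
postgresql.conf edit plus a restart; the value below simply mirrors the
max_connections default and is only an illustration (any non-zero value
enables PREPARE TRANSACTION):

# postgresql.conf
max_prepared_transactions = 100   # default is 0; changing it needs a restart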





According to the below page, the amount of memory consumed for this is

"(770 + 270 * max_locks_per_transaction) * max_prepared_transactions".
With the default setting of max_connections=100 and
max_locks_per_transaction=64, this is only 180KB.  So the overhead is
negligible.

You are assuming memory is the only overhead. I don't think it is.


Having a quick look at the source code, just setting 
max_prepared_transactions to non-zero seems to produce almost no processing 
overhead.



If the goal is to make PostgreSQL more friendly and run smoothly without
frustration from the start and not perfect tuning, I think
max_prepared_transactions=max_connections is an easy and good item.  If the
goal is limited to auto-tuning memory sizes, this improvement can be
treated separately.

Frankly, I think we'd help 1000 times more users if we enabled a few wal
senders by default and jumped the wal level. Mainly so they could run a
one-off base backup. That's used by orders of magnitude more users than XA.


Agreed.  The default of non-zero max_wal_senders and wal_level > 'archive' 
would be beneficial for more users.  Likewise, non-zero 
max_prepared_transactions would improve the impression of PostgreSQL (for a 
limited number of users, though), and it wouldn't do any harm.


Regards
MauMau





Re: [HACKERS] removing old ports and architectures

2013-10-15 Thread Andres Freund
On 2013-10-13 16:56:12 +0200, Tom Lane wrote:
> More to the point for this specific case, it seems like our process
> ought to be
> (1) select a preferably-small set of gcc atomic intrinsics that we
> want to use.

I suggest:
* uint32 pg_atomic_load_u32(uint32 *ptr)
* void pg_atomic_store_u32(uint32 *ptr, uint32 val)
* uint32 pg_atomic_exchange_u32(uint32 *ptr, uint32 val)
* bool pg_atomic_compare_exchange_u32(uint32 *ptr, uint32 *expected, uint32 newval)
* uint32 pg_atomic_fetch_add_u32(uint32 *ptr, uint32 add)
* uint32 pg_atomic_fetch_sub_u32(uint32 *ptr, uint32 sub)
* uint32 pg_atomic_fetch_and_u32(uint32 *ptr, uint32 mask)
* uint32 pg_atomic_fetch_or_u32(uint32 *ptr, uint32 mask)
* u64 variants of the above
* bool pg_atomic_test_set(void *ptr)
* void pg_atomic_clear(void *ptr)

On top of that we can generically implement:
* pg_atomic_add_until_u32(uint32 *ptr, uint32 val, uint32 limit)
* pg_atomic_(add|sub|and|or)_fetch_u32()
* u64 variants of the above

We might also want to provide a generic implementation of the math
operations based on pg_atomic_compare_exchange() to make it easier to
bring up a new architecture.
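
A rough sketch of that generic fallback, assuming (as with C11 / GCC
intrinsics) that a failed compare-exchange writes the currently observed
value back into *expected; none of these names are committed API:

static inline uint32
pg_atomic_fetch_add_u32(uint32 *ptr, uint32 add)
{
    uint32      old = pg_atomic_load_u32(ptr);

    /* retry until we install old + add without the value changing under us */
    while (!pg_atomic_compare_exchange_u32(ptr, &old, old + add))
    {
        /* on failure "old" has been refreshed; just loop */
    }

    return old;
}

The same loop shape would cover fetch_sub/and/or and add_until, which is why
only compare-exchange (and load) really has to be provided per architecture.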

I think we should leave 64bit support optional for now.

Opinions?

Greetings,

Andres Freund

--
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] INSERT...ON DUPLICATE KEY LOCK FOR UPDATE

2013-10-15 Thread Robert Haas
On Fri, Oct 11, 2013 at 2:30 PM, Peter Geoghegan  wrote:
>>> But that's simpler than any of the alternatives that I see.
>>> Does there really need to be a new snapshot type with one tiny
>>> difference that apparently doesn't actually affect conventional
>>> clients of MVCC snapshots?
>>
>> I think that's the wrong way of thinking about it.  If you're
>> introducing a new type of snapshot, or tinkering with the semantics of
>> an existing one, I think that's a reason to reject the patch straight
>> off.  We should be looking for a design that doesn't require that.  If
>> we can't find one, I'm not sure we should do this at all.
>
> I'm confused by this. We need to lock a row not visible to our
> snapshot under conventional rules. I think we can rule out
> serialization failures at read committed. That just leaves changing
> something about the visibility rules of an existing snapshot type, or
> creating a new snapshot type, no?
>
> It would also be unacceptable to update a tuple, and not have the new
> row version (which of course will still have "information from the
> future") visible to our snapshot - what would regular RETURNING
> return? So what do you have in mind? I don't think that locking a row
> and updating it are really that distinct anyway. The benefit of
> locking is that we don't have to update. We can delete, for example.

Well, the SQL standard way of doing this type of operation is MERGE.
The alternative we know exists in other databases is REPLACE; there's
also INSERT .. ON DUPLICATE KEY update.  In all of those cases,
whatever weirdness exists around MVCC is confined to that one command.
 I tend to think we should do similarly, with the goal that
HeapTupleSatisfiesMVCC need not change at all.

I don't have the only vote here, of course, but my feeling is that
that's more likely to be a good route.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] [PATCH] Statistics collection for CLUSTER command

2013-10-15 Thread Vik Fearing
On 09/16/2013 08:26 AM, Satoshi Nagayasu wrote:
> (2013/08/08 20:52), Vik Fearing wrote:
>> As part of routine maintenance monitoring, it is interesting for us to
>> have statistics on the CLUSTER command (timestamp of last run, and
>> number of runs since stat reset) like we have for (auto)ANALYZE and
>> (auto)VACUUM.  Patch against today's HEAD attached.
>>
>> I would add this to the next commitfest but I seem to be unable to log
>> in with my community account (I can log in to the wiki).  Help
>> appreciated.
>
> I have reviewed the patch.

Thank you for your review.

> Succeeded to build with the latest HEAD, and passed the regression
> tests.
>
> Looks good enough, and I'd like to add a test case here, not only
> for the view definition, but also to check that it works correctly.
>
> Please take a look at attached one.

Looks good to me.  Attached is a rebased patch with those tests added.

-- 
Vik

*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
***
*** 979,984  postgres: user database host 
  
  
+  last_cluster
+  timestamp with time zone
+  Last time at which CLUSTER was issued on this table
+ 
+ 
   vacuum_count
   bigint
   Number of times this table has been manually vacuumed
***
*** 1001,1006  postgres: user database host Number of times this table has been analyzed by the autovacuum
daemon
  
+ 
+  cluster_count
+  bigint
+  Number of times CLUSTER has been issued on this table
+ 
 
 

*** a/src/backend/catalog/system_views.sql
--- b/src/backend/catalog/system_views.sql
***
*** 410,419  CREATE VIEW pg_stat_all_tables AS
  pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
  pg_stat_get_last_analyze_time(C.oid) as last_analyze,
  pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
  pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
  pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
  pg_stat_get_analyze_count(C.oid) AS analyze_count,
! pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count
  FROM pg_class C LEFT JOIN
   pg_index I ON C.oid = I.indrelid
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
--- 410,421 
  pg_stat_get_last_autovacuum_time(C.oid) as last_autovacuum,
  pg_stat_get_last_analyze_time(C.oid) as last_analyze,
  pg_stat_get_last_autoanalyze_time(C.oid) as last_autoanalyze,
+ pg_stat_get_last_cluster_time(C.oid) as last_cluster,
  pg_stat_get_vacuum_count(C.oid) AS vacuum_count,
  pg_stat_get_autovacuum_count(C.oid) AS autovacuum_count,
  pg_stat_get_analyze_count(C.oid) AS analyze_count,
! pg_stat_get_autoanalyze_count(C.oid) AS autoanalyze_count,
! pg_stat_get_cluster_count(C.oid) AS cluster_count
  FROM pg_class C LEFT JOIN
   pg_index I ON C.oid = I.indrelid
   LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
*** a/src/backend/commands/cluster.c
--- b/src/backend/commands/cluster.c
***
*** 35,40 
--- 35,41 
  #include "commands/vacuum.h"
  #include "miscadmin.h"
  #include "optimizer/planner.h"
+ #include "pgstat.h"
  #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/predicate.h"
***
*** 407,412  cluster_rel(Oid tableOid, Oid indexOid, bool recheck, bool verbose,
--- 408,417 
  	 verbose);
  
  	/* NB: rebuild_relation does heap_close() on OldHeap */
+ 
+ 	/* Report CLUSTER to the stats collector, but not VACUUM FULL */
+ 	if (indexOid != InvalidOid)
+ 		pgstat_report_cluster(OldHeap);
  }
  
  /*
*** a/src/backend/postmaster/pgstat.c
--- b/src/backend/postmaster/pgstat.c
***
*** 292,297  static void pgstat_recv_resetsinglecounter(PgStat_MsgResetsinglecounter *msg, in
--- 292,298 
  static void pgstat_recv_autovac(PgStat_MsgAutovacStart *msg, int len);
  static void pgstat_recv_vacuum(PgStat_MsgVacuum *msg, int len);
  static void pgstat_recv_analyze(PgStat_MsgAnalyze *msg, int len);
+ static void pgstat_recv_cluster(PgStat_MsgCluster *msg, int len);
  static void pgstat_recv_bgwriter(PgStat_MsgBgWriter *msg, int len);
  static void pgstat_recv_funcstat(PgStat_MsgFuncstat *msg, int len);
  static void pgstat_recv_funcpurge(PgStat_MsgFuncpurge *msg, int len);
***
*** 1385,1390  pgstat_report_analyze(Relation rel,
--- 1386,1412 
  }
  
  /* 
+  * pgstat_report_cluster() -
+  *
+  *	Tell the collector about the table we just CLUSTERed.
+  * 
+  */
+ void
+ pgstat_report_cluster(Relation rel)
+ {
+ 	PgStat_MsgCluster msg;
+ 
+ 	if (pgStatSock == PGINVALID_SOCKET)
+ 		return;
+ 
+ 	pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_CLUSTER);
+ 	msg.m_databaseid = rel->rd_rel->relisshared ? InvalidOid : MyDatabaseId;
+ 	msg.m_tableoid = RelationGetRelid(re

Re: [HACKERS] SSL renegotiation

2013-10-15 Thread Vik Fearing
On 09/23/2013 10:51 PM, Alvaro Herrera wrote:
> + /* are we in the middle of a renegotiation? */
> + static bool in_ssl_renegotiation = false;
> + 

Since this was committed, I'm getting the following warning:

be-secure.c:105:13: warning: ‘in_ssl_renegotiation’ defined but not used
[-Wunused-variable]
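
A trivial way to silence it, assuming the variable really is only referenced
in SSL-enabled builds, would be to move the definition inside the USE_SSL
block (untested sketch):

#ifdef USE_SSL
/* are we in the middle of a renegotiation? */
static bool in_ssl_renegotiation = false;
#endif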

-- 
Vik





Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Andres Freund
On 2013-10-15 16:29:47 +0530, Pavan Deolasee wrote:
> On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund wrote:
> 
> >  I don't think delaying the message is a good
> >  idea.
> 
> 
> Comment in walsender.c says:
> 
> /*
>  * If we're in catchup state, move to streaming.  This is an
>  * important state change for users to know about, since before
>  * this point data loss might occur if the primary dies and we
>  * need to failover to the standby.
>  */
> 
> IOW it claims no data loss will occur after this point. But if the WAL is
> cached on the master side, isn't this a false claim, i.e. data loss can
> still occur even after the master outputs the log message and changes the
> state to streaming? Or am I still getting it wrong?

I think you're over-interpreting it. We don't actually rely on the data
being confirmed received anywhere. And the message doesn't say anything
about everything safely being written out.
So, if you want to adjust that comment, go for it, but I am pretty
firmly convinced that this isn't worth changing the logic.

Note that the ready_to_stop logic *does* make sure everything's flushed.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
On Tue, Oct 15, 2013 at 4:16 PM, Andres Freund wrote:

>  I don't think delaying the message is a good
>  idea.


Comment in walsender.c says:

/*
 * If we're in catchup state, move to streaming.  This is an
 * important state change for users to know about, since before
 * this point data loss might occur if the primary dies and we
 * need to failover to the standby.
 */

IOW it claims no data loss will occur after this point. But if the WAL is
cached on the master side, isn't this a false claim, i.e. data loss can
still occur even after the master outputs the log message and changes the
state to streaming? Or am I still getting it wrong?

Thanks,
Pavan
-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] WITHIN GROUP patch

2013-10-15 Thread Vik Fearing
On 10/09/2013 04:19 PM, Pavel Stehule wrote:
>
> I checked a conformance with ANSI SQL - and I didn't find any issue.
>
> I found that the following error message is not too friendly (mainly
> because this functionality will be new)
>
> postgres=# select dense_rank(3,3,2) within group (order by num
> desc, odd) from test4;
> ERROR:  Incorrect number of arguments for hypothetical set function
> LINE 1: select dense_rank(3,3,2) within group (order by num desc,
> od...
>^
> postgres=# select dense_rank(3,3,2) within group (order by num
> desc) from test4;
> ERROR:  Incorrect number of arguments for hypothetical set function
> LINE 1: select dense_rank(3,3,2) within group (order by num desc)
> fr...
>^
> postgres=# select dense_rank(3,3) within group (order by num desc)
> from test4;
> ERROR:  Incorrect number of arguments for hypothetical set function
> LINE 1: select dense_rank(3,3) within group (order by num desc)
> from...
>^
> postgres=# select dense_rank(3,3) within group (order by num desc,
> num) from test4;
>  dense_rank
> 
>   3
> (1 row)
>
> Probably some hint should be there?
>
>

In addition to Pavel's review, I have finally finished reading the
patch. Here are some notes, mainly on style:

First of all, it no longer compiles on HEAD because commit
4d212bac1752e1bad6f3aa6242061c393ae93a0a stole oid 3968.  I modified
that locally to be able to continue my review.

Some of the error messages do not comply with project style.  That is,
they begin with a capital letter.

Ordered set functions cannot have transition functions
Ordered set functions must have final functions
Invalid argument types for hypothetical set function
Invalid argument types for ordered set function
Incompatible change to aggregate definition
Too many arguments to ordered set function
Ordered set finalfns must not be strict
Cannot have multiple ORDER BY clauses with WITHIN GROUP
Cannot have DISTINCT and WITHIN GROUP together
Incorrect number of arguments for hypothetical set function
Incorrect number of direct arguments to ordered set function %s
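
For example, project style would render the first of these roughly as follows
(the errcode is picked only for illustration):

ereport(ERROR,
		(errcode(ERRCODE_INVALID_FUNCTION_DEFINITION),
		 errmsg("ordered-set functions cannot have transition functions")));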

And in pg_aggregate.c I found a comment with a similar problem that
doesn't match its surrounding code:
Oid transsortop = InvalidOid;  /* Can be omitted */

I didn't find any more examples like that, but I did see several block
comments that weren't complete sentences whereas I think they should
be.  Also a lot of the code comments say "I" and I don't recall seeing
that elsewhere.  I may be wrong, but I would prefer if they were more
neutral.

The documentation has a number of issues.

collateindex.pl complains of duplicated index entries for PERCENTILE
CONTINUOUS and PERCENTILE DISCRETE.  This is because the index markup is
used for both overloaded versions.  This is the same mistake Bruce made
and then corrected in commit 5dcc48c2c76cf4b2b17c8e14fe3e588ae0c8eff3.

"if there are multiple equally good result" should have an s on the end
in func.sgml.

Table 9-49 has an extra empty column.  That should either be removed, or
filled in with some kind of comment text like other similar tables.

Apart from that, it looks good.  There are some mismatched coding styles
in there, but the next run of pgindent should catch them so it's no big deal.

I haven't yet exercised the actual functionality of the new functions,
nor have I tried to create my own.  Andrew alerted me to a division by
zero bug in one of them, so I'll be looking forward to catching that.

So, more review to come.

-- 
Vik



Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Andres Freund
On 2013-10-15 16:12:56 +0530, Pavan Deolasee wrote:
> On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund wrote:
> 
> >
> > I don't think that'd be a good idea - the "caughtup" logic is used to
> > determine whether we need to wait for further wal to be generated
> > locally if we haven't got anything else to do. And we only need to do so
> > when we reached the end of the WAL.
> >
> >
> Obviously I do not fully understand the caughtup logic, but don't you think
> the log message about the standby having caught up with the master is wrong
> when it actually hasn't, because the sender has buffered a lot of data? Or
> are you saying those are two different things really?

The message is logged when the state changes because the state is
important for the behaviour of replication (e.g. that node becomes
elegible for sync rep). I don't think delaying the message is a good
idea.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
On Tue, Oct 15, 2013 at 3:59 PM, Andres Freund wrote:

>
> I don't think that'd be a good idea - the "caughtup" logic is used to
> determine whether we need to wait for further wal to be generated
> locally if we haven't got anything else to do. And we only need to do so
> when we reached the end of the WAL.
>
>
Obviously I do not fully understand the caughtup logic, but don't you think
the log message about the standby having caught up with the master is wrong
when it actually hasn't, because the sender has buffered a lot of data? Or
are you saying those are two different things really?


> Also, we'd have to reset caughtup every time we send data (in
> XLogSend()), that'd be horrible.
>
>
Sorry, I did not get that. I was only arguing that the log message about
the standby having caught up with the master should be delayed until the
standby has actually received the WAL, not so much about the actual
implementation.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Standby catch up state change

2013-10-15 Thread Andres Freund
On 2013-10-15 15:51:46 +0530, Pavan Deolasee wrote:
> Should we not instead wait for the standby to have received all the WAL
> before declaring that it has caught up ? If a failure happens while the
> data is still in the sender's buffer, the standby may not actually catch up
> to the desired point contrary to the LOG message displayed on the master.

I don't think that'd be a good idea - the "caughtup" logic is used to
determine whether we need to wait for further wal to be generated
locally if we haven't got anything else to do. And we only need to do so
when we reached the end of the WAL.

Also, we'd have to reset caughtup every time we send data (in
XLogSend()), that'd be horrible.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




[HACKERS] Standby catch up state change

2013-10-15 Thread Pavan Deolasee
Hello,

I wonder if there is an issue with the way the state change happens
from WALSNDSTATE_CATCHUP to WALSNDSTATE_STREAMING. Please note my question
is solely based on a strange behavior reported by a colleague and my own
limited code reading. The colleague is trying out replication with a
networking middleware and noticed that the master logs the debug message
about the standby catching up, but the write_location in the
pg_stat_replication view takes minutes to reflect the actual catch-up
location.

ISTM that the following code in walsender.c assumes that the standby has
caught up once the master has sent all the required WAL.

/* Do we have any work to do? */
Assert(sentPtr <= SendRqstPtr);
if (SendRqstPtr <= sentPtr)
{
	*caughtup = true;
	return;
}

But what if the standby has not yet received all the WAL data sent by the
master? That can happen for various reasons, such as caching at the OS level
or the network layer on the sender machine, or any other intermediate hops.

Should we not instead wait for the standby to have received all the WAL
before declaring that it has caught up ? If a failure happens while the
data is still in the sender's buffer, the standby may not actually catch up
to the desired point contrary to the LOG message displayed on the master.

Thanks,
Pavan

-- 
Pavan Deolasee
http://www.linkedin.com/in/pavandeolasee


Re: [HACKERS] Heavily modified big table bloat even in auto vacuum is running

2013-10-15 Thread Haribabu kommi
On 12 October 2013 11:30 Tom Lane wrote: 
>Haribabu kommi  writes:
>> To handle the above case, instead of directly resetting the dead tuples 
>> to zero, how about removing the exact dead tuples from the table stats?  With 
>> this approach vacuum gets triggered frequently, thus reducing the bloat.

>This does not seem like a very good idea as-is, because it will mean that 
>n_dead_tuples can diverge arbitrarily far from reality over time, as a result 
>of accumulation of errors.  It also doesn't seem 
>like a very good idea that VACUUM sets n_live_tuples while only adjusting 
>n_dead_tuples incrementally; ideally those counters should move in the same 
>fashion.
>In short, I think this patch will create at least as many problems as it fixes.

>What would make more sense to me is for VACUUM to estimate the number of 
>remaining dead tuples somehow and send that in its message.  However, since 
>the whole point here is that we aren't accounting for 
>transactions that commit while VACUUM runs, it's not very clear how to do that.

>Another way to look at it is that we want to keep any increments to 
>n_dead_tuples that occur after VACUUM takes its snapshot.  Maybe we could have 
>VACUUM copy the n_dead_tuples value as it exists when 
>VACUUM starts, and then send that as the value to subtract when it's done?

Taking a copy of n_dead_tuples and passing it at the end of vacuum to subtract 
from the table stats may not be correct, as vacuum may not have cleaned all the 
dead tuples because of tuple visibility to other transactions. How about 
resetting n_dead_tuples to zero if it goes negative because of such errors?
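
Something like the following in the collector would do that clamping (names
here are made up purely to illustrate the idea, this is not a patch):

/*
 * Sketch only: subtract the dead-tuple count VACUUM saw when it started,
 * clamping at zero so accumulated errors cannot drive the counter negative.
 */
static void
pgstat_discount_dead_tuples(PgStat_Counter *n_dead_tuples,
							PgStat_Counter dead_at_vacuum_start)
{
	if (*n_dead_tuples > dead_at_vacuum_start)
		*n_dead_tuples -= dead_at_vacuum_start;
	else
		*n_dead_tuples = 0;
}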


Regards,
Hari babu.





Re: [HACKERS] Description on bind message

2013-10-15 Thread Tatsuo Ishii
> This is already explicitly said in the description of the previous
> field:
> 
> "The number of parameter format codes that follow (denoted C
> below). This can be zero to indicate that there are no parameters or
> that the parameters all use the default format (text); or one, in
> which case the specified format code is applied to all parameters; or
> it can equal the actual number of parameters."

I know it. The confusing part is this:

  This can be zero to indicate that there are no parameters or
  that the parameters all use the default format (text);

Here zero has a double meaning; I was not sure what it actually meant
until I looked into the source code.
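
To illustrate what the encoding looks like on the wire (illustration only, not
libpq code; the helper below is made up), the parameter-format-code section is
just a count followed by that many Int16s, and with a count of zero the array
is simply absent:

#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Append the Int16 count C, then C Int16 format codes (0 = text, 1 = binary). */
static char *
put_format_codes(char *out, uint16_t nformats, const uint16_t *formats)
{
	uint16_t	n = htons(nformats);

	memcpy(out, &n, 2);
	out += 2;
	for (uint16_t i = 0; i < nformats; i++)
	{
		uint16_t	f = htons(formats[i]);

		memcpy(out, &f, 2);
		out += 2;
	}
	return out;		/* with nformats == 0, only the count is written */
}
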
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] Description on bind message

2013-10-15 Thread Marko Tiikkaja

On 10/15/13 11:02 AM, Tatsuo Ishii wrote:

In the "48.5. Message Formats" section of the manual, there is a description
of the "Bind" message.

Int16[C]
 The parameter format codes. Each must presently be zero (text) or one 
(binary).

This field can be completely absent in the current
implementation of PostgreSQL. I think that fact is not very clear from
the description. It would be nice if the description were something like:

Int16[C]
 The parameter format codes. Each must presently be zero (text) or one 
(binary).
This field does not exist if the number of parameter values is 0.


This is already explicitly said in the description of the previous field:

"The number of parameter format codes that follow (denoted C below). 
This can be zero to indicate that there are no parameters or that the 
parameters all use the default format (text); or one, in which case the 
specified format code is applied to all parameters; or it can equal the 
actual number of parameters."


Also the documentation for the array notation explains this: "An array 
of k n-bit integers, each in network byte order. The array length k is 
always determined by an earlier field in the message. Eg. Int16[M].".


This part seems clear enough to me, and I don't think repeating that 
*three times* is necessary.  We already say it twice.



Regards,
Marko Tiikkaja




[HACKERS] Description on bind message

2013-10-15 Thread Tatsuo Ishii
In the "48.5. Message Formats" section of the manual, there is a description
of the "Bind" message.

Int16[C]
The parameter format codes. Each must presently be zero (text) or one 
(binary).

This field can be completely absent in the current
implementation of PostgreSQL. I think that fact is not very clear from
the description. It would be nice if the description were something like:

Int16[C]
The parameter format codes. Each must presently be zero (text) or one 
(binary).
This field does not exist if the number of parameter values is 0.

The same thing can be said of the following:

Int16[R]
The result-column format codes. Each must presently be zero (text) or one 
(binary).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese: http://www.sraoss.co.jp




Re: [HACKERS] [PATCH] Add use of asprintf()

2013-10-15 Thread David Rowley
On Tue, Oct 15, 2013 at 8:00 PM, Asif Naeem  wrote:

>
>
>
> On Tue, Oct 15, 2013 at 10:55 AM, David Rowley wrote:
>
>> Though this is not yet enough to get the windows build work with me...
>> I'm still getting link failures for isolationtester.c
>>
>>
>> "D:\Postgres\c\pgsql.sln" (default target) (1) ->
>> "D:\Postgres\c\isolationtester.vcxproj" (default target) (89) ->
>> (Link target) ->
>>   isolationtester.obj : error LNK2019: unresolved external symbol
>> _pg_strdup referenced in function _try_complete_step
>> [D:\Postgres\c\isolationtester.vcxproj]
>>   isolationtester.obj : error LNK2019: unresolved external symbol
>> _pg_asprintf referenced in function _try_complete_step
>> [D:\Postgres\c\isolationtester.vcxproj
>> ]
>>   .\Release\isolationtester\isolationtester.exe : fatal error LNK1120: 2
>> unresolved externals [D:\Postgres\c\isolationtester.vcxproj]
>>
>> 1 Warning(s)
>>
>> I guess this is down to a make file error somewhere.
>>
>
> Can you please try the attached patch? I hope it will solve the problem.
>

Thanks, that combined with my patch above seems to get the build running
again.  In summary, I've attached a patch which is both of these combined.

I've not run any regression tests yet as that seems to be broken, but
perhaps something has changed in my build environment. The attached is
probably worth applying to get the Windows build farm members building
again, to see if they'll go green.

Regards

David Rowley



>
>
>>
>> David
>>
>>
>>
>>
>


asprintf_windows_fix.patch
Description: Binary data



Re: [HACKERS] Long paths for tablespace leads to uninterruptible hang in Windows

2013-10-15 Thread Amit Kapila
On Mon, Oct 14, 2013 at 11:00 PM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Thu, Oct 10, 2013 at 9:34 AM, Amit Kapila  wrote:
>>> On further analysis, I found that hang occurs in some of Windows
>>> API(FindFirstFile, RemoveDirectroy) when symlink path
>>> (pg_tblspc/spcoid/TABLESPACE_VERSION_DIRECTORY) is used in these
>>> API's. For above testcase, it will hang in path
>>> destroy_tablespace_directories->ReadDir->readdir->FindFirstFile
>
>> Well, that sucks.  So it's a Windows bug.
>
> It's not clear to me that we should do anything about this at all,
> except perhaps document that people should avoid long tablespace
> path names on an unknown set of Windows versions.

There are a few more relatively minor issues with long paths on Windows.
For example:
In the function CreateTableSpace(), the check below protects against
creating a tablespace on a path that is too long.

if (strlen(location) + 1 + strlen(TABLESPACE_VERSION_DIRECTORY) + 1 +
OIDCHARS + 1 + OIDCHARS + 1 + OIDCHARS > MAXPGPATH)
  ereport(ERROR,
  (errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
   errmsg("tablespace location \"%s\" is too long",
   location)));

MAXPGPATH is defined to be 1024, whereas the Windows APIs used in PG
have a limit of 260, due to which the error comes directly from the API
rather than from the above check.
So, one change I am thinking of is to define MAXPGPATH separately for
Windows.
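
Something along these lines, presumably in pg_config_manual.h (sketch only;
260 is MAX_PATH, the limit of the ANSI Win32 file APIs):

#ifdef WIN32
#define MAXPGPATH		260		/* MAX_PATH of the ANSI Win32 file APIs */
#else
#define MAXPGPATH		1024	/* default used elsewhere */
#endif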

> We should not
> be in the business of working around any and every bug coming out
> of Redmond.

This bug leads to an uninterruptible hang (I am not able to kill the
process from Task Manager or in any other way) and the corresponding
backend starts consuming ~100% CPU, so the user doesn't have much option
but to restart the machine. Any form of shutdown of PG is also not
successful.
I had proposed to fix this issue based on its severity, but if you feel
that we should keep the onus of such usage on the user, then I think I can
try to fix the other relatively minor problems with the usage of long
paths.

With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com




Re: [HACKERS] [PATCH] Add use of asprintf()

2013-10-15 Thread Asif Naeem
On Tue, Oct 15, 2013 at 10:55 AM, David Rowley  wrote:

> Though this is not yet enough to get the windows build work with me... I'm
> still getting link failures for isolationtester.c
>
>
> "D:\Postgres\c\pgsql.sln" (default target) (1) ->
> "D:\Postgres\c\isolationtester.vcxproj" (default target) (89) ->
> (Link target) ->
>   isolationtester.obj : error LNK2019: unresolved external symbol
> _pg_strdup referenced in function _try_complete_step
> [D:\Postgres\c\isolationtester.vcxproj]
>   isolationtester.obj : error LNK2019: unresolved external symbol
> _pg_asprintf referenced in function _try_complete_step
> [D:\Postgres\c\isolationtester.vcxproj
> ]
>   .\Release\isolationtester\isolationtester.exe : fatal error LNK1120: 2
> unresolved externals [D:\Postgres\c\isolationtester.vcxproj]
>
> 1 Warning(s)
>
> I guess this is down to a make file error somewhere.
>

Can you please try the attached patch? I hope it will solve the problem.


>
> David
>
>
>
>


Win_isolationtester_fix.patch
Description: Binary data
