Re: [HACKERS] hint bit i/o reduction

2012-06-14 Thread Amit Kapila
 yeah -- but you still have to call:
 *) TransactionIdIsCurrentTransactionId(), which adds a couple of tests
 (and a bsearch in the presence of subtransactions)
 *) TransactionIdIsInProgress() which adds a few tests and an out-of-line
   call in the typical case
 *) TransactionIdDidCommit() which adds a few tests and two out-of-line
   calls in the typical case
 and, while setting the hint, it:
 *) Several more tests plus:
 *) TransactionIdGetCommitLSN(), another out-of-line call and a test
 *) XLogNeedsFlush(), two more out-of-line calls plus a few tests
 *) SetBufferCommitInfoNeedsSave(), an out-of-line call, several more tests

adding a few instructions to the above probably won't be measurable

You are right: adding this to the above path should not impact performance, and
moreover all the checks and assignments you are proposing are backend-local, so
there is no contention either.

Just an observation: in some cases it will only do point 1, or points 1 and 2.
For example:
1. xmin is committed and its hint bit is already set.
2. xmax's hint bit is not set, and it is still not visible to the current
transaction.
In this case it will just do points 1 and 2.
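For readers following along, here is a rough, simplified sketch of the path
being discussed (paraphrased from src/backend/utils/time/tqual.c; the real
HeapTupleSatisfiesMVCC() and SetHintBits() handle many more cases):

    /* Unhinted xmin: the calls listed above, in order. */
    if (!(tuple->t_infomask & HEAP_XMIN_COMMITTED))
    {
        if (TransactionIdIsCurrentTransactionId(xmin))      /* point 1 */
            ;   /* visibility decided from backend-local state */
        else if (TransactionIdIsInProgress(xmin))           /* point 2 */
            return false;                                   /* not visible yet */
        else if (TransactionIdDidCommit(xmin))               /* point 3: may read clog */
            SetHintBits(tuple, buffer, HEAP_XMIN_COMMITTED, xmin);
        else
            SetHintBits(tuple, buffer, HEAP_XMIN_INVALID, InvalidTransactionId);
    }

    /* And SetHintBits() itself, roughly: */
    if (TransactionIdIsValid(xid))
    {
        XLogRecPtr  commitLSN = TransactionIdGetCommitLSN(xid);

        if (XLogNeedsFlush(commitLSN))
            return;     /* commit record not flushed yet, skip the hint */
    }
    tuple->t_infomask |= infomask;
    SetBufferCommitInfoNeedsSave(buffer);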


-Original Message-
From: Merlin Moncure [mailto:mmonc...@gmail.com] 
Sent: Wednesday, June 13, 2012 8:36 PM
To: Amit Kapila
Cc: PostgreSQL-development
Subject: Re: [HACKERS] hint bit i/o reduction

On Wed, Jun 13, 2012 at 9:02 AM, Amit Kapila amit.kap...@huawei.com wrote:
 Yes, but only in the unhinted case -- in oltp workloads tuples get
 hinted fairly quickly so I doubt this would be a huge impact.  Hinted
 scans will work exactly as they do now.  In the unhinted case for OLTP
 a  few tests are added but that's noise compared to the other stuff
 going on.

 I believe the design you have proposed is for the unhinted cases only,
 meaning that if the tuple already has the hint set then it will not go
 through the new logic. Is that right?

 In unhinted cases, there are two ways the hint bit can be set:
 1. It gets the transaction status from the clog memory buffers.
 2. It needs to do I/O to get the transaction status from the clog.

 So I think it will add overhead in case 1, where the processing is fast,
 but now we will add some new instructions to it.

yeah -- but you still have to call:
*) TransactionIdIsCurrentTransactionId(), which adds a couple of tests
(and a bsearch in the presence of subtransactions)
*) TransactionIdIsInProgress() which adds a few tests and an out-of-line
call in the typical case
*) TransactionIdDidCommit() which adds a few tests and two out-of-line
calls in the typical case
and, while setting the hint, it:
*) Several more tests plus:
*) TransactionIdGetCommitLSN(), another out-of-line call and a test
*) XLogNeedsFlush(), two more out-of-line calls plus a few tests
*) SetBufferCommitInfoNeedsSave(), an out-of-line call, several more tests

adding a few instructions to the above probably won't be measurable
(naturally it would be tested to confirm that).

merlin




Re: [HACKERS] libpq compression

2012-06-14 Thread Albe Laurenz
Euler Taveira wrote:
 There was already some discussion about compressing libpq data [1][2][3].
 Recently, I faced a scenario that would have been less problematic if we
 had had compression support. The scenario is frequent data loads (aka COPY)
 over slow/unstable links, to be executed on a few hundred PostgreSQL
 servers all over Brazil. Someone could argue that I could use an ssh
 tunnel to solve the problem, but (i) it is complex because it involves a
 different port in the firewall, and (ii) it's an opportunity to improve
 other scenarios, like reducing bandwidth consumption during replication or
 normal operation over slow/unstable links.

Maybe I'm missing something obvious, but shouldn't a regular SSL
connection (sslmode=require) do what you are asking for?

At least from OpenSSL 0.9.8 on, data get compressed by default.
You don't need an extra port in the firewall for that.
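For example, a libpq client only has to ask for SSL; with a zlib-enabled
OpenSSL on both ends the stream is then compressed transparently (the host,
database and user names below are placeholders):

    #include <stdio.h>
    #include <libpq-fe.h>

    int
    main(void)
    {
        /* sslmode=require forces an SSL connection on the normal port */
        PGconn *conn = PQconnectdb("host=db.example.com dbname=mydb "
                                   "user=app sslmode=require");

        if (PQstatus(conn) != CONNECTION_OK)
            fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        PQfinish(conn);
        return 0;
    }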

Yours,
Laurenz Albe



Re: [HACKERS] Re: [COMMITTERS] pgsql: Send new protocol keepalive messages to standby servers.

2012-06-14 Thread Dimitri Fontaine
Tom Lane t...@sss.pgh.pa.us writes:
 Well, TBH I didn't think that concept was a useful solution for this at
 all.  I can't imagine that we would define features at a sufficiently
 fine granularity, or with enough advance foresight, to solve the sort of
 problem that's being faced here.  How would you deal with the need for,
 say, some of contrib/xml2's functions to get migrated to core in 9.4 or
 so?  When you didn't know that would happen, much less exactly which
 functions, when 9.3 came out?  AFAICS the only way that features could

You can provide and require new feature names on the same dependency
graph, and the patch allows us to do so without foresight. The trade-off
here is that we would need to edit the previous version of PostgreSQL to
add some new feature names.

In your example we would add a feature name to the contrib/xml2
extension in a 9.3 minor release and add the same feature to the 9.4 core
distribution. That requires a minor upgrade before the major upgrade for
us to have a chance to deal with the lack of foresight.
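As a purely illustrative sketch of that idea (the provides keyword below is
part of the proposal, not an existing control-file option, and the feature
name is made up):

    # hypothetical contrib/xml2 control file shipped in a 9.3.x minor release
    provides = 'xml2_xpath_functions'

    # in 9.4, core itself would advertise the same feature name, so an
    # extension (or pg_upgrade) can depend on the feature rather than on
    # where it happens to live:
    requires = 'xml2_xpath_functions'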

 fix this would be if we always created a feature for every exposed
 function, which is unmanageable.

Agreed, exposing each function as a feature is not the way to go.

 AFAICS, the SQL-standard features list is just about entirely irrelevant
 to this discussion.  How often is it going to happen that we implement a
 standard feature in a contrib module before migrating it into core?

I wanted the standard to help me with the core features idea, I agree
it's not a smart move here.

 I think almost every case in which we'll face this problem will involve
 a PG-specific feature not mentioned in the SQL feature taxonomy.  The
 case at hand (some proposed new functions for managing replication)
 certainly isn't going to be found there.

Agreed.

 And, quite aside from whether we could invent feature names that match
 what we want to move from contrib to core, exactly how would having a
 feature name help?  The problem we're actually facing is getting
 pg_upgrade to not dump particular functions when it's doing a
 binary-upgrade dump of an extension.  Maybe I've forgotten, but I do not
 recall any exposed connection between feature names and particular SQL
 objects in your proposal.

We're talking about several distinct problems here: one is the low-level
upgrade mechanism that keeps OIDs, and the other is upgrading to a newer
version of the same extension, whose content has changed. And we want to
do both, of course.

The idea would be to implement the upgrade as you're proposing, or at
least my understanding of it, which is to just issue a CREATE EXTENSION
command as part of the schema creation. That will run the new extension
script, which will know not to install deprecated functions.

The problem with that is preserving OIDs that might be stored in user
relations, such as those of types. If we want pg_upgrade to cope with
upgrading an extension to new contents, we have to change its implementation.

One idea would be to add hooks in the backend code at OID assignment
time so that pg_upgrade can attach code that provides the OID. As input
the hook should get creating_extension, the extension name, the schema
name and the object name, and also the object argument list for more
complex things such as functions and aggregates.
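A minimal sketch of what such a hook's signature might look like, just to
make the inputs above concrete (none of these names exist in the backend
today; this only illustrates the proposal):

    /* Hypothetical hook, called wherever a new OID would normally be chosen. */
    typedef Oid (*AssignExtensionObjectOid_hook_type) (bool creating_extension,
                                                       const char *extension_name,
                                                       const char *schema_name,
                                                       const char *object_name,
                                                       List *object_args);

    extern AssignExtensionObjectOid_hook_type AssignExtensionObjectOid_hook;

    /* pg_upgrade support code would install the hook so that each object
     * created by CREATE EXTENSION in the new cluster receives the OID
     * recorded from the old cluster. */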

I don't think the "extension features" dependency-management patch should
by itself be responsible for running the newer extension script while
keeping the existing OIDs.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Marco Nenciarini
Hi Gilles,

unfortunately Gabriele has been very busy recently and he asked me to
check the status of this patch again for this commit fest. In order to
speed up the application of the patch, I am sending an updated version
which compiles correctly against current HEAD. Gabriele will then proceed
with the review.

Thank you,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it 



pg_backup_start_time-and-pg_is_in_backup-v4.1.patch.bz2
Description: application/bzip



Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Gabriele Bartolini

Hello Gilles,

   thank you very much for your patience (and thank you Marco for
supporting me). I apologise for the delay.

   I have retested the updated patch and it works fine for me. It is
ready for committer.


Cheers,
Gabriele


Il 14/06/12 11:36, Marco Nenciarini ha scritto:

Hi Gilles,

unfortunately Gabriele has been very busy recently and he asked me to
check again the status of this patch for this commit fest. In order to
speed up the application of the patch, I am sending an updated version
which correctly compiles with current head. Gabriele will then proceed
with the review.

Thank you,
Marco






--
 Gabriele Bartolini - 2ndQuadrant Italia
 PostgreSQL Training, Services and Support
 gabriele.bartol...@2ndquadrant.it | www.2ndQuadrant.it



[HACKERS] Support for array_remove and array_replace functions

2012-06-14 Thread Marco Nenciarini
Hi,

  following Gabriele's email regarding our previous patch on Foreign
Key Arrays [1], I am sending a subset of that patch which includes only
the two array functions that will be needed by that patch: array_remove
(limited to single-dimensional arrays) and array_replace.

  The patch includes changes to the documentation.
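  For reference, the intended behaviour of the two functions looks like this
(illustrative examples; the exact signatures and edge cases are in the
attached patch):

    SELECT array_remove(ARRAY[1,2,3,2], 2);      -- {1,3}
    SELECT array_replace(ARRAY[1,2,3,2], 2, 9);  -- {1,9,3,9}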

Cheers,
Marco

[1] http://archives.postgresql.org/message-id/4FD8F422.40709%402ndQuadrant.it

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it 



array-functions.patch.bz2
Description: application/bzip



Re: [HACKERS] Minimising windows installer password confusion

2012-06-14 Thread Dave Page
On Thu, Jun 14, 2012 at 12:55 AM, Craig Ringer
cr...@postnewspapers.com.au wrote:
 On 06/13/2012 05:14 PM, Dave Page wrote:

 On Wed, Jun 13, 2012 at 2:18 AM, Craig Ringer
 cr...@postnewspapers.com.au wrote:

 On 06/12/2012 08:08 PM, Dave Page wrote:

 Some background: By default the installer will use 'postgres' for both
 the service (OS) account, and the database superuser account. It will
 use the same password for both (though, users have complete control at
 the command line if they want it, which is why the account names are
 shown in the installer). Unlike *nix installations, the service
 account must have a password, which is required during both
 installation and upgrade to configure the PostgreSQL service. We
 therefore ask for the password during both installation and upgrade.
 If the user account exists (which can happen if there has previously
 been an installation of Postgres on the system), the user must specify
 the existing password for the account during installation (and of
 course, during all upgrades). This is where users normally get stuck,
 as they've forgotten the password they set in the past.

 They may not even have set it, because the EnterpriseDB installer can be
 run
 unattended and may've been used by some 3rd party package.

 I'd be interested in seeing what Microsoft installers that create
 isolated
 user accounts do. I think .NET creates one for ASP, so that'd be one
 option
 to look into.

 They tend to use the localsystem or networkservice accounts for most
 things, which don't have passwords. The reason we don't do that is
 that since the early days of 8.0 we've said we want to enable users to
 use the service account as if it were a regular account, so that they
 can do things like access network resources (useful for server-side
 copy for example).

 I wasn't overly convinced back then that that was necessary, and I'm
 still not now. I suppose we potentially could use the networkservice
 account by default, and let advanced users override that on the
 command line if they need to. Then remembering the password becomes
 their responsibility.

 +1 from me on this, though I think making the service account part of the
 install process makes sense.

 SERVICE ACCOUNT
 -

 You can probably just accept the default here.

 PostgreSQL runs in the background as a network service in a user account
 that
 only has limited access to the files and programs on the computer. This is
 fine
 for most purposes, but will prevent certain operations like server-side COPY
 and
 direct access by the server to resources on shared folders from working. If
 you
 need these capabilities, we recommend that you create a postgres user
 account
 below and have the PostgreSQL server run in that instead of the default
 NetworkService account.

 -- [service account] ---
 |                                          |
 | [*] LocalService                         |
 |                                          |
 | [ ] Custom Service Account               |
 |                                          |
 | -- [custom account]--|   |
 | |                                    |   |
 | | Username: [                 ]      |   |
 | | Password: [                 ]      |   |
 | | Domain:   [ THISCOMPUTER    ]      |   |
 | |                                    |   |
 | ||   |
 |--|

 Reasonable option?

Too complicated - it'll confuse users too (and it goes against the
primary goal of the installers, which is to set up the server in a
suitable way for production use with the absolute minimum of user
interaction). We try to provide the more complex config options that 99%
of users won't need through the command line only (though in the
future I guess it might make sense to have a "Show advanced options"
checkbox on the first page; that's definitely out of scope for 9.2).

I'll have a play with it and see if a simple switch to NetworkService
seems feasible.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH] Support for foreign keys with arrays

2012-06-14 Thread Marco Nenciarini
Il giorno mer, 13/06/2012 alle 18.07 -0400, Noah Misch ha scritto:
 On Wed, Jun 13, 2012 at 10:12:18PM +0200, Gabriele Bartolini wrote:
  Our goal is to work on this patch from the next commit fest.
 
  What we are about to do for this commit fest is to split the previous  
  patch and send a small one just for the array_remove() and  
  array_replace() functions.
 
  Then we will sit down and organise the development of the feature  
  according to Peter's feedback. It is important indeed that we find a  
  commonly accepted terminology and syntax for this feature.
 
  I hope this sounds like a reasonable plan. Thank you.
 
 Sounds fine.  The 2012-01 CF entry for this patch has been moved to the
 2012-06 CF.  Please mark that entry Returned with Feedback and create a new
 entry for the subset you're repackaging for this CommitFest.
 
 Thanks,
 nm
 

Dear Noah,

   I have added a new patch to the commitfest (miscellaneous category)
entitled "Support for array_remove and array_replace functions". I have
also marked the Foreign Key Array patch as "Returned with Feedback",
as per your request. We will start working on this patch again in the
next commitfest, as Gabriele mentioned, hoping that the array
functions will be committed by then.

   Thank you.

Cheers,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it 





Re: [HACKERS] Ability to listen on two unix sockets

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 1:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Maybe:

 listen_addresses = { host | host:port | * | *:port } [, ...]
 unix_socket_directory = { directory | directory:port } [, ...]

 ...except that colon is a valid character in a directory name.  Not
 sure what to do about that.

 Do we need to do anything?  There are no standard scenarios in which a
 colon would appear in such paths, and I don't see why we need to cater
 for it.  (Remember that Windows doesn't enter into this ...)

True.  And we should be able to design the parsing algorithm so that
users can fix it by adding an otherwise-redundant port specification -
i.e. if you want to put the socket in a directory called /me:1, then
write /me:1:5432.
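Spelled out as a hypothetical postgresql.conf fragment (this multi-value
syntax is only being proposed in this thread; it is not an existing setting
form):

    listen_addresses = 'localhost, 192.168.0.10:5433'
    unix_socket_directory = '/var/run/postgresql, /tmp:5433, /me:1:5432'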

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq compression

2012-06-14 Thread Euler Taveira
On 14-06-2012 02:19, Tom Lane wrote:
 I still think that pushing this off to openssl (not an ssh tunnel, but
 the underlying transport library) would be an adequate solution.
 If you are shoving data over a connection that is long enough to need
 compression, the odds that every bit of it is trustworthy seem pretty
 small, so you need encryption too.
 
I don't want to pay the SSL connection overhead. Also, I just want
compression; encryption is not required. OpenSSL gives us encryption
with or without compression; we need an option to obtain compression on
non-SSL connections.

 We do need the ability to tell openssl to use compression.  We don't
 need to implement it ourselves, nor to bring a bunch of new library
 dependencies into our builds.  I especially think that importing bzip2
 is a pretty bad idea --- it's not only a new dependency, but bzip2's
 compression versus speed tradeoff is entirely inappropriate for this
 use-case.
 
I see; the idea is that bzip2 would be a compiled-in option (not enabled by
default), just to give another compression option. I don't have a strong
opinion about including it as another dependency. We already depend on zlib,
and implementing compression using it won't add another dependency.

What do you think about adding a hook in libpq to load an extension that does
the compression? That way we don't add another dependency to libpq, and a
lot of extensions could be written to cover a variety of algorithms without
putting us in trouble because of patent infringement.
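Purely as a sketch of the kind of hook being suggested (none of these names
exist in libpq; this only illustrates the shape of the idea):

    #include <stddef.h>

    /* A loadable compression module would hand libpq a pair of callbacks
     * that transform the byte stream before send and after receive. */
    typedef int (*pq_transform_fn) (const void *in, size_t inlen,
                                    void *out, size_t *outlen);

    typedef struct PQstreamCompressor
    {
        pq_transform_fn compress;
        pq_transform_fn decompress;
    } PQstreamCompressor;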


-- 
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] \conninfo and SSL

2012-06-14 Thread Robert Haas
On Sun, Jun 3, 2012 at 5:30 AM, Alastair Turner b...@ctrlf5.co.za wrote:
 A one-line change adds the SSL info on its own line like

 --
  You are connected to database "scratch" as user "scratch" on host
  "127.0.0.1" at port "5432".
 SSL connection (cipher: DHE-RSA-AES256-SHA, bits: 256)
 --

 Does this need a more integrated presentation, and therefore a broader
 change to make it translatable?

Committed to master.  I didn't make it conditional on a non-local
connection, though, since there seemed to be no reason to do it that way.

P.S. Email mangles patches.  It's better to attach them rather than
including them inline.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 This is locally defined in lots of places and would get introduced frequently
 in the next commits. It is expected that this can be defined in a header-only
 manner as soon as the XLogInsert scalability groundwork from Heikki gets in.

This appears to be redundant with the existing InvalidXLogRecPtr,
defined in access/transam.h.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Restrict ALTER FUNCTION CALLED ON NULL INPUT (was Re: Not quite a security hole: CREATE LANGUAGE for non-superusers)

2012-06-14 Thread Noah Misch
On Tue, Jun 12, 2012 at 03:13:26PM -0400, Tom Lane wrote:
 Even if I accepted that premise, this patch is a pretty bad
 implementation of it, because it restricts cases that there is no
 reason to think are unsafe.
 
 A less bizarre and considerably more future-proof restriction, IMO,
 would simply refuse any attempt to give ownership of a C function
 to a non-superuser.

That just moves the collateral damage to a different set of victims.

Hardcoding the list of vulnerable languages isn't so future-proof.  I'd
otherwise agree in principle if we were designing a system from scratch, but
it doesn't fit with the need to harmoniously protect existing systems.  Adding
this restriction now would make some existing databases fail to restore from
dumps.  That is grave enough at any juncture, let alone for a backpatched fix.

On Tue, Jun 12, 2012 at 04:14:48PM -0400, Tom Lane wrote:
 Basically, if we go down the road Noah is proposing, I foresee a steady
 stream of security bugs and ensuing random restrictions on what function
 owners can do.  I do not like that future.

Then let's have every ALTER FUNCTION require the same language usage check as
CREATE FUNCTION.  Or, if you insist, do so only for the hardcoded cases of
INTERNALlanguageId and ClanguageId, and document that no third-party PL may
rely on STRICT to the extent they do.  This of course forbids more than
necessary but still strictly less than your proposal of blocking the original
ownership change.

nm



Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila amit.kap...@huawei.com wrote:

 Why not 'dblink'?

 We can do it for dblink as well. I just wanted to check before
 implementing it in dblink.

 I have checked the dblink_connect() function: it receives the connect
 string and in most cases uses that string to call libpq connect, which is
 different from pgbench and oid2name, where connection parameters are
 formed in the main function before calling libpq connect.

 To achieve the same in dblink, we need to parse the passed connection
 string and check whether it contains fallback_application_name; if yes
 then it's okay, otherwise we need to append fallback_application_name to
 the connection string.

That seems undesirable.  I don't think this is important enough to be
worth reparsing the connection string for.  I'd just forget about
doing it for dblink if there's no cheaper way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 03:50:28 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  This is locally defined in lots of places and would get introduced
  frequently in the next commits. It is expected that this can be defined
  in a header-only manner as soon as the XLogInsert scalability groundwork
  from Heikki gets in.
 
 This appears to be redundant with the existing InvalidXLogRecPtr,
 defined in access/transam.h.
Oh. I didn't find that one. Judging from all the code defining local variants 
of it I am not alone in that... Why is it in transam.h and not xlogdefs.h?

Obviously that patch is void then. Doesn't warrant rebasing the other patches 
yet though...

Thanks!

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] creating objects in pg_catalog

2012-06-14 Thread Robert Haas
On Thu, Jun 7, 2012 at 8:44 AM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Jun 6, 2012 at 5:21 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 rhaas=# create table pg_catalog.tom (a int);
 ERROR:  permission denied to create pg_catalog.tom

 The offending error check is in heap_create(), and based on what
 you're saying here it seems like we should just rip it out.

 Hmm.  Yeah, it seems like the regular permissions tests on the schemas
 in question should be enough to keep Joe User from making tables there,
 and I do not see a reason why the backend would care if there are
 non-catalog tables laying about in pg_catalog.

 Checking the commit history, it seems this was originally a test to
 prevent people from creating tables named pg_xxx:

 http://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=f5a10e722c052006886b678995695001958a#patch3

 which may or may not have been critical once upon a time, but surely is
 not any more.

 So no objection to removing that particular test.

 Patch attached.  Barring objections, I'll apply this to 9.3 once we branch.

Hearing no objections, done.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 9:57 AM, Andres Freund and...@2ndquadrant.com wrote:
 On Thursday, June 14, 2012 03:50:28 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com
 wrote:
  This is locally defined in lots of places and would get introduced
  frequently in the next commits. It is expected that this can be defined
  in a header-only manner as soon as the XLogInsert scalability groundwork
  from Heikki gets in.

 This appears to be redundant with the existing InvalidXLogRecPtr,
 defined in access/transam.h.
 Oh. I didn't find that one. Judging from all the code defining local variants
 of it I am not alone in that... Why is it in transam.h and not xlogdefs.h?

Uh, not sure.  We used to have a variable by that name defined in a
bunch of places, and I cleaned it up some in commit
503c7305a1e379f95649eef1a694d0c1dbdc674a.  But if there are still more
redundant definitions floating around, it would be nice to clean those
up.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq compression

2012-06-14 Thread Florian Pflug
On Jun14, 2012, at 15:28 , Euler Taveira wrote:
 On 14-06-2012 02:19, Tom Lane wrote:
 I still think that pushing this off to openssl (not an ssh tunnel, but
 the underlying transport library) would be an adequate solution.
 If you are shoving data over a connection that is long enough to need
 compression, the odds that every bit of it is trustworthy seem pretty
 small, so you need encryption too.
 
 I don't want to pay the SSL connection overhead. Also I just want compression,
 encryption is not required. OpenSSL give us encryption with/without
 compression; we need an option to obtain compression in non-SSL connections.

AFAIR, openssl supports a NULL cipher which doesn't do any encryption. We
could have a connection parameter, say compress=on, which selects that
cipher (unless sslmode is set to prefer or higher, of course).

SSL NULL-cipher connections would be treated like unencrypted connections
when matching against pg_hba.conf.

best regards,
Florian Pflug




Re: [HACKERS] WIP patch for Todo Item : Provide fallback_application_name in contrib/pgbench, oid2name, and dblink

2012-06-14 Thread Amit Kapila
 That seems undesirable.  I don't think this is important enough to be
 worth reparsing the connection string for.

The connect string should not be long, so parsing is not a big cost
performance-wise.
I have currently modified the code for dblink in the patch I have uploaded
to the CF.
However, as per your suggestion, I will remove it while handling the other
review comments for the patch, unless somebody asks to keep it.
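For what it's worth, a minimal sketch of the reparsing approach, using
libpq's existing PQconninfoParse() (an illustration only, not the code in the
uploaded patch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <stdbool.h>
    #include <libpq-fe.h>

    /* Return a connection string that is guaranteed to carry a
     * fallback_application_name; the caller frees the result. */
    static char *
    append_fallback_appname(const char *connstr, const char *appname)
    {
        char       *err = NULL;
        PQconninfoOption *opts = PQconninfoParse(connstr, &err);
        bool        have_it = false;
        char       *result;

        if (opts != NULL)
        {
            PQconninfoOption *opt;

            for (opt = opts; opt->keyword != NULL; opt++)
                if (strcmp(opt->keyword, "fallback_application_name") == 0 &&
                    opt->val != NULL && opt->val[0] != '\0')
                    have_it = true;
            PQconninfoFree(opts);
        }
        if (err)
            PQfreemem(err);

        result = malloc(strlen(connstr) + strlen(appname) + 64);
        if (have_it)
            sprintf(result, "%s", connstr);
        else
            sprintf(result, "%s fallback_application_name='%s'",
                    connstr, appname);
        return result;
    }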

-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Thursday, June 14, 2012 7:25 PM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] WIP patch for Todo Item : Provide
fallback_application_name in contrib/pgbench, oid2name, and dblink

On Wed, Jun 13, 2012 at 12:07 AM, Amit Kapila amit.kap...@huawei.com wrote:

 Why not 'dblink'?

 We can do it for dblink as well. I just wanted to check before
 implementing it in dblink.

 I have checked the dblink_connect() function: it receives the connect
 string and in most cases uses that string to call libpq connect, which is
 different from pgbench and oid2name, where connection parameters are
 formed in the main function before calling libpq connect.

 To achieve the same in dblink, we need to parse the passed connection
 string and check whether it contains fallback_application_name; if yes
 then it's okay, otherwise we need to append fallback_application_name to
 the connection string.

That seems undesirable.  I don't think this is important enough to be
worth reparsing the connection string for.  I'd just forget about
doing it for dblink if there's no cheaper way.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)

2012-06-14 Thread Robert Haas
On Sun, Jun 10, 2012 at 5:37 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Sun, Jun 10, 2012 at 4:19 PM, Noah Misch n...@leadboat.com wrote:
 Agreed.  We now have $OLD_SUBJECT, but this is a win independently.  I
 have reviewed the code that runs between the old and new call sites, and
 I did not identify a hazard of moving it as you describe.

 I looked at this when we last discussed it and didn't see a problem
 either, so I tend to agree that we ought to go ahead and do this,

 +1, as long as you mean in 9.3 not 9.2.  I don't see any risk either,
 but the time for taking new risks in 9.2 is past.

 Noah, please add this patch to the upcoming CF, if you didn't already.

I re-reviewed this and committed it.

Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything?  Is it
just for extensions?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Backup docs

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 3:20 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 Please let's apply that documentation patch to 9.2 too.

Agreed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)

2012-06-14 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything?  Is it
 just for extensions?

I'm too lazy to go look, but it certainly ought to be in use.
The idea is that that's the phase for post-lock-release cleanup,
and anything that can possibly be postponed till after releasing
locks certainly should be ...

regards, tom lane



Re: [HACKERS] libpq compression

2012-06-14 Thread Phil Sorber
On Thu, Jun 14, 2012 at 10:14 AM, Florian Pflug f...@phlo.org wrote:
 On Jun14, 2012, at 15:28 , Euler Taveira wrote:
 On 14-06-2012 02:19, Tom Lane wrote:
 I still think that pushing this off to openssl (not an ssh tunnel, but
 the underlying transport library) would be an adequate solution.
 If you are shoving data over a connection that is long enough to need
 compression, the odds that every bit of it is trustworthy seem pretty
 small, so you need encryption too.

 I don't want to pay the SSL connection overhead. Also I just want 
 compression,
 encryption is not required. OpenSSL give us encryption with/without
 compression; we need an option to obtain compression in non-SSL connections.

 AFAIR, openssl supports a NULL cipher which doesn't do any encryption. We
 could have a connection parameter, say compress=on, which selects that
 cipher (unless sslmode is set to prefer or higher, of course).

  SSL NULL-cipher connections would be treated like unencrypted connections
 when matching against pg_hba.conf.

 best regards,
 Florian Pflug



It doesn't sound like there is a lot of support for this idea, but I
think it would be nice to get something like lz4
(http://code.google.com/p/lz4/) or snappy
(http://code.google.com/p/snappy/) support. Both are BSD-ish licensed.
It could be useful for streaming replication as well. A hook (as Euler
mentioned) might be a nice compromise.



Re: [HACKERS] unlink for DROPs after releasing locks (was Re: Should I implement DROP INDEX CONCURRENTLY?)

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 10:45 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Is RESOURCE_RELEASE_AFTER_LOCKS actually used for anything?  Is it
 just for extensions?

 I'm too lazy to go look, but it certainly ought to be in use.
 The idea is that that's the phase for post-lock-release cleanup,
 and anything that can possibly be postponed till after releasing
 locks certainly should be ...

Oh, you're right.  I missed the logic in ResourceOwnerReleaseInternal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] [PATCH 02/16] Add zeroRecPtr as a shortcut for initializing a local variable to {0, 0}

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 04:04:22 PM Robert Haas wrote:
 On Thu, Jun 14, 2012 at 9:57 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  On Thursday, June 14, 2012 03:50:28 PM Robert Haas wrote:
  On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com
  
  wrote:
   This is locally defined in lots of places and would get introduced
   frequently in the next commits. It is expected that this can be
   defined in a header-only manner as soon as the XLogInsert scalability
   groundwork from Heikki gets in.
  
  This appears to be redundant with the existing InvalidXLogRecPtr,
  defined in access/transam.h.
  
  Oh. I didn't find that one. Judging from all the code defining local
  variants of it I am not alone in that... Why is it in transam.h and not
  xlogdefs.h?
 
 Uh, not sure.  We used to have a variable by that name defined in a
 bunch of places, and I cleaned it up some in commit
 503c7305a1e379f95649eef1a694d0c1dbdc674a.  But if there are still more
 redundant definitions floating around, it would be nice to clean those
 up.
Forget it, they are in things that don't link to the backend... /me looks 
forward to the 64bit conversion of XLogRecPtr's.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote:
 One other thing I've always wondered about in this connection is the
 general performance of sorting toasted datums.  Is it better to detoast
 them in every comparison, or pre-detoast to save comparison cycles at
 the cost of having to push much more data around?  I didn't see any
 discussion of this point in Robert's benchmarks, but I don't think we
 should go very far towards enabling sortsupport for text until we
 understand the issue and know whether we need to add more infrastructure
 for it.  If you cross your eyes a little bit, this is very much like
 the strxfrm question...

I see the parallels. I note that glibc's strcoll_l() is implemented
entirely in C (strcoll() itself is implemented in terms of strcoll_l()),
whereas the various strcmp.S variants are written in hand-optimized
assembler, with SSE3 instructions in the "Highly optimized version for
x86-64", for example. I wonder just how important a factor that is. I
suppose the reason why the glibc guys haven't just done something
equivalent internally might be that they much prefer to perform the
comparison in-place, due to the need to target a conservative lowest
common denominator... or it could be because it just doesn't matter
that much.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



[HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-14 Thread Amit Kapila
I am planning to work on the following TODO list item for this CommitFest:
"Allow WAL information to recover corrupted pg_controldata"
http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

I wanted to confirm my understanding of the work involved for this patch:

The existing patch has the following problems:
   1. A memory leak, and the linked-list code path is not proper.
   2. The lock check for whether the server is already running is removed
      in the patch; that needs to be reverted.
   3. The code needs refactoring.

Apart from the above, what I understood from the patch is that its intention
is to generate values for ControlFile using the WAL when the -r option is
used.

The change from the current algorithm will be: if the control file is
corrupt, which essentially means ReadControlFile() will return false, then
it should generate the values (checkPointCopy, checkPoint, prevCheckPoint,
state) from the WAL when the -r option is enabled.

Also, with the -r option it doesn't need to call FindEndOfXLOG(), as that
work will be achieved by the above point.

It will just rewrite the control file and not do the other resets.

  
The algorithm for restoring the pg_control values from old xlog files:
   1. Retrieve all of the active xlog files from the xlog directory into a
      list, in increasing order according to their timeline, log id and
      segment id.
   2. Search the list to find the oldest xlog file of the latest timeline.
   3. Scan the records from the oldest xlog file of the latest timeline to
      the latest xlog file of that timeline; whenever a checkpoint record
      is found, update the latest and previous checkpoint values.

Apart from the above, some changes in the code will be required after the
XLog patch by Heikki.

Please let me know whether my understanding is correct.


Re: [HACKERS] Minimising windows installer password confusion

2012-06-14 Thread Dave Page
On Thu, Jun 14, 2012 at 11:43 AM, Dave Page dp...@pgadmin.org wrote:

 I'll have a play with it and see if a simple switch to NetworkService
 seems feasible.

OK, I worked up a patch which uses NT AUTHORITY\NetworkService as
the service account by default. This doesn't need a password, so
allows us to simply prompt during installation for the superuser
password for the cluster, and not at all during upgrade. If you run
the installer from the command line with --serviceaccount postgres
(or some other account name), you get the current behaviour.

I've posted it on our internal ReviewBoard system for the rest of the
team to review and test on various platforms (I've only tried it on XP
so far). Now would be a very good time for people to yelp if they
think this is a bad idea (the only downside I can see is accessing
files for server-side COPY etc., but users that want to do that can
install with a non-default account). If we go ahead, we'll include it
in 9.2.

Thanks for the feedback folks.

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq compression

2012-06-14 Thread Merlin Moncure
On Thu, Jun 14, 2012 at 9:57 AM, Phil Sorber p...@omniti.com wrote:
 It doesn't sound like there is a lot of support for this idea, but I
 think it would be nice to get something like lz4
 (http://code.google.com/p/lz4/) or snappy
 (http://code.google.com/p/snappy/) support. Both are BSD-ish licensed.
 It could be useful for streaming replication as well. A hook (as Euler
 mentioned) might be a nice compromise.

There is a lot of support for the idea: it's one of the more requested
features.  I think a well-thought-out framework that bypassed the
dependency issues via plugging might get some serious traction.
Emphasis on 'well thought out' :-).

merlin



Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund and...@2ndquadrant.com wrote:
 === Design goals for logical replication === :
 - in core
 - fast
 - async
 - robust
 - multi-master
 - modular
 - as unintrusive as possible implementation wise
 - basis for other technologies (sharding, replication into other DBMSs, ...)

I agree with all of these goals except for multi-master.  I am not
sure that there is a need to have a multi-master replication solution
in core.  The big tricky part of multi-master replication is conflict
resolution, and that seems like an awful lot of logic to try to build
into core - especially given that we will want it to be extensible.

More generally, I would much rather see us focus on efficiently
extracting changesets from WAL and efficiently applying those
changesets on another server.  IMHO, those are the things that are
holding back the not-in-core replication solutions we have today,
particularly the first.  If we come up with a better way of applying
change-sets, well, Slony can implement that too; they are already
installing C code.  What neither they nor any other non-core solution
can implement is change-set extraction, and therefore I think that
ought to be the focus.

To put all that another way, I think it is a 100% bad idea to try to
kill off Slony, Bucardo, Londiste, or any of the many home-grown
solutions that are out there to do replication.  Even if there were no
technical place for third-party replication products (and I think
there is), we will not win many friends by making it harder to extend
and add on to the server.  If we build an in-core replication solution
that can be used all by itself, that is fine with me.  But I think it
should also be able to expose its constituent parts as a toolkit for
third-party solutions.

 While you may argue that most of the above design goals are already
 provided by various trigger-based replication solutions like Londiste or
 Slony, we think that that's not enough for various reasons:

 - not in core (and thus less trustworthy)
 - duplication of writes due to an additional log
 - performance in general (check the end of the above presentation)
 - complex to use because there is no native administration interface

I think that your parenthetical note (and thus less trustworthy)
gets at another very important point, which is that one of the
standards for inclusion in core is that it must in fact be trustworthy
enough to justify the confidence that users will place in it.  It will
NOT benefit the project to have two replication solutions in core, one
of which is crappy.  More, even if what we put in core is AS GOOD as
the best third-party solutions that are available, I don't think
that's adequate.  It has to be better.  If it isn't, there is no
excuse for preempting what's already out there.

I imagine you are thinking along similar lines, but I think that it
bears being explicit about.

 As we need a change stream that contains all required changes in the correct
 order, the requirement for this stream to reflect changes across multiple
 concurrent backends raises concurrency and scalability issues. Reusing the
 WAL stream for this seems a good choice since it is needed anyway and addresses
 those issues already, and it further means that we don't incur duplicate
 writes. Any other stream-generating component would introduce additional
 scalability issues.

Agreed.

 We need a change stream that contains all required changes in the correct
 order, which thus needs to be synchronized across concurrent backends and
 introduces obvious concurrency/scalability issues.
 Reusing the WAL stream for this seems a good choice since it is needed
 anyway and addresses those issues already, and it further means we don't
 duplicate the writes and locks already performed for its maintenance.

Agreed.

 Unfortunately, in this case, the WAL is mostly a physical representation of 
 the
 changes and thus does not, by itself, contain the necessary information in a
 convenient format to create logical changesets.

Agreed.

 The biggest problem is, that interpreting tuples in the WAL stream requires an
 up-to-date system catalog and needs to be done in a compatible backend and
 architecture. The requirement of an up-to-date catalog could be solved by
 adding more data to the WAL stream but it seems to be likely that that would
 require relatively intrusive & complex changes. Instead we chose to require a
 synchronized catalog at the decoding site. That adds some complexity to use
 cases like replicating into a different database or cross-version
 replication. For those it is relatively straight-forward to develop a proxy pg
 instance that only contains the catalog and does the transformation to textual
 changes.

The actual requirement here is more complex than an up-to-date
catalog.  Suppose transaction X begins, adds a column to a table,
inserts a row, and commits.  That tuple needs to be interpreted using
the tuple descriptor that 

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote:
 One other thing I've always wondered about in this connection is the
 general performance of sorting toasted datums.  Is it better to detoast
 them in every comparison, or pre-detoast to save comparison cycles at
 the cost of having to push much more data around?  I didn't see any
 discussion of this point in Robert's benchmarks, but I don't think we
 should go very far towards enabling sortsupport for text until we
 understand the issue and know whether we need to add more infrastructure
 for it.  If you cross your eyes a little bit, this is very much like
 the strxfrm question...

 I see the parallels.

The problem with pre-detoasting to save comparison cycles is that you
can now fit many, many fewer tuples in work_mem.  There might be cases
where it wins (for example, because the entire data set fits even
after decompressing everything) but in most cases it seems like a
loser.

Also, my guess is that most values people sort by are pretty short,
making this concern mostly academic.  Suppose you are sorting a bunch
of strings which might be either 100 characters in length or 1MB.  If
they're all 100 characters, you probably don't need to detoast.  If
they're all 1MB, you probably can't detoast without eating up a ton of
memory (and even if you have it, this might not be the best use for
it).  If you have a mix, detoasting might be affordable provided that
the percentage of long strings is small, but it's also not going to
save you much, because if the percentage of long strings is small,
then most comparisons will be between two short strings where we don't
save anything anyway.

All things considered, this seems to me to be aiming at a pretty
narrow target, but maybe I'm just not thinking about it creatively
enough.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Minimising windows installer password confusion

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:59 AM, Dave Page dp...@pgadmin.org wrote:
 On Thu, Jun 14, 2012 at 11:43 AM, Dave Page dp...@pgadmin.org wrote:

 I'll have a play with it and see if a simple switch to NetworkService
 seems feasible.

 OK, I worked up a patch which uses NT AUTHORITY\NetworkService as
 the service account by default. This doesn't need a password, so
 allows us to simply prompt during installation for the superuser
 password for the cluster, and not at all during upgrade. If you run
 the installer from the command line with --serviceaccount postgres
 (or some other account name), you get the current behaviour.

 I've posted it on our internal ReviewBoard system for the rest of the
 team to review and test on various platforms (I've only tried it on XP
 so far). Now would be a very good time for people to yelp if they
 think this is a bad idea (the only downside I can see if accessing
 files for server-side copy etc, but users that want to do that can
 install with a non-default account). If we go ahead, we'll include it
 in 9.2.

What happens if they try to use this to upgrade from the EDB 9.1 installers?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Minimising windows installer password confusion

2012-06-14 Thread Dave Page
On Thu, Jun 14, 2012 at 5:38 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 11:59 AM, Dave Page dp...@pgadmin.org wrote:
 On Thu, Jun 14, 2012 at 11:43 AM, Dave Page dp...@pgadmin.org wrote:

 I'll have a play with it and see if a simple switch to NetworkService
 seems feasible.

 OK, I worked up a patch which uses NT AUTHORITY\NetworkService as
 the service account by default. This doesn't need a password, so
 allows us to simply prompt during installation for the superuser
 password for the cluster, and not at all during upgrade. If you run
 the installer from the command line with --serviceaccount postgres
 (or some other account name), you get the current behaviour.

 I've posted it on our internal ReviewBoard system for the rest of the
 team to review and test on various platforms (I've only tried it on XP
 so far). Now would be a very good time for people to yelp if they
 think this is a bad idea (the only downside I can see if accessing
 files for server-side copy etc, but users that want to do that can
 install with a non-default account). If we go ahead, we'll include it
 in 9.2.

 What happens if they try to use this to upgrade from the EDB 9.1 installers?

The installers don't support major version upgrades - it'll install
alongside 9.1.

If someone has an existing beta installation of 9.2, it'll use the
service account it was installed with, just as past minor-version
upgrades would (including asking for the password).

-- 
Dave Page
Blog: http://pgsnake.blogspot.com
Twitter: @pgsnake

EnterpriseDB UK: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini
gabriele.bartol...@2ndquadrant.it wrote:
    thank you very much for your patience (and thank you Marco for supporting
 me). I apologise for the delay.

    I have retested the updated patch and it works fine with me. It is ready
 for committer for me.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Namespace of array of user defined types is confused by the parser in insert?

2012-06-14 Thread Robert Haas
On Mon, Apr 23, 2012 at 9:42 AM, Krzysztof Nienartowicz
krzysztof.nienartow...@gmail.com wrote:
 Hello,
 Sorry for re-posting - I initially posted this in pgsql.sql - probably
 this group is more appropriate.

 I have a bizarre problem that started to manifest itself after the
 addition of a field that is an array of a compound UDT to a table
 declared in multiple schemas.
 It is clearly related to how the type namespace is resolved, and it shows
 up for the JDBC client (probably related to the parameterized query, as
 the static query works without problems).

I'm replying to this awfully late, but I'm guessing this is some kind
of JDBC magic, not anything that PostgreSQL is causing directly.  You
might want to post to pgsql-jdbc, if you haven't already.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] psql tab completion for GRANT role

2012-06-14 Thread Robert Haas
On Sun, Jan 8, 2012 at 3:48 PM, Peter Eisentraut pete...@gmx.net wrote:
 psql tab completion currently only supports the form "GRANT privilege ON
 something TO someone" (and the analogous REVOKE), but not the form "GRANT
 role TO someone".  Here is a patch that attempts to implement the latter.
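(For reference, the two GRANT forms in question, in plain SQL; the object and
role names are invented for illustration:)

    GRANT SELECT ON accounts TO report_reader;   -- privilege on an object
    GRANT report_reader TO alice;                -- membership in a role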

This seems to have fallen through the cracks.  It doesn't apply any
more, but one general comment is that it seems undesirable to
repeatedly recapitulate the list of all privileges that exist in the
system.  That's a lot of places that someone will have to find and fix
when new privileges are added.

But +1 on the general idea.
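
(Purely to illustrate the point, not code from the patch: keeping the
privileges in one shared list that every GRANT/REVOKE completion rule
points at would make a new privilege a one-place change.  The array name
below is invented.)

static const char *const privilege_list[] = {
	"SELECT", "INSERT", "UPDATE", "DELETE", "TRUNCATE", "REFERENCES",
	"TRIGGER", "CREATE", "CONNECT", "TEMPORARY", "EXECUTE", "USAGE", "ALL",
	NULL
};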

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Gurjeet Singh
On Thu, Jun 14, 2012 at 1:29 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
 thank you very much for your patience (and thank you Marco for
 supporting
  me). I apologise for the delay.
 
 I have retested the updated patch and it works fine with me. It is
 ready
  for committer for me.

 Committed.


A minor gripe:

+ /*
+ * Close the backup label file.
+ */
+ if (ferror(lfp) || FreeFile(lfp)) {
+ereport(ERROR,
+(errcode_for_file_access(),
+errmsg("could not read file \"%s\": %m",
+ BACKUP_LABEL_FILE)));
+ }
+

If ferror(lfp) returns false, wouldn't we miss the FreeFile() and leak a
file pointer?

Regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 14 June 2012 17:35, Robert Haas robertmh...@gmail.com wrote:
 The problem with pre-detoasting to save comparison cycles is that you
 can now fit many, many fewer tuples in work_mem.  There might be cases
 where it wins (for example, because the entire data set fits even
 after decompressing everything) but in most cases it seems like a
 loser.

If I had to guess, I'd say you're probably right about that -
optimising sorting toasted text doesn't seem like a terribly sensible
use of your time.

What about the strxfrm suggestion of Greg's? You might find that the
added benefit of being able to avail of a highly optimised strcmp()
tipped the balance in favour of that idea, beyond the simple fact that
there's only a linear number of what you might loosely call strcoll_l
units of work rather than as many as O(n ^ 2). Furthermore, I'd
speculate that if you were to interlace the strxfrm() calls with
copying each text string, somewhere like within a specialised
datumCopy(), that would make the approach more efficient still, as you
specify a location for the blob in the just-palloc()'d leading-key
private memory directly, rather than just using memcpy.
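
A minimal standalone sketch of the strxfrm() approach, just for
concreteness (the SortKey struct and function names are invented; this is
not proposed PostgreSQL code): pay the locale transform once per key, then
compare the transformed blobs with plain strcmp().

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <locale.h>

typedef struct SortKey
{
	const char *original;		/* the value we ultimately want back */
	char	   *xfrm;			/* strxfrm() blob; strcmp()-comparable */
} SortKey;

static int
cmp_xfrm(const void *a, const void *b)
{
	return strcmp(((const SortKey *) a)->xfrm, ((const SortKey *) b)->xfrm);
}

int
main(void)
{
	const char *values[] = {"zebra", "Apple", "banana"};
	SortKey		keys[3];

	setlocale(LC_COLLATE, "");

	for (int i = 0; i < 3; i++)
	{
		size_t		need = strxfrm(NULL, values[i], 0) + 1;

		keys[i].original = values[i];
		keys[i].xfrm = malloc(need);
		strxfrm(keys[i].xfrm, values[i], need);
	}

	qsort(keys, 3, sizeof(SortKey), cmp_xfrm);

	for (int i = 0; i < 3; i++)
	{
		printf("%s\n", keys[i].original);
		free(keys[i].xfrm);
	}
	return 0;
}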

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 1:48 PM, Gurjeet Singh singh.gurj...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 1:29 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
     thank you very much for your patience (and thank you Marco for
  supporting
  me). I apologise for the delay.
 
     I have retested the updated patch and it works fine with me. It is
  ready
  for committer for me.

 Committed.


 A minor gripe:

 +     /*
 +     * Close the backup label file.
 +     */
 +     if (ferror(lfp) || FreeFile(lfp)) {
 +        ereport(ERROR,
 +            (errcode_for_file_access(),
 +            errmsg("could not read file \"%s\": %m",
 +                             BACKUP_LABEL_FILE)));
 +     }
 +

 If ferror(lfp) returns false, wouldn't we miss the FreeFile() and leak a
 file pointer?

Well, according to the comments for AllocateFile:

 * fd.c will automatically close all files opened with AllocateFile at
 * transaction commit or abort; this prevents FD leakage if a routine
 * that calls AllocateFile is terminated prematurely by ereport(ERROR).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Fujii Masao
On Fri, Jun 15, 2012 at 2:29 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
    thank you very much for your patience (and thank you Marco for supporting
 me). I apologise for the delay.

    I have retested the updated patch and it works fine with me. It is ready
 for committer for me.

 Committed.

I think the attached patch needs to be applied.

Regards,

-- 
Fujii Masao


doc_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 2 March 2012 20:45, Robert Haas robertmh...@gmail.com wrote:
 I decided to investigate the possible virtues of allowing text to
 use the sortsupport infrastructure, since strings are something people
 often want to sort.

I should mention up-front that I agree with the idea that it is worth
optimising text sorting because it is a very common thing to have to
do, and therefore the standard for inclusion ought to be lower. I
don't intend to talk about tapesort though - that isn't really fair,
not least because I have some serious doubts about the quality of our
implementation. Furthermore, I think that it is logical that doing
things like resolving collations occur within a preparatory function
in advance of sorting, rather than redundantly doing that for each and
every comparison.

Why have you made the reusable buffer managed by sortsupport
TEXTBUFLEN-aligned? The existing rationale for that constant (whose
value is 1024) does not seem to carry forward here:

 * This should be large enough that most strings will be fit, but small
 * enough that we feel comfortable putting it on the stack.

ISTM it would be on average worth the hit of having to repalloc a few
more times for larger strings by making that buffer much smaller
initially, and doubling its size each time that proved insufficient,
rather than increasing its size to the smallest possible
TEXTBUFLEN-aligned size that you can get away with for the immediately
subsequent memcpy. Realistically, any database I've ever worked with
would probably be able to fit a large majority of its text strings
into 16 chars of memory - you yourself said that sorting toasted text
isn't at all common.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 14 June 2012 17:35, Robert Haas robertmh...@gmail.com wrote:
 The problem with pre-detoasting to save comparison cycles is that you
 can now fit many, many fewer tuples in work_mem.  There might be cases
 where it wins (for example, because the entire data set fits even
 after decompressing everything) but in most cases it seems like a
 loser.

 If I had to guess, I'd say you're probably right about that -
 optimising sorting toasted text doesn't seem like a terribly sensible
 use of your time.

 What about the strxfrm suggestion of Greg's? You might find that the
 added benefit of being able to avail of a highly optimised strcmp()
 tipped the balance in favour of that idea, beyond the simple fact that
 there's only a linear number of what you might loosely call strcoll_l
 units of work rather than as many as O(n ^ 2). Furthermore, I'd
 speculate that if you were to interlace the strxfrm() calls with
 copying each text string, somewhere like within a specialised
 datumCopy(), that would make the approach more efficient still, as you
 specify a location for the blob in the just-palloc()'d leading-key
 private memory directly, rather than just using memcpy.

Well, it's still got the problem of blowing up memory usage.  I just
can't get excited about optimizing for the case where we can consume
10x the memory and still fit in work_mem.  If we've got that case, the
sort is gonna be pretty fast anyway.   The case where preprocessing
wins is when there are going to be a large number of comparisons
against each tuple - i.e. lg(N) is large.  But the cases where we
could pre-transform with strxfrm are those where the data fits in a
small percentage of work mem - i.e. lg(N) is small.  I'm open to
somebody showing up with a test result that demonstrates that it's
worthwhile, but to me it seems like it's chasing diminishing returns.

The point of this patch isn't really to improve things for the
collation-aware case, although it's nice that it does.  The point is
rather to shave off a double-digit percentage off the time it takes to
do the sort required to build a C-collation index, which is what
people should be using when they don't care about  and , which most
don't.  Despite Tom's concerns, I don't think there's anything in this
patch that can't be fairly easily revised at a later date if we decide
we want a different API.  I think it's worth picking the low-hanging
fruit in the meantime.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Why have you made the reusable buffer managed by sortsupport
 TEXTBUFLEN-aligned? The existing rationale for that constant (whose
 value is 1024) does not seem to carry forward here:

  * This should be large enough that most strings will be fit, but small
  * enough that we feel comfortable putting it on the stack.

Well, as the comments explain:

+   /*
+* We avoid repeated palloc/pfree on long strings by keeping the buffers
+* we allocate around for the duration of the sort.  When we expand them,
+* we round them off to the next multiple of TEXTBUFLEN in order to avoid
+* repeatedly expanding them by very small amounts.
+*/

 ISTM it would be on average worth the hit of having to repalloc a few
 more times for larger strings by making that buffer much smaller
 initially, and doubling its size each time that proved insufficient,
 rather than increasing its size to the smallest possible
 TEXTBUFLEN-aligned size that you can get away with for the immediately
 subsequent memcpy. Realistically, any database I've ever worked with
 would probably be able to fit a large majority of its text strings
 into 16 chars of memory - you yourself said that sorting toasted text
 isn't at all common.

I thought that doubling repeatedly would be overly aggressive in terms
of memory usage.  Blowing the buffers out to 8kB because we hit a
string that's a bit over 4kB isn't so bad, but blowing them out to
256MB because we hit a string that's a bit over 128MB seems a bit
excessive.
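
For concreteness, the rounding policy quoted above amounts to the
following (a standalone sketch assuming TEXTBUFLEN is 1024 as in varlena.c;
grow_rounded is an invented name, not the patch's code):

#define TEXTBUFLEN 1024

/* smallest multiple of TEXTBUFLEN that can hold 'needed' bytes */
static size_t
grow_rounded(size_t needed)
{
	return ((needed + TEXTBUFLEN - 1) / TEXTBUFLEN) * TEXTBUFLEN;
}

So a string a bit over 4kB grows the buffers to 5kB rather than to 8kB.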

Also, I don't think it really saves anything from a performance point
of view.  The worst case for this is if we're asked to repeatedly
compare strings that expand the buffer by a kilobyte each time.
First, how likely is that to happen in a real world test case?  In
many cases, most of the input strings will be of approximately equal
length; also in many cases, that length will be short; even if the
lengths take the worst possible distribution, we have to hit them in
the worst possible order for this to be a problem.  Second, even if it
does happen, does it really matter?  Suppose we expand the buffer a
kilobyte at a time from an initial size of 1kB all the way out to
256MB.  That's 256,000 palloc calls, so we must be sorting at least
256,000 datums, at least 128,000 of which are longer than 128MB.  I
think the cost of calling memcpy() and strcoll() repeatedly on all
those long datums - not to mention repeatedly detoasting them - is
going to bludgeon the palloc overhead into complete insignificance.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 2:05 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Fri, Jun 15, 2012 at 2:29 AM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 6:06 AM, Gabriele Bartolini
 gabriele.bartol...@2ndquadrant.it wrote:
    thank you very much for your patience (and thank you Marco for supporting
 me). I apologise for the delay.

    I have retested the updated patch and it works fine with me. It is ready
 for committer for me.

 Committed.

 I think the attached patch needs to be applied.

Thanks, committed.

(Somebody should really give you your own commit bit.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-14 Thread Tom Lane
Euler Taveira eu...@timbira.com writes:
 On 14-06-2012 02:19, Tom Lane wrote:
 ...  I especially think that importing bzip2
 is a pretty bad idea --- it's not only a new dependency, but bzip2's
 compression versus speed tradeoff is entirely inappropriate for this
 use-case.

 I see, the idea is that bzip2 would be a compiled-in option (not enabled by
 default) just to give another compression option.

I'm not particularly thrilled with "let's have more compression options
just to have options".  Each such option you add is another source of
fail-to-connect incompatibilities (when either the client or the server
doesn't have it).  Moreover, while there are a lot of compression
algorithms out there, a lot of them are completely unsuited for this
use-case.  If memory serves, bzip2 for example requires fairly large
data blocks in order to get decent compression, which means you are
either going to get terrible compression or suffer very bad latency
when trying to apply it to a connection data stream.

So I've got very little patience with the idea of "let's put in some
hooks and then great things will happen".  It would be far better all
around if we supported exactly one, well-chosen, method.  But really
I still don't see a reason not to let openssl do it for us.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Robert Haas
On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com wrote:
 This patch is problematic because formally indexes used by syscaches need to
 be unique; this one is not, though, because of 0/InvalidOids entries for
 nailed/shared catalog entries. Those values aren't allowed to be queried,
 though.

That's not the only reason it's not unique.  Take a look at
GetRelFileNode().  We really only guarantee that database OID,
tablespace OID, relfilenode, backend-ID, taken as a four-tuple, is
unique.  You could have the same relfilenode in different tablespaces,
or even within the same tablespace with different backend-IDs.  The
latter might not matter for you because you're presumably disregarding
temp tables, but the former probably does.  It's an uncommon scenario
because we normally set relid = relfilenode, and of course relid is
unique across the database, but if the table gets rewritten then you end
up with relid != relfilenode, and I don't think there's anything at
that point that will prevent the new relfilenode from being chosen as
some other relation's relfilenode, as long as it's in a different
tablespace.

I think the solution may be to create a specialized cache for this,
rather than relying on the general syscache infrastructure.  You might
look at, e.g., attoptcache.c for an example.  That would allow you to
build a cache that is aware of things like the relmapper
infrastructure, and the fact that temp tables are ignorable for your
purposes.  But I think you will need to include at least the
tablespace OID in the key along with the relfilenode to make it
bullet-proof.
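
To make that concrete, the key of such a specialized cache might look
roughly like this (assuming temp relations are ignored, so no backend-ID;
the struct names are invented, not existing PostgreSQL symbols):

#include "postgres.h"

typedef struct RelfilenodeLookupKey
{
	Oid			reltablespace;	/* relfilenode is only unique per tablespace */
	Oid			relfilenode;	/* on-disk file number */
} RelfilenodeLookupKey;

typedef struct RelfilenodeLookupEntry
{
	RelfilenodeLookupKey key;	/* hash key; must be first */
	Oid			relid;			/* pg_class OID the file currently maps to */
} RelfilenodeLookupEntry;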

I haven't read through the patch series far enough to know what this
is being used for yet, but my fear is that you're using it to handle
mapping a relfilenode extracted from the WAL stream back to a relation
OID.  The problem with that is that relfilenode assignments obey
transaction semantics.  So, if someone begins a transaction, truncates
a table, inserts a tuple, and commits, the heap_insert record is going
to refer to a relfilenode that, according to the system catalogs,
doesn't exist.  This is similar to one of the worries in my other
email, so I won't belabor the point too much more here...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila amit.kap...@huawei.com wrote:
 I am planning to work on the below Todo list item for this CommitFest
 Allow WAL information to recover corrupted pg_controldata
 http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you
should target any work you're starting now for the NEXT CommitFest.

 I wanted to confirm my understanding of the work involved for this patch:
 The existing patch has the following set of problems:
    1. Memory leak, and the linked list code path is not proper
    2. The lock check for whether the server is already running is removed in
 the patch, which needs to be reverted
    3. Refactoring of the code is needed.

 Apart from the above, what I understood from the patch is that its intention
 is to generate values for ControlFile using WAL logs when the -r option is used.

 The change in algorithm from the current one will be: if the control file is
 corrupt, which essentially means ReadControlFile() will return false, then it
 should generate values (checkPointCopy, checkPoint, prevCheckPoint, state)
 using WAL if the -r option is enabled.

 Also, for the -r option, it doesn't need to call FindEndOfXLOG(), as that
 work will be achieved by the above point.

 It will just rewrite the control file and not do the other resets.


 The algorithm for restoring the pg_control values from old xlog files:
    1. Retrieve all of the active xlog files from the xlog directory into a list
 in increasing order, according to their timeline, log id, and segment id.
    2. Search the list to find the oldest xlog file of the latest timeline.
    3. Search the records from the oldest xlog file of the latest timeline to
 the latest xlog file of the latest timeline; if a checkpoint record
       has been found, update the latest checkpoint and previous checkpoint.



 Apart from the above, some changes in code will be required after the Xlog patch
 by Heikki.

 Please let me know if my understanding is correct.

I guess my first question is: why do we need this?  There are lots of
things in the TODO list that someone wanted once upon a time, but
they're not all actually important.  Do you have reason to believe
that this one is?  It's been six years since that email, so it's worth
asking if this is actually relevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Gurjeet Singh
On Thu, Jun 14, 2012 at 2:02 PM, Robert Haas robertmh...@gmail.com wrote:

 On Thu, Jun 14, 2012 at 1:48 PM, Gurjeet Singh singh.gurj...@gmail.com
 wrote:
  On Thu, Jun 14, 2012 at 1:29 PM, Robert Haas robertmh...@gmail.com
 wrote:
  Committed.
 
 
  A minor gripe:
 
  + /*
  + * Close the backup label file.
  + */
  + if (ferror(lfp) || FreeFile(lfp)) {
  +ereport(ERROR,
  +(errcode_for_file_access(),
  +errmsg("could not read file \"%s\": %m",
  + BACKUP_LABEL_FILE)));
  + }
  +
 
  If ferror(lfp) returns false, wouldn't we miss the FreeFile() and leak a
  file pointer?

 Well, according to the comments for AllocateFile:

  * fd.c will automatically close all files opened with AllocateFile at
  * transaction commit or abort; this prevents FD leakage if a routine
  * that calls AllocateFile is terminated prematurely by ereport(ERROR).


I bet anyone else looking at this code, who is not in the know, will trip
over this again.

Another problem with that code block is that it will throw "could not read"
even though the read has succeeded but FreeFile() failed.

I say we break it into two blocks, one to handle read error, and then close
the file separately. Also, either make sure FreeFile() is called in all
code paths, or not call FreeFile() at all and refer to the comment
above AllocateFile().
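
A rough sketch of the two-block arrangement (for illustration only, not the
attached patch itself):

	if (ferror(lfp))
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not read file \"%s\": %m",
						BACKUP_LABEL_FILE)));

	if (FreeFile(lfp))
		ereport(ERROR,
				(errcode_for_file_access(),
				 errmsg("could not close file \"%s\": %m",
						BACKUP_LABEL_FILE)));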

Patch attached.

Regards,
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


FreeFile.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] COMMENT on function's arguments

2012-06-14 Thread Robert Haas
On Tue, Jun 12, 2012 at 10:59 PM, Vlad Arkhipov arhi...@dc.baikal.ru wrote:
 Does it make sense to have a comment on function's arguments? Of course it
 is possible to include these comments in a function's comment, but may be
 better to have them in more formalized way like comments on columns of a
 table. IDEs may use this information when providing hints for a function
 like in other languages.

This would be somewhat tricky, because our COMMENT support assumes
that the object upon which we're commenting has an ObjectAddress, and
individual arguments to a function don't, although perhaps the
sub-object-id stuff that we currently use to handle comments on table
columns could be extended to handle this case.  I guess I wouldn't
object to a well-done patch that made this work, but creating such a
patch seems likely to be tricky, owing to the fact that there's
nothing in the system that thinks of the individual arguments to a
function as separate objects at present.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 3:10 PM, Gurjeet Singh singh.gurj...@gmail.com wrote:
 Well, according to the comments for AllocateFile:

  * fd.c will automatically close all files opened with AllocateFile at
  * transaction commit or abort; this prevents FD leakage if a routine
  * that calls AllocateFile is terminated prematurely by ereport(ERROR).

 I bet anyone else looking at this code, who is not in the know, will trip
 over this again.

 Another problem with that code block is that it will throw "could not read"
 even though the read has succeeded but FreeFile() failed.

 I say we break it into two blocks, one to handle read error, and then close
 the file separately. Also, either make sure FreeFile() is called in all code
 paths, or not call FreeFile() at all and refer to the comment above
 AllocateFile().

 Patch attached.

I agree with breaking it into two blocks, but I don't agree that the
comments need to recapitulate how AllocateFile works.  Also, you had
the same primary content for both comments, and you incorrectly
reversed the sense of the FreeFile() test.

Committed with those changes.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote:
 I thought that doubling repeatedly would be overly aggressive in terms
 of memory usage.  Blowing the buffers out to 8kB because we hit a
 string that's a bit over 4kB isn't so bad, but blowing them out to
 256MB because we hit a string that's a bit over 128MB seems a bit
 excessive.

That's pretty much what all popular dynamic array implementations do,
from C++'s std::vector to Python's list (it's a misnomer). Having to
allocate 256MB for a buffer to contain a string a bit over 128MB may
seem excessive, until you later get a 250MB string. Even if doubling
is generally excessive, which I doubt, that's beside the point, which
is that expanding the array by some constant proportion results in
each insertion taking amortized constant time.
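
The policy in question is the textbook one; as a sketch (grow_doubling is
an invented name, not code from the patch):

/* grow geometrically: double the capacity until 'needed' fits */
static size_t
grow_doubling(size_t current, size_t needed)
{
	size_t		newsize = (current > 0) ? current : 16;

	while (newsize < needed)
		newsize *= 2;
	return newsize;
}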

 Suppose we expand the buffer a
 kilobyte at a time from an initial size of 1kB all the way out to
 256MB.  That's 256,000 palloc calls, so we must be sorting at least
 256,000 datums, at least 128,000 of which are longer than 128MB.  I
 think the cost of calling memcpy() and strcoll() repeatedly on all
 those long datums - not to mention repeatedly detoasting them - is
 going to bludgeon the palloc overhead into complete insignificance.

I fail to understand how this sortsupport buffer fundamentally differs
from a generic dynamic array abstraction built to contain chars. That
being the case, I see no reason not to just do what everyone else does
when expanding dynamic arrays, and no reason why we shouldn't make
essentially the same time-space trade-off here as others do elsewhere.

Another concern is that it seems fairly pointless to have two buffers.
Wouldn't it be more sensible to have a single buffer that was
partitioned to make two logical, equally-sized buffers, given that in
general each buffer is expected to grow at exactly the same rate?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote:
 I thought that doubling repeatedly would be overly aggressive in terms
 of memory usage.  Blowing the buffers out to 8kB because we hit a
 string that's a bit over 4kB isn't so bad, but blowing them out to
 256MB because we hit a string that's a bit over 128MB seems a bit
 excessive.

 That's pretty much what all popular dynamic array implementations do,
 from C++'s std::vector to Python's list (it's a misnomer). Having to
 allocate 256MB for a buffer to contain a string a bit over 128MB may
 seem excessive, until you later get a 250MB string. Even if doubling
 is generally excessive, which I doubt, that's beside the point, which
 is that expanding the array by some constant proportion results in
 each insertion taking amortized constant time.

Yeah, but *it doesn't matter*.  If you test this on strings that are
long enough that they get pushed out to TOAST, you'll find that it
doesn't measurably improve performance, because the overhead of
detoasting so completely dominates any savings on the palloc side that
you can't pick them out of the inter-run noise.  Risking eating up an
extra 100MB of memory that we don't really need in order to obtain a
performance optimization that is far too small to measure does not
make sense.  The case with std::vector is not analogous; they don't
have any way of knowing what other overhead you are incurring between
insertions into the vector, so it's reasonable to suppose that the
cost of the vector insertions themselves might be material.  Here we
know that it doesn't matter, so the application of Knuth's first law
of optimization is appropriate.

 Suppose we expand the buffer a
 kilobyte at a time from an initial size of 1kB all the way out to
 256MB.  That's 256,000 palloc calls, so we must be sorting at least
 256,000 datums, at least 128,000 of which are longer than 128MB.  I
 think the cost of calling memcpy() and strcoll() repeatedly on all
 those long datums - not to mention repeatedly detoasting them - is
 going to bludgeon the palloc overhead into complete insignificance.

 I fail to understand how this sortsupport buffer fundamentally differs
 from a generic dynamic array abstraction built to contain chars. That
 being the case, I see no reason not to just do what everyone else does
 when expanding dynamic arrays, and no reason why we shouldn't make
 essentially the same time-space trade-off here as others do elsewhere.

 Another concern is that it seems fairly pointless to have two buffers.
 Wouldn't it be more sensible to have a single buffer that was
 partitioned to make two logical, equally-sized buffers, given that in
 general each buffer is expected to grow at exactly the same rate?

Sure, but it would be making the code more complicated in return for
no measurable performance benefit.  We generally avoid that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch pg_is_in_backup()

2012-06-14 Thread Gurjeet Singh
On Thu, Jun 14, 2012 at 3:20 PM, Robert Haas robertmh...@gmail.com wrote:

 Also, you had
 the same primary content for both comments, and you incorrectly
 reversed the sense of the FreeFile() test.


I am known for my fat fingers :)


 Committed with those changes.


Thanks!
-- 
Gurjeet Singh
EnterpriseDB Corporation
The Enterprise PostgreSQL Company


Re: [HACKERS] libpq compression

2012-06-14 Thread Merlin Moncure
On Thu, Jun 14, 2012 at 1:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So I've got very little patience with the idea of let's put in some
 hooks and then great things will happen.  It would be far better all
 around if we supported exactly one, well-chosen, method.  But really
 I still don't see a reason not to let openssl do it for us.

Well, for toast compression the right choice is definitely one of the
lz based algorithms (not libz!).  For transport compression you have
the case of sending large data over very slow and/or expensive links
in which case you want to use bzip type methods.  But in the majority
of cases I'd probably be using lz there too.  So if I had to pick just
one, there you go.  But which one? the lz algorithm with arguably the
best pedigree (lzo) is gnu but there are many other decent candidates,
some of which have really tiny implementations.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-14 Thread k...@rice.edu
On Thu, Jun 14, 2012 at 02:38:02PM -0500, Merlin Moncure wrote:
 On Thu, Jun 14, 2012 at 1:43 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  So I've got very little patience with the idea of let's put in some
  hooks and then great things will happen.  It would be far better all
  around if we supported exactly one, well-chosen, method.  But really
  I still don't see a reason not to let openssl do it for us.
 
 Well, for toast compression the right choice is definitely one of the
 lz based algorithms (not libz!).  For transport compression you have
 the case of sending large data over very slow and/or expensive links
 in which case you want to use bzip type methods.  But in the majority
 of cases I'd probably be using lz there too.  So if I had to pick just
 one, there you go.  But which one? the lz algorithm with arguably the
 best pedigree (lzo) is gnu but there are many other decent candidates,
 some of which have really tiny implementations.
 
 merlin
 

+1 for a very fast compressor/de-compressor. lz4 from Google has
a BSD license and, at 8.5X faster compression than zlib(-1) and
5X faster de-compression than zlib(-1) (2X faster than LZO even),
would be my pick.

Regards,
Ken

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC][PATCH] Logical Replication/BDR prototype and architecture

2012-06-14 Thread Andres Freund
Hi Robert,

Thanks for your answer.

On Thursday, June 14, 2012 06:17:26 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:27 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  === Design goals for logical replication === :
  - in core
  - fast
  - async
  - robust
  - multi-master
  - modular
  - as unintrusive as possible implementation wise
  - basis for other technologies (sharding, replication into other DBMSs,
  ...)
 
 I agree with all of these goals except for multi-master.  I am not
 sure that there is a need to have a multi-master replication solution
 in core.  The big tricky part of multi-master replication is conflict
 resolution, and that seems like an awful lot of logic to try to build
 into core - especially given that we will want it to be extensible.
I don't plan to throw in loads of conflict resolution smarts. The aim is to get 
to the place where all the infrastructure is there so that a MM solution can 
be built by basically plugging in a conflict resolution mechanism. Maybe 
providing a very simple one.
I think without in-core support it's really, really hard to build a sensible MM
implementation. Which doesn't mean it has to live entirely in core.

Loads of the use-cases we have seen lately have a relatively small, low-
conflict shared dataset and a far bigger sharded one. While that obviously 
isn't the only relevant use case, it is a sensible and important one.

 More generally, I would much rather see us focus on efficiently
 extracting changesets from WAL and efficiently applying those
 changesets on another server.  IMHO, those are the things that are
 holding back the not-in-core replication solutions we have today,
 particularly the first.  If we come up with a better way of applying
 change-sets, well, Slony can implement that too; they are already
 installing C code.  What neither they nor any other non-core solution
 can implement is change-set extraction, and therefore I think that
 ought to be the focus.
It definitely is a very important focus. I don't think it is the only one.  But 
that doesn't seem to be a problem to me as long as everything is kept fairly 
modular (which I have tried rather hard to do).

 To put all that another way, I think it is a 100% bad idea to try to
 kill off Slony, Bucardo, Londiste, or any of the many home-grown
 solutions that are out there to do replication.  Even if there were no
 technical place for third-party replication products (and I think
 there is), we will not win many friends by making it harder to extend
 and add on to the server.  If we build an in-core replication solution
 that can be used all by itself, that is fine with me.  But I think it
 should also be able to expose its constituent parts as a toolkit for
 third-party solutions.
I agree 100%. Unfortunately I forgot to explicitly make that point, but the
plan definitely is to make the life of other replication solutions easier,
not harder. I don't think there will ever be one replication solution that fits
every use-case perfectly.
At pgcon I talked with some of the Slony guys and they were definitely
interested in the changeset generation and I have kept that in mind. If some
problems that need resolving independently of that issue are resolved (namely DDL),
it shouldn't take much to generate their output format. The 'apply' code is
fully abstracted and separated.

  While you may argue that most of the above design goals are already
  provided by various trigger based replication solutions like Londiste or
  Slony, we think that thats not enough for various reasons:
  
  - not in core (and thus less trustworthy)
  - duplication of writes due to an additional log
  - performance in general (check the end of the above presentation)
  - complex to use because there is no native administration interface
 
 I think that your parenthetical note (and thus less trustworthy)
 gets at another very important point, which is that one of the
 standards for inclusion in core is that it must in fact be trustworthy
 enough to justify the confidence that users will place in it.  It will
 NOT benefit the project to have two replication solutions in core, one
 of which is crappy.  More, even if what we put in core is AS GOOD as
 the best third-party solutions that are available, I don't think
 that's adequate.  It has to be better.  If it isn't, there is no
 excuse for preempting what's already out there.
I agree that it has to be very good. *But* I think it is totally acceptable
if it doesn't have all the bells and whistles from the start; requiring that
would be a sure road to disaster. For one thing, implementing all of that takes
time, and for another, the amount of discussion until we are there is rather huge.

 I imagine you are thinking along similar lines, but I think that it
 bears being explicit about.
Seems like we're thinking along the same lines, yes.

  The biggest problem is, that interpreting tuples in the WAL stream
  requires an up-to-date system catalog and needs to be done in a
  compatible backend 

Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 08:50:51 PM Robert Haas wrote:
 On Wed, Jun 13, 2012 at 7:28 AM, Andres Freund and...@2ndquadrant.com 
wrote:
  This patch is problematic because formally indexes used by syscaches
  need to be unique; this one is not, though, because of 0/InvalidOids
  entries for nailed/shared catalog entries. Those values aren't allowed
  to be queried, though.
 That's not the only reason it's not unique.  Take a look at
 GetRelFileNode().  We really only guarantee that database OID,
 tablespace OID, relfilenode, backend-ID, taken as a four-tuple, is
 unique.  You could have the same relfilenode in different tablespaces,
 or even within the same tablespace with different backend-IDs.  The
 latter might not matter for you because you're presumably disregarding
 temp tables, but the former probably does.  It's an uncommon scenario
 because we normally set relid = relfilenode, and of course relid is
 unique across the database, but if the table gets rewritten then you end
 up with relid != relfilenode, and I don't think there's anything at
 that point that will prevent the new relfilenode from being chosen as
 some other relation's relfilenode, as long as it's in a different
 tablespace.
 
 I think the solution may be to create a specialized cache for this,
 rather than relying on the general syscache infrastructure.  You might
 look at, e.g., attoptcache.c for an example.  That would allow you to
 build a cache that is aware of things like the relmapper
 infrastructure, and the fact that temp tables are ignorable for your
 purposes.  But I think you will need to include at least the
 tablespace OID in the key along with the relfilenode to make it
 bullet-proof.
Yes, the tablespace OID should definitely be in there. I need to read up on the
details of a dedicated cache. Once more, I didn't want to put in more work before
discussing it here.

 I haven't read through the patch series far enough to know what this
 is being used for yet, but my fear is that you're using it to handle
 mapping a relfilenode extracted from the WAL stream back to a relation
 OID.  The problem with that is that relfilenode assignments obey
 transaction semantics.  So, if someone begins a transaction, truncates
 a table, inserts a tuple, and commits, the heap_insert record is going
 to refer to a relfilenode that, according to the system catalogs,
 doesn't exist.  This is similar to one of the worries in my other
 email, so I won't belabor the point too much more here...
Well, yes. We *need* to do the mapping back from the relfilenode to a table.
The idea is that because the receiving side, be it a full cluster or
just a catalog-only one, has a fully synchronized catalog in which DDL gets applied
correctly, inside the transaction just as on the sending side, it should never
be wrong to do that mapping.
It probably is necessary to make the syscache lookup/infrastructure use an
MVCC-ish snapshot though. No idea how hard that would be yet. Might be a good
argument in favor of your suggestion of using a specialized cache.

Let's sidetrack this till we have a tender agreement on how to handle DDL ;). I
am aware of the issues with rollbacks, truncate et al...

Thanks,

Andres

-- 
Andres Freund   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:
 Let's sidetrack this till we have a tender agreement on how to handle DDL ;). I
 am aware of the issues with rollbacks, truncate et al...

Agreed; I will write up my thoughts about DDL on the other thread.  I
think that's a key thing we need to figure out; once we understand how
we're handling that, the correct design for this will probably fall
out pretty naturally.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 03/16] Add a new syscache to fetch a pg_class entry via its relfilenode

2012-06-14 Thread Christopher Browne
On Thu, Jun 14, 2012 at 5:00 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Jun 14, 2012 at 4:51 PM, Andres Freund and...@2ndquadrant.com wrote:
 Let's sidetrack this till we have a tender agreement on how to handle DDL ;).
 I am aware of the issues with rollbacks, truncate et al...

 Agreed; I will write up my thoughts about DDL on the other thread.  I
 think that's a key thing we need to figure out; once we understand how
 we're handling that, the correct design for this will probably fall
 out pretty naturally.

I wonder if *an* answer (if not necessarily a perfect one) is to provide
a capturable injection point.

A possible shape of this would be to have a function to which you pass
a DDL statement, at which point, it does two things:
a) Runs the DDL, to make the requested change, and
b) Captures the DDL in a convenient form in the WAL log so that it may
be detected and replayed at the right point in processing.

That's not a solution for capturing DDL automatically, but it's
something that would be useful-ish even today, for systems like Slony
and Londiste, and this would be natural to connect to Dimitri's Event
Triggers.
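
As a very rough illustration of that shape (replicate_ddl and the ddl_log
table are invented for this sketch; nothing like this exists in core), the
injection point could be a C-language function that runs the statement via
SPI and then records it for replay:

#include "postgres.h"
#include "fmgr.h"
#include "catalog/pg_type.h"
#include "executor/spi.h"
#include "utils/builtins.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(replicate_ddl);

Datum
replicate_ddl(PG_FUNCTION_ARGS)
{
	char	   *ddl = text_to_cstring(PG_GETARG_TEXT_PP(0));
	Oid			argtypes[1] = {TEXTOID};
	Datum		values[1];

	SPI_connect();

	/* a) run the DDL, so the local catalogs change inside this transaction */
	SPI_execute(ddl, false, 0);

	/*
	 * b) capture it -- here into a table, though it could equally be a
	 * special WAL record that replay/extraction recognizes at this position
	 */
	values[0] = CStringGetTextDatum(ddl);
	SPI_execute_with_args("INSERT INTO ddl_log(stmt) VALUES ($1)",
						  1, argtypes, values, NULL, false, 0);

	SPI_finish();
	PG_RETURN_VOID();
}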

That also fits with the desire to have components that are (at least
hopefully) usable by other existing replication systems.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-14 Thread Heikki Linnakangas

On 13.06.2012 14:28, Andres Freund wrote:

Features:
- streaming reading/writing
- filtering
- reassembly of records

Reusing the ReadRecord infrastructure in situations where the code that wants
to do so is not tightly integrated into xlog.c is rather hard and would require
changes to rather integral parts of the recovery code which doesn't seem to be
a good idea.


It would be nice to refactor ReadRecord and its subroutines out of xlog.c.
That file has grown over the years to be really huge, and separating the 
code to read WAL sounds like it should be a pretty natural split. I 
don't want to duplicate all the WAL reading code, so we really should 
find a way to reuse that. I'd suggest rewriting ReadRecord into a thin 
wrapper that just calls the new xlogreader code.



Missing:
- compressing the stream when removing uninteresting records
- writing out correct CRCs
- validating CRCs
- separating reader/writer


- comments.

At a quick glance, I couldn't figure out how this works. There seems to 
be some callback functions? If you want to read an xlog stream using 
this facility, what do you do? Can this be used for writing WAL, as well 
as reading? If so, what do you need the write support for?


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: relation metapages

2012-06-14 Thread Heikki Linnakangas

On 14.06.2012 18:01, Robert Haas wrote:

What I'm really looking for at this stage of the game is feedback on
the design decisions I made.  The intention here is that it should be
possible to read old-format heaps and indexes transparently, but that
when we create or rewrite a relation, we add a new-style metapage.


That dodges the problem of pg_upgrade, but eventually, we will want to 
use the metapage for something important, and it can't be optional at 
that point anymore. So we will eventually need to deal with pg_upgrade 
somehow.



I put the new metapage code in src/backend/access/common/metapage.c,
but I don't have a lot of confidence that that's the appropriate
location for it.  Suggestions are appreciated.


That seems good to me.

It would be nice to have the oid of the access method in the metapage 
(or some other way to identify it).
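
Just to illustrate what that might look like (field and type names are
invented, not from the patch under review):

#include "postgres.h"

typedef struct MetapageHeaderData
{
	uint32		mp_magic;		/* identifies the page as a generic metapage */
	uint32		mp_version;		/* version of the generic metapage layout */
	Oid			mp_am;			/* access method OID, or InvalidOid for heaps */
	uint32		mp_am_version;	/* AM-specific on-disk version number */
} MetapageHeaderData;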


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
 On 13.06.2012 14:28, Andres Freund wrote:
  Features:
  - streaming reading/writing
  - filtering
  - reassembly of records
  
  Reusing the ReadRecord infrastructure in situations where the code that
  wants to do so is not tightly integrated into xlog.c is rather hard and
  would require changes to rather integral parts of the recovery code
  which doesn't seem to be a good idea.
 It would be nice refactor ReadRecord and its subroutines out of xlog.c.
 That file has grown over the years to be really huge, and separating the
 code to read WAL sounds like it should be a pretty natural split. I
 don't want to duplicate all the WAL reading code, so we really should
 find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
 wrapper that just calls the new xlogreader code.
I agree that it is not very nice to duplicate it. But I also don't want to go
the route of replacing ReadRecord with it for a while; we can replace
ReadRecord later if we want. As long as it is in flux like it is right now I
don't really see the point in investing energy in it.
Also, I am not that sure how a callback-oriented API fits into the xlog.c
workflow.

  Missing:
  - compressing the stream when removing uninteresting records
  - writing out correct CRCs
  - validating CRCs
  - separating reader/writer
 
 - comments.
 At a quick glance, I couldn't figure out how this works. There seems to
 be some callback functions? If you want to read an xlog stream using
 this facility, what do you do?
You currently have to fill out 4 callbacks:

XLogReaderStateInterestingCB is_record_interesting;
XLogReaderStateWriteoutCB writeout_data;
XLogReaderStateFinishedRecordCB finished_record;
XLogReaderStateReadPageCB read_page;

As an example of how to use it (from the walsender support for
START_LOGICAL_REPLICATION):

if (!xlogreader_state)
{
    xlogreader_state = XLogReaderAllocate();
    xlogreader_state->is_record_interesting =
        RecordRelevantForLogicalReplication;
    xlogreader_state->finished_record = ProcessRecord;
    xlogreader_state->writeout_data = WriteoutData;
    xlogreader_state->read_page = XLogReadPage;

    /* startptr is the current XLog position */
    xlogreader_state->startptr = startptr;

    XLogReaderReset(xlogreader_state);
}

/* how far does valid data go */
xlogreader_state->endptr = endptr;

XLogReaderRead(xlogreader_state);

The last step will then call the above callbacks till it reaches endptr. I.e. 
it first reads a page with read_page; then checks whether a record is 
interesting for the use-case (is_record_interesting); in case it is 
interesting, it gets reassembled and passed to the finished_record callback. 
Then the bytestream gets written out again with writeout_data.

In this case it gets written to the buffer the walsender has allocated. In 
others it might just get thrown away.

 Can this be used for writing WAL, as well as reading? If so, what do you
 need the write support for?
It currently can replace records which are not interesting (e.g. index changes 
in the case of logical rep). Filtered records are replaced with XLOG_NOOP 
records with correct length currently. In future the actual amount of data 
should really be reduced. I don't yet know how to map LSNs of the
uncompressed/compressed stream onto each other...
The filtered data is then passed to a writeout callback (in a streaming 
fashion).

The whole writing-out part is pretty ugly at the moment and I just bolted it
on top because it was convenient for the moment. I am not yet sure how the API
for that should look.

Andres

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] measuring spinning

2012-06-14 Thread Robert Haas
On Wed, Jan 11, 2012 at 8:48 PM, Robert Haas robertmh...@gmail.com wrote:
 I've had cause, a few times this development cycle, to want to measure
 the amount of spinning on each lwlock in the system.  To that end,
 I've found the attached patch useful.  Note that if you don't define
 LWLOCK_STATS, this changes nothing except that the return value from
 s_lock becomes int rather than void.  If you do define LWLOCK_STATS,
 then LWLockAcquire() counts the number of pg_usleep() calls that are
 required to acquire each LWLock, in addition to the other statistics.
 Since this has come up for me a few times now, I'd like to proposing
 including it in core.

Well, this fell through the cracks, because I forgot to add it to the
January CommitFest.  Here it is again, rebased.
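
For readers without the patch handy, the counting idea is roughly this
(a standalone sketch with invented names; the actual patch makes s_lock()
return the number of delays and has LWLockAcquire() accumulate it when
LWLOCK_STATS is defined):

#include <unistd.h>

/* acquire a test-and-set lock, returning how many delay sleeps were needed */
static int
spin_acquire_counted(volatile int *lock)
{
	int			delays = 0;

	while (__sync_lock_test_and_set(lock, 1))
	{
		delays++;
		usleep(1000);			/* stand-in for s_lock()'s back-off sleep */
	}
	return delays;
}

A caller interested in per-lock statistics would then just do something
like spin_delay_count[lockid] += spin_acquire_counted(&lock).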

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


spindebug-v2.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 06/16] Add support for a generic wal reading facility dubbed XLogReader

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:19:00 PM Heikki Linnakangas wrote:
 On 13.06.2012 14:28, Andres Freund wrote:
  Features:
  - streaming reading/writing
  - filtering
  - reassembly of records
  
  Reusing the ReadRecord infrastructure in situations where the code that
  wants to do so is not tightly integrated into xlog.c is rather hard and
  would require changes to rather integral parts of the recovery code
  which doesn't seem to be a good idea.
 
 It would be nice refactor ReadRecord and its subroutines out of xlog.c.
 That file has grown over the years to be really huge, and separating the
 code to read WAL sounds like it should be a pretty natural split. I
 don't want to duplicate all the WAL reading code, so we really should
 find a way to reuse that. I'd suggest rewriting ReadRecord into a thin
 wrapper that just calls the new xlogreader code.
 
  Missing:
  - compressing the stream when removing uninteresting records
  - writing out correct CRCs
  - validating CRCs
  - separating reader/writer
 
 - comments.
 
 At a quick glance, I couldn't figure out how this works. There seems to
 be some callback functions? If you want to read an xlog stream using
 this facility, what do you do? Can this be used for writing WAL, as well
 as reading? If so, what do you need the write support for?
Oh, btw, the callbacks and parameters are somewhat documented in the 
xlogreader.h header in the XLogReaderState struct.
Still needs improvement though.

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote:
 As I threatened earlier
 (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.com),
 here are three patches that change the WAL format. The goal is to
 change the format so that when you're inserting a WAL record of a given
 size, you know exactly how much space it requires in the WAL.
I fear the patches need rebasing after the pgindent run... Even before that 
(60801944fa105252b48ea5688d47dfc05c695042) it only applies with offsets?

Greetings,

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WAL format changes

2012-06-14 Thread Andres Freund
On Thursday, June 14, 2012 11:01:42 PM Heikki Linnakangas wrote:
 As I threatened earlier
 (http://archives.postgresql.org/message-id/4fd0b1ab.3090...@enterprisedb.com),
 here are three patches that change the WAL format. The goal is to
 change the format so that when you're inserting a WAL record of a given
 size, you know exactly how much space it requires in the WAL.
 
 1. Use a 64-bit segment number, instead of the log/seg combination. And
 don't waste the last segment on each logical 4 GB log file. The concept
 of a logical log file is now completely gone. XLogRecPtr is unchanged,
 but it should now be understood as a plain 64-bit value, just split into
 two 32-bit integers for historical reasons. On disk, this means that
 there will be log files ending in FF, those were skipped before.
What's the reason for keeping that awkward split now? There aren't that many
users of xlogid/xrecoff, and many of those would be better served by using
helper macros.
API compatibility isn't a great argument either, as code manually playing
around with those needs to be checked anyway. I think there might be some code
around that does XLogRecPtr addition manually and such.
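
For example, helper macros of roughly this shape (invented names, assuming
the new interpretation of XLogRecPtr as a plain 64-bit value split into
xlogid/xrecoff) would cover most of those users:

#define XLogRecPtrToUInt64(ptr) \
	(((uint64) (ptr).xlogid << 32) | (uint64) (ptr).xrecoff)

#define UInt64ToXLogRecPtr(val, ptr) \
	do { \
		(ptr).xlogid = (uint32) ((val) >> 32); \
		(ptr).xrecoff = (uint32) (val); \
	} while (0)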

 2. Always include the xl_rem_len field, used for continuation records,
 in the xlog page header. A continuation log record only contained that
 one field, it's now included straight in the page header, so the concept
 of a continuation record doesn't exist anymore. Because of alignment,
 this wastes 4 bytes on every page that contains continued data from a
 previous record, and 8 bytes on pages that don't. That's not very much,
 and the next step will buy that back:
 
 3. Allow WAL record header to be split across pages. Per Tom's
 suggestion, move xl_tot_len to be the first field in XLogRecord, so that
 even if the header is split, xl_tot_len is always on the first page.
 xl_crc is moved to be the last field, and xl_prev is the second to last.
 This has the advantage that you can calculate the CRC for all the other
 fields before acquiring WALInsertLock. For xl_prev, you need to know
 where exactly the record is inserted, so it's handy that it's the last
 field before CRC. This patch doesn't try to take advantage of that,
 however, and I'm not sure if that makes any difference once I finish the
 patch to make XLogInsert scale better, which is the ultimate goal of all
 this.
 
 Those are the three patches I'd like to get committed in this
 commitfest. To see where all this is leading to, I've included a rough
 WIP version of the XLogInsert scaling patch. This version is quite
 different from the one I posted in spring, it takes advantage of the WAL
 format changes, and I'm also experimenting with a different method of
 tracking how far each WAL insertion has progressed. But more on that later.
 
 (Note to self: remember to bump XLOG_PAGE_MAGIC)
Will review.
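
For reference while reviewing: the ordering described in point 3 should make
the record header look roughly like the sketch below (my own reading of the
description above, not the struct from the patch):

typedef struct XLogRecordSketch
{
    uint32        xl_tot_len;  /* total length; first, so it is always on the first page */
    TransactionId xl_xid;      /* xact id */
    uint32        xl_len;      /* length of rmgr data */
    uint8         xl_info;     /* flag bits */
    RmgrId        xl_rmid;     /* resource manager */
    XLogRecPtr    xl_prev;     /* ptr to previous record; second to last */
    pg_crc32      xl_crc;      /* CRC; last, so the rest can be summed before insertion */
} XLogRecordSketch;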

Andres
-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 14 June 2012 20:32, Robert Haas robertmh...@gmail.com wrote:
 Yeah, but *it doesn't matter*.  If you test this on strings that are
 long enough that they get pushed out to TOAST, you'll find that it
 doesn't measurably improve performance, because the overhead of
 detoasting so completely dominates any savings on the palloc side that
 you can't pick them out of the inter-run noise.

That's probably true, but it's also beside the point. As recently as a
few hours ago, you yourself said "my guess is that most values people
sort by are pretty short, making this concern mostly academic". Why
are you getting hung up on toasting now?

 Here we know that it doesn't matter, so the application of Knuth's first law
 of optimization is appropriate.

I'm not advocating some Byzantine optimisation, or even something that
could reasonably be described as an optimisation at all here. I'm
questioning why you've unnecessarily complicated the code by having
the buffer size just big enough to fit the biggest value seen so far,
but arbitrarily aligned to a value that is completely irrelevant to
bttextfastcmp_locale(), rather than using simple geometric expansion,
which is more or less the standard way of managing the growth of a
dynamic array.
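
To be concrete, by geometric expansion I mean nothing fancier than this (a
sketch with made-up variable names, assuming buflen starts at some small fixed
size):

if (needed > buflen)
{
    /* keep doubling until the new value fits */
    do
        buflen *= 2;
    while (needed > buflen);

    pfree(buf);
    buf = palloc(buflen);
}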

You have to grow the array in some way. The basic approach I've
outlined has something to recommend it - why does it make sense to
align the size of the buffer to TEXTBUFLEN in particular though? It's
quite easy to imagine what you've done here resulting in an excessive
number of allocations (and pfree()s), which *could* be expensive. If
you're so conservative about allocating memory, don't grow the array
at quite so aggressive a rate as doubling it each time.

There is a trade-off between space and time to be made here, but I
don't know why you think that the right choice is to use almost the
smallest possible amount of memory in all cases.

 Another concern is that it seems fairly pointless to have two buffers.
 Wouldn't it be more sensible to have a single buffer that was
 partitioned to make two logical, equally-sized buffers, given that in
 general each buffer is expected to grow at exactly the same rate?

 Sure, but it would be making the code more complicated in return for
 no measurable performance benefit.  We generally avoid that.

Fair enough.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] transforms

2012-06-14 Thread Peter Eisentraut
Here is my first patch for the transforms feature.  This is a mechanism
to adapt data types to procedural languages.  The previous proposal was
here: http://archives.postgresql.org/pgsql-hackers/2012-05/msg00728.php

At the surface, this contains:

- New catalog pg_transform

- CREATE/DROP TRANSFORM

As proofs of concept, and hopefully useful applications-to-be, I have
included transforms for

- PL/Perl - hstore
- PL/Python - hstore
- PL/Python - ltree

These, along with the documentation for the above-mentioned DDL commands
and catalog can serve as explanations for reviewers.

There are numerous remaining discussion points and possible work items:

*) Currently, each of the above provided transforms is put into a
separate subdirectory under contrib.  This might be OK, but it is mainly
because we cannot build more than one MODULE_big per directory.  Fixing
that might be more work than it's worth, but if we want to provide more
of these in contrib, it might get crowded.

*) Since we have separate extensions for plperl and plperlu, and also
for plpython2u and plpython3u, we need one extension for adapting each
of these to a given type.  You can see under contrib/hstore_plperl what
this looks like.  Maybe this isn't fixable or worth fixing.

(With the directory and packaging not finalized, I haven't included any
documentation for these contrib modules yet.)

*) No psql backslash commands yet.  Could be added.

*) Permissions: Transforms don't have owners, a bit like casts.
Currently, you are allowed to drop a transform if you own both the type
and the language.  That might be too strict; maybe owning the type and having
privileges on the language would be enough.

*) If we want to offer the ability to write transforms to end users,
then we need to install all the header files for the languages and the
types.  This actually worked quite well; including hstore.h and plperl.h
for example, gave you what you needed.  In other cases, some headers
might need cleaning up.  Also, some magic from the plperl and plpython
build systems might need to be exposed, for example to find the header
files.  See existing modules for how this currently looks.

*) There is currently some syntax schizophrenia.  The grammar accepts

CREATE TRANSFORM FOR hstore LANGUAGE plperl (
FROM SQL WITH hstore_to_plperl,
TO SQL WITH plperl_to_hstore
);

but pg_dump produces

CREATE TRANSFORM FOR hstore LANGUAGE plperl (
FROM SQL WITH hstore_to_plperl(hstore),
TO SQL WITH plperl_to_hstore(internal)
);

The SQL standard allows both.  (In the same way that it allows 'DROP
FUNCTION foo' without arguments, if it is not ambiguous.)  Precedent is
that CREATE CAST requires arguments, but CREATE LANGUAGE does not.  (A
rough sketch of what such a function pair looks like at the C level
appears after this list of points.)

*) The issue of how to handle arguments of type internal was already
discussed at
http://archives.postgresql.org/pgsql-hackers/2012-05/msg00857.php and
following.  I have adopted the suggestion there to prohibit calling
functions with arguments of type internal, but that is probably full
of holes and needs to be reviewed carefully.

*) I'm not very familiar with the Perl C API, so the hstore_plperl
implementation is probably full of memory leaks and weirdnesses.  It
needs some help from a PL/Perl expert.  (The PL/Python stuff is
hopefully better.)

*) ltree_plpython lacks the transform function from python to ltree,
because of laziness and lack of clear documentation on how to construct
ltree values.

*) I just noticed that the grammar changes broke ECPG.  I'll look into
that.

*) The regression tests for the core CREATE/DROP TRANSFORM functionality
are in contrib/hstore_plpython, because the core has no suitable language
available.  It's a bit ugly, but I don't know of a better way short of
implementing a fake language for regression testing only.
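
As an aside, to illustrate the calling convention visible in the pg_dump
output earlier: the from-SQL function takes the SQL type and hands the
language an internal value, and the to-SQL function does the reverse.  A rough
sketch of my own (not code from the patch) of what the from-SQL side looks
like in C:

#include "postgres.h"
#include "fmgr.h"
#include "plperl.h"             /* brings in the Perl headers (SV, HV, ...) */

PG_FUNCTION_INFO_V1(hstore_to_plperl);

Datum
hstore_to_plperl(PG_FUNCTION_ARGS)
{
    dTHX;                       /* Perl interpreter context */

    /*
     * Argument 0 is the hstore value; detoast it and walk its key/value
     * pairs here, hv_store()-ing each pair into a fresh Perl hash.
     */
    HV         *hv = newHV();

    /* hand the Perl value back to PL/Perl as an "internal" datum */
    PG_RETURN_POINTER(newRV_noinc((SV *) hv));
}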



transforms-20120615.patch.gz
Description: GNU Zip compressed data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] WIP: relation metapages

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 5:34 PM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 14.06.2012 18:01, Robert Haas wrote:
 What I'm really looking for at this stage of the game is feedback on
 the design decisions I made.  The intention here is that it should be
 possible to read old-format heaps and indexes transparently, but that
 when we create or rewrite a relation, we add a new-style metapage.

 That dodges the problem of pg_upgrade, but eventually, we will want to use
 the metapage for something important, and it can't be optional at that point
 anymore. So we will eventually need to deal with pg_upgrade somehow.

Well, the code as I've written it deals with pg_upgrade just fine: you
can move your old relation files over and they still work.  What's
missing at present is an efficient way to convert them to the new
format.  If you don't mind the inefficiency, you can use VACUUM FULL
or CLUSTER, and you're there, but obviously we'll want something
better before we start relying on this for anything too critical.  For
indexes, it should be pretty trivial to reduce the requirement from
rewrite while holding AccessExclusiveLock to brief
AccessExclusiveLock.  Everything except GiST already has a metapage,
so you just need to rewrite that page in the new format, which is a
SMOP.  GiST requires moving the existing metapage out to a free page
(or a new page) and writing a metapage pointing to it into block 0,
which is again pretty simple.  Heaps are a bit harder.  We could adopt
your idea of storing a block number in the metablock; any index TIDs
that point to block 0 will be interpreted as pointing to that block
instead.  Then we can basically just relocate block to any free block,
or a new one, as with GiST.  Alternatively, we can read the tuples in
block 0 and reinsert them; vacuum; and then repurpose block 0.  Taking
it even further, we could try to do better than brief
AccessExclusiveLock.  That might be possible too, especially for
indexes, but I'm not sure that it's necessary, and I haven't thought
through the details.

Even if we just get it down to brief AccessExclusiveLock, I think we
might also be able to improve the experience by making autovacuum do
conversions automatically.  So, if you pg_upgrade, the first
autovacuum worker that spins through the database will
ConditionalLockAcquire an AccessExclusiveLock on each relation in turn
and try to do the conversion.  If it can't get the lock it'll keep
retrying until it succeeds.  Odds are good that for most people this
would make the addition of the metapage completely transparent.  On
the other hand, if metapages exist mostly to support operations like
making an unlogged table logged or vice versa, that's not really
necessary: we can add the metapage when someone performs a DDL
operation that requires it.  There is a lot of optimization possible
on top of the basic mechanism, but how much of it makes sense to do,
and which parts, depends on exactly which of the many things we could
do with this we end up deciding to actually do.  I'm trying to start
with the basics, which means getting the basic infrastructure in place
and working.

 It would be nice to have the oid of the access method in the metapage (or
 some other way to identify it).

Yeah, I was thinking about that.  I think a magic number might be
preferable to an OID, and we actually already have that as the first
4-bytes of the access method metadata for all index types except GIN.
I'm thinking about trying to fix up GIN so that it adds one as well;
the trick is to do it in a way that is backward-compatible, which I
have an idea how to do but haven't tackled yet.  We can add a magic
number for heaps as well.
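
(To be explicit about the shape I have in mind -- purely illustrative, not the
layout from the patch:)

typedef struct GenericMetaSketch
{
    uint32      mp_magic;       /* identifies the AM, e.g. BTREE_MAGIC for btree */
    uint32      mp_version;     /* AM-specific metapage version */
    /* AM-specific metadata continues here, as it does today */
} GenericMetaSketch;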

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Allow WAL information to recover corrupted pg_controldata

2012-06-14 Thread Amit Kapila
 I guess my first question is: why do we need this?  There are lots of
 things in the TODO list that someone wanted once upon a time, but
 they're not all actually important.  Do you have reason to believe
 that this one is?  It's been six years since that email, so it's worth
 asking if this is actually relevant.

As far as I know, pg_control is not WAL-protected, which means that if it gets
corrupted for any reason (for example a disk crash during a flush, leaving it
partially written), recovery of the database can fail.
In that case the user can use pg_resetxlog to recover the database. Currently
pg_resetxlog works from guessed values for pg_control.
This implementation improves that logic: instead of guessing, it tries to
regenerate the values from the WAL, which allows better recovery in certain
circumstances.
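
In rough terms, the regeneration step is a scan like the sketch below (only to
show the idea; ReadNextRecord() and thisRecPtr stand in for whatever WAL
reading facility the final patch uses):

CheckPoint  lastCheckPoint;
XLogRecPtr  lastCheckPointLoc;
XLogRecPtr  prevCheckPointLoc;
XLogRecPtr  thisRecPtr;
XLogRecord *record;
bool        found = false;

/* walk the segments of the latest timeline from oldest to newest */
while ((record = ReadNextRecord(&thisRecPtr)) != NULL)
{
    uint8 info = record->xl_info & ~XLR_INFO_MASK;

    if (record->xl_rmid == RM_XLOG_ID &&
        (info == XLOG_CHECKPOINT_SHUTDOWN || info == XLOG_CHECKPOINT_ONLINE))
    {
        /* remember the most recent checkpoint record seen so far */
        memcpy(&lastCheckPoint, XLogRecGetData(record), sizeof(CheckPoint));
        prevCheckPointLoc = lastCheckPointLoc;
        lastCheckPointLoc = thisRecPtr;
        found = true;
    }
}

/*
 * If found, fill ControlFile->checkPointCopy, checkPoint, prevCheckPoint
 * and state from these values instead of guessing them.
 */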

 The deadline for patches for this CommitFest is today, so I think you
 should target any work you're starting now for the NEXT CommitFest.

Oh, I am sorry; as this was my first time, I was not fully aware of the
deadline.

However, I would still like your opinion on whether it makes sense to work on
this feature.


-Original Message-
From: Robert Haas [mailto:robertmh...@gmail.com] 
Sent: Friday, June 15, 2012 12:40 AM
To: Amit Kapila
Cc: pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Allow WAL information to recover corrupted
pg_controldata

On Thu, Jun 14, 2012 at 11:39 AM, Amit Kapila amit.kap...@huawei.com
wrote:
 I am planning to work on the below Todo list item for this CommitFest
 Allow WAL information to recover corrupted pg_controldata
 http://archives.postgresql.org/pgsql-patches/2006-06/msg00025.php

The deadline for patches for this CommitFest is today, so I think you
should target any work you're starting now for the NEXT CommitFest.

 I wanted to confirm my understanding of the work involved in this patch:
 The existing patch has the following problems:
    1. Memory leak, and the linked list code path is not proper
    2. The lock check for whether the server is already running is removed in
 the patch; this needs to be reverted
    3. Refactoring of the code is needed.

 Apart from the above, what I understood from the patch is that its intention
 is to generate values for ControlFile from the WAL when the -r option is used.

 The change from the current algorithm will be: if the control file is corrupt,
 which essentially means ReadControlFile() will return false, then it should
 generate the values (checkPointCopy, checkPoint, prevCheckPoint, state) from
 WAL if the -r option is enabled.

 Also, for the -r option, it doesn't need to call FindEndOfXLOG(), as that
 work is covered by the above point.

 It will just rewrite the control file and not do the other resets.


 The algorithm for restoring the pg_control values from existing xlog files:
    1. Retrieve all of the active xlog files from the xlog directory into a
 list, in increasing order by timeline, log id, and segment id.
    2. Search the list to find the oldest xlog file of the latest timeline.
    3. Scan the records from the oldest xlog file of the latest timeline to
 the latest xlog file of that timeline; whenever a checkpoint record is
       found, update the latest and previous checkpoint.



 Apart from the above, some changes in the code will be required after the
 XLog patch by Heikki.

 Please tell me whether my understanding is correct.

I guess my first question is: why do we need this?  There are lots of
things in the TODO list that someone wanted once upon a time, but
they're not all actually important.  Do you have reason to believe
that this one is?  It's been six years since that email, so it's worth
asking if this is actually relevant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-14 Thread Bruce Momjian
On Thu, Jun 14, 2012 at 02:43:04PM -0400, Tom Lane wrote:
 Euler Taveira eu...@timbira.com writes:
  On 14-06-2012 02:19, Tom Lane wrote:
  ...  I especially think that importing bzip2
  is a pretty bad idea --- it's not only a new dependency, but bzip2's
  compression versus speed tradeoff is entirely inappropriate for this
  use-case.
 
  I see, the idea is that bzip2 would be a compiled-in option (not enabled by
  default) just to give another compression option.
 
 I'm not particularly thrilled with let's have more compression options
 just to have options.  Each such option you add is another source of
 fail-to-connect incompatibilities (when either the client or the server
 doesn't have it).  Moreover, while there are a lot of compression
 algorithms out there, a lot of them are completely unsuited for this
 use-case.  If memory serves, bzip2 for example requires fairly large
 data blocks in order to get decent compression, which means you are
 either going to get terrible compression or suffer very bad latency
 when trying to apply it to a connection data stream.
 
 So I've got very little patience with the idea of let's put in some
 hooks and then great things will happen.  It would be far better all
 around if we supported exactly one, well-chosen, method.  But really
 I still don't see a reason not to let openssl do it for us.

Do we just need to document SSL's NULL encryption option?

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Here we know that it doesn't matter, so the application of Knuth's first law
 of optimization is appropriate.

 I'm not advocating some Byzantine optimisation, or even something that
 could reasonably be described as an optimisation at all here. I'm
 questioning why you've unnecessarily complicated the code by having
 the buffer size just big enough to fit the biggest value seen so far,
 but arbitrarily aligned to a value that is completely irrelevant to
 bttextfastcmp_locale(), rather than using simple geometric expansion,
 which is more or less the standard way of managing the growth of a
 dynamic array.

Well, so... palloc, and most malloc implementations, don't actually
just double every time forever.  They double up to some relatively
small number like 1MB, and then they expand in 1MB chunks.
std::vector may well behave as you're describing, but that's doing
something completely different.  We're not accumulating a string of
unknown length into a buffer, with the risk of having to copy the
whole buffer every time we reallocate.  We know the exact size of the
buffer we need for any given string, so the cost of a pfree/palloc is
only the cost of releasing and allocating memory; there is no
additional copying cost, as there would be for std::vector.

Look at it another way.  Right now, the worst case number of temporary
buffer allocations is about 2N lg N, assuming we do about N lg N
comparisons during a sort.  If we simply implemented the most naive
buffer reallocation algorithm possible, we might in the worst case
reallocate each buffer up to N times.  That is already better than
what we do now by a factor of lg N.  If we suppose N is about a
million, that's an improvement of 20x *in the worst case* - typically,
the improvement will be much larger, because all the strings will be
of similar length or we won't hit them in exactly increasing order of
length.  The algorithm I've actually implemented bounds the worst case
1024 times more tightly - given N strings, we can't need to enlarge
the buffer more than N/1024 times.  So with N of around a million,
this algorithm should eliminate *at least* 99.995% of the pallocs we
currently do.  How much better does it need to be?

 You have to grow the array in some way. The basic approach I've
 outlined has something to recommend it - why does it make sense to
 align the size of the buffer to TEXTBUFLEN in particular though?

There's nothing magic about TEXTBUFLEN as far as that goes.  We could
round to the nearest multiple of 8kB or whatever if that seemed
better.  But if we rounded to, say, the nearest 1MB, then someone
sorting strings that are 2kB long would use 2MB of memory for these
buffers.  Considering that we ship with work_mem = 1MB, that could
easily end up wasting almost twice work_mem.  I don't see how you can
view that as a trivial problem.  It might be worth sucking that up if
it seemed likely to dramatically improve performance, but it doesn't.
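
Concretely, the policy I'm defending amounts to no more than this (a sketch
with generic names, not the exact patch text):

if (needed > buflen)
{
    /* grow to the needed size, rounded up to the next TEXTBUFLEN (1kB)
     * boundary, so the buffer closely tracks the longest value seen so far */
    buflen = TYPEALIGN(TEXTBUFLEN, needed);
    pfree(buf);
    buf = palloc(buflen);
}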

 It's
 quite easy to imagine what you've done here resulting in an excessive
 number of allocations (and pfree()s), which *could* be expensive.

Actually, I can't imagine that at all, per the above analysis.  If I
could, I might agree with you.  :-)

 There is a trade-off between space and time to be made here, but I
 don't know why you think that the right choice is to use almost the
 smallest possible amount of memory in all cases.

Because I think that with the current implementation I can have my
cake and eat it, too.  I believe I've gotten essentially all the
available speedup for essentially no memory wastage.  If you can
convince me that I've missed something and there is still a meaningful
amount of palloc overhead left to be squeezed out, I'm all ears.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Saving snapshots for later use

2012-06-14 Thread Nikolas Everett
I've been a PostgreSQL user for years and have a feature that I'd like to
see implemented.  I've started playing with the source and have a vague
implementation plan for it, but before I go blundering around I'd like to
run the idea by this list.  So here it is:

I'd like to be able to save the current snapshot and then at a later date
roll the entire database back to that snapshot, essentially erasing
everything that happened since the snapshot.  Well, more like making all
that stuff invisible.  This would be really useful when testing
applications who's databases take a while to setup and who's tests change
the database in some way.  It could also provide a way to recover from some
horrible user error.  Frankly, I'm more interested in this for testing, but
having the ability to recover from errors would be nice.

Once you've saved the snapshot it'd be tempting to expose the ability to
run queries against the old snapshot as well.  This seems like more of a
bonus in my mind, though.

From here on out I'm not sure if I have my terms right so please correct me
if I'm wrong.

So I'm wondering a few things:
1.  I _think_ all I have to do to store one of these snapshots is to store
the current xid and a list of all in progress transactions.  I believe I
should be able to reconstitute this with the commit log to determine
visibility at that snapshot.
2.  I _think_ I can save that information in a system table.  It feels
weird to use a table to store visibility information but part of me thinks
that is ok.  I can't put my finger on why I think that is ok though.
3.  I'm not really sure at all what to do with other connections that are
doing things during a snapshot restore.  They might just have to get cut.
 For my purposes this would be ok but I can imagine that would be a
problem.
4.  I think not allowing the user to save a snapshot during a transaction
would simplify the book keeping a ton.  I know what I typed in (1) wouldn't
cover snapshots inside transactions.  I just don't think that is worth the
effort.
5.  On a more personal note I have no idea if I am capable of implementing
this.  I'm really not very good with C but I haven't had too much trouble
figuring out how to add new parser rules or map them into the tcop layer.
 I figured out how to create a new system table without too much trouble
but haven't a clue how upgrades work for the new table.  I'm not sure how
to read from the system table but I see there is a faq entry for that.
6.  I think it'd be ok if restoring an older snapshot removes a newer
snapshot.  Using the snapshot mechanism to jump forward in time just sounds
hard to keep straight.
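
Roughly what I imagine needs to be stored for (1) is something like the
following (just a sketch; the field names are made up, though they mirror what
a snapshot already tracks):

typedef struct SavedSnapshotSketch
{
    TransactionId xmin;         /* every xid below this was already completed */
    TransactionId xmax;         /* every xid at or above this had not yet started */
    int           xcnt;         /* number of in-progress xids at save time */
    TransactionId *xip;         /* the in-progress xids themselves */
} SavedSnapshotSketch;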

I'd attach a patch of my work so far but it certainly isn't anything mind
blowing at this point.

Sorry to ramble on for a solid page.  Thanks to everyone who made it to
the end.

Cheers,

Nik


Re: [HACKERS] Saving snapshots for later use

2012-06-14 Thread Heikki Linnakangas

On 15.06.2012 06:19, Nikolas Everett wrote:

I've been a PostgreSQL user for years and have a feature that I'd like to
see implemented.  I've started playing with the source and have a vague
implementation plan for it, but before I go blundering around I'd like to
run the idea by this list.  So here it is:

I'd like to be able to save the current snapshot and then at a later date
roll the entire database back to that snapshot, essentially erasing
everything that happened since the snapshot.  Well, more like making all
that stuff invisible.  This would be really useful when testing
applications whose databases take a while to set up and whose tests change
the database in some way.  It could also provide a way to recover from some
horrible user error.  Frankly, I'm more interested in this for testing, but
having the ability to recover from errors would be nice.


Have you considered using filesystem snapshots?


So I'm wondering a few things:
1.  I _think_ all I have to do to store one of these snapshots is to store
the current xid and a list of all in progress transactions.  I believe I
should be able to reconstitute this with the commit log to determine
visibility at that snapshot.
2.  I _think_ I can save that information in a system table.  It feels
weird to use a table to store visibility information but part of me thinks
that is ok.  I can't put my finger on why I think that is ok though.
3.  I'm not really sure at all what to do with other connections that are
doing things during a snapshot restore.  They might just have to get cut.
  For my purposes this would be ok but I can imagine that would be a
problem.
4.  I think not allowing the user to save a snapshot during a transaction
would simplify the book keeping a ton.  I know what I typed in (1) wouldn't
cover snapshots inside transactions.  I just don't think that is worth the
effort.
5.  On a more personal note I have no idea if I am capable of implementing
this.  I'm really not very good with C but I haven't had too much trouble
figuring out how to add new parser rules or map them into the tcop layer.
  I figured out how to create a new system table without too much trouble
but haven't a clue how upgrades work for the new table.  I'm not sure how
to read from the system table but I see there is a faq entry for that.
6.  I think it'd be ok if restoring an older snapshot removes a newer
snapshot.  Using the snapshot mechanism to jump forward in time just sounds
hard to keep straight.


To revert the database to the earlier state, you'll also need to somehow 
roll back all the already-committed transactions. At first sight, that 
seems easy - just modify clog to mark them as aborted. However, it's not 
that easy, because you'd also need to somehow clear hint bits that claim 
those transactions to be committed. Or prevent those hint bits from 
being set in the first place, but that affects performance, so I don't 
think that would be very useful for testing. All in all, I think a 
filesystem snapshot is simpler. Or even just use a copy of the database, 
and use rsync to revert it back to the original state. You will have to 
restart postgres, though.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] libpq compression

2012-06-14 Thread Magnus Hagander
On Fri, Jun 15, 2012 at 10:19 AM, Bruce Momjian br...@momjian.us wrote:
 On Thu, Jun 14, 2012 at 02:43:04PM -0400, Tom Lane wrote:
 Euler Taveira eu...@timbira.com writes:
  On 14-06-2012 02:19, Tom Lane wrote:
  ...  I especially think that importing bzip2
  is a pretty bad idea --- it's not only a new dependency, but bzip2's
  compression versus speed tradeoff is entirely inappropriate for this
  use-case.

  I see, the idea is that bzip2 would be a compiled-in option (not enabled by
  default) just to give another compression option.

 I'm not particularly thrilled with let's have more compression options
 just to have options.  Each such option you add is another source of
 fail-to-connect incompatibilities (when either the client or the server
 doesn't have it).  Moreover, while there are a lot of compression
 algorithms out there, a lot of them are completely unsuited for this
 use-case.  If memory serves, bzip2 for example requires fairly large
 data blocks in order to get decent compression, which means you are
 either going to get terrible compression or suffer very bad latency
 when trying to apply it to a connection data stream.

Agreed. I think there's probably arguments to be had for supporting
compression without openssl (see below), but I don't think we need to
have a whole set of potentially incompatible ways of doing it. Picking
one that's good for the common case and not completely crap for the
corner cases would be a better choice (meaning bzip2 is probably a
very bad choice).


 So I've got very little patience with the idea of let's put in some
 hooks and then great things will happen.  It would be far better all
 around if we supported exactly one, well-chosen, method.  But really
 I still don't see a reason not to let openssl do it for us.

 Do we just need to document SSL's NULL encryption option?

Does the SSL NULL encryption+compression thing work if you're not
using openssl?

For one thing, some of us still hold out hope of supporting non-openssl
libraries on both the libpq and server side, so it would have to be something
supported by the standard, and thus available in most libraries, in order not
to invalidate that.

Second, we also have things like the JDBC driver and the .Net driver
that don't use libpq. The JDBC driver uses the native Java SSL
support, AFAIK. Does that one support the compression, and does it
support controlling it?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers