Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 05:38:14 Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  I'm all in favor of having some memory ordering primitives so that we
  can try to implement better algorithms, but if we use it here it
  amounts to a fairly significant escalation in the minimum requirements
  to compile PG (which is bad) rather than just a performance
  optimization (which is good).
 
 I don't believe there would be any escalation in compilation
 requirements: we already have the ability to invoke stronger primitives
 than these.  What is needed is research to find out what the primitives
 are called, on platforms where we aren't relying on direct asm access.
 
 My feeling is it's time to bite the bullet and do that work.  We
 shouldn't cripple the latch operations because of laziness at the
 outset.
I don't think developing the actual code is that hard - s_lock.c 
contains nearly everything necessary.
A 'lock xchg' or similar is only marginally slower than the barrier-only 
implementation. So doing a TAS() on a slock_t in private memory should be an 
easy enough fallback implementation.
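
A minimal sketch of the fallback described above, assuming a C11 compiler. It is not PostgreSQL code: atomic_flag_test_and_set() stands in for TAS() on a slock_t, and every demo_* name is hypothetical. Whether a test-and-set on private memory really orders the surrounding loads and stores depends on the platform's TAS implementation, which is the open question in this thread.

```c
#include <stdatomic.h>

/* Hypothetical private lock; nothing else ever contends for it. */
static atomic_flag demo_private_lock = ATOMIC_FLAG_INIT;

/*
 * Fallback "barrier": one atomic test-and-set on private memory,
 * immediately released.  On x86 the test-and-set typically compiles
 * to a locked exchange.  We never spin on the flag, so a re-entrant
 * call cannot hang.
 */
static void
demo_fallback_barrier(void)
{
    (void) atomic_flag_test_and_set(&demo_private_lock);
    atomic_flag_clear(&demo_private_lock);
}

static int demo_payload;
static volatile int demo_flag;

/* Writer side: store the data, fence, then raise the flag. */
static void
demo_publish(int value)
{
    demo_payload = value;
    demo_fallback_barrier();
    demo_flag = 1;
}

/* Reader side: check the flag, fence, then read the data. */
static int
demo_consume(void)
{
    if (!demo_flag)
        return -1;              /* nothing published yet */
    demo_fallback_barrier();
    return demo_payload;
}
```

The point of the design is that no new primitive is needed: anything that already provides TAS() could in principle provide this barrier, at the cost of one extra atomic operation.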

So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses 
spinlocks for that purpose - no idea where that is true these days.

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] contrib: auth_delay module

2010-11-19 Thread KaiGai Kohei

(2010/11/18 2:17), Robert Haas wrote:

On Wed, Nov 17, 2010 at 10:32 AM, Ross J. Reedstrom reeds...@rice.edu wrote:

On Tue, Nov 16, 2010 at 09:41:37PM -0500, Robert Haas wrote:

On Tue, Nov 16, 2010 at 8:15 PM, KaiGai Kohei kai...@ak.jp.nec.com wrote:

If we don't need a PoC module for each new hook, I'm not strongly
motivated to push it into the contrib tree.
What is your opinion?


I'd say let it go, unless someone else feels strongly about it.


I would use this module (rate limit new connection attempts) as soon as
I could. Putting a cap on the CPU that a blackhat, or a developer's
configuration mistake (leaving the port accessible), can burn on a
production DB is definitely useful, even in the face of max_connections.
My production apps already have their connections and seldom need new
ones. They all use CPU though.


If KaiGai updates the code per previous discussion, would you be
willing to take a crack at adding documentation?

P.S. Your email client seems to be setting the Reply-To address to a
ridiculous value.


OK, I'll revise my patch according to the previous discussion.
Please give me about one week. I have a big event this weekend.

Thanks,
--
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] Label switcher function

2010-11-19 Thread KaiGai Kohei
(2010/11/18 11:30), Robert Haas wrote:
 2010/11/17 KaiGai Kohei kai...@ak.jp.nec.com:
 I revised my patch as I attached.

 The hook function is modified and consolidated as follows:

   typedef enum FunctionCallEventType
   {
  FCET_BE_HOOKED,
  FCET_PREPARE,
  FCET_START,
  FCET_END,
  FCET_ABORT,
   } FunctionCallEventType;

   typedef Datum (*function_call_event_type)(Oid functionId,
 FunctionCallEventType event,
 Datum event_arg);
   extern PGDLLIMPORT function_call_event_type function_call_event_hook;

 Despite the subject of this e-mail, it no longer focuses only on switching
 security labels during execution of certain functions.
 For example, we may use this hook to track certain functions for security
 auditing, performance tuning, and other purposes.

 In the case of SE-PgSQL, it shall return BoolGetDatum(true) if the target
 function is configured as a trusted procedure; this invocation will then
 be hooked by fmgr_security_definer. On the first call, it shall compute
 the security context to be assigned during execution, on the FCET_PREPARE
 event. Then it switches to the computed label on the FCET_START event, and
 restores it on the FCET_END or FCET_ABORT event.
 
 This seems like it's a lot simpler than before, which is good.  It
 looks to me as though there should really be two separate hooks,
 though, one for what is now FCET_BE_HOOKED and one for everything
 else.  For FCET_BE_HOOKED, you want a function that takes an Oid and
 returns a bool.  For the other event types, the functionId and event
 arguments are OK, but I think you should forget about the save_datum
 stuff and just always pass fcache->flinfo and fcache->private.  The
 plugin can get the effect of save_datum by passing around whatever
 state it needs to hold on to using fcache->private.  So:
 
 bool (*needs_function_call_hook)(Oid fn_oid);
 void (*function_call_hook)(Oid fn_oid, FunctionCallEventType event,
 FmgrInfo flinfo, Datum *private);
 
That seems like a good idea to me. The FCET_BE_HOOKED event type was a bit
different in character from the other event types.
Please wait for about a week to revise my patch.
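
A rough sketch of how a plugin might sit behind the two-hook split suggested above, assuming C99 or later. Oid, Datum and FmgrInfo are simplified stand-ins for the real PostgreSQL types, flinfo is passed by pointer as an assumption, and every demo_* name (including the label value 7 and function OID 42) is hypothetical. It only illustrates the PREPARE / START / END sequence; it is not the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins so the sketch compiles on its own. */
typedef unsigned int Oid;
typedef uintptr_t Datum;
typedef struct FmgrInfo { Oid fn_oid; } FmgrInfo;

typedef enum FunctionCallEventType
{
    FCET_PREPARE,
    FCET_START,
    FCET_END,
    FCET_ABORT
} FunctionCallEventType;

/* The two hooks: one yes/no question, one event callback. */
static bool (*needs_function_call_hook) (Oid fn_oid) = NULL;
static void (*function_call_hook) (Oid fn_oid, FunctionCallEventType event,
                                   FmgrInfo *flinfo, Datum *private) = NULL;

/* Hypothetical plugin state: the label currently in effect. */
static int demo_active_label = 0;

static bool
demo_needs_hook(Oid fn_oid)
{
    return fn_oid == 42;        /* pretend OID 42 is a trusted procedure */
}

static void
demo_hook(Oid fn_oid, FunctionCallEventType event,
          FmgrInfo *flinfo, Datum *private)
{
    (void) fn_oid;
    (void) flinfo;
    switch (event)
    {
        case FCET_PREPARE:      /* compute the label once, keep it in *private */
            *private = (Datum) 7;
            break;
        case FCET_START:        /* switch to the saved label */
            demo_active_label = (int) *private;
            break;
        case FCET_END:
        case FCET_ABORT:        /* restore the previous label */
            demo_active_label = 0;
            break;
    }
}

static void
demo_install(void)
{
    needs_function_call_hook = demo_needs_hook;
    function_call_hook = demo_hook;
}

/* Hypothetical caller; returns the label the "function body" observed. */
static int
demo_call(Oid fn_oid)
{
    FmgrInfo    flinfo = { fn_oid };
    Datum       private = 0;
    int         seen;
    bool        hooked = needs_function_call_hook != NULL &&
                         function_call_hook != NULL &&
                         needs_function_call_hook(fn_oid);

    if (hooked)
    {
        function_call_hook(fn_oid, FCET_PREPARE, &flinfo, &private);
        function_call_hook(fn_oid, FCET_START, &flinfo, &private);
    }
    seen = demo_active_label;
    if (hooked)
        function_call_hook(fn_oid, FCET_END, &flinfo, &private);
    return seen;
}
```

Keeping the per-call state behind a single Datum pointer is what lets the plugin drop the save_datum argument: whatever it stores at PREPARE time is still there at START and END.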

 Another general comment is that you've not done a very complete job
 updating the comments; there are several of them in fmgr.c that are no
 longer accurate.  Also, please zap the unnecessary whitespace changes.
 

Indeed, the comments in the middle of fmgr_info_cxt_security() and just
above the definition of fmgr_security_definer() are not correct.
Did you notice anything else?

Thanks,
-- 
KaiGai Kohei kai...@ak.jp.nec.com



Re: [HACKERS] SQL/MED estimated time of arrival?

2010-11-19 Thread Heikki Linnakangas

Some random comments on the patch:

ReleaseConnection is a very generic name for a global function, would be 
good to prefix it with pgsqlfdw or something. Same with any other 
globally visible functions.


Please use the built-in contain_mutable_functions(Node *) instead of 
custom is_immutable_func(). Or at least func_volatile(Oid)


Is it really a good idea to allow LOCK TABLE on foreign tables in its 
current form? It only locks the local foreign table object, not the 
table in the remote server.


Sorry if this was fiercely discussed already, but I don't think the file 
FDW belongs in core. I'd rather see it as a contrib module


I would've expected the contrib install script to create the foreign 
data wrapper for me. While you can specify options to a foreign data 
wrapper, the CREATE FOREIGN DATA WRAPPER seems similar to CREATE 
LANGUAGE, ie. something that happens when the foreign data wrapper 
library is installed.


How do you specify a foreign table that has a different name in the 
remote server? For example, if I wanted to create a foreign table called 
foo, that fetched rows from a remote table called bar?


I would really like to see the SQL query that's shipped to the remote 
host in EXPLAIN. That's essential information for analyzing a query that 
involves a foreign table.


What about transactions? Does the SQL/MED standard have something to say 
about that?



In general, I'm surprised that there's no hook at all into the planning 
phase. You have this TODO comment in postgresql_fdw:



/*
 * TODO: omit (deparse to NULL) columns which are not used in the
 * original SQL.
 *
 * We must parse nodes parents of this ForeignScan node to determine
 * unused columns because some columns may be used only in parent
 * Sort/Agg/Limit nodes.
 */


Parsing the parents of the ForeignScan node seems like a backwards way 
of solving the problem. The planner should tell the FDW what columns it 
needs. And there should be some way for the FDW to tell the planner 
which quals it can handle, so that the executor doesn't need to recheck 
them.


You could make the planner interface infinitely complicated, but that's 
no excuse for doing nothing at all. The interface needs some thought...


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Fate of the old cvs repository

2010-11-19 Thread Magnus Hagander
On Sun, Oct 17, 2010 at 19:15, David Fetter da...@fetter.org wrote:
 On Sun, Oct 17, 2010 at 05:54:04PM +0200, Magnus Hagander wrote:
 So, it seems we're pretty firmly on git now, and I doubt we're ever
 going to shift back now :)

 That means I'd like to get the two CVS VMs shut down (that's
 cvs.postgresql.org and anoncvs.postgresql.org), so we don't have to
 attempt to maintain them...

 What should we do with the official old cvs repository when we do
 this? Just create a .tar.gz and drop it on the ftp site? (I assume
 most committers already have such a copy of the repository, but there
 should probably be an official one for the project?) Anything else?

 +1 for dropping a tarball on the FTP mirrors.  That way it's
 distributed and hard to lose. :)

It's now done. It will show up in /pub/dev/archive/ on the ftp mirrors
as soon as they've replicated.


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Isn't HANDLE 64 bits on Win64?

2010-11-19 Thread Tom Lane
Dave Page dp...@pgadmin.org writes:
 On Tue, Nov 16, 2010 at 10:01 AM, Magnus Hagander mag...@hagander.net wrote:
 On Tue, Nov 16, 2010 at 01:35, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, it seems like it'd be a good thing if we had a Win64 machine in the
 buildfarm.

 Yes. I actually thought we had one. Dave, weren't you going to set one up?

 I was, but I saw one there so didn't bother (hamerkop). Windows
 buildfarm critters can take a surprising amount of herding...

hamerkop seems to have gone AWOL around the time of the git conversion.

regards, tom lane



Re: [HACKERS] Improving prep_buildtree used in VPATH builds

2010-11-19 Thread Alvaro Herrera
Excerpts from Greg Smith's message of vie nov 19 01:52:34 -0300 2010:

 I'd think that if configure takes 
 longer than it has to because the system is heavily loaded, the amount 
 that compilation time is going to suffer from that would always dwarf this 
 component of total build time.  But if this was slow enough at some 
 point to motivate you to write a patch for it, maybe that assumption is 
 wrong.

What if instead of -depth you do something like 
find the_args | sort -r
?  If you find a way to filter out the parents that you know have
already been created, you could also cut down on the number of mkdir -p
calls, which could result in a larger speedup.  And maybe we should
remove the test -d.  Also, the `expr` call could be substituted by
${item##$sourcedir}, which is supposed to be a POSIX shell feature
according to 
http://www.unix.org/whitepapers/shdiffs.html and
http://www.opengroup.org/onlinepubs/009695399/utilities/xcu_chap02.html

In short, there are plenty of optimization opportunities for this script
without having to involve nonstandard constructs.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Changes to Linux OOM killer in 2.6.36

2010-11-19 Thread Kevin Grittner
Greg Smith  wrote:
 
 oom_adj is deprecated, scheduled for removal in August 2010:
 
That surprised me so I checked the URL.  I believe you have a typo
there and it's August, 2012.
 
-Kevin



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Thu, Nov 18, 2010 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I'm all in favor of having some memory ordering primitives so that we
 can try to implement better algorithms, but if we use it here it
 amounts to a fairly significant escalation in the minimum requirements
 to compile PG (which is bad) rather than just a performance
 optimization (which is good).

 I don't believe there would be any escalation in compilation
 requirements: we already have the ability to invoke stronger primitives
 than these.  What is needed is research to find out what the primitives
 are called, on platforms where we aren't relying on direct asm access.

I don't believe that's correct, although it's possible that I may be
missing something.  On any platform where we have TAS(), that should
be sufficient to set the flag, but how will we read the flag?  A
simple fetch isn't guaranteed to be sufficient; for some
architectures, you might need to insert a read fence, and I don't
think we have anything like that defined right now.  We've got
special-cases in s_lock.h for all kinds of crazy architectures; and
it's not obvious what would be needed.  For example some operating
system I've never heard of called SINIX has this:

#include "abi_mutex.h"
typedef abilock_t slock_t;

#define TAS(lock)   (!acquire_lock(lock))
#define S_UNLOCK(lock)  release_lock(lock)
#define S_INIT_LOCK(lock)   init_lock(lock)
#define S_LOCK_FREE(lock)   (stat_lock(lock) == UNLOCKED)

It's far from obvious to me how to make this do what we need - I have
a sneaking suspicion it can't be done with those primitives at all -
and I bet neither of us has a machine on which it can be tested.  Now
maybe we no longer care about supporting SINIX anyway, but the point
is that if we make this change, every platform for which we don't have
working TAS and read-fence operations becomes an unsupported platform.
 Forget about --disable-spinlocks; there is no such thing.  That
strikes me as an utterly unacceptable amount of collateral damage to
avoid a basically harmless API change, not to mention a ton of work.
You might be able to convince me that it's no longer important to
support platforms without a working spinlock implementation (although
I think it's rather nice that we can - might encourage someone to try
out PG and then contribute an implementation for their favorite
platform) but this is also going to break platforms that nominally
have TAS now (some of the TAS implementations aren't really TAS, as in
the above case, and we may not be able to easily determine what's
required for a read-fence even where TAS is a real TAS).
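
To make the "set the flag, then read the flag" concern concrete, here is a small sketch using C11 <stdatomic.h>. That header was not something PostgreSQL could rely on in 2010, so it only illustrates where the write fence and the read fence would go; DemoLatch and the demo_* functions are hypothetical and not the latch API.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical latch: a flag plus some data published along with it. */
typedef struct DemoLatch
{
    atomic_bool is_set;
    int         payload;
} DemoLatch;

static void
demo_init_latch(DemoLatch *latch)
{
    atomic_init(&latch->is_set, false);
    latch->payload = 0;
}

/* Setter: the payload store must become visible before the flag store. */
static void
demo_set_latch(DemoLatch *latch, int payload)
{
    latch->payload = payload;
    atomic_thread_fence(memory_order_release);      /* write fence */
    atomic_store_explicit(&latch->is_set, true, memory_order_relaxed);
}

/* Reader: a plain fetch of the flag is not enough on weakly ordered CPUs. */
static bool
demo_check_latch(DemoLatch *latch, int *payload_out)
{
    if (!atomic_load_explicit(&latch->is_set, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);      /* read fence */
    *payload_out = latch->payload;
    return true;
}
```

Without the acquire fence in demo_check_latch(), a weakly ordered CPU could in principle observe the flag as set while still seeing a stale payload, which is the gap that TAS() alone does not obviously close.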

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

Me neither, which is exactly the problem.  Under Tom's proposal, any
architecture we don't explicitly provide for, breaks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] libpq changes for synchronous replication

2010-11-19 Thread Robert Haas
On Thu, Nov 18, 2010 at 7:43 AM, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Nov 16, 2010 at 10:49 AM, Robert Haas robertmh...@gmail.com wrote:
 Just in a quick scan, I don't have any objection to v2 except that the
 protocol documentation is lacking.

 OK, I'll mark it Waiting on Author pending that issue.

 The patch is touching protocol.sgml as follows. Isn't this enough?

How about some updates to the Message Flow section, especially the
section on COPY Operations?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:16:24 Robert Haas wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
  So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
  spinlocks for that purpose - no idea where that is true these days.
 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.
I doubt it's that much of a problem, as !defined(HAS_TEST_AND_SET) will be so 
slow that there would be noise from that side more often...

Besides, we can just jump into the kernel and back in that case (which the TAS 
implementation already does), that does more than just a fence...

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:27 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.

 Just a small point of clarification - you need to have both that
 unknown architecture, and that architecture has to have postgres
 processes running simultaneously on different CPUs with different
 caches that are incoherent to have those problems.

Sure you do.  But so what?  Are you going to compile PostgreSQL and
implement TAS as a simple store and read-fence as a simple load?  How
likely is that to work out well?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:16 AM, Robert Haas robertmh...@gmail.com wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
 So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
 spinlocks for that purpose - no idea where that is true these days.

 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.

Just a small point of clarification - you need to have both that
unknown architecture, and that architecture has to have postgres
processes running simultaneously on different CPUs with different
caches that are incoherent to have those problems.

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:29:10 Andres Freund wrote:
 Besides, we can just jump into the kernel and back in that case (which the
 TAS  implementation already does), that does more than just a fence...
Or, if you don't believe that is enough, initialize a lock on the stack, lock 
it, and forget it...

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:29 AM, Andres Freund and...@anarazel.de wrote:
 On Friday 19 November 2010 15:16:24 Robert Haas wrote:
 On Fri, Nov 19, 2010 at 3:07 AM, Andres Freund and...@anarazel.de wrote:
  So the complicated case seems to be !defined(HAS_TEST_AND_SET) which uses
  spinlocks for that purpose - no idea where that is true these days.
 Me neither, which is exactly the problem.  Under Tom's proposal, any
 architecture we don't explicitly provide for, breaks.
 I doubt its that much of a problem as !defined(HAS_TEST_AND_SET) will be so
 slow that there would be noise from that side more often...

 Besides, we can just jump into the kernel and back in that case (which the TAS
 implementation already does), that does more than just a fence...

Eh, really?  If there's a workaround for platforms for which we don't
know what the appropriate read-fencing incantation is, then I'd feel
more comfortable about doing this.  But I don't see how to make that
work.  The whole problem here is that the API is designed in such a way
that the signal handler might be invoked when the lock that it needs
to grab is already held by the same process.  The reason memory
barriers solve the problem is because they'll be atomically released
when we jump into the signal handler, but that is not true of a
spin-lock or a semaphore.
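
The re-entrancy hazard described above can be shown in a few lines, assuming a hosted C11 environment. A signal handler that interrupts code already holding a lock finds that lock busy; a real spinlock would then spin forever, whereas this sketch makes a single test-and-set attempt and records the result. All demo_* names are hypothetical, and atomic_flag merely stands in for a spinlock.

```c
#include <signal.h>
#include <stdatomic.h>

static atomic_flag demo_lock = ATOMIC_FLAG_INIT;
static volatile sig_atomic_t demo_handler_saw_lock_held = 0;

/* Handler: tries the lock exactly once instead of spinning. */
static void
demo_handler(int signo)
{
    (void) signo;
    if (atomic_flag_test_and_set(&demo_lock))
        demo_handler_saw_lock_held = 1;     /* a spinlock would hang here */
    else
        atomic_flag_clear(&demo_lock);
}

/*
 * Take the lock in mainline code, then deliver a signal to ourselves.
 * Returns 1 if the handler found the lock already held, 0 if not,
 * and -1 if the handler could not be installed.
 */
static int
demo_signal_while_locked(void)
{
    demo_handler_saw_lock_held = 0;
    if (signal(SIGINT, demo_handler) == SIG_ERR)
        return -1;

    (void) atomic_flag_test_and_set(&demo_lock);    /* mainline holds the lock */
    raise(SIGINT);              /* handler runs before raise() returns */
    atomic_flag_clear(&demo_lock);

    signal(SIGINT, SIG_DFL);
    return demo_handler_saw_lock_held;
}
```

A fence has no "held" state to collide with, which is why it can be issued safely from both the mainline code and the handler.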

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:35 AM, Aidan Van Dyk ai...@highrise.ca wrote:
 On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:
 Just a small point of clarification - you need to have both that
 unknown architecture, and that architecture has to have postgres
 processes running simultaneously on different CPUs with different
 caches that are incoherent to have those problems.

 Sure you do.  But so what?  Are you going to compile PostgreSQL and
 implement TAS as a simple store and read-fence as a simple load?  How
 likely is that to work out well?

 If I was trying to port PostgreSQL to some strange architecture, and
 my strange architecture didn't have all the normal TAS and memory
 barrier stuff because it was only a UP system with no cache, then yes,
 and it would work out well ;-)

I get your point, but obviously this case isn't very interesting or
likely in 2010.

 If it was some strange SMP architecture, I wouldn't expect *anything*
 to work out well if the architecture doesn't have some sort of
 TAS/memory barrier/cache-coherency stuff in it ;-)

Well, you'd be pleasantly surprised to find that you could at least
get it to compile using --disable-spinlocks.  Yeah, the performance
would probably be lousy and you might run out of semaphores, but at
least for basic stuff it would run.  Ripping that out just to avoid an
API change in code we committed two months ago seems a bit extreme,
especially since it's also going to require implementing a read-fence
operation on every platform we want to continue supporting.  Maybe you
could default the read-fence to just a simple read for platforms that
are not known to have an issue, but all the platforms where TAS is
calling some OS-provided routine that does mysterious magic under the
covers are going to need attention; and I just don't think that
cleaning up everything that's going to break is a very worthwhile
investment of our limited development resources, even if it doesn't
result in needlessly dropping platform support.

If we're going to work on memory primitives, I would much rather see
us put that effort into, say, implementing more efficient LWLock
algorithms to solve the bottlenecks that the MOSBENCH guys found,
rather than spending it on trying to avoid a minor API complication
for the latch facility.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:38:37 Robert Haas wrote:
 Eh, really?  If there's a workaround for platforms for which we don't
 know what the appropriate read-fencing incantation is, then I'd feel
 more comfortable about doing this.  But I don't see how to make that
 work.  The whole problem here is that API is designed in such a way
 that the signal handler might be invoked when the lock that it needs
 to grab is already held by the same process.  The reason memory
 barriers solve the problem is because they'll be atomically released
 when we jump into the signal handler, but that is not true of a
 spin-lock or a semaphore.
Well, it's not generally true - you are right there. But there is a wide range 
of syscalls available where it's inherently true (which is what I sloppily 
referred to). And you are allowed to call a set of system calls, although a 
quite restricted one, even in signal handlers. I don't have the list for older 
POSIX versions in mind, but for 2003 you can choose from several, like 
write, lseek, setpgid, which inherently have to serialize. And I am quite sure 
there were sensible calls for earlier versions.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:14:58 Robert Haas wrote:
 On Thu, Nov 18, 2010 at 11:38 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  I'm all in favor of having some memory ordering primitives so that we
  can try to implement better algorithms, but if we use it here it
  amounts to a fairly significant escalation in the minimum requirements
  to compile PG (which is bad) rather than just a performance
  optimization (which is good).
  
  I don't believe there would be any escalation in compilation
  requirements: we already have the ability to invoke stronger primitives
  than these.  What is needed is research to find out what the primitives
  are called, on platforms where we aren't relying on direct asm access.
 
 I don't believe that's correct, although it's possible that I may be
 missing something.  On any platform where we have TAS(), that should
 be sufficient to set the flag, but how will we read the flag?  A
 simple fetch isn't guaranteed to be sufficient; for some
 architectures, you might need to insert a read fence, and I don't
 think we have anything like that defined right now. 
A TAS is both a read and write fence. After that you don't *need* to fetch it.
And even if it were only a write fence on some platforms  - if we consistently 
issue write fences at the relevant places that ought to be enough.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:49:45 Robert Haas wrote:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.
But for that you will need more infrastructure in that area anyway.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 9:51 AM, Andres Freund and...@anarazel.de wrote:
 On Friday 19 November 2010 15:49:45 Robert Haas wrote:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.
 But for that you will need more infrastructure in that area anyway.

True, but you don't have to do it all at once.  You can continue to do
the same old stuff on the platforms you currently support, and use the
newer stuff on platforms where the right thing to do is readily
apparent, like x86 and x86_64.  And people can add support for their
favorite platforms gradually over time, rather than having a flag day
where we stop supporting everything we don't know what to do with.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Label switcher function

2010-11-19 Thread Robert Haas
2010/11/19 KaiGai Kohei kai...@ak.jp.nec.com:
 Indeed, the comment at middle of the fmgr_info_cxt_security() and just
 above definition of the fmgr_security_definer() are not correct.
 Did you notice anything else?

I think I noticed a couple of places, but I didn't write down exactly
which ones.  Sorry

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote:
 Well, its not generally true - you are right there. But there is a wide range
 for syscalls available where its inherently true (which is what I sloppily
 referred to). And you are allowed to call a, although quite restricted, set of
 system calls even in signal handlers. I don't have the list for older posix
 versions in mind, but for 2003 you can choose something from several like
 write, lseek,setpgid which inherently have to serialize. And I am quite sure
 there were sensible calls for earlier versions.

Well, it's not quite enough just to call into the kernel to serialize
on some point of memory, because your point is to make sure that
*this particular piece of memory* is coherent.  It doesn't matter if
the kernel has proper fencing in its stuff if the memory it's
guarding is in another cacheline, because that won't *necessarily*
force cache coherency in your local lock/variable memory.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.

I haven't read all of this very long thread yet, but I will point out
that you seem to be arguing from the position that memory ordering
primitives will only be useful for the latch code.  This is nonsense
of the first order.  We already know that the sinval signalling
mechanism could use it to avoid needing a spinlock.  I submit that
it's very likely that fixing communication bottlenecks elsewhere
will similarly require memory ordering primitives if we are to avoid
the stupid "use a lock" approach.  I think it's time to build that
infrastructure.

BTW, I agree with Andres' point that we can probably default memory
barriers to be no-ops on unknown platforms.  Weak memory ordering
isn't a common architectural choice.  A look through s_lock.h suggests
that PPC and MIPS are the only supported arches that need to worry
about this.
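
A minimal sketch of what such a default could look like, assuming GCC-style
inline asm and the "sync" instruction on PPC and MIPS. The names
sketch_memory_barrier, sketch_publish and sketch_consume are made up for
illustration and are not existing PostgreSQL symbols. Note that the no-op
branch does nothing to stop the compiler itself from reordering, which a
real implementation would have to address.

```c
#include <assert.h>

/*
 * Hypothetical barrier macro (not an existing PostgreSQL symbol).
 * Weakly ordered architectures get a real fence; everything else
 * falls back to a no-op, as suggested in the thread.
 */
#if defined(__GNUC__) && \
    (defined(__powerpc__) || defined(__ppc__) || defined(__powerpc64__))
#define sketch_memory_barrier() __asm__ __volatile__ ("sync" : : : "memory")
#elif defined(__GNUC__) && defined(__mips__)
#define sketch_memory_barrier() __asm__ __volatile__ ("sync" : : : "memory")
#else
/* Unknown or strongly ordered platform: assume no fence is needed. */
#define sketch_memory_barrier() ((void) 0)
#endif

static int sketch_payload;          /* data being handed over */
static volatile int sketch_flag;    /* set once the data is ready */

/* Writer side: store the data, fence, then raise the flag. */
static void
sketch_publish(int value)
{
    sketch_payload = value;
    sketch_memory_barrier();
    sketch_flag = 1;
}

/* Reader side: the caller has seen the flag; fence, then read the data. */
static int
sketch_consume(void)
{
    assert(sketch_flag);
    sketch_memory_barrier();
    return sketch_payload;
}
```

The point of the layout is that only the macro definition is
platform-specific; callers such as sketch_publish look the same everywhere.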

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 10:01 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 If we're going to work on memory primitives, I would much rather see
 us put that effort into, say, implementing more efficient LWLock
 algorithms to solve the bottlenecks that the MOSBENCH guys found,
 rather than spending it on trying to avoid a minor API complication
 for the latch facility.

 I haven't read all of this very long thread yet, but I will point out
 that you seem to be arguing from the position that memory ordering
 primitives will only be useful for the latch code.  This is nonsense
 of the first order.  We already know that the sinval signalling
 mechanism could use it to avoid needing a spinlock.  I submit that
 it's very likely that fixing communication bottlenecks elsewhere
 will similarly require memory ordering primitives if we are to avoid
 the stupid "use a lock" approach.  I think it's time to build that
 infrastructure.

I completely agree, but I'm not too sure I want to drop support for
any platform for which we haven't yet implemented such primitives.
What's different about this case is that "fall back to taking the spin
lock" is not a workable option.

 BTW, I agree with Andres' point that we can probably default memory
 barriers to be no-ops on unknown platforms.  Weak memory ordering
 isn't a common architectural choice.  A look through s_lock.h suggests
 that PPC and MIPS are the only supported arches that need to worry
 about this.

That's good to hear.  I'm more worried, however, about architectures
where we supposedly have TAS but it isn't really TAS but some
OS-provided "acquire a lock" primitive.  That won't generalize nicely
to what we need for this case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Aidan Van Dyk
On Fri, Nov 19, 2010 at 9:31 AM, Robert Haas robertmh...@gmail.com wrote:

 Just a small point of clarification - you need to have both that
 unknown architecture, and that architecture has to have postgres
 processes running simultaneously on different CPUs with different
 caches that are incoherent to have those problems.

 Sure you do.  But so what?  Are you going to compile PostgreSQL and
 implement TAS as a simple store and read-fence as a simple load?  How
 likely is that to work out well?

If I was trying to port PostgreSQL to some strange architecture, and
my strange architecture didn't have all the normal TAS and memory
barrier stuff because it was only a UP system with no cache, then yes,
and it would work out well ;-)

If it was some strange SMP architecture, I wouldn't expect *anything*
to work out well if the architecture doesn't have some sort of
TAS/memory barrier/cache-coherency stuff in it ;-)

a.


-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 15:58:39 Aidan Van Dyk wrote:
 On Fri, Nov 19, 2010 at 9:49 AM, Andres Freund and...@anarazel.de wrote:
  Well, it's not generally true - you are right there. But there is a wide
  range of syscalls available where it's inherently true (which is what I
  sloppily referred to). And you are allowed to call a set of system calls,
  although quite a restricted one, even in signal handlers. I don't have
  the list for older POSIX versions in mind, but for 2003 you can choose
  from several, like write, lseek, setpgid, which inherently have
  to serialize. And I am quite sure there were sensible calls for earlier
  versions.
 
 Well, it's not quite enough just to call into the kernel to serialize
 on some point of memory, because your point is to make sure that
 *this particular piece of memory* is coherent.  It doesn't matter if
 the kernel has proper fencing in its own code if the memory it's
 guarding is in another cacheline, because that won't *necessarily*
 force cache coherency in your local lock/variable memory.
Yes and no. It provides the same guarantees as our current approach of using 
spinlocks for exactly that - that it theoretically is not enough is an 
independent issue (but *definitely* an issue).

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Markus Wanner
On 11/19/2010 03:58 PM, Aidan Van Dyk wrote:
 Well, it's not quite enough just to call into the kernel to serialize
 on some point of memory, because your point is to make sure that
 *this particular piece of memory* is coherent.

Well, that certainly doesn't apply to full fences, that are not specific
to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
'mf' on ia64.

Regards

Markus Wanner



Re: [HACKERS] duplicate connection failure messages

2010-11-19 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of Fri Nov 19 00:17:59 -0300 2010:
 Alvaro Herrera wrote:
  Excerpts from Bruce Momjian's message of Wed Nov 17 13:04:46 -0300 2010:
  
   OK, I doubt we want to add complexity to improve this, so I see our
   options as:
   
   o  ignore the problem
   o  display IPv4/IPv6 labels
   o  display only an IPv6 label
   o  something else
  
  I think we should use inet_ntop where available to print the address.
 
 Good idea because inet_ntop() is thread-safe.  Does that work on IPv6? 
 You indicated that inet_ntoa() does not.

According to opengroup.org, IPv6 should work if the underlying libraries
support it, whereas inet_ntoa explicitly does not.
http://www.opengroup.org/onlinepubs/009695399/functions/inet_ntop.html
http://www.opengroup.org/onlinepubs/009695399/functions/inet_addr.html
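
A rough sketch of the inet_ntop approach, assuming a POSIX system with IPv6
support in libc. The helper name sketch_addr_to_text is hypothetical and is
not taken from libpq; it only shows that one call path can cover both
address families, with the caller supplying the buffer.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: format the address held in 'sa' as text.
 * Returns buf on success, or NULL if the address family is not
 * handled or the buffer is too small.
 */
static const char *
sketch_addr_to_text(const struct sockaddr *sa, char *buf, socklen_t buflen)
{
    assert(sa != NULL && buf != NULL);

    switch (sa->sa_family)
    {
        case AF_INET:
            return inet_ntop(AF_INET,
                             &((const struct sockaddr_in *) sa)->sin_addr,
                             buf, buflen);
        case AF_INET6:
            return inet_ntop(AF_INET6,
                             &((const struct sockaddr_in6 *) sa)->sin6_addr,
                             buf, buflen);
        default:
            return NULL;
    }
}
```

A buffer of INET6_ADDRSTRLEN bytes is large enough for either family, and
because the result lands in the caller's buffer rather than a static one,
the call is usable from threaded code.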

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] libpq changes for synchronous replication

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Thu, Nov 18, 2010 at 7:43 AM, Fujii Masao masao.fu...@gmail.com wrote:
 The patch is touching protocol.sgml as follows. Isn't this enough?

 How about some updates to the Message Flow section, especially the
 section on COPY Operations?

Yeah.  You're adding a new fundamental state to the protocol; it's not
enough to bury that in the description of a message format.  I don't
think a whole lot of new verbiage is needed, but the COPY section needs
to point out that this is a different state that allows both send and
receive, and explain what the conditions are for getting into and out of
that state.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 ... The reason memory
 barriers solve the problem is because they'll be atomically released
 when we jump into the signal handler, but that is not true of a
 spin-lock or a semaphore.

Hm, I wonder whether your concern is stemming from a wrong mental
model.  There is nothing to release.  In my view, a memory barrier
primitive is a sequence point, having the properties that all writes
issued before the barrier shall become visible to other processors
before any writes after it, and also that no reads issued after the
barrier shall be executed until those writes have become visible.
(PPC can separate those two aspects, but I think we probably don't
need to get that detailed for our purposes.)  On most processors,
the barrier primitive will just be ((void) 0) because they don't
deal in out-of-order writes anyway.

regards, tom lane



[HACKERS] What do these terms mean in the SOURCE CODE?

2010-11-19 Thread Vaibhav Kaushal
I am going through the Executor code and come across the following terms
quite often. Can someone tell me what they mean (in a couple of
sentences each)?

1. Scan State
2. Plan State
3. Tuple Projection
4. EState
5. Qual
6. Expression

They seem quite ambiguous in the source code, especially since some of them
are terms that already have multiple meanings.

Thanks for your time.

-Vaibhav (*_*)


Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I completely agree, but I'm not too sure I want to drop support for
 any platform for which we haven't yet implemented such primitives.
 What's different about this case is that "fall back to taking the spin
 lock" is not a workable option.

The point I was trying to make is that the fallback position can
reasonably be a no-op.

 That's good to hear.  I'm more worried, however, about architectures
 where we supposedly have TAS but it isn't really TAS but some
 OS-provided "acquire a lock" primitive.  That won't generalize nicely
 to what we need for this case.

I did say we need some research ;-).  We need to look into what's the
appropriate primitive for any such OSes that are available for PPC or
MIPS.  I don't feel a need to be paranoid about it for other
architectures.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Markus Wanner mar...@bluegap.ch writes:
 Well, that certainly doesn't apply to full fences, that are not specific
 to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
 'mf' on ia64.

Hm, what do those do exactly?  We've never had any such thing in the
Intel-ish spinlock asm, but if out-of-order writes are possible I should
think we'd need 'em.  Or does lock xchgb imply an mfence?

regards, tom lane



Re: [HACKERS] final patch - plpgsql: for-in-array

2010-11-19 Thread Pavel Stehule
I checked my tests, and the most important factor is removing a repeated detoast.

postgres=# CREATE OR REPLACE FUNCTION public.filter01(text[], text, integer)
 RETURNS text[]
 LANGUAGE plpgsql
AS $function$
DECLARE
 s text[] := '{}';
 l int := 0; i int;
 v text; loc text[] = $1;
BEGIN
  FOR i IN array_lower(loc,1)..array_upper(loc,1)
  LOOP
    EXIT WHEN l = $3;
    IF loc[i] LIKE $2 THEN
      s := s || loc[i];
      l := l + 1;
    END IF;
  END LOOP;
  RETURN s;
END;$function$;

This code is very slow when the array is large (tested with n=1000). With
one small modification it can be 20x faster:

DECLARE
 s text[] := '{}';
 l int := 0; i int;
 v text; loc text[] = $1 || '{}'::text[]; -- just does detoast and decompression
BEGIN

the final version of test can be:


so result:

Don't access a large unmodified array inside a loop when the data comes
from a table (here, iterating over A[1000] of text(10)). The speedup is from
451 sec to 15 sec. This rule may be interesting for PostGIS people,
because it can apply to other long varlena values. But this is still
2x slower than the special statement.

Regards

Pavel Stehule

samples  %        symbol name
332      22.1333  exec_eval_expr
311      20.7333  plpgsql_param_fetch
267      17.8000  exec_eval_datum
220      14.6667  exec_stmts
91        6.0667  setup_param_list
82        5.4667  exec_eval_cleanup.clone.10
71        4.7333  __i686.get_pc_thunk.bx
48        3.2000  exec_simple_cast_value
43        2.8667  exec_eval_boolean

samples  %    symbol name
4636 37.5994  array_seek.clone.0
961   7.7940  pglz_decompress
901   7.3074  list_member_ptr
443   3.5929  MemoryContextAllocZero
384   3.1144  AllocSetAlloc
381   3.0900  ExecEvalParamExtern
334   2.7088  GetSnapshotData
255   2.0681  AllocSetFree
254   2.0600  LWLockRelease
249   2.0195  ExecMakeFunctionResultNoSets
249   2.0195  UTF8_MatchText
234   1.8978  LWLockAcquire
195   1.5815  AllocSetReset
167   1.3544  AllocSetCheck
163   1.3220  pfree
151   1.2247  ExecEvalArrayRef
149   1.2084  RevalidateCachedPlan
138   1.1192  bms_is_member
126   1.0219  CopySnapshot



Re: [HACKERS] duplicate connection failure messages

2010-11-19 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Bruce Momjian's message of Fri Nov 19 00:17:59 -0300 2010:
 Alvaro Herrera wrote:
 I think we should use inet_ntop where available to print the address.
 
 Good idea because inet_ntop() is thread-safe.  Does that work on IPv6? 
 You indicated that inet_ntoa() does not.

 According to opengroup.org, IPv6 should work if the underlying libraries
 support it, whereas inet_ntoa explicitly does not.
 http://www.opengroup.org/onlinepubs/009695399/functions/inet_ntop.html
 http://www.opengroup.org/onlinepubs/009695399/functions/inet_addr.html

I get the impression that you guys have forgotten the existence of
src/backend/utils/adt/inet_net_ntop.c

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Markus Wanner
On 11/19/2010 04:51 PM, Tom Lane wrote:
 Hm, what do those do exactly?

"Performs a serializing operation on all load-from-memory and
store-to-memory instructions that were issued prior the MFENCE
instruction." [1]

Given the memory ordering guarantees of x86, this instruction might only
be relevant for SMP systems, though.

 Or does lock xchgb imply an mfence?

Probably on older architectures (given the name "bus locked exchange"),
but OTOH I wouldn't bet on that still being true. Locking the entire bus
sounds like a prohibitively expensive operation with today's amounts of
cores per system.

Regards

Markus Wanner


[1]: random google hit on 'mfence':
http://siyobik.info/index.php?module=x86id=170



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 16:51:00 Tom Lane wrote:
 Markus Wanner mar...@bluegap.ch writes:
  Well, that certainly doesn't apply to full fences, that are not specific
  to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
  'mf' on ia64.
 Hm, what do those do exactly?  We've never had any such thing in the
 Intel-ish spinlock asm, but if out-of-order writes are possible I should
 think we'd need 'em.  Or does lock xchgb imply an mfence?
Out-of-order writes are definitely possible if you consider multiple
processors.
Locked statements like 'lock xaddl;' guarantee that the specific operands (or
their cachelines) are visible on all processors and are done atomically - but
it's not influencing the whole cache like mfence would.

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 Locked statements like 'lock xaddl;' guarantee that the specific operands (or
 their cachelines) are visible on all processors and are done atomically - but
 it's not influencing the whole cache like mfence would.

Where is this "locking the whole cache" meme coming from?  What we're
looking for has nothing to do with locking anything.  It's primarily
a directive to the processor to flush any dirty cache lines out to
main memory.  It's not going to block any other processors.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 17:25:57 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  Locked statements like 'lock xaddl;' guarantee that the specific operands
  (or their cachelines) are visible on all processors and are done
  atomically - but it's not influencing the whole cache like mfence would.
 Where is this "locking the whole cache" meme coming from?  What we're
 looking for has nothing to do with locking anything.  It's primarily
 a directive to the processor to flush any dirty cache lines out to
 main memory.  It's not going to block any other processors.
I was never talking about 'locking the whole cache' - I was talking about 
flushing/fencing it like a global read/write barrier would. And lock 
xchgb/xaddl does not imply anything for other cachelines but its own.

I only used 'locked' in the context of 'lock xaddl'. 

Am I misunderstanding you?

Andres



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 I was never talking about 'locking the whole cache' - I was talking about 
 flushing/fencing it like a global read/write barrier would. And lock 
 xchgb/xaddl does not imply anything for other cachelines but its own.

If that's the case, why aren't the parallel regression tests falling
over constantly?  My recollection is that when I broke the sinval code
by assuming strong memory ordering without spinlocks, it didn't take
long at all for the PPC buildfarm members to expose the problem.
If it's possible for Intel-ish processors to exhibit weak memory
ordering behavior, I'm quite sure that our current code would be showing
bugs everywhere.

The impression I had of current Intel designs is that they ensure global
cache coherency, ie if one processor has a dirty cache line the others
know that, and will go get the updated data before attempting to access
that piece of memory.

regards, tom lane



Re: [HACKERS] libpq changes for synchronous replication

2010-11-19 Thread Alvaro Herrera
Excerpts from Tom Lane's message of Fri Nov 19 12:25:13 -0300 2010:
 Robert Haas robertmh...@gmail.com writes:
  On Thu, Nov 18, 2010 at 7:43 AM, Fujii Masao masao.fu...@gmail.com wrote:
  The patch is touching protocol.sgml as follows. Isn't this enough?
 
  How about some updates to the Message Flow section, especially the
  section on COPY Operations?
 
 Yeah.  You're adding a new fundamental state to the protocol; it's not
 enough to bury that in the description of a message format.  I don't
 think a whole lot of new verbiage is needed, but the COPY section needs
 to point out that this is a different state that allows both send and
 receive, and explain what the conditions are for getting into and out of
 that state.

Is it sane that the new message has so specific a name?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] libpq changes for synchronous replication

2010-11-19 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes:
 Excerpts from Tom Lane's message of Fri Nov 19 12:25:13 -0300 2010:
 Yeah.  You're adding a new fundamental state to the protocol; it's not
 enough to bury that in the description of a message format.  I don't
 think a whole lot of new verbiage is needed, but the COPY section needs
 to point out that this is a different state that allows both send and
 receive, and explain what the conditions are for getting into and out of
 that state.

 Is it sane that the new message has so specific a name?

Yeah, it might be better to call it something generic like CopyBoth.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 10:44 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I completely agree, but I'm not too sure I want to drop support for
 any platform for which we haven't yet implemented such primitives.
 What's different about this case is that "fall back to taking the spin
 lock" is not a workable option.

 The point I was trying to make is that the fallback position can
 reasonably be a no-op.

Hmm, maybe you're right.  I was assuming weak memory ordering was a
reasonably common phenomenon, but if it only applies to a very small
number of architectures and we're pretty confident we know which ones
they are, your approach would be far less frightening than I
originally thought.  But is that really true?

I think it would be useful to try to build up a library of primitives
in this area.  For this particular task, we really only need a
write-with-fence primitive and a read-with-fence primitive.  On strong
memory ordering machines, these can just do a store and a read,
respectively; on weak memory ordering machines, they can insert
whatever fencing operations are needed on either the store side or the
load side.  I think it would also be useful to provide macros for
compare-and-swap and fetch-and-add on platforms where they are
available.  Then we could potentially write code like this:

#ifdef HAVE_COMPARE_AND_SWAP
...do it the lock-free way...
#else
...oh, well, do it with spinlocks...
#endif
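
To make that pattern concrete, here is a small sketch of a shared counter
with both branches filled in. It assumes C11 <stdatomic.h> for the lock-free
path and the GCC __sync builtins for the spinlock path; the names
SKETCH_HAVE_COMPARE_AND_SWAP, sketch_counter, sketch_counter_init and
sketch_counter_add are invented for the example and do not exist in the
PostgreSQL tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical feature test standing in for HAVE_COMPARE_AND_SWAP. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#define SKETCH_HAVE_COMPARE_AND_SWAP 1
#include <stdatomic.h>
#endif

#ifdef SKETCH_HAVE_COMPARE_AND_SWAP

/* Lock-free variant: retry a compare-and-swap until it succeeds. */
typedef struct { atomic_uint value; } sketch_counter;

static void
sketch_counter_init(sketch_counter *c)
{
    assert(c != NULL);
    atomic_init(&c->value, 0u);
}

static unsigned
sketch_counter_add(sketch_counter *c, unsigned delta)
{
    unsigned old;

    assert(c != NULL);
    old = atomic_load(&c->value);
    /* On failure 'old' is refreshed with the current value, so just retry. */
    while (!atomic_compare_exchange_weak(&c->value, &old, old + delta))
        ;
    return old + delta;
}

#else

/* Fallback variant: protect the counter with a test-and-set spinlock. */
typedef struct { volatile int lock; unsigned value; } sketch_counter;

static void
sketch_counter_init(sketch_counter *c)
{
    assert(c != NULL);
    c->lock = 0;
    c->value = 0;
}

static unsigned
sketch_counter_add(sketch_counter *c, unsigned delta)
{
    unsigned result;

    assert(c != NULL);
    while (__sync_lock_test_and_set(&c->lock, 1))
        ;                           /* spin until the lock is free */
    c->value += delta;
    result = c->value;
    __sync_lock_release(&c->lock);
    return result;
}

#endif
```

Callers see the same sketch_counter_add() either way, so the choice between
the two strategies stays confined to one header.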

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I think it would be useful to try to build up a library of primitives
 in this area.  For this particular task, we really only need a
 write-with-fence primitive and a read-with-fence primitive.

That's really entirely the wrong way to think about it.  You need a
fence primitive, full stop.  It's a sequence point, not an operation
in itself.  It guarantees that reads/writes occurring before or after
it aren't resequenced around it.  I don't even understand what "write
with fence" means --- is the write supposed to be fenced against other
writes before it, or other writes after it?

 I think it would also be useful to provide macros for
 compare-and-swap and fetch-and-add on platforms where they are
 available.

That would be a great deal more work, because it's not a no-op anywhere;
and our need for it is still rather hypothetical.  I'm surprised to see
you advocating that when you didn't want to touch fencing a moment ago.

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Florian Weimer
* Andres Freund:

 I was never talking about 'locking the whole cache' - I was talking about 
 flushing/fencing it like a global read/write barrier would. And lock 
 xchgb/xaddl does not imply anything for other cachelines but its own.

My understanding is that once you've seen the result of an atomic
operation on i386 and amd64, you are guaranteed to observe all prior
writes performed by the thread which did the atomic operation, too.
Explicit fencing is only necessary if you need synchronization without
atomic operations.

-- 
Florian Weimer                fwei...@bfk.de
BFK edv-consulting GmbH       http://www.bfk.de/
Kriegsstraße 100              tel: +49-721-96201-1
D-76133 Karlsruhe             fax: +49-721-96201-99



Re: [HACKERS] contrib: auth_delay module

2010-11-19 Thread Ross J. Reedstrom
On Fri, Nov 19, 2010 at 04:57:03PM +0900, KaiGai Kohei wrote:
 (2010/11/18 2:17), Robert Haas wrote:
 
 If KaiGai updates the code per previous discussion, would you be
 willing to take a crack at adding documentation?
 
 P.S. Your email client seems to be setting the Reply-To address to a
 ridiculous value.
 
 OK, I'll revise my patch according to the previous discussion.
 Please wait for about one week. I have a big event in this weekend.
 

I'll take a crack at the docs, though I might need hand-holding for the
new git stuff (I'll hit the wiki)

Ross 
-- 
Ross Reedstrom, Ph.D.                                 reeds...@rice.edu
Systems Engineer  Admin, Research Scientist           phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
I wrote:
 Markus Wanner mar...@bluegap.ch writes:
 Well, that certainly doesn't apply to full fences, that are not specific
 to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
 'mf' on ia64.

 Hm, what do those do exactly?

I poked around in the Intel manuals a bit.  They do have mfence (also
lfence and sfence) but so far as I can tell, those are only used to
manage loads and stores that are issued by special instructions that
explicitly mark the operation as weakly ordered.  So the reason we're
not seeing bugs is presumably that C compilers don't generate such
instructions.  Also, Intel architectures do guarantee cache consistency
across multiple processors (and it costs them a lot...)

I found a fairly interesting and detailed paper about memory fencing
in the Linux kernel:
http://www.rdrop.com/users/paulmck/scalability/paper/ordering.2007.09.19a.pdf

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 I think it would be useful to try to build up a library of
 primitives in this area.  For this particular task, we really
 only need a write-with-fence primitive and a read-with-fence
 primitive.
 
 That's really entirely the wrong way to think about it.  You need
 a fence primitive, full stop.  It's a sequence point, not an
 operation in itself.  It guarantees that reads/writes occurring
 before or after it aren't resequenced around it.  I don't even
 understand what "write with fence" means --- is the write supposed
 to be fenced against other writes before it, or other writes after
 it?
 
I was taking it to mean something similar to the memory guarantees
around synchronized blocks in Java.  At the start of a synchronized
block you discard any cached data which you've previously read from
or written to main memory, and must read everything fresh from that
point.  At the end of a synchronized block you must write any
locally written values to main memory, although you retain them in
your thread-local cache for possible re-use.  Reads or writes from
outside the synchronized block can be pulled into the block and
reordered in among the reads and writes within the block (which may
also be reordered) unless there's another block to contain them.
 
It works fine once you have your head around it, and allows for
significant optimization in a heavily multi-threaded application.  I
have no idea whether such a model would be useful for PostgreSQL. If
I understand Tom he is proposing what sounds roughly like what could
be achieved in the Java memory model by keeping all code for a
process within a single synchronized block, with the fence being a
point where you end it (flushing all local writes to main memory)
and start a new one (forcing a discard of locally cached data).
 
Of course I'm ignoring the locking aspect of synchronized blocks and
just discussing the memory access aspect of them.  (A synchronized
block in Java always references some [any] Object, and causes an
exclusive lock to be held on the object from one end of the block to
the other.)
 
-Kevin



Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

2010-11-19 Thread Caleb.Welton
Note that the standard also supports unnesting multiple arrays concurrently; 
the rule for handling arrays with different lengths is to pad the shorter 
array with nulls.

SELECT * FROM
   UNNEST( ARRAY[5,2,3,4],
   ARRAY['hello', 'world'] )
   WITH ORDINALITY AS t(a,b,i);

 a |    b    | i
---+---------+---
 5 | 'hello' | 1
 2 | 'world' | 2
 3 |         | 3
 4 |         | 4
(4 rows)


Implementing this takes more than substituting the existing unnest(anyarray) 
function in multiple times.
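
To make the padding rule concrete, here is a rough C sketch of the row
generation. It is only an illustration of the semantics described above, not
proposed backend code; ZipRow and unnest_zip are hypothetical names, and a
NULL pointer stands in for an SQL NULL.

```c
#include <stddef.h>

/* One output row; a NULL pointer stands in for an SQL NULL. */
typedef struct
{
    const int  *a;          /* element of the first array, or NULL */
    const char *b;          /* element of the second array, or NULL */
    size_t      ordinality; /* 1-based row number */
} ZipRow;

/*
 * Emit max(nx, ny) rows into out[], padding the shorter array with NULLs.
 * The caller must provide room for max(nx, ny) rows.  Returns the row count.
 */
static size_t
unnest_zip(const int *xs, size_t nx,
           const char *const *ys, size_t ny,
           ZipRow *out)
{
    size_t      nrows = (nx > ny) ? nx : ny;
    size_t      i;

    for (i = 0; i < nrows; i++)
    {
        out[i].a = (i < nx) ? &xs[i] : NULL;
        out[i].b = (i < ny) ? ys[i] : NULL;
        out[i].ordinality = i + 1;
    }
    return nrows;
}
```

The row count comes from the longest array, which is why calling a
single-array unnest once per argument does not give the same result.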

Regards,
   Caleb

On Nov 19, 2010, at 4:50 AM, pgsql-hackers-ow...@postgresql.org wrote:

From: David Fetter da...@fetter.org
Date: November 18, 2010 11:48:16 PM PST
To: Itagaki Takahiro itagaki.takah...@gmail.com
Cc: PG Hackers pgsql-hackers@postgresql.org
Subject: Re: UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)


On Fri, Nov 19, 2010 at 11:40:05AM +0900, Itagaki Takahiro wrote:
On Fri, Nov 19, 2010 at 08:33, David Fetter da...@fetter.org wrote:
In order to get WITH ORDINALITY, would it be better to change
gram.y to account for both WITH ORDINALITY and without, or just
for the WITH ORDINALITY case?

We probably need to change gram.y and make UNNEST to be
COL_NAME_KEYWORD.  UNNEST (without ORDINALITY) will call the
existing unnest() function, and UNNEST() WITH ORDINALITY will call
unnest_with_ordinality().

Thanks for sketching that out :)

BTW, what will we return for arrays with 2 or more dimensions?

At the moment, per the SQL standard, UNNEST without the WITH
ORDINALITY clause flattens all dimensions.

SELECT * FROM UNNEST(ARRAY[[1,2],[3,4]]);
 unnest
--------
      1
      2
      3
      4
(4 rows)

Unless we want to do something super wacky and contrary to the SQL
standard, UNNEST(array) WITH ORDINALITY should do the same.

There is no confusion in your two-argument version:
UNNEST(anyarray, number_of_dimensions_to_unnest)
but we will also support the one-argument version, where array indexes
would be composite numbers.  A possible design would be to just
return sequential serial numbers of the values -- the following two
queries return the same results:

- SELECT i, v FROM UNNEST($1) WITH ORDINALITY AS t(v, i)
- SELECT row_number() OVER () AS i, v FROM UNNEST($1) AS t(v)

Yes, that's what the standard says.  Possible less-than-total
unrolling schemes include:

- Flatten specified number of initial dimensions into one list, e.g.
 turn UNNEST(array_3d, 2) into SETOF(array_1d) with one column of
 ORDINALITY

- Flatten similarly, but have an ORDINALITY column for each flattened
 dimension.

- More exotic schemes, such as UNNEST(array_3d, [1,3]), with either of
 the two methods above.

And of course the all-important:

- Other possibilities I haven't thought of :)

Cheers,
David.
--
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate





Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 That's really entirely the wrong way to think about it.  You need
 a fence primitive, full stop.  It's a sequence point, not an
 operation in itself.

 I was taking it to mean something similar to the memory guarantees
 around synchronized blocks in Java.  At the start of a synchronized
 block you discard any cached data which you've previously read from
 or written to main memory, and must read everything fresh from that
 point.  At the end of a synchronized block you must write any
 locally written values to main memory, although you retain them in
 your thread-local cache for possible re-use.

That is basically the model that we have implemented in the spinlock
primitives: taking a spinlock corresponds to starting a synchronized
block and releasing the spinlock ends it.  On processors that need
it, the spinlock macros include memory fence instructions that implement
the above semantics.
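
As a rough illustration of that model, here is a test-and-set lock written
with C11 atomics. This is a sketch under assumptions, not the s_lock.h code:
DemoSpinLock and the demo_spin_* functions are hypothetical names, and C11
acquire/release orderings stand in for the platform-specific fence
instructions.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a spinlock; this is not PostgreSQL's slock_t. */
typedef struct
{
    atomic_flag locked;
} DemoSpinLock;

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* One test-and-set attempt; returns true if the lock was obtained. */
static bool
demo_spin_try_acquire(DemoSpinLock *lock)
{
    /* acquire: accesses after this point cannot be moved before it */
    return !atomic_flag_test_and_set_explicit(&lock->locked,
                                              memory_order_acquire);
}

/* Start of the "synchronized block": spin until the lock is ours. */
static void
demo_spin_acquire(DemoSpinLock *lock)
{
    while (!demo_spin_try_acquire(lock))
        ;                       /* busy-wait */
}

/* End of the block: writes made inside it are published before unlock. */
static void
demo_spin_release(DemoSpinLock *lock)
{
    atomic_flag_clear_explicit(&lock->locked, memory_order_release);
}
```

Everything between demo_spin_acquire() and demo_spin_release() plays the role
of the synchronized block in Kevin's description.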

However, for lock-free interactions I think this model isn't terribly
helpful: it's not clear what is inside and what is outside the sync
block, and forcing your code into that model doesn't improve either
clarity or performance.  What you typically need is a guarantee about
the order in which writes become visible.  To give a concrete example,
the sinval bug I was mentioning earlier boiled down to assuming that a
write into an element of the sinval message array would become visible
to other processors before the change of the last-message pointer
variable became visible to them.  Without a fence instruction, that
doesn't hold on WMO processors, and so they were able to fetch a stale
message value.  In some cases you also need to guarantee the order of
reads.
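
The write-ordering requirement in that example can be sketched with C11
fences. This is a simplified illustration and not the sinval code: DemoQueue
and the demo_queue_* functions are hypothetical, and a single writer is
assumed.

```c
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_QUEUE_SIZE 16

/* Hypothetical message array plus a "last message" counter. */
typedef struct
{
    int         messages[DEMO_QUEUE_SIZE];
    atomic_int  num_messages;   /* count of published messages */
} DemoQueue;

static void
demo_queue_init(DemoQueue *q)
{
    atomic_init(&q->num_messages, 0);
}

/* Single writer assumed.  Returns false when the array is full. */
static bool
demo_queue_publish(DemoQueue *q, int msg)
{
    int         n = atomic_load_explicit(&q->num_messages,
                                         memory_order_relaxed);

    if (n >= DEMO_QUEUE_SIZE)
        return false;
    q->messages[n] = msg;       /* 1. write the message body */
    /* write fence: the body must become visible before the counter does */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&q->num_messages, n + 1, memory_order_relaxed);
    return true;
}

/* Returns false if message idx has not been published yet. */
static bool
demo_queue_read(DemoQueue *q, int idx, int *out)
{
    int         n = atomic_load_explicit(&q->num_messages,
                                         memory_order_relaxed);

    if (idx < 0 || idx >= n)
        return false;
    /* read fence: do not fetch the body ahead of the counter check */
    atomic_thread_fence(memory_order_acquire);
    *out = q->messages[idx];
    return true;
}
```

Dropping either fence reintroduces the failure described above on a
weak-memory-ordering machine: a reader can see the new counter value and
still fetch a stale message slot.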

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 
 What you typically need is a guarantee about the order in which
 writes become visible.
 
 In some cases you also need to guarantee the order of reads.
 
Doesn't that suggest different primitives?
 
-Kevin



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 18:46:00 Tom Lane wrote:
 I wrote:
  Markus Wanner mar...@bluegap.ch writes:
  Well, that certainly doesn't apply to full fences, that are not specific
  to a particular piece of memory. I'm thinking of 'mfence' on x86_64 or
  'mf' on ia64.
  
  Hm, what do those do exactly?
 
 I poked around in the Intel manuals a bit.  They do have mfence (also
 lfence and sfence) but so far as I can tell, those are only used to
 manage loads and stores that are issued by special instructions that
 explicitly mark the operation as weakly ordered.  So the reason we're
 not seeing bugs is presumably that C compilers don't generate such
 instructions.
Well. Some memcpy() implementations use string (or SIMD) operations which are 
weakly ordered though.

 Also, Intel architectures do guarantee cache consistency
 across multiple processors (and it costs them a lot...)
Only if you are talking about the *same* locations though. See example 8.2.3.4

Combined with:

For the Intel486 and Pentium processors, the LOCK# signal is always asserted 
on the bus during a LOCK operation, even if the area of memory being locked is 
cached in the processor.  For the P6 and more recent processor families, if 
the area of memory being locked during a LOCK operation is cached in the 
processor that is performing the LOCK operation as write-back memory and is 
completely contained in a cache line, the processor may not assert the LOCK# 
signal on the bus. Instead, it will modify the memory location internally and 
allow it’s cache coherency mechanism to ensure that the operation is carried 
out atomically. This operation is called “cache locking.” The cache coherency 
mechanism automatically prevents two or more processors that have cached the 
same area of memory from simultaneously modifying data in that area.


Which means that something like the following (in Intel's terminology) can happen:

initially x = 0

P1: mov [_X], 1
P1: lock xchg Y, 1

P2: lock xchg [_Z], 1
P2: mov r1, [_X]

A valid result is that r1 on P2 is 0.

I think that is not biting pg because it always uses the same spinlocks at the 
reading and writing side - but I am not that sure about that.

Andres



Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

2010-11-19 Thread David Fetter
On Fri, Nov 19, 2010 at 01:48:06PM -0500, caleb.wel...@emc.com wrote:
 Note the standard also supports unnesting multiple arrays
 concurrently, the rule for handling arrays with different lengths is
 to use null padding of the shorter array.

Interesting.  I notice that our version doesn't support multiple-array
UNNEST just yet.

SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world']);
ERROR:  function unnest(integer[], text[]) does not exist
LINE 1: SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world'])...
  ^
HINT:  No function matches the given name and argument types. You might need to 
add explicit type casts.

 
 SELECT * FROM
UNNEST( ARRAY[5,2,3,4],
ARRAY['hello', 'world'] )
WITH ORDINALITY AS t(a,b,i);
 
 a b i
 ---   -- --
 5  'hello'  1
 2  'world'  2
 3   3
 4   4
 (4 rows)

This looks a lot like an OUTER JOIN on the ORDINALITY column of each
of the individual UNNEST...WITH ORDINALITYs.  Given that we know the
ORDINALITY in advance just by building the arrays, we could optimize
this away from FULL JOIN to LEFT (or RIGHT) JOINs.

 To implement this it is not just substituting the existing unnest(anyarray) 
 function in multiple times.

Right.

 

-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



[HACKERS] how correctly detoast a Datum value?

2010-11-19 Thread Pavel Stehule
Hello

I am trying to explicitly detoast a plpgsql variable, but this code corrupts
its content.

What is wrong?

switch (datum->dtype)
{
    case PLPGSQL_DTYPE_VAR:
        {
            PLpgSQL_var *var = (PLpgSQL_var *) datum;

            *typeid = var->datatype->typoid;
            *typetypmod = var->datatype->atttypmod;
            *isnull = var->isnull;

            /*
             * Explicitly detoast a possibly toasted value; this
             * should protect us from repeated detoasting and
             * decompression.
             */
            if (!*isnull &&
                !var->datatype->typbyval && var->datatype->typlen == -1)
            {
                struct varlena *datum = PG_DETOAST_DATUM(var->value);

                if ((Pointer) datum != DatumGetPointer(var->value))
                {
                    free_var(var);
                    var->value = PointerGetDatum(datum);
                }
                *value = var->value;
            }
            else
                *value = var->value;

            break;
        }

Regards

Pavel Stehule



Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

2010-11-19 Thread Alvaro Herrera
Excerpts from Caleb.Welton's message of Fri Nov 19 15:48:06 -0300 2010:
 Note the standard also supports unnesting multiple arrays concurrently, the 
 rule for handling arrays with different lengths is to use null padding of the 
 shorter array.
 
 SELECT * FROM
UNNEST( ARRAY[5,2,3,4],
ARRAY['hello', 'world'] )
WITH ORDINALITY AS t(a,b,i);
 
 a b i
 ---   -- --
 5  'hello'  1
 2  'world'  2
 3   3
 4   4
 (4 rows)

Hmm, this is pretty interesting and useful --- I had to deal with some
XPath code not long ago and I had to turn to plpgsql; I think it could
have been done with multi-array unnest.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Friday 19 November 2010 20:03:27 Andres Freund wrote:
 Which means something like (in intel's terminology) can happen:
 
 initially x = 0
 
 P1: mov [_X], 1
 P1: lock xchg Y, 1
 
 P2. lock xchg [_Z], 1
 P2: mov r1, [_X]
 
 A valid result is that r1 on P2 is 0.
 
 I think that is not biting pg because it always uses the same spinlocks at
 the  reading and writing side - but I am not that sure about that.
This also seems to mean that a simple read memory barrier that does __asm__ 
__volatile__("lock; xaddl $0, ???") is not enough unless you use the same 
address for all those barriers, which would cause horrible cacheline bouncing.

Am I missing something?

Andres




Re: [HACKERS] duplicate connection failure messages

2010-11-19 Thread Bruce Momjian
Tom Lane wrote:
 Alvaro Herrera alvhe...@commandprompt.com writes:
  Excerpts from Bruce Momjian's message of Fri Nov 19 00:17:59 -0300 2010:
  Alvaro Herrera wrote:
  I think we should use inet_ntop where available to print the address.
  
  Good idea because inet_ntop() is thread-safe.  Does that work on IPv6? 
  You indicated that inet_ntoa() does not.
 
  According to opengroup.org, IPv6 should work if the underlying libraries
  support it, whereas inet_ntoa explicitly does not.
  http://www.opengroup.org/onlinepubs/009695399/functions/inet_ntop.html
  http://www.opengroup.org/onlinepubs/009695399/functions/inet_addr.html
 
 I get the impression that you guys have forgotten the existence of
 src/backend/utils/adt/inet_net_ntop.c

Yeah, that is nice, but we are calling this from libpq, not the backend.
Let me work up a patch.
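
For reference, a minimal sketch of what a family-aware formatter built on
inet_ntop() could look like. This is an illustration only, not the patch
itself, and demo_format_addr is a hypothetical name. Unlike inet_ntoa(),
inet_ntop() writes into a caller-supplied buffer and handles both AF_INET and
AF_INET6.

```c
#include <stddef.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Format the address in *sa into buf.  Returns buf on success, or NULL for
 * an unsupported address family or a buffer that is too small.  A buffer of
 * INET6_ADDRSTRLEN bytes is large enough for either family.
 */
static const char *
demo_format_addr(const struct sockaddr *sa, char *buf, size_t buflen)
{
    switch (sa->sa_family)
    {
        case AF_INET:
            return inet_ntop(AF_INET,
                             &((const struct sockaddr_in *) sa)->sin_addr,
                             buf, (socklen_t) buflen);
        case AF_INET6:
            return inet_ntop(AF_INET6,
                             &((const struct sockaddr_in6 *) sa)->sin6_addr,
                             buf, (socklen_t) buflen);
        default:
            return NULL;
    }
}
```

Since the result lives in the caller's buffer, there is no shared static
storage to worry about in threaded libpq clients.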

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread Dimitri Fontaine
Hi,

Sharing some thoughts after a first round of reviewing, where I only had
time to read the patch itself.

Joachim Wieland j...@mcknight.de writes:
 Since the compression is currently all down in the custom format
 backup code, the first thing I've done was refactoring the compression
 functions into a separate file. While at it, I have added support for
 liblzf compression.

I think I'd like to see a separate patch for the new compression
support. Sorry about that, I realize that's extra work…

And it could be about personal preferences, but the way you added the
liblzf support strikes me as odd, with all those #ifdefs everywhere. Is
it possible to have a specific file for each supported compression
format, then some routing code in src/bin/pg_dump/compress_io.c?

The routing code already exists, but the file is full of #ifdef
sections that define the right supporting function; I think having
separate compress_io_zlib and compress_io_lzf files would be better.


Then there's the bulk of the new dump format feature in the other part
of the patch, namely src/bin/pg_dump/pg_backup_directory.c. You have to
update the copyright in the file header there, at least :)

I have yet to devote more time to this part of the patch, but it seems
like it's rewriting the full support without using the existing bits.
That's something I have to check; I didn't have time to read the
existing code for the other archive formats.

I'm hesitant about marking the patch "Waiting on Author" to get it
split. Joachim, what do you think?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] duplicate connection failure messages

2010-11-19 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of Fri Nov 19 16:43:33 -0300 2010:
 Tom Lane wrote:

  I get the impression that you guys have forgotten the existence of
  src/backend/utils/adt/inet_net_ntop.c
 
 Yeah, that is nice, but we are calling this from libpq, not the backend.
 Let me work up a patch.

Actually the code seems agnostic (no ereport, palloc etc) so maybe it
could just be moved to src/port.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] duplicate connection failure messages

2010-11-19 Thread Bruce Momjian
Alvaro Herrera wrote:
 Excerpts from Bruce Momjian's message of Fri Nov 19 16:43:33 -0300 2010:
  Tom Lane wrote:
 
   I get the impression that you guys have forgotten the existence of
   src/backend/utils/adt/inet_net_ntop.c
  
  Yeah, that is nice, but we are calling this from libpq, not the backend.
  Let me work up a patch.
 
 Actually the code seems agnostic (no ereport, palloc etc) so maybe it
 could just be moved to src/port.

I was wondering that.  I am unclear if we need it though --- can we not
assume inet_ntop() exists on all systems?  We assumed inet_ntoa() did. 
Of course, the buildfarm will tell us.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Changes to Linux OOM killer in 2.6.36

2010-11-19 Thread Alex Hunsaker
On Thu, Nov 18, 2010 at 19:43, Greg Smith g...@2ndquadrant.com wrote:
 Last month's new Linux kernel 2.6.36 includes a rewrite of the out of memory
 killer:
 http://lwn.net/Articles/391222/
 http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commitdiff;h=a63d83f427fbce97a6cea0db2e64b0eb8435cd10

Yeah, I've been following this somewhat closely...
Also of interest is the recent thread about reverting the new oom
(don't know if it will happen, but maybe they won't deprecate
oom_adj):
http://lkml.org/lkml/2010/11/14/5

 The new badness method totals the task's RSS and swap as a percentage of
 RAM, where the old one scored starting with the total memory used by the
 process.  I *think* that this is an improvement for PostgreSQL, based on the
 sort of data I see with:

Well, it seems to be an improvement.  If I look at the oom_score on a
2.6.36 box running postgres I get:
$ cd /proc; for a in [0-9]*; do echo `cat $a/oom_score` $a `perl -pes/'\0.*$'// < $a/cmdline`; done|grep -v ^0|sort -n |less
1 1309 supervising syslog-ng
1 1310 /usr/sbin/syslog-ng
1 1336 /usr/sbin/crond
1 1368 /usr/sbin/irqbalance
1 1485 /usr/sbin/ntpd
1 1495 /usr/local/bin/pgbouncer
1 1506 /sbin/agetty

1 3391 /var/lib/postgres/pgsql-9.0/bin/postgres
1 3393 postgres: writer process
1 3394 postgres: wal writer process
1 3395 postgres: autovacuum launcher process
1 3396 postgres: stats collector process
1 4110 postgres: joshua wopr [local] idle
2 4109 postgres: joshua wopr [local] idle

So in this case it should kill one of the backends *before* the
postmaster.  Ignoring that backend... it looks like the postmaster has
the same score as every other process on the system.  It also has a
higher RSS than most, so I suspect it will still get killed first:
$ ps ax -o rss,pid,size,vsize,args | sort -n
...
 2416  1680   588  46548 /usr/lib/postfix/master
 2424  1696   640  46748 qmgr -l -t fifo -u
 2956  3395  2416 244644 postgres: autovacuum launcher process
 3116  2216   720  65464 sshd: alex [priv]
 4096  3393  1088 243316 postgres: writer process
 6592  4110  2516 246808 postgres: joshua wopr [local] idle
11756  3391   900 243128 /var/lib/postgres/pgsql-9.0/bin/postgres
32640  4109  9084 255564 postgres: joshua wopr [local] idle in transaction

So I think we will still need to protect the postmaster from OOM :(.

 One thing that's definitely changed is the interface used to control turning
 off the OOM killer.

Grr...  Whatever happened to a stable userspace ABI?
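
For anyone who has not looked at the new interface yet: 2.6.36 adds
/proc/<pid>/oom_score_adj, which takes values from -1000 to 1000 (-1000
exempts the process from the OOM killer), alongside the older oom_adj file
with its -17 to 15 range. Below is a rough sketch of writing such a value. It
is not proposed postmaster code, and demo_write_oom_score_adj is a
hypothetical helper; the path is passed in so it can be pointed at
/proc/self/oom_score_adj or at a scratch file for testing.

```c
#include <stdio.h>

/*
 * Write an oom_score_adj value (-1000 .. 1000) to the given file.
 * Returns 0 on success, -1 on a bad value or an I/O error.
 */
static int
demo_write_oom_score_adj(const char *path, int value)
{
    FILE       *fp;
    int         ok;

    if (value < -1000 || value > 1000)
        return -1;

    fp = fopen(path, "w");
    if (fp == NULL)
        return -1;

    ok = (fprintf(fp, "%d\n", value) > 0);
    if (fclose(fp) != 0)
        ok = 0;                 /* the kernel may reject the write here */

    return ok ? 0 : -1;
}
```

Note that lowering the value below its current setting normally requires
privilege, so an unprivileged process should expect the write to fail.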

 I don't think it's worth doing anything to the database code until tests on
 the newer kernel confirm whether this whole thing is even necessary anymore.

+1



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 1:51 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 However, for lock-free interactions I think this model isn't terribly
 helpful: it's not clear what is inside and what is outside the sync
 block, and forcing your code into that model doesn't improve either
 clarity or performance.  What you typically need is a guarantee about
 the order in which writes become visible.  To give a concrete example,
 the sinval bug I was mentioning earlier boiled down to assuming that a
 write into an element of the sinval message array would become visible
 to other processors before the change of the last-message pointer
 variable became visible to them.  Without a fence instruction, that
 doesn't hold on WMO processors, and so they were able to fetch a stale
 message value.  In some cases you also need to guarantee the order of
 reads.

But what about timings vs. random other stuff?  Like in this case
there's a problem if the signal arrives before the memory update to
latch->is_set becomes visible.  I don't know what we need to do to
guarantee that.

This page seems to indicate that x86 is OK as far as this is concerned
- we can simply store a 1 and everyone will see it:

http://coding.derkeiler.com/Archive/Assembler/comp.lang.asm.x86/2004-08/0979.html

...but if we were to, say, increment a counter at that location, it
would not be safe without a LOCK prefix (further messages in the
thread indicate that you might also have a problem if the address in
question is unaligned).
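
To spell out the difference between the two cases, here is a small sketch
using C11 atomics. It is an illustration only: DemoLatch and its fields are
hypothetical and not the actual latch structure. Setting the flag is a plain
store of a constant, so concurrent setters cannot lose anything; bumping a
counter is a read-modify-write, which is where x86 compilers typically emit a
LOCK-prefixed instruction.

```c
#include <stdatomic.h>

/* Hypothetical latch-like structure, for illustration only. */
typedef struct
{
    atomic_int  is_set;         /* 0 or 1 */
    atomic_int  wakeups;        /* number of times the latch was signalled */
} DemoLatch;

static void
demo_latch_init(DemoLatch *latch)
{
    atomic_init(&latch->is_set, 0);
    atomic_init(&latch->wakeups, 0);
}

/* Idempotent: every setter stores the same value, so no update is lost. */
static void
demo_latch_set(DemoLatch *latch)
{
    atomic_store(&latch->is_set, 1);
}

/*
 * Read-modify-write: with a plain "wakeups++" two concurrent callers could
 * both read the old value and one increment would be lost.  Returns the new
 * counter value.
 */
static int
demo_latch_count_wakeup(DemoLatch *latch)
{
    return atomic_fetch_add(&latch->wakeups, 1) + 1;
}
```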

It's not obvious to me, however, what might be required on other processors.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

2010-11-19 Thread Caleb.Welton
The other aspect of the standard that the Postgres implementation does not 
currently support is that unnest is supposed to be defined in terms of 
laterally derived subqueries, e.g. you should be able to unnest an array taken 
from a FROM-list entry to its left.

CREATE TABLE t1(id int, values int[]);
SELECT id, a FROM t1 UNNEST(values) as u(a);

If you consider it in terms of LATERAL, which Postgres also doesn't support, 
then you may find that the multi-array unnest works out much more cleanly in 
those terms than in terms of an outer join.  Specifically, since arrays are 
implicitly ordered by their ordinality, a simple array lookup is much easier 
and more efficient than performing a full-fledged join.

E.g. the rewrite is:
  SELECT id, values[i] as a FROM t1 LATERAL(SELECT 
generate_series(array_lower(values, 1), array_upper(values, 1))) as lat(i);

But then LATERAL support is something that has been discussed on and off for a 
while without seeing much progress.

Regards,
   Caleb

On Nov 19, 2010, at 11:06 AM, David Fetter wrote:

 On Fri, Nov 19, 2010 at 01:48:06PM -0500, caleb.wel...@emc.com wrote:
 Note the standard also supports unnesting multiple arrays
 concurrently, the rule for handling arrays with different lengths is
 to use null padding of the shorter array.
 
 Interesting.  I notice that our version doesn't support multiple-array
 UNNEST just yet.
 
 SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world']);
 ERROR:  function unnest(integer[], text[]) does not exist
 LINE 1: SELECT * FROM UNNEST(ARRAY[1,2,3,4], ARRAY['hello','world'])...
  ^
 HINT:  No function matches the given name and argument types. You might need 
 to add explicit type casts.
 
 
 SELECT * FROM
   UNNEST( ARRAY[5,2,3,4],
   ARRAY['hello', 'world'] )
   WITH ORDINALITY AS t(a,b,i);
 
 a b i
 ---   -- --
 5  'hello'  1
 2  'world'  2
 3   3
 4   4
 (4 rows)
 
 This looks a lot like an OUTER JOIN on the ORDINALITY column of each
 of the individual UNNEST...WITH ORDINALITYs.  Given that we know the
 ORDINALITY in advance just by building the arrays, we could optimize
 this away from FULL JOIN to LEFT (or RIGHT) JOINs.
 
 To implement this it is not just substituting the existing unnest(anyarray) 
 function in multiple times.
 
 Right.
 
 

Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons

2010-11-19 Thread Andres Freund
On Monday 15 November 2010 17:12:25 Robert Haas wrote:
 I notice that int8out isn't terribly consistent with int2out and
 int4out, in that it does an extra copy.   Maybe that's justified given
 the greater potential memory wastage, but I'm not certain.  One
 approach might be to pick some threshold value and allocate a buffer
 in one of two sizes based on how large the value is relative to that
 cutoff.  But that might also be a stupid idea, not sure.
I removed the extra buffer - it's actually a tiny bit faster without it (I 
guess the allocation pattern is a bit nicer during copy as it will always take 
the same paths and eventually the same address).
I couldn't measure any difference memory-usage-wise.

The code was that way before btw.

 It would speed things up for me if you or someone else could take a
 quick pass over what remains here and fix the formatting and
 whitespace to be consistent with our general project style, and make
 the comment headers more consistent among the functions being
 added/modified.
I think I did most of those - the function comments in numutils weren't
consistent before - now it's consistent with the unchanged pg_atoi.

Thanks for reviewing/applying the first part,

Andres
From 55acfa4f971f5a0e33eb8b9e66d621c16be96d42 Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Fri, 19 Nov 2010 21:44:29 +0100
Subject: [PATCH] Implement custom int[248]-string conversion routines out of speed reasons.

---
 src/backend/utils/adt/int8.c   |   10 +--
 src/backend/utils/adt/numutils.c   |  130 
 src/include/utils/builtins.h   |1 +
 src/test/regress/expected/int2.out |   13 
 src/test/regress/expected/int4.out |   13 
 src/test/regress/expected/int8.out |   13 
 src/test/regress/sql/int2.sql  |4 +
 src/test/regress/sql/int4.sql  |4 +
 src/test/regress/sql/int8.sql  |4 +
 9 files changed, 172 insertions(+), 20 deletions(-)

diff --git a/src/backend/utils/adt/int8.c b/src/backend/utils/adt/int8.c
index 894110d..8f4ef5a 100644
*** a/src/backend/utils/adt/int8.c
--- b/src/backend/utils/adt/int8.c
***
*** 20,25 
--- 20,26 
  #include "funcapi.h"
  #include "libpq/pqformat.h"
  #include "utils/int8.h"
+ #include "utils/builtins.h"
  
  
  #define MAXINT8LEN		25
*** Datum
*** 157,170 
  int8out(PG_FUNCTION_ARGS)
  {
  	int64		val = PG_GETARG_INT64(0);
! 	char	   *result;
! 	int			len;
! 	char		buf[MAXINT8LEN + 1];
! 
! 	if ((len = snprintf(buf, MAXINT8LEN, INT64_FORMAT, val)) < 0)
! 		elog(ERROR, "could not format int8");
  
! 	result = pstrdup(buf);
  	PG_RETURN_CSTRING(result);
  }
  
--- 158,166 
  int8out(PG_FUNCTION_ARGS)
  {
  	int64		val = PG_GETARG_INT64(0);
! 	char	   *result = palloc(MAXINT8LEN + 1);
  
! 	pg_lltoa(val, result);
  	PG_RETURN_CSTRING(result);
  }
  
diff --git a/src/backend/utils/adt/numutils.c b/src/backend/utils/adt/numutils.c
index 5f8083f..7b50549 100644
*** a/src/backend/utils/adt/numutils.c
--- b/src/backend/utils/adt/numutils.c
***
*** 3,10 
   * numutils.c
   *	  utility functions for I/O of built-in numeric types.
   *
-  *		integer:pg_atoi, pg_itoa, pg_ltoa
-  *
   * Portions Copyright (c) 1996-2010, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
   *
--- 3,8 
*** pg_atoi(char *s, int size, int c)
*** 109,135 
  }
  
  /*
!  *		pg_itoa			- converts a short int to its string represention
   *
!  *		Note:
!  *previously based on ~ingres/source/gutil/atoi.c
!  *now uses vendor's sprintf conversion
   */
  void
  pg_itoa(int16 i, char *a)
  {
! 	sprintf(a, "%hd", (short) i);
  }
  
  /*
!  *		pg_ltoa			- converts a long int to its string represention
   *
!  *		Note:
!  *previously based on ~ingres/source/gutil/atoi.c
!  *now uses vendor's sprintf conversion
   */
  void
! pg_ltoa(int32 l, char *a)
  {
! 	sprintf(a, "%d", l);
  }
--- 107,239 
  }
  
  /*
!  * pg_itoa - convert a signed 16bit integer to its string representation
   *
!  * It doesn't seem worth implementing this separately.
   */
  void
  pg_itoa(int16 i, char *a)
  {
! 	pg_ltoa((int32)i, a);
  }
  
+ 
  /*
!  * pg_ltoa: convert a signed 32bit integer to its string representation
   *
!  * 'buf' has to be 12 bytes long to fit the result of any 32bit integer.
!  *
!  * It's unfortunate to have this function twice - once for 32bit, once
!  * for 64bit, but incurring the cost of 64bit computation to 32bit
!  * platforms doesn't seem to be acceptable.
   */
  void
! pg_ltoa(int32 value, char *buf)
  {
! 	char *bufstart = buf;
! 	bool neg = false;
! 
! 	/*
! 	 * Avoid problems with the most negative not being representable
! 	 * as a positive integer
! 	 */
! 	if (value == INT32_MIN)
! 	{
! 		memcpy(buf, "-2147483648", 12);
! 		return;
! 	}
! 	else if (value < 0)
! 	{
! 		value = -value;
! 		neg = true;
! 	}
! 
! 	/* Build the string by computing the wanted 

Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread José Arthur Benetasso Villanova
Hi Dimitri and Joachim.

I've looked at the patch too, and I want to share some thoughts. I've
used http://wiki.postgresql.org/wiki/Reviewing_a_Patch to guide my
review.

Submission review:

I've applied and compiled the patch successfully using the current master.

Usability review:

The dir format generated 60 files for my database, with different
sizes, and it looks very confusing. Is it possible to use the same
trick as pigz and pbzip2, creating a concatenated file of streams?

Feature test:

Just a partial review. I can dump / restore using lzf, but didn't
stress it hard to check robustness.

Performance review:

Didn't test it hard either, but it looks OK.


Coding review:

Just a shallow review here.

 I think I'd like to see a separate patch for the new compression
 support. Sorry about that, I realize that's extra work…

Same feeling here, this is the 1st thing that I noticed.

The reuse of md5.c and kwlookup.c via a link doesn't look nice either.
This way you need to compile twice, among other things, but I think
that it's temporary, right?

-- 
José Arthur Benetasso Villanova

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Changes to Linux OOM killer in 2.6.36

2010-11-19 Thread Greg Smith

Kevin Grittner wrote:
 Greg Smith wrote:
  oom_adj is deprecated, scheduled for removal in August 2010:

 That surprised me so I checked the URL.  I believe you have a typo
 there and it's August, 2012.


This is why I include references, so that when the cold medicine hits me
in the middle of proofreading my message and I send it anyway you aren't
misled.  Yes, 2012, only a few months before doomsday.  The approaching
end of the world then means any bugs left can be marked WONTFIX.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services and Support   www.2ndQuadrant.us




[HACKERS] Hot Standby: too many KnownAssignedXids

2010-11-19 Thread Joachim Wieland
Hi,

I am seeing the following here on 9.0.1 on Linux x86-64:

LOG:  redo starts at 1F8/FC00E978
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23


and this is the complete history:

postgres was running as HS in foreground, Ctrl-C'ed it for a restart.

LOG:  received fast shutdown request
LOG:  aborting any active transactions
FATAL:  terminating walreceiver process due to administrator command
FATAL:  terminating connection due to administrator command
LOG:  shutting down
LOG:  database system is shut down


Started it up again:

$ postgres -D /db/
LOG:  database system was shut down in recovery at 2010-11-19 14:36:30 EST
LOG:  entering standby mode
cp: cannot stat `/archive/000101F90001': No such file or directory
cp: cannot stat `/archive/000101F800FC': No such file or directory
LOG:  redo starts at 1F8/FC00E978
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 30052) exited with exit code 1
LOG:  terminating any other active server processes


(copied the log files over...)


./postgres -D /db/

LOG:  database system was interrupted while in recovery at log time
2010-11-19 14:36:12 EST
HINT:  If this has occurred more than once some data might be
corrupted and you might need to choose an earlier recovery target.
LOG:  entering standby mode
LOG:  restored log file 000101F90001 from archive
LOG:  restored log file 000101F800FC from archive
LOG:  redo starts at 1F8/FC00E978
FATAL:  too many KnownAssignedXids
CONTEXT:  xlog redo insert: rel 1663/16384/18373; tid 3829898/23
LOG:  startup process (PID 31581) exited with exit code 1
LOG:  terminating any other active server processes


Changing the line in the source code to give some more output gives me:

FATAL:  too many KnownAssignedXids. head: 0, tail: 0, nxids: 9978,
pArray->maxKnownAssignedXids: 6890
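
To make the numbers in that message concrete, here is a toy model of a capacity check. It assumes the test has roughly the form "head + nxids must not exceed the array size"; that form, and the name sketch_xids_fit, are assumptions for illustration and are not copied from procarray.c.

```c
#include <stdbool.h>

/*
 * Toy model of the capacity check: new entries are appended at `head`,
 * so they fit only if head + nxids does not run past the array size.
 */
bool
sketch_xids_fit(int head, int nxids, int max_xids)
{
	return head + nxids <= max_xids;
}
```

With the values from the log (head 0, nxids 9978, maximum 6890) the request cannot fit even into a completely empty array, so under this assumption compacting the array would not help.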


I still have the server, if you want me to debug anything or send a
patch against 9.0.1 that gives more output, just let me know.


Joachim



Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread Alvaro Herrera
Excerpts from José Arthur Benetasso Villanova's message of vie nov 19 18:28:03 
-0300 2010:

 The reuse of md5.c and kwlookup.c via a link doesn't look nice either.
 This way you need to compile twice, among other things, but I think
 that it's temporary, right?

Not sure what you mean here, but kwlookup.c is a symlink without this
patch too.  It's just the way it works; the compilation environments
here and in the backend are different, so there is no other option but
to compile twice.  I guess md5.c is a new one (I didn't check), but I
would assume it's the same thing.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread Joachim Wieland
Hi Dimitri,

thanks for reviewing my patch!

On Fri, Nov 19, 2010 at 2:44 PM, Dimitri Fontaine
dimi...@2ndquadrant.fr wrote:
 I think I'd like to see a separate patch for the new compression
 support. Sorry about that, I realize that's extra work…

I guess it wouldn't be a very big deal but I also doubt that it makes
the review that much easier. Basically the compression refactor patch
would just touch pg_backup_custom.c (because this is the place where
the libz compression is currently buried) and the two new
compress_io.(c|h) files. Everything else is pretty much the directory
stuff and is on top of these changes.


 And it could be about personal preferences, but the way you added the
 liblzf support strikes me as odd, with all those #ifdefs everywhere. Is
 it possible to have a specific file for each supported compression
 format, then some routing code in src/bin/pg_dump/compress_io.c?

Sure we could. But I wanted to wait with any fancy function pointer
stuff until we have decided if we want to include the liblzf support
at all. The #ifdefs might be a bit ugly but in case we do not include
liblzf support, it's the easiest way to take it out again. As written
in my introduction, this patch is not really about liblzf, liblzf is
just a proof of concept for factoring out the compression part and I
have included it, so that people can use it and see how much speed
improvement they get.


 The routing code already exists but then the file is full of #ifdef
 sections to define the right supporting function when I think having a
 compress_io_zlib and a compress_io_lzf files would be better.

Sure! I completely agree...
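
A minimal sketch of what routing through per-format hooks could look like, in place of #ifdef sections at each call site. Everything here is hypothetical: CompressorOps, find_compressor and the toy run-length encoder are made up for illustration, and the encoder merely stands in for a real library such as zlib.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-format hook table; none of these names are from the patch. */
typedef struct
{
	const char *name;
	/* Returns bytes written to dst, or (size_t) -1 if dst is too small. */
	size_t		(*compress) (const char *src, size_t srclen,
							 char *dst, size_t dstcap);
} CompressorOps;

/* "none": store the data unchanged. */
size_t
compress_none(const char *src, size_t srclen, char *dst, size_t dstcap)
{
	if (srclen > dstcap)
		return (size_t) -1;
	memcpy(dst, src, srclen);
	return srclen;
}

/* Toy run-length encoder: emits (count, byte) pairs, count at most 127. */
size_t
compress_rle(const char *src, size_t srclen, char *dst, size_t dstcap)
{
	size_t		out = 0;

	for (size_t i = 0; i < srclen;)
	{
		size_t		run = 1;

		while (i + run < srclen && src[i + run] == src[i] && run < 127)
			run++;
		if (out + 2 > dstcap)
			return (size_t) -1;
		dst[out++] = (char) run;
		dst[out++] = src[i];
		i += run;
	}
	return out;
}

static const CompressorOps compressors[] = {
	{"none", compress_none},
	{"rle", compress_rle},
};

/* The routing code: look a format up by name, no #ifdef in the caller. */
const CompressorOps *
find_compressor(const char *name)
{
	for (size_t i = 0; i < sizeof(compressors) / sizeof(compressors[0]); i++)
	{
		if (strcmp(compressors[i].name, name) == 0)
			return &compressors[i];
	}
	return NULL;
}
```

With this shape, an optional format is added or dropped by editing one table entry and one source file, which is the property the compress_io_zlib / compress_io_lzf split is after.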


 Then there's the bulk of the new dump format feature in the other part
 of the patch, namely src/bin/pg_dump/pg_backup_directory.c. You have to
 update the copyright in the file header there, at least :)

Well, not sure if we can just change the copyright notice, because in
the end the structure was copied from one of the other files which all
have the copyright notice in them, so my work is based on those other
files...


 I'm hesitant as far as marking the patch Waiting on author to get it
 split. Joachim, what do you think?

I will see if I can split it.


Joachim



Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread Tom Lane
Dimitri Fontaine dimi...@2ndquadrant.fr writes:
 I think I'd like to see a separate patch for the new compression
 support. Sorry about that, I realize that's extra work…

That part of the patch is likely to get rejected outright anyway,
so I *strongly* recommend splitting it out.  We have generally resisted
adding random compression algorithms to pg_dump because of license and
patent considerations, and I see no reason to suppose this one is going
to pass muster.

regards, tom lane



Re: [HACKERS] duplicate connection failure messages

2010-11-19 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 I was wondering that.  I am unclear if we need it though --- can we not
 assume inet_ntop() exists on all systems?  We assumed inet_ntoa() did. 

The Single Unix Spec includes inet_ntoa but not inet_ntop.

 Of course, the buildfarm will tell us.

The buildfarm unfortunately contains only a subset of the platforms
we care about.  I don't think this problem is large enough to justify
taking a portability risk by depending on non-SUS library functions.

If you want to do this, please do it as suggested previously, ie depend
on the copy of the code we have internally.
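
For reference, a small sketch of how the two calls differ in shape. inet_ntoa() handles IPv4 only and returns a pointer to a static buffer, while inet_ntop() takes an address family and a caller-supplied buffer. The helper name roundtrip_ipv4 is hypothetical, and this uses the platform's inet_ntop(), not the internal copy mentioned above.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stddef.h>
#include <sys/socket.h>

/*
 * Parse a dotted-quad IPv4 string and format it back into out[] via
 * inet_ntop().  Returns 0 on success, -1 on failure.
 */
int
roundtrip_ipv4(const char *text, char *out, size_t outlen)
{
	struct in_addr addr;

	if (inet_pton(AF_INET, text, &addr) != 1)
		return -1;
	if (inet_ntop(AF_INET, &addr, out, (socklen_t) outlen) == NULL)
		return -1;
	return 0;
}
```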

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But what about timings vs. random other stuff?  Like in this case
 there's a problem if the signal arrives before the memory update to
 latch->is_set becomes visible.  I don't know what we need to do to
 guarantee that.

I don't believe there's an issue there.  A context swap into the kernel
is certainly going to include msync.  If you're afraid otherwise, you
could put an msync before the kill() call, but I think it's a waste of
effort.
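
As an illustration of the ordering being discussed, here is a minimal sketch that publishes a flag with release semantics before any wakeup would be sent. It uses C11 <stdatomic.h>, which postdates this thread and is not what the latch code uses; SketchLatch and both function names are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a latch; not PostgreSQL's Latch struct. */
typedef struct
{
	atomic_bool is_set;
} SketchLatch;

/* Setter: publish the flag with release ordering before any wakeup signal. */
void
sketch_set_latch(SketchLatch *latch)
{
	atomic_store_explicit(&latch->is_set, true, memory_order_release);
	/* a real implementation would send the wakeup (e.g. kill()) here */
}

/* Waiter: atomically read and clear the flag; returns true if it was set. */
bool
sketch_consume_latch(SketchLatch *latch)
{
	return atomic_exchange_explicit(&latch->is_set, false,
									memory_order_acq_rel);
}
```

The point of the release store is that the flag update is ordered before whatever follows it in the setter, so a waiter that is woken and then reads the flag with acquire semantics sees it set.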

regards, tom lane



Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Tom Lane
Andres Freund and...@anarazel.de writes:
 On Friday 19 November 2010 18:46:00 Tom Lane wrote:
 I poked around in the Intel manuals a bit.  They do have mfence (also
 lfence and sfence) but so far as I can tell, those are only used to
 manage loads and stores that are issued by special instructions that
 explicitly mark the operation as weakly ordered.  So the reason we're
 not seeing bugs is presumably that C compilers don't generate such
 instructions.

 Well. Some memcpy() implementations use string (or SIMD) operations which are 
 weakly ordered though.

I'd expect memcpy to msync at completion of the move if it does that
kind of thing.  Otherwise it's failing to ensure that the move is really
done before it returns.

 
 For the Intel486 and Pentium processors, the LOCK# signal is always asserted
 on the bus during a LOCK operation, even if the area of memory being locked
 is cached in the processor.  For the P6 and more recent processor families,
 if the area of memory being locked during a LOCK operation is cached in the
 processor that is performing the LOCK operation as write-back memory and is
 completely contained in a cache line, the processor may not assert the LOCK#
 signal on the bus. Instead, it will modify the memory location internally and
 allow it’s cache coherency mechanism to ensure that the operation is carried
 out atomically. This operation is called “cache locking.” The cache coherency
 mechanism automatically prevents two or more processors that have cached the
 same area of memory from simultaneously modifying data in that area.
 

Like it says, the cache coherency mechanism prevents this from being a
problem for us.  Once the change is made in a processor's cache, it's
the cache's job to ensure that all processors see it --- and on Intel
architectures, the cache does take care of that.

regards, tom lane



Re: [HACKERS] Fwd: patch: format function - fixed oid

2010-11-19 Thread Jeff Janes
On Thu, Nov 18, 2010 at 11:54 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 -- Forwarded message --
 From: Pavel Stehule pavel.steh...@gmail.com
 Date: 2010/11/18
 Subject: Re: patch: format function, next generation
 To: Jeff Janes jeff.ja...@gmail.com
 Kopie: pgsql-hackers-ow...@postgresql.org


 Hello

 somebody takes my oid :)

 updated patch is in attachment

 Regards

 Pavel Stehule


Dear Pavel and Hackers,

I've reviewed this patch.  It applied, makes, and passes make check.
It has added regression tests that seem appropriate.  I think the
feature added matches the consensus that emerged from the very long
email discussion.  The C code seems fine (to my meager abilities to
judge that).

But I think the documentation does need some work.  From func.sgml:


 This functions can be used to create a formated string or message.
 There are allowed three types of tags: %s as string, %I as SQL
 identifiers and %L as SQL literals. Attention: result for %I and %L
 must not be same as result of <function>quote_ident</function> and
 <function>quote_literal</function> functions, because this function
 doesn't try to coerce parameters to <type>text</type> type and
 directly use a type's output functions. The placeholder can be
 related to some explicit parameter with using a optional n$
 specification inside format.

Should we make it explicit that this is inspired by C's sprintf?  Do
we want to call them "tags"?  This is introducing what seems to be a
new word to describe what are usually (I think) called "conversion
specifiers".

"Must not be the same" should be "Might not be the same".   However,
it does not appear that quote_ident is willing to use coercion at all,
and the %L behavior is more comparable to quote_nullable.

Maybe:

This function can be used to create a formatted string suitable for
use as dynamic SQL or as a message.  There are three types of
conversion specifiers: %s for literal strings, %I for SQL identifiers,
and %L for SQL literals.  Note that the results of the %L conversion
might not be the same as the results of the
<function>quote_nullable</function> function, as the latter coerces
its argument to <type>text</type> while <function>format</function>
uses a type's output function.  A conversion can reference an explicit
parameter position by using an optional n$ in the format specification.
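
To illustrate what the %I and %L conversions have to do at minimum, here is a simplified sketch of quote-doubling in C. It is not the code from the patch: the name sketch_quote is hypothetical, it always adds quotes, and the real quoting functions do more (quote_ident, for instance, leaves simple lower-case names unquoted).

```c
#include <stddef.h>

/*
 * Wrap src in the given quote character, doubling any embedded occurrence.
 * Returns the length written (excluding the terminating NUL), or
 * (size_t) -1 if dst is too small.
 */
size_t
sketch_quote(const char *src, char q, char *dst, size_t dstcap)
{
	size_t		n = 0;

	if (dstcap < 3)
		return (size_t) -1;
	dst[n++] = q;
	for (; *src; src++)
	{
		/* worst case: two chars for this one, plus closing quote and NUL */
		if (n + 4 > dstcap)
			return (size_t) -1;
		if (*src == q)
			dst[n++] = q;
		dst[n++] = *src;
	}
	dst[n++] = q;
	dst[n] = '\0';
	return n;
}
```

Passing a double quote as q models identifier quoting (%I), and passing a single quote models literal quoting (%L).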

Does "type's output function" need to cross-reference someplace?
"coercion" is described elsewhere in this section of docs, but output
functions are not.


And for the changes to plpgsql.sgml, I would propose:

<para>
 Building a string for a dynamic SQL statement can be simplified
 by using the <function>format</function> function (see <xref
 linkend="functions-string">):
<programlisting>
EXECUTE format('UPDATE tbl SET %I = %L WHERE key = %L', colname,
newvalue, keyvalue);
</programlisting>
 The <function>format</function> function can be used together with
 the <literal>USING</literal> clause:
<programlisting>
EXECUTE format('UPDATE tbl SET %I = $1 WHERE key = $2', colname)
   USING newvalue, keyvalue;
</programlisting>
 This form is more efficient because the parameters
 <literal>newvalue</literal> and <literal>keyvalue</literal> are
 not converted to text.
</para>


These are mostly grammatical changes, but with the last three lines I
may have missed the meaning of what you originally intended--I'm not
sure on that.


Thanks,

Jeff



Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread Joachim Wieland
On Fri, Nov 19, 2010 at 11:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 Dimitri Fontaine dimi...@2ndquadrant.fr writes:
  I think I'd like to see a separate patch for the new compression
  support. Sorry about that, I realize that's extra work…

 That part of the patch is likely to get rejected outright anyway,
 so I *strongly* recommend splitting it out.  We have generally resisted
 adding random compression algorithms to pg_dump because of license and
 patent considerations, and I see no reason to suppose this one is going
 to pass muster.


I was already anticipating that possibility and my initial patch description
is along these lines.

However, liblzf is BSD licensed so on the license side we should be fine.
Regarding patents, your last comment was that you'd like to see if it's
really worth it and so I have included support for lzf for anybody to go
ahead and find that out.

Will send an updated split up patch this weekend (which would actually be
four patches already...).


Joachim


Re: Latches with weak memory ordering (Re: [HACKERS] max_wal_senders must die)

2010-11-19 Thread Andres Freund
On Saturday 20 November 2010 00:08:07 Tom Lane wrote:
 Andres Freund and...@anarazel.de writes:
  On Friday 19 November 2010 18:46:00 Tom Lane wrote:
  I poked around in the Intel manuals a bit.  They do have mfence (also
  lfence and sfence) but so far as I can tell, those are only used to
  manage loads and stores that are issued by special instructions that
  explicitly mark the operation as weakly ordered.  So the reason we're
  not seeing bugs is presumably that C compilers don't generate such
  instructions.
  
  Well. Some memcpy() implementations use string (or SIMD) operations which
  are weakly ordered though.

 Like it says, the cache coherency mechanism prevents this from being a
 problem for us.  Once the change is made in a processor's cache, it's
 the cache's job to ensure that all processors see it --- and on Intel
 architectures, the cache does take care of that.
Check example 8.2.3.4 of 3a. - in my opinion that makes my example correct.

Andres




Re: [HACKERS] Patch to add a primary key using an existing index

2010-11-19 Thread Steve Singer

On 10-11-07 01:54 PM, Gurjeet Singh wrote:

Attached is the patch that extends the same feature for UNIQUE indexes.

It also includes some doc changes for the ALTER TABLE command, but I
could not verify the resulting changes since I don't have the
doc-building infrastructure installed.

Regards,



Gurjeet,

I've taken a stab at reviewing this.

Submission Review:


Tests

The expected output for the regression tests you added doesn't match
what I'm getting when I run the tests with your patch applied.
I think you just need to regenerate the expected results; they seem
to be from a previous version of the patch (different error messages etc.).


Documentation
---

I was able to generate the docs.

The ALTER TABLE page under the synopsis has

 ADD table_constraint

where table_constraint is defined on the CREATE TABLE page.
On the CREATE TABLE page table_constraint isn't defined as having the WITH
clause; the WITH is part of index_parameters.

I propose the alter table page instead have

ADD table_constraint [index_parameters]

where index_parameters also references the CREATE TABLE page like 
table_constraint.




Usability Review


Behaviour
-
I feel that if the ALTER TABLE ... renames the index
a NOTICE should be generated.  We generate notices about creating an 
index for a new pkey. We should give them a notice that we are renaming 
an index on them.


Coding Review:
==

Error Messages
-
In tablecmds.c your errdetail messages often don't start with a capital
letter. I believe the preference is to have the errdetail strings start
with a capital letter and end with a period.



tablecmds.c  - get_constraint_index_oid

contains the check

/* Currently only B-tree indexes are suupported for primary keys */
if (index_rel->rd_rel->relam != BTREE_AM_OID)
    elog(ERROR, "\"%s\" is not a B-Tree index", index_name);

but above we already validate that the index is a unique index with 
another check.  Today only B-tree indexes support unique constraints. 
If this changed at some point and we could have a unique index of some 
other type, would something in this patch need to be changed to support 
them?  If we are only depending on the uniqueness property then I think 
this check is covered by the uniqueness one higher in the function.


Also note the typo in your comment above (suupported)




Comments
-

index.c: Line 671 and 694.  Your indentation changes make the comments
run over 80 characters.  If you end up submitting a new version
of the patch I'd reformat those two comments.


Other than those issues the patch looks good to me.

Steve




Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

2010-11-19 Thread David Fetter
On Fri, Nov 19, 2010 at 04:11:56PM -0500, caleb.wel...@emc.com wrote:
 The other aspect of the standard that the Postgres implementation
 does not currently support is the fact that unnest is supposed to be
 defined in terms of laterally derived subqueries, e.g. you should be
 able to unnest another element from a from list entry laterally on
 the left.
 
 CREATE TABLE t1(id int, values int[]);
 SELECT id, a FROM t1, UNNEST(values) as u(a);
 
 If you consider it in terms of LATERAL, which Postgres also doesn't
 support, then you may find that it works out much more cleanly to
 consider the multi-array unnest in terms of that rather than in
 terms of an outer join.  Specifically since arrays are implicitly
 ordered on their ordinality a simple array lookup is much
 easier/more efficient than performing a full fledged join operator.
 
 E.g. the rewrite is:
 SELECT id, values[i] as a FROM t1,
 LATERAL(SELECT generate_series(array_lower(values, 1),
 array_upper(values, 1))) as lat(i);
 
 But then LATERAL support is something that has been discussed on and
 off for a while without seeing much progress.

Is LATERAL something you'd like to put preliminary support in for? :)

Cheers,
David.
-- 
David Fetter da...@fetter.org http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate



Re: [HACKERS] [PATCH] Custom code int(32|64) = text conversions out of performance reasons

2010-11-19 Thread Robert Haas
On Fri, Nov 19, 2010 at 4:16 PM, Andres Freund and...@anarazel.de wrote:
 On Monday 15 November 2010 17:12:25 Robert Haas wrote:
 I notice that int8out isn't terribly consistent with int2out and
 int4out, in that it does an extra copy.   Maybe that's justified given
 the greater potential memory wastage, but I'm not certain.  One
 approach might be to pick some threshold value and allocate a buffer
 in one of two sizes based on how large the value is relative to that
 cutoff.  But that might also be a stupid idea, not sure.
 I removed the extra buffer - it's actually a tiny bit faster without it  (I
 guess the allocation pattern is a bit nicer during copy as it will always take
 the same paths and eventually the same address).
 I couldn't measure any difference memory-usage wise.

 The code was that way before btw.

Yeah, I know.  After further thought I decided not to commit this
part, because using 32 bytes when you only need 8 is sort of sucky.
I'm not sure if it matters in real life, but if it's only a tiny
speedup I guess I might as well play it safe.

 It would speed things up for me if you or someone else could take a
 quick pass over what remains here and fix the formatting and
 whitespace to be consistent with our general project style, and make
 the comment headers more consistent among the functions being
 added/modified.
 I think I did most of those - the function comments in numutils weren't
 consistent before - now it's consistent with the unchanged pg_atoi.

 Thanks for reviewing/applying the first part,

Sure thing.  Thanks for taking time to do this - very nice speedup.
This part now committed, too.
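
For readers following along, the general technique (generating the digits by hand instead of going through sprintf) can be sketched as below. This is not the committed pg_ltoa: the name sketch_ltoa is hypothetical, and where the posted patch special-cases INT32_MIN with a memcpy, this sketch does the arithmetic in unsigned instead.

```c
#include <stdint.h>

/*
 * Write the decimal form of value into buf (at least 12 bytes) without
 * calling sprintf.  Digits are produced least-significant first and the
 * buffer is reversed at the end.
 */
void
sketch_ltoa(int32_t value, char *buf)
{
	char	   *start = buf;
	char	   *end;

	/* Work in unsigned so that negating INT32_MIN cannot overflow. */
	uint32_t	u = (value < 0) ? (uint32_t) 0 - (uint32_t) value
		: (uint32_t) value;

	do
	{
		*buf++ = (char) ('0' + u % 10);
		u /= 10;
	} while (u != 0);

	if (value < 0)
		*buf++ = '-';
	*buf = '\0';

	/* Reverse the characters in place. */
	for (end = buf - 1; start < end; start++, end--)
	{
		char		tmp = *start;

		*start = *end;
		*end = tmp;
	}
}
```

Twelve bytes cover the longest case, "-2147483648" plus the terminating NUL.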

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Fix for seg picksplit function

2010-11-19 Thread Robert Haas
On Tue, Nov 16, 2010 at 6:07 AM, Alexander Korotkov
aekorot...@gmail.com wrote:
 On Tue, Nov 16, 2010 at 3:07 AM, Robert Haas robertmh...@gmail.com wrote:
 But on a broader note, I'm not very certain the sorting algorithm is
 sensible.  For example, suppose you have 10 segments that are exactly
 '0' and 20 segments that are exactly '1'.  Maybe I'm misunderstanding,
 but it seems like this will result in a 15/15 split when we almost
 certainly want a 10/20 split.  I think there will be problems in more
 complex cases as well.  The documentation says about the less-than and
 greater-than operators that "These operators do not make a lot of
 sense for any practical purpose but sorting."

 In order to illustrate a real problem we should think about
 gist behavior with great enough amount of data. For example, I tried to
 extrapolate this case to 10 of segs where 40% are (0,1) segs and 60% are
 (1,2) segs. And this case doesn't seem a problem for me.

Well, the problem with just comparing on < is that it takes very
little account of the upper bounds.  I think the cases where a simple
split would hurt you the most are those where examining the upper
bound is necessary to get a good split.
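
The 10-and-20 example quoted above can be checked with a small sketch. It assumes the split is a plain "sort, then cut at the midpoint"; the function name count_in_left_half is hypothetical and this is not the seg picksplit code itself.

```c
#include <stddef.h>

/*
 * Split a sorted array at the midpoint and report how many entries
 * equal to `key` land in the left half.
 */
size_t
count_in_left_half(const int *sorted, size_t n, int key)
{
	size_t		mid = n / 2;
	size_t		count = 0;

	for (size_t i = 0; i < mid; i++)
	{
		if (sorted[i] == key)
			count++;
	}
	return count;
}
```

For ten 0s followed by twenty 1s, the left half holds all ten 0s plus five of the 1s, so the group of identical 1s is cut in two instead of being kept together in a 10/20 split.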

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Fwd: What do these terms mean in the SOURCE CODE?

2010-11-19 Thread Vaibhav Kaushal
Is no one ready to help on this? :(

-Vaibhav
-- Forwarded message --
From: Vaibhav Kaushal vaibhavkaushal...@gmail.com
Date: Fri, Nov 19, 2010 at 9:11 PM
Subject: What do these terms mean in the SOURCE CODE?
To: pgsql-hackers@postgresql.org


I am going through the Executor code and come across the following terms
quite often. Can someone tell me what they mean (in a few, maybe a
couple of, sentences)?

1. Scan State
2. Plan State
3. Tuple Projection
4. EState
5. Qual
6. Expression

They sound quite ambiguous in the source code, especially when some of them
already have terms which have multiple meanings.

Thanks for your time.

-Vaibhav (*_*)


Re: [HACKERS] directory archive format for pg_dump

2010-11-19 Thread Joachim Wieland
Hi Jose,

2010/11/19 José Arthur Benetasso Villanova jose.art...@gmail.com:
 The dir format generated 60 files for my database, with different
 sizes, and it looks very confusing. Is it possible to use the same
 trick as pigz and pbzip2, creating a concatenated file of streams?

What pigz is parallelizing is the actual computation of the compressed
data. The directory archive format however is a preparation for a
parallel pg_dump, dumping several tables (especially large tables of
course) in parallel via multiple database connections and multiple
pg_dump frontends. The idea of multiplexing their output into one file
has been rejected on the grounds that it would probably slow down the
whole process.

Nevertheless pigz could be implemented as an alternative compression
algorithm and that way the custom and the directory archive format
could use it, but here as well, license and patent questions might be
in the way, even though it is based on libz.


 Reusing md5.c and kwlookup.c via a link doesn't look nice either.
 This way you need to compile them twice, among other things, but I think
 that it's temporary, right?

No, it isn't. md5.c is used in the same way by e.g. libpq, and there
are other examples of links in core; check out src/bin/psql for
example.

Joachim



Re: [HACKERS] Isn't HANDLE 64 bits on Win64?

2010-11-19 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Nov 16, 2010 at 11:01, Magnus Hagander mag...@hagander.net wrote:
 So yes, it looks completely broken. I guess Windows doesn't actually
 *assign* you a handle larger than 2^32 until you actually have that
 many open handles. Typical values on my test system (win64) come out
 at around 4000 in all tests.

 Patch applied for this and backpatched to 9.0.

I did a bit of googling and found some references claiming that Win64
will never assign system handles that are outside the range
representable as a signed long; and further stating there are standard
macros HandleToLong and LongToHandle to perform those conversions.
So I'd be comfortable with the original coding as long as we used those
macros instead of random casting.  Dunno if you think that'd be cleaner
than what you did.  (It's also a fair question whether those macros
are available on Win32.)
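
To make the range concern concrete, here is a minimal Python sketch that
models a 64-bit handle value being narrowed to a signed 32-bit long and
widened again by sign extension. The function names handle_to_long and
long_to_handle are hypothetical stand-ins; the assumption that the real
HandleToLong and LongToHandle macros truncate and sign-extend in this way
is mine and is not taken from the Windows headers.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def handle_to_long(handle: int) -> int:
    """Keep the low 32 bits of a 64-bit handle and read them as signed."""
    low = handle & MASK32
    # Values with the top bit set represent negative signed 32-bit numbers.
    return low - 0x100000000 if low >= 0x80000000 else low


def long_to_handle(value: int) -> int:
    """Sign-extend a signed 32-bit value back to an unsigned 64-bit pattern."""
    return value & MASK64


def survives_round_trip(handle: int) -> bool:
    """True if the handle comes back unchanged after narrowing and widening."""
    return long_to_handle(handle_to_long(handle)) == handle


if __name__ == "__main__":
    for h in (4000, MASK64, 2**32 + 4000):
        print(hex(h), survives_round_trip(h))
```

Under these assumptions a small handle such as 4000 survives the round
trip, as does an all-ones value, while a value above 2^32 would silently
come back as a different handle.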

regards, tom lane



Re: [HACKERS] UNNEST ... WITH ORDINALITY (AND POSSIBLY OTHER STUFF)

2010-11-19 Thread Itagaki Takahiro
On Sat, Nov 20, 2010 at 03:48,  caleb.wel...@emc.com wrote:
 Note the standard also supports unnesting multiple arrays concurrently; the
 rule for handling arrays with different lengths is to use null padding of the
 shorter array.

   UNNEST( ARRAY[5,2,3,4],
           ARRAY['hello', 'world'] )
   WITH ORDINALITY AS t(a,b,i);

Hmmm, that means we cannot support multi-array unnest() with our
generic aggregate functions. The function prototype might be like
below, but we don't support such a definition.

  unnest(anyarray1, anyarray2, ...,
 OUT anyelement1, OUT anyelement2, ...)
  RETURNS SETOF record

So, we would need a special representation for multi-array unnest().
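
For reference, the null-padding rule quoted above can be sketched in a few
lines of Python. This is only an illustration of the semantics as described
in this thread, not PostgreSQL code; the function name
unnest_with_ordinality is hypothetical, and None stands in for SQL NULL.

```python
from itertools import zip_longest


def unnest_with_ordinality(*arrays):
    """Yield one row per position: one element per array, then a 1-based index.

    Arrays shorter than the longest one are padded with None.
    """
    for i, row in enumerate(zip_longest(*arrays, fillvalue=None), start=1):
        yield (*row, i)


if __name__ == "__main__":
    for row in unnest_with_ordinality([5, 2, 3, 4], ["hello", "world"]):
        print(row)
```

For the two arrays in the quoted example this yields four rows, with the
second column padded in rows 3 and 4.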

-- 
Itagaki Takahiro



Re: [HACKERS] Isn't HANDLE 64 bits on Win64?

2010-11-19 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Tue, Nov 16, 2010 at 16:23, Tom Lane t...@sss.pgh.pa.us wrote:
 What's not clear to me is whether the section title means that only
 certain handles have this guarantee, and if so whether we have to worry
 about running into ones that don't.

 I think it is pretty clear it does - the section has a list of
 different handles at the bottom. What we're using is a File Mapping
 Object, which is not on that list. And which is, AFAICT, not a user or
 gdi handle.

 That doesn't mean it's not guaranteed to be in the 32-bit space, but
 I'm pretty sure that specific page doesn't guarantee it.

Well, the patch as-applied is fine with me.  I just wanted to be sure
we'd considered the alternatives, especially in view of the fact that
we have not seen any clear failures of the previous coding.

The reason this came to mind was
http://archives.postgresql.org/pgsql-admin/2010-11/msg00128.php
which looks for all the world like a handle transmission failure
--- but that person claims to be running Win32, so unless he's
wrong, this particular issue doesn't explain his problem.

regards, tom lane
