[HACKERS] Word-smithing doc changes

2011-06-25 Thread Greg Stark
I think this commit was ill-advised:
http://git.postgresql.org/gitweb?p=postgresql.git;a=commitdiff;h=a03feb9354bda5084f19cc952bc52ba7be89f372

 In a concurrent index build, the index is actually entered into the
 system catalogs in one transaction, then the two table scans occur in a
-second and third transaction.
+second and third transaction.  All active transactions at the time the
+second table scan starts, not just ones that already involve the table,
+have the potential to block the concurrent index creation until they
+finish.  When checking for transactions that could still use the original
+index, concurrent index creation advances through potentially interfering
+older transactions one at a time, obtaining shared locks on their virtual
+transaction identifiers to wait for them to complete.


Seems way too implementation-specific and detailed for a user to make
heads or tails of. Except in the sections talking about locking
internals, we don't talk about "shared locks on virtual transaction
identifiers"; we just talk about waiting for a transaction to complete.
And looping over the transactions one by one is purely an
implementation detail, uninteresting to users. It also uses
ill-defined terms like "active transactions", "potentially interfering
older transactions", and "original index" -- from the user's point of
view there's only one index, and it just isn't completely built yet.

Are we not yet in string-freeze? I'll go ahead and edit it if
people don't mind. I'm curious to see the original complaint, though.

-- 
greg



Re: [HACKERS] spinlock contention

2011-06-25 Thread Greg Stark
On Thu, Jun 23, 2011 at 4:42 PM, Robert Haas  wrote:
> ProcArrayLock looks like a tougher nut to crack - there's simply no
> way, with the system we have right now, that you can take a snapshot
> without locking the list of running processes.  I'm not sure what to
> do about that, but we're probably going to have to come up with
> something, because it seems clear that once we eliminate the lock
> manager LWLock contention, this is a major bottleneck.

Well, as Tom observed earlier, the kernel of a snapshot is actually an
LSN: a snapshot contains a set of xids, all of which committed before
some LSN, and none which committed after it.

So if we had a record of the log sequence number of each transaction's
commit record, we could build the snapshot at our leisure without any
exclusive lock. In fact we could even build it lazily, as a kind of
cache, only when we are actually interested in a given xid.
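
To make that concrete, here is a minimal sketch of the lazy scheme.
Everything in it is hypothetical -- the xid-to-commit-LSN map in
particular is a made-up name, not anything in the tree today:

  /*
   * Hypothetical sketch: a snapshot reduced to a single WAL position.
   * LookupCommitLSN() stands in for whatever shared structure would
   * record the LSN of each transaction's commit record.
   */
  typedef uint64 CommitLSN;                 /* simplified LSN */
  #define InvalidCommitLSN ((CommitLSN) 0)

  typedef struct LSNSnapshot
  {
      CommitLSN   snap_lsn;   /* WAL position the snapshot was taken at */
  } LSNSnapshot;

  static bool
  XidVisibleInSnapshot(TransactionId xid, LSNSnapshot *snap)
  {
      /* consult the (hypothetical) xid -> commit-record-LSN map */
      CommitLSN   commit_lsn = LookupCommitLSN(xid);

      if (commit_lsn == InvalidCommitLSN)
          return false;       /* no commit record yet: not visible */

      /* visible iff the commit record precedes the snapshot's LSN */
      return commit_lsn < snap->snap_lsn;
  }

Taking a snapshot would then reduce to reading the current WAL insert
position, with no need to hold ProcArrayLock while copying xids out of
the proc array.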





-- 
greg



[HACKERS] Range Types, constructors, and the type system

2011-06-25 Thread Jeff Davis
Different ranges over the same subtype make sense when using different
total orders for the subtype. This is most apparent with text collation,
but makes sense (at least mathematically, if not practically) for any
subtype.

For instance:
 [a, Z)
is a valid range in "en_US" but not in "C", so it makes sense to have
multiple range types over the same subtype with different collations.

But what if you have a function (like a constructor) of the form:
  (anyelement, anyelement) -> anyrange
? To work with the type system, you need to be able to figure out the
return type from the arguments, which means that to support functions
like this we need a mapping from the subtype to the range type.
Unfortunately, that restricts us to one range type per subtype (this
isn't a problem for ARRAYs, because there is only one useful array type
for a given element type).

This problem first came up a while ago:
http://archives.postgresql.org/pgsql-hackers/2011-01/msg02788.php

My workaround was to use domains, but that's not a very clean solution
(you have to add a bunch of casts to make sure the right domain is
chosen). It became entirely unworkable with collations, because people
would be using different text collations a lot more frequently than,
say, a different ordering for timestamptz. Tom mentioned that here:

http://archives.postgresql.org/message-id/24831.1308579...@sss.pgh.pa.us

I think Florian proposed the most promising line of attack here:

http://archives.postgresql.org/message-id/ad4fc75d-db99-48ed-9082-52ee3a4d7...@phlo.org

by suggesting that functions of the form:
  (anyelement, [other non-anyrange arguments]) -> anyrange
might be expendable. After all, they are only useful for constructors as
far as we can tell. Other range functions will have an anyrange
parameter, and we can use the actual type of the argument to know the
range type (as well as the subtype).

Although it's very nice to be able to say:
  range(1,10)
and get an int4range out of it, it's not the only way, and it's not
without its problems anyway. For instance, to get an int8range you have
to do:
  range(1::int8, 10::int8)
or similar.

So, we could just make functions like:
  int4range(int4, int4)
  int8range(int8, int8)
  ...
when creating the range type, and it would actually be a usability
improvement.

There are at least a few constructors that would need to be made for
each rangetype: the constructor above, the singleton constructor,
constructors that have infinite bounds, the empty constructor, and all
of the permutations for inclusivity/exclusivity. That adds up to quite a
few catalog entries per range type.

We could reduce some of the permutations by using extra arguments
somehow, but that seems like it adds to the ugliness. This might also be
a time to revisit whether there is a better way to present all of these
constructors (rather than the _[co][co] suffixes to names, etc.).

Even if we're willing to put up with a bunch of catalog entries, it will
take a little creativity to figure out how to run the functions
generically from a fixed set of C functions.
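
For what it's worth, here is a minimal sketch of how the generic C side
might look, assuming each per-type constructor's pg_proc entry points
at one shared C function which recovers its declared return type with
get_fn_expr_rettype() (a real fmgr facility). The typcache lookup and
make_range() here are hypothetical stand-ins:

  /*
   * Sketch: one C function backing every two-argument constructor
   * (int4range, int8range, ...).  The range type is recovered from
   * the declared return type of whichever SQL function was called.
   */
  Datum
  range_constructor2(PG_FUNCTION_ARGS)
  {
      Oid         rngtypid = get_fn_expr_rettype(fcinfo->flinfo);
      RangeTypeInfo *rti = lookup_range_typcache(rngtypid); /* hypothetical */
      Datum       lower = PG_GETARG_DATUM(0);
      Datum       upper = PG_GETARG_DATUM(1);

      /* build a range with the usual '[)' bounds */
      PG_RETURN_DATUM(make_range(rti, lower, upper,
                                 true,     /* lower inclusive */
                                 false));  /* upper exclusive */
  }

The per-range-type cost would then be only the pg_proc rows, not new C
code.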

Are there other thoughts or ideas about eliminating the need for generic
constructors like range()?

Another idea Florian suggested (in the same email) was the ability to
declare the return type of a function, and then use the declared type to
infer the argument types. That would be nice because you would just have
to do:
  range(1,10)::int8range
However, that's kind of backwards from how our type inference system
works now, and sounds like a big change.

Maybe we could consider a syntax addition for constructing range values?
That was kicked around briefly, but perhaps we should revisit the
possibilities there.

Regards,
Jeff Davis




Re: [HACKERS] possible connection leak in dblink?

2011-06-25 Thread Peter Eisentraut
On Sat, 2011-06-25 at 13:36 -0700, Joe Conway wrote:
> However, since this is really just a case of unused variables and not
> a leaked connection, I'm inclined to just fix git master -- comments
> on that?

Please put it into 9.1 as well, so we can get a clean compile with gcc
4.6 there.




Re: [HACKERS] possible connection leak in dblink?

2011-06-25 Thread Joe Conway
On 06/14/2011 07:41 PM, Fujii Masao wrote:
> On Wed, Jun 15, 2011 at 5:34 AM, Peter Eisentraut  wrote:
>> Otherwise the connection might not get freed.  Could someone verify
>> that?
> 
> ISTM that the root problem is that dblink_send_query calls DBLINK_GET_CONN
> though it doesn't accept the connection string as an argument. Since the first
> argument in dblink_send_query must be the connection name, dblink_send_query
> should call DBLINK_GET_NAMED_CONN instead. The variable 'freeconn' is used
> only when DBLINK_GET_CONN is called. So, if dblink_send_query uses
> DBLINK_GET_NAMED_CONN instead, the variable 'freeconn' is no longer necessary.
> 
> A similar problem exists in dblink_get_result and dblink_record_internal.
> Attached patch fixes those problems.

Fujii's assessment looks correct, although arguably the change is
unnecessary for dblink_record_internal.

Looks like the issue with dblink_send_query goes back through 8.4, while
dblink_record_internal could be fixed as far back as 8.2.

However, since this is really just a case of unused variables and not a
leaked connection, I'm inclined to just fix git master -- comments on that?

Joe

-- 
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support





Re: [HACKERS] Repeated PredicateLockRelation calls during seqscan

2011-06-25 Thread Kevin Grittner
Heikki Linnakangas  wrote:
 
> BTW, isn't bitgetpage() in nodeBitmapHeapscan.c missing
> PredicateLockTuple() and CheckForSerializableConflictOut() calls in
> the codepath for a lossy bitmap? In the non-lossy case,
> heap_hot_search_buffer() takes care of it, but not in the lossy
> case.
 
I think the attached addresses that.
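
For the archives, the shape of the fix is roughly the following -- a
sketch, not the attached patch verbatim, assuming the SSI entry points
PredicateLockTuple() and CheckForSerializableConflictOut() as they
exist in 9.1. In bitgetpage()'s lossy branch we walk every item on the
page ourselves, so we must make the calls that heap_hot_search_buffer()
would otherwise have made:

  for (offnum = FirstOffsetNumber; offnum <= maxoff; offnum++)
  {
      ItemId        lp = PageGetItemId(dp, offnum);
      HeapTupleData loctup;
      bool          valid;

      if (!ItemIdIsNormal(lp))
          continue;
      loctup.t_data = (HeapTupleHeader) PageGetItem((Page) dp, lp);
      loctup.t_len = ItemIdGetLength(lp);
      ItemPointerSet(&loctup.t_self, page, offnum);
      valid = HeapTupleSatisfiesVisibility(&loctup, snapshot, buffer);
      if (valid)
      {
          scan->rs_vistuples[ntup++] = offnum;
          PredicateLockTuple(scan->rs_rd, &loctup, snapshot);
      }
      /* called for invisible tuples too, to detect rw-conflicts out */
      CheckForSerializableConflictOut(valid, scan->rs_rd, &loctup,
                                      buffer, snapshot);
  }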
 
In looking this over I noticed something else that doesn't seem quite
right.  In heapam.c there are two places where the execution of
PredicateLockTuple() is conditioned not just on MVCC visibility, but
also on HeapKeyTest().  I think those calls should be moved so that
they are not conditioned on that.  Otherwise we have a predicate
condition being tested without "locking the gaps", don't we?
 
-Kevin




ssi-lossy-bitmap.patch
Description: Binary data



Re: [HACKERS] heap_hot_search_buffer refactoring

2011-06-25 Thread Jeff Davis
On Fri, 2011-06-24 at 15:32 -0400, Robert Haas wrote:
> On Sun, Jun 19, 2011 at 2:16 PM, Robert Haas  wrote:
> > New patch attached, with that one-line change.
> 
> Jeff, are you planning to review this further?  Do you think it's OK to 
> commit?

1. Patch does not apply to master cleanly, and it's in unified format
(so I can't compare it against the old patch very easily). This review
is for the first patch, disregarding the "skip = !first_call" issue that
you already fixed. If you had other changes in the latest version,
please repost the patch.

2. The comment above heap_hot_search_buffer should be updated to
document that heapTuple is an out-parameter and to document the
behavior of first_call.

3. The logic around "skip" is slightly confusing to me. Here's my
description: if it's not an MVCC snapshot and it's not the first call,
then you don't actually want to fetch the tuple with the given tid or a
later one in the chain -- you want to fetch the _next_ tuple in the
chain or a later one. Some wording of that description in a comment
(either in the function's comment or near the use of "skip") would help
a lot; a sketch follows below. Also, if skip is true, then the tid
_must_ be visible according to the (non-MVCC) snapshot, correct? It
would help if that were apparent from the code/comments.
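
Something along these lines, maybe:

  /*
   * When the snapshot is non-MVCC and this is not the first call, the
   * tuple at *tid was already returned by the previous call and is
   * known to satisfy the snapshot; "skip" makes us step past it and
   * continue the search from the next member of the HOT chain.
   */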

Other than that, it looks good.

Regards,
Jeff Davis






Re: [HACKERS] debugging tools inside postgres

2011-06-25 Thread Martijn van Oosterhout
On Fri, Jun 24, 2011 at 02:35:08PM +0800, HuangQi wrote:
> Hi,
> I'm trying to debug a modification to the query planner, but it seems
> the data structure of my planned query is incorrect. I was trying to
> print out the data structure using the "p" command in gdb, which is
> quite inconvenient and takes time. Is there any built-in function in
> postgres to print out the node data structures, or any other
> plan-related data structures? Thanks.

I don't know if anyone has done it, but recent versions of gdb
apparently can run Python scripts, and use them to dump C++ library
structures in readable formats.  I guess someone could write such a
script to make debugging PostgreSQL nicer (pretty-printing snapshots,
locks, etc).

Have a nice day,
-- 
Martijn van Oosterhout  http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first. 
>   - Charles de Gaulle




Re: [HACKERS] WIP: Fast GiST index build

2011-06-25 Thread Alexander Korotkov
On Sat, Jun 25, 2011 at 11:03 AM, Jesper Krogh  wrote:

> * Wouldn't it be natural to measure the performance benefits of
>   disc-bound tests in an SSD setup?
>
Sure, it would be great to run performance tests on SSD drives too.
Unfortunately, I don't have a corresponding test platform just now.

> ... my understanding of Fast gi(n|st) index build is that it is
> more or less a challenge to transform a lot of random IO workload
> to be more sequential and collapse multiple changes into fewer.
>
The main benefit of the proposed algorithm is to greatly reduce the
number of IO operations during index build, by dealing with a great
number of index tuples simultaneously. It also makes some IO more
sequential. I don't have precise measurements yet, but I believe the
contribution of making IO more sequential is not very significant.


> In terms of random IO an SSD can easily be x100 better than rotating
> drives and it would be a shame to optimize "against" that world?
>
Actually, I'm not sure that IO is the bottleneck of GiST index build on
SSD drives. It seems more likely to me that the CPU becomes the
bottleneck in that case, so optimizing IO can't give much benefit. But
anyway, the value of this work can also be in producing a better index
in some cases, and in extending SSD drive lifetime thanks to fewer IO
operations.

--
With best regards,
Alexander Korotkov.


Re: [HACKERS] WIP: Fast GiST index build

2011-06-25 Thread Jesper Krogh

On 2011-06-06 09:42, Heikki Linnakangas wrote:
> took about 15 hours without the patch, and 2 hours with it. That's
> quite dramatic.


With the presence of robust consumer-class SSD drives, available in
sizes that can actually fit "many" database usage scenarios, a new
PostgreSQL version is not likely to hit the streets before 50% of
PostgreSQL users are sitting on "some kind" of flash-based storage (at
least for the part where the entire dataset doesn't fit in memory any
more). Point is:

* Wouldn't it be natural to measure the performance benefits of
   disc-bound tests in an SSD setup?

... my understanding of Fast gi(n|st) index build is that it is
more or less a challenge to transform a lot of random IO workload
to be more sequential and collapse multiple changes into fewer.

In terms of random IO an SSD can easily be x100 better than rotating
drives and it would be a shame to optimize "against" that world?

--
Jesper
