Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Leonardo Francalanci
 Yes, that seems like a very appealing approach.  There is plenty of
 bit-space available in xinfo, and we could reserve a bit each for
 nrels, nsubxacts, and nmsgs, with "set" meaning that an integer count of
 that item is present and "clear" meaning that the count is omitted from
 the structure (and zero).  This will probably require a bit of tricky
 code reorganization, so I think it should be done separately from the
 main patch.

Ok, I'll try and send a patch with this change only.
BTW, xinfo is 32 bits long, but I think only 2 bits are used right now?
I think I can make it 8 bits, and add another 8 bits for nrels,
nsubxacts, nmsgs, and the new thing. That should save
another 2 bytes, while leaving space for extension. Or we can make
it 8 bits only, but then only 2 bits would be left free for future
extensions; I don't know if we care about that...
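
For illustration only - a minimal, standalone C sketch (the flag names and
helper below are hypothetical, not the actual xl_xact_commit layout) of how
per-item presence bits in xinfo would let zero counts be omitted from the
record:

#include <stdint.h>
#include <stdio.h>

/* Hypothetical presence flags packed into the commit record's xinfo word. */
#define XINFO_HAS_RELS      0x01    /* an integer nrels count follows */
#define XINFO_HAS_SUBXACTS  0x02    /* an integer nsubxacts count follows */
#define XINFO_HAS_MSGS      0x04    /* an integer nmsgs count follows */

/* Append a count to the record payload only when it is non-zero. */
static size_t
pack_counts(uint32_t *xinfo, int32_t nrels, int32_t nsubxacts, int32_t nmsgs,
            int32_t *payload)
{
    size_t  n = 0;

    if (nrels != 0)     { *xinfo |= XINFO_HAS_RELS;     payload[n++] = nrels; }
    if (nsubxacts != 0) { *xinfo |= XINFO_HAS_SUBXACTS; payload[n++] = nsubxacts; }
    if (nmsgs != 0)     { *xinfo |= XINFO_HAS_MSGS;     payload[n++] = nmsgs; }
    return n;           /* number of 4-byte counts actually written */
}

int
main(void)
{
    uint32_t    xinfo = 0;
    int32_t     payload[3];
    size_t      n = pack_counts(&xinfo, 2, 0, 5, payload);

    printf("xinfo = 0x%02x, %zu counts stored\n", (unsigned) xinfo, n);
    return 0;
}

The record would then carry only the counts whose bits are set, and the
reader would test each bit before consuming the next integer.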


Leonardo


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Heikki Linnakangas

On 10.05.2011 04:43, Greg Smith wrote:

Josh Berkus wrote:

As I don't think we can change this, I think the best answer is to
tell people "Don't submit a big patch to PostgreSQL until you've done
a few small patches first. You'll regret it."


When I last did a talk about getting started writing patches, I had a
few people ask me afterwards if I'd ever run into problems with having
patch submissions rejected. I said I hadn't. When asked what my secret
was, I told them my first serious submission modified exactly one line
of code[1]. And *that* I had to defend in regards to its performance
impact.[2]

Anyway, I think the intro message should be "Don't submit a big patch
to PostgreSQL until you've done a small patch and some patch review"
instead, though.


Well, my first patch was two-phase commit. And I had never even used 
PostgreSQL before I dived into the source tree and started to work on 
that. I did, however, lurk on the pgsql-hackers mailing list for a few 
months before posting, so I knew the social dynamics. I basically did 
exactly what Robert described elsewhere in this thread, and successfully 
avoided the culture shock.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Peter Geoghegan
On 10 May 2011 02:58, Fujii Masao masao.fu...@gmail.com wrote:
 Alright. I'm currently working on a proof-of-concept implementation of
 that. In the meantime, any thoughts on how this should meld with the
 existing latch implementation?

 How about making WaitLatch monitor the file descriptor for the pipe
 by using select()?

Alright, so it's reasonable to assume that all clients of the latch
code are happy to be invariably woken up on Postmaster death?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Heikki Linnakangas

On 10.05.2011 11:22, Peter Geoghegan wrote:

On 10 May 2011 02:58, Fujii Masao masao.fu...@gmail.com wrote:

Alright. I'm currently working on a proof-of-concept implementation of
that. In the meantime, any thoughts on how this should meld with the
existing latch implementation?


How about making WaitLatch monitor the file descriptor for the pipe
by using select()?


Alright, so it's reasonable to assume that all clients of the latch
code are happy to be invariably woken up on Postmaster death?


That doesn't sound like a safe assumption. All the helper processes 
should die quickly on postmaster death, but I'm not sure if that holds 
for all inter-process communication. I think the caller needs to specify 
if he wants that or not.



Once you add that to the WaitLatchOrSocket function, it's quite clear 
that the API is getting out of hand. There are five different events that 
can wake it up:


* latch is set
* a socket becomes readable
* a socket becomes writeable
* timeout
* postmaster dies

I think we need to refactor the function into something like:

#define WL_LATCH_SET        1
#define WL_SOCKET_READABLE  2
#define WL_SOCKET_WRITEABLE 4
#define WL_TIMEOUT          8
#define WL_POSTMASTER_DEATH 16

int WaitLatch(Latch *latch, int events, long timeout)

Where 'events' is a bitmask of events that should cause a wakeup, and the
return value is a bitmask identifying which event(s) caused the call to
return.
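
For illustration only, a rough standalone sketch of how a caller might use
such a flags-based interface; the WaitLatch stub below is just a placeholder
so the example runs, not the real latch code:

#include <stdio.h>

#define WL_LATCH_SET        1
#define WL_SOCKET_READABLE  2
#define WL_SOCKET_WRITEABLE 4
#define WL_TIMEOUT          8
#define WL_POSTMASTER_DEATH 16

typedef struct Latch { int is_set; } Latch;

/* Placeholder: pretend the latch was set while we waited. */
static int
WaitLatch(Latch *latch, int wakeEvents, long timeout_ms)
{
    (void) latch;
    (void) timeout_ms;
    return (wakeEvents & WL_LATCH_SET) ? WL_LATCH_SET : WL_TIMEOUT;
}

int
main(void)
{
    Latch   mylatch = {0};

    /* Wake up on latch set, timeout, or postmaster death. */
    int     rc = WaitLatch(&mylatch,
                           WL_LATCH_SET | WL_TIMEOUT | WL_POSTMASTER_DEATH,
                           10000L);

    if (rc & WL_POSTMASTER_DEATH)
        printf("postmaster died, exiting\n");
    else if (rc & WL_LATCH_SET)
        printf("latch was set, do some work\n");
    else if (rc & WL_TIMEOUT)
        printf("timed out\n");
    return 0;
}

The point is that each caller names exactly the wakeup reasons it cares
about, and can tell from the return value which of them fired.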


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Pavan Deolasee
On Tue, May 10, 2011 at 1:46 PM, Heikki Linnakangas 
heikki.linnakan...@enterprisedb.com wrote:

 On 10.05.2011 04:43, Greg Smith wrote:

 Josh Berkus wrote:

 As I don't think we can change this, I think the best answer is to
 tell people
 Don't submit a big patch to PostgreSQL until you've done a few small
 patches first. You'll regret it.


 When I last did a talk about getting started writing patches, I had a
 few people ask me afterwards if I'd ever run into problems with having
 patch submissions rejected. I said I hadn't. When asked what my secret
 was, I told them my first serious submission modified exactly one line
 of code[1]. And *that* I had to defend in regards to its performance
 impact.[2]

 Anyway, I think the intro message should be Don't submit a big patch to
 PostgreSQL until you've done a small patch and some patch review
 instead though.


 Well, my first patch was two-phase commit. And I had never even used
 PostgreSQL before I dived into the source tree and started to work on that.
 I did, however, lurk on the pgsql-hackers mailing list for a few months
 before posting, so I knew the social dynamics. I basically did exactly what
 Robert described elsewhere in this thread, and successfully avoided the
 culture shock.


Yeah, probably the same for me, though I got a lot of support from existing
hackers during my first submission. But it was a tiring experience for sure.
I would submit a patch and then wait anxiously for any comments. I used to
get a lot of interesting and valuable comments, but I knew that unless one
of the very few key members (Tom?) said something, good or bad, it wouldn't
go anywhere, and those comments did not come in the early days/months. I was
an unknown name and what I was trying to do was very invasive. So when I look
back now, I can understand the reluctance of other members to get excited
about the work. Most often they would see something in the design or the
patch that was completely stupid, and they would lose all interest at that
very moment.

Since I had the backing of EnterpriseDB and it was my paid job, it was much
easier to keep up the enthusiasm, but I wouldn't be surprised if a few others
had turned their backs on the project forever.

Fortunately, things have changed for the better now. I think the entire
commitfest business is good. Also, we now have a lot more hackers with
expertise in different areas and with influential opinions. It's very likely
that if you submit an idea or a patch, you will get some
comment/suggestion/criticism very early.

Since HOT is mentioned often in these discussions, I thought I should share
my experience.

Thanks,
Pavan

-- 
Pavan Deolasee
EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Peter Geoghegan
On 10 May 2011 09:45, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 I think we need to refactor the function into something like:

 #define WL_LATCH_SET    1
 #define WL_SOCKET_READABLE 2
 #define WL_SOCKET_WRITEABLE 4
 #define WL_TIMEOUT      8
 #define WL_POSTMASTER_DEATH 16

While I agree with the need to not box ourselves into a corner on the
latch interface by making sweeping assumptions, isn't the fact that a
socket became readable or writable strictly an implementation detail?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 10 May 2011 09:45, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 I think we need to refactor the function into something like:

 #define WL_LATCH_SET    1
 #define WL_SOCKET_READABLE 2
 #define WL_SOCKET_WRITEABLE 4
 #define WL_TIMEOUT      8
 #define WL_POSTMASTER_DEATH 16

 While I agree with the need to not box ourselves into a corner on the
 latch interface by making sweeping assumptions, isn't the fact that a
 socket became readable or writable strictly an implementation detail?

The thing about the socket being readable/writeable is needed for
walsender.  It needs to notice when its connection to walreceiver is
writeable (so it can send more WAL) or readable (so it can receive a
reply message).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 3:35 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 Yes, that seems like a very appealing approach.   There is plenty of
 bit-space available in xinfo, and we could reserve a bit  each for
 nrels, nsubxacts, and nmsgs, with set meaning that an integer count  of
 that item is present and clear meaning that the count is omitted  from
 the structure (and zero).  This will probably require a bit of  tricky
 code reorganization so I think it should be done separately from  the
 main patch.

 Ok, I'll try and send a patch with this change only.
 BTW  xinfo  is 32 bit long, but I think only 2 bits are used right now?
 I think I can make it a 8 bits, and add another 8 bits for nrels,
 nsubxacts, and nmsgs and the new thing. That should save
 another 2 bytes, while leaving space for extention. Or we can make
 it a 8 bits only, but only 2 bits would be left empty for future
 extentions; I don't know if we care about it...

I don't think making xinfo shorter will save anything, because
whatever follows it is going to be a 4-byte quantity and therefore
4-byte aligned.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Leonardo Francalanci
 I don't  think making xinfo shorter will save anything, because
 whatever follows it is  going to be a 4-byte quantity and therefore
 4-byte aligned.


Oops, I didn't notice that.

I'll split xinfo into:
 
uint16   xinfo;
uint16   presentFlags;


I guess that would help readability, compared to
having a single uint32?

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Robert Haas
On Mon, May 9, 2011 at 10:25 PM, Merlin Moncure mmonc...@gmail.com wrote:
 On Fri, May 6, 2011 at 5:47 PM, Robert Haas robertmh...@gmail.com wrote:
 On Wed, Mar 30, 2011 at 8:52 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Another question:
 To address the problem in
 http://archives.postgresql.org/pgsql-hackers/2010-02/msg02097.php
 , should we just clear the vm before the log of insert/update/delete?
 This may reduce the performance, is there another solution?

 Yeah, that's a straightforward way to fix it. I don't think the performance
 hit will be too bad. But we need to be careful not to hold locks while doing
 I/O, which might require some rearrangement of the code. We might want to do
 a similar dance that we do in vacuum, and call visibilitymap_pin first, then
 lock and update the heap page, and then set the VM bit while holding the
 lock on the heap page.

 Here's an attempt at implementing the necessary gymnastics.

 Is there a quick synopsis of why you have to do (sometimes) the
 pin-lock-unlock-pin-lock mechanic? How come you only can fail to
 get the pin at most once?

I thought I'd explained it fairly thoroughly in the comments, but
evidently not.  Suggestions for improvement are welcome.

Here goes in more detail: Every time we insert, update, or delete a
tuple in a particular heap page, we must check whether the page is
marked all-visible.  If it is, then we need to clear the page-level
bit marking it as all-visible, and also the corresponding page in the
visibility map.  On the other hand, if the page isn't marked
all-visible, then we needn't touch the visibility map at all.  So,
there are either one or two buffers involved: the buffer containing
the heap page (buffer) and possibly also a buffer containing the
visibility map page in which the bit for the heap page is to be found
(vmbuffer).   Before taking an exclusive content-lock on the heap
buffer, we check whether the page appears to be all-visible.  If it
does, then we pin the visibility map page and then lock the buffer.
If not, we just lock the buffer.  However, since we weren't holding
any lock, it's possible that between the time when we checked the
visibility map bit and the time when we obtained the exclusive
buffer-lock, the visibility map bit might have changed from clear to
set (because someone is concurrently running VACUUM on the table; or
on platforms with weak memory-ordering, someone was running VACUUM
almost concurrently).  If that happens, we give up our buffer lock,
go pin the visibility map page, and reacquire the buffer lock.

At this point in the process, we know that *if* the page is marked
all-visible, *then* we have the appropriate visibility map page
pinned.  There are three possible pathways: (1) If the buffer
initially appeared to be all-visible, we will have pinned the
visibility map page before acquiring the exclusive lock; (2) If the
buffer initially appeared NOT to be all-visible, but by the time we
obtained the exclusive lock it now appeared to be all-visible, then we
will have done the unfortunate unlock-pin-relock dance, and the
visibility map page will now be pinned; (3) if the buffer initially
appeared NOT to be all-visible, and by the time we obtained the
exclusive lock it STILL appeared NOT to be all-visible, then we don't
have the visibility map page pinned - but that's OK, because in this
case no operation on the visibility map needs to be performed.

Now it is very possible that in case (1) or (2) the visibility map
bit, though we saw it set at some point, will actually have been
cleared in the meantime.  In case (1), this could happen before we
obtain the exclusive lock; while in case (2), it could happen after we
give up the lock to go pin the visibility map page and before we
reacquire it.  This will typically happen when a buffer has been
sitting around for a while in an all-visible state and suddenly two
different backends both try to update or delete tuples in that buffer
at almost exactly the same time.  But it causes no great harm - both
backends will pin the visibility map page, whichever one gets the
exclusive lock on the heap page first will clear it, and when the
other backend gets the heap page afterwards, it will see that the bit
has already been cleared and do nothing further.  We've wasted the
effort of pinning and unpinning the visibility map page when it wasn't
really necessary, but that's not the end of the world.

We could avoid all of this complexity - and the possibility of pinning
the visibility map page needlessly - by locking the heap buffer first
and then pinning the visibility map page if the heap page is
all-visible.  However, that would involve holding the lock on the heap
buffer across a possible disk I/O to bring the visibility map page
into memory, which is something the existing code tries pretty hard to
avoid.
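
To make the locking order above concrete, here is a hedged sketch; the types
and helpers are stand-ins for the real buffer-manager and visibility-map
APIs, stubbed only so the control flow compiles and runs:

#include <stdio.h>

typedef int Buffer;
#define InvalidBuffer 0

static int  page_all_visible = 1;   /* simulated PD_ALL_VISIBLE bit */

static int  PageIsAllVisible(Buffer buf) { (void) buf; return page_all_visible; }
static void visibilitymap_pin(Buffer heapbuf, Buffer *vmbuf)
{ (void) heapbuf; *vmbuf = 42; printf("pinned vm page (may do I/O)\n"); }
static void LockBuffer(Buffer buf)   { (void) buf; printf("locked heap buffer\n"); }
static void UnlockBuffer(Buffer buf) { (void) buf; printf("unlocked heap buffer\n"); }

static void
lock_heap_buffer_with_vm(Buffer buffer, Buffer *vmbuffer)
{
    /* Unlocked peek: if the page looks all-visible, pin the map page first. */
    if (PageIsAllVisible(buffer))
        visibilitymap_pin(buffer, vmbuffer);

    LockBuffer(buffer);

    /*
     * The bit may have become set between our unlocked check and acquiring
     * the lock (e.g. by a concurrent VACUUM).  If so, drop the lock, pin the
     * map page (possibly doing I/O), and relock - never holding the content
     * lock across that I/O.
     */
    if (*vmbuffer == InvalidBuffer && PageIsAllVisible(buffer))
    {
        UnlockBuffer(buffer);
        visibilitymap_pin(buffer, vmbuffer);
        LockBuffer(buffer);
    }
}

int
main(void)
{
    Buffer  heapbuf = 1;
    Buffer  vmbuf = InvalidBuffer;

    lock_heap_buffer_with_vm(heapbuf, &vmbuf);
    return 0;
}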

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] switch UNLOGGED to LOGGED

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 8:03 AM, Leonardo Francalanci m_li...@yahoo.it wrote:
 I don't  think making xinfo shorter will save anything, because
 whatever follows it is  going to be a 4-byte quantity and therefore
 4-byte aligned.


 ups, didn't notice it.

 I'll split    xinfo into:

 uint16   xinfo;
 uint16   presentFlags;


 I guess it helps with the reading? I mean, instead
 of having a single uint32?

My feeling would be just keep it as uint32.  Breaking it up into
chunks doesn't seem useful to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stored procedures - use cases?

2011-05-10 Thread Robert Haas
On Mon, May 9, 2011 at 11:58 PM, Pavel Stehule pavel.steh...@gmail.com wrote:
 no - you are a little bit confused :). CALL and function execution
 share nothing. There are significant differences between a function and
 a procedure. A function is called only from the executor - from some plan -
 and you have to know the structure of the result before it runs. The
 execution of CALL is much simpler - you just execute code - without a plan
 and without waiting for any result, if there is one.

Now I'm a little confused, or you are.  Surely any SQL has to be
planned and executed, regardless of whether it appears in a function,
a stored procedure, or anywhere else.  Non-SQL statements within a
stored procedure don't need to go through the planner and executor,
but that's true in PL/python or PL/pgsql or whatever today.

I think people are using the term stored procedures to refer to
approximately whatever it is that they're unhappy that functions don't
allow, and that's leading to a lot of people talking across each
other.  The main features seem to be (1) explicit transaction control
and/or execution of commands like VACUUM that can't be invoked from
within a transaction, (2) autonomous transactions, and (3) returning
multiple result sets.  But I don't think anybody would be desperately
unhappy if it magically became possible to do those things from
regular functions, unlikely as that may seem.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] stored procedures - use cases?

2011-05-10 Thread Pavel Stehule
2011/5/10 Robert Haas robertmh...@gmail.com:
 On Mon, May 9, 2011 at 11:58 PM, Pavel Stehule pavel.steh...@gmail.com 
 wrote:
 no - you are little bit confused :). CALL and function execution
 shares nothing. There is significant differences between function and
 procedure. Function is called only from executor - from some plan, and
 you have to know a structure of result before run. The execution of
 CALL is much simple - you just execute code - without plan and waiting
 for any result - if there is.

 Now I'm a little confused, or you are.  Surely any SQL has to be
 planned and executed, regardless of whether it appears in a function,
 a stored procedure, or anywhere else.  Non-SQL statements within a
 stored procedure don't need to go through the planner and executor,
 but that's true in PL/python or PL/pgsql or whatever today.


The CALL statement is a utility command rather than SQL. It has to execute
some non-SQL code.

You can think about the CALL statement as a synonym for SELECT, but that
isn't correct (in my opinion).

Stored procedures predate stored functions (more correctly UDFs - user
defined functions). These old-time stored procedures were simple - they were
client code moved onto the server. Usually these procedures were executed in
a different process or a different thread. The full client-side functionality
was available inside the procedures, and there was no network overhead. A
CALL statement is more or less a remote call. It isn't an SQL statement.

 I think people are using the term stored procedures to refer to
 approximately whatever it is that they're unhappy that functions don't
 allow, and that's leading to a lot of people talking across each
 other.  The main features seem to be (1) explicit transaction control
 and/or execution of commands like VACUUM that can't be invoked from
 within a transaction, (2) autonomous transactions, and (3) returning
 multiple result sets.  But I don't think anybody would be desperately
 unhappy if it magically became possible to do those things from
 regular functions, unlikely as that may seem.


yes.

@2 Autonomous transactions don't need stored procedures. An autonomous
transaction could be isolated by a function flag, or by some special
PL/pgSQL statement:

like

BEGIN
  EXECUTE AUTONOMOUS ''
END;

@3 is possible now too - but it's not very user-friendly. Point 3 is
strange. Oracle doesn't support it. Support in DB2 is a little bit
strange. And it is well supported by MySQL, MSSQL, and maybe Informix and
Sybase.

Regards,

Pavel

 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure mmonc...@gmail.com wrote:
 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.

 hm, what are the implications for tuple hint bits, short and long
 term?  I'm particularly interested if you think any hint bit i/o
 mitigation strategies are worth pursuing.

Well, I don't really want to let this thread on my project get
hijacked to talk about your project (not that I haven't been guilty of
that myself!) but, in brief, I think the main effect of index-only
scans is that the performance difference between a vacuumed table and
an unvacuumed table is going to increase.  It's already the case that
sequential scanning a table which has been vacuumed (and, therefore,
all the pages are marked all-visible) is noticeably faster than
sequential scanning a table which is not vacuumed (even if all the
hint bits are set).  Index-only scans are going to extend that by
making index scans run faster on a table with lots of all-visible
pages than on one where no pages are all-visible.  So the importance
of vacuuming an insert-only table occasionally (which autovacuum won't
do, at present, until it's needed to prevent XID wraparound) is
already more than zero, and it's going to go up.  But the all-visible
bits aren't quite the same as hint bits: I don't think there's any
impact on hint bits per se.

 2. Crash safe visibility map vs. pg_upgrade.  Even if we make the
 visibility map crash-safe in 9.2, people are going to want to use
 pg_upgrade to migrate from older versions, bringing their
 possibly-not-quite-correct visibility map forks along with them.  How
 should we handle that?  We could (2A) arrange to have pg_upgrade nuke
 all visibility forks when upgrading from a release where the
 visibility map is not crash-safe to one where it is;

 +1 on 2A.

OK.  Anybody else?

 3. Statistics.  I believe that in order to accurately estimate the
 cost of an index-only scan, we're going to need to know the fraction
 of tuples that are on pages whose visibility map bits are set.

 It would be helpful to know the performance benefit of index only
 scans before knowing how much benefit to attribute here.  Maybe a
 system-wide kludge would work for starters anyway, like assuming 60% of
 pages can be vis checked from the VM, or a single GUC, Then again,
 maybe not.

Yeah, maybe I should try to beat the main patch into some kind of
shape before working too much on the statistics stuff.  Then we could
actually benchmark it a bit, which would be good.  I don't think that
a system-wide kludge or GUC is going to work for prime time, but it's
probably fine for initial performance testing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Merlin Moncure
On Tue, May 10, 2011 at 7:48 AM, Robert Haas robertmh...@gmail.com wrote:
 I thought I'd explained it fairly thoroughly in the comments, but
 evidently not.  Suggestions for improvement are welcome.

ok.  that clears it up nicely.

 Here goes in more detail: Every time we insert, update, or delete a
 tuple in a particular heap page, we must check whether the page is
 marked all-visible.  If it is, then we need to clear the page-level
 bit marking it as all-visible, and also the corresponding page in the
 visibility map.  On the other hand, if the page isn't marked
 all-visible, then we needn't touch the visibility map at all.  So,
 there are either one or two buffers involved: the buffer containing
 the heap page (buffer) and possibly also a buffer containing the
 visibility map page in which the bit for the heap page is to be found
 (vmbuffer).   Before taking an exclusive content-lock on the heap
 buffer, we check whether the page appears to be all-visible.  If it
 does, then we pin the visibility map page and then lock the buffer.
 If not, we just lock the buffer.

I see: here's a comment that was throwing me off:
+   /*
+* If we didn't get the lock and it turns out we need it, we'll have to
+* unlock and re-lock, to avoid holding the buffer lock across an I/O.
+* That's a bit unfortunate, but hopefully shouldn't happen often.
+*/

I think that might be phrased as "didn't get the pin and it turns out
we need it", because the bit can change after inspection.  The visible
bit isn't 'wrong' as suggested in the comments, it just can change so
that it becomes wrong.  Maybe a note on why it could change would be
helpful.

Other than that, it looks pretty good...ISTM an awfully small amount
of code to provide what it's doing (that's a good thing!).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Merlin Moncure
On Tue, May 10, 2011 at 8:22 AM, Robert Haas robertmh...@gmail.com wrote:
 On Mon, May 9, 2011 at 10:36 PM, Merlin Moncure mmonc...@gmail.com wrote:
 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.

 hm, what are the implications for tuple hint bits, short and long
 term?  I'm particularly interested if you think any hint bit i/o
 mitigation strategies are worth pursuing.

 Well, I don't really want to let this thread on my project get
 hijacked to talk about your project (not that I haven't been guilty of
 that myself!)

no, that wasn't my intent at all, except in the sense of wondering if
a crash-safe visibility map provides a route of displacing a lot of
hint bit i/o and by extension, making alternative approaches of doing
that, including mine, a lot less useful.  that's a good thing.

meaning: since the vis map approach is going to be a fairly large win
over the classic approach to checking visibility in so many scenarios,
maybe the real long term goal should be just being as aggressive as
possible in terms of making sure it's set properly, and just give up
and be a bit more brute forcey when it's not set.  it's a fair
question.  that's a pretty broad statement, but that's what I'm
thinking about.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 9:45 AM, Merlin Moncure mmonc...@gmail.com wrote:
 I see: here's a comment that was throwing me off:
 +       /*
 +        * If we didn't get the lock and it turns out we need it, we'll have 
 to
 +        * unlock and re-lock, to avoid holding the buffer lock across an I/O.
 +        * That's a bit unfortunate, but hopefully shouldn't happen often.
 +        */

 I think that might be phrased as didn't get the pin and it turns out
 we need it because the bit can change after inspection.  The visible
 bit isn't 'wrong' as suggested in the comments, it just can change so
 that it becomes wrong.  Maybe a note of why it could change would be
 helpful.

Oh, I see.  I did write "lock" when I meant "pin", and your other
point is well-taken as well.  Here's a revised version with some
additional wordsmithing.

 Other than that, it looks pretty good...ISTM an awfully small amount
 of code to provide what it's doing (that's a good thing!).

Thanks.  It's definitely not big in terms of code footprint; it's
mostly a matter of making sure we've dotted all the i's and crossed
all the t's.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


visibility-map-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] ts_rank

2011-05-10 Thread Mark
Could somebody explain to me which methods ts_rank is based on and how it works?
I would appreciate some articles, if any exist.
Thanks a lot for your reply.
Mark

--
View this message in context: 
http://postgresql.1045698.n5.nabble.com/ts-rank-tp4384120p4384120.html
Sent from the PostgreSQL - hackers mailing list archive at Nabble.com.

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [v9.2] Leaky view and RLS

2011-05-10 Thread Kohei Kaigai
I'd like to summarize the expected issues around leaky views and RLS for
v9.2 and the PGCon 2011 Developer Meeting.

We have already reached consensus that the leaky-view problem needs to be
fixed before the row-level security feature.

We know of several ways to leak or infer the contents of tuples that should
be invisible through a view defined for row-level security purposes.

[1] User-defined functions with small costs and side effects (e.g. error messages)

If such a function is appended to the WHERE clause, it may be executed earlier
than the functions that implement the row-level security policy of a security
view. These functions can take arguments that reference either visible or
invisible tuples, so a function with side effects can leak the contents of
invisible tuples when it is invoked before the conditions that filter them out.

[2] Iteration in probing PK/FK or UNIQUE constraints

This type of iteration makes it possible to infer invisible values. E.g.,
failing to insert a tuple with a particular primary key hints at the existence
of an invisible tuple. We reached consensus that RLS does not handle this type
of scenario, called a covert channel. The point is that the user cannot see
the hidden value directly.

[3] Reference to statistics derived from table contents

One example is the selectivity-estimator functions Tom mentioned before.
pg_statistic holds statistical information derived from the table contents,
so its histograms may help users infer those contents.
The discussion didn't get much traction at the time. However, my point is the
same as the reason why we don't handle covert channels: the statistics are an
irreversible transformation of the original data, so inferring the data
requires an inference process, not a direct data reference.


We also discussed a principle for what types of scenarios should be considered
problems.
One was that we consider it not a problem if a function internally references
invisible rows, as long as it only consumes them internally. Thus, we
considered that index access methods can be invoked earlier than the functions
that filter out disallowed rows.
Second was that we consider it not a problem if RLS allows users to infer
invisible rows from the circumstances, as long as it is not possible to dump
invisible data directly, because the bandwidth of such an information leak is
quite low. So, we decided not to handle covert channels, such as iterative
PK/FK probing, in RLS.

I still think scenario [1] is the only problem to be solved prior to the RLS
features. Scenarios [2] and [3] don't allow the original data to be leaked
directly. In addition, the estimator functions mentioned in [3] just reference
statistical data internally; it is the same situation as with index access
methods.

Please point out anything at issue that I have overlooked.

Thanks,
--
NEC Europe Ltd, SAP Global Competence Center
KaiGai Kohei kohei.kai...@eu.nec.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure mmonc...@gmail.com wrote:
 no, that wasn't my intent at all, except in the sense of wondering if
 a crash-safe visibility map provides a route of displacing a lot of
 hint bit i/o and by extension, making alternative approaches of doing
 that, including mine, a lot less useful.  that's a good thing.

Sadly, I don't think it's going to have that effect.  The
page-is-all-visible bits seem to offer a significant performance
benefit over the xmin-committed hint bits; but the benefit of
xmin-committed all by itself is too much to ignore.  The advantages of
the xmin-committed hint bit (as opposed to the all-visible page-level
bit) are:

(1) Setting the xmin-committed hint bit is a much more light-weight
operation than setting the all-visible page-level bit.  It can be done
on-the-fly by any backend, rather than only by VACUUM, and need not be
XLOG'd.
(2) If there are long-running transactions on the system,
xmin-committed can be set much sooner than all-visible - the
transaction need only commit.  All-visible can't be set until
overlapping transactions have ended.
(3) xmin-committed is useful on standby servers, whereas all-visible
is ignored there.  (Note that neither this patch nor index-only scans
changes anything about that: it's existing behavior, necessitated by
different xmin horizons.)

So I think that attempts to minimize the overhead of setting the
xmin-committed bit are not likely to be mooted by anything I'm doing.
Keep up the good work.  :-)

Where I do think that we can possibly squeeze some additional benefit
out of a crash-safe visibility map is in regards to anti-wraparound
vacuuming.  The existing visibility map is used to skip vacuuming of
all-visible pages, but it's not used when XID wraparound is at issue.
The reason is fairly obvious: a regular vacuum only needs to worry
about getting rid of dead tuples (and a visibility map bit being set
is good evidence that there are none), but an anti-wraparound vacuum
also needs to worry about live tuples with xmins that are about to
wrap around from past to future (such tuples must be frozen).  There's
a second reason, too: the visibility map bit, not being crash-safe,
has a small chance of being wrong, and we'd like to eventually get rid
of any dead tuples that slip through the cracks.  Making the
visibility map crash-safe doesn't directly address the first problem,
but it does (if or when we're convinced that it's fairly bug-free)
address the second one.

To address the first problem, what we've talked about doing is
something along the line of freezing the tuples at the time we mark
the page all-visible, so we don't have to go back and do it again
later.  Unfortunately, it's not quite that simple, because freezing
tuples that early would cause all sorts of headaches for hot standby,
not to mention making Tom and Alvaro grumpy when they're trying to
figure out a corruption problem and all the xmins are FrozenXID rather
than whatever they were originally.  We floated the idea of a
tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat
the tuple as frozen, but wouldn't actually overwrite the xmin field.
That would solve the forensic problem with earlier freezing, but it
doesn't do anything to resolve the Hot Standby problem.  There is a
performance issue to worry about, too: freezing operations must be
xlog'd, as we update relfrozenxid based on the results, and therefore
can't risk losing a freezing operation later on.  So freezing sooner
means more xlog activity for pages that might very well never benefit
from it (if the tuples therein don't stick around long enough for it
to matter).
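
As a toy illustration of the HEAP_XMIN_FROZEN idea only - the flag value and
helper below are hypothetical, not PostgreSQL's actual t_infomask layout - a
bit could make a tuple behave as frozen while leaving its original xmin
readable for forensics:

#include <stdint.h>
#include <stdio.h>

#define HYPOTHETICAL_XMIN_FROZEN  0x0800   /* made-up flag value */

typedef struct ToyTuple
{
    uint32_t    xmin;       /* original inserting XID, kept for forensics */
    uint16_t    infomask;
} ToyTuple;

/* Treat the tuple as frozen if the flag is set, without rewriting xmin. */
static uint32_t
toy_effective_xmin(const ToyTuple *tup)
{
    return (tup->infomask & HYPOTHETICAL_XMIN_FROZEN) ? 2 /* FrozenTransactionId */
                                                      : tup->xmin;
}

int
main(void)
{
    ToyTuple    tup = { 123456, HYPOTHETICAL_XMIN_FROZEN };

    printf("stored xmin = %u, effective xmin = %u\n",
           (unsigned) tup.xmin, (unsigned) toy_effective_xmin(&tup));
    return 0;
}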

Nonetheless, I haven't completely given up hope.  The current
situation is that a big table into which new records are slowly being
inserted has to be repeatedly scanned in its entirety for unfrozen
tuples even though only a small and readily identifiable part of it
can actually contain any such tuples, which is clearly less than
ideal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] improvements to pgtune

2011-05-10 Thread Bruce Momjian

FYI, I can help if you need javascript assistance.

---

Greg Smith wrote:
 Shiv wrote:
   So my exams are over now and am fully committed to the project in 
  terms of time. I have started compiling a sort of personal todo for 
  myself. I agree with your advice to start the project with small steps 
  first. (I have a copy of the code and am trying to glean as much of it 
  as I can)
 
 I just fixed a couple of bugs in the program that were easier to correct 
 than explain.  The code changes have been pushed to the github repo.  
 I've also revised the output format to be a lot nicer.  There's a UI 
 shortcut you may find useful too; the program now takes a single input 
 parameter as the input file, outputting to standard out.
 
 So a sample run might look like this now:
 
 $ ./pgtune postgresql.conf.sample
 [old settings]
 #--
 # pgtune wizard run on 2011-05-08
 # Based on 2060728 KB RAM in the server
 #--
 
 default_statistics_target = 100
 maintenance_work_mem = 120MB
 checkpoint_completion_target = 0.9
 effective_cache_size = 1408MB
 work_mem = 12MB
 wal_buffers = 8MB
 checkpoint_segments = 16
 shared_buffers = 480MB
 max_connections = 80
 
   I would really appreciate your reply to Josh's thoughts. It would 
  help me understand the variety of tasks and a possible ordering for me 
  to attempt them.
  Josh's comments: "What would you list as the main things pgtune 
  doesn't cover right now?  I have my own list, but I suspect that yours 
  is somewhat different."
 
  "I do think that autotuning based on interrogating the database is 
  possible.  However, I think the way to make it not be a tar baby is to 
  tackle it one setting at a time, and start with ones we have the most 
  information for.  One of the real challenges there is that some data 
  can be gleaned from pg_* views, but a *lot* of useful performance data 
  only shows up in the activity log, and then only if certain settings 
  are enabled."
 
 I just revised the entire TODO file (which is now TODO.rst, formatted in 
 ReST markup:  http://docutils.sourceforge.net/rst.html ; test with 
 rst2html TODO.rst  TODO.html and look at the result).  It should be 
 easier to follow the flow of now, and it's organized in approximately 
 the order I think things need to get finished in.
 
 There are few major areas for expansion that might happen on this 
 program to choose from.  I was thinking about doing them in this order:
 
 1) Fix the settings validation and limits.  I consider this a good place 
 to start on hacking the code.  It's really necessary work eventually, 
 and it's easier to get started with than the other ideas.
 
 2) Improve internals related to tracking things like memory and 
 connections so they're easier to pass around the program.  Adding a 
 platform class is what I was thinking of.  See the Estimating shared 
 memory usage section of the TODO for more information.  Add PostgreSQL 
 version as another input to that.
 
 3) Improve the settings model used for existing parameters.  Right now 
 people have reported that the work_mem settings suggested in particular 
 are too high for many servers.  Ideas about why that is are in the 
 TODO.  (This really requires the platform change be done first, or the 
 code will be too hard to write/maintain)
 
 4) Estimate memory used by the configuration and output sysctl.conf 
 files.  (Needs platform change too)
 
 5) Add tuning suggestions for new parameters.  The most obvious ideas 
 all involve adding common logging changes.
 
 6) Create some new UIs for running the program.  A text-based program 
 that asked questions (a 'wizard') or a GUI program doing the same are 
 two common suggestions.
 
 The ideas Josh was talking about for interrogating the database for 
 things are all a long ways off from the current state of the code being 
 able to support them.  If (1) through (3) here were done, that whole 
 direction starts with (5) and then runs further that way.  That might be 
 a valid direction to move next instead of the (4), (6) I've listed 
 here.  You'd have finished something that taught enough about how the 
 existing program works to be able to make some more difficult design 
 decisions about fitting new features into it.
 
 If you really want to get right into live server analysis, there's no 
 way for that to fit into the current program yet.  And I don't think 
 you'll get enough practice to see how it would without doing some more 
 basic work first.  You might as well write something new if that's your 
 goal, and expect that you may not finish anything useful by the end of 
 the summer.  If you want to complete a project that results in code that 
 people absolutely will use, the more boring plan I've outlined goes that 
 way.  One of 

Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Cédric Villemain
2011/5/10 Robert Haas robertmh...@gmail.com:
 So, what do we need in order to find our way to index-only scans?

 3. Statistics.  I believe that in order to accurately estimate the
 cost of an index-only scan, we're going to need to know the fraction
 of tuples that are on pages whose visibility map bits are set.  I
 believe it should be fairly straightforward to have ANALYZE collect
 this information; and I'm inclined to do that as a separate patch.  It
 seems like it would also be nice to know what fraction of tuples are
 on pages that don't have the visibility map set but where, in fact,
 all tuples on the page are visible to all transactions, so it would be
 legal to set the bit.  A large discrepancy between these two
 percentages might be a good reason to auto-vacuum the table (perhaps
 using a really lazy vacuum[2]).  I'm not sure if this can be added
 cheaply, though, and in any case, any change to the set of criteria
 that will trigger an auto-vacuum is probably a can of worms.
 Thoughts?

ANALYZE can do the stats job for 'free' on the pages it collects
anyway. So that looks like a good idea.
I believe the really lazy vacuum is another topic; even if having tables
already vacuumed will improve the performance of index-only scans, the stats
should expose that, with the cost_index(_only?)() function taking it into
account.


 4. Planner and executor changes.  In contrast to Heikki's original
 implementation, I'm inclined to not to try to split the Index Scan
 node into index scan and heap fetch.  Since there are many choices for
 where to put the heap fetch node (any level of the join tree between
 the index scan and the root), this seems likely to result in a
 combinatorial explosion of paths[3], and I'm not real sure that the
 payback will be adequate.  Furthermore, the idea of allowing user code
 to see tuples that will only later be determined not to have been
 visible to that MVCC snapshot in the first place seems pretty scary
 from a security perspective, though certainly there are possible
 benefits[4].  Instead, I'm inclined to just have the planner evaluate
 whether the necessary columns can be extracted purely from the index.

The temptation is high to estimate the cost of an index scan (only) plus an
ordered-by-ctid fetch of the table pages if the heap is required (this is what
I understood from Heikki's suggestions in [3]-[4], and it makes sense). It may
be easier to implement both at once, but I didn't find the branch in Heikki's
git repos (probably removed a long time ago).

 If not, we proceed as now.  If so, we can use the index only
 approach of using the visibility map to decide which heap fetches can
 be skipped.  It's not clear to me whether we need to compare the cost
 of the standard approach with the cost of the index only approach:
 in theory, if there aren't any visibility map bits anyway, the index
 only approach could be slower.  But I'm not sure whether that problem
 is significant or common enough to be worth expending a lot of code
 on.  Either way, the number of actual paths doesn't need to increase,
 because in this design, even if we apply a costing model, one approach
 will dominate the other.  Heikki also suggested considering index
 scans in cases where we don't now[4, again] but I'm inclined to leave
 this, too, for a later optimization, again because balancing the
 increase in paths against the possible performance benefits of using
 indexes in more situations seems finicky.  In short, for a first cut
 at this, I just want to look at this as a way to get cheaper index
 scans, and leave everything else to future work.

Based on ANALYZE stats for the visibility, I believe cost_index and
cost_index_only should be very similar functions (well, at the moment I don't
see the point of splitting it into two functions).


 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

Index-only scans are welcome!
I believe I can help on 3 and 4, but I'm (really) not sure about 1 and 2.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company

 [1] http://archives.postgresql.org/pgsql-hackers/2011-05/msg00292.php
 [2] http://archives.postgresql.org/pgsql-hackers/2011-03/msg00946.php
 [3] http://archives.postgresql.org/pgsql-hackers/2009-09/msg01379.php
 [4] http://archives.postgresql.org/pgsql-hackers/2009-07/msg00675.php

 --
 Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
 To make changes to your subscription:
 http://www.postgresql.org/mailpref/pgsql-hackers




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 ANALYZE can do the stats job for 'free' on the pages it collects
 anyway. So that looks like a good idea.
 I believe the really lazy vacuum is another topic; even if it will
 improve the performance of the index only scan to have tables already
 vacuuumed, the stats should expose that and the function
 cost_index(_only?)() taking care of that.

I basically agree.  The connection is that, as we use the all-visible
bits for more things, the performance penalty for failing to vacuum (say)
an insert-only table will continue to grow.  Still, as you say,
clearly a separate topic.

 The temptation is high to estimate the cost of an index_scan(only) +
 ordered(by ctid) table pages fetch if heap required. (this is what I
 understood from heikki suggestion 3-4. and it makes sense). It may be
 easier to implement both at once but I didn't find the branch in the
 Heikki's git repos. (probably removed since the long time)

I was thinking about this as well, at least if I understand you
correctly.  That would be similar to a bitmap index scan, and I think
it would be a great thing to have, not only because it would allow us
to get the advantages of index-only scans in situations that are
well-suited to our current bitmap scans, but also because it could be
batched.  You could allocate a buffer of work_mem bytes and fill it up
with TIDs; then, when it's full, you sort the buffer and start doing
the necessary heap fetches in physical order.  If you still need more
rows, you can clear the buffer and go around for another pass.
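
A minimal standalone sketch of that batching idea (simplified TIDs and a
plain qsort; not the actual executor code):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Simplified TID: block number plus line-pointer offset. */
typedef struct TidEntry
{
    uint32_t    block;
    uint16_t    offset;
} TidEntry;

static int
tid_cmp(const void *a, const void *b)
{
    const TidEntry *ta = a;
    const TidEntry *tb = b;

    if (ta->block != tb->block)
        return (ta->block < tb->block) ? -1 : 1;
    return (ta->offset < tb->offset) ? -1 : (ta->offset > tb->offset);
}

int
main(void)
{
    /* Pretend these TIDs came back from the index in key order. */
    TidEntry    buf[] = { {42, 3}, {7, 1}, {42, 1}, {7, 9}, {13, 5} };
    size_t      n = sizeof(buf) / sizeof(buf[0]);

    /* Sort the batch so heap fetches proceed in physical (block) order. */
    qsort(buf, n, sizeof(TidEntry), tid_cmp);

    for (size_t i = 0; i < n; i++)
        printf("fetch block %u, offset %u\n",
               (unsigned) buf[i].block, (unsigned) buf[i].offset);
    return 0;
}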

 Based on ANALYZE stats for the visibility, I believe cost_index and
 cost_index_only should be very similar functions (well, atm, I don't
 see the point to split it in 2 functions).

Yeah, I would more imagine modifying the existing function.

 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

 Indexonly scans are welcome!
 I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

Well, I have code for #1, and just need reviews, and #2 shouldn't be
that hard, and with luck I'll twist Bruce's arm into doing it (*waves
to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
on what/how you'd like to contribute there?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Cédric Villemain
2011/5/10 Robert Haas robertmh...@gmail.com:
 On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 ANALYZE can do the stats job for 'free' on the pages it collects
 anyway. So that looks like a good idea.
 I believe the really lazy vacuum is another topic; even if it will
 improve the performance of the index only scan to have tables already
 vacuuumed, the stats should expose that and the function
 cost_index(_only?)() taking care of that.

 I basically agree.  The connection is that - as we use the all-visible
 for more things, the performance penalty for failing to vacuum (say)
 an insert-only table will continue to grow.  Still, as you say,
 clearly a separate topic.

 The temptation is high to estimate the cost of an index_scan(only) +
 ordered(by ctid) table pages fetch if heap required. (this is what I
 understood from heikki suggestion 3-4. and it makes sense). It may be
 easier to implement both at once but I didn't find the branch in the
 Heikki's git repos. (probably removed since the long time)

 I was thinking about this as well, at least if I understand you
 correctly.  That would be similar to a bitmap index scan, and I think
 it would be a great thing to have, not only because it would allow us
 to get the advantages of index-only scans in situations that are
 well-suited to our current bitmap scans, but also because it could be
 batched.  You could allocate a buffer of work_mem bytes and fill it up
 with TIDs; then, when it's full, you sort the buffer and start doing
 the necessary heap fetches in physical order.  If you still need more
 rows, you can clear the buffer and go around for another pass.

 Based on ANALYZE stats for the visibility, I believe cost_index and
 cost_index_only should be very similar functions (well, atm, I don't
 see the point to split it in 2 functions).

 Yeah, I would more imagine modifying the existing function.

 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

 Indexonly scans are welcome!
 I believe I can help on 3 and 4, but (really) not sure for 1 and 2.

 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

I can provide initial patches for cost and analyze, at least.


 --
 Robert Haas
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company




-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Jesper Krogh

On 2011-05-10 14:48, Robert Haas wrote:

We could avoid all of this complexity - and the possibility of pinning
the visibility map page needlessly - by locking the heap buffer first
and then pinning the visibility map page if the heap page is
all-visible.  However, that would involve holding the lock on the heap
buffer across a possible disk I/O to bring the visibility map page
into memory, which is something the existing code tries pretty hard to
avoid.

Assuming that the visibility map would be used for visibility testing,
just taking the lock would effectively mean we want it in the buffers,
which would not be that bad?

Or what is the downside of keeping it across I/O? Will it block other
processes trying to read it?

--
Jesper

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 3:25 AM, Robert Haas robertmh...@gmail.com wrote:

 So, what do we need in order to find our way to index-only scans?

 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If

This topic has been discussed many times, yet I have never seen an
assessment that explains WHY we would want to do index-only scans.

This will be a complex addition to the codebase and one that could
introduce bugs into MVCC. It seems reasonable to look at what the
benefit of this would be, and what the use case/ benefit profile is
before we spend a long time adding this optimization.

I asked for this previously on earlier threads also.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Cédric Villemain
 The temptation is high to estimate the cost of an index_scan(only) + an
 ordered (by ctid) fetch of table pages if the heap is required (this is what
 I understood from Heikki's suggestions 3-4, and it makes sense). It may be
 easier to implement both at once, but I didn't find the branch in
 Heikki's git repos (probably removed a long time ago).

 I was thinking about this as well, at least if I understand you

yes.

 correctly.  That would be similar to a bitmap index scan, and I think
 it would be a great thing to have, not only because it would allow us
 to get the advantages of index-only scans in situations that are
 well-suited to our current bitmap scans, but also because it could be
 batched.  You could allocate a buffer of work_mem bytes and fill it up
 with TIDs; then, when it's full, you sort the buffer and start doing
 the necessary heap fetches in physical order.  If you still need more
 rows, you can clear the buffer and go around for another pass.

The remaining issue here is that we don't have a 'safe' index-only scan,
just an index scan with some probability of the 'only'.



-- 
Cédric Villemain               2ndQuadrant
http://2ndQuadrant.fr/     PostgreSQL : Expertise, Formation et Support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 12:45 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 10, 2011 at 5:14 AM, Peter Geoghegan pe...@2ndquadrant.com 
 wrote:
 On 10 May 2011 09:45, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

 I think we need to refactor the function into something like:

 #define WL_LATCH_SET    1
 #define WL_SOCKET_READABLE 2
 #define WL_SOCKET_WRITEABLE 4
 #define WL_TIMEOUT      8
 #define WL_POSTMASTER_DEATH 16
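
 A rough sketch of how a caller might use such a bitmask interface. The
 WaitLatchOrSocket() signature, the return-value-as-event-mask convention,
 and the helper names below are assumptions for illustration, not the
 existing latch API:

 struct Latch;                                  /* opaque placeholder */

 /* Assumed signature for the refactored wait function. */
 extern int WaitLatchOrSocket(struct Latch *latch, int wakeEvents,
                              int sock, long timeout);
 extern void handle_walreceiver_reply(void);    /* hypothetical helper */
 extern void send_pending_wal(void);            /* hypothetical helper */

 static void
 walsender_style_wait(struct Latch *my_latch, int client_sock)
 {
     /* Sleep until whichever of these events happens first. */
     int events = WaitLatchOrSocket(my_latch,
                                    WL_LATCH_SET | WL_SOCKET_READABLE |
                                    WL_TIMEOUT | WL_POSTMASTER_DEATH,
                                    client_sock,
                                    60000L);

     if (events & WL_POSTMASTER_DEATH)
         return;                         /* postmaster gone: caller exits */
     if (events & WL_SOCKET_READABLE)
         handle_walreceiver_reply();     /* reply arrived on the socket */
     if (events & WL_LATCH_SET)
         send_pending_wal();             /* latch set: new WAL to send */
     /* WL_TIMEOUT: fall through for periodic housekeeping */
 }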

 While I agree with the need to not box ourselves into a corner on the
 latch interface by making sweeping assumptions, isn't the fact that a
 socket became readable or writable strictly an implementation detail?

 The thing about the socket being readable/writeable is needed for
 walsender.  It needs to notice when its connection to walreceiver is
 writeable (so it can send more WAL) or readable (so it can receive a
 reply message).

I've got a feeling that things will go easier if we have a separate
connection for the feedback channel.

Yes, two connections, one in either direction.

That would make everything simple, nice one way connections. It would
also mean we could stream at higher data rates.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.
 
In databases with this feature, it's not too unusual for a query
which uses just an index to run one or more orders of magnitude
faster than a query which has to randomly access the heap for each
index entry.  That seems like enough evidence of its possible value
in PostgreSQL to proceed to the point where benchmarks become
possible.  I'm assuming that, like all other features added as
performance optimizations, it won't be committed until there are
benchmarks showing the net benefit.
 
As a thought experiment, picture the relative costs of scanning a
portion of an index in index sequence, and being done, versus
scanning a portion of an index in index sequence and jumping to a
random heap access for each index entry as you go.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 I've got a feeling that things will go easier if we have a separate
 connection for the feedback channel.

 Yes, two connections, one in either direction.

 That would make everything simple, nice one way connections. It would
 also mean we could stream at higher data rates.

The above sounds like complete nonsense.  TCP connections are already
full-duplex.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Heikki Linnakangas

On 10.05.2011 14:39, Peter Geoghegan wrote:

Attached is a win32 implementation of the named pipe trick.

It consists of a Visual Studio 2008 solution that contains two
projects, named_pipe_trick (which represents the postmaster) and
auxiliary_backend (which represents each auxiliary process). I split
the solution into two projects/programs because Windows lacks fork()
to make it all happen with a single program.

Thoughts? Once I have some buy-in, I'd like to write a patch for the
latch code that incorporates monitoring the postmaster using the named
pipe trick (for both unix_latch.c and win32_latch.c), plus Heikki's
suggestions.


It should be an anonymous pipe that's inherited by the child process, 
rather than a named pipe. Otherwise seems fine to me, as far as this 
proof of concept program goes.
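
On the Unix side, the analogous trick needs nothing beyond POSIX pipe() and
select(). A minimal sketch, assuming only standard POSIX behaviour (once the
parent exits and the last write end is closed, the inherited read end becomes
readable and returns EOF); this is an illustration, not the actual patch:

#include <stdio.h>
#include <stdlib.h>
#include <sys/select.h>
#include <unistd.h>

static int death_pipe[2];

static void
parent_init(void)
{
    /* Parent creates the pipe before forking and never writes to it. */
    if (pipe(death_pipe) < 0)
    {
        perror("pipe");
        exit(1);
    }
    /* fork() children here; each child inherits both descriptors. */
}

/* In a child: block until the parent has died. */
static void
child_wait_for_parent_death(void)
{
    fd_set  rfds;

    close(death_pipe[1]);       /* child keeps only the read end */

    FD_ZERO(&rfds);
    FD_SET(death_pipe[0], &rfds);

    if (select(death_pipe[0] + 1, &rfds, NULL, NULL, NULL) > 0)
    {
        char    c;

        /* read() returning 0 (EOF) means the parent is gone. */
        if (read(death_pipe[0], &c, 1) == 0)
            fprintf(stderr, "parent exited, shutting down\n");
    }
}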


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 11:27 AM, Cédric Villemain
cedric.villemain.deb...@gmail.com wrote:
 2011/5/10 Robert Haas robertmh...@gmail.com:
 On Tue, May 10, 2011 at 10:58 AM, Cédric Villemain
 cedric.villemain.deb...@gmail.com wrote:
 ANALYZE can do the stats job for 'free' on the pages it collects
 anyway. So that looks like a good idea.
 I believe the really lazy vacuum is another topic; even if it will
 improve the performance of the index-only scan to have tables already
 vacuumed, the stats should expose that and the function
 cost_index(_only?)() should take care of that.

 I basically agree.  The connection is that - as we use the all-visible
 for more things, the performance penalty for failing to vacuum (say)
 an insert-only table will continue to grow.  Still, as you say,
 clearly a separate topic.

 The temptation is high to estimate the cost of an index_scan(only) + an
 ordered (by ctid) fetch of table pages if the heap is required (this is what
 I understood from Heikki's suggestions 3-4, and it makes sense). It may be
 easier to implement both at once, but I didn't find the branch in
 Heikki's git repos (probably removed a long time ago).

 I was thinking about this as well, at least if I understand you
 correctly.  That would be similar to a bitmap index scan, and I think
 it would be a great thing to have, not only because it would allow us
 to get the advantages of index-only scans in situations that are
 well-suited to our current bitmap scans, but also because it could be
 batched.  You could allocate a buffer of work_mem bytes and fill it up
 with TIDs; then, when it's full, you sort the buffer and start doing
 the necessary heap fetches in physical order.  If you still need more
 rows, you can clear the buffer and go around for another pass.

 Based on ANALYZE stats for the visibility, I believe cost_index and
 cost_index_only should be very similar functions (well, atm, I don't
 see the point of splitting it into 2 functions).

 Yeah, I would more imagine modifying the existing function.

 Any thoughts welcome.  Incidentally, if anyone else feels like working
 on this, feel free to let me know and I'm happy to step away, from all
 of it or from whatever part someone else wants to tackle.  I'm mostly
 working on this because it's something that I think we really need to
 get done, more than having a burning desire to be the one who does it.

 Index-only scans are welcome!
 I believe I can help with 3 and 4, but I'm (really) not sure about 1 and 2.

 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

 I can provide initial patches for cost and analyze, at least.

OK, cool.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 11:34 AM, Jesper Krogh jes...@krogh.cc wrote:
 On 2011-05-10 14:48, Robert Haas wrote:

 We could avoid all of this complexity - and the possibility of pinning
 the visibility map page needlessly - by locking the heap buffer first
 and then pinning the visibility map page if the heap page is
 all-visible.  However, that would involve holding the lock on the heap
 buffer across a possible disk I/O to bring the visibility map page
 into memory, which is something the existing code tries pretty hard to
 avoid.

 Assuming that the visibility map would be used for visibility testing,
 just taking the lock would effectively mean we want it in the buffers,
 which would not be that bad?

 Or what is the downside for keeping it across IO? Will it block other
 processes trying to read it?

Heikki might be in a better position to comment on that than I am,
since he wrote the existing code.  But I think that's basically the
issue.  When one process has an exclusive buffer lock, nobody else can
scan the tuples in that block - so a sequential scan, for example,
that reached that block, or an index scan that needed to probe into
it, would pile up behind the read of the visibility map page.
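
For reference, a rough pseudocode sketch of the two lock/pin orderings being
weighed here (illustration only, not the actual patch; LockBuffer(),
PageIsAllVisible(), visibilitymap_pin() and BufferIsValid() are existing
backend routines, but the surrounding variables and retry logic are
simplified):

/* Ordering A: lock the heap buffer first, pin the VM page only if the
 * page is all-visible.  Never pins the VM page needlessly, but may hold
 * the buffer lock across a disk read of the VM page, stalling any other
 * scan that needs this heap page in the meantime. */
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
if (PageIsAllVisible(BufferGetPage(buffer)))
    visibilitymap_pin(relation, blkno, &vmbuffer);   /* possible I/O under lock */

/* Ordering B: peek at the bit without the lock, pin the VM page first so
 * any I/O happens unlocked, then take the lock and recheck.  Costs an
 * occasional unnecessary pin, plus a retry if the bit became set between
 * the unlocked peek and the lock acquisition. */
if (PageIsAllVisible(BufferGetPage(buffer)))         /* unlocked peek */
    visibilitymap_pin(relation, blkno, &vmbuffer);
LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
if (PageIsAllVisible(BufferGetPage(buffer)) && !BufferIsValid(vmbuffer))
{
    /* Raced: drop the lock, pin the VM page, and reacquire the lock. */
}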

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.
 
 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.
 
 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

It's already the case that we'll flip over to a bitmap indexscan,
and thus get rid of most/all of the random page accesses, in
situations where this is likely to be a big win.  Pointing to the
performance difference in databases that don't do that is therefore
not too convincing.

I'm inclined to agree that index-only scans might be worth the amount
of work that's involved ... but I share Simon's desire to see some proof
before anything gets committed.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 3:47 PM, Robert Haas robertmh...@gmail.com wrote:

 To address the first problem, what we've talked about doing is
 something along the line of freezing the tuples at the time we mark
 the page all-visible, so we don't have to go back and do it again
 later.  Unfortunately, it's not quite that simple, because freezing
 tuples that early would cause all sorts of headaches for hot standby,
 not to mention making Tom and Alvaro grumpy when they're trying to
 figure out a corruption problem and all the xmins are FrozenXID rather
 than whatever they were originally.  We floated the idea of a
 tuple-level bit HEAP_XMIN_FROZEN that would tell the system to treat
 the tuple as frozen, but wouldn't actually overwrite the xmin field.
 That would solve the forensic problem with earlier freezing, but it
 doesn't do anything to resolve the Hot Standby problem.  There is a
 performance issue to worry about, too: freezing operations must be
 xlog'd, as we update relfrozenxid based on the results, and therefore
 can't risk losing a freezing operation later on.  So freezing sooner
 means more xlog activity for pages that might very well never benefit
 from it (if the tuples therein don't stick around long enough for it
 to matter).

Hmmm, do we really need to WAL log freezing?

Can we break down freezing into a 2 stage process, so that we can have
first stage as a lossy operation and a second stage that is WAL
logged?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hint bit cache v5

2011-05-10 Thread Simon Riggs
On Mon, May 9, 2011 at 5:12 PM, Merlin Moncure mmonc...@gmail.com wrote:

 I'd like to know if this is a strategy that merits further work...If
 anybody has time/interest that is.  It's getting close to the point
 where I can just post it to the commit fest for review.  In
 particular, I'm concerned if Tom's earlier objections can be
 satisfied. If not, it's back to the drawing board...

I'm interested in what you're doing here.

From here, there's quite a lot of tuning possibilities. It would be
very useful to be able to define some metrics we are interested in
reducing and working out how to measure them.

That way we can compare all the different variants of this to see
which way of doing things works best.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 10, 2011 at 11:34 AM, Jesper Krogh jes...@krogh.cc wrote:
 Or what is the downside for keeping it across IO? Will it block other
 processes trying to read it?

 Heikki might be in a better position to comment on that than I am,
 since he wrote the existing code.  But I think that's basically the
 issue.  When one process has an exclusive buffer lock, nobody else can
 scan the tuples in that block - so a sequential scan, for example,
 that reached that block, or an index scan that needed to probe into
 it, would pile up behind the read of the visibility map page.

Right, it's the loss of potential concurrency that's annoying here.

On the other hand, the concurrency loss might be entirely theoretical
--- in particular, if other potential readers of the heap page would
probably also need to wait for the visibility page to come in, then
nothing is gained by letting them acquire the heap page lock sooner.

I've not read this thread in any detail yet, but if we're going to be
jumping through extremely complex hoops to avoid that scenario, it might
be better to KISS ... especially in the first iteration.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Hmmm, do we really need to WAL log freezing?

 Can we break down freezing into a 2 stage process, so that we can have
 first stage as a lossy operation and a second stage that is WAL
 logged?

That might solve the relfrozenxid problem - set the bits in the heap,
sync the heap, then update relfrozenxid once the heap is guaranteed
safely on disk - but it again seems problematic for Hot Standby.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] crash-safe visibility map, take five

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 1:00 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, May 10, 2011 at 11:34 AM, Jesper Krogh jes...@krogh.cc wrote:
 Or what is the downside for keeping it across IO? Will it block other
 processes trying to read it?

 Heikki might be in a better position to comment on that than I am,
 since he wrote the existing code.  But I think that's basically the
 issue.  When one process has an exclusive buffer lock, nobody else can
 scan the tuples in that block - so a sequential scan, for example,
 that reached that block, or an index scan that needed to probe into
 it, would pile up behind the read of the visibility map page.

 Right, it's the loss of potential concurrency that's annoying here.

 On the other hand, the concurrency loss might be entirely theoretical
 --- in particular, if other potential readers of the heap page would
 probably also need to wait for the visibility page to come in, then
 nothing is gained by letting them acquire the heap page lock sooner.

 I've not read this thread in any detail yet, but if we're going to be
 jumping through extremely complex hoops to avoid that scenario, it might
 be better to KISS ... especially in the first iteration.

I wouldn't describe the hoops as extremely complex; I don't feel any
inclination to simplify the patch beyond what it is right now.  Of
course, we'll see what the feedback looks like after more people have
read the patch, but my feeling is that the patch strikes a reasonable
balance between performance and keeping it simple.  There are some
more complicated shenanigans that I started to experiment with and
ripped out as premature optimization, but this part I think is OK.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 12:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.

 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.

 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

 It's already the case that we'll flip over to a bitmap indexscan,
 and thus get rid of most/all of the random page accesses, in
 situations where this is likely to be a big win.  Pointing to the
 performance difference in databases that don't do that is therefore
 not too convincing.

 I'm inclined to agree that index-only scans might be worth the amount
 of work that's involved ... but I share Simon's desire to see some proof
 before anything gets committed.

Well, we're not in the habit of committing performance patches without
performance numbers, so I doubt we'll reverse that trend now, and
certainly I had no intention of doing so.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collation mega-cleanups

2011-05-10 Thread Ross J. Reedstrom
On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:
 On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian br...@momjian.us wrote:
  Tom this collation stuff has seen more post-feature-commit cleanups than
  I think any patch I remember.  Is there anything we can learn from this?
 
 How about don't commit all the large patches at the end of the cycle?

My take home from following this is: 'Even Tom can get caught in the
"just one more little change" trap'

:-)

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientist   phone: 713-348-6166
Connexions  http://cnx.org                     fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Hmmm, do we really need to WAL log freezing?

 That might solve the relfrozenxid problem - set the bits in the heap,
 sync the heap, then update relfrozenxid once the heap is guaranteed
 safely on disk - but it again seems problematic for Hot Standby.

... or even warm standby.  You basically *have to* WAL-log freezing
before you can truncate pg_clog.  The only freedom you have here is
freedom to mess with the policy about how soon you try to truncate
pg_clog.

(Doing an unlogged freeze operation first is right out, too, if it
causes the system to fail to perform/log the operation later.)

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 5:17 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:

 This topic has been discussed many times, yet I have never seen an
 assessment that explains WHY we would want to do index-only scans.

 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for each
 index entry.  That seems like enough evidence of its possible value
 in PostgreSQL to proceed to the point where benchmarks become
 possible.  I'm assuming that, like all other features added as
 performance optimizations, it won't be committed until there are
 benchmarks showing the net benefit.

 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.

I can picture that. Regrettably, I can also picture the accesses to
the visibility map, the maintenance operations on the VM that are
needed for this and the contention that both of those will cause.

ISTM quite likely that we'll slow down writes to some extent in order
to improve this use case.

So I'm interested in knowing how broad the use case is and what the
overheads are, rather than have an "aw crap!" moment in the future
where we finish the code and only then realise its benefit footprint
is not useful. Best to start out with a clear benefit analysis other
than "other DBMSs do it".

For example, will this be an index-specific tuning option
(manual/automatic), per table or an always-on feature?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collation mega-cleanups

2011-05-10 Thread Andres Freund
On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote:
 On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:
  On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian br...@momjian.us wrote:
   Tom this collation stuff has seen more post-feature-commit cleanups
   than I think any patch I remember.  Is there anything we can learn
   from this?
 
  
 
  How about don't commit all the large patches at the end of the cycle?
 
 My take home from following this is: 'Even Tom can get caught in the
 "just one more little change" trap'
I don't think any of the changes from Tom deserves that categorization. 

Andres

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Heikki Linnakangas

On 10.05.2011 17:47, Robert Haas wrote:

On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure mmonc...@gmail.com wrote:

no, that wasn't my intent at all, except in the sense of wondering if
a crash-safe visibility map provides a route of displacing a lot of
hint bit i/o and by extension, making alternative approaches of doing
that, including mine, a lot less useful.  that's a good thing.


Sadly, I don't think it's going to have that effect.  The
page-is-all-visible bits seem to offer a significant performance
benefit over the xmin-committed hint bits; but the benefit of
xmin-committed all by itself is too much to ignore.  The advantages of
the xmin-committed hint bit (as opposed to the all-visible page-level
bit) are:

(1) Setting the xmin-committed hint bit is a much more light-weight
operation than setting the all-visible page-level bit.  It can be done
on-the-fly by any backend, rather than only by VACUUM, and need not be
XLOG'd.
(2) If there are long-running transactions on the system,
xmin-committed can be set much sooner than all-visible - the
transaction need only commit.  All-visible can't be set until
overlapping transactions have ended.
(3) xmin-committed is useful on standby servers, whereas all-visible
is ignored there.  (Note that neither this patch nor index-only scans
changes anything about that: it's existing behavior, necessitated by
different xmin horizons.)


(4) xmin-committed flag attached directly to the tuple provides some 
robustness in case of corruption, due to bad hw. Without the flag, a 
single bit flip in the clog could in the worst case render all of your 
bulk-loaded data invisible and vacuumable. Of course, corruption will 
always eat your data to some extent, but the hint bits provide some 
robustness. Hint bits are close to the data itself, not in another file 
like the clog, which can come in handy during disaster recovery. 


A flag in the heap page header isn't too different from a per-tuple hint 
bit from that point of view, it's still in the same page as the data 
itself. A bit in the clog or visibility map is not.


Not sure how much performance we're willing to sacrifice for that, but 
it's something to keep in mind.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 This topic has been discussed many times, yet I have never seen
 an assessment that explains WHY we would want to do index-only
 scans.
  
 In databases with this feature, it's not too unusual for a query
 which uses just an index to run one or more orders of magnitude
 faster than a query which has to randomly access the heap for
 each index entry.  That seems like enough evidence of its
 possible value in PostgreSQL to proceed to the point where
 benchmarks become possible.  I'm assuming that, like all other
 features added as performance optimizations, it won't be
 committed until there are benchmarks showing the net benefit.
  
 As a thought experiment, picture the relative costs of scanning a
 portion of an index in index sequence, and being done, versus
 scanning a portion of an index in index sequence and jumping to a
 random heap access for each index entry as you go.
 
 It's already the case that we'll flip over to a bitmap indexscan,
 and thus get rid of most/all of the random page accesses, in
 situations where this is likely to be a big win.  Pointing to the
 performance difference in databases that don't do that is
 therefore not too convincing.
 
Sure.  Of course, if you're only accessing twenty thousand rows from
a table containing fifty million rows, bitmap index scans could come
out pretty close to random access times; but on the whole I agree
that the scale of benefit in PostgreSQL won't tend to match what
people see in other products.  Note that my words were "enough
evidence of its possible value in PostgreSQL to proceed to the point
where benchmarks become possible."
 
In particular, we might want to somehow try to make clear to people
that the very wide indexes they are accustomed to creating to allow
this optimization in other products might be inefficient compared to
creating several one-column indexes which would enable bitmap
logical operations.
 
 I'm inclined to agree that index-only scans might be worth the
 amount of work that's involved
 
So we agree there.
 
 ... but I share Simon's desire to see some proof before anything
 gets committed.
 
And we agree there.  In fact, I can't think of anyone in the
community who doesn't want to see that for *any* purported
performance enhancement.
 
My overall gut feel is that there will be some circumstances where
the covering index optimization is much faster, and some where
people expect it to be, but it isn't.  The trickiest part of this
might be developing a costing model which allows us to make the
right choice most of the time.  And even if we get it perfect, we
can expect questions about why the covering index wasn't used, and
requests for hints so they can force it.  :-(
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] hint bit cache v5

2011-05-10 Thread Merlin Moncure
On Tue, May 10, 2011 at 11:59 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Mon, May 9, 2011 at 5:12 PM, Merlin Moncure mmonc...@gmail.com wrote:

 I'd like to know if this is a strategy that merits further work...If
 anybody has time/interest that is.  It's getting close to the point
 where I can just post it to the commit fest for review.  In
 particular, I'm concerned if Tom's earlier objections can be
 satisfied. If not, it's back to the drawing board...

 I'm interested in what you're doing here.

 From here, there's quite a lot of tuning possibilities. It would be
 very useful to be able to define some metrics we are interested in
 reducing and working out how to measure them.

 That way we can compare all the different variants of this to see
 which way of doing things works best.

thanks for that!  I settled on this approach because the downside
cases should hopefully be pretty limited.  The upside is a matter of
debate although fairly trivial to demonstrate synthetically.

I'm looking for some way of benchmarking the benefits in non-simulated
fashion.  We desperately need something like a performance farm (as
many many others have mentioned).

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Greg Smith

Heikki Linnakangas wrote:
Well, my first patch was two-phase commit. And I had never even used 
PostgreSQL before I dived into the source tree and started to work on that


Well, everyone knows you're awesome.  A small percentage of the people 
who write patches end up having the combination of background skills, 
mindset, and approach to pull something like that off.  But there are at 
least a dozen submissions that start review with "I don't think this 
will ever work, but I can't even read your malformed patch to be sure" 
for every one that went like 2PC.  If every submitter was a budding 
Heikki we wouldn't need patch submission guidelines at all.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Merlin Moncure
On Tue, May 10, 2011 at 9:47 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 10, 2011 at 9:59 AM, Merlin Moncure mmonc...@gmail.com wrote:
 no, that wasn't my intent at all, except in the sense of wondering if
 a crash-safe visibility map provides a route of displacing a lot of
 hint bit i/o and by extension, making alternative approaches of doing
 that, including mine, a lot less useful.  that's a good thing.

 Sadly, I don't think it's going to have that effect.  The
 page-is-all-visible bits seem to offer a significant performance
 benefit over the xmin-committed hint bits; but the benefit of
 xmin-committed all by itself is too much to ignore.  The advantages of
 the xmin-committed hint bit (as opposed to the all-visible page-level
 bit) are:

 (1) Setting the xmin-committed hint bit is a much more light-weight
 operation than setting the all-visible page-level bit.  It can be done
 on-the-fly by any backend, rather than only by VACUUM, and need not be
 XLOG'd.
 (2) If there are long-running transactions on the system,
 xmin-committed can be set much sooner than all-visible - the
 transaction need only commit.  All-visible can't be set until
 overlapping transactions have ended.
 (3) xmin-committed is useful on standby servers, whereas all-visible
 is ignored there.  (Note that neither this patch nor index-only scans
 changes anything about that: it's existing behavior, necessitated by
 different xmin horizons.)

right. #1 could maybe be worked around somehow and #2 is perhaps
arguable, at least in some workloads, but #3 is admittedly a killer
especially since the bit is on the page.

I noted your earlier skepticism regarding moving the page visibility
check completely to the VM:
In some ways, that would make things much simpler.  But to make that
work, every insert/update/delete to a page would have to pin the
visibility map page and clear PD_ALL_VISIBLE if appropriate, so it
might not be good from a performance standpoint, especially in
high-concurrency workloads.  Right now, if PD_ALL_VISIBLE isn't set,
we don't bother touching the visibility map page, which seems like a
possibly important optimization.

That's debatable, but probably moot.  Thanks for thinking that through though.

merlin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-10 Thread J. Greg Davidson
Given:

  CREATE DOMAIN int_array AS int[];

The operator [] works fine in 4.1beta1:

  SELECT (ARRAY[1,2,3]::int_array)[1];

proving that int_array is an array type with element type int.

It is inconsistent that other array functions and operators don't work.

On Mon, 2011-05-09 at 23:32 -0400, Tom Lane wrote:

 So we basically had three alternatives to make it better:
   #1 downcast to the array type, which would possibly silently
 break applications that were relying on the function result
 being considered of the domain type

I do not think of this as "downcasting int_array to int[]" but as
allowing an ANYARRAY to match int_array, which is an array type.
Since no cast is logically required, the return type is the same
as the first argument type, as expected and as PostgreSQL has done
for some time.

   #2 re-apply domain checks on the function result, which would be
 a performance hit and possibly again result in unobvious
 breakage

If the function result is a new value then nothing is being re-applied.
If it is an existing value of the domain type which was passed in or
extracted from a data structure, then the domain checks have already
been applied.  This is a red herring.

   #3 explicitly break it by throwing a parse error until you
 downcast (and then upcast the function result if you want)

 I realize that #3 is a bit unpleasant, but are either of the other two
 better?  At least #3 shows you where you need to check for problems.

Wrapping most (but not all) of your array operations in downcasts and
upcasts is horrible.

 There is another issue that wasn't really mentioned in the previous
 thread, which is that if we are matching a domain-over-array to a
 function's ANYARRAY argument, what exactly should be allowed to match to
 ANYELEMENT --- or if the function returns ANYELEMENT, what should the
 imputed result type be?

Since PostgreSQL allows indexing of the domain type, we already know
the answer.  I don't even get why there is confusion about the
element type of an array.

 AFAICS it's impossible to give an answer to
 that without effectively deciding that function argument matching
 smashes the domain to its base type (the array type).  It's not very
 clear what's the point of a domain type if every operation on it is
 going to neglect its domain-ness.

Yes, what is the point of neglecting the domain-ness of a domain type
by being forced to downcast it to an unchecked type before (some) array
operations?  If a value is being constructed of a domain-type which has
constraints, check them.  When I don't want the security of a domain
type I can cast it to its representation type before I pass it, but
it seems bizarre to be required to do such a thing!

I did read the previous threads some time ago. They seemed mostly to be
concerned with discussing the internal implementation of these matters
and the typmod feature (which I still don't understand).  The internal
algorithms and data structures which PostgreSQL uses to internally
represent SQL types and operations are a weak justification for
PostgreSQL's behavior - they can be changed if they are wrong.

I am still hoping to get rid of my domains which are arrays when
PostgreSQL supports arrays of elements which are of domain types.
Could we at least defer this change until that is done?

_Greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collation mega-cleanups

2011-05-10 Thread Ross J. Reedstrom
On Tue, May 10, 2011 at 07:21:16PM +0200, Andres Freund wrote:
 On Tuesday, May 10, 2011 07:08:23 PM Ross J. Reedstrom wrote:
  On Mon, May 09, 2011 at 03:57:12PM -0400, Robert Haas wrote:
   On Mon, May 9, 2011 at 2:58 PM, Bruce Momjian br...@momjian.us wrote:
Tom this collation stuff has seen more post-feature-commit cleanups
than I think any patch I remember.  Is there anything we can learn
from this?
  
   
  
   How about don't commit all the large patches at the end of the cycle?
  
  My take home from following this is: 'Even Tom can get caught in the
  "just one more little change" trap'
 I don't think any of the changes from Tom deserves that categorization. 

No disrespect intended, far from it. The trap is that something that seems
relatively small at a distance can grow on closer inspection. Which I
think is exactly what Tom said (paraphrased): "The pre-commit review was
insufficient", i.e. the remaining problems seemed little, but were not.

In addition, little is relative to who's doing the changes, over what
domain. Things that are little for Tom on PostgreSQL would not be so
for me. Presumably the inverse is true over other domains.

So perhaps it was more of the "This code is less ready than I thought
it was, but now that I've spent the time understanding it and the
problem, the shortest way out is forward." I think we've all been in
that swamp, at one time or another.

Ross
-- 
Ross Reedstrom, Ph.D. reeds...@rice.edu
Systems Engineer & Admin, Research Scientist   phone: 713-348-6166
Connexions  http://cnx.org                     fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-10 Thread Robert Haas
On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 J. Greg Davidson g...@ngender.net writes:
       * Tighten casting checks for domains based on arrays (Tom Lane)

         When a domain is based on an array type,..., such a domain type
         is no longer allowed to match an anyarray parameter of a
         polymorphic function, except by explicitly downcasting it to the
         base array type.

 This will require me to add hundreds of casts to my code.  I do not get
 how this will Tighten casting checks.  It will certainly not tighten
 my code!  Could you explain how it is good to not be able to do array
 operations with a type which is an array?

 The discussion that led up to that decision is in this thread:
 http://archives.postgresql.org/pgsql-hackers/2010-10/msg01362.php
 specifically here:
 http://archives.postgresql.org/pgsql-hackers/2010-10/msg01545.php

 The previous behavior was clearly broken.  The new behavior is at least
 consistent.  It might be more user-friendly if we did automatic
 downcasts in these cases, but we were not (and still are not) doing
 automatic downcasts for domains over scalar types in comparable cases,
 so it's not very clear why domains over array types should be treated
 differently.

 To be concrete, consider the function array_append(anyarray, anyelement)
 yielding anyarray.  Suppose we have a domain D over int[] and the call
 array_append(var_of_type_D, 42).  If we automatically downcast the
 variable to int[], should the result of the function be considered to be
 of type D, or type int[]?  This isn't a trivial distinction because
 choosing to consider it of type D means we have to re-check D's domain
 constraints, which might or might not be satisfied by the modified
 array.  Previous releases considered the result to be of type D,
 *without* rechecking the domain constraints, which was flat out wrong.

 So we basically had three alternatives to make it better:
        * downcast to the array type, which would possibly silently
          break applications that were relying on the function result
          being considered of the domain type
        * re-apply domain checks on the function result, which would be
          a performance hit and possibly again result in unobvious
          breakage
        * explicitly break it by throwing a parse error until you
          downcast (and then upcast the function result if you want)
 I realize that #3 is a bit unpleasant, but are either of the other two
 better?  At least #3 shows you where you need to check for problems.

Aren't any applications that would be broken by #1 broken already?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 6:02 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Hmmm, do we really need to WAL log freezing?

 Can we break down freezing into a 2 stage process, so that we can have
 first stage as a lossy operation and a second stage that is WAL
 logged?

 That might solve the relfrozenxid problem - set the bits in the heap,
 sync the heap, then update relfrozenxid once the heap is guaranteed
 safely on disk - but it again seems problematic for Hot Standby.

How about we truncate the clog differently on each server? We could
have a special kind of VACUUM that runs during Hot Standby, setting
frozen hint bits only.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 1:49 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Tue, May 10, 2011 at 6:02 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Hmmm, do we really need to WAL log freezing?

 Can we break down freezing into a 2 stage process, so that we can have
 first stage as a lossy operation and a second stage that is WAL
 logged?

 That might solve the relfrozenxid problem - set the bits in the heap,
 sync the heap, then update relfrozenxid once the heap is guaranteed
 safely on disk - but it again seems problematic for Hot Standby.

 How about we truncate the clog differently on each server? We could
 have a special kind of VACUUM that runs during Hot Standby, setting
 frozen hint bits only.

Interesting idea.  It does seem complicated.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So we basically had three alternatives to make it better:
* downcast to the array type, which would possibly silently
  break applications that were relying on the function result
  being considered of the domain type
* re-apply domain checks on the function result, which would be
  a performance hit and possibly again result in unobvious
  breakage
* explicitly break it by throwing a parse error until you
  downcast (and then upcast the function result if you want)
 I realize that #3 is a bit unpleasant, but are either of the other two
 better?  At least #3 shows you where you need to check for problems.

 Aren't any applications that would be broken by #1 broken already?

My point is that doing #1 would break them *silently* --- if you did
have a problem, figuring out what it was could require a great deal
of sleuthing.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 1:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So we basically had three alternatives to make it better:
        * downcast to the array type, which would possibly silently
          break applications that were relying on the function result
          being considered of the domain type
        * re-apply domain checks on the function result, which would be
          a performance hit and possibly again result in unobvious
          breakage
        * explicitly break it by throwing a parse error until you
          downcast (and then upcast the function result if you want)
 I realize that #3 is a bit unpleasant, but are either of the other two
 better?  At least #3 shows you where you need to check for problems.

 Aren't any applications that would be broken by #1 broken already?

 My point is that doing #1 would break them *silently* --- if you did
 have a problem, figuring out what it was could require a great deal
 of sleuthing.

Eh, I'm confused.  Explain further?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Josh Berkus
All,

 Part of the trouble is in the question. Having a patch rejected is not
 really a problem; it's something you should learn from. I know it can be
 annoying. I get annoyed when it happens to me too. But I try to get over
 it as quickly as possible, and either fix the patch, or find another
 (and better) way to do the same thing, or move on. Everybody here is
 acting in good faith, and nobody's on a power trip. That's one of the
 good things about working on Postgres. If it were otherwise I would have
 moved on to something else long ago.

The problem is not that patches get rejected.  It's *how* they get
rejected, and how the submitter experiences the process of them getting
rejected.  Did they learn something from it and understand the reasons
for the rejection?  Or did they experience the process as arbitrary,
frustrating, and incomprehensible?

Ideally, we want a submitter whose patch has been rejected to walk away
with either "my proposal was rejected, and I understand why it's a bad
idea even if I don't agree," or "my proposal was rejected, and I know
what needs to be done to fix it."

Of course, there are always idiots who won't learn anything no matter
how good our process is.  But if the whole submission process is
perceived to be fair and understandable, those will be a tiny minority.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] collateral benefits of a crash-safe visibility map

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 6:08 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, May 10, 2011 at 12:57 PM, Simon Riggs si...@2ndquadrant.com wrote:
 Hmmm, do we really need to WAL log freezing?

 That might solve the relfrozenxid problem - set the bits in the heap,
 sync the heap, then update relfrozenxid once the heap is guaranteed
 safely on disk - but it again seems problematic for Hot Standby.

 ... or even warm standby.  You basically *have to* WAL-log freezing
 before you can truncate pg_clog.  The only freedom you have here is
 freedom to mess with the policy about how soon you try to truncate
 pg_clog.

 (Doing an unlogged freeze operation first is right out, too, if it
 causes the system to fail to perform/log the operation later.)

Trying to think outside of the box, given all these things we can't do.

Can we keep track of the relfrozenxid and then note when we fsync the
relevant file, then issue a single WAL record to indicate that? Still
WAL logging, but 1 record per table, not 1 record per tuple.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Process wakeups when idle and power consumption

2011-05-10 Thread Peter Geoghegan
On 10 May 2011 17:43, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:

 It should be an anonymous pipe that's inherited by the child process by
 rather than a named pipe. Otherwise seems fine to me, as far as this proof
 of concept program goes.

Alright, thanks. I'll use an anonymous pipe in the patch itself.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Collation mega-cleanups

2011-05-10 Thread Tom Lane
Ross J. Reedstrom reeds...@rice.edu writes:
 So perhaps it was more of the "This code is less ready than I thought
 it was, but now that I've spent the time understanding it and the
 problem, the shortest way out is forward."

Yeah, exactly.  By the time I really understood how incomplete the
collation patch was, I'd done most of the work to fix it; and insisting
on backing it out of 9.1 didn't seem productive (even assuming that I
could have won that argument, which was by no means a given).

I'm still fairly troubled by the potential overhead in the form of extra
lookups during parse time, but have not had the time to try to measure
that.  Too bad we haven't got a performance-test farm.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Server Programming Interface underspecified in 4.1beta1

2011-05-10 Thread Kevin Grittner
J. Greg Davidson g...@ngender.net wrote:
 
 I would like to be able to program to a C or C++ SPI
 which is clean, complete and type-safe.  I am good at
 reading API documentation in C or C++ and would be happy
 to review any proposed improvements.
 
I want to second Andrew's post, and emphasize that such suggestions
are not intended to be flippant -- a lot of PostgreSQL development
(documentation included) occurs because someone feels the need for
something and goes ahead with making it happen.  If you want to put
the effort into improving the API documentation for SPI, but don't
know where to start, I'd be happy to help you get started.  It's
great when someone who benefits from PostgreSQL can scratch their
own itch and give back to the community in the process.
 
Let me know.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Greg Stark
On Tue, May 10, 2011 at 5:53 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 It's already the case that we'll flip over to a bitmap indexscan,
 and thus get rid of most/all of the random page accesses, in
 situations where this is likely to be a big win.  Pointing to the
 performance difference in databases that don't do that is therefore
 not too convincing.

The other major effect is row size. Many databases have very wide
rows, perhaps on the order of 1kB. So the table with a million rows
might be 8GB but the index on a few key columns might only be a few
megabytes. Even if you have to read the entire index in random order
it'll likely all be cached and scan faster than the table itself.

One problem with hanging on benchmarks is that database schema design
can actually change based on what performs well. People get in the
habit of creating indexes in Oracle that are only logical when you
realize they allow the database to do an index-only scan  because they
contain extra columns that aren't actually used in where clauses but
are typically in the select list.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] 4.1beta1: ANYARRAY disallowed for DOMAIN types which happen to be arrays

2011-05-10 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Tue, May 10, 2011 at 1:52 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Mon, May 9, 2011 at 11:32 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 So we basically had three alternatives to make it better:
* downcast to the array type, which would possibly silently
  break applications that were relying on the function result
  being considered of the domain type

 Aren't any applications that would be broken by #1 broken already?

 My point is that doing #1 would break them *silently* --- if you did
 have a problem, figuring out what it was could require a great deal
 of sleuthing.

 Eh, I'm confused.  Explain further?

The previous behavior was effectively to allow a domain-over-array type
to match the ANYARRAY symbol, without doing anything else special with
it.  In particular if the function returned ANYARRAY then its output
would be taken to be of the domain type, which is wrong since the
function might produce an array value that doesn't meet the domain's
constraints.

We could, and perhaps should, instead downcast the domain to the array
type, which would imply that ANYARRAY is matching the base type not the
domain, and in particular that a declared ANYARRAY result type means the
base type not the domain.  The things that were bothering me about this
at the time were (1) it would be a silent change of behavior, and (2)
it doesn't seem very consistent to handle domain-to-ANYARRAY matching
this way without also doing something with domain-to-ANYELEMENT
matching.

An example of the inconsistency is that something like

create domain myi as int;
select array[1,2,3] || 4::myi;

fails with operator does not exist: integer[] || myi, not only in HEAD
but all recent releases.  If we're going to downcast a domain-over-array
to plain array to allow it to be used with array_append, it's not clear
why we don't allow myi to be automatically downcast to int for the same
purpose.

However, exactly what we ought to do instead isn't entirely clear, and
when I brought it up back in October no one seemed to care enough to
pursue the matter.  So I just left both cases as throwing error, which
seemed the most conservative course.

I'm still willing to talk about alternatives, though it seems a bit
late in the 9.1 cycle to be designing behaviors.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 6:25 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:

 ... but I share Simon's desire to see some proof before anything
 gets committed.

 And we agree there.  In fact, I can't think of anyone in the
 community who doesn't want to see that for *any* purported
 performance enhancement.

I'm not talking about eventual commit, I'm talking about the whole
process of development.

We should be focusing on improving a measurable performance issue, not
on implementing one exact design that someone thought might help.
How will we review the patch except by measuring it against the
declared performance goal? Otherwise all the various options along the
way will just be matters of opinion, instead of measurement.

From what has been said so far, the use case for this is related to
the practice of using covered indexes, which makes me nervous
because that is an expert level tuning task on other DBMS, limiting
the range of people who get benefit. The typical speed up for
non-covered indexes will come when we access a very large table (not
in cache) via an index scan that is smaller than a bitmapindex scan.
Will we be able to gauge selectivities sufficiently accurately to be
able to pinpoint that during optimization? How will we know that the
table is not in cache? Or is this an optimisation in the executor for
a bitmapheap scan?

I'm not being negative, I'm trying to avoid arguments, blind alleys
and much wasted development if we focus on the wrong things or go to
design too early..

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Greg Stark
On Tue, May 10, 2011 at 6:54 PM, Josh Berkus j...@agliodbs.com wrote:
 Of course, there are always idiots who won't learn anything no matter
 how good our process is.  But if the whole submission process is
 perceived to be fair and understandible, those will be a tiny minority.

The thing is, I think things are much better now than they were three
or four years ago. At the time the project had grown much faster than
the existing stable of developers and the rate at which patches were
being submitted was much greater than they could review.

It's not perfect, Tom still spends more of his time reviewing patches
when he would probably enjoy writing fresh code -- and it's a shame if
you think about the possibilities we might be missing out on if he
were let loose. And patches still don't get a detailed HOWTO on what
needs to happen before commit. But it's better.

We need to be careful about looking at the current situation and
deciding it's not perfect so a wholesale change is needed when the
only reason it's not worse is because the current system was adopted.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Peter Eisentraut
On mån, 2011-05-09 at 10:56 -0400, Robert Haas wrote:
 I'm just shooting from the hip here, but maybe we could have a
 separate (probably smaller) set of tests that are only designed to
 work in a limited range of locales and/or encodings.  I'm really
 pleased that we now have the src/test/isolation stuff, and I think
 some more auxilliary test suites would be quite excellent.  Even if
 people didn't always want to run every single one when doing things
 manually, the buildfarm certainly could.

Well, the result of people don't always run them is the rest of
src/test/.  How much of that stuff even works anymore?



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Josh Berkus

 The thing is, I think things are much better now than they were three
 or four years ago.

Oh, no question.

If you read above in this thread, I'm not really proposing a change in
the current process, just documenting the current process.  Right now
there's a gap between how sumbitters expect things to work, and how they
actually do work.

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Peter Eisentraut
On mån, 2011-05-09 at 12:42 -0400, Tom Lane wrote:
 The problem we'd have is that there's no way (at present) to make such
 a test pass on every platform.  Windows has its own set of locale names
 (which initdb fails to install as collations anyway) and we also have
 the problem that OS X can be counted on to get UTF8 sorting wrong.
 (It might be okay for case-folding though; not sure.)  Possibly we could
 just provide an alternate expected file for OS X, but I don't see a
 decent workaround for Windows --- it would pretty much have to have its
 very own test case.

Windows >=Vista has locale names similar to Linux, and my cursory
testing with some hacked up test suite indicates that it produces the
same results as the Linux expected file, modulo some error message
differences.  So I think this could be made to work, it just needs
someone to implement a few bits.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On mån, 2011-05-09 at 12:42 -0400, Tom Lane wrote:
 The problem we'd have is that there's no way (at present) to make such
 a test pass on every platform.  Windows has its own set of locale names
 (which initdb fails to install as collations anyway) and we also have
 the problem that OS X can be counted on to get UTF8 sorting wrong.
 (It might be okay for case-folding though; not sure.)  Possibly we could
 just provide an alternate expected file for OS X, but I don't see a
 decent workaround for Windows --- it would pretty much have to have its
 very own test case.

 Windows >=Vista has locale names similar to Linux, and my cursory
 testing with some hacked up test suite indicates that it produces the
 same results as the Linux expected file, modulo some error message
 differences.  So I think this could be made to work, it just needs
 someone to implement a few bits.

Well, that would be great, but the someone is not going to be me;
I don't do Windows.  I'd be willing to take responsibility for putting
in the regression test once the necessary Windows-specific code was
committed, though.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 
 ... but I share Simon's desire to see some proof before anything
 gets committed.

 And we agree there.  In fact, I can't think of anyone in the
 community who doesn't want to see that for *any* purported
 performance enhancement.
 
 I'm not talking about eventual commit, I'm talking about the whole
 process of development.
 
I'm confused -- you want to see proof that the concept works well in
PostgreSQL before development effort on it begins?  Or there is some
alternative you would like to see pursued instead?  Something else?
 
 From what has been said so far, the use case for this is related
 to the practice of using covered indexes, which makes me nervous
 because that is an expert level tuning task on other DBMS
 
What?  On the versions of MS SQL Server and Sybase ASE I've used it
costs covered index plans against all the other plans automatically,
and picks this type of plan if the cost looks lower.  Sure, DBAs
sometimes add indexes, or add columns to indexes, in hopes that such
a plan will be chosen -- but what's new and different there?
 
 The typical speed up for non-covered indexes will come when we
 access a very large table (not in cache) via an index scan that is
 smaller than a bitmapindex scan. Will we be able to gauge
 selectivities sufficiently accurately to be able to pinpoint that
 during optimization? How will we know that the table is not in
 cache? Or is this an optimisation in the executor for a bitmapheap
 scan?
 
I would continue to object to using current cache contents for plan
choice because of plan instability and the fact that an odd initial
cache load could skew plans in a bad direction indefinitely.  I do
agree (and have already posted) that I think the hardest part of
this might be developing a good cost model.  I doubt that's an
insoluble problem, especially since it is something we can refine
over time as we gain experience with the edge cases.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Formatting Curmudgeons WAS: MMAP Buffers

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 3:09 PM, Greg Stark gsst...@mit.edu wrote:
 The thing is, I think things are much better now than they were three
 or four years ago. At the time the project had grown much faster than
 the existing stable of developers and the rate at which patches were
 being submitted was much greater than they could review.

Just in the last 2.5 years since I've been around, there have, AFAICT,
been major improvements both in the timeliness and quality of the
feedback we provide, and the quality of the patches we receive.  When
I first started reviewing, it was very common to blow through the
CommitFest application and bounce half the patches back for failure to
apply, failure to pass the regression tests, or other blindingly
obvious breakage.  That's gone down almost to nothing.  It's also
become much more common for patches to include adequate documentation
and regression tests - or at least *an effort* at documentation and
regression tests - than was formerly the case.  We still bounce things
for those reasons from time to time - generally from recurring
contributors who think for some reason that it's someone else's job to
worry about cleaning up their patch - but it's less common than it
used to be.

We still have some rough edges around the incorporation of large
patches.  But it could be so much worse.  We committed something like
six major features in a month: collations, sync rep, SSI, SQL/MED,
extensions, writeable CTEs, and a major overhaul of PL/python.  While
that's likely to delay the release a bit (and already has), and has
already produced quite a few bug reports and will produce many more
before we're done, it's still an impressive accomplishment.  I'm not
sure we could have done that even a year ago.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Robert Haas
On Tue, May 10, 2011 at 3:09 PM, Peter Eisentraut pete...@gmx.net wrote:
 On mån, 2011-05-09 at 10:56 -0400, Robert Haas wrote:
 I'm just shooting from the hip here, but maybe we could have a
 separate (probably smaller) set of tests that are only designed to
 work in a limited range of locales and/or encodings.  I'm really
 pleased that we now have the src/test/isolation stuff, and I think
 some more auxilliary test suites would be quite excellent.  Even if
 people didn't always want to run every single one when doing things
 manually, the buildfarm certainly could.

 Well, the result of people don't always run them is the rest of
 src/test/.  How much of that stuff even works anymore?

I don't know.  But I'm not sure I see your point.  The fact that we
haven't yet succeeded in doing something doesn't mean that it's either
impossible or unimportant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Bruce Momjian
Darren Duncan wrote:
 To follow-up, an additional feature that would be useful and resembles union
 types is the variant where you could declare a union type first and then
 separately other types could declare they are a member of the union.  I'm
 talking about loosely what mixins or type-roles or interfaces etc are in other
 languages.  The most trivial example would be declaring an ENUM-alike first and
 then separately declaring the component values where the latter declare they
 are part of the ENUM, and this could make it easier to add or change ENUM
 values.  But keep in mind that this is a distinct concept from what we're
 otherwise talking about as being union types. -- Darren Duncan

Should this be a TODO item?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Alvaro Herrera
Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:
 Darren Duncan wrote:
  To follow-up, an additional feature that would be useful and resembles union
  types is the variant where you could declare a union type first and then
  separately other types could declare they are a member of the union.  I'm
  talking about loosely what mixins or type-roles or interfaces etc are in other
  languages.  The most trivial example would be declaring an ENUM-alike first
  and then separately declaring the component values where the latter declare
  they are part of the ENUM, and this could make it easier to add or change
  ENUM values.  But keep in mind that this is a distinct concept from what
  we're otherwise talking about as being union types. -- Darren Duncan
 
 Should this be a TODO item?

The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
Seems like it'd be pretty painful.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Darren Duncan

Alvaro Herrera wrote:

Excerpts from Bruce Momjian's message of mar may 10 16:21:36 -0400 2011:

Darren Duncan wrote:
To follow-up, an additional feature that would be useful and resembles union 
types is the variant where you could declare a union type first and then 
separately other types could declare they are a member of the union.  I'm 
talking about loosely what mixins or type-roles or interfaces etc are in other 
languages.  The most trivial example would be declaring an ENUM-alike first and 
then separately declaring the component values where the latter declare they are 
part of the ENUM, and this could make it easier to add or change ENUM values. 
But keep in mind that this is a distinct concept from what we're otherwise 
talking about as being union types. -- Darren Duncan

Should this be a TODO item?


The general idea of C-style unions, sure.  Mixin-style stuff ... not sure.
Seems like it'd be pretty painful.


From the perspective of users, the single greatest distinction between these 2 
kinds of unions is being closed versus being open, and that is the primary 
reason to choose one over the other.


A closed union is the C-style, where the union type declares what other types or 
values it ranges over.  The closed union is best when the union definer can 
reasonably assume that the union won't either ever or would rarely be changed, 
and in particular can assume that application or database code would have 
knowledge of the parts that it deals specially with, so it can be assumed that 
if the closed union type ever is changed then any code designed to use it may be 
changed at the same time.


A good example for a closed union would be a boolean type which just ranges over 
the two singletons false and true or an order type which ranges just over the 
three singletons decrease, same, increase.  Or a type which enumerates the 7 
days of the week, as this is unlikely to change in the life of a system.
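For the C-style case, the closed-ness is visible right in the declaration,
since the type enumerates every member it will ever have (the boolean and
order examples above are just the degenerate case where the members carry no
payload).  A trivial standalone illustration, not a proposal for how the
catalogs would represent it:

#include <stdio.h>

/*
 * A closed (C-style) union: the type itself enumerates, up front, every
 * kind of value it can hold.  Adding a member later means editing this
 * declaration and every switch over it.
 */
typedef enum { V_INTEGER, V_TEXT } value_tag;

typedef struct
{
    value_tag   tag;
    union
    {
        int         i;      /* used when tag == V_INTEGER */
        const char *s;      /* used when tag == V_TEXT */
    }           u;
} closed_variant;

static void
print_variant(closed_variant v)
{
    switch (v.tag)
    {
        case V_INTEGER:
            printf("integer: %d\n", v.u.i);
            break;
        case V_TEXT:
            printf("text: %s\n", v.u.s);
            break;
    }
}

int
main(void)
{
    closed_variant a = {V_INTEGER, {.i = 42}};
    closed_variant b = {V_TEXT, {.s = "hello"}};

    print_variant(a);
    print_variant(b);
    return 0;
}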


An open union is the mixin style, where the component types declare they are 
part of the union.  The open union is best when it is likely that there would be 
either user-defined or extension-defined new types for the union to come along 
later, and we want to have code that can be generic or polymorphic for any types 
that can be used in particular ways.


Examples of open union types could be number, which all the numeric types 
compose, and so you can know say that you can use the generic numeric operators 
on values you have simply if their types compose the number union type, and it 
still works if more numeric types appear later.  Likewise, the string open union 
could include both text and blob, as both support catenation and substring 
matches or extraction, for example.


This would aid operator overloading in a generic way, letting you use the
same syntax for different types while keeping type mixing optional; e.g. you
could support add(int,int) and add(real,real) without supporting
add(int,real), yet the syntax add(x,y) is shared, and you do this while
still having a strong type system; allowing the mixing is optional case by case.


Supporting the open union is closer to supporting ANYTYPE while the closed union 
isn't so much.


-- Darren Duncan

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-10 Thread Bruce Momjian
Tom Lane wrote:
 Christopher Browne cbbro...@gmail.com writes:
  But people are evidently still setting packaging policies based on how
  things were back in 7.3, even though that perhaps isn't necessary
  anymore.
 
 FWIW, once you get past the client versus server distinction, I think
 most subpackaging decisions are based on either the idea that only a
 minority of people will want this, or a desire to limit how many
 dependencies are pulled in by the main package(s).  Both of those
 concerns apply to various subsets of -contrib, which means it's going
 to be hard to persuade packagers to fold -contrib into the -server
 package altogether.  Nor would you gain their approval by trying to
 pre-empt the decision.
 
 We might get somewhere by trying to identify a small set of particularly
 popular contrib modules that don't add any extra dependencies, and then
 recommending to packagers that those ones get bundled into the main
 server package.
 
  Certainly it's not a huge amount of code; less than 2MB these days.
  - % wc `dpkg -L postgresql-contrib-9.0` | tail -1
15952   67555 1770987 total
 
 Well, to add some concrete facts rather than generalities to my own post,
 here are the sizes of the built RPMs from my last build for Fedora:
 
 -rw-r--r--. 1 tgl tgl  3839458 Apr 18 10:50 postgresql-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl   490788 Apr 18 10:50 
 postgresql-contrib-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl 27337677 Apr 18 10:51 
 postgresql-debuginfo-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl   961660 Apr 18 10:50 
 postgresql-devel-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  7569048 Apr 18 10:50 
 postgresql-docs-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl   246506 Apr 18 10:50 
 postgresql-libs-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl64940 Apr 18 10:50 
 postgresql-plperl-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl65776 Apr 18 10:50 
 postgresql-plpython-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl45941 Apr 18 10:50 
 postgresql-pltcl-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  5302117 Apr 18 10:50 
 postgresql-server-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  1370509 Apr 18 10:50 
 postgresql-test-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  3644113 Apr 18 10:50 
 postgresql-upgrade-9.0.4-1.fc13.x86_64.rpm

Is that last one pg_upgrade?  It seems very big.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Peter Eisentraut
On tis, 2011-05-10 at 15:17 -0400, Tom Lane wrote:
 Well, that would be great, but the someone is not going to be me;
 I don't do Windows.

Yeah, me neither.  At least not for this release.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Peter Eisentraut
On tis, 2011-05-10 at 15:48 -0400, Robert Haas wrote:
 On Tue, May 10, 2011 at 3:09 PM, Peter Eisentraut pete...@gmx.net wrote:
  Well, the result of people don't always run them is the rest of
  src/test/.  How much of that stuff even works anymore?
 
 I don't know.  But I'm not sure I see your point.  The fact that we
 haven't yet succeeded in doing something doesn't mean that it's either
 impossible or unimportant.

Yes, but doing the same thing again in hope of different results isn't
the right thing either.

I'm all for more test suites, but we should make them as widely
accessible and accessed as possible so that they get maintained.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-10 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Tom Lane wrote:
 here are the sizes of the built RPMs from my last build for Fedora:
 
 -rw-r--r--. 1 tgl tgl  3839458 Apr 18 10:50 
 postgresql-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl   490788 Apr 18 10:50 
 postgresql-contrib-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl 27337677 Apr 18 10:51 
 postgresql-debuginfo-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl   961660 Apr 18 10:50 
 postgresql-devel-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  7569048 Apr 18 10:50 
 postgresql-docs-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl   246506 Apr 18 10:50 
 postgresql-libs-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl64940 Apr 18 10:50 
 postgresql-plperl-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl65776 Apr 18 10:50 
 postgresql-plpython-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl45941 Apr 18 10:50 
 postgresql-pltcl-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  5302117 Apr 18 10:50 
 postgresql-server-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  1370509 Apr 18 10:50 
 postgresql-test-9.0.4-1.fc13.x86_64.rpm
 -rw-r--r--. 1 tgl tgl  3644113 Apr 18 10:50 
 postgresql-upgrade-9.0.4-1.fc13.x86_64.rpm

 Is that last one pg_upgrade?  It seems very big.

pg_upgrade plus a supporting set of 8.4 files ...

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Simon Riggs
On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:

 A customer came to us with this request: a way to store any data in a
 column.  We've gone back and forth trying to determine reasonable
 implementation restrictions, safety and useful semantics for them.
 I note that this has been requested in the past:
 http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php

I think its a reasonably common use case.

Would it be possible to do this with a typed hstore? Seems easier to
add something there than it would be to add the VARIANT type as
discussed here.


 both Oracle and MS-SQL have it

Do they? What types are they called?

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Infinity bsearch crash on Windows

2011-05-10 Thread Greg Smith
A 9.1Beta1 test report from Richard Broersma (and confirmed on another 
system by Mark Watson) showed up on pgsql-testers this week at 
http://archives.postgresql.org/pgsql-testers/2011-05/msg0.php with 
the following test crashing his Windows server every time:


SELECT 'INFINITY'::TIMESTAMP;

That works fine for me on Linux.  Richard chased the error in the logs, 
which was a generic you can't touch that memory one, down to a full 
stack trace:  
http://archives.postgresql.org/pgsql-testers/2011-05/msg9.php


It looks like it's losing its mind inside of 
src/backend/utils/adt/datetime.c, specifically at this line in datebsearch:


   3576 while (last >= base)
   3577 {
   3578 position = base + ((last - base) >> 1);
   3579 result = key[0] - position->token[0];

Why crash there only on Windows?  Was the problem actually introduced 
above this part of the code?  These are all questions I have no answer for.


--
Greg Smith   2ndQuadrant USg...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Backpatching of Teach the regular expression functions to do case-insensitive matching

2011-05-10 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 I'm all for more test suites, but we should make them as widely
 accessible and accessed as possible so that they get maintained.

Yeah.  My preference would really be to push something like
collate.linux.utf8 into the standard regression tests, but we'd
first have to get it to where there was only one .sql file and
not more than three or so variant expected files (one of which
would be the one for platforms that don't support locale_t,
analogous to the no-support expected file for the xml test).

If we were at that point, then instead of having a separate make target,
I'd be very strongly tempted to include the test in the standard tests
by the expedient of having it create and \c to a separate database with
suitable values of ENCODING, LC_COLLATE, etc.

The lack of initdb support for getting more-or-less-standard collation
entries into pg_collation on Windows seems to be the major missing piece
from here (dunno if Peter is aware of others).  If we don't fix that
before release, we're going to regret it anyway IMO, because of the
inevitable tide of questions/complaints from Windows users trying to use
the collation feature.  We've already seen at least one such from a beta
tester.

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] VARIANT / ANYTYPE datatype

2011-05-10 Thread Eric McKeeth
On Tue, May 10, 2011 at 3:57 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Wed, May 4, 2011 at 5:58 PM, Alvaro Herrera alvhe...@alvh.no-ip.org
 wrote:

  A customer came to us with this request: a way to store any data in a
  column.  We've gone back and forth trying to determine reasonable
  implementation restrictions, safety and useful semantics for them.
  I note that this has been requested in the past:
  http://archives.postgresql.org/pgsql-general/2004-02/msg01266.php

 I think its a reasonably common use case.

 Would it be possible to do this with a typed hstore? Seems easier to
 add something there than it would be to add the VARIANT type as
 discussed here.


  both Oracle and MS-SQL have it

 Do they? What types are they called?

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services


MS SQL Server calls its variant type 'sql_variant', but it's limited to a
subset of the data types they support. Basically, it can store any numeric
type, or any binary or text type with a constrained length. No timestamps,
geometry, XML, user-defined types, etc. allowed. So it's not really as much
of an any value type as it might look on the surface. Don't know any
details of Oracle's implementation.

-Eric


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Simon Riggs
On Tue, May 10, 2011 at 8:35 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 Kevin Grittner kevin.gritt...@wicourts.gov wrote:

 ... but I share Simon's desire to see some proof before anything
 gets committed.

 And we agree there.  In fact, I can't think of anyone in the
 community who doesn't want to see that for *any* purported
 performance enhancement.

 I'm not talking about eventual commit, I'm talking about the whole
 process of development.

 I'm confused -- you want to see proof that the concept works well in
 PostgreSQL before development effort on it begins?  Or there is some
 alternative you would like to see pursued instead?  Something else?

Well, I didn't ask for that and agree it would be foolish to demand
proof ahead of development.

I know this technique is effective in other DBMS, I just want to be
certain it will be effective for us before too much work is done. We
have the additional requirement for a crash safe vacuum map that needs
to be consulted, with possible contention effects. Sybase etc can
simply avoid the logical I/O, which is always a win, in or out of
cache. So the problem is quite different for us.

What I suggested was a assessment and benefit case because we normally
start with a problem and then work out how to solve it.

Normally, others come forward with the why? when? questions and it
feels like there's a bit of groupthink going on here. This looks to me
like it's being approached as if it were a feature, but it looks to me
like a possible optimisation, so I suggest we treat it that way.

Out of concern, I don't want you to waste time on work that *may* not
be that useful in practice, and I don't want to miss improvements or
alternatives either.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature proposal: distinguish each PostgreSQL instance in the event log

2011-05-10 Thread MauMau

From: Tom Lane t...@sss.pgh.pa.us

MauMau maumau...@gmail.com writes:
I've encountered one problem on Windows. I need to support running all of
my products on one host simultaneously. Plus, I need to log messages in
syslog/event log. On Linux, I can distinguish the messages of one product
and those of other products by setting syslog_ident in postgresql.conf. On
the other hand, I can't do that on Windows: all of the PostgreSQL instances
use the same fixed event source name PostgreSQL.

So I'd like to propose a trivial feature which allows users to set event
source.


I'm a bit concerned by the fact that this seems to require global
actions to be taken (registering/deregistering) when a GUC value
changes.  That's going to cause headaches, not least because you cannot
assume that the value changes synchronously across all Postgres
processes.  Maybe that's only an artifact of the way you worded this and
not a real requirement ... but if it is a requirement you should think
about how to avoid it.  If you can't avoid it then most likely the event
name is going to have to be PGC_POSTMASTER, ie not changeable after
startup.  Also, what happens if we fail to deregister because of
crashing?

Is it better to have one parameter for syslog program name and event source,
instead of having both syslog_ident and event_source?


I don't like aliases for GUC variables, and in any case I think it'd be
confusing to force both usages to share a name that's not particularly
appropriate for either one alone.  So personally, -1 for unifying those.
Others might see it differently though.



Considering Tom's advice, I'll make a patch as follows:

1. Create a new GUC variable called event_source and don't touch anything 
with the existing syslog_ident.
2. Make event_source PGC_POSTMASTER at first, because I don't think being
able to change the event source name while the PostgreSQL server is active
is a requirement.  (A sketch of what the guc.c entry might look like is
below.)
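A string GUC of this kind would presumably end up as one more entry in
guc.c's ConfigureNamesString[].  A rough sketch only -- the field order and
the hook slots are written from memory and may not match the current tree
exactly:

	/* hypothetical addition to ConfigureNamesString[] in guc.c */
	{
		{"event_source", PGC_POSTMASTER, LOGGING_WHERE,
			gettext_noop("Sets the application name used to identify "
						 "PostgreSQL messages in the event log."),
			NULL
		},
		&event_source,			/* char *event_source, defined elsewhere */
		"PostgreSQL",			/* default: today's hard-coded name */
		NULL, NULL, NULL		/* no check/assign/show hooks needed */
	},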



Also, what happens if we fail to deregister because of
crashing?


The registration/deregistration is performed separately from PostgreSQL's 
start/stop, so there is no concern about this.  Please see Registering 
eventlog on Windows on the page below:


Installation Procedure
http://www.postgresql.org/docs/9.0/static/install-procedure.html



In fact, I've almost finished writing the program patch and testing. 
However, I'd like to ask for your opinions about a few things before 
completing and submitting the patch.


1. Documentation
Currently, the event source registration is described on the above page.
However, many of my colleagues fail to find the article.  They use
PostgreSQL as one of many pieces of software and don't fully read the manual.


I wonder if it would be better to put the article in the following section, 
because this is in the chapter about installation on Windows:


16.1.4. Cleaning and installing
http://www.postgresql.org/docs/9.0/static/install-windows-full.html

or:

Post-Installation Setup
http://www.postgresql.org/docs/9.0/static/install-post.html

In addition, I think the above page should be linked from the description of 
log_destination parameter.



2. pg_ctl's event logging
pg_ctl also logs to the event log.  Should pg_ctl use the event_source setting?
According to the response to the bug #6011 report, pg_ctl does not need to
obey the postgresql.conf setting, because postgresql.conf is for the server.
However, I'm afraid that users will claim that logging with the event source
PostgreSQL instead of the event_source setting is a bug.  For reference,
pg_ctl already reads the port parameter from postgresql.conf.



Regards,
MauMau


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Infinity bsearch crash on Windows

2011-05-10 Thread Peter Geoghegan
On 10 May 2011 23:02, Greg Smith g...@2ndquadrant.com wrote:
 Why crash there only on Windows?  Was the problem actually introduced above
 this part of the code?  These are all questions I have no answer for.

I don't find it at all surprising that there's a memory corruption bug
that only manifests itself on Windows. Recently, I reported a bug in
pgAdmin that turned out to be a simple case of forgetting to allocate
an extra byte of memory for a null in a c string. The outward problem
couldn't be reproduced on Mac - it only occurred on Linux. Of course,
the problem with undefined behaviour is not that it might cause your
program to crash, but that it might not cause your program to crash.
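To make that concrete, the pattern was essentially this (a self-contained
illustration of the class of bug, not the actual pgAdmin code):

#include <stdlib.h>
#include <string.h>

/* Returns a heap copy of src; caller must free() it. */
char *
copy_string(const char *src)
{
    /*
     * The bug: malloc(strlen(src)) leaves no room for the terminating
     * '\0', so strcpy() writes one byte past the allocation.  Whether
     * that visibly crashes depends entirely on what happens to live
     * there -- which is why it showed up on one platform and not another.
     */
    char   *dst = malloc(strlen(src) + 1);      /* the "+ 1" is the fix */

    if (dst != NULL)
        strcpy(dst, src);
    return dst;
}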

For debug builds, Visual C++ allocates no man's land guard bytes on
either side of areas of allocated memory, which is great for catching
heap corruption bugs. My guess is that when the VC++ debugger issues a
breakpoint, that's exactly where the memory is being
corrupted/improperly dereferenced.


-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Feature proposal: distinguish each PostgreSQL instance in the event log

2011-05-10 Thread Tom Lane
MauMau maumau...@gmail.com writes:
 MauMau maumau...@gmail.com writes:
 I've encountered one problem on Windows. I need to support running all of
 my products on one host simultaneously. Plus, I need to log messages in
 syslog/event log. On Linux, I can distinguish the messages of one product
 and those of other products by setting syslog_ident in postgresql.conf. On
 the other hand, I can't do that on Windows: all of the PostgreSQL instances
 use the same fixed event source name PostgreSQL.

 SO I'd like to propose a trivial feature which allows users to set event
 source.

BTW, what will this accomplish exactly that couldn't be accomplished by
setting log_line_prefix to include the desired identifier?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Infinity bsearch crash on Windows

2011-05-10 Thread Tom Lane
Greg Smith g...@2ndquadrant.com writes:
 A 9.1Beta1 test report from Richard Broersma (and confirmed on another 
 system by Mark Watson) showed up pgsql-testers this week at 
 http://archives.postgresql.org/pgsql-testers/2011-05/msg0.php with 
 the following test crashing his Windows server every time:

 SELECT 'INFINITY'::TIMESTAMP;

Hmm ... I bet this is related to the recent reports about ALTER USER
VALID UNTIL 'infinity' crashing on Windows.  Can the people seeing this
get through the regression tests?  Perhaps more to the point, what is
their setting of TimeZone?  What does the pg_timezone_abbrevs view show
for them?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 
 Normally, others come forward with the why? when? questions and it
 feels like there's a bit of groupthink going on here. This looks
 to me like its being approached like it was a feature, but it
 looks to me like a possible optimisation, so suggest we treat it
 that way.
 
This issue has come up a great many times over the years, and there
has been much discussion around it.  The Wiki page is here:
 
http://wiki.postgresql.org/wiki/Index-only_scans
 
This currently references 11 threads on the topic.  I'd bet that by
spending a couple hours at it I could quadruple that number of
threads.  (I'd really rather not, though.)
 
The problem is that there are regular and fairly frequent complaints
on the list about queries which run slower than people expect
because the heap must be checked for visibility information when
matching index entries are found.  It has become enough of a
conspicuous issue that a lot of people are interested in seeing if
something can be done about it.  After much discussion, people are
trying to advance a plan to find an answer.  I'm sure nobody
involved would ignore any suggestion on how it might be done better;
but at this point, I don't think it's fair to suggest that this is
not being pursued in response to a real problem, or that no serious
thought has been given to direction before people started moving.
 
-Kevin

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: [BUGS] BUG #5957: createdb with description and md5 auth forces to provide password twice

2011-05-10 Thread Bruce Momjian
Bruce Momjian wrote:
 Tom Lane wrote:
  Bruce Momjian br...@momjian.us writes:
   Prior to PG 8.2, this was necessary to put the comment on the database,
   but now that we have the shared comment/description table
   pg_shdescription, this is not necessary.
  
   Do we need createdb to be able to create databases for pre-8.2 clusters?
   If not, the attached patch fixes the double-prompting.
  
  Well, if you're only going to change this in HEAD, that might be an
  acceptable limitation, but if you intend to back-patch I think not.
  Older versions of createdb are probably significantly more likely to
  be used with even-older servers.
 
 This code has been that way since pre-8.2 so I see no need to backpatch;
 this is the first such complaint I have seen.
 
  Seems like it wouldn't be that hard to test the server version and only
  reconnect if it's pre-8.2.
 
 I am not excited about adding more code for this so I am thinking
 head-only.

Attached patch applied to head only.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
new file mode 100644
index 9b72eac..544f2f6
*** a/src/bin/scripts/createdb.c
--- b/src/bin/scripts/createdb.c
*** main(int argc, char *argv[])
*** 192,197 
--- 192,202 
  
  	appendPQExpBuffer(sql, ;\n);
  
+ /*
+  * Connect to the 'postgres' database by default, except have
+  * the 'postgres' user use 'template1' so he can create the
+  * 'postgres' database.
+  */
  	conn = connectDatabase(strcmp(dbname, postgres) == 0 ? template1 : postgres,
  		   host, port, username, prompt_password, progname);
  
*** main(int argc, char *argv[])
*** 208,219 
  	}
  
  	PQclear(result);
- 	PQfinish(conn);
  
  	if (comment)
  	{
- 		conn = connectDatabase(dbname, host, port, username, prompt_password, progname);
- 
  		printfPQExpBuffer(sql, COMMENT ON DATABASE %s IS , fmtId(dbname));
  		appendStringLiteralConn(sql, comment, conn);
  		appendPQExpBuffer(sql, ;\n);
--- 213,221 
*** main(int argc, char *argv[])
*** 231,239 
  		}
  
  		PQclear(result);
- 		PQfinish(conn);
  	}
  
  	exit(0);
  }
  
--- 233,242 
  		}
  
  		PQclear(result);
  	}
  
+ 	PQfinish(conn);
+ 
  	exit(0);
  }
  
diff --git a/src/bin/scripts/dropdb.c b/src/bin/scripts/dropdb.c
new file mode 100644
index 1cf18fd..48f73ae
*** a/src/bin/scripts/dropdb.c
--- b/src/bin/scripts/dropdb.c
*** main(int argc, char *argv[])
*** 113,118 
--- 113,123 
  	appendPQExpBuffer(sql, DROP DATABASE %s;\n,
  	  fmtId(dbname));
  
+ /*
+  * Connect to the 'postgres' database by default, except have
+  * the 'postgres' user use 'template1' so he can drop the
+  * 'postgres' database.
+  */
  	conn = connectDatabase(strcmp(dbname, postgres) == 0 ? template1 : postgres,
  		   host, port, username, prompt_password, progname);
  

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Greg Stark
On Wed, May 11, 2011 at 12:14 AM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 The problem is that there are regular and fairly frequent complaints
 on the list about queries which run slower than people expect


To be fair, about 3/4 of them were actually complaining about the lack
of some global materialized cache of the aggregate value. Covering
index-only scans are only going to be a linear speedup no matter how
large the factor; it's not going to turn select count(*) into an O(1)
operation.

I support the idea of thinking of this as an optimization. But I don't
think there's much question. If we can avoid doing the i/o on the heap
that's an obvious and huge win. Sure the costs of maintaining the vm
need to be measured against the gains but it we don't know what those
costs are yet and whoever works on it will be well aware of that
balance.

On a separate note though, Simon, I don't know what you mean by we
normally start with a problem. It's a free software project and
people are free to work on whatever interests them, whether that's
because it solves a problem they have, helps a client who's paying
them, or just because it's of academic interest to them. We don't
always take their patches if they aren't of general interest but
people propose all kinds of crazy experimental ideas all the time.

-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] postgresql.conf error checking strategy

2011-05-10 Thread Bruce Momjian
Robert Haas wrote:
 On Mon, May 9, 2011 at 11:10 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Robert Haas robertmh...@gmail.com writes:
  On Sun, May 8, 2011 at 1:04 AM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yes, definitely.  Perhaps summarize as rethink how we handle partially
  correct postgresql.conf files.  Or maybe Robert sees it as rethink
  approach to making sure all backends share the same value of critical
  settings?  Or maybe those are two different TODOs?
 
  The second is what I had in mind.  I'm thinking that at least for
  critical GUCs we need a different mechanism for making sure everything
  stays in sync, like having the postmaster write a precompiled file and
  convincing the backends to read it in some carefully synchronized
  fashion.  However, it's not clear to me whether something along those
  lines (or some other lines) would solve the problem you were
  complaining about; therefore it's possible, as you say, that there are
  two separate action items here.  Or maybe not: maybe someone can come
  up with an approach that swats both problems in one go.
 
  Well, the thing that was annoying me was that because a backend saw one
  value in postgresql.conf as incorrect, it was refusing to apply any
  changes at all from postgresql.conf.  And worse, there was no log entry
  to give any hint what was going on.  This doesn't seem to me to have
  much to do with the problem you're on about.  I agree it's conceivable
  that someone might think of a way to solve both issues at once, but
  I think we'd better list them as separate TODOs.
 
 OK by me.

Two TODOs added:

Allow postgresql.conf settings to be accepted by backends even if some
settings are invalid for those backends

* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php 

Allow all backends to receive postgresql.conf setting
changes at the same time

* http://archives.postgresql.org/pgsql-hackers/2011-04/msg00330.php
* http://archives.postgresql.org/pgsql-hackers/2011-05/msg00375.php 

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Robert Haas wrote:
  Any thoughts welcome.  Incidentally, if anyone else feels like working
  on this, feel free to let me know and I'm happy to step away, from all
  of it or from whatever part someone else wants to tackle.  I'm mostly
  working on this because it's something that I think we really need to
  get done, more than having a burning desire to be the one who does it.
 
  Indexonly scans are welcome!
  I believe I can help on 3 and 4, but (really) not sure for 1 and 2.
 
 Well, I have code for #1, and just need reviews, and #2 shouldn't be
 that hard, and with luck I'll twist Bruce's arm into doing it (*waves
 to Bruce*).  So #3 and #4 are the next thing to tackle.  Any thoughts
 on what/how you'd like to contribute there?

I am happy to have pg_upgrade skip upgrading visibility map files --- it
already has code to conditionally process them because they only exist
in = 8.4:

/* fsm/vm files added in PG 8.4 */
if (GET_MAJOR_VERSION(old_cluster.major_version) = 804)
{
/*
 * Copy/link any fsm and vm files, if they exist
 */

Just give the word and it will be done.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Greg Stark wrote:
 On a separate note though, Simon, I don't know what you mean by we
 normally start with a problem. It's an free software project and
 people are free to work on whatever interests them whether that's
 because it solves a problem they have, helps a client who's paying
 them, or just because it's of academic interest to them. We don't
 always take their patches if they aren't of general interest but
 people propose all kinds of crazy experimental ideas all the time.

I am confused by Simon's questions too.  

Simon seems to regularly argue for adding features late in the
development cycle and backpatching things no one else thinks should be
backpatched, but he wants more research showing that index-only scans are
going to improve things before the feature is implemented?  The first is
aggressive development, the second is very conservative development --- they
don't match, so I now wonder what the motivation is, since it isn't consistent.

Isn't speeding up COUNT(*) a sufficient case because it will not have to
touch the heap in many cases?  No one is going to apply this patch until
we fully understand the performance implications, just like every other
patch.  No one has suggested otherwise.

It is helpful to have people critically review all our work, but
disagreeing just for the sake of causing discussion isn't helpful, and I
have seen a lot of these discussions lately.  I am sensing a pattern.  :-(

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Greg Stark
On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
 Isn't speeding up COUNT(*) a sufficient case because it will not have to
 touch the heap in many cases?

Putting aside the politics questions, count(*) is an interesting case
-- it exposes some of the unanswered questions about index-only scans.

The reason select count(*) might win would be because we could pick
any index and do an index scan, relying on the visibility map to
optimize away the heap reads. This is only going to be a win if a
large fraction of the heap reads get optimized away.

It's going to be pretty tricky to determine in the optimizer a) which
index will be cheapest and b) what fraction of index tuples will point
to pages where the heap reference can be optimized away. The penalty
for guessing wrong if we use an index-only scan and it turns out to
have many pages that aren't all-visible would be pretty high.
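To illustrate the shape of that trade-off, here is an invented toy formula
(not the planner's actual cost model): the benefit scales with the fraction
of heap fetches the visibility map lets us skip, and a wrong guess about
that fraction swings the estimate a lot.

#include <stdio.h>

/*
 * Toy comparison in made-up cost units.  "all_visible_frac" stands in for
 * whatever statistic the planner would need about the visibility map.
 */
static double
toy_indexonly_cost(double index_pages, double matching_tuples,
                   double all_visible_frac, double heap_fetch_cost)
{
    double  heap_fetches = matching_tuples * (1.0 - all_visible_frac);

    return index_pages + heap_fetches * heap_fetch_cost;
}

int
main(void)
{
    double  frac;

    for (frac = 1.0; frac >= 0.0; frac -= 0.25)
        printf("all-visible fraction %.2f -> cost %.0f\n",
               frac,
               toy_indexonly_cost(1000.0, 100000.0, frac, 4.0));
    return 0;
}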


-- 
greg

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Why not install pgstattuple by default?

2011-05-10 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Tom Lane wrote:
  here are the sizes of the built RPMs from my last build for Fedora:
  
  -rw-r--r--. 1 tgl tgl  3839458 Apr 18 10:50 postgresql-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl   490788 Apr 18 10:50 postgresql-contrib-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl 27337677 Apr 18 10:51 postgresql-debuginfo-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl   961660 Apr 18 10:50 postgresql-devel-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl  7569048 Apr 18 10:50 postgresql-docs-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl   246506 Apr 18 10:50 postgresql-libs-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl    64940 Apr 18 10:50 postgresql-plperl-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl    65776 Apr 18 10:50 postgresql-plpython-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl    45941 Apr 18 10:50 postgresql-pltcl-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl  5302117 Apr 18 10:50 postgresql-server-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl  1370509 Apr 18 10:50 postgresql-test-9.0.4-1.fc13.x86_64.rpm
  -rw-r--r--. 1 tgl tgl  3644113 Apr 18 10:50 postgresql-upgrade-9.0.4-1.fc13.x86_64.rpm
 
  Is that last one pg_upgrade?  It seems very big.
 
 pg_upgrade plus a supporting set of 8.4 files ...

OK, where do I get to dance around that pg_upgrade is packaged in Fedora
thanks to you?  At PGCon?  LOL

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Greg Stark wrote:
 On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
  Isn't speeding up COUNT(*) a sufficient case because it will not have to
  touch the heap in many cases?
 
 Putting aside the politics questions, count(*) is an interesting case
 -- it exposes some of the unanswered questions about index-only scans.
 
 The reason select count(*) might win would be because we could pick
 any index and do an index scan, relying on the visibility map to
 optimize away the heap reads. This is only going to be a win if a
 large fraction of the heap reads get optimized away.
 
 It's going to be pretty tricky to determine in the optimizer a) which
 index will be cheapest and b) what fraction of index tuples will point
 to pages where the heap reference can be optimized away. The penalty
 for guessing wrong if we use an index-only scan and it turns out to
 have many pages that aren't all-visible would be pretty high.

Yes, that is the tricky optimizer/analyze part.

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Robert Haas wrote:
 So, what do we need in order to find our way to index-only scans?
 
 1. The visibility map needs to be crash-safe.  The basic idea of
 index-only scans is that, instead of checking the heap to find out
 whether each tuple is visible, we first check the visibility map.  If
 the visibility map bit is set, then we know all tuples on the page are
 visible to all transactions, and therefore the tuple of interest is
 visible to our transaction.  Assuming that a significant number of
 visibility map bits are set, this should enable us to avoid a fair
 amount of I/O, especially on large tables, because the visibility map
 is roughly 8000 times smaller than the heap, and therefore far more
 practical to keep in cache.  However, before we can rely on the

FYI, because the visibility map is only one _bit_ per page, it is 8192 *
8, or 64k, times smaller than the heap, e.g. one 8k visibility map page
covers 512MB of heap.  This is important because we rely on this
compactness in the hope that the WAL logging of this information will not
be burdensome.
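
To make the arithmetic concrete, a throwaway check (assuming the default
8kB block size; the numbers scale for non-default builds):

    #include <stdio.h>

    int
    main(void)
    {
        const long  blcksz = 8192;                  /* bytes per page */
        const long  vm_bits = blcksz * 8;           /* one bit per heap page */
        const long  heap_bytes = vm_bits * blcksz;  /* heap covered per VM page */

        /* prints: 1 VM page covers 65536 heap pages = 512 MB of heap */
        printf("1 VM page covers %ld heap pages = %ld MB of heap\n",
               vm_bits, heap_bytes / (1024 * 1024));
        return 0;
    }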

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] the big picture for index-only scans

2011-05-10 Thread Bruce Momjian
Greg Stark wrote:
 On Wed, May 11, 2011 at 1:47 AM, Bruce Momjian br...@momjian.us wrote:
  Isn't speeding up COUNT(*) a sufficient case because it will not have to
  touch the heap in many cases?
 
 Putting aside the politics questions, count(*) is an interesting case
 -- it exposes some of the unanswered questions about index-only scans.
 
 The reason select count(*) might win would be because we could pick
 any index and do an index scan, relying on the visibility map to
 optimize away the heap reads. This is only going to be a win if a
 large fraction of the heap reads get optimized away.
 
 It's going to be pretty tricky to determine in the optimizer a) which
 index will be cheapest and b) what fraction of index tuples will point

I assume the smallest non-partial index would be the cheapest index.
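
To make that concrete, a rough sketch (not a patch; the field names assume
the current RelOptInfo/IndexOptInfo planner structs) of what picking the
smallest non-partial index could look like:

    #include "postgres.h"
    #include "nodes/relation.h"

    /*
     * Rough sketch: choose the smallest non-partial index on a relation
     * as the candidate for a count(*) index-only scan.
     */
    static IndexOptInfo *
    smallest_nonpartial_index(RelOptInfo *rel)
    {
        IndexOptInfo *best = NULL;
        ListCell   *lc;

        foreach(lc, rel->indexlist)
        {
            IndexOptInfo *ind = (IndexOptInfo *) lfirst(lc);

            if (ind->indpred != NIL)        /* skip partial indexes */
                continue;
            if (best == NULL || ind->pages < best->pages)
                best = ind;
        }
        return best;
    }

This glosses over the harder problem Greg describes above: estimating what
fraction of the heap fetches can actually be skipped.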

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] PGC_S_DEFAULT is inadequate

2011-05-10 Thread Tom Lane
I believe I've sussed the reason for the recent reports of Windows
builds crashing when asked to process 'infinity'::timestamp.  It's
a bit tedious, so bear with me:

1. The immediate cause is that datebsearch() is being called with a NULL
pointer and zero count, ie, the powerup default values of timezonetktbl
and sztimezonetktbl, because InstallTimeZoneAbbrevs is never called,
because the GUC variable timezone_abbreviations is never set.  That
routine should be a bit more robust about the case, and I've already
fixed that, but the underlying failure to initialize the GUC variable
remains a problem.

2. The 9.1 change that created the issue is that I changed
pg_timezone_abbrev_initialize to use a source value of PGC_S_DEFAULT
instead of the previous, rather arbitrary choice of PGC_S_ARGV.  That
seemed like a good idea at the time because (a) it looked a lot saner in
pg_settings and (b) it wouldn't override a setting coming from the
postmaster's command line, should anyone ever try to do that (evidently
no one ever has, or at least not complained to us that it didn't work).

3. The reason it fails in Windows and nowhere else is that
write_one_nondefault_variable() ignores and doesn't write out variables
having source == PGC_S_DEFAULT.  So, even though the postmaster has
correctly set its own value of timezone_abbreviations, child processes
don't receive that setting.  You can duplicate this behavior in a
non-Windows machine if you #define EXEC_BACKEND.  Too bad it didn't
occur to me to test the GUC assign hook changes that way.  Although I
might not have found it anyway, because:

4. The problem is masked in the regression database because we create a
database-level setting of timezone_abbreviations, so that backends do
receive a value of the GUC before they are ever asked to parse any
timestamp values.  Else we would have seen this immediately in the
buildfarm.

Isn't that special?

Effectively, write_one_nondefault_variable is assuming that variables
having source == PGC_S_DEFAULT *must* have exactly their boot values.
It turns out there's another place making the same assumption, namely
the kludge in guc-file.l that tries to reset variables that were
formerly set by postgresql.conf and no longer are.  What it does is to
reset them using source == PGC_S_DEFAULT, which will override the
existing setting with the boot_val if and only if the variable currently
has source == PGC_S_DEFAULT, which it just forced to be the case for
anything previously having source == PGC_S_FILE.  So this is fine if the
current value was from the file or was the boot_val, but if we'd
overridden the boot value with a replacement default value using
PGC_S_DEFAULT, that code would cause the value to revert to the boot_val
not the replacement value.  Not desirable.

So, having recognized these two problems, I was about to knuckle under
and make the replacement default value in
pg_timezone_abbrev_initialize be assigned with source PGC_S_ENV_VAR,
which is the next step up.  That would be ugly in the sense of exposing
a confusing source value in pg_settings, but it would not have any worse
effects because there is no real environment variable from which we
might absorb a setting for timezone_abbreviations.  But wait: there's
another place that's also using PGC_S_DEFAULT like this, and that's the
assignment of client_encoding from database encoding in postinit.c.  And
for that variable, there *is* a potential assignment from an environment
variable, namely we will absorb a value from PGCLIENTENCODING if that's
set in the server environment.  (For the record, I take no position on
whether that's actually a good behavior; but it is the historical,
documented behavior and we've not had complaints about it.)  If we have
postinit.c use PGC_S_ENV_VAR for this purpose, then PGCLIENTENCODING
will stop working because it will always be overridden from the database
encoding, because that setting will be applied later with the same
priority level.

My conclusion about all this is that we really need to invent another
GucSource value falling between PGC_S_DEFAULT and PGC_S_ENV_VAR, called
perhaps PGC_S_DYNAMIC_DEFAULT, for the purpose of denoting values that
are defaults in terms of the precedence pecking order but are not simply
the hard-wired boot values.  There's no real need for clients to see the
difference, so we could have the external representation in pg_settings
be "default" for both, but guc.c really needs to be aware of which
settings are truly boot values and which are not.
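
For concreteness, here is a sketch of how the enum ordering might end up
looking (the exact name and comments are of course negotiable):

    /*
     * Sketch only, not a committed change.  GucSource values are ordered
     * by increasing priority; the proposed PGC_S_DYNAMIC_DEFAULT slots in
     * just above the hard-wired boot values and below environment
     * variables.
     */
    typedef enum
    {
        PGC_S_DEFAULT,              /* hard-wired default ("boot_val") */
        PGC_S_DYNAMIC_DEFAULT,      /* default computed during initialization */
        PGC_S_ENV_VAR,              /* postmaster environment variable */
        PGC_S_FILE,                 /* postgresql.conf */
        PGC_S_ARGV,                 /* postmaster command line */
        PGC_S_DATABASE,             /* per-database setting */
        PGC_S_USER,                 /* per-user setting */
        PGC_S_DATABASE_USER,        /* per-user-and-database setting */
        PGC_S_CLIENT,               /* from client connection request */
        PGC_S_OVERRIDE,             /* special case to forcibly set default */
        PGC_S_INTERACTIVE,          /* dividing line for error reporting */
        PGC_S_TEST,                 /* test per-database or per-user setting */
        PGC_S_SESSION               /* SET command */
    } GucSource;

Everything from PGC_S_ENV_VAR up keeps its current relative priority, so
only guc.c needs to care about the distinction between the first two
values.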

Comments?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

