Re: [HACKERS] Patch queue triage

2007-05-02 Thread Pavan Deolasee

On 5/2/07, Tom Lane [EMAIL PROTECTED] wrote:




* [PATCHES] HOT Patch - Ready for review  /Pavan Deolasee/

This needs a *lot* of review.  Can we break it down into more manageable
chunks?  I'm not sure that anyone's got a full grasp of the implications
of this patch, and that's a scary thought.




Sure, we can do that. I actually did that when I posted the
incremental versions of the HOT patch, each version implementing
the next big chunk of the code. I can reverse-engineer that again.

When I do that, should I just break the patch into logical pieces without
worrying about whether each piece alone builds/works correctly?
Or should I try to make each piece complete? I know the second
would be the preferred way, but it would be more work. But if that can
considerably ease the review process, I will do it by all means.
In any case, there will be dependencies amongst the patches.

I am on leave today, so would start on this tomorrow.

Thanks,
Pavan


--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Oleg Bartunov

On Tue, 1 May 2007, Tom Lane wrote:

* [HACKERS] tsearch_core patch for inclusion
 /Teodor Sigaev/

Have we resolved whether the API and the dump/restore strategy are
acceptable?  The code needs review too, but not till we've settled
on the basic question whether we like the feature set.



There were several discussions and we tried to address all of the concerns.
We added the ability to create FTS with one command
(just create a GIN index on a text attribute), which many
people like a lot, and we changed the SQL syntax to be more SQL-ish; we like it.
We discussed the new FTS at two conferences in Moscow and got positive
opinions. Also, we wrote the FTSBOOK

( http://mira.sai.msu.su/~megera/pgsql/ftsdoc/ )
 in English and an FTS introduction in Russian
( http://www.sai.msu.su/~megera/postgres/talks/fts_pgsql_intro.html ).

In the period since the patch was announced, 3100 unique IPs downloaded
the FTSBOOK.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Heap page diagnostic functions

2007-05-02 Thread Pavan Deolasee

On 5/2/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:




I'm going to go with pgdiagnostics. We could shorten it to just pgdiag,
but that feels too short :). We could make it pgdiagfuncs, but that's
not much shorter than pgdiagnostics.




Just to add more confusion :-), how about pginspect ?

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


[HACKERS] Patch queue triage

2007-05-02 Thread Pavel Stehule

Hello Tom,

Everything you wrote is true. plpgpsm copies-and-pastes about 30% of the plpgsql
code, and that is terrible in the long run. But when can I change it? Writing a
different runtime is nonsense; the better way is to refactor plpgsql and then share
code with it. That is not probable in 8.4: there will be a lot of important patches
carried over from 8.3, which means the interpreter will not be in core for another
two years. All of your recent patches I merged within one day.


In the coming months plpgpsm can track plpgsql, or go its own way. plpgpsm can
stay experimental and be used as a testbed for integration into core and for
creating an SQL procedural-language API in 8.5 (with plpgsql unchanged in 8.4 and
8.5), and in 8.6 plpgsql would be modified to use this API. This road keeps plpgsql
stable for the next two or three years. plpgpsm can also answer some questions
about plpgsql's future. It contains constructs that are foreign to plpgsql, so
plpgsql can stay in Oracle's style forever (with David's patch, Oracle collections
are possible).


The bigger problem for plpgpsm isn't the runtime but the users. It needs a broader
discussion about integration into core, and that isn't possible without integration
into core. The current API could be superseded by other APIs. With a variable API
we could drop variable substitution in SQL, and a fast SQL call could become part
of SPI. But all of this needs time. From the plpgsql point of view, even the simple
change to plan caching was a big patch. I will be happy if 8.4 contains true
session variables; they can be used in the SQL languages later. I'm afraid all of
these steps will take a long time.


plpgpsm itself is ready. It's a patch without dependencies and has no influence on
other parts of PostgreSQL. I am working on documentation now. The Czech version is
complete and is waiting to be translated into English.


Regards
Pavel Stehule



* [PATCHES] plpgpsm  /Pavel Stehule/

I think this has to be held for 8.4: it was submitted too late for the 8.3
feature deadline, and in fact I don't recall that there was any community
discussion/consensus on accepting it into core at all.  Another big
problem is that it largely copies-and-pastes the plpgsql code, which I
think is an unacceptable maintenance burden in the long run.  We need to
consider whether we can't refactor to avoid code duplication.




---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Magnus Hagander
On Tue, May 01, 2007 at 07:09:24PM -0400, Bruce Momjian wrote:
  The current patch-queue process is failing to scale with the project: every
  release it gets to be more work for you & Tom to integrate the patches.  We
  need to think of new approaches to make the review process scale.  As a
  pointed example, you're about to go on tour for 2 weeks and patch review
  will stall while you're gone.  That's not sustainable.
 
 I am traveling --- I left on Friday.  I am in Sydney now.
 
 As far as scaling, patch information isn't our problem right now.  If
 someone wants to help we can give them up-to-date information on exactly
 how to deal with the patch.  It is people doing the work that is the
 bottleneck.

snip

 As an example, how is patch information going to help us review HOT or
 group-item-index?  There is frankly more information about these in the
 archives than someone could reasonably read.  What someone needs is a
 summary of where we are now on the patches, and lots of time.
 
 FYI, Tom, Heikki, I need one of you to post the list of patches and
 where we think we are on each one, even if the list is imperfect.

I think you just contradicted yourself. Information is not the problem, but
you need more information...

I think this is a fairly clear indication that we do need a better way to
track this information.

//Magnus


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] Getting parse Error at or near character and

2007-05-02 Thread friend 4ever
Hi,
 
 I am getting a parse error when I try to execute a simple SQL query in
Postgres.

 java.sql.SQLException: ERROR:  parser: parse error at or near "and" at
character 58

  The query has been changed, and it is very much like the one below:

 select * from emp where empName like 'X' and empId=206

 When I try to execute the same query in the PostgreSQL console it works
fine, but when I try to execute it as a prepared statement from my Java
code it throws the java.sql.SQLException shown above.

 Does anyone have an answer for this problem?

 Expecting a quick response.
 
 Here is my code:
 String getEmpId=SQLQueriesHandler.getQuery("eis.toGetEmpId");
 psmt=conn.prepareStatement(getEmpId);
 psmt.setString(1,empForm.getEmpName());
 psmt.setInt(2,empForm.getDeptId());
 System.out.println("The Query to empId with Parameters is : "+psmt.toString());
 rs=psmt.executeQuery(getEmpId);
 
 where the query will be like

 select empId from empTab where empName like 'XXX' and deptId=13

 In the above code, empForm is the Struts ActionForm object and the getters
are the form beans.
 If I try to execute it as a plain Statement it executes fine,
 and if I hard-code the query as a prepared statement in the code itself it
also executes fine.
 The only problem is when I try to get the query from the properties file,
e.g. query.properties.

 The variable names have been changed, keeping the code as it is.

Please reply back soon.
 
 Thanks in Advance.
 
 Regards,
 urmyfriend  
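
One thing worth checking in the code above: rs=psmt.executeQuery(getEmpId) passes
the SQL string to a PreparedStatement. Depending on the driver, that call either
throws or sends the raw text (with its ? placeholders unbound) to the server
instead of executing the prepared statement, which is one way to end up with a
parse error like the one reported. The JDBC way is the no-argument executeQuery().
Below is a minimal sketch of the corrected call; the table, columns, and the query
normally loaded from query.properties are assumptions taken from the post, not a
known schema.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class EmpIdLookup {
    // In the original post this string comes from query.properties
    // (key "eis.toGetEmpId"); it must contain ? placeholders.
    private static final String GET_EMP_ID =
        "select empId from empTab where empName like ? and deptId = ?";

    public static int getEmpId(Connection conn, String empName, int deptId)
            throws SQLException {
        PreparedStatement psmt = conn.prepareStatement(GET_EMP_ID);
        try {
            psmt.setString(1, empName);
            psmt.setInt(2, deptId);
            // Use the no-argument executeQuery(): the driver then executes
            // the prepared statement with the bound parameters.  Calling
            // psmt.executeQuery(GET_EMP_ID) would bypass the bindings and
            // submit the raw SQL string, question marks and all.
            ResultSet rs = psmt.executeQuery();
            return rs.next() ? rs.getInt("empId") : -1;
        } finally {
            psmt.close();
        }
    }
}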
   

Re: [HACKERS] Fwd: [PATCHES] Preliminary GSSAPI Patches

2007-05-02 Thread Dave Page

Magnus Hagander wrote:

Also, last I checked OpenSSL didn't ship with Windows and Kerberos
encryption did.

How long ago did you check? I've been using OpenSSL on windows for many
years. Actually, it was supported just fine on Windows back when it was
added to PostgreSQL *at least*.
I didn't say *available for download*, I said *ship with*.  That is, does a 
Windows Vista Pro box from the factory come with OpenSSL on it?  It does 
come with Microsoft SSPI, although I don't know compatibility issues.


No, of course not. Microsoft OSes don't ship with *any* third party
software. So yeah, didn't get what you meant, and you do have a point
there. Provided the SSPI stuff actually does gssapi encryption - but
I'll trust the people who say it does. I've only ever used the
authentication parts myself.


I think it's largely irrelevant though - I can't see us shipping a 
non-ssl enabled build now (thats not to say we wouldn't add SSPI support 
of course).


/D

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Getting parse Error at or near character and

2007-05-02 Thread Richard Huxton

friend 4ever wrote:

Hi,
 
 I am getting the parse error while i try to execute a simple sql query in postgres.


This isn't a question for the hackers list.

Try the general, or jdbc lists.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Getting parse Error at or near character and

2007-05-02 Thread Richard Huxton

friend 4ever wrote:

Hi,
 
 I am getting the parse error while i try to execute a simple sql query in postgres.
 
 java.sql.SQLException: ERROR:  parser: parse error at or near and at character 58
 
  The Query has been changed  and it is very much like the below one


Oh, and the quickest way to find out what the error is would be to 
provide the *actual* query, not something very much like it. Perhaps 
turn statement logging on in your postgresql.conf if it isn't already.


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Patch queue triage

2007-05-02 Thread FAST PostgreSQL

* [PATCHES] Updateable cursors patch  /FAST PostgreSQL/

This is incomplete, and I fear at this point has to be held over to 8.4.



It is true that my original patch post said that I need to modify the 
patch to work with tidscan. Since then I have realized that this 
modification is not needed as it would have the same result as the 
'branching out from sequential scan' solution currently implemented.


I was hoping that I could discuss this with whoever picks up the patch 
for review before doing modifications if any is needed. So in my humble 
opinion, it would be great if this can be considered for 8.3 as there 
are not many modifications needed.


P.S. Only Simon commented on my original patch. Simon, do you have time 
to revisit the patch so that we could discuss this?


Rgds,
Arul Shaji



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Heikki Linnakangas

Tom Lane wrote:

Josh Berkus [EMAIL PROTECTED] writes:
Actually, that can happen with the current system. The real blocker there is 
that some people, particularly Tom, work so fast that there's no chance for a
new reviewer to tackle the easy stuff.  Maybe the real solution is to 
encourage some of our other contributors to get their feet wet with easy 
patches so that they can help with the big ones later on?


Yeah, I hear what you say.  This is particularly a problem for small bug
fixes: I tend to zing small bugs quickly, first because I enjoy finding/
fixing them and second because I worry that they'll fall off the radar
screen if not fixed.  But I am well aware that fixing those sorts of
issues is a great way to learn your way around the code (I think that's
largely how I learned whatever I know about Postgres).  I'd be more
willing to stand aside and let someone else do it if I had confidence
that issues wouldn't get forgotten.  So in a roundabout way we come back
to the idea that we need a bug tracker (NOT a patch tracker), plus
people putting in the effort to make sure it stays a valid source
of up-to-date info.  Without the latter it won't really be useful.


A great way to learn would be to look at the patches in the queue, and 
find bugs in them. There's a lot more bugs to be found in submitted 
patches than in PostgreSQL itself. A patch tracker would help with that.


I'm in favor of some kind of a patch tracker. It doesn't need to be too 
fancy, but if for each patch we had:


Patch name:   Kitchen sink addition to planner
Latest patch: kitchen-sink-v123.patch, click to download
Summary:      Adds a kitchen-sink node type to the planner to enable liquid queries.
Status:       Will be rejected unless race conditions are fixed. Needs performance testing.

Discussions:  links to mail threads, like in the current patch queue

That wouldn't necessarily help committers directly, but it'd give more
visibility to the patches. That would encourage more people to review
and test patches. And it'd make it clear to anyone who's interested what
the status of each patch is.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Gregory Stark
Pavan Deolasee [EMAIL PROTECTED] writes:

 On 5/2/07, Tom Lane [EMAIL PROTECTED] wrote:

 This needs a *lot* of review.  Can we break it down into more manageable
 chunks?  

 Sure, we can do that. I actually did that when I posted the
 incremental versions of the HOT-patch, each version implementing
 the next big chunk of the code. I can reverse engineer that again.

Can we? I mean, sure you can break the patch up into chunks which might make
it easier to read, but are any of the chunks useful alone?

I suppose inserting HOT tuples without index maintenance is useful even if no
changes to the space allocation are made. It won't get the space-usage benefit,
but it would save on index thrashing. But that still implies all the
code to handle scans, updates, index builds, etc. Those chunks could be
separated out but you can't commit without them.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Bruce Momjian [EMAIL PROTECTED] writes:
 FYI, Tom, Heikki, I need one of you to post the list of patches and
 where we think we are on each one, even if the list is imperfect.

 This message is an attempt to sort out which patch queue entries have
 no hope of getting into 8.3 (and so we shouldn't spend more time on them
 right now), which ones can get in but are waiting on their authors for
 more work, and which ones are ready for immediate review.

Thanks for this. This is exactly what we've been missing recently I think.

 * [PATCHES] non-recursive WITH clause support 
   /Gregory Stark/

 I think the consensus is that we should wait for a more complete WITH
 implementation to be submitted, since this one creates compatibility
 issues without any real return in the form of added functionality.

I submitted it because it filled in a missing ANSI feature but nobody's piped
up saying they missed that feature so, sure.

 * [PATCHES] Concurrent psql v4 [WIP]  /stark/

 This is waiting on the author to change the API per list discussions; we
 can't apply something that is about to have some commands removed ...

I did finish the api changes -- though I'm not too happy with the names. I was
expecting the list to play the name game so I just put in placeholder names
originally. I'm adding documentation and example regression tests now. 

Also I'm trying to fix the cursor-mode FETCH_COUNT support which it breaks.
I'm thinking of once the first batch of rows arrives just going into a
synchronous function to fetch the rest of the resultsets.

 * Re: [HACKERS] Modifying TOAST thresholds  /Tom Lane/

 At this point it seems nothing will be done about this issue for 8.3.

I'm not sure anyone has an idea how to test it. TPCC isn't really useful
because it has a fixed size (500 byte) string buffer. Perhaps if we modified
it to have a random string length uniformly distributed between 0-2k ? But
even then it never does any scans based on that buffer. But the problem with
going with something more natural is that it'll be harder to tell exactly what
it's testing.

 * [HACKERS] tsearch_core patch for inclusion 
   /Teodor Sigaev/

 Have we resolved whether the API and the dump/restore strategy are
 acceptable?  The code needs review too, but not till we've settled
 on the basic question whether we like the feature set.

Personally I actually do like the idea of promoting tsearch to first-class
citizen by giving it keywords and a place in the grammar. I think it's a more
fully integrated interface than the function based one. The only reason I
might think otherwise was if it was just a crutch for missing features it had
exposed that would be better implemented more generically. But I don't think
that's the case.

 * Re: [PATCHES] [Fwd: Deferred Transactions, Transaction Guarantee
   and COMMITwithout waiting]  /Simon Riggs/

 Simon is on the hook to submit an updated patch.  I hope this one
 makes it in, as it looks like a really nice performance improvement
 for those who can use it; but the original patch seems overcomplicated.

I know Simon's really busy. I might be able to help with it if he wants.

 * Re: [PATCHES] LIMIT/SORT optimization  /Gregory Stark/
 * [PATCHES] updated SORT/LIMIT patch  /Gregory Stark/

 I will look at this.  I recall being concerned about whether there
 wasn't a cleaner way to introduce the required inter-node communication.

The next big thing to keep in mind in this area is a unique sort which would
need to know if there's a Unique node above it with the same key. If the
resulting inter-node communication arrangement has your blessing and can
handle that as well then I'll probably do that for 8.4.

Incidentally I would prefer it if you want to make changes that you explain
the changes to me and let me make them. It gives me a better chance to
understand what the changes really are and the motivation than trying to read
a patch later and understand why you made the changes you did. I understand
sometimes it's easier to just write the code than to explain the idea to
someone else and then review the resulting code, though, and there's already
enough work on your plate, so if that's the case then so be it.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Heap page diagnostic functions

2007-05-02 Thread Zeugswetter Andreas ADI SD

  Any suggestions? pgdiagnostics?
 
 Yes, I like diagnostics, or internals.  I just think 
 forensics isn't going to be understood by the average native 
 English speaker, let alone non-English speakers.

I think forensics is ok. The world is currently being swamped with
related TV shows :-)

Andreas

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Fwd: [PATCHES] Preliminary GSSAPI Patches

2007-05-02 Thread Magnus Hagander
On Tue, May 01, 2007 at 04:26:13PM -0700, Henry B. Hotz wrote:
 
 On May 1, 2007, at 3:11 PM, Magnus Hagander wrote:
 
 Also, last I checked OpenSSL didn't ship with Windows and Kerberos
 encryption did.
 How long ago did you check? I've been using OpenSSL on windows  
 for many
 years. Actually, it was supported just fine on Windows back when  
 it was
 added to PostgreSQL *at least*.
 
 I didn't say *available for download*, I said *ship with*.  That  
 is, does a
 Windows Vista Pro box from the factory come with OpenSSL on it?   
 It does
 come with Microsoft SSPI, although I don't know compatibility issues.
 
 No, of course not. Microsoft OSes don't ship with *any* third party
 software. So yeah, didn't get what you meant, and you do have a point
 there. Provided the SSPI stuff actually does gssapi encryption - but
 I'll trust the people who say it does. I've only ever used the
 authentication parts myself.
 
 The SSPI has encryption and integrity functions, just like the  
 GSSAPI.  I don't remember Jeffrey Altman's interop example code well  
 enough to say if he demonstrates that they interoperate as well.   
 Spending 5 seconds looking at it, the SSPI appears to make a  
 distinction between message and stream encryption that the GSSAPI  
 does not make, so there is at least some profiling needed to identify  
 what's common.  I suspect that interoperability was intended.  If we  
 find bugs and tell the right people Microsoft might even fix them  
 someday.

Ok. Well, that's for later.


 As to the question of GSSAPI vs SSL, I would never argue we don't  
 want both.
 
 Part of what made the GSSAPI encryption mods difficult was my intent  
 to insert them above the SSL encryption/buffering layer.  That way  
 you could double-encrypt the channel.  Since GSSAPI and SSL are  
 (probably, not necessarily) referenced to completely different ID  
 infrastructure there are scenarios where that's beneficial.

We might want to consider restructuring how SSL works when we do, that
might make it easier. The way it is now with #ifdefs all around can lead to
a horrible mess if there are too many different things to choose from.
Something like transport filters or whatever might be a way to do it. I
recall having looked at that at some point, but it was too long ago to
remember any details..

//Magnus


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Magnus Hagander
On Tue, May 01, 2007 at 10:44:38PM -0400, Tom Lane wrote:

 * [PATCHES] Preliminary GSSAPI Patches  /Henry B. Hotz/
 
 Magnus is reviewing this one.

Check.

 * [PATCHES] Clear up strxfrm() in UTF-8 with locale on Windows
/ITAGAKI Takahiro/
 
 Someone else needs to look at this; I can't test it.  Magnus?

Yup, it's on my list.


//Magnus


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Bruce Momjian
Magnus Hagander wrote:
  As an example, how is patch information going to help us review HOT or
  group-item-index?  There is frankly more information about these in the
  archives than someone could reasonable read.  What someone needs is a
  summary of where we are now on the patches, and lots of time.
  
  FYI, Tom, Heikki, I need one of you to post the list of patches and
  where we think we are on each one, even if the list is imperfect.
 
 I think you just contradicted yourself. Information isn ot the problem, but
 you need more information...
 
 I think this is a fairly clear indication that we do need a better way to
 track this information.

No, my point is that 100% of the information is already available by looking
at the email archives.  What we need is a short description of where we are on
each patch --- that is a manual process, not something that can be
automated.

Tom has posted it --- tell me how we will get such a list in an
automated manner.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Bruce Momjian
Josh Berkus wrote:
 Bruce,
 
  As an example, how is patch information going to help us review HOT or
  group-item-index?  There is frankly more information about these in the
  archives than someone could reasonable read.  What someone needs is a
  summary of where we are now on the patches, and lots of time.
 
 The idea is to provide ways for other people to help where they can and to 
 provide better feedback to patch submitters so that they fix their own issues 
 faster.  Also, lesser PostgreSQL hackers than you could take on reviewing the 
 small patches, leaving you to devote all of your attention to the big 
 patches.
 
 Actually, that can happen with the current system. The real blocker there is 
 that some people, particularly Tom, work so fast that there's no chance for a 
 new reviewer to tackle the easy stuff.  Maybe the real solution is to 
 encourage some of our other contributors to get their feet wet with easy 
 patches so that they can help with the big ones later on?
 
 That is, if the problem is people and not tools, then what are we doing to 
 train up the people we need?

We seem to handle trivial patches just fine.  The current problem is
that the remaining patches require domain or subsystem-specific
knowledge to apply, e.g. XML or WAL, and those skills are available in a
limited number of people.  If I had the expertise in those areas, I
would have applied the patches already.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Getting parse Error at or near character and

2007-05-02 Thread Albe Laurenz
 Expecting a Quick response.

Don't hold your breath.
Please post questions like this to the JDBC and/or the 'general' mailing
list.

Yours,
Laurenz Albe

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Bruce Momjian
Gregory Stark wrote:
 Tom Lane [EMAIL PROTECTED] writes:
 
  Bruce Momjian [EMAIL PROTECTED] writes:
  FYI, Tom, Heikki, I need one of you to post the list of patches and
  where we think we are on each one, even if the list is imperfect.
 
  This message is an attempt to sort out which patch queue entries have
  no hope of getting into 8.3 (and so we shouldn't spend more time on them
  right now), which ones can get in but are waiting on their authors for
  more work, and which ones are ready for immediate review.
 
 Thanks for this. This is exactly what we've been missing recently I think.

100% agree.

  * Re: [HACKERS] Modifying TOAST thresholds  /Tom Lane/
 
  At this point it seems nothing will be done about this issue for 8.3.
 
 I'm not sure anyone has an idea how to test it. TPCC isn't really useful
 because it has a fixed size (500 byte) string buffer. Perhaps if we modified
 it to have a random string length uniformly distributed between 0-2k ? But
 even then it never does any scans based on that buffer. But the problem with
 going with something more natural is that it'll be harder to tell exactly what
 it's testing.

My idea on this was to create two backends, one with the default TOAST
value, and a second with a value of 50 bytes.  Create a table with one
TEXT field, and several other columns, each column < 50 bytes.

Then, fill the table with random data (script attached that might help),
and then try 2000, 1500, 1000, etc. bytes in the TEXT column for each row
(use random data so the compression code doesn't shrink it).  Then run a
test with both backends accessing the TEXT column and non-TEXT column and
measure the difference between the two backends, i.e. the backend with a
TOAST value of 50 should show faster access on the non-TEXT field, but
slower access on the TEXT field.

Then, figure out where the gains on the non-TEXT field seem to diminish
in usefulness.  Basically, with a lower TOAST value, we are going to
spend more time accessing the TEXT field, but the speedup for the
non-TEXT field should be a large enough win that we don't care. As the
TEXT column becomes shorter, it has less effect on the non-TEXT access.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
:
# Fill a test table with random text of $COLSIZE characters per row and
# time a simple lookup, by piping the generated SQL into psql.

ROWS=10
COLSIZE=2500

echo "
DROP TABLE test;
CREATE TABLE test(i SERIAL, t text);
INSERT INTO test (t)
SELECT  array_to_string(ARRAY(
        SELECT chr(32 + (95 * random())::integer)
        FROM generate_series(1,$COLSIZE)),'')
FROM    generate_series(1, $ROWS);
SELECT  pg_relation_size('test') AS HEAP,
        pg_total_relation_size('test') - pg_relation_size('test') AS TOAST;
SET log_min_duration_statement = 0;
SET client_min_messages = 'log';
SELECT t FROM test WHERE i = 234329823;" |
psql test

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Heap page diagnostic functions

2007-05-02 Thread Bruce Momjian
Pavan Deolasee wrote:
 On 5/2/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 
 
 
  I'm going to go with pgdiagnostics. We could short it to just pgdiag,
  but that feels too short :). We could make it pgdiagfuncs, but that's
  not much shorter than pgdiagnostics.
 
 
 
 Just to add more confusion :-), how about pginspect ?

I like it.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Heap page diagnostic functions

2007-05-02 Thread Zdenek Kotala

Heikki Linnakangas wrote:

I'm going to go with pgdiagnostics. We could short it to just pgdiag, 
but that feels too short :). We could make it pgdiagfuncs, but that's 
not much shorter than pgdiagnostics.


pgdiagfn? It is only 8 chars length ;).


Zdenek

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 Then, figure out where the gains on the non-TEXT field seem to diminish
 in usefulness.  Basically, with a lower TOAST value, we are going to
 spend more time accessing the TEXT field, but the speedup for the
 non-TEXT field should be large enough win that we don't care. As the
 TEXT column becomes shorter, it has less affect on the non-TEXT access.

I guess the key is to break down what it is we want to measure into several
parts. These can each be measured precisely for various sized TOASTed data.

Costs:

1) cost of retrieving data from TOAST pointer versus retrieving data from
   inline tuple. We just want the absolute time difference between the two
   operations, not the percentage difference.

2) cost of creating a TOAST pointer (ie, inserting a new tuple with a TOAST
   pointer or updating a previously inlined tuple to have a TOASTed column).

3) cost of deleting a TOAST pointer (ie, deleting a tuple or updating a tuple
   to no longer have a TOASTed column)

4) cost of deleting a tuple with an existing TOAST pointer (or updating a
   tuple to be all inlined) versus deleting a plain tuple or updating a plain
   tuple.

Savings:

1) time savings accessing a tuple without retrieving the TOAST pointer versus
   having to access the tuple with the data inlined.

2) time savings updating a tuple without modifying toasted data versus
   updating the same tuple with the data inlined in both versions.

The plan you described would be testing costs 1 and savings 1 but I think we
need to continue to the others as well.

Then the trick is to somehow make some argument about the frequency of the
various operations and the acceptable tradeoff. I think you're right that the
time spent accessing the data would be the most important metric.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 6: explain analyze is your friend
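
To make cost (1) in the breakdown above concrete, here is a rough psql sketch.
It is an editorial illustration only: the table names, row counts, and the
STORAGE EXTERNAL trick are assumptions rather than something proposed in the
thread, and the two payload sizes differ, so it only bounds the detoasting
overhead roughly.

\timing

-- ~4 kB payloads with EXTERNAL storage (out of line, never compressed)
-- end up in the TOAST table.
DROP TABLE IF EXISTS toast_out;
CREATE TABLE toast_out (id serial, payload text);
ALTER TABLE toast_out ALTER COLUMN payload SET STORAGE EXTERNAL;
INSERT INTO toast_out (payload)
SELECT repeat('x', 4096) FROM generate_series(1, 50000);

-- ~1 kB payloads stay inline in the heap tuple.
DROP TABLE IF EXISTS toast_in;
CREATE TABLE toast_in (id serial, payload text);
INSERT INTO toast_in (payload)
SELECT repeat('x', 1024) FROM generate_series(1, 50000);

VACUUM ANALYZE toast_out;
VACUUM ANALYZE toast_in;

-- Touch the payload on every row; the absolute time difference between the
-- two scans approximates the per-row detoasting overhead.
SELECT sum(length(payload)) FROM toast_out;
SELECT sum(length(payload)) FROM toast_in;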


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Gregory Stark

Bruce Momjian [EMAIL PROTECTED] writes:

 We seem to handle trivial patches just fine.  

You keep saying that but I think it's wrong. There are trivial patches that
were submitted last year that are still sitting in the queue.

In fact I claim we handle complex patches better than trivial ones. HOT, LDC,
DSM etc receive tons of feedback and acquire a momentum of their own.
Admittedly GII is a counter-example though.

On the other hand trivial patches tend to interest relatively few people and
have little urgency.

 The current problem is that the remaining patches require domain or
 subsystem-specific knowledge to apply, e.g. XML or WAL, and those skills are
 available in a limited number of people. If I had the expertise in those
 areas, I would have applied the patches already.

Well, I claim it's often the trivial patches that require the domain-specific
knowledge you describe. If they were major patches they would touch more parts
of the system. But that means they should be easy to commit if you could just
fill in the missing knowledge.

Could you pick a non-committer with the domain-specific knowledge you think a
patch needs and ask for their analysis of the patch then commit it yourself?
You can still review it for general code quality and trust the non-committer's
review of whether the domain-specific change is correct.
 
-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Heap page diagnostic functions

2007-05-02 Thread Gregory Stark

Tom Lane [EMAIL PROTECTED] writes:

 Hmm ... the Oxford English Dictionary defines forensic as pertaining
 to, connected with, or used in courts of law.  There are also some
 senses related to argumentation, but nothing specifically about evidence
 analysis, whether after-the-fact or not.  So yeah, it doesn't seem like
 a good name for these functions anyhow.

Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:

 I think forensics is ok. The world is currently beeing swamped with
 related tv shows :-)

Indeed, it's the only proposal so far with a connotation of the right level of
detail that the functions provide about the data. Police forensics laboratories
spend their time picking up fibres and laboriously fingerprinting every surface,
which is about what it feels like to dig through every tuple of a heap page.

Incidentally, Tom, were you consulting the dead-tree edition of the OED? In
which case you would have naturally not seen this in 1993 additions:

ADDITIONS SERIES 1993

forensic, a. and n.

Add: [B.] 2. ellipt. use of the adj. A forensic science department,
laboratory, etc. colloq. 1963 Guardian 2 Sept. 8/5 When a police officer
hisses in my ear in court, `Are you from forensic?' I no longer protest. I
just weakly nod my head. 1971 W. J. BURLEY Guilt Edged iv. 74 A breakdown
truck is taking it to Division. Forensic can look at it in their garage.
1991 J. NEEL Death of Partner vi. 83 `Forensic rang,' Bruce reported
dourly. `The autopsy report is on its way.'

Though it seems like a poor definition. My English teachers would have chided
me for using the word in its definition...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Bruce Momjian
Gregory Stark wrote:
 
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  We seem to handle trivial patches just fine.  
 
 You keep saying that but I think it's wrong. There are trivial patches that
 were submitted last year that are still sitting in the queue.

You seem to be looking at something different than me.  Which patches?

 In fact I claim we handle complex patches better than trivial ones. HOT, LDC,
 DSM etc receive tons of feedback and acquire a momentum of their own.
 Admittedly GII is a counter-example though.
 
 Well, I claim it's often the trivial patches that require the domain-specific
 knowledge you describe. If they were major patches they would touch more parts
 of the system. But that means they should be easy to commit if you could just
 fill in the missing knowledge.
 
 Could you pick a non-committer with the domain-specific knowledge you think a
 patch needs and ask for their analysis of the patch then commit it yourself?
 You can still review it for general code quality and trust the non-committer's
 review of whether the domain-specific change is correct.

We are already pushing out patches to people with domain-specific
knowledge.  Tom posted that summary today.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Bruce Momjian

Another complexity is testing cases where the table fits in shared
memory/RAM, and those that don't, and testing cases where the heap fits in
RAM only because we pushed things to TOAST, and cases where we have to
do lots of TOAST access that doesn't fit in RAM.  I wonder if it is even
worth testing, because pushing more to TOAST probably means the more
frequently accessed data is in RAM.

Anyway, who is going to run these tests?

---

Gregory Stark wrote:
 
 Bruce Momjian [EMAIL PROTECTED] writes:
 
  Then, figure out where the gains on the non-TEXT field seem to diminish
  in usefulness.  Basically, with a lower TOAST value, we are going to
  spend more time accessing the TEXT field, but the speedup for the
  non-TEXT field should be large enough win that we don't care. As the
  TEXT column becomes shorter, it has less affect on the non-TEXT access.
 
 I guess the key is to break down what it is we want to measure into several
 parts. These can each be measured precisely for various sized TOASTed data.
 
 Costs:
 
 1) cost of retrieving data from TOAST pointer versus retrieving data from
inline tuple. We just want the absolute time difference between the two
operations, not the percentage difference.
 
 2) cost of creating TOAST pointer (ie, inserting a new tuple with a TOAST
pointer or updating a previously inlined tuple to have a TOASTed column). 
 
 3) cost of deleting a TOAST pointer (ie, deleting a tuple or updating a tuple
to no longer have a TOASTed column)
 
 3) cost of deleting a tuple with an existing TOAST pointer (or updating a
tuple to be all inlined) versus deleting an plain tuple or updating a plain
tuple.
 
 Savings:
 
 1) time savings accessing a tuple without retrieving the TOAST pointer versus
having to access the tuple with the data inlined.
 
 2) time savings updating a tuple without modifying toasted data versus
updating same tuple with the data inlined in both versions.
 
 The plan you described would be testing costs 1 and savings 1 but I think we
 need to continue to the others as well.
 
 Then the trick is to somehow make some argument about the frequency of the
 various operations and the acceptable tradeoff. I think you're right that the
 time spent accessing the data would be the most important metric.
 
 -- 
   Gregory Stark
   EnterpriseDB  http://www.enterprisedb.com
 
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Andrew Dunstan



Tom Lane wrote:

Bruce Momjian [EMAIL PROTECTED] writes:
  

FYI, Tom, Heikki, I need one of you to post the list of patches and
where we think we are on each one, even if the list is imperfect.



This message is an attempt to sort out which patch queue entries have
no hope of getting into 8.3 (and so we shouldn't spend more time on them
right now), which ones can get in but are waiting on their authors for
more work, and which ones are ready for immediate review.
  


Excellent list.

I have a little time available now, so I'll work on some, starting with 
trying to complete arrays of complex types.


cheers

andrew


---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Andrew Dunstan



Naz Gassiep wrote:

Andrew Dunstan wrote:
  

Naz Gassiep wrote:


I believe the suggestion was to have an automated process that only ran
on known, sane patches.
  

How do we know in advance of reviewing them that they are sane?

Same way as happens now. 
  


The question was rhetorical ... there is no list of certified sane but 
unapplied patches. You are proceeding on the basis of a faulty 
understanding of how our processes work.


cheers

andrew

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Andrew Dunstan



Tom Lane wrote:

So in a roundabout way we come back
to the idea that we need a bug tracker (NOT a patch tracker), plus
people putting in the effort to make sure it stays a valid source
of up-to-date info.  Without the latter it won't really be useful.


  


Hallelujah Brother!

BTW, a bug tracker can be used as a patch tracker, although the reverse
isn't true. For example, the BZ people use BZ that way; in fact, most
patches arrive as attachments to bugs. And trackers can be used for
tracking features just as well as bugs.


cheers

andrew



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Alvaro Herrera
Tom Lane wrote:
 I wrote:
  Hmm ... I was about to say that the postmaster never sets
  PG_exception_stack, but maybe an error out of a PG_TRY/PG_RE_THROW
  could do it?  Does the postmaster ever execute PG_TRY?
 
 Doh, I bet that's it, and it's not the postmaster that's at issue
 but PG_TRY blocks executed during subprocess startup.  Inheritance
 of a PG_exception_stack setting from the postmaster could only happen if
 the postmaster were to fork() within a PG_TRY block, which I think we
 can safely say it doesn't.  But suppose we get an elog(ERROR) inside
 a PG_TRY block when there is no outermost longjmp catcher.   elog.c
 will think it should longjmp, and that will eventually lead to
 executing
 
 #define PG_RE_THROW()  \
   siglongjmp(*PG_exception_stack, 1)
 
 with PG_exception_stack = NULL; which seems entirely likely to cause
 a stack smash of gruesome dimensions.  What's more, nothing would have
 been printed to the postmaster log beforehand, agreeing with observation.

I agree that that would be a bug and we should fix it, but I don't think
it explains the problem we're seeing because there is no PG_TRY block
in the autovac startup code that I can see :-(

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq
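
To make the hazard described above concrete, here is a small standalone sketch.
It is illustrative only: it is not PostgreSQL code and not necessarily the fix
that was committed. It shows a re-throw that refuses to siglongjmp through a
NULL handler pointer and bails out instead.

#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Set by the equivalent of PG_TRY; NULL early in process startup. */
static sigjmp_buf *PG_exception_stack = NULL;

static void
pg_re_throw(void)
{
    if (PG_exception_stack != NULL)
        siglongjmp(*PG_exception_stack, 1);

    /*
     * No handler has been established anywhere on the stack: report and
     * die cleanly instead of longjmp'ing through a NULL pointer, which
     * is the stack smash of gruesome dimensions described above.
     */
    fprintf(stderr, "FATAL: error thrown with no PG_TRY handler in place\n");
    abort();
}

int
main(void)
{
    sigjmp_buf handler;

    /* With a handler installed, the re-throw lands in the else branch. */
    if (sigsetjmp(handler, 0) == 0)
    {
        PG_exception_stack = &handler;
        pg_re_throw();
    }
    else
        printf("caught re-thrown error\n");

    /* Without a handler we now get a clean abort, not a corrupted stack. */
    PG_exception_stack = NULL;
    pg_re_throw();
    return 0;
}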


[HACKERS] Sequential scans

2007-05-02 Thread Heikki Linnakangas

Hi,

I'm starting to review the synchronized scans and scan-resistant 
buffer cache patches. The patches have complex interactions so I'm 
taking a holistic approach.


There are four outstanding issues with the sync scans in particular:

1. The simplistic hash approach. While it's nice to not have a lock, I'm
worried about collisions. If you had a collision every now and then, it
wouldn't be that bad, but because the hash value is computed from the 
oid, a collision would be persistent. If you create a database and 
happen to have two frequently seqscanned tables that collide, the only 
way to get rid of the collision is to drop and recreate a table. 
Granted, that'd probably be very rare in practice, but when it happens 
it would be next to impossible to figure out what's going on.


Let's use a normal hash table instead, and use a lock to protect it. If 
we only update it every 10 pages or so, the overhead should be 
negligible. To further reduce contention, we could modify ReadBuffer to 
let the caller know if the read resulted in a physical read or not, and 
only update the entry when a page is physically read in. That way all 
the synchronized scanners wouldn't be updating the same value, just the 
one performing the I/O. And while we're at it, let's use the full 
relfilenode instead of just the table oid in the hash.
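
A rough standalone sketch of that scheme follows (illustrative only; the names,
the fixed-size table, and the plain mutex are stand-ins, not the actual patch or
PostgreSQL's locking primitives). The point is that collisions are resolved rather
than silently merging two relations' positions, and the lock is only taken once
every REPORT_INTERVAL pages.

#include <pthread.h>
#include <stdint.h>
#include <stdio.h>

#define SCAN_HINT_CAPACITY 128   /* small table; seqscanned relations are few */
#define REPORT_INTERVAL     10   /* only report every N pages read */

typedef struct
{
    uint32_t relfilenode;        /* 0 = empty slot */
    uint32_t last_block;         /* block last reported by a scanner */
} ScanHint;

static ScanHint scan_hints[SCAN_HINT_CAPACITY];
static pthread_mutex_t scan_hint_lock = PTHREAD_MUTEX_INITIALIZER;

/* Find the slot for a relation, resolving collisions by linear probing. */
static ScanHint *
scan_hint_slot(uint32_t relfilenode)
{
    uint32_t i = relfilenode % SCAN_HINT_CAPACITY;

    for (int probes = 0; probes < SCAN_HINT_CAPACITY; probes++)
    {
        if (scan_hints[i].relfilenode == 0 ||
            scan_hints[i].relfilenode == relfilenode)
            return &scan_hints[i];
        i = (i + 1) % SCAN_HINT_CAPACITY;
    }
    return NULL;                 /* table full; callers just skip the hint */
}

/* Called by the scan that actually performed the physical read. */
static void
ss_report_location(uint32_t relfilenode, uint32_t block)
{
    if (block % REPORT_INTERVAL != 0)
        return;                  /* keep lock traffic negligible */

    pthread_mutex_lock(&scan_hint_lock);
    ScanHint *slot = scan_hint_slot(relfilenode);
    if (slot != NULL)
    {
        slot->relfilenode = relfilenode;
        slot->last_block = block;
    }
    pthread_mutex_unlock(&scan_hint_lock);
}

/* Where should a new scan of this relation start?  0 if there is no hint. */
static uint32_t
ss_get_start_block(uint32_t relfilenode)
{
    uint32_t start = 0;

    pthread_mutex_lock(&scan_hint_lock);
    ScanHint *slot = scan_hint_slot(relfilenode);
    if (slot != NULL && slot->relfilenode == relfilenode)
        start = slot->last_block;
    pthread_mutex_unlock(&scan_hint_lock);
    return start;
}

int
main(void)
{
    ss_report_location(16384, 120);
    printf("a new scan of 16384 would start at block %u\n",
           ss_get_start_block(16384));
    return 0;
}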


2. Under what circumstances does the patch help and when does it hurt? I 
think the patch is safe in that it should never be any worse than what 
we have now. But when does it help? That needs to be looked at together 
with the other patch.


I need to dig the archives for the performance test results you posted 
earlier and try to understand them.


There are six distinct scenarios I've come up with so far that need to
be looked at:

A. A seq scan on a small table
B. A seq scan on a table that's 110% the size of shared_buffers, but 
smaller than RAM

C. A seq scan on a table that's 110% the size of RAM
D. A seq scan on a huge table
E. Two simultaneous seq scans on a large table starting at the same time
F. Two simultaneous seq scans on a large table, 2nd one starting when 
the 1st one is halfway through


Also, does it change things if you have a bulk update instead of 
read-only query? How about bitmap heap scans and large index scans? And 
vacuums? And the above scenarios need to be considered both alone, and 
in the presence of other OLTP kind of workload.


I realize that we can't have everything, and as long as we get some 
significant benefit in some scenarios, and don't hurt others, the patch 
is worthwhile. But let's try to cover as much as we reasonably can.


One random idea I had to cover B & C without having the offset variable: 
Start scanning *backwards* from the page that's in the shared hash 
table, until you hit a page that's not in buffer cache. Then you 
continue scanning forwards from the page you started from.


This needs more thought but I think we can come up with a pretty simple 
solution that covers the most common cases.


3. By having different backends doing the reads, are we destroying OS 
readahead as Tom suggested? I remember you performed some tests on that, 
and it was a problem on some systems but not on others. This needs some 
thought, there may be some simple way to address that.


4. It fails regression tests. You get an assertion failure on the portal 
test. I believe that changing the direction of a scan isn't handled 
properly; it's probably pretty easy to fix.


Jeff, could you please fix 1 and 4? I'll give 2 and 3 some more thought, 
and take a closer look at the scan-resistant scans patch. Any comments 
and ideas are welcome, of course..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Alvaro Herrera
Andrew Dunstan wrote:
 
 
 Tom Lane wrote:
 So in a roundabout way we come back
 to the idea that we need a bug tracker (NOT a patch tracker), plus
 people putting in the effort to make sure it stays a valid source
 of up-to-date info.  Without the latter it won't really be useful.
 
 Hallelujah Brother!

Amen

 BTW, a bug tracker can be used as a patch tracker, although the reverse 
 isn't true. For example, the BZ people use BZ that way, in fact - most 
 patches arrive as attachments to bugs. And trackers can be used just as 
 well for tracking features as well as bugs.

The pidgin (previously known as Gaim) guys also use it that way.  They
add a bug for each thing they want to change, even new features, and
track the patches in there.  Then they have a list of issues that should
be solved for each release, so it's easy to see which ones are still
missing using their Trac interface.

http://developer.pidgin.im/roadmap

So the status email that Tom sent yesterday would be a very simple thing
to generate, just by looking at the "bugs to fix" page.

I'm not saying we should use Trac, mainly because I hate how it
(doesn't) interact with email.  But it does say that a bug tracker can
be useful to us.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 I agree that that would be a bug and we should fix it, but I don't think
 it explains the problem we're seeing because there is no PG_TRY block
 in the autovac startup code that I can see :-(

I'm wondering if there is some code path that invokes a PG_TRY deep in
the bowels of the system.  Anyway, I'll go fix this, and we should know
soon enough if it changes the buildfarm behavior.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Chris Browne
[EMAIL PROTECTED] (Andrew Dunstan) writes:
 Tom Lane wrote:
 So in a roundabout way we come back
 to the idea that we need a bug tracker (NOT a patch tracker), plus
 people putting in the effort to make sure it stays a valid source
 of up-to-date info.  Without the latter it won't really be useful.

 Hallelujah Brother!

 BTW, a bug tracker can be used as a patch tracker, although the
 reverse isn't true. For example, the BZ people use BZ that way, in
 fact - most patches arrive as attachments to bugs. And trackers can be
 used just as well for tracking features as well as bugs.

Well, Command Prompt set up a Bugzilla instance specifically so people
could track PG bugs.  If only someone took interest and started using
it...
-- 
cbbrowne,@,linuxfinances.info
http://cbbrowne.com/info/lisp.html
Do Roman paramedics refer to IV's as 4's? 

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Alvaro Herrera
Tom Lane wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes:
  I agree that that would be a bug and we should fix it, but I don't think
  it explains the problem we're seeing because there is no PG_TRY block
  in the autovac startup code that I can see :-(
 
 I'm wondering if there is some code path that invokes a PG_TRY deep in
 the bowels of the system.

Well, I checked all the bowels involved in autovacuum startup.

 Anyway, I'll go fix this, and we should know soon enough if it changes
 the buildfarm behavior.

Agreed.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Heikki Linnakangas

Tom Lane wrote:

* [pgsql-patches] Ctid chain following enhancement
   /Pavan Deolasee/

I'm not very excited about this --- it seems to me to complicate the code
in some places that are not in fact performance-critical.  While it
doesn't seem likely to break things, I'm not in favor of reducing code
readability unless a measurable performance improvement can be shown.
Can we have some tests showing this is worthwhile?


IIRC this patch was originally part of an old HOT patch, and it was 
submitted as a separate patch because it has some benefit on its own but 
more importantly getting it applied first would make the HOT patch 
slightly smaller. I'm not sure if the latest HOT patch requires or 
includes this change anymore. If not we should drop this. If it does, 
then let's deal with this before attacking the hard core of HOT.



* [HACKERS] Grouped Index Tuples  /Heikki Linnakangas/
* [HACKERS] Grouped Index Tuples / Clustered Indexes
   /Heikki Linnakangas/

Needs review.  I'm not sure how many people besides Heikki have really
looked at this (I know I haven't).


The patch is ugly as it is. We need API changes to make it less ugly, I 
had hoped to discuss and reach consensus on them well before feature 
freeze, that's what the indexam API proposal and Stream bitmaps 
threads in the patch queue are all about. But those discussions and 
patches stalled, so the clustered index patch is still in the same ugly 
state.


I'm afraid we're getting past due on clustered indexes. The patch 
isn't ready for committing as it is, and we still don't have agreement 
on the API changes or even on the design in general. :(



* [HACKERS] Stream bitmaps  /Heikki Linnakangas/

I think this is on hold unless a finished bitmap-index patch shows up;
however there was some discussion of using this in support of clustered
indexes, so maybe it's live anyway?  Heikki?


This particular thread is closely related to bitmap indexes. But see 
next item:


* Re: [HACKERS] [PATCHES] Bitmapscan changes 
  /Heikki Linnakangas/


I had objected to this on the grounds that it seemed to be covering
only a narrow territory between HOT and bitmap indexes, but given the
probability that one or even both of those won't make it, we need to
give this one a second look.  So: live, needs review.


Are you talking about the patch I submitted at the beginning of that 
thread? Because the mail in the patch queue is actually about whether or 
not we want clustered indexes.


I think the original bitmapscan changes patch I submitted is live and 
needs review, even if clustered indexes and bitmap indexes are rejected. 
It should give some performance benefit when you do bitmap ANDs with 
partially lossy bitmaps, and from setting bits directly in the bitmap in 
the indexam in one call, instead of calling amgetmulti many times. 
Though I never measured that.



* [HACKERS] Indexam interface proposal  /Heikki Linnakangas/

AFAICS this discussion died out with no actual patch submitted.


This is part of clustered indexes.. This was my proposal of what the 
indexam API changes would be like. This patch is either live or dead 
together with clustered indexes.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Teodor Sigaev

http://www.sigaev.ru/misc/tsearch_core-0.46.gz
Patch is synced with current CVS HEAD and synced with bugfixes in 
contrib/tsearch2



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Tom Lane wrote:
  Alvaro Herrera [EMAIL PROTECTED] writes:
   I agree that that would be a bug and we should fix it, but I don't think
   it explains the problem we're seeing because there is no PG_TRY block
   in the autovac startup code that I can see :-(
  
  I'm wondering if there is some code path that invokes a PG_TRY deep in
  the bowels of the system.
 
 Well, I checked all the bowels involved in autovacuum startup.

Huh, hang on ... there is one caller, which is to set client_encoding
(call_string_assign_hook uses a PG_TRY block), but it is called *after*
the sigsetjmp block -- in InitPostgres.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 I'm wondering if there is some code path that invokes a PG_TRY deep in
 the bowels of the system.

 Huh, hang on ... there is one caller, which is to set client_encoding
 (call_string_assign_hook uses a PG_TRY block), but it is called *after*
 the sigsetjmp block -- in InitPostgres.

While testing the PG_RE_THROW problem I noted that what I get here is
a SIGSEGV crash, rather than SIGABRT as seen on Stefan's machines, so
that's another hint that this may be unrelated.  Still, it's clearly
at risk of causing a problem as more PG_TRY's get added to the code,
so I'm going to fix it anyway.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Stefan Kaltenbrunner wrote:
 
  well - i now have a core file but it does not seem to be much worth
  except to prove that autovacuum seems to be the culprit:
  
  Core was generated by `postgres: autovacuum worker process
   '.
  Program terminated with signal 6, Aborted.
  
  [...]
  
  #0  0x0ed9 in ?? ()
  warning: GDB can't find the start of the function at 0xed9.

I just noticed an ugly bug in the worker code which I'm fixing.  I think
this one would also throw SIGSEGV, not SIGABRT.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Alvaro Herrera
Alvaro Herrera wrote:
 Alvaro Herrera wrote:
  Stefan Kaltenbrunner wrote:
  
   well - i now have a core file but it does not seem to be much worth
   except to prove that autovacuum seems to be the culprit:
   
   Core was generated by `postgres: autovacuum worker process
'.
   Program terminated with signal 6, Aborted.
   
   [...]
   
   #0  0x0ed9 in ?? ()
   warning: GDB can't find the start of the function at 0xed9.
 
 I just noticed an ugly bug in the worker code which I'm fixing.  I think
 this one would also throw SIGSEGV, not SIGABRT.

Nailed it -- this is the actual bug that causes the abort.  But I am
surprised that it doesn't print the error message on Stefan's machine;
here it outputs


TRAP: FailedAssertion(!unsigned long)(elem))  ShmemBase)), File: 
/pgsql/source/00head/src/backend/storage/ipc/shmqueue.c, Line: 107)
16496 2007-05-02 11:30:31 CLT DEBUG:  server process (PID 16540) was terminated 
by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG:  server process (PID 16540) was terminated 
by signal 6: Aborted
16496 2007-05-02 11:30:31 CLT LOG:  terminating any other active server 
processes
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16541
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16498
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16500
16496 2007-05-02 11:30:31 CLT DEBUG:  sending SIGQUIT to process 16499
16541 2007-05-02 11:30:33 CLT WARNING:  terminating connection because of crash 
of another server process


Maybe stderr is going somewhere else?  That would be strange, I think.

I'll commit the fix shortly; attached.

-- 
Alvaro Herrera http://www.flickr.com/photos/alvherre/
The first law of live demos is: don't try to use the system.
Write a script that touches nothing, so it can't cause any damage. (Jakob Nielsen)
Index: src/backend/postmaster/autovacuum.c
===
RCS file: /home/alvherre/Code/cvs/pgsql/src/backend/postmaster/autovacuum.c,v
retrieving revision 1.42
diff -c -p -r1.42 autovacuum.c
*** src/backend/postmaster/autovacuum.c	18 Apr 2007 16:44:18 -	1.42
--- src/backend/postmaster/autovacuum.c	2 May 2007 15:25:27 -
*** AutoVacWorkerMain(int argc, char *argv[]
*** 1407,1431 
  	 * Get the info about the database we're going to work on.
  	 */
  	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! 	MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! 	dbid = MyWorkerInfo->wi_dboid;
! 	MyWorkerInfo->wi_workerpid = MyProcPid;
! 
! 	/* insert into the running list */
! 	SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers, 
! 		 &MyWorkerInfo->wi_links);
  	/*
! 	 * remove from the starting pointer, so that the launcher can start a new
! 	 * worker if required
  	 */
! 	AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! 	LWLockRelease(AutovacuumLock);
  
! 	on_shmem_exit(FreeWorkerInfo, 0);
  
! 	/* wake up the launcher */
! 	if (AutoVacuumShmem->av_launcherpid != 0)
! 		kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
  
  	if (OidIsValid(dbid))
  	{
--- 1407,1442 
  	 * Get the info about the database we're going to work on.
  	 */
  	LWLockAcquire(AutovacuumLock, LW_EXCLUSIVE);
! 
  	/*
! 	 * beware of startingWorker being INVALID; this could happen if the
! 	 * launcher thinks we're taking too long to start.
  	 */
! 	if (AutoVacuumShmem->av_startingWorker != INVALID_OFFSET)
! 	{
! 		MyWorkerInfo = (WorkerInfo) MAKE_PTR(AutoVacuumShmem->av_startingWorker);
! 		dbid = MyWorkerInfo->wi_dboid;
! 		MyWorkerInfo->wi_workerpid = MyProcPid;
! 
! 		/* insert into the running list */
! 		SHMQueueInsertBefore(&AutoVacuumShmem->av_runningWorkers, 
! 			 &MyWorkerInfo->wi_links);
! 		/*
! 		 * remove from the starting pointer, so that the launcher can start a new
! 		 * worker if required
! 		 */
! 		AutoVacuumShmem->av_startingWorker = INVALID_OFFSET;
! 		LWLockRelease(AutovacuumLock);
  
! 		on_shmem_exit(FreeWorkerInfo, 0);
  
! 		/* wake up the launcher */
! 		if (AutoVacuumShmem->av_launcherpid != 0)
! 			kill(AutoVacuumShmem->av_launcherpid, SIGUSR1);
! 	}
! 	else
! 		/* no worker entry for me, go away */
! 		LWLockRelease(AutovacuumLock);
  
  	if (OidIsValid(dbid))
  	{
*** AutoVacWorkerMain(int argc, char *argv[]
*** 1466,1473 
  	}
  
  	/*
! 	 * FIXME -- we need to notify the launcher when we are gone.  But this
! 	 * should be done after our PGPROC is released, in ProcKill.
  	 */
  
  	/* All done, go away */
--- 1477,1484 
  	}
  
  	/*
! 	 * The launcher will be notified of my death in ProcKill, *if* we managed
! 	 * to get a worker slot at all
  	 */
  
  	/* All done, go away */

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Nailed it -- this is the actual bug that causes the abort.  But I am
 surprised that it doesn't print the error message in Stefan machine's;

Hm, maybe we need an fflush(stderr) in ExceptionalCondition?

regards, tom lane

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Fwd: [PATCHES] Preliminary GSSAPI Patches

2007-05-02 Thread Henry B. Hotz


On May 2, 2007, at 3:11 AM, Magnus Hagander wrote:


As to the question of GSSAPI vs SSL, I would never argue we don't
want both.

Part of what made the GSSAPI encryption mods difficult was my intent
to insert them above the SSL encryption/buffering layer.  That way
you could double-encrypt the channel.  Since GSSAPI and SSL are
(probably, not necessarily) referenced to completely different ID
infrastructure there are scenarios where that's beneficial.


We might want to consider restructuring how SSL works when we do, that
might make it easier. The way it is now with #ifdefs all around can lead
to a horrible mess if there are too many different things to choose from.
Something like transport filters or whatever might be a way to do it. I
recall having looked at that at some point, but it was too long ago to
remember any details..

//Magnus


If someone wants to make it easier, that would be nice. I'm not up
for it, I don't think.



The opinions expressed in this message are mine,
not those of Caltech, JPL, NASA, or the US Government.
[EMAIL PROTECTED], or [EMAIL PROTECTED]



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


[HACKERS] Creating new system catalog problems.

2007-05-02 Thread Akmal Akmalhojaev

Hello.

I wanted to create a new system catalog in Postgres. So I changed the source
code. Everything seems to be OK. I compiled it, but now after initdb I
receive:

$ initdb -D /usr/home/postgres/post1
The files belonging to this database system will be owned by user
postgres.
This user must also own the server process.

The database cluster will be initialized with locale ru_RU.KOI8-R.
The default database encoding has accordingly been set to KOI8.

creating directory /usr/home/postgres/post1 ... ok
creating subdirectories ... ok
selecting default max_connections ... 40
selecting default shared_buffers/max_fsm_pages ... 28MB/179200
creating configuration files ... ok
creating template1 database in /usr/home/postgres/post1/base/1 ... FATAL:
could not open relation with OID 2617
child process exited with exit code 1
initdb: removing data directory /usr/home/postgres/post1

I can't understand the reason. Can somebody help me?

Akmal.


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Marc Munro
On Wed, 2007-02-05 at 08:27 -0400, Andrew Dunstan wrote:
 
 Naz Gassiep wrote:
  Andrew Dunstan wrote:

  Naz Gassiep wrote:
  
  I believe the suggestion was to have an automated process that only ran
  on known, sane patches.

  How do we know in advance of reviewing them that they are sane?
  
  Same way as happens now. 

 
 The question was rhetorical ... there is no list of certified sane but 
 unapplied patches. You are proceeding on the basis of a faulty 
 understanding of how our processes work.

Why do we need to know the patch is sane?  If it does not apply cleanly
or causes regression tests to fail, the process would figure that out
quickly and cheaply.  There is little cost in attempting to apply a
non-sane patch.

I am not sure that I have explained exactly what I was suggesting.  Some
people seem to grok it, others seem to be talking about something slightly
different.  To clarify, here it is in pseudo-code:

for each patch in the queue
  regression_success := false
  patch_success := attempt to apply patch to head
  if patch_success
regression_success := attempt to run regression tests
-- (On one machine only, not on the buildfarm)
  end if
  if this is a new patch
maybe mail the author and tell them patch_success and
regression_success
  else
if status is different from last time
  mail the author and tell them their patch has changed status
end
  end
  record the status for this patch
end loop

__
Marc


signature.asc
Description: This is a digitally signed message part


[HACKERS] autovacuum starvation

2007-05-02 Thread Alvaro Herrera
Hi,

The recently discovered autovacuum bug made me notice something that is
possibly critical.  The current autovacuum code makes an effort not to
leave workers in a starting state for too long, lest there be failure
to timely tend all databases needing vacuum.

This is how the launching of workers works:
1) the launcher puts a pointer to a WorkerInfo entry in shared memory,
   called the starting worker pointer
2) the launcher sends a signal to the postmaster
3) the postmaster forks a worker
4) the new worker checks the starting worker pointer
5) the new worker resets the starting worker pointer
6) the new worker connects to the given database and vacuums it

The problem is this: I originally added some code in the autovacuum
launcher to check that a worker does not take too long to start.  This
is autovacuum_naptime seconds.  If this happens, the launcher resets the
starting worker pointer, which means that the newly starting worker will
not see anything that needs to be done and exit quickly.

The problem with this is that on a high load machine, for example
lionfish during buildfarm runs, this would cause autovacuum starvation
for the period in which the high load is sustained.  This could prove
dangerous.

The problem is that things like fork() failure cannot be communicated
back to the launcher.  So when the postmaster tries to start a process
and it fails for some reason (failure to fork, or out of memory) we need
a way to re-initiate the worker that failed.

The current code resets the starting worker pointer, and leaves the slot
free for another worker, maybe in another database, to start.

I recently added code to resend the postmaster signal when the launcher
sees the starting worker pointer not invalid -- step 2 above.  I think
this is fine, but

1) we should remove the logic to remove the starting worker pointer.  It
is not needed, because database-local failures will be handled by
subsequent checks

2) we should leave the logic to resend the postmaster signal, but we should
make an effort to avoid sending it too frequently
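
For illustration, a minimal sketch of the kind of rate limiting meant in (2).
The helper name and the one-second interval are assumptions only, and it
presumes the launcher asks the postmaster for a new worker via
PMSIGNAL_START_AUTOVAC_WORKER:

#include "postgres.h"
#include <time.h>
#include "storage/pmsignal.h"

/* Sketch only: ask the postmaster for a worker, at most once per second. */
static void
launcher_request_worker(void)
{
	static time_t last_signal_time = 0;
	time_t		now = time(NULL);

	if (now - last_signal_time >= 1)
	{
		SendPostmasterSignal(PMSIGNAL_START_AUTOVAC_WORKER);
		last_signal_time = now;
	}
}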

Opinions?

If I haven't stated the problem clearly please let me know and I'll try
to rephrase.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


[HACKERS] conversion_procs makefiles

2007-05-02 Thread Alvaro Herrera
I noticed that conversion_procs is sadly single-tasked to build.  I am
wondering if it would be acceptable to rework the Makefile.shlib to have
an option to allow building multiple libs, by creating a rule to collect
libraries to build, and have each conversion_proc Makefile add a target
to that.  Then the whole lot of libraries would be built by a single
non-recursive make pass, instead of the current recursive approach.

I don't have a patch yet but I've been playing with it a bit.  I don't
think I'd have time for 8.3 though.

Objections?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Magnus Hagander
On Wed, May 02, 2007 at 06:44:12AM -0400, Bruce Momjian wrote:
 Magnus Hagander wrote:
   As an example, how is patch information going to help us review HOT or
   group-item-index?  There is frankly more information about these in the
   archives than someone could reasonable read.  What someone needs is a
   summary of where we are now on the patches, and lots of time.
   
   FYI, Tom, Heikki, I need one of you to post the list of patches and
   where we think we are on each one, even if the list is imperfect.
  
  I think you just contradicted yourself. Information is not the problem, but
  you need more information...
  
  I think this is a fairly clear indication that we do need a better way to
  track this information.
 
 No, my point is that 100% information is already available by looking at
 email archives.

Yes, and hard to find.

  What we need is a short description of where we are on
 each patch --- that is a manual process, not something that can be
 automated.

Right. But it can be presented in a central way and incrementally updated.


 Tom has posted it --- tell me how we will get such a list in an
 automated manner.

There's no way to get it automated, but you can get it incrementally
updated.

//Magnus


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] strange buildfarm failures

2007-05-02 Thread Gregory Stark
Tom Lane [EMAIL PROTECTED] writes:

 Alvaro Herrera [EMAIL PROTECTED] writes:
 Nailed it -- this is the actual bug that causes the abort.  But I am
 surprised that it doesn't print the error message in Stefan machine's;

 Hm, maybe we need an fflush(stderr) in ExceptionalCondition?

stderr is supposed to be line-buffered by default. Couldn't hurt I suppose.



-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Sequential scans

2007-05-02 Thread Jeff Davis
On Wed, 2007-05-02 at 14:26 +0100, Heikki Linnakangas wrote:
 Hi,
 
 I'm starting to review the synchronized scans and scan-resistant 
 buffer cache patches. The patches have complex interactions so I'm 
 taking a holistic approach.
 
 There's four outstanding issues with the sync scans in particular:
 
 1. The simplistic hash approach. While it's nice to not have a lock, I'm 
 worried of collisions. If you had a collision every now and then, it 
 wouldn't be that bad, but because the hash value is computed from the 
 oid, a collision would be persistent. If you create a database and 
 happen to have two frequently seqscanned tables that collide, the only 
 way to get rid of the collision is to drop and recreate a table. 
 Granted, that'd probably be very rare in practice, but when it happens 
 it would be next to impossible to figure out what's going on.
 
 Let's use a normal hash table instead, and use a lock to protect it. If 
 we only update it every 10 pages or so, the overhead should be 
 negligible. To further reduce contention, we could modify ReadBuffer to 
 let the caller know if the read resulted in a physical read or not, and 
 only update the entry when a page is physically read in. That way all 
 the synchronized scanners wouldn't be updating the same value, just the 
 one performing the I/O. And while we're at it, let's use the full 
 relfilenode instead of just the table oid in the hash.

What should be the maximum size of this hash table? Is there already-
existing hash table code that I should use to be consistent with the
rest of the code?

I'm still trying to understand the effect of using the full relfilenode.
Do you mean using the entire relation _segment_ as the key? That doesn't
make sense to me. Or do you just mean using the relfilenode (without the
segment) as the key?

 3. By having different backends doing the reads, are we destroying OS 
 readahead as Tom suggested? I remember you performed some tests on that, 
 and it was a problem on some systems but not on others. This needs some 
 thought, there may be some simple way to address that.

Linux with CFQ I/O scheduler performs very poorly and inconsistently
with concurrent sequential scans regardless of whether the scans are
synchronized or not. I suspect the reason for this is that CFQ is
designed to care more about the process issuing the request than any
other factor.

Every other I/O system performed either ideally (no interference between
scans) or had some interference but still much better than current
behavior.

Of course, my tests are limited and there are many possible combinations
of I/O systems that I did not try. 

 4. It fails regression tests. You get an assertion failure on the portal 
 test. I believe that changing the direction of a scan isn't handled 
 properly; it's probably pretty easy to fix.
 

I will examine the code more carefully. As a first guess, is it possible
that test is failing because of the non-deterministic order in which
tuples are returned?

 Jeff, could you please fix 1 and 4? I'll give 2 and 3 some more thought, 
 and take a closer look at the scan-resistant scans patch. Any comments 
 and ideas are welcome, of course..
 

Yes. I'll also try to address the other issues in your email. Thanks for
your comments.

Regards,
Jeff Davis



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Sequential scans

2007-05-02 Thread Gregory Stark
Heikki Linnakangas [EMAIL PROTECTED] writes:

 Let's use a normal hash table instead, and use a lock to protect it. If we 
 only
 update it every 10 pages or so, the overhead should be negligible. To further
 reduce contention, we could modify ReadBuffer to let the caller know if the
 read resulted in a physical read or not, and only update the entry when a page
 is physically read in. That way all the synchronized scanners wouldn't be
 updating the same value, just the one performing the I/O. And while we're at
 it, let's use the full relfilenode instead of just the table oid in the hash.

It's probably fine to just do that. But if we find it's a performance
bottleneck we could probably still manage to avoid the lock except when
actually inserting a new hash element. If you just store in the hash an index
into an array stored in global memory then you could get away without a lock
on the element in the array. 

It starts to get to be a fair amount of code when you think about how you
would reuse elements of the array. That's why I suggest only looking at this
if down the road we find that it's a bottleneck.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Josh Berkus
Bruce, all,

 No, my point is that 100% information is already available by looking at
 email archives.  What we need is a short description of where we are on
 each patch --- that is a manual process, not something that can be
 automated.

 Tom has posted it --- tell me how we will get such a list in an
 automated manner.

Several of us have already suggested a method.  If we want the information to 
be up-to-date, then the patch manager, or bug tracker, needs to be a required 
part of the approval & application process, NOT an optional accessory.  That 
is, if patches & bug fixes can come in, get modified, get approved & applied 
entirely on pgsql-patches or pgsql-bugs without ever touching the tracker 
tool, then the tracker tool will be permanently out of date and useless.

It's going to require the people who are doing the majority of the bug hunting 
& patch review to change the way they work, with the idea that any extra time 
associated with the new tool will be offset by being able to spread the work 
more and having information easy to find later, for you as well as others.  
Tom seems to be willing; are you?



 Status: Will be rejected unless race conditions are fixed. Needs
 performance testing.
 Discussions:links to mail threads, like in the current patch queue

... this brings up another reason we could use a tracker.  I now have access 
to a performance testing lab and staff.  However, these people are NOT going 
to follow 3 different high-traffic mailing lists in order to keep up with 
which patches to test.  As a result, they haven't done much testing of 8.3 
patches; they're dependent on me to keep them updated on new patch versions 
and known issues and I'm on the road a lot.

If I had a web tool I could point them to where they could simply download the 
current version of the patch, test, and comment a report, we'd get a LOT more 
useful performance feedback from Sun.  I suspect the same is true of Unisys.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Joshua D. Drake

Josh Berkus wrote:

Bruce, all,


No, my point is that 100% information is already available by looking at
email archives.  What we need is a short description of where we are on
each patch --- that is a manual process, not something that can be
automated.

Tom has posted it --- tell me how we will get such a list in an
automated manner.


Several of us have already suggested a method.  If we want the information to 
be up-to-date, then the patch manager, or bug tracker, needs to be a required 
part of the approval & application process, NOT an optional accessory.  That 
is, if patches & bug fixes can come in, get modified, get approved & applied 
entirely on pgsql-patches or pgsql-bugs without ever touching the tracker 
tool, then the tracker tool will be permanently out of date and useless.


It's going to require the people who are doing the majority of the bug hunting 
& patch review to change the way they work, with the idea that any extra time 
associated with the new tool will be offset by being able to spread the work 
more and having information easy to find later, for you as well as others.  
Tom seems to be willing; are you?


Hello,

Well according to himself the last time this came up:

http://archives.postgresql.org/pgsql-hackers/2006-08/msg01253.php

No, he isn't.

Sincerely,

Joshua D. Drake




--

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] autovacuum does not start in HEAD

2007-05-02 Thread Alvaro Herrera
ITAGAKI Takahiro wrote:
 I wrote:
  I found that autovacuum launcher does not launch any workers in HEAD.
 
 The attached autovacuum-fix.patch could fix the problem. I changed
 to use 'greater or equal' instead of 'greater' at the decision of
 next autovacuum target.

I have committed a patch which might fix this issue in autovacuum.c rev 1.44.
Please retest.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Sequential scans

2007-05-02 Thread Heikki Linnakangas

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:


Let's use a normal hash table instead, and use a lock to protect it. If we only
update it every 10 pages or so, the overhead should be negligible. To further
reduce contention, we could modify ReadBuffer to let the caller know if the
read resulted in a physical read or not, and only update the entry when a page
is physically read in. That way all the synchronized scanners wouldn't be
updating the same value, just the one performing the I/O. And while we're at
it, let's use the full relfilenode instead of just the table oid in the hash.


It's probably fine to just do that. But if we find it's a performance
bottleneck we could probably still manage to avoid the lock except when
actually inserting a new hash element. If you just store in the hash an index
into an array stored in global memory then you could get away without a lock
on the element in the array. 


It starts to get to be a fair amount of code when you think about how you
would reuse elements of the array. That's why I suggest only looking at this
if down the road we find that it's a bottleneck.


Another trick you could do is to acquire the lock conditionally when 
updating it. But I doubt it's a problem anyhow, if we put some sane 
lower limit in there so that it's not used at all for small tables.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] temporal variants of generate_series()

2007-05-02 Thread JEAN-PIERRE PELLETIER

Here's a shorter version:

On the date variant, I wasn't sure how to handle intervals with parts 
smaller than days:

floor, ceiling, round or error out
To get round, the last parameter of generate_series would be
extract('epoch' FROM '1 day'::interval)::bigint * round(extract('epoch' FROM 
$3) / extract('epoch' FROM '1 day'::interval))::bigint


CREATE OR REPLACE FUNCTION generate_series (
   start_ts timestamptz,
   end_ts timestamptz,
   step interval
) RETURNS SETOF timestamptz
STRICT
LANGUAGE sql
AS $$
SELECT
  'epoch'::timestamptz + s.i * '1 second'::interval AS generate_series
FROM
  generate_series(
   extract('epoch' FROM $1)::bigint,
   extract('epoch' FROM $2)::bigint,
   extract('epoch' FROM $3)::bigint
  ) s(i);
$$;

CREATE OR REPLACE FUNCTION generate_series (
   start_ts date,
   end_ts date,
   step interval
) RETURNS SETOF date
STRICT
LANGUAGE sql
AS $$
SELECT
  ('epoch'::date + s.i * '1 second'::interval)::date AS generate_series
FROM
  generate_series(
   extract('epoch' FROM $1)::bigint,
   extract('epoch' FROM $2)::bigint,
   extract('epoch' FROM date_trunc('day', $3))::bigint -- does a floor
  ) s(i);
$$;

Jean-Pierre Pelletier
e-djuster



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate


Re: [HACKERS] Sequential scans

2007-05-02 Thread Heikki Linnakangas

Jeff Davis wrote:

On Wed, 2007-05-02 at 14:26 +0100, Heikki Linnakangas wrote:
Let's use a normal hash table instead, and use a lock to protect it. If 
we only update it every 10 pages or so, the overhead should be 
negligible. To further reduce contention, we could modify ReadBuffer to 
let the caller know if the read resulted in a physical read or not, and 
only update the entry when a page is physically read in. That way all 
the synchronized scanners wouldn't be updating the same value, just the 
one performing the I/O. And while we're at it, let's use the full 
relfilenode instead of just the table oid in the hash.


What should be the maximum size of this hash table? 


Good question. And also, how do you remove entries from it?

I guess the size should somehow be related to number of backends. Each 
backend will realistically be doing just 1 or max 2 seq scan at a time. 
 It also depends on the number of large tables in the databases, but we 
don't have that information easily available. How about using just 
NBackends? That should be plenty, but wasting a few hundred bytes of 
memory won't hurt anyone.


I think you're going to need an LRU list and counter of used entries in 
addition to the hash table, and when all entries are in use, remove the 
least recently used one.


The thing to keep an eye on is that it doesn't add too much overhead or 
lock contention in the typical case when there's no concurrent scans.


For the locking, use a LWLock.


Is there already-
existing hash table code that I should use to be consistent with the
rest of the code?


Yes, see utils/hash/dynahash.c, and ShmemInitHash (in 
storage/ipc/shmem.c) since it's in shared memory. There's plenty of 
examples that use hash tables, see for example 
storage/freespace/freespace.c.
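
To make that concrete, here is a minimal sketch (not from the patch; the table
name, sizes and entry layout are assumptions) of a shared-memory hash table
keyed by the full RelFileNode:

#include "postgres.h"
#include "storage/block.h"
#include "storage/relfilenode.h"
#include "storage/shmem.h"
#include "utils/hsearch.h"

/* one entry per relation being seq-scanned; the key is the full RelFileNode */
typedef struct ss_scan_location
{
	RelFileNode relfilenode;	/* hash key: tablespace + database + relation */
	BlockNumber location;		/* last block number reported by a scanner */
} ss_scan_location;

static HTAB *SyncScanHash = NULL;

/* Sketch only: create (or attach to) the table in shared memory at startup. */
static void
SyncScanShmemInit(int max_entries)
{
	HASHCTL		info;

	MemSet(&info, 0, sizeof(info));
	info.keysize = sizeof(RelFileNode);
	info.entrysize = sizeof(ss_scan_location);
	info.hash = tag_hash;

	SyncScanHash = ShmemInitHash("Sync Scan Locations",
								 max_entries, max_entries,
								 &info,
								 HASH_ELEM | HASH_FUNCTION);
}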



I'm still trying to understand the effect of using the full relfilenode.
Do you mean using the entire relation _segment_ as the key? That doesn't
make sense to me. Or do you just mean using the relfilenode (without the
segment) as the key?


No, not the segment. RelFileNode consists of tablespace oid, database 
oid and relation oid. You can find it in scan->rs_rd->rd_node. The 
segmentation works at a lower level.



Linux with CFQ I/O scheduler performs very poorly and inconsistently
with concurrent sequential scans regardless of whether the scans are
synchronized or not. I suspect the reason for this is that CFQ is
designed to care more about the process issuing the request than any
other factor.

Every other I/O system performed either ideally (no interference between
scans) or had some interference but still much better than current
behavior.


Hmm. Should we care then? CFQ is the default on Linux, and an average 
sysadmin is unlikely to change it.


What we could do quite easily is:
- when ReadBuffer is called, let the caller know if the read did 
physical I/O.
- when the previous ReadBuffer didn't result in physical I/O, assume 
that we're not the pack leader. If the next buffer isn't already in 
cache, wait a few milliseconds before initiating the read, giving the 
pack leader a chance to do it instead.


Needs testing, of course..

4. It fails regression tests. You get an assertion failure on the portal 
test. I believe that changing the direction of a scan isn't handled 
properly; it's probably pretty easy to fix.




I will examine the code more carefully. As a first guess, is it possible
that test is failing because of the non-deterministic order in which
tuples are returned?


No, it's an assertion failure, not just different output than expected. 
But it's probably quite simple to fix..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Andrew Dunstan
Marc Munro wrote:
 On Wed, 2007-02-05 at 08:27 -0400, Andrew Dunstan wrote:

 Naz Gassiep wrote:
  Andrew Dunstan wrote:
 
  Naz Gassiep wrote:
 
  I believe the suggestion was to have an automated process that only
 ran
  on known, sane patches.
 
  How do we know in advance of reviewing them that they are sane?
 
  Same way as happens now.
 

 The question was rhetorical ... there is no list of certified sane but
 unapplied patches. You are proceeding on the basis of a faulty
 understanding of how our processes work.

 Why do we need to know the patch is sane?

Because not doing so is dangerous and a major security hole. I won't run
arbitrary code on my machine and I won't create infrastructure (e.g.
buildfarm) to get others to do it either.

You are also conveniently ignoring all the other reasons why this won't
help anyone much (e.g. see Bruce's comments upthread).

cheers

andrew


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] Optimization in convert_string_datum?

2007-05-02 Thread Magnus Hagander
I'm reviewing the strxfrm patch, and while comparing that code to the
code in varstr_cmp (which uses the same UTF8/UTF16 workaround but for
strcoll instead), I noticed that in varstr_cmp we have an
optimization to use a stack-based buffer instead of palloc if the string
is short enough. Is convert_string_datum performance-critical enough to
make it worthwhile to put a similar optimization there?
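
(For reference, the pattern in question looks roughly like this; a sketch only,
with the buffer size and names assumed rather than copied from varstr_cmp:)

#include "postgres.h"
#include <string.h>

#define STACKBUFLEN		1024

/* Sketch only: use an on-stack buffer for short strings, palloc otherwise. */
static void
work_with_copy(const char *src, int len)
{
	char		buf[STACKBUFLEN];
	char	   *p;

	if (len >= STACKBUFLEN)
		p = (char *) palloc(len + 1);
	else
		p = buf;

	memcpy(p, src, len);
	p[len] = '\0';

	/* ... do the real work with the NUL-terminated copy ... */

	if (p != buf)
		pfree(p);
}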

//Magnus

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Tom Lane
Marc Munro [EMAIL PROTECTED] writes:
 On Wed, 2007-02-05 at 08:27 -0400, Andrew Dunstan wrote:
 The question was rhetorical ... there is no list of certified sane but
 unapplied patches. You are proceeding on the basis of a faulty
 understanding of how our processes work.

 Why do we need to know the patch is sane?  If it does not apply cleanly
 or causes regression tests to fail, the process would figure that out
 quickly and cheaply.  There is little cost in attempting to apply a
 non-sane patch.

Unless it contains a trojan horse.  I don't think many buildfarm owners
are running the tests inside a sandbox so tight that they don't care
how nasty the code that runs there might be.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Dave Page

Joshua D. Drake wrote:


Well according to himself the last time this came up:

http://archives.postgresql.org/pgsql-hackers/2006-08/msg01253.php

No, he isn't.


The last paragraph of 
http://archives.postgresql.org/pgsql-hackers/2007-04/msg01258.php is 
somewhat more positive regarding a patch tracker.


Regards, Dave

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Andrew Dunstan
Chris Browne wrote:
 [EMAIL PROTECTED] (Andrew Dunstan) writes:
 Tom Lane wrote:
 So in a roundabout way we come back
 to the idea that we need a bug tracker (NOT a patch tracker), plus
 people putting in the effort to make sure it stays a valid source
 of up-to-date info.  Without the latter it won't really be useful.

 Hallelujah Brother!

 BTW, a bug tracker can be used as a patch tracker, although the
 reverse isn't true. For example, the BZ people use BZ that way, in
 fact - most patches arrive as attachments to bugs. And trackers can be
 used just as well for tracking features as well as bugs.

 Well, Command Prompt set up a Bugzilla instance specifically so people
 could track PG bugs.  If only someone took interest and started using
 it...


I lost interest last time around when it seemed clear to me that there was
not enough traction.

Maybe there is this time around.

The effort Tom mentions above is crucial to success.

cheers

andrew



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Sequential scans

2007-05-02 Thread Jeff Davis
On Wed, 2007-05-02 at 20:02 +0100, Heikki Linnakangas wrote:
 Gregory Stark wrote:
  Heikki Linnakangas [EMAIL PROTECTED] writes:
  
  Let's use a normal hash table instead, and use a lock to protect it. If we 
  only
  update it every 10 pages or so, the overhead should be negligible. To 
  further
  reduce contention, we could modify ReadBuffer to let the caller know if the
  read resulted in a physical read or not, and only update the entry when a 
  page
  is physically read in. That way all the synchronized scanners wouldn't be
  updating the same value, just the one performing the I/O. And while we're 
  at
  it, let's use the full relfilenode instead of just the table oid in the 
  hash.
  
  It's probably fine to just do that. But if we find it's a performance
  bottleneck we could probably still manage to avoid the lock except when
  actually inserting a new hash element. If you just store in the hash an 
  index
  into an array stored in global memory then you could get away without a lock
  on the element in the array. 
  
  It starts to get to be a fair amount of code when you think about how you
  would reuse elements of the array. That's why I suggest only looking at this
  if down the road we find that it's a bottleneck.
 
 Another trick you could do is to use acquire the lock conditionally when 
 updating it. But I doubt it's a problem anyhow, if we put some sane 
 lower limit in there so that it's not used at all for small tables.
 

The more sophisticated the data structure the less able we are to avoid
locking, correct? For instance, if we have an LRU list it might be
tricky or impossible to avoid locking even on just reads.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] pg_index updates and SI invalidation

2007-05-02 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 Please see the attached updated patch, based on Tom's comments.

 Attempt to reload index information for system indexes such as
 pg_class_oid_index can cause infinite recursion. But I realized that
 we don't need to reload system index information because we
 neither allow CREATE INDEX or CIC on system relations. Only
 REINDEX is allowed which does not need any reload. So we skip
 index information reload for system relations.

Applied with revisions --- mostly, trying to keep the comments in sync
with the code.  I also added a forced relcache inval on the index's
parent table at the end of CREATE INDEX CONCURRENTLY; this is to flush
cached plans and allow the newly valid index to be considered in
replanning.  (The relcache inval on the index won't do it, since by
definition the index is not mentioned in any such plan...)

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Optimization in convert_string_datum?

2007-05-02 Thread Tom Lane
Magnus Hagander [EMAIL PROTECTED] writes:
 I'm reviewing the strxfrm patch, and while comparing that code to the
 code in varstr_cmp (which uses the same UTF8/UTF16 workaround but for
 strcoll instead), and I noticed that in varstr_cmp we have an
 optimization to use a stack based buffer instead of palloc if the string
 is short enough. Is convert_string_datum performance-critical enough to
 make it worthwhile to put a similar optimization there?

No, I don't believe so.  It should only get invoked a few times per
query at most, since only the planner uses it.

It would be far more useful to figure out a way to make that code
actually do something sane with multibyte encodings than to
micro-optimize what's there.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Sequential scans

2007-05-02 Thread Heikki Linnakangas

Jeff Davis wrote:

On Wed, 2007-05-02 at 20:02 +0100, Heikki Linnakangas wrote:

Gregory Stark wrote:

Heikki Linnakangas [EMAIL PROTECTED] writes:


Let's use a normal hash table instead, and use a lock to protect it. If we only
update it every 10 pages or so, the overhead should be negligible. To further
reduce contention, we could modify ReadBuffer to let the caller know if the
read resulted in a physical read or not, and only update the entry when a page
is physically read in. That way all the synchronized scanners wouldn't be
updating the same value, just the one performing the I/O. And while we're at
it, let's use the full relfilenode instead of just the table oid in the hash.

It's probably fine to just do that. But if we find it's a performance
bottleneck we could probably still manage to avoid the lock except when
actually inserting a new hash element. If you just store in the hash an index
into an array stored in global memory then you could get away without a lock
on the element in the array. 


It starts to get to be a fair amount of code when you think about how you
would reuse elements of the array. That's why I suggest only looking at this
if down the road we find that it's a bottleneck.
Another trick you could do is to use acquire the lock conditionally when 
updating it. But I doubt it's a problem anyhow, if we put some sane 
lower limit in there so that it's not used at all for small tables.




The more sophisticated the data structure the less able we are to avoid
locking, correct? For instance, if we have an LRU list it might be
tricky or impossible to avoid locking even on just reads.


Agreed. I'm not concerned about reads, though. You only need to read 
from the structure once when you start a scan. It's the updates that 
cause most of the traffic.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Patch queue triage

2007-05-02 Thread Simon Riggs
On Tue, 2007-05-01 at 22:44 -0400, Tom Lane wrote:

Nice summary Tom.

 * Re: [PATCHES] [Fwd: Deferred Transactions, Transaction Guarantee
   and COMMITwithout waiting]  /Simon Riggs/
 
 Simon is on the hook to submit an updated patch.  I hope this one
 makes it in, as it looks like a really nice performance improvement
 for those who can use it; but the original patch seems overcomplicated.

It will, but it will be next week now. Changes agreed though.

 * [PATCHES] Minor recovery changes  /Simon Riggs/
 
 The submission mentions open issues --- when will those be resolved?
 Should we apply the patch-so-far anyway?

Patch is OK to go in as-is. There are open issues, but they need more
thought and discussion, and I regret that won't happen in a reasonable
timescale due to other work.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com



---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Creating new system catalog problems.

2007-05-02 Thread Tom Lane
Akmal Akmalhojaev [EMAIL PROTECTED] writes:
 I wanted to create a new system catalog in Postgres. So I changed the source
 code. Everything seems to be OK. I compiled it, but now after initdb I
 receive:
 creating template1 database in /usr/home/postgres/post1/base/1 ... FATAL:
 could
  not open relation with OID 2617

 I can't understand the reason. Can somebody help me.

Not when you didn't show us what you changed ...

2617 is pg_operator, so a possible theory is that you did something that
tries to access pg_operator before it's been created.  But just adding a
new table declaration in include/catalog shouldn't have that effect.
I suspect you added some code you haven't mentioned.
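
(For what it's worth, a bare-bones sketch of the shape a new catalog header
under src/include/catalog/ usually takes; the name, OID and columns below are
made up for illustration, and the CATALOG() sugar is expanded by the catalog
build machinery rather than by ordinary compilation:)

/* src/include/catalog/pg_mycatalog.h -- hypothetical example */
#ifndef PG_MYCATALOG_H
#define PG_MYCATALOG_H

/* pick the relation OID from the unused range (see the unused_oids script) */
#define MyCatalogRelationId  9000

CATALOG(pg_mycatalog,9000)
{
	NameData	mycatname;		/* name of the object */
	Oid			mycatowner;		/* owning role */
} FormData_pg_mycatalog;

typedef FormData_pg_mycatalog *Form_pg_mycatalog;

#define Natts_pg_mycatalog				2
#define Anum_pg_mycatalog_mycatname		1
#define Anum_pg_mycatalog_mycatowner	2

#endif   /* PG_MYCATALOG_H */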

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] [ADMIN] reindexdb hangs

2007-05-02 Thread Tom Lane
dx k9 [EMAIL PROTECTED] writes:
 [ stuck reindex ]
 It turns out it was a temporary database and temporary table, that just 
 wasn't there maybe it thought it was there from some type of snapshot then 
 the next minute it was gone.

Hmm, there is not any filter in ReindexDatabase() to exclude temp tables
of other backends, but it sure seems like there needs to be.  CLUSTER
might have the same issue.  I think we fixed this in VACUUM long ago,
but we need to check the other commands that grovel over all of a database.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Sequential scans

2007-05-02 Thread Jeff Davis
On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
 Jeff Davis wrote:
  What should be the maximum size of this hash table? 
 
 Good question. And also, how do you remove entries from it?
 
 I guess the size should somehow be related to number of backends. Each 
 backend will realistically be doing just 1 or max 2 seq scan at a time. 
   It also depends on the number of large tables in the databases, but we 
 don't have that information easily available. How about using just 
 NBackends? That should be plenty, but wasting a few hundred bytes of 
 memory won't hurt anyone.

One entry per relation, not per backend, is my current design.

 I think you're going to need an LRU list and counter of used entries in 
 addition to the hash table, and when all entries are in use, remove the 
 least recently used one.
 
 The thing to keep an eye on is that it doesn't add too much overhead or 
 lock contention in the typical case when there's no concurrent scans.
 
 For the locking, use a LWLock.
 

Ok. What would be the potential lock contention in the case of no
concurrent scans?

Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this will
be obvious.

 No, not the segment. RelFileNode consists of tablespace oid, database 
 oid and relation oid. You can find it in scan->rs_rd->rd_node. The 
 segmentation works at a lower level.

Ok, will do.

 Hmm. Should we care then? CFG is the default on Linux, and an average 
 sysadmin is unlikely to change it.
 

Keep in mind that concurrent sequential scans with CFQ are *already*
very poor. I think that alone is an interesting fact that's somewhat
independent of Sync Scans.

 - when ReadBuffer is called, let the caller know if the read did 
 physical I/O.
 - when the previous ReadBuffer didn't result in physical I/O, assume 
 that we're not the pack leader. If the next buffer isn't already in 
 cache, wait a few milliseconds before initiating the read, giving the 
 pack leader a chance to do it instead.
 
 Needs testing, of course..
 

An interesting idea. I like that the most out of the ideas of
maintaining a pack leader. That's very similar to what the Linux
anticipatory scheduler does for us.

  4. It fails regression tests. You get an assertion failure on the portal 
  test. I believe that changing the direction of a scan isn't handled 
  properly; it's probably pretty easy to fix.
 
  
  I will examine the code more carefully. As a first guess, is it possible
  that test is failing because of the non-deterministic order in which
  tuples are returned?
 
 No, it's an assertion failure, not just different output than expected. 
 But it's probably quite simple to fix..
 

Ok, I'll find and correct it then.

Regards,
Jeff Davis


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Sequential scans

2007-05-02 Thread Heikki Linnakangas

Jeff Davis wrote:

On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:

Jeff Davis wrote:
What should be the maximum size of this hash table? 

Good question. And also, how do you remove entries from it?

I guess the size should somehow be related to number of backends. Each 
backend will realistically be doing just 1 or max 2 seq scan at a time. 
  It also depends on the number of large tables in the databases, but we 
don't have that information easily available. How about using just 
NBackends? That should be plenty, but wasting a few hundred bytes of 
memory won't hurt anyone.


One entry per relation, not per backend, is my current design.


Umm, you naturally have just one entry per relation, but we were talking 
about how many entries the table needs to hold. Your patch had a 
hard-coded value of 1000, which is quite arbitrary.


I think you're going to need an LRU list and counter of used entries in 
addition to the hash table, and when all entries are in use, remove the 
least recently used one.


The thing to keep an eye on is that it doesn't add too much overhead or 
lock contention in the typical case when there's no concurrent scans.


For the locking, use a LWLock.


Ok. What would be the potential lock contention in the case of no
concurrent scans?


If you only have one seq scan in the system, there's no contention. If 
you have more, they will all need to acquire the lock to update the page 
number in the hash table, regardless of the table they're scanning, so 
there's potential for contention.


To put things into perspective, though, any time you need to evict a 
buffer from the buffer cache to read in a new buffer, you need to 
acquire the BufFreelistLock. If we only update the page number say every 
10 pages, the overhead and lock contention of that lock would be roughly 
1/10th of that arising from BufFreelistLock. And we can make it a 
conditional acquire, in which case the backends won't need to slow down 
their scans, but the updates of the entries in the hash table will get 
less frequent instead. We could also take just a shared lock to update 
the counter, and rely on the atomicity of the write like your patch did. 
You'd only need to take an exclusive lock to move entries in the LRU 
list or to add or remove an entry.
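
As a rough illustration of the conditional-acquire idea (a sketch only; the
lock, the entry layout and the reporting interval are assumptions, and the LRU
bookkeeping and eviction are left out entirely):

#include "postgres.h"
#include "storage/block.h"
#include "storage/lwlock.h"
#include "storage/relfilenode.h"
#include "utils/hsearch.h"

/* same entry layout as sketched earlier in the thread */
typedef struct ss_scan_location
{
	RelFileNode relfilenode;
	BlockNumber location;
} ss_scan_location;

/* Sketch only: report the scan position without ever blocking the scan. */
static void
ss_report_location(HTAB *scan_hash, LWLockId scan_lock,
				   RelFileNode rel, BlockNumber blocknum)
{
	ss_scan_location *entry;
	bool		found;

	/* only bother every 16 pages or so */
	if ((blocknum % 16) != 0)
		return;

	if (!LWLockConditionalAcquire(scan_lock, LW_EXCLUSIVE))
		return;					/* someone else holds it; skip this update */

	entry = (ss_scan_location *) hash_search(scan_hash,
											 (void *) &rel,
											 HASH_ENTER, &found);
	entry->location = blocknum;

	LWLockRelease(scan_lock);
}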


But let's keep it simple at first, and do some testing..


Also, is it easy to determine the space used by a dynahash with N
entries? I haven't looked at the dynahash code yet, so perhaps this will
be obvious.


Yes, see hash_estimate_size.
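
For example (a sketch only; the entry size would be that of the scan-position
struct assumed earlier in the thread):

#include "postgres.h"
#include "utils/hsearch.h"

/* Sketch only: shared memory to reserve for max_entries scan-position
 * entries of entry_size bytes each (e.g. sizeof(ss_scan_location)). */
static Size
SyncScanShmemSize(int max_entries, Size entry_size)
{
	return hash_estimate_size(max_entries, entry_size);
}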


BTW: Thanks for the patch!

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Avoiding unnecessary reads in recovery

2007-05-02 Thread Tom Lane
Heikki Linnakangas [EMAIL PROTECTED] writes:
 What we could have is the semantics of Return a buffer, with either 
 correct contents or completely zeroed out. It would act just like 
 ReadBuffer if the buffer was already in memory, and zero out the page 
 otherwise. That's a bit strange semantics to have, but is simple to 
 implement and works for the use-cases we've been talking about.

 Patch implementing that attached. I named the function ReadOrZeroBuffer.

Applied.  BTW, I realized that there is a potential issue created by
this, which is that the smgr level might see a write for a page that it
never saw a read for.  I don't think there are any bad consequences of
this ATM, but it is skating around the edges of some bugs we've had
previously with relation extension.  In particular ReadOrZeroBuffer
avoids the error that would normally occur if one tries to read a page
that's beyond the logical EOF; and if the page is subsequently modified
and written, md.c is likely to get confused/unhappy, particularly if the
page is beyond the next segment boundary.  This isn't a problem in
XLogReadBuffer's usage because it carefully checks the EOF position
before trying to use ReadOrZeroBuffer, but it's a limitation other
callers will need to think about.

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] Sequential scans

2007-05-02 Thread Jeff Davis
On Wed, 2007-05-02 at 23:59 +0100, Heikki Linnakangas wrote:
 Jeff Davis wrote:
  On Wed, 2007-05-02 at 20:58 +0100, Heikki Linnakangas wrote:
  Jeff Davis wrote:
  What should be the maximum size of this hash table? 
  Good question. And also, how do you remove entries from it?
 
  I guess the size should somehow be related to number of backends. Each 
  backend will realistically be doing just 1 or max 2 seq scan at a time. 
It also depends on the number of large tables in the databases, but we 
  don't have that information easily available. How about using just 
  NBackends? That should be plenty, but wasting a few hundred bytes of 
  memory won't hurt anyone.
  
  One entry per relation, not per backend, is my current design.
 
 Umm, you naturally have just entry per relation, but we were talking 
 about how many entries the table needs to hold.. You're patch had a 
 hard-coded value of 1000 which is quite arbitrary.

I think I'm missing something from your statement "I guess the size
should somehow be related to number of backends". If there is only one
entry per relation, why do more backends require a larger hash table?

 If you only have one seq scan in the system, there's no contention. If 
 you have more, they will all need to acquire the lock to update the page 
 number in the hash table, regardless of the table their scanning, so 
 there's potential for contention.
 
 To put things into perspective, though, any time you need to evict a 
 buffer from the buffer cache to read in a new buffer, you need to 
 acquire the BufFreelistLock. If we only update the page number, say, every 
 10 pages, the overhead and lock contention of that lock would be roughly 
 1/10th of that arising from BufFreelistLock. And we can make it a 
 conditional acquire, in which case the backends won't need to slow down 
 their scans, but the updates of the entries in the hash table will get 
 less frequent instead. We could also take just a shared lock to update 
 the counter, and rely on the atomicity of the write like your patch did. 
 You'd only need to take an exclusive lock to move entries in the LRU 
 list or to add or remove an entry.
 

If I just step back for a second to consider a simpler design:

Most likely, only a few relations larger than the minimum threshold for
Sync Scanning will be scanned concurrently in any given database.

What if I just make an LRU that occupies a fixed size? Any reads or
updates can start at the MRU item and work backward, and any evictions
can happen at the LRU item.

Does a hash table really save us cycles if we keep the list small, say,
100 items? Looking at all 100 items would perhaps only be required at
scan startup. I don't have a good sense of scale here: is following 50
pointers while holding a lock a significant source of contention?

100 is of course arbitrary, and that could grow or shrink automatically
at runtime.
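
For illustration, a minimal sketch of such a fixed-size, MRU-ordered list;
all names are placeholders, and the real structure would of course live in
shared memory under a lock rather than process-local storage:

#include "postgres.h"
#include "storage/block.h"
#include <string.h>

#define SS_LRU_SIZE 100

typedef struct
{
    Oid         relid;          /* relation being seq-scanned */
    BlockNumber last_block;     /* last page reported for that scan */
} SsLruEntry;

static SsLruEntry lru[SS_LRU_SIZE];     /* ordered MRU first, LRU last */
static int        lru_used = 0;

/* Record blkno for relid, promoting that relation to the MRU slot. */
static void
lru_update(Oid relid, BlockNumber blkno)
{
    SsLruEntry  ent;
    int         i;

    ent.relid = relid;
    ent.last_block = blkno;

    /* walk from the MRU end; the handful of active relations stay near the front */
    for (i = 0; i < lru_used; i++)
    {
        if (lru[i].relid == relid)
            break;
    }
    if (i == lru_used)
        i = (lru_used < SS_LRU_SIZE) ? lru_used++ : SS_LRU_SIZE - 1;   /* grow, or evict LRU */

    /* shift entries 0..i-1 down one slot, dropping the old slot i */
    memmove(&lru[1], &lru[0], i * sizeof(SsLruEntry));
    lru[0] = ent;
}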

 Yes, see hash_estimate_size.
 

Ok.

Regards,
Jeff Davis




Re: [HACKERS] Feature freeze progress report

2007-05-02 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 You keep saying that but I think it's wrong. There are trivial patches that
 were submitted last year that are still sitting in the queue.

Um ... which ones exactly?  I don't see *anything* in the current queue
that is utterly without issues, other than Heikki's ReadOrZeroBuffer
patch which certainly doesn't date from last year (and besides, has
now been applied).

I'm a bit more worried about stuff that may have slid through the
cracks, like the Darwin SysV-vs-Posix-semaphores patch that I complained
of in my triage listing.  But the stuff that is listed on Bruce's patch
queue page is not trivial.  It's either large or has got problems.

regards, tom lane



Re: [HACKERS] Heap page diagnostic functions

2007-05-02 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Incidentally, Tom, were you consulting the dead-tree edition of the OED?

I plead guilty to having blown some dust off it before opening it ...
but when looking for an authoritative reference, I like things that
have got heft to them ...

regards, tom lane



Re: [HACKERS] Optimization in convert_string_datum?

2007-05-02 Thread Magnus Hagander
  I'm reviewing the strxfrm patch, and while comparing that code to the
  code in varstr_cmp (which uses the same UTF8/UTF16 workaround, but for
  strcoll instead), I noticed that varstr_cmp has an optimization to use
  a stack-based buffer instead of palloc if the string is short enough.
  Is convert_string_datum performance-critical enough to make it
  worthwhile to put a similar optimization there?
 
 No, I don't believe so.  It should only get invoked a few times per
 query at most, since only the planner uses it.

ok.
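
For reference, the pattern in question looks roughly like this simplified
sketch; the real varstr_cmp also deals with the UTF8/UTF16 workaround
mentioned above, and the names and buffer size here are illustrative:

#include "postgres.h"
#include <string.h>

#define STACKBUFLEN 1024        /* illustrative cutoff */

static int
varstr_cmp_sketch(const char *arg1, int len1, const char *arg2, int len2)
{
    char        a1buf[STACKBUFLEN];
    char        a2buf[STACKBUFLEN];
    char       *a1p;
    char       *a2p;
    int         result;

    /* short inputs use the stack buffers; longer ones fall back to palloc */
    a1p = (len1 >= STACKBUFLEN) ? (char *) palloc(len1 + 1) : a1buf;
    a2p = (len2 >= STACKBUFLEN) ? (char *) palloc(len2 + 1) : a2buf;

    /* strcoll wants NUL-terminated copies, since the inputs aren't */
    memcpy(a1p, arg1, len1);
    a1p[len1] = '\0';
    memcpy(a2p, arg2, len2);
    a2p[len2] = '\0';

    result = strcoll(a1p, a2p);     /* locale-aware comparison */

    if (a1p != a1buf)
        pfree(a1p);
    if (a2p != a2buf)
        pfree(a2p);
    return result;
}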

 It would be far more useful to figure out a way to make that code
 actually do something sane with multibyte encodings than to
 micro-optimize what's there.

I'm not volunteering to do that - at least not now ;)

/Magnus

