Re: [HACKERS] msvc failure in largeobject regression test

2007-02-21 Thread Magnus Hagander
No.

I think one option is to fix it in the cvs client. There
are options there to do it if you use cvsnt.

FWIW, if I build off the snapshot tarballs things work, since they have
unix linefeeds.

//Magnus

On Tue, Feb 20, 2007 at 05:11:41PM -0500, Bruce Momjian wrote:
 
 Was this problem addressed?
 
 ---
 
 Magnus Hagander wrote:
  On Tue, Jan 23, 2007 at 11:39:23AM -0800, Jeremy Drake wrote:
   On Tue, 23 Jan 2007, Magnus Hagander wrote:
   
On Tue, Jan 23, 2007 at 09:31:40AM -0500, Andrew Dunstan wrote:
 Magnus Hagander wrote:
 Hi!
 
 I get failures for the largeobject regression tests on my vc++ 
 build. I
 don't think this has ever worked, given that those tests are fairly 
 new.
 Any quick ideas on what's wrong before I dig deeper?
 
 
 [snip]

 I wonder if this is a line-end issue? Assuming you are working from 
 CVS,
 does your client turn \n into \r\n ? I see that other windows boxes 
 are
 happily passing this test on the buildfarm, and of course the mingw 
 cvs
 doesn't adjust line endings.
   
Bingo!
   
That's it. I copied the file in binary mode from a linux box and now it
passes.
   
   I thought about that when I wrote it, and thus tried it under mingw and
   cygwin without issue ;)  I don't think the regression tests were in a
   position of running on the msvc build at the time...  My thought for what
   to do if this did run into a problem would be an alternate output file
   that is also acceptable (I don't know what they're called but other tests
   have them IIRC).
  
  Either that, or we require a checkout using Unix style linefeeds. I've
  confirmed that removing the file and checking it back out with cvs --lf
  update tenk.data works - tests pass fine.
  
  Yet another option might be to flag that file as binary in cvs, in which
  case I think cvsnt shouldn't go mess with it.
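
As an aside, all of the fixes above amount to the same thing: making sure the data file reaches the test with Unix linefeeds. A minimal sketch of that normalization in Python (the file name and contents are purely illustrative):

```python
def normalize_to_lf(path):
    """Rewrite a file in place, converting CRLF line endings to LF.

    Binary mode matters: it prevents the runtime itself from doing
    any newline translation on Windows."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data.replace(b"\r\n", b"\n"))

# Example: a data file that a Windows CVS client rewrote with CRLF.
with open("tenk.data", "wb") as f:  # file name used illustratively
    f.write(b"1\taaa\r\n2\tbbb\r\n")
normalize_to_lf("tenk.data")
```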
  
  //Magnus
  
  ---(end of broadcast)---
  TIP 1: if posting/reading through Usenet, please send an appropriate
 subscribe-nomail command to [EMAIL PROTECTED] so that your
 message can get through to the mailing list cleanly
 
 -- 
   Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
   EnterpriseDB   http://www.enterprisedb.com
 
   + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New feature request: FlashBack Query

2007-02-21 Thread Csaba Nagy
 Do 97% of transactions commit because Oracle has slow rollbacks and  
 developers are working around that performance issue, or because they  
 really commit?
 
 I have watched several developers that would prefer to issue numerous  
 selects to verify things like foreign keys in the application in  
 order to avoid a rollback.

Most of the code we have cannot afford a rollback, because it can be
part of a much bigger transaction which would have a much higher
performance penalty if retried than a simple rollback. And you know that
in postgres you can't roll back just the last insert, you will abort the
whole transaction with it... and it's simply a performance bottleneck to
retry in a high-contention scenario (which is usually the case for us).

So I would say we don't avoid rollbacks because of the cost of the
rollback, but because of the cost of the retry...

Cheers,
Csaba.





Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Markus Schiltknecht

Hi,

Tom Lane wrote:

You mean four different object types.  I'm not totally clear on bison's
scaling behavior relative to the number of productions


You really want to trade parser performance (which is *very* 
implementation specific) for ease of use?


Bison generates a LALR [1] parser, which depends quite a bit on the 
number of productions. But AFAIK the dependency is mostly in memory 
consumption for the internal symbol sets, not so much in runtime 
complexity. I didn't find hard facts about the runtime complexity of LALR, 
though (pointers are very welcome).


Are there any ongoing efforts to rewrite the parser (i.e. using another 
algorithm, like a recursive descent parser)?


Regards

Markus

[1]: Wikipedia on the LALR parsing algorithm:
http://en.wikipedia.org/wiki/LALR_parser



Re: [HACKERS] pg_proc without oid?

2007-02-21 Thread Magnus Hagander
On Mon, Feb 19, 2007 at 11:18:36AM -0500, Tom Lane wrote:
 
 Magnus, I'd suggest reverting whatever you did to your MSVC script,
 so we'll find out the next time someone makes this mistake...
 

Reverted. I left the part in genbki.pl in there, because that's a plain
bug that was exposed by this, and could at least theoretically be
exposed in other ways as well.

//Magnus



Re: [HACKERS] 8.3 patches hold queue empty

2007-02-21 Thread Oleg Bartunov


Hi there,

We have several patches already discussed in -hackers and waiting for
decision. Current versions adapted for HEAD are available:

1. built-in FTS http://www.sigaev.ru/misc/tsearch_core-0.35.gz

Documentation is available at http://mira.sai.msu.su/~megera/pgsql/ftsdoc/
as a separate book, but it could be wrapped into the pg docs - it is in sgml format.
We need a decision about this patch, since we plan
to give several talks about FTS this spring at several conferences and it'd
be nice to present this new FTS.

btw, after announcing it in -hackers, the docs were downloaded from 1745 unique
IPs, which indicates people really want built-in FTS.

There is still no consensus about the FTS syntax - use SQL commands or SQL
functions; it is currently being discussed. All we want is to provide our users
a clear syntax without a boring SQL-functions interface. Users shouldn't have to
care about our internal problems with SQL parser optimization. It is very
difficult to add a new feature without some overhead; the problem is how big it
is and whether we can improve our parser so it does not constrain developers.



2. Interval optimization (OR's)
http://www.sigaev.ru/misc/OR_82-0.10.gz

Thread:
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00174.php
planner overhead
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00167.ph

Such queries are frequent and their optimization can be very effective
(several orders of magnitude). We could have it off by default if planner
overhead is a concern, and use a GUC variable (interval_optimization) to allow 
people to provide a hint for selected queries.



3. Indexing IS NULL
http://www.sigaev.ru/misc/indexnulls_82-0.7.gz

Removed support for indexing IS NOT NULL from the previous patch.
See http://archives.postgresql.org/pgsql-hackers/2006-12/msg00146.php


Oleg


On Fri, 16 Feb 2007, Bruce Momjian wrote:


I have completed processing of the 8.3 patches hold queue.  There are
some emails where I am waiting on a reply from the authors, but I will
now just handle them as part of the normal patch process.

I will now return to processing patches as they come in, and deal with
the patches that are now waiting.




Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Florian G. Pflug

Markus Schiltknecht wrote:

Hi,

Tom Lane wrote:

You mean four different object types.  I'm not totally clear on bison's
scaling behavior relative to the number of productions


You really want to trade parser performance (which is *very* 
implementation specific) for ease of use?


Bison generates a LALR [1] parser, which depends quite a bit on the 
number of productions. But AFAIK the dependency is mostly in memory 
consumption for the internal symbol sets, not so much in runtime 
complexity. I didn't find hard facts about the runtime complexity of LALR, 
though (pointers are very welcome).


According to http://en.wikipedia.org/wiki/LR_parser processing one
token in any LR(1) parser in the worst case needs to
 a) Do a lookup in the action table with the current (state, token) pair
 b) Do a lookup in the goto table with a (state, rule) pair.
 c) Push one state onto the stack, and pop n states with
n being the number of symbols (tokens or other rules) on the right
hand side of a rule.

a) and b) should be O(1). Processing one token pushes at most one state
onto the stack, so overall no more than N states can be popped off again,
making the whole algorithm O(N), with N being the number of tokens of the
input stream.

AFAIK the only difference between SLR, LALR and LR(1) lies in the
generation of the goto and action tables.
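
The a)/b)/c) steps above can be made concrete with a table-driven LR loop. The SLR tables below are hand-built for the toy grammar E -> E '+' n | n (purely illustrative, nothing like the PostgreSQL grammar; Python used for brevity):

```python
# Hand-built SLR(1) tables for the toy grammar:
#   rule 1: E -> E '+' n      rule 2: E -> n
ACTION = {  # (state, lookahead) -> shift / reduce / accept
    (0, "n"): ("s", 2), (1, "+"): ("s", 3), (1, "$"): ("acc", None),
    (2, "+"): ("r", 2), (2, "$"): ("r", 2), (3, "n"): ("s", 4),
    (4, "+"): ("r", 1), (4, "$"): ("r", 1),
}
GOTO = {(0, "E"): 1}                 # (state, nonterminal) -> state
RULES = {1: ("E", 3), 2: ("E", 1)}   # rule -> (lhs, length of rhs)

def parse(tokens):
    stack, toks, i = [0], tokens + ["$"], 0
    while True:
        act = ACTION.get((stack[-1], toks[i]))
        if act is None:
            raise SyntaxError("unexpected %r" % toks[i])
        kind, arg = act
        if kind == "s":                  # (a) O(1) action lookup, one push
            stack.append(arg)
            i += 1
        elif kind == "r":                # (c) pop |rhs| states, then ...
            lhs, n = RULES[arg]
            del stack[-n:]
            stack.append(GOTO[(stack[-1], lhs)])  # (b) O(1) goto lookup
        else:
            return True                  # accept

print(parse(["n", "+", "n"]))  # prints: True
```

Each input token is shifted exactly once, so the total work is linear in the number of tokens, as argued above.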

Are there any ongoing efforts to rewrite the parser (i.e. using another 
algorithm, like a recursive descent parser)?

Why would you want to do that?

greetings, Florian Pflug



Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera
Phil Currier wrote:
 Inspired by this thread [1], and in particular by the idea of storing
 three numbers (permanent ID, on-disk storage position, display
 position) for each column, I spent a little time messing around with a
 prototype implementation of column storage positions to see what kind
 of difference it would make.  The results were encouraging: on a table
 with 20 columns of alternating smallint and varchar(10) datatypes,
 selecting the max() of one of the rightmost int columns across 1
 million rows ran around 3 times faster.

[snipped]

I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.
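
One plausible version of that ordering rule, assumed here purely for illustration (nothing like this is implemented): put fixed-width columns first, widest alignment first, and variable-length columns at the end.

```python
def storage_order(columns):
    """Pick a storage order: fixed-width columns first, sorted by
    descending size/alignment, variable-length (varlena) columns last.
    Each column is (name, fixed_size, is_varlena) - made-up tuples,
    not real catalog entries."""
    fixed = [c for c in columns if not c[2]]
    varlena = [c for c in columns if c[2]]
    fixed.sort(key=lambda c: -c[1])  # stable sort keeps declaration order
    return fixed + varlena

cols = [("a", 0, True),    # varchar(10)
        ("b", 4, False),   # int
        ("c", 2, False),   # smallint
        ("d", 4, False),   # int
        ("e", 2, False)]   # smallint
print([c[0] for c in storage_order(cols)])  # prints: ['b', 'd', 'c', 'e', 'a']
```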

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] New feature request: FlashBack Query

2007-02-21 Thread Florian G. Pflug

Theo Schlossnagle wrote:


On Feb 20, 2007, at 1:40 PM, Tom Lane wrote:


RPK [EMAIL PROTECTED] writes:

I did not mean asking for undo from a life-time log. Since FlashBack
Technology is already there, I just mean that world's most advanced 
database

(PostgreSQL, as they say), must have an optimized way for undoing of at
least a week changes.


You're living in a dream world.  Do you know any Oracle DBs who keep
enough rollback segments to go back a week?


Ours go for a good 6 hours sometimes :-D


Even if it's just one hour, it's certainly better than nothing.
I fully agree that it'd not be acceptable to introduce performance
problems for _everyone_ by introducing flashback. But if you only
experience a drop in performance if you actually enable flashback
(by, let's say, setting vacuum_min_deadtime=1h), then I don't see
why anyone would object to having support for some kind of flashback.

However, I just realized that doing this is much harder than I initially
thought, because catalog access always happens with SnapshotNow, and
e.g. drop table deletes datafiles at commit time, and not during vacuum.

Supporting "begin; drop table mytable; commit; begin; set transaction
flashback 1 hour; select * from mytable; commit" would therefore be
really hard...

greetings, Florian Pflug



Re: [HACKERS] --enable-xml instead of --with-libxml?

2007-02-21 Thread Alvaro Herrera
Nikolay Samokhvalov wrote:
 Now we have --with-libxml (and the USE_LIBXML constant for #ifdef-s), which
 is not quite right: XML support is something more general than
 using the libxml2 library. E.g., some SQL/XML publishing functions (such
 as XMLPI) do not deal with libxml2.
 
 Also, in the future more non-libxml functionality could be added to
 XML support (well, Peter's recent SQL-to-XML mapping functions prove
 it).
 
 I think it'd be better to rename the configure option to --enable-xml and
 USE_LIBXML to ENABLE_XML. I'll do it if there are no objections.

I think it would be better if leaving --with-libxml out (i.e.
compiling without libxml2 support) only disabled those parts of the XML
functionality that require libxml2 for their implementation; the rest of
the stuff should be compiled in regardless of the setting.

Is this not what is done currently?

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Oleg Bartunov wrote:
  It's not so big addition to the gram.y, see a list of commands
  http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.
 
  I looked at the diff file and the major change in gram.y is the creation
  of a new object type FULLTEXT,
 
 You mean four different object types.  I'm not totally clear on bison's
 scaling behavior relative to the number of productions, but I think
 there's no question that this patch will impose a measurable distributed
 penalty on every single query issued to Postgres by any application,
 whether it's heard of tsearch or not.  The percentage overhead would
 be a lot lower if the patch were introducing a similar number of entries
 into pg_proc.

My point is that the grammar splits off all the tsearch2 objects by
prefixing them with CREATE FULLTEXT object, where there are four object
types supported.

But as others have pointed out, the performance of the grammar is
probably not an issue in this case.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] New feature request: FlashBack Query

2007-02-21 Thread Alvaro Herrera
Florian G. Pflug wrote:

 However, I just realized that doing this is much harder than I initially
 thought, because catalog access always happens with SnapshotNow, and
 e.g. drop table deletes datafiles at commit time, and not during vacuum.

Not to mention the likes of CLUSTER and TRUNCATE, which would need
to be taught how to keep the old datafiles for an additional week/hour.

What I don't understand is why people aren't working on improving
contrib/spi/timetravel.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.


That's exactly what I'm proposing.  On table creation, the system
chooses an efficient column order for you.  The next time an ALTER
TABLE operation forces a rewrite, the system would recompute the
column storage order.  I hadn't thought of having CLUSTER also redo
the storage order, but that seems safe since it takes an exclusive
lock on the table.  I'm less sure about whether it's safe to do this
during a TRUNCATE.

phil



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Markus Schiltknecht

Hi,

Florian G. Pflug wrote:

According to http://en.wikipedia.org/wiki/LR_parser processing one
token in any LR(1) parser in the worst case needs to
 a) Do a lookup in the action table with the current (state, token) pair
 b) Do a lookup in the goto table with a (state, rule) pair.
 c) Push one state onto the stack, and pop n states with
n being the number of symbols (tokens or other rules) on the right
hand side of a rule.

a) and b) should be O(1). Processing one token pushes at most one state
onto the stack, so overall no more than N states can be popped off again,
making the whole algorithm O(N), with N being the number of tokens of the
input stream.


Looks correct, thanks. What exactly is Tom worried about, then?

Are there any ongoing efforts to rewrite the parser (i.e. using 
another algorithm, like a recursive descent parser)?

Why would you want to do that?


I recall having read something about rewriting the parser. Together with 
Tom being worried about parser performance, and knowing GCC switched to a 
hand-written parser some time ago, I suspected bison of being slow. 
That's why I asked.


Regards

Markus




Re: [HACKERS] --enable-xml instead of --with-libxml?

2007-02-21 Thread Peter Eisentraut
Nikolay Samokhvalov wrote:
 Also, in the future more non-libxml functionality could be added to
 XML support (well, Peter's recent SQL-to-XML mapping functions
 prove it).

But note that those are not controlled by the --with-libxml switch.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Markus Schiltknecht wrote:
 I didn't find hard facts about runtime complexity of LALR, 
 though (pointers are very welcome).

 a) and b) should be O(1). Processing one token pushes at most one state
 onto the stack, so overall no more than N states can be popped off again,
 making the whole algorithm O(N), with N being the number of tokens of the
 input stream.

Yeah.  I was concerned about the costs involved in trying to pack the
state tables, but it appears that that cost is all paid when the grammar
is compiled --- looking into gram.c, it appears the inner loop contains
just simple array lookups.  Still, bloating of the state tables is
something we ought to pay attention to, because there's a distributed
cost once they no longer fit in a processor's L1 cache.  On my machine
size gram.o is over 360K already ...

regards, tom lane



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Brian Hurt

Markus Schiltknecht wrote:


Hi,

I recall having read something about rewriting the parser. Together 
with Tom being worried about parser performance and knowing GCC has 
switched to a hand written parser some time ago, I suspected bison to 
be slow. That's why I've asked.


This has little to do with performance and everything to do with the 
insanity which is C++:

http://gnu.teleglobe.net/software/gcc/gcc-3.4/changes.html


* A hand-written recursive-descent C++ parser has replaced the
  YACC-derived C++ parser from previous GCC releases. The new
  parser contains much improved infrastructure needed for better
  parsing of C++ source codes, handling of extensions, and clean
  separation (where possible) between proper semantics analysis
  and parsing. The new parser fixes many bugs that were found in
  the old parser.



Short form: C++ is basically not LALR(1) parseable.

Brian



Re: [HACKERS] Column storage positions

2007-02-21 Thread Bruce Momjian
Phil Currier wrote:
 On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  I'd expect the system to be able to reorder the columns to the most
  efficient order possible (performance-wise and padding-saving-wise),
  automatically.  When you create a table, sort the columns to the most
  efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
  end of the tuple; and anything that requires a rewrite of the table
  (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
  do it as well; and do it on TRUNCATE also) again recomputes the most
  efficient order.
 
 That's exactly what I'm proposing.  On table creation, the system
 chooses an efficient column order for you.  The next time an ALTER
 TABLE operation forces a rewrite, the system would recompute the
 column storage order.  I hadn't thought of having CLUSTER also redo
 the storage order, but that seems safe since it takes an exclusive
 lock on the table.  I'm less sure about whether it's safe to do this
 during a TRUNCATE.

Keep in mind we have a patch in process to reduce the varlena length and
reduce alignment requirements, so once that is in, reordering columns
will not be as important.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Florian G. Pflug

Markus Schiltknecht wrote:
Are there any ongoing efforts to rewrite the parser (i.e. using 
another algorithm, like a recursive descent parser)?

Why would you want to do that?


I recall having read something about rewriting the parser. Together with 
Tom being worried about parser performance and knowing GCC has switched 
to a hand written parser some time ago, I suspected bison to be slow. 
That's why I've asked.


I think the case is different for C and C++. The grammars of C and C++
appear to be much more parser-friendly than SQL, making handcrafting
a parser easier, I'd think. And I believe that one of the reasons gcc 
wasn't happy with bison was that it limited the quality of their error 
reporting - which isn't that much of a problem for SQL, since SQL 
statements are rather short compared to your typical C/C++ source file.


Last, but not least, the C and C++ syntax is basically set in stone - at
least now that g++ supports nearly all (or all? don't know) of the C++ 
standard. So it doesn't really matter if changes to the parser are a bit 
more work, because they rarely happen. Postgres seems to add new features 
that change the grammar with every release (which is a good thing!).


greetings, Florian Pflug



Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Phil Currier wrote:

On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.


That's exactly what I'm proposing.  On table creation, the system
chooses an efficient column order for you.  The next time an ALTER
TABLE operation forces a rewrite, the system would recompute the
column storage order.  I hadn't thought of having CLUSTER also redo
the storage order, but that seems safe since it takes an exclusive
lock on the table.  I'm less sure about whether it's safe to do this
during a TRUNCATE.


I think you'd want to have a flag per field that tells you whether the user
has overridden the storage position for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose. The same holds
true for a pg_dump/pg_reload cycle. If none of the fields had their
storage order changed manually, you'd want to reorder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve that.

greetings, Florian Pflug



Re: [HACKERS] New feature request: FlashBack Query

2007-02-21 Thread Florian G. Pflug

Alvaro Herrera wrote:

Florian G. Pflug wrote:


However, I just realized that doing this is much harder than I initially
thought, because catalog access always happens with SnapshotNow, and
e.g. drop table deletes datafiles at commit time, and not during vacuum.


Not to mention the likenesses of CLUSTER and TRUNCATE, which would need
to be taught how to keep the old datafiles for an additional week/hour.

What I don't understand is why people isn't working in improving
contrib/spi/timetravel.


Because it serves a different use case, I think - flashback is an 
administrative tool, not something you design your application around.

Flashback is more similar to PITR recovery than to contrib/spi/timetravel.

greetings, Florian Pflug



Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Tom Lane
Florian G. Pflug [EMAIL PROTECTED] writes:
 Markus Schiltknecht wrote:
 Are there any ongoing efforts to rewrite the parser (i.e. using 
 another algorithm, like a recursive descent parser)?

 Why would you want to do that?

 Last, but not least, the C and C++ syntax is basically set in stone - at
 least now that g++ supports nearly all (or all? don't know) of the C++ 
 standard. So it doesn't really matter if changes to the parser are a bit 
 more work, because they rarely happen. Postgres seems to add new features 
 that change the grammar with every release (which is a good thing!).

Yeah.  I think it would be a pretty bad idea for us to go over to a
handwritten parser: not only greater implementation effort for grammar
changes, but greater risk of introducing bugs.  Bison tells you about it
when you've written something ambiguous ...

regards, tom lane



[HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Jonah H. Harris

As was discussed in several threads, I'd handed over the
responsibility of hierarchical queries to Greg Stark several weeks
ago.  He posted a preliminary patch which I don't believe anyone
looked at.  For 8.3's sake, I wanted to make sure we get the status of
this out on the table so there won't be any surprises like those
related to 8.2.

Where are we at?  Has anyone reviewed the preliminary work?  Any
comments, suggestions, etc?

--
Jonah H. Harris, Software Architect | phone: 732.331.1324
EnterpriseDB Corporation| fax: 732.331.1301
33 Wood Ave S, 3rd Floor| [EMAIL PROTECTED]
Iselin, New Jersey 08830| http://www.enterprisedb.com/



Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera
Bruce Momjian wrote:
 Phil Currier wrote:
  On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
   I'd expect the system to be able to reorder the columns to the most
   efficient order possible (performance-wise and padding-saving-wise),
   automatically.  When you create a table, sort the columns to the most
   efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
   end of the tuple; and anything that requires a rewrite of the table
   (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
   do it as well; and do it on TRUNCATE also) again recomputes the most
   efficient order.
  
  That's exactly what I'm proposing.  On table creation, the system
  chooses an efficient column order for you.  The next time an ALTER
  TABLE operation forces a rewrite, the system would recompute the
  column storage order.  I hadn't thought of having CLUSTER also redo
  the storage order, but that seems safe since it takes an exclusive
  lock on the table.  I'm less sure about whether it's safe to do this
  during a TRUNCATE.
 
 Keep in mind we have a patch in process to reduce the varlena length and
 reduce alignment requirements, so once that is in, reordering columns
 will not be as important.

Yes, but the cache offset stuff is still significant, so there will be
some benefit in putting all the fixed-length attributes at the start of
the tuple, and varlena atts grouped at the end.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:

Keep in mind we have a patch in process to reduce the varlena length and
reduce alignment requirements, so once that is in, reordering columns
will not be as important.


Well, as I understand it, that patch isn't really addressing the same
problem.  Consider this table:
create table foo (a varchar(10), b int, c smallint, d int, e smallint);

There are two problems here:

1) On my machine, each int/smallint column pair takes up 8 bytes.  2
of those 8 bytes are alignment padding wasted on the smallint field.
If we grouped all the smallint fields together within the tuple, that
space would not be lost.

2) Each time you access any of the int/smallint fields, you have to
peek inside the varchar field to figure out its length.  If we stored
the varchar field at the end of the tuple instead, the access times
for all the other fields would be measurably improved, by a factor
that greatly outweighs the small penalty imposed on the varchar field
itself.
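
The padding in point 1 can be checked with a toy offset calculation (alignment simplified to "align each fixed-width field to its own size"; not PostgreSQL's actual alignment macros):

```python
def padded_size(fields):
    """Bytes consumed by fixed-width fields laid out in order, where
    each (size, align) field starts at the next multiple of its align."""
    off = 0
    for size, align in fields:
        off = (off + align - 1) // align * align  # round up to alignment
        off += size
    return off

int4, int2 = (4, 4), (2, 2)
print(padded_size([int4, int2, int4, int2]))  # b, c, d, e as declared: 14
print(padded_size([int4, int4, int2, int2]))  # smallints grouped last: 12
```

Grouping the two smallints recovers the padding bytes without changing any column's declared type.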

My understanding is that the varlena headers patch would potentially
reduce the size of the varchar header (which is definitely worthwhile
by itself), but it wouldn't help much for either of these problems.
Or am I misunderstanding what that patch does?

phil



Re: [HACKERS] Column storage positions

2007-02-21 Thread Bruce Momjian
Phil Currier wrote:
 On 2/21/07, Bruce Momjian [EMAIL PROTECTED] wrote:
  Keep in mind we have a patch in process to reduce the varlena length and
  reduce alignment requirements, so once that is in, reordering columns
  will not be as important.
 
 Well, as I understand it, that patch isn't really addressing the same
 problem.  Consider this table:
 create table foo (a varchar(10), b int, c smallint, d int, e smallint);
 
 There are two problems here:
 
 1) On my machine, each int/smallint column pair takes up 8 bytes.  2
 of those 8 bytes are alignment padding wasted on the smallint field.
 If we grouped all the smallint fields together within the tuple, that
 space would not be lost.

Yes, good point.

 2) Each time you access any of the int/smallint fields, you have to
 peek inside the varchar field to figure out its length.  If we stored
 the varchar field at the end of the tuple instead, the access times
 for all the other fields would be measurably improved, by a factor
 that greatly outweighs the small penalty imposed on the varchar field
 itself.
 
 My understanding is that the varlena headers patch would potentially
 reduce the size of the varchar header (which is definitely worthwhile
 by itself), but it wouldn't help much for either of these problems.
 Or am I misunderstanding what that patch does?
 

Agreed.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Column storage positions

2007-02-21 Thread Bruce Momjian
Alvaro Herrera wrote:
 Bruce Momjian wrote:
  Phil Currier wrote:
   On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.
   
   That's exactly what I'm proposing.  On table creation, the system
   chooses an efficient column order for you.  The next time an ALTER
   TABLE operation forces a rewrite, the system would recompute the
   column storage order.  I hadn't thought of having CLUSTER also redo
   the storage order, but that seems safe since it takes an exclusive
   lock on the table.  I'm less sure about whether it's safe to do this
   during a TRUNCATE.
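A minimal sketch of the kind of ordering rule this implies - the column descriptors here are hypothetical, not PostgreSQL catalog structures:

```python
# Illustrative automatic ordering rule: place fixed-width columns first,
# widest alignment first (which also minimizes padding), and push
# variable-length (varlena) columns to the end of the tuple.
def storage_order(columns):
    # columns: list of (name, alignment, is_varlena)
    fixed = [c for c in columns if not c[2]]
    varlena = [c for c in columns if c[2]]
    fixed.sort(key=lambda c: -c[1])  # descending alignment, stable sort
    return [c[0] for c in fixed + varlena]

cols = [("a", 4, True),   # varchar(10): varlena
        ("b", 4, False),  # int
        ("c", 2, False),  # smallint
        ("d", 4, False),  # int
        ("e", 2, False)]  # smallint
print(storage_order(cols))  # ['b', 'd', 'c', 'e', 'a']
```

The same routine could run at CREATE TABLE time and again at any table rewrite.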
  
  Keep in mind we have a patch in process to reduce the varlena length and
  reduce alignment requirements, so once that is in, reordering columns
  will not be as important.
 
 Yes, but the cache offset stuff is still significant, so there will be
 some benefit in putting all the fixed-length attributes at the start of
 the tuple, and varlena atts grouped at the end.

Agreed.

-- 
  Bruce Momjian  [EMAIL PROTECTED]  http://momjian.us
  EnterpriseDB   http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [HACKERS] Column storage positions

2007-02-21 Thread Martijn van Oosterhout
On Wed, Feb 21, 2007 at 03:59:12PM +0100, Florian G. Pflug wrote:
 I think you'd want to have a flag per field that tells you if the user
 has overridden the storage pos for that specific field. Otherwise,
 the next time you have the chance to optimize the ordering, you might
 throw away changes that the admin has made on purpose.

Why would you want to let the admin have any say at all about the
storage order?

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Proposal: Change of pg_trigger.tg_enabled and adding

2007-02-21 Thread José Orlando Pereira
Jan Wieck wrote:
 On 1/25/2007 7:33 PM, Tom Lane wrote:
 
  1   fires always
  0   fires never
  N   fires in Normal mode
  R   fires in Replica mode
  other letters available for other future mode values?
  
  If you consistently think of origin and replica modes then the
  original proposal is better (using both 0 and O would be Real Bad),
  but your use of normal and replica in the followup makes me wonder
  which terminology is more common.
 
 Yeah, I tried for a long time to stay away from terms like master and
 slave ... but in the end people don't understand you if you talk about
 origin and subscriber or replica. That's how this inconsistent terminology 
 slipped into my vocabulary.

 I personally don't care about the particular values. I could live with A, B,
 C, D. If people find 1, 0, N, R more explanatory, fine. 

Hi, 

Regardless of the actual naming, we at the GORDA project strongly support
this feature. In fact, our current prototype does this, although we do not
allow fine-grained configuration of which triggers get deactivated by
replica mode; it is hardcoded.

(Sorry for being so late to the party, but the subject didn't initially catch
our attention.)

Regards,

-- 
Jose Orlando Pereira



Re: [HACKERS] Priorities for users or queries?

2007-02-21 Thread José Orlando Pereira
Benjamin Arai wrote:
 Is there a way to give priorities to queries or users? Something similar to 
 NICE in Linux. My goal is to give the updating (backend) application a very 
 low priority and give the web application a high priority to avoid
 disturbing the user experience.  

 Thanks in advance!

You might want to look at the following, as they mention a PostgreSQL 
prototype:

http://www.cs.cmu.edu/~natassa/aapubs/conference/priority mechanisms.pdf

Regards,

-- 
Jose Orlando Pereira



Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan

Florian G. Pflug wrote:


I think you'd want to have a flag per field that tells you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose. The same holds
true for a pg_dump/restore cycle. If none of the fields had their
storage order changed manually, you'd want to reorder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve that.



I don't think users should be monkeying with the storage position at 
all. Decisions about that should belong to the engine, not to users. 
Providing a user tweakable knob for this strikes me as a large footgun, 
as well as requiring all sorts of extra checks along the lines you are 
talking of.


cheers

andrew



Re: [HACKERS] Proposal: Commit timestamp

2007-02-21 Thread José Orlando Pereira
On Friday 09 February 2007, Jan Wieck wrote:
 I am not sure, I would have to look at what exactly that hook provides.
 The key to a Lamport timestamp is that it is advancing it commit order
 (plus some other things ... of course). If the hook can guarantee that
 the calls are made always in commit order, serialized without any race
 condition possible, it would probably be suitable.

Actually what we do is a bit stronger. We use the commit hook to enforce an 
externally defined commit order. In our case, this is defined by a group 
communication protocol, which is even allowed to reorder a pair of 
transactions originating from the same replica. Therefore, achieving a commit 
order that is consistent with a local clock should be straightforward.

Regards,

-- 
Jose Orlando Pereira



Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Martijn van Oosterhout kleptog@svana.org wrote:

On Wed, Feb 21, 2007 at 03:59:12PM +0100, Florian G. Pflug wrote:
 I think you'd want to have a flag per field that tells you if the user
 has overridden the storage pos for that specific field. Otherwise,
 the next time you have the chance to optimize the ordering, you might
 throw away changes that the admin has made on purpose.

Why would you want to let the admin have any say at all about the
storage order?


Well, for two reasons:

1) If you have a table with one very-frequently-accessed varchar()
column and several not-frequently-accessed int columns, it might
actually make sense to put the varchar column first.  The system won't
always be able to make the most intelligent decision about table
layout.

2) As I described in my original email, without this capability, I
don't see any good way to perform an upgrade between PG versions
without rewriting each table's data.  Maybe most people aren't doing
upgrades like this right now, but it seems like it will only become
more common in the future.  In my opinion, this is more important than
#1.

But I understand that it's a potential foot-gun, so I'm happy to drop
it.  It would be nice though if there were some ideas about how to
address problem #2 at least.

phil



Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Martijn van Oosterhout wrote:

On Wed, Feb 21, 2007 at 03:59:12PM +0100, Florian G. Pflug wrote:

I think you'd want to have a flag per field that tells you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have the chance to optimize the ordering, you might
throw away changes that the admin has made on purpose. 


Why would you want to let the admin have any say at all about the
storage order?


It wasn't my idea - the OP proposed an alter table table alter column
col set storage position pos command. But if you're gonna decouple
the storage order from the attnum, then why not let the dba tweak it?

Since you have at least two possible optimization targets - for size, or
for fast access to specific fields - creating a one-size-fits-all ordering
rule seems hard...

greetings, Florian Pflug




Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Andrew Dunstan wrote:

Florian G. Pflug wrote:


I think you'd want to have a flag per field that tell you if the user
has overridden the storage pos for that specific field. Otherwise,
the next time you have to chance to optimize the ordering, you might
throw away changes that the admin has done on purpose. The same hold
true for a pg_dump/pg_reload cycle. If none of the fields had their
storage order changed manually, you'd want to reoder them optimally
at dump/reload time. If, however, the admin specified an ordering, you'd
want to preserve that.



I don't think users should be monkeying with the storage position at 
all. Decisions about that should belong to the engine, not to users. 
Providing a user tweakable knob for this strikes me as a large footgun, 
as well as requiring all sorts of extra checks along the lines you are 
talking of.


Maybe you shouldn't support specifying the storage order directly, but
rather through some kind of priority field. The idea would be that
the storage order is determined by sorting the fields according to
the priority field. Groups of fields with the same priority would
get ordered for maximal space efficiency.
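A rough sketch of how such a priority field could drive the ordering - the names and structures here are illustrative, not part of any catalog:

```python
# The admin only assigns a coarse priority per column (lower = earlier in
# the tuple); within each priority group the system still orders for space
# efficiency: fixed-width before varlena, widest alignment first.
def storage_order(columns):
    # columns: list of (name, priority, alignment, is_varlena)
    return [c[0] for c in sorted(
        columns,
        key=lambda c: (c[1], c[3], -c[2]))]

cols = [("hot_text", 0, 4, True),  # admin pins this varlena to the front
        ("b", 1, 4, False),
        ("c", 1, 2, False),
        ("d", 1, 4, False)]
print(storage_order(cols))  # ['hot_text', 'b', 'd', 'c']
```

Columns the admin leaves alone would all share a default priority and get packed automatically.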

greetings, Florian Pflug



Re: [HACKERS] Column storage positions

2007-02-21 Thread Martijn van Oosterhout
On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
 Well, for two reasons:
 
 1) If you have a table with one very-frequently-accessed varchar()
 column and several not-frequently-accessed int columns, it might
 actually make sense to put the varchar column first.  The system won't
 always be able to make the most intelligent decision about table
 layout.

Umm, the point of the exercise is that if you know there are int
columns, then you can skip over them, whereas you can never skip over a
varchar column. So there isn't really any situation where it would be
better to put the varchar first.

 
 don't see any good way to perform an upgrade between PG versions
 without rewriting each table's data.  Maybe most people aren't doing
 upgrades like this right now, but it seems like it will only become
 more common in the future.  In my opinion, this is more important than
 #1.

I don't see this either. For all current tables, the storage position
is the attribute number, no exception. You say:

 because the version X table could
 have dropped columns that might or might not be present in any given
 tuple on disk. 

Whether they're there or not is irrelevant. Dropped columns are not
necessarily empty, but in any case they occupy a storage position until
the table is rewritten. A dump/restore doesn't need to preserve this,
but pg_migrator will need some smarts to handle it. The system will
need to create a column of the appropriate type and drop it to get to
the right state.

If you really want to use pg_dump I'd suggest an option to pg_dump
--dump-dropped-columns which will include the dropped columns in the
CREATE TABLE but drop them immediately after. It's really more a corner
case than anything else.

Have a nice day,
-- 
Martijn van Oosterhout   kleptog@svana.org   http://svana.org/kleptog/
 From each according to his ability. To each according to his ability to 
 litigate.




Re: [HACKERS] Column storage positions

2007-02-21 Thread Simon Riggs
On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
 On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  I'd expect the system to be able to reorder the columns to the most
  efficient order possible (performance-wise and padding-saving-wise),
  automatically.  When you create a table, sort the columns to the most
  efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
  end of the tuple; and anything that requires a rewrite of the table
  (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
  do it as well; and do it on TRUNCATE also) again recomputes the most
  efficient order.
 
 That's exactly what I'm proposing.  On table creation, the system
 chooses an efficient column order for you. 

That's fairly straightforward and beneficial. I much prefer Alvaro's
approach rather than the storage position details originally described.
Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
really don't think you want to go there.

There is a problem: If people do a CREATE TABLE and then issue SELECT *
they will find the columns in a different order. That could actually
break some programs, so it isn't acceptable in all cases. e.g. COPY
without a column-list assumes that the incoming data should be assigned
to the table columns in the same order as the incoming data file.

So if we do this, it should be controllable using a GUC: 
optimize_column_order = off (default) | on
This should be a USERSET, so different users can create tables in either
full control or optimised mode, as they choose.

It should be possible to do that with the minimum number of position
swaps, so that people who have ordered the columns according to usage
frequency would still get what they wanted.

  The next time an ALTER
 TABLE operation forces a rewrite, the system would recompute the
 column storage order.  I hadn't thought of having CLUSTER also redo
 the storage order, but that seems safe since it takes an exclusive
 lock on the table.  I'm less sure about whether it's safe to do this
 during a TRUNCATE.

The GUC should apply to whenever/wherever this optimization occurs.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] HOT for PostgreSQL 8.3

2007-02-21 Thread Simon Riggs
On Mon, 2007-02-12 at 09:24 +0530, Pavan Deolasee wrote:

 On 2/12/07, Heikki Linnakangas [EMAIL PROTECTED] wrote:
 Hannu Krosing wrote:
  On Sunday, 2007-02-11 at 12:35, Tom Lane wrote:
  Hannu Krosing [EMAIL PROTECTED] writes:
   What if we would just reuse the root tuple directly instead of
   turning it into a stub? This would create a cycle of ctid pointers,
   which changes the lookup process from 'follow the ctid chain until
   the end' to 'follow the ctid chain until you reach the start'.
   How do you know which one is newest?
  
  By xmin,cmin of course.
  
   What happens when you have to put a newer version off-page for lack
   of space?
  
  Then this scheme won't work.
 
 Couldn't the ctid of the latest tuple point to the off-page tuple as
 usual?
 
 It could. But then you may lose the reference to older version(s). We
 could do whole-page scans to locate older versions, but that might
 prove too costly.
  
 
 It might be acceptable, if it was only stored on those tuples that are
 (HOT) updated. But it's not clear to me what you're proposing to do
 with the field, anyway. Which tuples would have it, and what would it
 point to?
 
My guess is that Hannu is suggesting a circular chain of HOT-updated
tuples in a page. The index can point to any tuple in the chain. When
an update goes off-page or is a COLD update, t_ctid points to the newer
version as usual. So a tuple which is COLD updated would need two
pointers: one which points to the oldest version in the circular
on-page chain, and another which points to the new version.

I very much like Hannu's idea, but it does present some issues.

My feeling is we need to get some clarity on whether to go this route,
or not, because it takes us away from the pointer-swinging idea. To
maximise our chances of a something good for 8.3 we really need to pick
just one idea now. Maybe we've already done that, by default.

I've tried to analyse all aspects of the discussion to see if we can get
something out of this:

Hannu has described the possibility of having the index point to a tuple
version which isn't the earliest one on the block. That raises the issue
of how will we locate earlier tuple versions when the current Snapshot
cannot see the root tuple?

First off, we only need to locate the earlier tuple when the root was
re-placed by a HOT update - we can tell that, so we're good so far. 

Second, we would like to be able to validate the xmax == xmin rule. If
the original root tuple was dead, so will the prior versions that are
off-block, so nobody will ever follow the chain to the root tuple again
as part of EvalPlanQual. So we need to check newroot.xmax ==
nextinchain.xmin only.

Third we need to locate the appropriate tuple version. 

We have a number of approaches:

a) Block Seq Scan: using a scan to get there is possible, but not good.
The best mechanism is to scan the block backwards (i.e. highest item to
lowest item) picking up the tuple which points to the root, then
scanning backwards picking up the parts of the chain as they are seen
until we get to a visible tuple that we can validate as part of the
chain. If we scanned forwards we'd need to remember the whole chain as
we went, which would be less useful. *But* this doesn't allow us to
validate the xmax == xmin rule, so that kills this sub-option, IMHO.

b) Additional tuple fields

We could have additional fields on the root tuple to help us locate the
true root or oldest version of this tuple on the block. Those
additional header fields are only required on the root tuple - no other
tuple needs this info. So we need not complain about space
savings/losses - the comparison is between extra tuple headers and a
TupleStub, which is a full header width (24 bytes). This would need to
include a copy of the xmin of the original root tuple, to allow us to
validate the xmax == xmin rule (4 bytes). It would also need to include
an in-page pointer to the true root (2 bytes). So additional tuple
fields of 6 bytes would be added to the root tuple, probably maxaligning
to 8 more bytes than normal - a saving of 16 bytes in comparison with
the TupleStub approach. Since there is no separate TupleStub, we can
replace the root when required, not just at full table VACUUM time. The
presence, or not, of those fields can be marked using various info bits.
The NULL bitmap is effectively an optional header field, so the idea
seems workable - or at least as workable as pointer swinging.
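The arithmetic above can be sanity-checked with a few lines (assuming MAXALIGN rounds sizes up to a multiple of 8 bytes):

```python
# Back-of-envelope check of the figures in the paragraph above.
def maxalign(n, align=8):
    # Round n up to the next multiple of align.
    return (n + align - 1) // align * align

extra = 4 + 2                        # copied xmin (4 bytes) + in-page root pointer (2 bytes)
per_root_overhead = maxalign(extra)  # 6 bytes rounds up to 8 extra bytes on the root tuple
tuple_stub = 24                      # a TupleStub costs a full tuple header width
print(tuple_stub - per_root_overhead)  # 16 bytes saved versus the TupleStub approach
```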

The in-page pointer is only ever followed as a result of an index scan.
In all those cases we don't record the xmin,  so we don't check it - we
only need to store the xmin of the tuple pointed to by the in-page
pointer. There may be some off-block tuple versions that 

Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Martijn van Oosterhout kleptog@svana.org wrote:

 don't see any good way to perform an upgrade between PG versions
 without rewriting each table's data.  Maybe most people aren't doing
 upgrades like this right now, but it seems like it will only become
 more common in the future.  In my opinion, this is more important than
 #1.

I don't see this either. For all current tables, the storage position
is the attribute number, no exception. You say:

 because the version X table could
 have dropped columns that might or might not be present in any given
 tuple on disk.

Whether they're there or not is irrelevant. Dropped columns are not
necessarily empty, but in any case they occupy a storage position until
the table is rewritten. A dump/restore doesn't need to preserve this,
but pg_migrator will need some smarts to handle it. The system will
need to create a column of the appropriate type and drop it to get to
the right state.


I agree, a dump/restore that rewrites all the table datafiles doesn't
need to handle this.  And I agree that the system will need to create
dropped columns and then drop them again, that's exactly what I
suggested in fact.  We're talking about pg_migrator-style upgrades
only here.

Say we do this in 8.2:

create table foo (a varchar(10), b int);
insert into foo 
alter table foo add column c int;

At this point, the column storage order is (a, b, c) because 8.2 never
changes storage order.  Then you upgrade to 8.3.  pg_dump now wants to
write out some DDL that will create a table matching the existing
table datafile, since we don't want to have to rewrite it.  pg_dump
prints out:

create table foo (a varchar(10), b int, c int);

The 8.3 system will try to create the table with column order (b, c,
a), since it's trying to optimize storage order, and that won't match
the existing table datafile.  What we need is a way to make sure that
the table matches the original datafile.

Now say that it's not an 8.2-to-8.3 upgrade, but an 8.3-to-8.4
upgrade.  In this case, 8.3 would have the table with storage order
(b, a, c).  (Column c would have been added at the end since it was
added without a default, and didn't force a table rewrite.)  How do
you get pg_dump to print out table creation DDL that will result in a
table matching the existing (b, a, c) table datafile?

This is why I think pg_dump needs to be able to print an ALTER TABLE
statement that will explicitly assign storage positions.  This happens
to have the side-effect of being potentially useful to admins who
might want control over that.

If this only affected 8.2-to-8.3 upgrades, then maybe it's not as
important an issue.  But I think it affects *all* future upgrades,
which is why I'm trying to raise the issue now.



If you really want to use pg_dump I'd suggest an option to pg_dump
--dump-dropped-columns which will include the dropped columns in the
CREATE TABLE but drop them immediately after. It's really more a corner
case than anything else.




Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan

Simon Riggs wrote:

On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
  

On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:


I'd expect the system to be able to reorder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.
  

That's exactly what I'm proposing.  On table creation, the system
chooses an efficient column order for you. 



That's fairly straightforward and beneficial. I much prefer Alvaro's
approach rather than the storage position details originally described.
Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
really don't think you want to go there.

There is a problem: If people do a CREATE TABLE and then issue SELECT *
they will find the columns in a different order. That could actually
break some programs, so it isn't acceptable in all cases. e.g. COPY
without a column-list assumes that the incoming data should be assigned
to the table columns in the same order as the incoming data file.
  


You seem to have missed that we will be separating logical from physical 
ordering. Each attribute will have a permanent id, a physical ordering 
and a logical ordering. You can change either ordering without affecting 
the other.


COPY, SELECT and all user-visible commands should follow the logical 
ordering, not the physical ordering, which should be completely 
invisible to SQL.
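A toy illustration of that separation - the structures here are illustrative, not the actual catalog layout:

```python
# Each attribute keeps a permanent id plus independent logical and
# physical positions. SELECT * and COPY follow the logical order; the
# on-disk tuple layout follows the physical order.
attrs = [
    # (permanent_id, name, logical_pos, physical_pos)
    (1, "a", 1, 3),  # varchar stored last physically
    (2, "b", 2, 1),
    (3, "c", 3, 2),
]

def logical_order(attrs):
    return [a[1] for a in sorted(attrs, key=lambda a: a[2])]

def physical_order(attrs):
    return [a[1] for a in sorted(attrs, key=lambda a: a[3])]

print(logical_order(attrs))   # ['a', 'b', 'c']  -- what SELECT * shows
print(physical_order(attrs))  # ['b', 'c', 'a']  -- on-disk layout
```

Changing one ordering never touches the other, which is the whole point of the split.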


cheers

andrew





Re: [HACKERS] HOT for PostgreSQL 8.3

2007-02-21 Thread Pavan Deolasee

On 2/21/07, Simon Riggs [EMAIL PROTECTED] wrote:



I very much like Hannu's idea, but it does present some issues.



I too liked Hannu's idea initially, but Tom raised a valid concern
that it does not address the basic issue of root tuples. According
to the idea, a DEAD root tuple can be used for a subsequent
update of the same row. But for a very large table, even if it's
updated frequently, a particular row might still not be updated
for a long time. Even when the update happens, we would be
constrained by the length of the new version being the same as or
less than the root tuple's, OR the ability to perform a retail
vacuum of the block.

Did you or anybody else get a chance to think about the other idea
I proposed, of having indirection from the root line pointer? As I
mentioned earlier, I myself haven't thought it through completely,
but on the face of it, it looks doable. It would add a four-byte
overhead per live tuple chain, but IMHO would be much simpler
to implement and not too invasive.

Thanks,
Pavan

--

EnterpriseDB http://www.enterprisedb.com


Re: [HACKERS] Column storage positions

2007-02-21 Thread Simon Riggs
On Wed, 2007-02-21 at 13:16 -0500, Andrew Dunstan wrote:
 Simon Riggs wrote:
  On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:

  On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
  
  I'd expect the system to be able to reorder the columns to the most
  efficient order possible (performance-wise and padding-saving-wise),
  automatically.  When you create a table, sort the columns to the most
  efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
  end of the tuple; and anything that requires a rewrite of the table
  (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
  do it as well; and do it on TRUNCATE also) again recomputes the most
  efficient order.

  That's exactly what I'm proposing.  On table creation, the system
  chooses an efficient column order for you. 
  
 
  That's fairly straightforward and beneficial. I much prefer Alvaro's
  approach rather than the storage position details originally described.
  Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
  really don't think you want to go there.
 
  There is a problem: If people do a CREATE TABLE and then issue SELECT *
  they will find the columns in a different order. That could actually
  break some programs, so it isn't acceptable in all cases. e.g. COPY
  without a column-list assumes that the incoming data should be assigned
  to the table columns in the same order as the incoming data file.

 
 You seem to have missed that we will be separating logical from physical 
 ordering. Each attribute will have a permanent id, a physical ordering 
 and a logical ordering. You can change either ordering without affecting 
 the other.

I missed nothing, AFAICS. My understanding was that Alvaro was proposing
to have just a simple physical re-ordering and that would be altered at
CREATE TABLE time. No complexity of multiple column orderings: nice,
simple and effective. My only addition was to say: must be optional.

 COPY, SELECT and all user-visible commands should follow the logical 
 ordering, not the physical ordering, which should be completely 
 invisible to SQL.

I agree with comments here about the multiple orderings being a horrible
source of bugs, as well as lots of coding even to make it happen at all
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] HOT for PostgreSQL 8.3

2007-02-21 Thread Simon Riggs
On Thu, 2007-02-22 at 00:00 +0530, Pavan Deolasee wrote:
 
 On 2/21/07, Simon Riggs [EMAIL PROTECTED] wrote:
 
 I very much like Hannu's idea, but it does present some
 issues.
 
 
 I too liked Hannu's idea initially, but Tom raised a valid concern
 that it does not address the basic issue of root tuples. According 
 to the idea, a DEAD root tuple can be used for a subsequent
 update of the same row. For a very large table, even if it's updated
 frequently, it is not unlikely that the same row might not be updated
 for a long time. 

That's a fair point and pointer swinging would address that.

However, the space wastage is identical to the current situation. We
need to choose which use-case we are optimising for: the case you
mention would be optimising for high volume but very thinly spread
updates. Personally, I don't see that as the most important use case out
of the many possibilities. The problem we are trying to address is rows
that *are* very frequently updated. There are so many sub-cases its easy
to get distracted about which ones are actually causing usage problems
right now.

Anyway I take the point that pointer swinging is long term necessary,
but my question is whether we need this yet.

 Even when the update happens we would be 
 constrained by the length of the new version being same or less
 than the root tuple OR ability to perform retail-vacuum of the block.

Well, that's a more important question: surely we have agreed that retail
VACUUM is both possible and beneficial?

 Did you or anybody else get a chance to think about the other idea 
 I proposed of having indirection from the root line pointer ? 

Well yes, I saw that, but I was pointing out that we don't need to use
just a single byte if we have a part of the tuple header that only
exists in these circumstances.

 As I
 mentioned earlier, I myself haven't thought through it completely,
 but on the face of it, it looks doable. It would add a four-byte
 overhead per live tuple-chain, but IMHO would be much simpler 
 to implement and not too invasive.

Cool.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] HOT WIP Patch - version 2

2007-02-21 Thread Simon Riggs
On Tue, 2007-02-20 at 09:48 +0200, Hannu Krosing wrote:

 I'm not sure about the we are more concerned about the large tables
 part. I see it more as a device for high-update tables. This may not
 always be the same as large, so there should be some fallbacks for
 the case where you can't get the lock. Maybe just give up and move to
 another page ?

Every design favours one or other of the various use cases.

The worst performance is caused by large tables with random updates,
because that causes substantially more I/O than a smaller table.

A table with substantially more updaters than rows will be unlikely to
ever yield a vacuum-level lock on the block, so the table will
inevitably grow. But because it's fairly small, it won't grow that much
before an autovacuum is triggered to clean it up. The index entries will
still be minimised in this case.

The case of a small number of rows being very heavily updated in an
otherwise very large table will not be well optimised by this simplified
version of HOT. However, that case can still benefit from a Dead Space
Map approach. In view of other work on DSM it was felt that simplifying
HOT was the right thing to do. So DSM is still required.

If no other DSM approaches happen, it should be possible to implement an
80/20 version of DSM by simply running a VACUUM using the current FSM
implementation as the input blockids. In many cases that will yield a
good proportion of the benefits of a full VACUUM. I hope that will be
agreed if there isn't any other agreement on a full DSM solution; it
would be a pity to ignore such a low complexity solution.

Note also that Alvaro's multi-vacuum solution will also be required to
allow VACUUMs to be effective against heavily updated, yet smaller
tables.

So the comment about being "more concerned with large tables" is really a
trade-off to allow a simpler solution, yet in an area that minimises the
performance disadvantages.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Phil Currier wrote:

On 2/21/07, Martijn van Oosterhout kleptog@svana.org wrote:

 don't see any good way to perform an upgrade between PG versions
 without rewriting each table's data.  Maybe most people aren't doing
 upgrades like this right now, but it seems like it will only become
 more common in the future.  In my opinion, this is more important than
 #1.

I don't see this either. For all current tables, the storage position
is the attribute number, no exception. You say:

 because the version X table could
 have dropped columns that might or might not be present in any given
 tuple on disk.

Whether they're there or not is irrelevant. Dropped columns are not
necessarily empty, but in any case they occupy a storage position until
the table is rewritten. A dump/restore doesn't need to preserve this,
but pg_migrator will need some smarts to handle it. The system will
need to create a column of the appropriate type and drop it to get to
the right state.


I agree, a dump/restore that rewrites all the table datafiles doesn't
need to handle this.  And I agree that the system will need to create
dropped columns and then drop them again, that's exactly what I
suggested in fact.  We're talking about pg_migrator-style upgrades
only here.


Why would a pg_migrator style upgrade use pg_dump at all? I assumed it
would rather copy the verbatim data from the old to the new catalog,
only changing it if the layout of the tables in pg_catalog actually 
changed.


greetings, Florian Pflug



Re: [HACKERS] Column storage positions

2007-02-21 Thread Stephan Szabo
On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:

 On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
  Well, for two reasons:
 
  1) If you have a table with one very-frequently-accessed varchar()
  column and several not-frequently-accessed int columns, it might
  actually make sense to put the varchar column first.  The system won't
  always be able to make the most intelligent decision about table
  layout.

 Umm, the point of the exercise is that if you know there are int
 columns, then you can skip over them, whereas you can never skip over a
 varchar column. So there isn't really any situation where it would be
 better to put the varchar first.

IIRC, in the first message in this thread, or another recent thread of
this type, someone tried a reordering example with alternating
smallints and varchar() and found that the leftmost varchar was
actually slower to access after reordering, so I'm not sure that we can
say there isn't a situation where it would affect things.



Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Simon Riggs wrote:

On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:

On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:

I'd expect the system being able to reoder the columns to the most
efficient order possible (performance-wise and padding-saving-wise),
automatically.  When you create a table, sort the columns to the most
efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
end of the tuple; and anything that requires a rewrite of the table
(ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
do it as well; and do it on TRUNCATE also) again recomputes the most
efficient order.

That's exactly what I'm proposing.  On table creation, the system
chooses an efficient column order for you. 


That's fairly straightforward and beneficial. I much prefer Alvaro's
approach to the storage position details originally described.
Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
really don't think you want to go there.

There is a problem: If people do a CREATE TABLE and then issue SELECT *
they will find the columns in a different order. That could actually
break some programs, so it isn't acceptable in all cases. e.g. COPY
without a column-list assumes that the incoming data should be assigned
to the table columns in the same order as the incoming data file.


But the display order (and hence the COPY order) of columns would still 
be determined by attnum, not by some attstoragepos, no?
The column reordering would only apply to the physical storage of 
columns, not to how it's presented to the user I'd think.


The original idea was to add a third column, attdisplaypos, and let the 
user choose the display ordering independent from the unique id 
(attnum), which in turn is independent from the storage position.


For simplicity, the OP said he omitted the display-position part here,
because it's really orthogonal to being able to modify the storage position.

greetings, Florian Pflug



Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan

Simon Riggs wrote:


I agree with comments here about the multiple orderings being a horrible
source of bugs, as well as lots of coding even to make it happen at all
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php

  


I thought we were going with this later proposal of Tom's (on which he's 
convinced me): 
http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php - if 
not I'm totally confused (situation normal). The current thread started 
with this sentence:



Inspired by this thread [1], and in particular by the idea of storing
three numbers (permanent ID, on-disk storage position, display
position) for each column, I spent a little time messing around with a
prototype implementation of column storage positions to see what kind
of difference it would make.


I haven't understood Alvaro to suggest not keeping 3 numbers.

cheers



andrew



Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera
Stephan Szabo escribió:
 On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:
 
  On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
   Well, for two reasons:
  
   1) If you have a table with one very-frequently-accessed varchar()
   column and several not-frequently-accessed int columns, it might
   actually make sense to put the varchar column first.  The system won't
   always be able to make the most intelligent decision about table
   layout.
 
  Umm, the point of the exercise is that if you know there are int
  columns, then you can skip over them, whereas you can never skip over a
  varchar column. So there isn't really any situation where it would be
  better to put the varchar first.
 
 IIRC, in the first message in this thread, or another recent thread of
 this type, someone tried a reordering example with alternating
 smallints and varchar() and found that the leftmost varchar was
 actually slower to access after reordering, so I'm not sure that we can
 say there isn't a situation where it would affect things.

Offsets are cached in tuple accesses, but the caching is obviously
disabled for all attributes past any variable-length attribute.  So if
you put a varlena attr in front, caching is completely disabled for all
attrs (except that first one).  The automatic reordering algorithm must put
all fixed-len attrs at the front, so that their offsets (and that of the
first variable length attr) can be cached.

Did I miss something in what you were trying to say?  I assume you must
already know this.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera
Andrew Dunstan escribió:
 Simon Riggs wrote:
 
 I agree with comments here about the multiple orderings being a horrible
 source of bugs, as well as lots of coding even to make it happen at all
 http://archives.postgresql.org/pgsql-hackers/2006-12/msg00859.php
 
 I thought we were going with this later proposal of Tom's (on which he's 
 convinced me): 
 http://archives.postgresql.org/pgsql-hackers/2006-12/msg00983.php - if 
 not I'm totally confused (situation normal). The current thread started 
 with this sentence:
 
 Inspired by this thread [1], and in particular by the idea of storing
 three numbers (permanent ID, on-disk storage position, display
 position) for each column, I spent a little time messing around with a
 prototype implementation of column storage positions to see what kind
 of difference it would make.
 
 I haven't understood Alvaro to suggest not keeping 3 numbers.

Right, I'm not advocating not doing that -- I'm just saying that the
first step to that could be decoupling physical position from attr id
:-) Logical column ordering (the order to which SELECT * expands)
seems to me to be a different feature.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Column storage positions

2007-02-21 Thread Gregory Stark

Florian G. Pflug [EMAIL PROTECTED] writes:

 Why would a pg_migrator style upgrade use pg_dump at all? I assumed it
 would rather copy the verbatim data from the old to the new catalog,
 only changing it if the layout of the tables in pg_catalog actually changed.

The way pg_migrator works is that it does a pg_dump to move the schema to the new
postgres. Then it transfers the files and drops them into place where the new
schema expects to find them.

So yes, there would be a use case for specifying the physical column layout
when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
just update the physical column numbers itself. It's not like updating system
catalog tables directly is any more of an abstraction violation than swapping
files out from under the database...

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan



Alvaro Herrera wrote:


I haven't understood Alvaro to suggest not keeping 3 numbers.



Right, I'm not advocating not doing that -- I'm just saying that the
 first step to that could be decoupling physical position from attr id
 :-) Logical column ordering (the order to which SELECT * expands)
seems to me to be a different feature.

  


Except in the sense that divorcing the id from the storage order makes 
it possible to do sanely. :-)


Incidentally, I'm sure there would be a full scale revolt if there was a 
suggestion to alter the visible behaviour of SELECT *, COPY and other 
commands that rely on the logical ordering (which is currently, and 
unless we provide commands to alter it would stay as, the definition 
order). That's the order pg_dump should use IMNSHO - it should never 
have to worry about the physical order nor about explicitly setting the 
logical order.


cheers

andrew





Re: [HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Gregory Stark
Jonah H. Harris [EMAIL PROTECTED] writes:

 As was discussed in several threads, I'd handed over the
 responsibility of hierarchical queries to Greg Stark several weeks
 ago.  He posted a preliminary patch which I don't believe anyone
 looked at.  For 8.3's sake, I wanted to make sure we get the status of
 this out on the table so there won't be any surprises like those
 related to 8.2.

 Where are we at?  

The preliminary patch didn't actually do anything recursive. It handled
non-recursive WITH clauses by directly inlining the subquery as if it were a
subquery RangeTable.

Now that's not entirely useless; it's handy syntactic sugar that saves
writing the same query multiple times.

 Has anyone reviewed the preliminary work? Any comments, suggestions, etc?

I had asked questions about whether people thought the places where I was
storing the state were appropriate. I'm not entirely clear on what types of
state should live in the pstate versus in the parse tree versus elsewhere.

Specifically I asked about a problem where I thought using the pstate to store
the scope of the cte names would give the right semantics where they get
inherited by subqueries but pass out of scope for outer queries. However for
some reason I wasn't getting the behaviour I was expecting and subqueries
didn't seem to have them in scope.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



[HACKERS] bug in CHECK(some SIMILAR TO ..)

2007-02-21 Thread Pavel Stehule

Hello

Yesterday I posted a bug, but it is stuck because I am not a member of the 
pgsql-bugs mailing list. Can somebody look at the queue?


Thank You
Pavel Stehule





Re: [HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Gavin Sherry
On Wed, 21 Feb 2007, Jonah H. Harris wrote:

 As was discussed in several threads, I'd handed over the
 responsibility of hierarchical queries to Greg Stark several weeks
 ago.  He posted a preliminary patch which I don't believe anyone
 looked at.  For 8.3's sake, I wanted to make sure we get the status of
 this out on the table so there won't be any surprises like those
 related to 8.2.

 Where are we at?  Has anyone reviewed the preliminary work?  Any
 comments, suggestions, etc?

Yes, I looked at it.

The WITH support seems okay. I guess I'd thought it might be represented
differently internally (not a subquery) but the approach Greg has taken is
probably more straightforward (in that you get a lot of proven code for
free). It should work fine for recursive queries too, if you just re-seed
the param keys for every scan of the 'sub-query'.

I wonder if anyone can think of a good way to cost the recursive side of
the query. I'm still pre-coffee and it hurts my head :).

Gavin



Re: [HACKERS] Column storage positions

2007-02-21 Thread Phil Currier

On 2/21/07, Gregory Stark [EMAIL PROTECTED] wrote:

So yes, there would be a use case for specifying the physical column layout
when pg_migrator is doing the pg_dump/restore. But pg_migrator could probably
just update the physical column numbers itself. It's not like updating system
catalog tables directly is any more of an abstraction violation than swapping
files out from under the database...


If people are ok with that answer, then I'll gladly stop suggesting
that ALTER TABLE be able to explicitly set storage positions.  I was
just trying to avoid forcing a tool like pg_migrator to muck with the
system catalogs.

phil



Re: [HACKERS] Column storage positions

2007-02-21 Thread Alvaro Herrera
Phil Currier escribió:
 On 2/21/07, Gregory Stark [EMAIL PROTECTED] wrote:
 So yes, there would be a use case for specifying the physical column layout
 when pg_migrator is doing the pg_dump/restore. But pg_migrator could 
 probably
 just update the physical column numbers itself. It's not like updating 
 system
 catalog tables directly is any more of an abstraction violation than 
 swapping
 files out from under the database...
 
 If people are ok with that answer, then I'll gladly stop suggesting
 that ALTER TABLE be able to explicitly set storage positions.  I was
 just trying to avoid forcing a tool like pg_migrator to muck with the
 system catalogs.

I am ... that would be pg_migrator's goal anyway.  And it's certainly
going to need knowledge on how to go from one version to the next.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Column storage positions

2007-02-21 Thread Stephan Szabo
On Wed, 21 Feb 2007, Alvaro Herrera wrote:

 Stephan Szabo escribió:
  On Wed, 21 Feb 2007, Martijn van Oosterhout wrote:
 
   On Wed, Feb 21, 2007 at 12:06:30PM -0500, Phil Currier wrote:
Well, for two reasons:
   
1) If you have a table with one very-frequently-accessed varchar()
column and several not-frequently-accessed int columns, it might
actually make sense to put the varchar column first.  The system won't
always be able to make the most intelligent decision about table
layout.
  
   Umm, the point of the exercise is that if you know there are int
   columns, then you can skip over them, whereas you can never skip over a
   varchar column. So there isn't really any situation where it would be
   better to put the varchar first.
 
  IIRC, in the first message in this thread, or another recent thread of
  this type, someone tried a reordering example with alternating
  smallints and varchar() and found that the leftmost varchar was
  actually slower to access after reordering, so I'm not sure that we can
  say there isn't a situation where it would affect things.

 Offsets are cached in tuple accesses, but the caching is obviously
 disabled for all attributes past any variable-length attribute.  So if
 you put a varlena attr in front, caching is completely disabled for all
 attrs (except that first one).  The automatic reordering algorithm must put
 all fixed-len attrs at the front, so that their offsets (and that of the
 first variable length attr) can be cached.

 Did I miss something in what you were trying to say?  I assume you must
 already know this.

I think so. What I was mentioning was that I was pretty sure that there
was a message with someone saying that they actually tried something that
did this and that they found left-most varchar access was slightly slower
after the reordering although general access was faster. I believe the
table case was alternating smallint and varchar columns, but I don't know
what was tested for the retrieval. If that is borne out by other tests,
then for some access patterns, the rearranged version might be slower.



Re: [HACKERS] autovacuum next steps, take 2

2007-02-21 Thread Alvaro Herrera
Ok, scratch that :-)  Another round of braindumping below.


Launcher starts one worker in each database.  This worker is not going
to do vacuum work, just report how much vacuum effort is needed in the
database.  Vacuum effort is measured as the total number of pages in
need of vacuum, being the sum of relpages of all tables and indexes
needing vacuum.  (Note: we weight heap pages the same as index pages.
Is this OK?)

Create a plan for vacuuming all those databases within the constraints
of max_workers.  Databases needing the most work are vacuumed first.
One worker per database.  Thus max_workers databases are being vacuumed
in parallel at this time.  When one database is finished, the launcher
starts a worker in the next database in the list.

When the plan is complete (i.e. the list is empty) we can do the whole
thing again, excluding the databases that are still being vacuumed.

Perhaps we should wait autovacuum_naptime seconds between finishing one
vacuum round in all databases and starting the next.  How do we measure
this: do we start sleeping when the last worker finishes, or when the
list is empty?

Perhaps we should reserve a worker for vacuuming hot tables.  Launcher
then uses max_workers-1 workers for the above plan, and the spare worker
is continuously connecting to one database, vacuuming hot tables, going
away, the launcher starts it again to connect to the next database.
Definitional problem: how to decide what's a hot table?  One idea (the
simplest) is to let the DBA define it.

Thus, at most two workers are on any database: one of them is working on
normal tables, the other on hot tables.

(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables).  This may be overkill.)


Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc.  He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could do that after this
proposal is implemented.  So this take 2 could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).

Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace.  I say we're not :-)  If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate.  I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.


I'd like to know if this responds to the mentioned people's objections.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan



Stephan Szabo wrote:

 What I was mentioning was that I was pretty sure that there
was a message with someone saying that they actually tried something that
did this and that they found left-most varchar access was slightly slower
after the reordering although general access was faster. I believe the
table case was alternating smallint and varchar columns, but I don't know
what was tested for the retrieval. If that turns out to be able to be
supported by other tests, then for some access patterns, the rearranged
version might be slower.

  


I would want to see this very carefully instrumented. Assuming we are 
putting all fixed size objects at the front, which seems like the best 
arrangement, then the position of every fixed field and the fixed 
portion of the position of every varlena field can be precalculated (and 
in the case of the leftmost varlena field that's its complete 
position). So the extra effort in getting to the leftmost varchar field 
should be close to zero if this is done right, ISTM.


cheers

andrew



Re: [HACKERS] Column storage positions

2007-02-21 Thread Florian G. Pflug

Stephan Szabo wrote:

On Wed, 21 Feb 2007, Alvaro Herrera wrote:

Did I miss something in what you were trying to say?  I assume you must
already know this.


I think so. What I was mentioning was that I was pretty sure that there
was a message with someone saying that they actually tried something that
did this and that they found left-most varchar access was slightly slower
after the reordering although general access was faster. I believe the
table case was alternating smallint and varchar columns, but I don't know
what was tested for the retrieval. If that turns out to be able to be
supported by other tests, then for some access patterns, the rearranged
version might be slower.


Here is the original quote:

The results were encouraging: on a table
with 20 columns of alternating smallint and varchar(10) datatypes,
selecting the max() of one of the rightmost int columns across 1
million rows ran around 3 times faster.  The same query on the
leftmost varchar column (which should suffer the most from this
change) predictably got a little slower (about 10%);


What the OP doesn't mention is how the exact layouts looked before
and after the reordering - maybe a nullable fixed-length field
got moved before the varchar column in question, which would disable
offset caching I guess.

Let's say the reordering algorithm is changed to only move non-nullable
fixed-width columns to the left - can anyone see an access pattern that 
would run slower after the reordering? I certainly can't - because the set
of columns for which offset caching works after the reordering would
be a superset of the one for which it works before the reordering.

BTW, this is a good case for why the storage order should - directly or
indirectly - be tweakable. You can either optimize for space, and _then_
for speed - which is what the OP did I think - or first for speed, and then 
for space. If the dba cannot choose the strategy, there will always be 
workloads where the engine does it the wrong way around.


greetings, Florian Pflug



Re: [HACKERS] Plan invalidation design

2007-02-21 Thread Andrew Hammond
On Feb 18, 9:35 am, [EMAIL PROTECTED] (Tom Lane) wrote:
 Russell Smith [EMAIL PROTECTED] writes:

 If you replan an immutable function, aren't you possibly messing up a
  functional index that is using the old function.  Hey, if you change an
  immutable function that has an index, you are in trouble already.

 True.

While I agree that if you change an immutable function used by an
index, your index will break, I do not understand how re-planning it
will cause problems. Is the worry that the index will not pick up on
the new plan?

Andrew




Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan



Florian G. Pflug wrote:


BTW, this is a good case for why the storage order should - directly or
indirectly - be tweakable. You can either optimize for space, and _then_
for speed - which is what the OP did I think - or first for speed, and 
then for space. If the dba cannot choose the strategy, there will 
always be workloads where the engine does it the wrong way around.





Maybe a simple setting on ordering strategy would be OK. The chance of 
mucking it up if you can directly set the physical order seems just too 
great to me.


cheers

andrew



Re: [HACKERS] Column storage positions

2007-02-21 Thread Gregory Stark
Andrew Dunstan [EMAIL PROTECTED] writes:

 I would want to see this very carefully instrumented. Assuming we are putting
 all fixed size objects at the front, which seems like the best arrangement,
 then the position of every fixed field and the fixed portion of the position
 of every varlena field can be precalculated (and in the case of the leftmost
 varlena field that's its complete position). 

I'm not sure what you mean by the fixed portion of the position of every
varlena field. Fields are just stuck one after the other (plus alignment)
skipping nulls. So any field after a null or varlena field can't have its
position cached at all.

Sadly one effect of the 1-byte header varlenas is that the position of the
first varlena can't be cached any more. That's because its alignment depends
on whether you're storing a short varlena or a full 4-byte varlena.

Currently there's an exception for the first column of the table since that's
always at offset 0. We could add another exception and cache the first varlena
if it happens to land on an int-aligned offset without any extra alignment. I'm
not sure if that pays for itself or not though. It still only helps 25% of the
time and only for the first varlena so it doesn't seem terribly important.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] autovacuum next steps, take 2

2007-02-21 Thread Matthew T. O'Connor

Alvaro Herrera wrote:

Ok, scratch that :-)  Another round of braindumping below.


I still think this is a solution in search of a problem.  The main problem 
we have right now is that hot tables can be starved of vacuum.  Most 
of this proposal doesn't touch that.  I would like to see that problem 
solved first, then we can talk about adding multiple workers per 
database or per tablespace etc...



(This idea can be complemented by having another GUC var,
autovacuum_hot_workers, which allows the DBA to have more than one
worker on hot tables (just for the case where there are too many hot
tables).  This may be overkill.)


I think this is more along the lines of what we need first.


Ron Mayer expressed the thought that we're complicating needlessly the
UI for vacuum_delay, naptime, etc.  He proposes that instead of having
cost_delay etc, we have a mbytes_per_second parameter of some sort.
This strikes me as a good idea, but I think we could make that after this
proposal is implemented.  So this take 2 could be implemented, and
then we could switch the cost_delay stuff to using a MB/s kind of
measurement somehow (he says waving his hands wildly).


I agree this is probably a good idea in the long run, but it's lower on the 
priority list and should come later.



Greg Stark and Matthew O'Connor say that we're misdirected in having
more than one worker per tablespace.  I say we're not :-)  If we
consider Ron Mayer's idea of measuring MB/s, but we do it per
tablespace, then we would inflict the correct amount of vacuum pain to
each tablespace, sleeping as appropriate.  I think this would require
workers of different databases to communicate what tablespaces they are
using, so that all of them can utilize the correct amount of bandwidth.


I agree that in the long run it might be better to have multiple workers 
with MB/s throttle and tablespace aware, but we don't have any of that 
infrastructure right now.  I think the piece of low-hanging fruit that 
your launcher concept can solve is the hot table starvation.


My Proposal:  If we require admins to identify hot tables, then:
1) Launcher fires off worker1 into database X.
2) worker1 deals with hot tables first, then regular tables.
3) Launcher continues to launch workers into DB X every autovacuum naptime.
4) worker2 (or 3 or 4 etc...) checks whether it is alone in DB X; if so, it
acts as worker1 did above.  If worker1 is still working in DB X then worker2
looks for hot tables that are being starved because worker1 got busy.
If worker2 finds no hot tables that need work, then worker2 exits.


This seems a very simple solution (given your launcher work) that can 
solve the starvation problem.
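The proposed two-worker behaviour can be sketched as a small decision function
(purely illustrative; the names here are made up, not actual autovacuum code):

```python
def plan_worker_run(alone, hot_tables, plain_tables, needs_vacuum):
    """Decide which tables a newly launched worker should process.

    alone:        True if no other worker is active in this database
    needs_vacuum: predicate saying whether a table is currently due
    """
    if alone:
        # worker1 behaviour: hot tables first, then the regular tables
        return [t for t in hot_tables + plain_tables if needs_vacuum(t)]
    # worker2+ behaviour: only rescue hot tables the busy worker is
    # starving; an empty result means this worker simply exits
    return [t for t in hot_tables if needs_vacuum(t)]
```

For example, a second worker arriving while worker1 is busy would process only
the starved hot tables: `plan_worker_run(False, ["queue"], ["archive"],
lambda t: t == "queue")` yields just `["queue"]`.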



Thoughts?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] autovacuum next steps, take 2

2007-02-21 Thread Gregory Stark

Alvaro Herrera [EMAIL PROTECTED] writes:

 Greg Stark and Matthew O'Connor say that we're misdirected in having
 more than one worker per tablespace.  I say we're not :-)  

I did say that. But your comment about using a high cost_delay was fairly
convincing too. It would be a simpler design and I think you're right. As long
as you raise both cost_delay and cost_limit by enough you should get pretty much
the same sequential i/o rate and not step on each other's toes too much.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Gregory Stark
Gavin Sherry [EMAIL PROTECTED] writes:

 The WITH support seems okay. I guess I'd thought it might be represented
 different internally (not a sub query) but the approach Greg has taken is
 probably more straight forward (in that you get a lot of proven code for
 free). It should work fine for recursive queries too, if you just re-seed
 the param keys for every scan of the 'sub-query'.

I don't think it works for recursive queries, since you can't have the same
executor plan in motion for two different sets of parameters simultaneously.
That's why I was talking about a Memoize node.

It is sufficient for the non-recursive case which might make it worthwhile
putting it in 8.3. But even there users' expectations are probably that the
reason they're writing it as a cte is precisely to avoid duplicate execution.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] autovacuum next steps, take 2

2007-02-21 Thread Joshua D. Drake

 Ron Mayer expressed the thought that we're complicating needlessly the
 UI for vacuum_delay, naptime, etc.  He proposes that instead of having
 cost_delay etc, we have a mbytes_per_second parameter of some sort.
 This strikes me as a good idea, but I think we could make that after this
 proposal is implemented.  So this take 2 could be implemented, and
 then we could switch the cost_delay stuff to using a MB/s kind of
 measurement somehow (he says waving his hands wildly).

vacuum should be a process with the least amount of voodoo. If we can
just have vacuum_delay and vacuum_threshold, where threshold allows an
arbitrary setting of how much bandwidth we will allot to the process,
then that is a beyond wonderful thing.

It is easy to determine how much IO you have, and what you can spare.
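For a rough feel of how the existing cost-based knobs already translate into a
bandwidth cap, here is a back-of-the-envelope sketch. It assumes the 8 kB
default block size and counts only the enforced sleeps, ignoring the time the
I/O itself takes, so it is an upper bound rather than a measurement:

```python
BLOCK_SIZE = 8192  # default PostgreSQL heap page size in bytes

def vacuum_mb_per_sec(cost_limit, cost_delay_ms, cost_per_page):
    """Upper bound on vacuum I/O rate under the cost-delay model:
    each time accumulated cost reaches cost_limit, the worker sleeps
    for cost_delay_ms milliseconds."""
    pages_per_round = cost_limit / cost_per_page
    rounds_per_sec = 1000.0 / cost_delay_ms
    return pages_per_round * rounds_per_sec * BLOCK_SIZE / (1024 * 1024)

# with cost_limit=200, a 20 ms delay, and every page a cache miss
# (vacuum_cost_page_miss = 10): about a 7.8 MB/s ceiling
rate = vacuum_mb_per_sec(200, 20, 10)
```

Going the other way, a DBA-facing MB/s setting could be solved for cost_delay
given a fixed cost_limit, which is essentially Ron Mayer's suggestion.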

Joshua D. Drake




 
 Greg Stark and Matthew O'Connor say that we're misdirected in having
 more than one worker per tablespace.  I say we're not :-)  If we
 consider Ron Mayer's idea of measuring MB/s, but we do it per
 tablespace, then we would inflict the correct amount of vacuum pain to
 each tablespace, sleeping as appropriate.  I think this would require
 workers of different databases to communicate what tablespaces they are
 using, so that all of them can utilize the correct amount of bandwidth.
 
 
 I'd like to know if this responds to the mentioned people's objections.
 


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997
 http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/


---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan



Gregory Stark wrote:

Andrew Dunstan [EMAIL PROTECTED] writes:

  

I would want to see this very carefully instrumented. Assuming we are putting
all fixed size objects at the front, which seems like the best arrangement,
then the position of every fixed field and the fixed portion of the position of
every varlena field can be precalculated (and in the case of the leftmost
varlena field that's its complete position). 



I'm not sure what you mean by the fixed portion of the position of every
varlena field. Fields are just stuck one after the other (plus alignment)
skipping nulls. So any field after a null or varlena field can't have its
position cached at all.
  


I'd forgotten about nulls :-( . Nevertheless, it's hard to think of a 
case where the penalty for shifting fixed size fields to the front is 
going to be very big. If we really wanted to optimise for speed for some 
varlena case, we'd probably need to keep stats on usage patterns, but 
that seems like massive overkill.


cheers

andrew

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Column storage positions

2007-02-21 Thread Gregory Stark
Andrew Dunstan [EMAIL PROTECTED] writes:

 Gregory Stark wrote:
 Andrew Dunstan [EMAIL PROTECTED] writes:

   
 I would want to see this very carefully instrumented. Assuming we are
 putting all fixed size objects at the front, which seems like the best
 arrangement, then the position of every fixed field and the fixed portion
 of the position of every varlena field can be precalculated (and in the
 case of the leftmost varlena field that's its complete position).

 I'm not sure what you mean by the fixed portion of the position of every
 varlena field. Fields are just stuck one after the other (plus alignment)
 skipping nulls. So any field after a null or varlena field can't have its
 position cached at all.

 I'd forgotten about nulls :-( . Nevertheless, it's hard to think of a case
 where the penalty for shifting fixed size fields to the front is going to be
 very big. If we really wanted to optimise for speed for some varlena case,
 we'd probably need to keep stats on usage patterns, but that seems like
 massive overkill.

Oh, certainly, especially since only one varlena could ever be cached and soon
even that one won't be unless it's the very first column in the table. So
really, not worth thinking about.

Well the statistics we have do include the percentage of nulls in each column,
so we can sort columns by fixed width not null first, then fixed width
nullable by decreasing probability of being null, then varlenas.

But there's a tradeoff here. The more we try to optimize for cacheable offsets
the more difficult it will be to pack away the alignments.

Consider something like:

int     not null
boolean not null
int     null
text    null

If we want we can pack this as int,int,boolean,text and (as long as the text
gets a 1-byte header) have them packed with no alignment.

But then the boolean can't use the cache whenever the int column is null. (the
offset will still be cached but it won't be used unless the int column is
non-null).

Alternatively we can pack this as int,boolean,int,text in which case the
boolean will *always* use the cache but it will be preceded by three wasted
padding bytes.

I tend to think the padding is more important than the caching because in
large systems the i/o speed dominates. But that doesn't mean the cpu cost is
negligible either. Especially on very wide tables.
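The two layouts can be compared with a quick sketch of the alignment
arithmetic (illustrative only; it assumes 4-byte int alignment, 1-byte
boolean, and a text value that gets a 1-byte varlena header and so needs no
alignment):

```python
def row_layout(columns):
    """columns: list of (name, size, align).  Returns (offsets, total_size)
    after inserting the padding each type's alignment requires."""
    offsets, pos = {}, 0
    for name, size, align in columns:
        pos += -pos % align          # pad up to the next aligned offset
        offsets[name] = pos
        pos += size
    return offsets, pos

# the packed order from the example vs. the cache-friendly order
packed   = [("int1", 4, 4), ("int2", 4, 4), ("bool", 1, 1), ("text", 1, 1)]
cachable = [("int1", 4, 4), ("bool", 1, 1), ("int2", 4, 4), ("text", 1, 1)]

# packed fits in 10 bytes; the cache-friendly order needs 13, i.e. the
# three wasted padding bytes mentioned above
```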

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] log ssl mode with connections?

2007-02-21 Thread Henry B. Hotz
Anyone making that kind of decision probably wants a generic
"connection is encrypted" flag.  It could be true if a GSSAPI
connection has negotiated use of a security layer.


Of course I don't have my GSSAPI patches working as well as the SASL  
ones were yet, and I haven't started on adding security layers yet  
either.


On Jan 30, 2007, at 12:56 PM, Magnus Hagander wrote:


On Tue, Jan 30, 2007 at 12:35:01PM -0500, Kris Jurka wrote:



On Tue, 30 Jan 2007, Andrew Dunstan wrote:

If I am allowing both SSL and non-SSL I might like to know which
is used by a particular connection.



Other places I've heard people ask for this info:

1) pg_stat_activity to see who's currently connected and how.

2) Via a function (boolean am_i_using_ssl()) so they can make
security decisions in views or procedural code.


That information is available to the client in the form of the API call
PQgetssl(). It will return NULL if no SSL is in use, or something other
than NULL if it is (an SSL * pointer, but you don't need to know that if
you just want to know if you're on SSL or not).
IIRC it was originally discussed to expose it as a callable function, but
it was decided that it makes a lot more sense to provide it in the
client library. I don't know how many other client libraries provide the
SSL information stuff.

//Magnus


The opinions expressed in this message are mine,
not those of Caltech, JPL, NASA, or the US Government.
[EMAIL PROTECTED], or [EMAIL PROTECTED]



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Gavin Sherry
On Wed, 21 Feb 2007, Gregory Stark wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:

  The WITH support seems okay. I guess I'd thought it might be represented
  different internally (not a sub query) but the approach Greg has taken is
  probably more straight forward (in that you get a lot of proven code for
  free). It should work fine for recursive queries too, if you just re-seed
  the param keys for every scan of the 'sub-query'.

 I don't think it works for recursive queries. Since you can't have the same
 executor plan in motion for two different sets of parameters simultaneously.
 That's why I was talking about a Memoize node.

Can you elaborate on the 'two different sets of parameters' bit? I'm still
without coffee.

 It is sufficient for the non-recursive case which might make it worthwhile
 putting it in 8.3. But even there user's expectations are probably that the
 reason they're writing it as a cte is precisely to avoid duplicate execution.

I wonder if the planner should decide that?

Thanks,

Gavin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Gregory Stark
Gavin Sherry [EMAIL PROTECTED] writes:

 Can you elaborate on the 'two different sets of parameters' bit? I'm still
 without coffee.

The spec allows for arbitrarily complex recursive query structures. Including
mutually recursive queries, and even non-linearly recursive queries. I found
grokking them requires far stronger brews than coffee :)

But in a simple recursive tree search you have a node which wants to do a join
between the output of tree level n against some table to produce tree level
n+1. It can't simply execute the plan to produce tree level n since that's the
same tree it's executing itself. If it calls the Init method on itself it'll
lose all its state.

There's another reason it can't just execute the previous node. You really
don't want to recompute all the results for level n when you go to produce
level n+1. You want to keep them around from the previous iteration. Otherwise
you have an n^2 algorithm.

Think of the fibonacci sequence algorithm: if you implement it recursively the
naive way you have to return all the way back to the beginning to calculate
each number. But if you remember the last two you can calculate it directly
without recalculating all the previous numbers repeatedly.
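The same point in miniature: a linear recursive query can be evaluated
iteratively, feeding only the previous level (the working table) into the
recursive step instead of re-running the plan from the top. A toy sketch of
that evaluation loop (not the actual executor design):

```python
def recursive_cte(seed_rows, step):
    """Evaluate a simple linear recursive query iteratively: keep only
    the previous level's rows and hand them to the recursive term.
    Stops when a level produces no new rows (assumes acyclic input)."""
    result = list(seed_rows)
    working = list(seed_rows)        # the "working table": level n only
    while working:
        working = step(working)      # level n+1 from level n, nothing else
        result.extend(working)
    return result

# tree walk: each level's rows are the children of the previous level
edges = {1: [2, 3], 2: [4], 3: [], 4: []}
nodes = recursive_cte([1], lambda lvl: [c for n in lvl for c in edges[n]])
# nodes == [1, 2, 3, 4]
```

Because each iteration sees only the previous level, the work per level is
proportional to that level's size, avoiding the n^2 recomputation described
above.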


 It is sufficient for the non-recursive case which might make it worthwhile
 putting it in 8.3. But even there user's expectations are probably that the
 reason they're writing it as a cte is precisely to avoid duplicate execution.

 I wonder if the planner should decide that?

That's one option. For the non-recursive case we could inline the cte subquery
everywhere it's referenced and then add smarts to the planner to find
identical subqueries and have a heuristic to determine when it would be
advantageous to calculate the result once.

The alternative is to retain them as references to a single plan. Then have a
heuristic for when to inline them.

In neither case is a heuristic going to be particularly good. The problem is
that for any reasonably complex plan it'll be cheaper to execute it only once
than multiple times. Unless there's an advantage to be gained by inlining it
such as being able to push conditions down into it. But the only way to find
out if that will be possible would be to try planning it and see.

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Status of Hierarchical Queries

2007-02-21 Thread Gavin Sherry
On Thu, 22 Feb 2007, Gregory Stark wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:

  Can you elaborate on the 'two different sets of parameters' bit? I'm still
  without coffee.

 The spec allows for arbitrarily complex recursive query structures. Including
 mutually recursive queries, and even non-linearly recursive queries. I found
 grokking them requires far stronger brews than coffee :)

Hehe.

 But in a simple recursive tree search you have a node which wants to do a join
 between the output of tree level n against some table to produce tree level
 n+1. It can't simply execute the plan to produce tree level n since that's the
 same tree it's executing itself. If it calls the Init method on itself it'll
 lose all its state.

 There's another reason it can't just execute the previous node. You really
 don't want to recompute all the results for level n when you go to produce
 level n+1. You want to keep them around from the previous iteration. Otherwise
 you have an n^2 algorithm.

Right. When I've spent some idle cycles thinking through this in the past
I figured that in a non-trivial query, we'd end up with a bunch of
materialisations, one for each level of recursion. That sounds very ugly.

  It is sufficient for the non-recursive case which might make it worthwhile
  putting it in 8.3. But even there user's expectations are probably that the
  reason they're writing it as a cte is precisely to avoid duplicate 
  execution.
 
  I wonder if the planner should decide that?

 That's one option. For the non-recursive case we could inline the cte subquery
 everywhere it's referenced and then add smarts to the planner to find
 identical subqueries and have a heuristic to determine when it would be
 advantageous to calculate the result once.

 The alternative is to retain them as references to a single plan. Then have a
 heuristic for when to inline them.

 In neither case is a heuristic going to be particularly good. The problem is
 that for any reasonably complex plan it'll be cheaper to execute it only once
 than multiple times. Unless there's an advantage to be gained by inlining it
 such as being able to push conditions down into it. But the only way to find
 out if that will be possible would be to try planning it and see.

Pushing down predicates was the exact idea I had in mind.

Thanks,

Gavin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Column storage positions

2007-02-21 Thread elein
On Wed, Feb 21, 2007 at 08:33:10PM +0100, Florian G. Pflug wrote:
 Simon Riggs wrote:
 On Wed, 2007-02-21 at 09:25 -0500, Phil Currier wrote:
 On 2/21/07, Alvaro Herrera [EMAIL PROTECTED] wrote:
 I'd expect the system being able to reoder the columns to the most
 efficient order possible (performance-wise and padding-saving-wise),
 automatically.  When you create a table, sort the columns to the most
 efficient order; ALTER TABLE ADD COLUMN just puts the new columns at the
 end of the tuple; and anything that requires a rewrite of the table
 (ALTER TABLE ... ALTER TYPE for example; would be cool to have CLUSTER
 do it as well; and do it on TRUNCATE also) again recomputes the most
 efficient order.
 That's exactly what I'm proposing.  On table creation, the system
 chooses an efficient column order for you. 
 
 That's fairly straightforward and beneficial. I much prefer Alvaro's
 approach rather than the storage position details originally described.
 Moreover, you'd need to significantly re-write lots of ALTER TABLE and I
 really don't think you want to go there.
 
 There is a problem: If people do a CREATE TABLE and then issue SELECT *
 they will find the columns in a different order. That could actually
 break some programs, so it isn't acceptable in all cases. e.g. COPY
 without a column-list assumes that the incoming data should be assigned
 to the table columns in the same order as the incoming data file.
 
 But the display order (and hence the COPY order) of columns would still 
 be determined by attnum, not by some attstoragepos, no?
 The column reordering would only apply to the physical storage of 
 columns, not to how it's presented to the user I'd think.
 
 The original idea was to add a third column, attdisplaypos, and let the 
 user choose the display ordering independent from the unique id 
 (attnum), which in turn is independent from the storage position.
 
 For simplicity, the OP said he omitted the display-position part here,
 because it's really orthogonal to being able to modify the storage position.
 

IMHO display order is very important to users.  First, don't
break SELECT *, no matter how bad it is to code that.  Next, don't
break COPY or pg_dump/restore.  We've fielded a lot of questions on
the ordering of columns for display and simplicity reasons.

The storage order is orthogonal to the display order.  Display order can be
handled in attnum and the new storage order can be the new column.

--elein

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Column storage positions

2007-02-21 Thread Andrew Dunstan

elein wrote:


The storage order is orthogonal to the display order.  Display order can be
handled in attnum and the new storage order can be the new column.


  


If you review the earlier discussion you will see that it is proposed 
(by Tom) to have 3 numbers (i.e. 2 new cols): an immutable id and 
mutable storage/physical order and display/logical order.


cheers

andrew


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[HACKERS] SCMS question

2007-02-21 Thread Warren Turkal
Are there any plans to move to another SCMS in the future? I am curious, I 
guess.

wt
-- 
Warren Turkal (w00t)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring

2007-02-21 Thread Robert Treat
On Monday 19 February 2007 22:59, Greg Smith wrote:
 I have a WIP patch that adds the main detail I have found I need to
 properly tune checkpoint and background writer activity.  I think it's
 almost ready to submit (you can see the current patch against 8.2 at
 http://www.westnet.com/~gsmith/content/postgresql/patch-checkpoint.txt )
 after making it a bit more human-readable.  But I've realized that along
 with that, I need some guidance in regards to what log level is
 appropriate for this information.

 An example works better than explaining what the patch does:

 2007-02-19 21:53:24.602 EST - DEBUG:  checkpoint required (wrote
 checkpoint_segments)
 2007-02-19 21:53:24.685 EST - DEBUG:  checkpoint starting
 2007-02-19 21:53:24.705 EST - DEBUG:  checkpoint flushing buffer pool
 2007-02-19 21:53:24.985 EST - DEBUG:  checkpoint database fsync starting
 2007-02-19 21:53:42.725 EST - DEBUG:  checkpoint database fsync complete
 2007-02-19 21:53:42.726 EST - DEBUG:  checkpoint buffer flush dirty=8034
 write=279956 us sync=17739974 us

 Remember that Load distributed checkpoint discussion back in December?
 You can see exactly how bad the problem is on your system with this log
 style (this is from a pgbench run where it's postively awful--it really
 does take over 17 seconds for the fsync to execute, and there are clients
 that are hung the whole time waiting for it).

 I also instrumented the background writer.  You get messages like this:

 2007-02-19 21:58:54.328 EST - DEBUG:  BGWriter Scan All - Written = 5/5
 Unscanned = 23/54

 This shows that we wrote (5) the maximum pages we were allowed to write
 (5) while failing to scan almost half (23) of the buffers we meant to look
 at (54).  By taking a look at this output while the system is under load,
 I found I was able to do bgwriter optimization that used to take me days
 of frustrating testing in hours.  I've been waiting for a good guide to
 bgwriter tuning since 8.1 came out.  Once you have this, combined with
 knowing how many buffers were dirty at checkpoint time because the
 bgwriter didn't get to them in time (the number you want to minimize), the
 tuning guide practically writes itself.

 So my question is...what log level should all this go at?  Right now, I
 have the background writer stuff adjusting its level dynamically based on
 what happened; it logs at DEBUG2 if it hits the write limit (which should
 be a rare event once you're tuned properly), DEBUG3 for writes that
 scanned everything they were supposed to, and DEBUG4 if it scanned but
 didn't find anything to write.  The source of checkpoint information logs
 at DEBUG1, the fsync/write info at DEBUG2.

 I'd like to move some of these up.  On my system, I even have many of the
 checkpoint messages logged at INFO (the source of the checkpoint and the
 total write time line).  It's a bit chatty, but when you get some weird
 system pause issue it makes it easy to figure out if checkpoints were to
 blame.  Given how useful I feel some of these messages are to system
 tuning, and to explaining what currently appears as inexplicable pauses, I
 don't want them to be buried at DEBUG levels where people are unlikely to
 ever see them (I think some people may be concerned about turning on
 things labeled DEBUG at all).  I am aware that I am too deep into this to
 have an unbiased opinion at this point though, which is why I ask for
 feedback on how to proceed here.


My impression of this is that DBAs would typically want to run this for a 
short period of time to get their systems tuned and then it pretty much 
becomes chatter.  Can you come up with an idea of what information DBAs need 
to know?  Would it be better to hide all of this logging behind a GUC that 
can be turned on or off (log_bgwriter_activity)? Maybe you could just 
reinstrument it as dtrace probes so a separate stand-alone process could pull 
the information as needed? :-)

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] Column storage positions

2007-02-21 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Alvaro Herrera wrote:
 Right, I'm not advocating not doing that -- I'm just saying that the
 first step to that could be decoupling physical position with attr id
 :-) Logical column ordering (the order in which SELECT * expands to)
 seems to me to be a different feature.

 Except in the sense that divorcing the id from the storage order makes 
 it possible to do sanely. :-)

They are different features, but they are going to hit all the same
code, because the hardest part of this remains making sure that every
piece of the code is using the right kind of column number.  The
suggestion I posted awhile ago amounts to saying that we might be able
to solve that by default, by making sure that only one definition of
column number is relevant to the majority of the backend and we can
figure out exactly where the other definitions need to apply.  But
that's handwaving until someone actually does it :-(

In any case I think it's foolish not to tackle both issues at once.
We know we'd like to have both features and we know that all the same
bits of code need to be looked at to implement either.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


[HACKERS] infinity and DATE type

2007-02-21 Thread Warren Turkal
Is there a reason why the DATE type does not support infinity, something like 
the TIMESTAMP type?

wt
-- 
Warren Turkal (w00t)

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] SCMS question

2007-02-21 Thread Tom Lane
Warren Turkal [EMAIL PROTECTED] writes:
 Are there any plans to move to another SCMS in the future?

Not particularly.  We keep hearing from various advocates that
$foo-is-better-than-CVS, but the preferred value of $foo changes with
amazing frequency, and none of the arguments seem to justify the pain
of converting.

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Peter Eisentraut
Oleg Bartunov wrote:
 It's not so big addition to the gram.y, see a list of commands
 http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.

In that proposed syntax, I would drop all the '=', ',', '(', and ')'
characters.  They don't seem necessary and they are untypical for SQL
commands.  I'd compare with CREATE FUNCTION or CREATE SEQUENCE for SQL
commands that do similar things.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Peter Eisentraut
Joshua D. Drake wrote:
 This is like the third time we have been around this problem. The
 syntax is clear and reasonable imo.

But others have differing opinions.

 Can we stop arguing about it and just include? If there are specific
 issues beyond syntax that is one
 thing, but that this point it seems we are arguing for the sake of
 arguing.

How is that worse than wanting to abort the discussion for the sake of 
aborting the discussion?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] infinity and DATE type

2007-02-21 Thread Tom Lane
Warren Turkal [EMAIL PROTECTED] writes:
 Is there a reason why the DATE type does not support infinity

Nobody got round to it.  I believe it's listed in the TODO file ...

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [HACKERS] tsearch in core patch, for inclusion

2007-02-21 Thread Oleg Bartunov

On Thu, 22 Feb 2007, Peter Eisentraut wrote:


Oleg Bartunov wrote:

It's not so big addition to the gram.y, see a list of commands
http://mira.sai.msu.su/~megera/pgsql/ftsdoc/sql-commands.html.


In that proposed syntax, I would drop all the '=', ',', '(', and ')'
characters.  They don't seem necessary and they are untypical for SQL
commands.  I'd compare with CREATE FUNCTION or CREATE SEQUENCE for SQL
commands that do similar things.


that looks reasonable.

Regards,
Oleg
_
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] infinity and DATE type

2007-02-21 Thread Warren Turkal
On Thursday 22 February 2007 00:20, Tom Lane wrote:
 Nobody got round to it.  I believe it's listed in the TODO file ...

It's not at [1]. Should someone add it to the TODO?

[1]http://www.postgresql.org/docs/faqs.TODO.html

wt
-- 
Warren Turkal (w00t)

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] SCMS question

2007-02-21 Thread Warren Turkal
On Thursday 22 February 2007 00:05, Tom Lane wrote:
 Not particularly.  We keep hearing from various advocates that
 $foo-is-better-than-CVS, but the preferred value of $foo changes with
 amazing frequency, and none of the arguments seem to justify the pain
 of converting.

Some of the other options just seem to have much nicer user interfaces. I was 
playing with Bacula and they just changed to Subversion. It really is much 
nicer than the CVS they used to use. Git seems interesting as well. I guess 
Subversion and git are the two big ones right now. What would you look for if 
you were to check out new SCMSes? Would you want distributed like Git or 
centralized like CVS/Subversion?

wt
-- 
Warren Turkal (w00t)

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate


Re: [HACKERS] infinity and DATE type

2007-02-21 Thread Tom Lane
Warren Turkal [EMAIL PROTECTED] writes:
 On Thursday 22 February 2007 00:20, Tom Lane wrote:
 Nobody got round to it. I believe it's listed in the TODO file ...

 It's not at [1]. Should someone add it to the TODO?

I see

* Dates and Times

o Allow infinite dates and intervals just like infinite timestamps

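(For reference, timestamp already accepts the special 'infinity' input value, which sorts after every finite timestamp; the TODO item asks for the same in date and interval. A quick illustration:)

```sql
-- 'infinity' is already valid timestamp input.
SELECT 'infinity'::timestamp;          -- infinity
SELECT 'infinity'::timestamp > now();  -- t
```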

regards, tom lane



Re: [HACKERS] SCMS question

2007-02-21 Thread Tom Lane
Warren Turkal [EMAIL PROTECTED] writes:
 On Thursday 22 February 2007 00:05, Tom Lane wrote:
 Not particularly. We keep hearing from various advocates that
 $foo-is-better-than-CVS, but the preferred value of $foo changes with
 amazing frequency, and none of the arguments seem to justify the pain
 of converting.

 Some of the other options just seem to have much nicer user interfaces. I was 
 playing with Bacula and they just changed to Subversion. It really is much 
 nicer than the CVS they used to use. Git seems interesting as well. I guess 
 Subversion and git are the two big ones right now. What would you look for if 
 you were to check out new SCMSes? Would you want distributed like Git or 
 centralized like CVS/Subversion?

I think you just made my point for me.

regards, tom lane



Re: [HACKERS] infinity and DATE type

2007-02-21 Thread Gavin Sherry
On Thu, 22 Feb 2007, Warren Turkal wrote:

 On Thursday 22 February 2007 00:20, Tom Lane wrote:
  Nobody got round to it.  I believe it's listed in the TODO file ...

 It's not at [1]. Should someone add it to the TODO?

 [1]http://www.postgresql.org/docs/faqs.TODO.html

Yes it is.

Allow infinite dates and intervals just like infinite timestamps

Gavin



Re: [HACKERS] SCMS question

2007-02-21 Thread Warren Turkal
On Thursday 22 February 2007 00:42, you wrote:
 I think you just made my point for me.

I wasn't trying to convince so much as get an opinion.

wt
-- 
Warren Turkal (w00t)
