Re: [HACKERS] performance issues

2002-08-03 Thread Christopher Kings-Lynne

 So I am still interested in PostgreSQL's ability to deal with
 multimillion-record tables.

Postgres has no problem with multimillion row tables - many people on this
list run them - just don't do sequential scans on them if you can't afford
the time it takes.

Chris



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] PITR, checkpoint, and local relations

2002-08-03 Thread Christopher Kings-Lynne

 The main area where it seems to get heavy use is during index builds,
 and for 'CREATE TABLE AS SELECT...'.

 So I will remove the local buffer manager as part of the PITR patch,
 unless there is further objection.

Would someone mind filling me in as to what the local buffer manager is and
how it is different from (and not as useful as) the shared buffer manager?

Chris



---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-08-03 Thread Curt Sampson

On 2 Aug 2002, Hannu Krosing wrote:

 On Fri, 2002-08-02 at 12:15, Curt Sampson wrote:
  On 2 Aug 2002, Hannu Krosing wrote:
 
   Could you brief me on why they discourage a syntactical frontend to a
   feature that is trivially implemented ?
 
  What's the point of adding it? It's just one more thing to learn.

 You don't have to learn it if you don't want to. But once you do, you
 have a higher level way of expressing a whole class of models.

Perhaps this is the problem. I disagree that it's a higher level.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] WAL file location

2002-08-03 Thread Curt Sampson

On Fri, 2 Aug 2002, Thomas Lockhart wrote:

 [Symlinks] don't scale,

Given that we have only one directory for the log file, this would not
appear to be a problem.

 they are not portable,

That's certainly a problem if we intend to run on systems without them.

 and it is difficult for
 applications (like the Postgres backend) to know that they are dealing
 with a symlink or a real file.

Er...that's the whole point of symlinks.

Not that I really care either way about the whole issue, so long
as we do *something*.

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] getpid() function

2002-08-03 Thread Hannu Krosing

On Sat, 2002-08-03 at 01:25, Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
   Perhaps a more relevant question is why are we cluttering the namespace
   with any such function at all?  What's the use case for it?
 
  It was requested because it is exposed in libpq and people need it to
  generate unique names and stuff like that from within psql and
  functions.  Seems like a valid use for the pid.
 
 The sole reason libpq exposes it is so that you can tell a self-notify
 from an incoming notify.  (ie, given you are LISTENing on a condition
 that both you and other clients send NOTIFYs for, is this particular
 message one that you sent yourself, or not?  Compare the originator PID
 in the NOTIFY message to your backend_pid to find out.)  I put that
 feature in back around 6.4, because it allowed some important
 optimizations in an app I had that used LISTEN/NOTIFY a lot.
 
 Since NOTIFY messages aren't even visible at the SQL level, the above is
 not a reason for making PIDs visible at the SQL level.

When I last showed how a backend_pid function can be trivially
defined as 

hannu=# create function getpid() returns int
hannu-# as '/lib/libc.so.6','getpid' language 'C';
CREATE
hannu=# select getpid();
 getpid 
--------
   2832
(1 row)

You claimed that NOTIFY uses some _other_ backend id (i.e. not process
id).

But when I tested it now, it seems that this is not the case; notify
does use the actual process id.

hannu=# listen a;
LISTEN
hannu=# notify a;
NOTIFY
Asynchronous NOTIFY 'a' from backend with pid 2832 received.

 
 So I'm still unconvinced that we need or want this ...
 

And you can do it trivially as long as we support old-style C functions
anyway.


Hannu


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] Table inheritance versus views

2002-08-03 Thread Hannu Krosing

On Fri, 2002-08-02 at 22:39, [EMAIL PROTECTED] wrote:
 On 29 Jul 2002 18:27:40 MDT, the world broke into rejoicing as
 Stephen Deasey [EMAIL PROTECTED]  said:
  Curt Sampson wrote:
  I'm still waiting to find out just what advantage table inheritance
  offers. I've asked a couple of times here, and nobody has even
  started to come up with anything.
 
  Table inheritance offers data model extensibility.  New (derived) tables
  can be added to the system, and will work with existing code that
  operates on the base tables, without having to hack up all the code.
 
 But it kind of begs the question of why you're creating the new table in
 the first place.
 
 The new table certainly _won't_ work with existing code, at least from
 the perspective that the existing code doesn't _refer_ to that table.

The beauty of OO is that it does not need to:

hannu=# create table animal (name text, legcount int);
CREATE
hannu=# insert into animal values('pig',4);
INSERT 34183 1
hannu=# select * from animal;
 name | legcount 
------+----------
 pig  |        4
(1 row)

hannu=# create table bird (wingcount int) inherits (animal);
CREATE
hannu=# insert into bird values('hen',2,2);
INSERT 34189 1
hannu=# select * from animal;
 name | legcount 
------+----------
 pig  |        4
 hen  |        2
(2 rows)
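A side note on the flip side of the same example: existing queries on the
base table automatically see rows of the new subtype, and ONLY restricts a
query to the base table alone:

hannu=# select * from only animal;
 name | legcount 
------+----------
 pig  |        4
(1 row)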

--
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-08-03 Thread Hannu Krosing

On Sat, 2002-08-03 at 16:32, Curt Sampson wrote:
 On 2 Aug 2002, Hannu Krosing wrote:
 
  On Fri, 2002-08-02 at 12:15, Curt Sampson wrote:
   On 2 Aug 2002, Hannu Krosing wrote:
  
    Could you brief me on why they discourage a syntactical frontend to a
feature that is trivially implemented ?
  
   What's the point of adding it? It's just one more thing to learn.
 
  You don't have to learn it if you don't want to. But once you do, you
  have a higher level way of expressing a whole class of models.
 
 Perhaps this is the problem. I disagree that it's a higher level.

I don't mean morally higher ;)

Just more concise and easier to grasp, same as VIEW vs. TABLE + ON xxx
DO INSTEAD rules.

With INSTEAD rules you can do more than a VIEW does, but when all you
want is a VIEW, then it is easier to define a VIEW; thus VIEW is a
higher-level construct than TABLE + ON xxx DO INSTEAD.
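As a minimal sketch (hypothetical names; the "_RETURN" rule is what the
backend itself uses to implement a view):

create view big_animal as
    select * from animal where legcount > 2;

-- is roughly shorthand for:
create table big_animal2 (name text, legcount int);
create rule "_RETURN" as on select to big_animal2
    do instead select * from animal where legcount > 2;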

It is higher in the same way that C is higher-level than ASM, and ASM
is higher-level than writing code directly with a hex editor.

--
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] getpid() function

2002-08-03 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes:
 You claimed that NOTIFY uses some _other_ backend id (i.e. not process
 id).

I did?  Must have been momentary brain fade on my part.  It's always
been process ID.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Tom Lane

Joe Conway [EMAIL PROTECTED] writes:
 I ran a crude test as follows (using a PHP script on the same machine. 
 Nothing else going on at the same time):

 do 100 times
select 2+2+2+2+2+2+ ... iterated 9901 times

 The results were as follows:
 INDEX_MAX_KEYS     16   32   64   128
 -----------------+----+----+----+-----
 Time in seconds    48   49   51    55

Okay, that seems like a good basic test.

Did you happen to make any notes about the disk space occupied by the
database?  One thing I was worried about was the bloat that'd occur
in pg_proc, pg_index, and pg_proc_proname_args_nsp_index.  Aside from
costing disk space, this would indirectly slow things down due to more
I/O to read these tables --- an effect that probably your test couldn't
measure, since it wasn't touching very many entries in any of those
tables.

Looks like we could go for 32 without much problem, though.

regards, tom lane

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



[HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Oleg Bartunov

I just tried the CLUSTER command at fts.postgresql.org to cluster the
fts index and got a very visible performance win.  Unfortunately,
I had to restore permissions and recreate the other indices by hand.
So, I'm interested: what's the future of the CLUSTER command?

Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] Table inheritance versus views

2002-08-03 Thread Don Baccus


 On Fri, 2002-08-02 at 22:39, [EMAIL PROTECTED] wrote:
 
On 29 Jul 2002 18:27:40 MDT, the world broke into rejoicing as
Stephen Deasey [EMAIL PROTECTED]  said:

Curt Sampson wrote:

I'm still waiting to find out just what advantage table inheritance
offers. I've asked a couple of times here, and nobody has even
started to come up with anything.

Table inheritance offers data model extensibility.  New (derived) tables
can be added to the system, and will work with existing code that
operates on the base tables, without having to hack up all the code.

But it kind of begs the question of why you're creating the new table in
the first place.

The new table certainly _won't_ work with existing code, at least from
the perspective that the existing code doesn't _refer_ to that table.

Since OpenACS has been brought up in this thread, I thought I'd join the 
list for a day or two and offer my perspective as the project manager.

1. Yes, we use views in our quasi-object oriented data model.  They're 
automatically generated when content types are built by the content 
repository, for instance.

2. Yes, you can model anything you can model with PG's OO extensions 
using views.  If you haven't implemented some way to generate the view 
automatically then a bit more work is required compared to using PG's OO 
extensions.

3. The view approach requires joins on all the subtype tables.  If I 
declare type 'foo' then the view that returns all of foo's columns joins 
on all the subtype tables, while in the PG OO case all of foo's columns 
are stored in foo, meaning I can get them all back with a simple query on 
the table.  The PG OO approach can be considerably more efficient than 
the view approach, and this is important to some folks, no matter how 
many appeals to authority are made to various bibles on relational 
theory written by Date and Darwen.
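For instance (hypothetical names), the two styles look roughly like this:

-- view style: subtype columns live in their own table, reassembled by a join
create table content (id int primary key, title text);
create table article (id int primary key references content, body text);
create view article_v as
    select c.id, c.title, a.body
    from content c join article a on a.id = c.id;

-- PG OO style: all of the subtype's columns are stored in one table
create table article2 (body text) inherits (content);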

4. The killer that makes the current implementation unusable for us is 
the fact that there's no form of indexing that spans all the tables 
inherited from a base type.  This means there's no cheap enforcement of 
uniqueness constraints across a set of object types, among other things. 
Being able to inherit indexes and constraints would greatly increase 
the utility of PG's OO extensions.
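A sketch of the problem (hypothetical names); the parent's unique index
simply never sees the child table's rows:

create table obj (id int);
create unique index obj_id_idx on obj (id);
create table sub () inherits (obj);

insert into obj values (1);
insert into sub values (1);   -- succeeds: obj_id_idx does not cover sub
select count(*) from obj where id = 1;   -- two "unique" rows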

5. If PG's OO extensions included inheritance of indexes and 
constraints, there's no doubt we'd use them in the OpenACS project, 
because when researching PG we compared datamodels written in this style 
vs. modelling the object relationships manually with automatically 
generated views.  We found the datamodel written using PG's OO 
extensions not only potentially more efficient, but more readable as well.

As far as whether or not there's a significant maintenance cost 
associated with keeping the existing OO stuff in PG, Tom Lane's voice is 
authoritative while, when it comes to PG internals, Curt Sampson doesn't 
know squat.


-- 
Don Baccus
Portland, OR
http://donb.photo.net, http://birdnotes.net, http://openacs.org


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Joe Conway

Tom Lane wrote:
 Did you happen to make any notes about the disk space occupied by the
 database?  One thing I was worried about was the bloat that'd occur
 in pg_proc, pg_index, and pg_proc_proname_args_nsp_index.  Aside from
 costing disk space, this would indirectly slow things down due to more
 I/O to read these tables --- an effect that probably your test couldn't
 measure, since it wasn't touching very many entries in any of those
 tables.

No, but it's easy enough to repeat. I'll do that today sometime.

Joe



---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Hannu Krosing

On Sat, 2002-08-03 at 18:41, Tom Lane wrote:
 Joe Conway [EMAIL PROTECTED] writes:
  I ran a crude test as follows (using a PHP script on the same machine. 
  Nothing else going on at the same time):
 
  do 100 times
 select 2+2+2+2+2+2+ ... iterated 9901 times
 
  The results were as follows:
  INDEX_MAX_KEYS     16   32   64   128
  -----------------+----+----+----+-----
  Time in seconds    48   49   51    55
 
 Okay, that seems like a good basic test.
 
 Did you happen to make any notes about the disk space occupied by the
 database?  One thing I was worried about was the bloat that'd occur
 in pg_proc, pg_index, and pg_proc_proname_args_nsp_index.  Aside from
 costing disk space, this would indirectly slow things down due to more
 I/O to read these tables --- an effect that probably your test couldn't
 measure, since it wasn't touching very many entries in any of those
 tables.

How hard would it be to change pg_proc.proargtypes from oidvector to
_oid and hope that toasting will take care of the rest ?

This could also get the requested 2% speedup, not to mention the
potential for up to 64K arguments ;)

---
Hannu

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes:
 How hard would it be to change pg_proc.proargtypes from oidvector to _oid

Lack of btree index support for _oid would be the first hurdle.

Even if we wanted to do that work, there'd be some serious breakage
of client queries because of the historical differences in output format
and subscripting.  (oidvector indexes from 0, _oid from 1.  Which is
pretty bogus, but if the regression tests are anything to judge by there
are probably a lot of queries out there that know this.)
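The subscripting difference is easy to see from psql (function name picked
arbitrarily):

-- oidvector subscripts start at 0 ...
select proname, proargtypes[0] from pg_proc where proname = 'length';
-- ... while a regular array type such as _oid subscripts from 1 by default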

 This could also get the requested 2% speedup,

I'm not convinced that _oid would be faster.

All in all, it doesn't seem worth the trouble compared to just kicking
FUNC_MAX_ARGS up a notch.  At least not right now.  I think we've
created quite enough system-catalog changes for one release cycle ;-)

regards, tom lane

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Alvaro Herrera

Oleg Bartunov dijo: 

 I just tried the CLUSTER command at fts.postgresql.org to cluster the
 fts index and got a very visible performance win.  Unfortunately,
 I had to restore permissions and recreate the other indices by hand.
 So, I'm interested: what's the future of the CLUSTER command?

I'm working on CLUSTER.  I have a problem with dependency tracking right
now that I need to get fixed before the patch gets accepted, but that
shouldn't take long (hopefully).

The patch supposedly fixes all the concerns about CLUSTER (permissions,
other indexes, inheritance).

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, Gee, Officer Krupke)


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



[HACKERS] cvs changes and broken links

2002-08-03 Thread Joe Conway

I couldn't keep up with the list traffic this week, but I thought I saw 
enough to convince me that after it was all said and done, I would still 
be able to do `cvs co pgsql`. I'm finding today that after using cvsup 
to sync up, I can no longer check out pgsql, but pgsql-server instead. Is 
this intended, or are there more changes left to be made?

Also, as a side note, the link for cvsup is broken:
   http://developer.postgresql.org/TODO/docs/cvsup.html
as is the CVS tree Organization link:
   http://developer.postgresql.org/TODO/docs/cvs-tree.html

Thanks,

Joe


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Oleg Bartunov

On Sat, 3 Aug 2002, Alvaro Herrera wrote:

 Oleg Bartunov dijo:

  I just tried the CLUSTER command at fts.postgresql.org to cluster the
  fts index and got a very visible performance win.  Unfortunately,
  I had to restore permissions and recreate the other indices by hand.
  So, I'm interested: what's the future of the CLUSTER command?

 I'm working on CLUSTER.  I have a problem with dependency tracking right
 now that I need to get fixed before the patch gets accepted, but that
 shouldn't take long (hopefully).

 The patch supposedly fixes all the concerns about CLUSTER (permissions,
 other indexes, inheritance).


Good news. Will it go into 7.3?



Regards,
Oleg
_
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Alvaro Herrera

Oleg Bartunov dijo: 

 On Sat, 3 Aug 2002, Alvaro Herrera wrote:
 
  Oleg Bartunov dijo:
 
   I just tried the CLUSTER command at fts.postgresql.org to cluster the
   fts index and got a very visible performance win.  Unfortunately,
   I had to restore permissions and recreate the other indices by hand.
   So, I'm interested: what's the future of the CLUSTER command?
 
  I'm working on CLUSTER.  I have a problem with dependency tracking right
  now that I need to get fixed before the patch gets accepted, but that
  shouldn't take long (hopefully).
 
 Good news. Will it go into 7.3?

In fact, I have just corrected the error and am submitting the patch for
revision and possible inclusion.

Please test it and check if it does what you need.  Let me know if it
doesn't, because it should.

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Find a bug in a program, and fix it, and the program will work today.
Show the program how to find and fix a bug, and the program
will work forever (Oliver Silfridge)


---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/users-lounge/docs/faq.html



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Joe Conway

Tom Lane wrote:
 Did you happen to make any notes about the disk space occupied by the
 database?  One thing I was worried about was the bloat that'd occur
 in pg_proc, pg_index, and pg_proc_proname_args_nsp_index.  Aside from
 costing disk space, this would indirectly slow things down due to more
 I/O to read these tables --- an effect that probably your test couldn't
 measure, since it wasn't touching very many entries in any of those
 tables.

#define INDEX_MAX_KEYS 16
#define FUNC_MAX_ARGS  INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
2.7M    /opt/data/pgsql/data/base/1
2.7M    /opt/data/pgsql/data/base/16862
2.7M    /opt/data/pgsql/data/base/16863
2.7M    /opt/data/pgsql/data/base/16864
3.2M    /opt/data/pgsql/data/base/16865
2.7M    /opt/data/pgsql/data/base/16866
17M     /opt/data/pgsql/data/base

#define INDEX_MAX_KEYS 32
#define FUNC_MAX_ARGS  INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
3.1M    /opt/data/pgsql/data/base/1
3.1M    /opt/data/pgsql/data/base/16862
3.1M    /opt/data/pgsql/data/base/16863
3.1M    /opt/data/pgsql/data/base/16864
3.6M    /opt/data/pgsql/data/base/16865
3.1M    /opt/data/pgsql/data/base/16866
19M     /opt/data/pgsql/data/base

#define INDEX_MAX_KEYS 64
#define FUNC_MAX_ARGS  INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
3.9M    /opt/data/pgsql/data/base/1
3.9M    /opt/data/pgsql/data/base/16862
3.9M    /opt/data/pgsql/data/base/16863
3.9M    /opt/data/pgsql/data/base/16864
4.4M    /opt/data/pgsql/data/base/16865
3.9M    /opt/data/pgsql/data/base/16866
24M     /opt/data/pgsql/data/base

#define INDEX_MAX_KEYS 128
#define FUNC_MAX_ARGS  INDEX_MAX_KEYS
du -h --max-depth=1 /opt/data/pgsql/data/base/
5.7M    /opt/data/pgsql/data/base/1
5.7M    /opt/data/pgsql/data/base/16862
5.7M    /opt/data/pgsql/data/base/16863
5.7M    /opt/data/pgsql/data/base/16864
6.3M    /opt/data/pgsql/data/base/16865
5.7M    /opt/data/pgsql/data/base/16866
35M     /opt/data/pgsql/data/base


Joe


---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Hannu Krosing

On Sat, 2002-08-03 at 23:20, Tom Lane wrote:
 Hannu Krosing [EMAIL PROTECTED] writes:
  How hard would it be to change pg_proc.proargtypes from oidvector to _oid
 
 Lack of btree index support for _oid would be the first hurdle.

Is that index really needed, or is it there just to enforce uniqueness ?

Would the lookup not be in some internal cache most of the time ?

Also (imho ;), btree index support should be done at once for all array
types whose elements have comparison ops (with semantics similar to
strings), not one by one for individual types. It should be in some ways
quite similar to multi-key indexes, so perhaps some code could be
borrowed from there.

OTOH, it should be a SMOP to write b-tree index support just for
_oid :-p ; most likely one could re-use code from oidvector ;)

 Even if we wanted to do that work, there'd be some serious breakage
 of client queries because of the historical differences in output format
 and subscripting.  (oidvector indexes from 0, _oid from 1.  Which is
 pretty bogus, but if the regression tests are anything to judge by there
 are probably a lot of queries out there that know this.)

I would guess that oidvector is a sufficiently obscure type that
nobody actually uses it for user tables. 

It is also used in only two tables and one index among the system tables:

hannu=# select relname,relkind from pg_class where oid in (
hannu-#   select attrelid from pg_attribute where atttypid=30);
             relname             | relkind 
---------------------------------+---------
 pg_index                        | r
 pg_proc_proname_narg_type_index | i
 pg_proc                         | r
(3 rows)

  This could also get the requested 2% speedup,
 
 I'm not convinced that _oid would be faster.

Neither am I, but it _may_ be that having generally shorter oid arrays
wins us enough ;)

 All in all, it doesn't seem worth the trouble compared to just kicking
 FUNC_MAX_ARGS up a notch.  At least not right now.  I think we've
 created quite enough system-catalog changes for one release cycle ;-)

But going to _oid will free us from arbitrary limits on argument count.
Or at least from small arbitrary limits, as there will probably still be
the at-least-three-btree-keys-must-fit-in-page limit (makes < 2600
args/function) and maybe some other internal limits as well.

--
Hannu


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]



Re: [HACKERS] PITR, checkpoint, and local relations

2002-08-03 Thread Bruce Momjian

Christopher Kings-Lynne wrote:
  The main area where it seems to get heavy use is during index builds,
  and for 'CREATE TABLE AS SELECT...'.
 
  So I will remove the local buffer manager as part of the PITR patch,
  unless there is further objection.
 
 Would someone mind filling me in as to what the local buffer manager is and
 how it is different from (and not as useful as) the shared buffer manager?

Sure.  I think I can handle that.

When you create a table in a transaction, there isn't any committed
state to the table yet, so any table modifications are kept in a local
buffer, which is local memory to the backend(?).  No one needs to see it
because it isn't visible to anyone yet.  Same for indexes.

Anyway, the WAL activity doesn't handle local buffers the same as shared
buffers because there is no crisis if the system crashes.

There is debate on whether the local buffers are even valuable
considering the headache they cause in other parts of the system.
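Concretely (names hypothetical), the two cases mentioned at the top of
the thread look like:

begin;
-- a table created in this transaction is invisible to other backends,
-- so its pages can sit in backend-local buffers until commit
create table scratch as select * from some_table;
-- an index built on it likewise goes through the local buffer manager
create index scratch_idx on scratch (some_col);
commit;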

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 6: Have you searched our list archives?

http://archives.postgresql.org



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Bruce Momjian

Hannu Krosing wrote:
 On Sat, 2002-08-03 at 23:20, Tom Lane wrote:
  Hannu Krosing [EMAIL PROTECTED] writes:
   How hard would it be to change pg_proc.proargtypes from oidvector to _oid
  
  Lack of btree index support for _oid would be the first hurdle.
 
 Is that index really needed, or is it there just to enforce uniqueness ?

Needed to look up functions based on their args.

The big issue of using arrays is that we don't have cache capability for
variable length fields.  Until we get that, we are stuck with
NAMEDATALEN taking the full length, and oidvector taking the full
length.

And if we went with variable length, there may be a performance penalty.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster



Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Bruce Momjian

Oleg Bartunov wrote:
 I just tried the CLUSTER command at fts.postgresql.org to cluster the
 fts index and got a very visible performance win.  Unfortunately,
 I had to restore permissions and recreate the other indices by hand.
 So, I'm interested: what's the future of the CLUSTER command?

Yes, I have always liked CLUSTER with full text searches because you are
usually hitting multiple rows with a single equality restriction, and
CLUSTER puts all the hits on the same page.
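In sketch form, with hypothetical names and the 7.2-era syntax
(CLUSTER indexname ON tablename):

-- cluster the fti table on its string index so that all entries for one
-- word land on the same few pages
create index foo_fti_string_idx on foo_fti (string);
cluster foo_fti_string_idx on foo_fti;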

If you look in contrib/fulltextindex, you will see mention of CLUSTER in
the README.  It may make sense to add that to your documentation.

Also, is there any value to contrib/fulltextindex now that we have
contrib/tsearch?

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Bruce Momjian


Also, let me add that CLUSTER in 7.3 will be fully functional because we
will no longer be changing the oid of the table during cluster.  This
will allow people to use CLUSTER more frequently and safely.

---

Bruce Momjian wrote:
 Oleg Bartunov wrote:
  I just tried the CLUSTER command at fts.postgresql.org to cluster the
  fts index and got a very visible performance win.  Unfortunately,
  I had to restore permissions and recreate the other indices by hand.
  So, I'm interested: what's the future of the CLUSTER command?
 
 Yes, I have always liked CLUSTER with full text searches because you are
 usually hitting multiple rows with a single equality restriction, and
 CLUSTER puts all the hits on the same page.
 
 If you look in contrib/fulltextindex, you will see mention of CLUSTER in
 the README.  It may make sense to add that to your documentation.
 
 Also, is there any value to contrib/fulltextindex now that we have
 contrib/tsearch?
 
 -- 
   Bruce Momjian|  http://candle.pha.pa.us
   [EMAIL PROTECTED]   |  (610) 853-3000
   +  If your life is a hard drive, |  830 Blythe Avenue
   +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to [EMAIL PROTECTED] so that your
message can get through to the mailing list cleanly



Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Bruce Momjian


OK, time to get moving, folks.  Looks like the increase in the function
args to 32 and NAMEDATALEN to 128 has been sufficiently tested.  Tom
has some ideas on removing some memset() calls for function args to
speed things up, but we don't have to wait for that to get going.  The
end of August is nearing.

Is there any reason to delay the change further?

---

Joe Conway wrote:
 Tom Lane wrote:
  Did you happen to make any notes about the disk space occupied by the
  database?  One thing I was worried about was the bloat that'd occur
  in pg_proc, pg_index, and pg_proc_proname_args_nsp_index.  Aside from
  costing disk space, this would indirectly slow things down due to more
  I/O to read these tables --- an effect that probably your test couldn't
  measure, since it wasn't touching very many entries in any of those
  tables.
 
 #define INDEX_MAX_KEYS 16
 #define FUNC_MAX_ARGS  INDEX_MAX_KEYS
 du -h --max-depth=1 /opt/data/pgsql/data/base/
 2.7M    /opt/data/pgsql/data/base/1
 2.7M    /opt/data/pgsql/data/base/16862
 2.7M    /opt/data/pgsql/data/base/16863
 2.7M    /opt/data/pgsql/data/base/16864
 3.2M    /opt/data/pgsql/data/base/16865
 2.7M    /opt/data/pgsql/data/base/16866
 17M     /opt/data/pgsql/data/base
 
 #define INDEX_MAX_KEYS 32
 #define FUNC_MAX_ARGS  INDEX_MAX_KEYS
 du -h --max-depth=1 /opt/data/pgsql/data/base/
 3.1M    /opt/data/pgsql/data/base/1
 3.1M    /opt/data/pgsql/data/base/16862
 3.1M    /opt/data/pgsql/data/base/16863
 3.1M    /opt/data/pgsql/data/base/16864
 3.6M    /opt/data/pgsql/data/base/16865
 3.1M    /opt/data/pgsql/data/base/16866
 19M     /opt/data/pgsql/data/base
 
 #define INDEX_MAX_KEYS 64
 #define FUNC_MAX_ARGS  INDEX_MAX_KEYS
 du -h --max-depth=1 /opt/data/pgsql/data/base/
 3.9M    /opt/data/pgsql/data/base/1
 3.9M    /opt/data/pgsql/data/base/16862
 3.9M    /opt/data/pgsql/data/base/16863
 3.9M    /opt/data/pgsql/data/base/16864
 4.4M    /opt/data/pgsql/data/base/16865
 3.9M    /opt/data/pgsql/data/base/16866
 24M     /opt/data/pgsql/data/base
 
 #define INDEX_MAX_KEYS 128
 #define FUNC_MAX_ARGS  INDEX_MAX_KEYS
 du -h --max-depth=1 /opt/data/pgsql/data/base/
 5.7M    /opt/data/pgsql/data/base/1
 5.7M    /opt/data/pgsql/data/base/16862
 5.7M    /opt/data/pgsql/data/base/16863
 5.7M    /opt/data/pgsql/data/base/16864
 6.3M    /opt/data/pgsql/data/base/16865
 5.7M    /opt/data/pgsql/data/base/16866
 35M     /opt/data/pgsql/data/base
 
 
 Joe
 
 
 ---(end of broadcast)---
 TIP 2: you can get off all lists at once with the unregister command
 (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])



Re: [HACKERS] getpid() function

2002-08-03 Thread Bruce Momjian


As I remember, most cases where people have recently been asking for
backend pid were related to temp tables because they were named by pid. 
I don't think they are anymore.  (?)

We can do two things.  We can either rename it to pg_backend_pid and
move it to the statistics section in the docs, where the backend pids of
all active backends are available, or remove my code additions and see
if anyone asks for it in 7.3.

---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
   Perhaps a more relevant question is why are we cluttering the namespace
   with any such function at all?  What's the use case for it?
 
  It was requested because it is exposed in libpq and people need it to
  generate unique names and stuff like that from within psql and
  functions.  Seems like a valid use for the pid.
 
 The sole reason libpq exposes it is so that you can tell a self-notify
 from an incoming notify.  (ie, given you are LISTENing on a condition
 that both you and other clients send NOTIFYs for, is this particular
 message one that you sent yourself, or not?  Compare the originator PID
 in the NOTIFY message to your backend_pid to find out.)  I put that
 feature in back around 6.4, because it allowed some important
 optimizations in an app I had that used LISTEN/NOTIFY a lot.
 
 Since NOTIFY messages aren't even visible at the SQL level, the above is
 not a reason for making PIDs visible at the SQL level.
 
 I'm really dubious about using backend PID for the sort of purpose you
 suggest.  Unique names would be *much* more safely handled with, say,
 a sequence generator.  If you are not using libpq or another client
 library that can give you a backend-PID API call, then very likely you
 don't have a lot of control over the backend connection either, and
 shouldn't assume that backend PID is going to be stable for you.
 (Think about pooled connections in a webserver, etc.)
 
 Finally, the most legitimate uses of PID (like discovering a backend PID
 to send SIGINT to, when some client query is running wild) are not
 supported at all by a function that can only return your own backend's
 PID, because that's seldom the PID you need to know.  The
 pg_stat_activity view handles this much better.
 
 So I'm still unconvinced that we need or want this ...
 
   regards, tom lane
 
 





Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Tom Lane

Hannu Krosing [EMAIL PROTECTED] writes:
 Lack of btree index support for _oid would be the first hurdle.

 Is that index really needed, or is it there just to enforce uniqueness ?

Both.

 Also, (imho ;) btree index support should be done for all array types
 which have comparison ops for elements at once (with semantics similar
 to string) not one by one for individual types.

Fine, send a patch ;-)

 Even if we wanted to do that work, there'd be some serious breakage
 of client queries because of the historical differences in output format
 and subscripting.  (oidvector indexes from 0, _oid from 1.  Which is
 pretty bogus, but if the regression tests are anything to judge by there
 are probably a lot of queries out there that know this.)

 I would guess that oidvector is sufficiently obscure type and that
 nobody actually uses oidvector for user tables. 

No, you miss my point: client queries that do subscripting on
proargtypes will break.  Since the regression tests find this a useful
thing to do, I suspect there are clients out there that do too.
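
For illustration, the kind of client query that relies on 0-based
oidvector subscripting today (a sketch; any client poking at pg_proc
this way would break under the change):

```sql
-- returns the OID of the first argument type; with oidvector this is
-- subscript 0, but as an oid[] it would become subscript 1
SELECT proname, proargtypes[0] AS first_arg_type
FROM pg_proc
WHERE proname = 'substr';
```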

 But going to _oid will free us from arbitrary limits on argument count.

I didn't say it wouldn't be a good idea in the long run.  I'm saying I
don't think it's happening for 7.3, given that Aug 31 is not that far
away anymore and that a lot of cleanup work remains undone on other
already-committed features.  FUNC_MAX_ARGS=32 could happen for 7.3, though.

regards, tom lane




[HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry

Hi all,

It occurred to me on the plane home that now that CLUSTER is fixed we may
be able to put pg_index.indisclustered to use. If CLUSTER were to set
indisclustered to true when it clusters a heap according to the given
index, we could speed up sequential scans. There are two possible ways.

1) Planner determines that a seqscan is appropriate *and* the retrieval is
qualified by the key(s) of one of the relation's indexes
2) Planner determines that the relation is clustered on disk according to
the index over the key(s) used to qualify the retrieval
3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
different from SeqNext) called SeqClusterNext
6) SeqClusterNext() has all the heapgettup() logic with two
exceptions: a) we find the first tuple more intelligently (instead of
scanning from the first page) b) if we have found tuple(s) matching the
ScanKey when we encounter a non-matching tuple (via
HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
scan

Any reason this isn't possible? Any reason it couldn't dramatically speed
up the performance of the type of query i've mentioned?

Gavin
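
For concreteness, the kind of query the scheme above targets might look
like this (hypothetical table and index names, not from any patch):

```sql
-- assume "logs" was just CLUSTERed on an index over "ts": a plain
-- seqscan reads every page, an indexscan pays per-tuple index
-- overhead, but the proposed SeqClusterNext could start near the
-- first matching tuple and stop at the first ScanKey mismatch
SELECT * FROM logs WHERE ts = '2002-08-03';
```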





Re: [HACKERS] PITR, checkpoint, and local relations

2002-08-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 There is debate on whether the local buffers are even valuable
 considering the headache they cause in other parts of the system.

More specifically, the issue is that when (if) you commit, the contents
of the new table now have to be pushed out to shared storage.  This is
moderately annoying in itself (among other things, it implies fsync'ing
those tables before commit).  But the real reason it comes up now is
that the proposed PITR scheme can't cope gracefully with tables that
are suddenly there but weren't participating in checkpoints before.

It looks to me like we should stop using local buffers for ordinary
tables that happen to be in their first transaction of existence.
But, per Vadim's suggestion, we shouldn't abandon the local buffer
manager altogether.  What we could and should use it for is TEMP tables,
which have no need to be checkpointed or WAL-logged or fsync'd or
accessible to other backends *ever*.  Also, a temp table can leave
blocks in local buffers across transactions, which makes local buffers
considerably more useful than they are now.

If temp tables didn't use the shared bufmgr nor did updates to them get
WAL-logged, they'd be noticeably more efficient than plain tables, which
IMHO would be a Good Thing.  Such tables would be essentially invisible
to WAL and PITR (at least their contents would be --- I assume we'd
still log file creation and deletion).  But I can't see anything wrong
with that.

In short, the proposal runs something like this:

* Regular tables that happen to be in their first transaction of
existence are not treated differently from any other regular table so
far as buffer management or WAL or PITR go.  (rd_myxactonly either goes
away or is used for much less than it is now.)

* TEMP tables use the local buffer manager for their entire existence.
(This probably means adding an rd_istemp flag to relcache entries, but
I can't see anything wrong with that.)

* Local bufmgr semantics are twiddled to reflect this reality --- in
particular, data in local buffers can be held across transactions, there
is no end-of-transaction write (much less fsync).  A TEMP table that
isn't too large might never touch disk at all.

* Data operations in TEMP tables do not get WAL-logged, nor do we
WAL-log page images of local-buffer pages.


These changes seem very attractive to me even without regard for making
the world safer for PITR.  I'm willing to volunteer to make them happen,
if there are no objections.

regards, tom lane
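
As a user-visible sketch of what the proposal would buy (hypothetical
table and column names):

```sql
-- under the proposal, everything between CREATE and DROP stays in
-- local buffers: no WAL records for the rows, no fsync at commit,
-- and a small table may never touch disk; only the file creation
-- and deletion would still be logged
CREATE TEMP TABLE scratch (id int, val text);
INSERT INTO scratch VALUES (1, 'a');
UPDATE scratch SET val = 'b' WHERE id = 1;
DROP TABLE scratch;
```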




Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 OK, time to get moving folks.  Looks like the increase in the function
 args to 32 and the NAMEDATALEN to 128 has been sufficiently tested.

I'm convinced by Joe's numbers that FUNC_MAX_ARGS = 32 shouldn't hurt
too much.  But have we done equivalent checks on NAMEDATALEN?  In
particular, do we know what it does to the size of template1?

regards, tom lane




Re: [HACKERS] getpid() function

2002-08-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 As I remember, most cases where people have recently been asking for
 backend pid were related to temp tables because they were named by pid. 

Ah, good point.

 I don't think they are anymore.  (?)

Check.

 We can do two things.  We can either rename it to pg_backend_pid and
 move it to the statistics section in the docs, where the backend pids of
 all active backends are available, or remove my code additions and see
 if anyone asks for it in 7.3.

Let's take it out and wait to see if anyone really still wants it.

regards, tom lane




Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane

Gavin Sherry [EMAIL PROTECTED] writes:
 It occurred to me on the plane home that now that CLUSTER is fixed we may
 be able to put pg_index.indisclustered to use. If CLUSTER were to set
 indisclustered to true when it clusters a heap according to the given
 index, we could speed up sequential scans.

AFAICT you're assuming that the table is *exactly* ordered by the
clustered attribute.  While this is true at the instant CLUSTER
completes, the exact ordering will be destroyed by the first insert or
update :-(.  I can't see much value in creating a whole new scan type
that's only usable on a perfectly-clustered table.

The existing approach to making the planner smart about clustered tables
is to compute a physical-vs-logical-order-correlation statistic and use
that to adjust the estimated cost of indexscans.  I believe this is a
more robust approach than considering a table to be clustered or not
clustered, since it can deal with the gradual degradation of clustered
order over time.  However, I will not make any great claims for the
specific equations currently used for this purpose --- they're surely in
need of improvement.  Feel free to take a look and see if you have any
ideas.  The collection of the statistic is in commands/analyze.c and the
use of it is in optimizer/path/costsize.c.
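
The statistic in question is visible per column once ANALYZE has run;
something like this shows it (hypothetical table name):

```sql
-- correlation is 1.0 right after CLUSTER on that column's index and
-- drifts toward 0 as inserts and updates scramble the physical order
SELECT attname, correlation
FROM pg_stats
WHERE tablename = 'big';
```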

regards, tom lane




Re: [HACKERS] PITR, checkpoint, and local relations

2002-08-03 Thread Bruce Momjian


Sounds like a win all around; make PITR easier and temp tables faster.


---

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  There is debate on whether the local buffers are even valuable
  considering the headache they cause in other parts of the system.
 
 More specifically, the issue is that when (if) you commit, the contents
 of the new table now have to be pushed out to shared storage.  This is
 moderately annoying in itself (among other things, it implies fsync'ing
 those tables before commit).  But the real reason it comes up now is
 that the proposed PITR scheme can't cope gracefully with tables that
 are suddenly there but weren't participating in checkpoints before.
 
 It looks to me like we should stop using local buffers for ordinary
 tables that happen to be in their first transaction of existence.
 But, per Vadim's suggestion, we shouldn't abandon the local buffer
 manager altogether.  What we could and should use it for is TEMP tables,
 which have no need to be checkpointed or WAL-logged or fsync'd or
 accessible to other backends *ever*.  Also, a temp table can leave
 blocks in local buffers across transactions, which makes local buffers
 considerably more useful than they are now.
 
 If temp tables didn't use the shared bufmgr nor did updates to them get
 WAL-logged, they'd be noticeably more efficient than plain tables, which
 IMHO would be a Good Thing.  Such tables would be essentially invisible
 to WAL and PITR (at least their contents would be --- I assume we'd
 still log file creation and deletion).  But I can't see anything wrong
 with that.
 
 In short, the proposal runs something like this:
 
 * Regular tables that happen to be in their first transaction of
 existence are not treated differently from any other regular table so
 far as buffer management or WAL or PITR go.  (rd_myxactonly either goes
 away or is used for much less than it is now.)
 
 * TEMP tables use the local buffer manager for their entire existence.
 (This probably means adding an rd_istemp flag to relcache entries, but
 I can't see anything wrong with that.)
 
 * Local bufmgr semantics are twiddled to reflect this reality --- in
 particular, data in local buffers can be held across transactions, there
 is no end-of-transaction write (much less fsync).  A TEMP table that
 isn't too large might never touch disk at all.
 
 * Data operations in TEMP tables do not get WAL-logged, nor do we
 WAL-log page images of local-buffer pages.
 
 
 These changes seem very attractive to me even without regard for making
 the world safer for PITR.  I'm willing to volunteer to make them happen,
 if there are no objections.
 
   regards, tom lane
 





Re: [HACKERS] FUNC_MAX_ARGS benchmarks

2002-08-03 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  OK, time to get moving folks.  Looks like the increase in the function
  args to 32 and the NAMEDATALEN to 128 has been sufficiently tested.
 
 I'm convinced by Joe's numbers that FUNC_MAX_ARGS = 32 shouldn't hurt
 too much.  But have we done equivalent checks on NAMEDATALEN?  In
 particular, do we know what it does to the size of template1?

No, I thought we saw the number; was it 30%?  No, we did a test for 64.
Can someone get us that number for 128?





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry

On Sat, 3 Aug 2002, Tom Lane wrote:

 Gavin Sherry [EMAIL PROTECTED] writes:
  It occurred to me on the plane home that now that CLUSTER is fixed we may
  be able to put pg_index.indisclustered to use. If CLUSTER were to set
  indisclustered to true when it clusters a heap according to the given
  index, we could speed up sequential scans.
 
 AFAICT you're assuming that the table is *exactly* ordered by the
 clustered attribute.  While this is true at the instant CLUSTER
 completes, the exact ordering will be destroyed by the first insert or
 update :-(.  I can't see much value in creating a whole new scan type

Sorry, I meant to say that heap_insert() etc would need to set
indisclustered to false.

I do see some worth in this however. Naturally, in a situation where a
database is being modified very often this is of little value. However,
for applications focussed on analysing large amounts of static data this
could increase performance significantly. Once I get some time I will
attempt to explore this further in `diff -c` format :-).

Gavin





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian

Tom Lane wrote:
 Gavin Sherry [EMAIL PROTECTED] writes:
  It occurred to me on the plane home that now that CLUSTER is fixed we may
  be able to put pg_index.indisclustered to use. If CLUSTER were to set
  indisclustered to true when it clusters a heap according to the given
  index, we could speed up sequential scans.
 
 AFAICT you're assuming that the table is *exactly* ordered by the
 clustered attribute.  While this is true at the instant CLUSTER
 completes, the exact ordering will be destroyed by the first insert or
 update :-(.  I can't see much value in creating a whole new scan type
 that's only usable on a perfectly-clustered table.
 
 The existing approach to making the planner smart about clustered tables
 is to compute a physical-vs-logical-order-correlation statistic and use
 that to adjust the estimated cost of indexscans.  I believe this is a
 more robust approach than considering a table to be clustered or not
 clustered, since it can deal with the gradual degradation of clustered
 order over time.  However, I will not make any great claims for the
 specific equations currently used for this purpose --- they're surely in
 need of improvement.  Feel free to take a look and see if you have any
 ideas.  The collection of the statistic is in commands/analyze.c and the
 use of it is in optimizer/path/costsize.c.

Tom, should we be updating that flag after we CLUSTER instead of
requiring an ANALYZE after the CLUSTER?





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian

Gavin Sherry wrote:
 Hi all,
 
 It occurred to me on the plane home that now that CLUSTER is fixed we may
 be able to put pg_index.indisclustered to use. If CLUSTER were to set
 indisclustered to true when it clusters a heap according to the given
 index, we could speed up sequential scans. There are two possible ways.
 
 1) Planner determines that a seqscan is appropriate *and* the retrieval is
 qualified by the key(s) of one of the relation's indexes
 2) Planner determines that the relation is clustered on disk according to
 the index over the key(s) used to qualify the retrieval
 3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
 4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
 5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
 different from SeqNext) called SeqClusterNext
 6) SeqClusterNext() has all the heapgettup() logic with two
 exceptions: a) we find the first tuple more intelligently (instead of
 scanning from the first page) b) if we have found tuple(s) matching the
 ScanKey when we encounter a non-matching tuple (via
 HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
 scan

Gavin, is that a big win compared to just using the index and looping
through the entries, knowing that the index matches are on the same
page, and the heap matches are on the same page.





Re: [HACKERS] getpid() function

2002-08-03 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  As I remember, most cases where people have recently been asking for
  backend pid were related to temp tables because they were named by pid. 
 
 Ah, good point.
 
  I don't think they are anymore.  (?)
 
 Check.
 
  We can do two things.  We can either rename it to pg_backend_pid and
  move it to the statistics section in the docs, where the backend pids of
  all active backends are available, or remove my code additions and see
  if anyone asks for it in 7.3.
 
 Let's take it out and wait to see if anyone really still wants it.

Just when I am ready to throw it away, I come up with a use for the
function:

test=> select * from pg_stat_activity where procpid != backend_pid();

This shows all activity _except_ my session, which pgmonitor or others
may want to use, and I can think of no other way to do it.

Comments?  Maybe this is why it should be called pg_backend_id and put
in the stat section.





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Gavin Sherry

On Sat, 3 Aug 2002, Bruce Momjian wrote:

 Gavin Sherry wrote:
  Hi all,
  
  It occurred to me on the plane home that now that CLUSTER is fixed we may
  be able to put pg_index.indisclustered to use. If CLUSTER were to set
  indisclustered to true when it clusters a heap according to the given
  index, we could speed up sequential scans. There are two possible ways.
  
  1) Planner determines that a seqscan is appropriate *and* the retrieval is
  qualified by the key(s) of one of the relation's indexes
  2) Planner determines that the relation is clustered on disk according to
  the index over the key(s) used to qualify the retrieval
  3) Planner sets an appropriate nodeTag for the retrieval (SeqScanCluster?)
  4) ExecProcNode() calls some new scan routine, ExecSeqScanCluster() ?
  5) ExecSeqScanCluster() calls ExecScan() with a new ExecScanAccessMtd (ie,
  different from SeqNext) called SeqClusterNext
  6) SeqClusterNext() has all the heapgettup() logic with two
  exceptions: a) we find the first tuple more intelligently (instead of
  scanning from the first page) b) if we have found tuple(s) matching the
  ScanKey when we encounter a non-matching tuple (via
  HeapTupleSatisfies() ?) we return a NULL'ed out tuple, terminating the
  scan
 
 Gavin, is that a big win compared to just using the index and looping
 through the entries, knowing that the index matches are on the same
 page, and the heap matches are on the same page.

Bruce,

It would cut out the index overhead. Besides, at (1) (above) we would have
determined that an index scan was too expensive and we would be using a
SeqScan instead. This would just be faster, since a) we would locate the
tuples more intelligently b) we wouldn't need to scan the whole heap once
we'd found all tuples matching the scan key.

Gavin





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Tom, should we be updating that flag after we CLUSTER instead of
  requiring an ANALYZE after the CLUSTER?
 
 Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
 quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
 settle for a note in the CLUSTER docs that recommends a subsequent
 ANALYZE --- this seems no different from recommending ANALYZE after bulk
 data load or other major update of a table.

OK.  I am sure it is not obvious to people that they should ANALYZE,
because the data in their table hasn't changed, just the ordering.





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Bruce Momjian

Gavin Sherry wrote:
  Gavin, is that a big win compared to just using the index and looping
  through the entries, knowing that the index matches are on the same
  page, and the heap matches are on the same page.
 
 Bruce,
 
 It would cut out the index overhead. Besides, at (1) (above) we would have
 determined that an index scan was too expensive and we would be using a
 SeqScan instead. This would just be faster, since a) we would locate the
 tuples more intelligently b) we wouldn't need to scan the whole heap once
 we'd found all tuples matching the scan key.

Yes, but in a clustered table, an index scan is _never_ (?) more
expensive than a sequential scan, at least if the optimizer is working
correctly.  Index scans are slower only because they assume random heap
access, but with a clustered table, there is no random heap access.  The
index takes you right to the spot to start.





Re: [HACKERS] getpid() function

2002-08-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Let's take it out and wait to see if anyone really still wants it.

 Just when I am ready to throw it away, I come up with a use for the
 function:

  test=> select * from pg_stat_activity where procpid != backend_pid();

 This shows all activity _except_ my session, which pgmonitor or others
 may want to use, and I can think of no other way to do it.

Hm.  Actually this seems like an argument for exposing MyBackendId, since
what pg_stat_activity really depends on is BackendId.  But as that view
is presently defined, you'd not be able to write
WHERE backendid = my_backend_id()
because the view doesn't expose backendid.

 Comments?  Maybe this is why it should be called pg_backend_id and put
 in the stat section.

*Please* don't call it pg_backend_id --- that invites confusion with
BackendId which is a different thing.

I'd suggest pg_backend_pid.

regards, tom lane




Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane

Gavin Sherry [EMAIL PROTECTED] writes:
 On Sat, 3 Aug 2002, Tom Lane wrote:
 AFAICT you're assuming that the table is *exactly* ordered by the
 clustered attribute.  While this is true at the instant CLUSTER
 completes, the exact ordering will be destroyed by the first insert or
 update :-(.  I can't see much value in creating a whole new scan type

 Sorry, I meant to say that heap_insert() etc would need to set
 indisclustered to false.

itch  You could do that, but only if you are prepared to invent
a mechanism that will instantly invalidate any existing query plans
that assume the clustered ordering is good.

Up to now we've only allowed the planner to make decisions that impact
performance, not correctness of the result.  I'm uncomfortable with the
idea that a clusterscan plan could silently return wrong answers after
someone else updates the table and doesn't tell us they did.

regards, tom lane




Re: [HACKERS] getpid() function

2002-08-03 Thread Bruce Momjian

Tom Lane wrote:
 Bruce Momjian [EMAIL PROTECTED] writes:
  Let's take it out and wait to see if anyone really still wants it.
 
  Just when I am ready to throw it away, I come up with a use for the
  function:
 
  test=> select * from pg_stat_activity where procpid != backend_pid();
 
  This shows all activity _except_ my session, which pgmonitor or others
  may want to use, and I can think of no other way to do it.
 
 Hm.  Actually this seems like an argument for exposing MyBackendId, since
 what pg_stat_activity really depends on is BackendId.  But as that view
 is presently defined, you'd not be able to write
   WHERE backendid = my_backend_id()
 because the view doesn't expose backendid.

Yes.

  Comments?  Maybe this is why it should be called pg_backend_id and put
  in the stat section.
 
 *Please* don't call it pg_backend_id --- that invites confusion with
 BackendId which is a different thing.
 
 I'd suggest pg_backend_pid.

Sorry, I mean pg_backend_pid.   I could expose backend_id but it may
confuse people so pid is probably better.  If you had the id, you could
use pg_stat_get_backend_pid() to get the pid.
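
Something along these lines would map backend ids to pids through the
existing stats functions (an untested sketch, and it only works while
the stats collector is running):

```sql
-- one row per active backend: its BackendId and its PID
SELECT s.backendid,
       pg_stat_get_backend_pid(s.backendid) AS procpid
FROM (SELECT pg_stat_get_backend_idset() AS backendid) AS s;
```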





Re: [HACKERS] CLUSTER and indisclustered

2002-08-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Tom, should we be updating that flag after we CLUSTER instead of
 requiring an ANALYZE after the CLUSTER?

Could do that I suppose, but I'm not super-excited about it.  ANALYZE is
quite cheap these days (especially in comparison to CLUSTER ;-)).  I'd
settle for a note in the CLUSTER docs that recommends a subsequent
ANALYZE --- this seems no different from recommending ANALYZE after bulk
data load or other major update of a table.

regards, tom lane




Re: [HACKERS] getpid() function

2002-08-03 Thread Tom Lane

Bruce Momjian [EMAIL PROTECTED] writes:
 Sorry, I mean pg_backend_pid.

Okay, I was unsure if that was a typo or not.

 I could expose backend_id but it may
 confuse people so pid is probably better.  If you had the id, you could
 use pg_stat_get_backend_pid() to get the pid.

Yeah, I thought of suggesting pg_backend_id() to return MyBackendId and
then pg_stat_get_backend_pid() to get the PID, but was stopped by the
thought that this breaks down if the stats collector isn't running.
While I'm not convinced that there's any need for backend PID that's not
connected to looking at stats-collector results, it's probably foolish
to set up a mechanism that doesn't work outside that context.  Let's go
with pg_backend_pid().

regards, tom lane




Re: [HACKERS] [PATCHES] START TRANSACTION

2002-08-03 Thread Alvaro Herrera

Peter Eisentraut wrote:

 Neil Conway writes:
 
  The attached patch implements START TRANSACTION, per SQL99. The
  functionality of the command is basically identical to that of
  BEGIN; it just accepts a few extra options (only one of which
  PostgreSQL currently implements), and is standards-compliant.
  The patch includes a simple regression test and documentation.
 
 Very nice patch, but I don't think we need the regression test.  It's a
 bit too simple.

That makes me wonder: should I produce some regression tests for
CLUSTER?

-- 
Alvaro Herrera (alvherre[a]atentus.com)
Research is what I do when I don't know what I'm doing
(Wernher von Braun)





Re: [HACKERS] Why is MySQL more chosen over PostgreSQL?

2002-08-03 Thread Curt Sampson

On 3 Aug 2002, Hannu Krosing wrote:

 On Sat, 2002-08-03 at 16:32, Curt Sampson wrote:
  On 2 Aug 2002, Hannu Krosing wrote:
 
  Perhaps this is the problem. I disagree that it's a higher level.

 I don't mean morally higher ;)
 Just more concise and easier to grasp, same as VIEW vs. TABLE + ON xxx
 DO INSTEAD rules.

That's because we don't do a good job of implementing updatable views.
Views ought to be as fully updatable as possible given the definition,
without having to define rules for doing this. Simple views such as

CREATE TABLE tab1 (
    id  int,
    foo text
);
CREATE TABLE tab2 (
    id  int,
    bar text
);
CREATE VIEW something AS
    SELECT tab1.id, tab1.foo, tab2.bar
    FROM tab1, tab2
    WHERE tab1.id = tab2.id;
ought to be completely updatable without any special rules.

For further info see the detailed discussion of this in Date's
database textbook.
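
As things stand, making the view above accept UPDATEs means writing rules by hand, roughly like this (a sketch, untested; it handles only UPDATE and assumes id is the join key):

```sql
CREATE RULE something_upd AS ON UPDATE TO something
DO INSTEAD (
    -- push each updatable column back to its base table
    UPDATE tab1 SET foo = NEW.foo WHERE id = OLD.id;
    UPDATE tab2 SET bar = NEW.bar WHERE id = OLD.id;
);
```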

 That is the same way that C is higher than ASM and ASM is higher than
 writing code directly using hex editor.

No, this is the same way that Smalltalk is higher than Lisp.
(I.e., it isn't.)

cjs
-- 
Curt Sampson  [EMAIL PROTECTED]   +81 90 7737 2974   http://www.netbsd.org
Don't you know, in this new Dark Age, we're all light.  --XTC





Re: [HACKERS] Patch for Bug of PL/pgSQL parser

2002-08-03 Thread Bruce Momjian


Patch rejected.  Tom Lane pointed out some mistakes in this patch, and
the patch does not show any corrections.

---

eutm wrote:
Some weeks ago I wrote about one problem (called
 Bug of PL/pgSQL parser):
 
 eutm [EMAIL PROTECTED] writes:
  Dear Sirs! :) I encountered one small problem working with
  PostgreSQL 7.3devel. It may look a bit strange, but I have to use
  whitespace in the names of databases, tables, fields and so on
  (like roomno jk). It's possible to create them all and work with
  them (INSERT, DELETE, UPDATE), but the PL/pgSQL parser (compiler?)
  can't execute such statements...
 
 Today I send a simple patch (my first :)).
  Regards, Eugene.

[ Attachment, skipping... ]

 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 853-3000
  +  If your life is a hard drive, |  830 Blythe Avenue
  +  Christ can be your backup.|  Drexel Hill, Pennsylvania 19026




Re: [HACKERS] [PATCHES] START TRANSACTION

2002-08-03 Thread Tom Lane

Alvaro Herrera [EMAIL PROTECTED] writes:
 That makes me wonder: should I produce some regression tests for
 CLUSTER?

It'd be a good thing.
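
Such a test might look something like this (a sketch with invented names; note the 7.2-era syntax is CLUSTER indexname ON tablename):

```sql
CREATE TABLE clstr_tst (a int, b text);
INSERT INTO clstr_tst VALUES (2, 'two');
INSERT INTO clstr_tst VALUES (1, 'one');
CREATE INDEX clstr_tst_a ON clstr_tst (a);
CLUSTER clstr_tst_a ON clstr_tst;
-- rows should come back in index order after the CLUSTER
SELECT * FROM clstr_tst;
```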

regards, tom lane




Re: [HACKERS] Please, apply ltree patch

2002-08-03 Thread Bruce Momjian


Patch applied.  Thanks.

---



Oleg Bartunov wrote:
 Bruce,
 
 please find attached patch to current CVS ( contrib/ltree )
 
 Changes:
 
 July 31, 2002
Now works on 64-bit platforms.
Added function lca - lowest common ancestor
Version for 7.2 is distributed as separate package -
http://www.sai.msu.su/~megera/postgres/gist/ltree/ltree-7.2.tar.gz
 
 
   Regards,
   Oleg
 _
 Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
 Sternberg Astronomical Institute, Moscow University (Russia)
 Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
 phone: +007(095)939-16-83, +007(095)939-23-83

[ Attachment, skipping... ]





Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Christopher Kings-Lynne

 Yes, I have always liked CLUSTER with full text searches because you are
 usually hitting multiple rows with a single equality restriction, and
 CLUSTER puts all the hits on the same page.

 If you look in contrib/fulltextindex, you will see mention of CLUSTER in
 the README.  It may make sense to add that to your documentation.

 Also, is there any value to contrib/fulltextindex now that we have
 contrib/tsearch?

I haven't looked at tsearch yet, but I expect it's way better than
fulltextindex.  However there's more than a few of us using fulltextindex,
so I think it will need to stay for some while.  I'm working on a new
version of it for 7.3.

I can put pointers in the README about checking out tsearch...

Chris
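
The CLUSTER trick mentioned in the README boils down to something like this (hypothetical index and table names; the string column is what contrib/fulltextindex searches on):

```sql
-- Cluster the _fti table on its string index so that all rows
-- matching one substring land on the same heap pages:
CREATE INDEX cds_fti_idx ON cds_fti (string);
CLUSTER cds_fti_idx ON cds_fti;
```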






Re: [HACKERS] fate of CLUSTER command ?

2002-08-03 Thread Tom Lane

Christopher Kings-Lynne [EMAIL PROTECTED] writes:
 Also, is there any value to contrib/fulltextindex now that we have
 contrib/tsearch?

 I haven't looked at tsearch yet, but I expect it's way better than
 fulltextindex.  However there's more than a few of us using fulltextindex,
 so I think it will need to stay for some while.

Right, at least a couple releases.

 I'm working on a new version of it for 7.3.

What have you got in mind?

regards, tom lane
