Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread Hannu Krosing
James Rogers kirjutas N, 02.10.2003 kell 23:44:
> On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
> > So what you really need is the CLUSTER command to leave pages half-empty
> > and the tuple placement logic on inserts/updates to place new tuples
> > near the place where they would be placed by CLUSTER. I.e. the code that
> > does actual inserting should be aware of CLUSTERing.
> 
> 
> Not exactly. What you are describing is more akin to partitioning or
> hash-organized tables i.e. sorting insert/update tuples to various pages
> according to some hash function.

What I actually thought I was describing is how CLUSTER should work in a
postgres flavour of MVCC storage ;). Not the CLUSTER command, but the
whole feature.

> B-Tree organized tables basically make the table and the primary index
> the same thing, and tend to be most useful for tables that use a single
> multi-column primary key index for queries.  This has the effect of
> putting all the tuples for a typical query in the same small number of
> heap pages (and therefore buffers), allowing very efficient access in
> the typical case with the caveat that non-indexed queries will be quite
> slow.

AFAICS we could resolve this problem (querying indexes only) by keeping
a copy of visibility info (tmin,tmax,...) in index tuples. This would
make index updates bigger and thus slower, so this should be optional.

If you then put all fields in the primary key, the main table could be
dropped. If there is no "data" table, then no other indexes would be
allowed, or they would have to be "double-indexes" referencing the
primary key, not the tuple, and thus even bigger ...
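Hannu's proposal (carrying a copy of tmin/tmax in each index entry so a
scan never has to visit the heap) can be sketched abstractly. The toy
Python model below simplifies MVCC visibility to a single-xid
comparison; all names are invented:

```python
from typing import Optional

class IndexTuple:
    """An index entry carrying its own copy of the heap tuple's
    visibility info (tmin/tmax), so a scan can skip the heap fetch."""
    def __init__(self, key: int, heap_tid: int,
                 tmin: int, tmax: Optional[int] = None):
        self.key = key
        self.heap_tid = heap_tid
        self.tmin = tmin      # xid that created this row version
        self.tmax = tmax      # xid that deleted it, if any

def visible(entry: IndexTuple, snapshot_xid: int) -> bool:
    # visible if created before our snapshot and not deleted before it
    return entry.tmin <= snapshot_xid and (
        entry.tmax is None or entry.tmax > snapshot_xid)

def index_only_scan(index, lo, hi, snapshot_xid):
    # range scan answered entirely from the index: no heap access needed
    return [e.key for e in index
            if lo <= e.key <= hi and visible(e, snapshot_xid)]

idx = [IndexTuple(10, 0, tmin=100),
       IndexTuple(20, 1, tmin=100, tmax=150),   # deleted by xid 150
       IndexTuple(30, 2, tmin=200)]             # created after our snapshot
print(index_only_scan(idx, 0, 100, snapshot_xid=160))  # [10]
```

The extra visibility fields are exactly what makes each index tuple
bigger, which is why the option would have to be per-index.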

> B-Tree organized tables are particularly useful when the insert order is
> orthogonal to the typical query order.  As I mentioned before, tables
> that store parallel collections of time-series data are classic
> examples.  Often the data is inserted into the pages in order that can
> roughly be described as (timestamp, id), but is queried using (id,
> timestamp) as the index.  If you have enough ids, you end up with the
> pathological case where you typically have one relevant tuple per page
> for a given query.

But if we had clustered the table on (id, timestamp), then the data
would be in right order for queries, if cluster worked well.

> The nuisance would be keeping track of which pages are collecting which
> tuples.  Knowing the CLUSTER index doesn't tell you much about which
> pages would currently be a good place to put a new tuple.  You could
> always markup the index that CLUSTER uses to keep track of good
> candidates (plus some additional structures), but the more I think about
> that, the more it looks like a nasty hack.

Yeah, index-organized tables seem an exact fit for your problem, but then
my abstract idea of what clustering should do is exactly that - keep
tuples in roughly the same order as an index ;)

So what really is needed is a smart tuple-placer which can keep tuples
that are close (as defined by index) together in a small number of
pages. These pages themselves need not be continuous; they can be
sprinkled around the whole data table, but they need to stay clusters
of index-close tuples.
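The placement logic described above, where CLUSTER leaves slack in each
page and inserts go to whichever page already holds index-adjacent
tuples, might look roughly like this toy model (invented structure and
numbers, not PostgreSQL code):

```python
class ClusteredHeap:
    """Toy model: CLUSTER leaves pages part-empty, and later inserts go
    to the page already holding index-adjacent tuples."""
    def __init__(self, page_capacity=4, fill_target=2):
        self.page_capacity = page_capacity
        self.fill_target = fill_target    # how full CLUSTER leaves pages
        self.pages = []                    # each page: sorted cluster keys

    def cluster(self, keys):
        # rewrite the heap in index order, half-filling each page
        keys = sorted(keys)
        self.pages = [keys[i:i + self.fill_target]
                      for i in range(0, len(keys), self.fill_target)]

    def place(self, key):
        # pick the non-full page whose existing keys are nearest to `key`
        candidates = [p for p in self.pages
                      if 0 < len(p) < self.page_capacity]
        if not candidates:
            self.pages.append([key])
            return
        best = min(candidates, key=lambda p: min(abs(key - k) for k in p))
        best.append(key)
        best.sort()

heap = ClusteredHeap()
heap.cluster(range(8))   # pages: [0,1] [2,3] [4,5] [6,7], all half full
heap.place(2.5)          # lands next to 2 and 3, not on a fresh page
print(heap.pages[1])     # [2, 2.5, 3]
```

The point of the slack left by CLUSTER is exactly that `place` can
usually find room on an index-close page instead of appending at the end
of the heap.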
 

Hannu


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> >> I'd almost argue that we should change this message to an error
> 
> > I agree.
> 
> Motion proposed and seconded; any objections out there?

Uhm, doesn't the spec have anything to say about this? 
I mean, the view sure looks like standard SQL on its face.

In any case, I would sure think there was something strange about a query
working fine as a select but not working in a view:

slo=>  select 'foo' as a,'bar' as b;
  a  |  b  
-----+-----
 foo | bar
(1 row)

slo=> create view x as select 'foo' as a,'bar' as b;
WARNING:  Attribute "a" has an unknown type
Proceeding with relation creation anyway
WARNING:  Attribute "b" has an unknown type
Proceeding with relation creation anyway
CREATE VIEW

slo=> select * from x;
  a  |  b  
-----+-----
 foo | bar
(1 row)

-- 
greg


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[HACKERS] Beta4 Tag'd and Bundled ...

2003-10-02 Thread Marc G. Fournier

Check her over and let me know if there are any problems ... will do a
full general announce tomorrow for it ...

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Quick question

2003-10-02 Thread Christopher Kings-Lynne
Hi guys,

If someone could help me with this, it would be cool.  How do I query 
the catalogs to find the underlying index for a constraint?  (Assuming 
the constraint is primary or unique)

Chris



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] Weird locking situation

2003-10-02 Thread Christopher Kings-Lynne

I wonder if it's something to do with the tsearch trigger on food_foods?
Actually, it definitely seems to be the tsearch trigger.  The deadlock 
occurs on every table that uses tsearch trigger, and no table that doesn't.

It's probably not the tsearch trigger itself even, it's probably the 
fact that the tsearch trigger modifies the row during the update in the 
second session?

Chris



---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Weird locking situation

2003-10-02 Thread Christopher Kings-Lynne
OK, I tried it again and it still seems buggy to me...



australia= begin;
BEGIN
australia=# select * from food_foods where food_id = 21 for update;
 food_id | description                | calories | (about 40 more nutrient
columns, wrapped beyond recovery in transit; the tsearch column ftiidx
holds '40' 'alc' 'brand' 'averag' 'brandi' 'spirit')
---------+----------------------------+----------
      21 | Spirits: Brandy (40% Alc.) |      207
(1 row)


australia=# update food_foods set calories=208 where food_id=21;
[issued from a second session; it blocks waiting on the row lock held above]

australia=# update food_foods set calories=207 where food_id=21;
ERROR:  deadlock detected

[the second session's blocked update then completes:]
UPDATE 1
But strangely enough, it works just fine on another table:


australia=# begin;
BEGIN
australia=# select * from users_users where userid=1 for update;
 userid | firstname   | lastname    | (about 45 more account columns,
wrapped beyond recovery in transit)
--------+-------------+-------------
      1 | Christopher | Kings-Lynne
(1 row)


australia=# update users_users set expiry='3000-01-01' where userid=1;
UPDATE 1

australia=#  update users_users set expiry='3000-01-01' where userid=1;
UPDATE 1
australia=# commit;
COMMIT
Table definitions are attached.  The RI_constraint triggers are there 
because this is on our test database, and there's a bit of screwiness 
with those constraints that adddepend couldn't fix.

I wonder if it's something to do with the tsearch trigger on food_foods?

Chris

Tom Lane wrote:

Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:

What is going on here?  Surely getting a F

Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Christopher Kings-Lynne
> nconway=# create view baz (a,b) as select 'hello', 'world';
> WARNING:  column "a" has type "unknown"
> DETAIL:  Proceeding with relation creation anyway.

> It's always done that, although the spelling of the notice has
> varied over the years.

> I'd almost argue that we should change this message to an error

> I agree.

Except that it would totally break backwards-compatibility?  Or will 
'unknown' fields in views be dumped with explicit casts?

Chris



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
 joining column's datatypes do not match


[HACKERS] back from Washington, still busy

2003-10-02 Thread Bruce Momjian
I have returned from Washington, but one of my sons is in the hospital
with a mild pneumonia.   I think he is coming home tomorrow, so I will
read all my email this weekend.

It might seem strange that I am reporting this, but I went away a while ago
and didn't inform the hackers list, and some people got concerned when
email requests weren't handled promptly.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 8: explain analyze is your friend


[HACKERS] Beta4 in the morning ...

2003-10-02 Thread Marc G. Fournier

Noticed a few late commits ... will tag-n-bundle beta4 tonight around
midnight, which should give the mirrors a chance to pick it up ...



---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread Alvaro Herrera Munoz
On Thu, Oct 02, 2003 at 10:09:12PM +0300, Hannu Krosing wrote:

> So what you really need is the CLUSTER command to leave pages half-empty
> and the tuple placement logic on inserts/updates to place new tuples
> near the place where they would be placed by CLUSTER. I.e. the code that
> does actual inserting should be aware of CLUSTERing.

Oh, you mean like what I half-proposed in

http://archives.postgresql.org/pgsql-general/2002-06/msg00968.php

BTW, this is the one message (the one on the CVS log I cite at the end)
that got me into Postgres hacking.  The little involvement I've had has
been great.  Thanks to all PostgreSQL hackers!

-- 
Alvaro Herrera (<[EMAIL PROTECTED]>)
"El realista sabe lo que quiere; el idealista quiere lo que sabe" (Anonimo)

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread James Rogers
On Thu, 2003-10-02 at 12:09, Hannu Krosing wrote:
> So what you really need is the CLUSTER command to leave pages half-empty
> and the tuple placement logic on inserts/updates to place new tuples
> near the place where they would be placed by CLUSTER. I.e. the code that
> does actual inserting should be aware of CLUSTERing.


Not exactly. What you are describing is more akin to partitioning or
hash-organized tables i.e. sorting insert/update tuples to various pages
according to some hash function.  This works well if a table is
composed of many well-defined working sets but the access and query
patterns for a given working set are varied.  A common example of this
is for large accounting tables, which are frequently partitioned by a
function which truncates the record timestamp to the year/month.  Note
that indexes can be partitioned as well, so that as long as you are
doing queries within a given month (using the accounting example), the
query performance is generally what you would expect if the entire table
was the size of one month's data, no matter how big the table actually
gets.
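That routing idea can be sketched in a few lines; the partition names
and row shape below are invented for illustration, not any RDBMS's
partitioning API:

```python
from datetime import date

def partition_for(ts: date) -> str:
    # the "hash" James describes: truncate the record timestamp
    # to year/month to pick a partition
    return f"accounting_{ts.year:04d}_{ts.month:02d}"

def route(rows):
    """Sort insert tuples into per-month partitions, so a query within
    one month touches only one month-sized table."""
    parts = {}
    for ts, amount in rows:
        parts.setdefault(partition_for(ts), []).append((ts, amount))
    return parts

rows = [(date(2003, 9, 12), 100), (date(2003, 10, 2), 250),
        (date(2003, 10, 7), 75)]
by_month = route(rows)
print(sorted(by_month))   # ['accounting_2003_09', 'accounting_2003_10']
```

Since the function is deterministic, queries constrained to one month
can be answered from a single partition, which is what keeps performance
flat as the full table grows.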

Partitions and hash-organized tables (for RDBMS that support them) are
often handled internally as multiple tables masquerading as one (kind of
like a view), with the hash function determining which table is actually
being accessed.  This allows the possibility of doing things like
locking individual partitions or setting them "read-only", and generally
having the option of treating them as individual tables for
administrative purposes e.g. deleting a partition of old unused data
without adversely affecting the "active" partition in the way it would
if they were truly one table, even if they look like one table from a
SQL-level view.

What I really need in my specific case is B-Tree organized tables,
though hashing/partitioning would help too.  It *is* a similar kind of
problem.

B-Tree organized tables basically make the table and the primary index
the same thing, and tend to be most useful for tables that use a single
multi-column primary key index for queries.  This has the effect of
putting all the tuples for a typical query in the same small number of
heap pages (and therefore buffers), allowing very efficient access in
the typical case with the caveat that non-indexed queries will be quite
slow.

B-Tree organized tables are particularly useful when the insert order is
orthogonal to the typical query order.  As I mentioned before, tables
that store parallel collections of time-series data are classic
examples.  Often the data is inserted into the pages in order that can
roughly be described as (timestamp, id), but is queried using (id,
timestamp) as the index.  If you have enough ids, you end up with the
pathological case where you typically have one relevant tuple per page
for a given query.
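The pathological scatter is easy to reproduce with a toy arrival-order
model (invented sizes, not real page accounting):

```python
# Tuples arrive in (timestamp, id) order and the heap fills pages in
# arrival order, but the query wants every timestamp for a single id.
PAGE_CAPACITY = 100
ids, timestamps = range(100), range(50)

arrival = [(t, i) for t in timestamps for i in ids]          # insert order
page_of = {tup: n // PAGE_CAPACITY for n, tup in enumerate(arrival)}

# pages a query like "... where id = 7 order by timestamp" must touch:
touched = {page_of[(t, 7)] for t in timestamps}
print(len(touched))   # 50 -- one distinct heap page per relevant tuple
```

With enough ids, every relevant tuple lives on its own page, so the
query buffers 100x more data than it actually uses; a (id, timestamp)
physical order would put all 50 tuples on one page.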


> True, but my above suggestion would be much easier to implement
> near-term. It seems to be a nice incremental improvement; it just needs
> to touch two places:
> 
> 1. selecting where new tuples go : 
> 
>   * updated ones go near old ones if not clustered and near the place
> CLUSTER would place them if clustered. 
> 
>   * inserted ones go to the less than half-empty pages if not clustered
> and near the place CLUSTER would place them if clustered. 
> 
> 2. making reorganization code (CLUSTER and VACUUM FULL) to leave space 
> in pages for clustered updates/inserts.


The nuisance would be keeping track of which pages are collecting which
tuples.  Knowing the CLUSTER index doesn't tell you much about which
pages would currently be a good place to put a new tuple.  You could
always markup the index that CLUSTER uses to keep track of good
candidates (plus some additional structures), but the more I think about
that, the more it looks like a nasty hack.

Cheers,

-James Rogers
 [EMAIL PROTECTED]





---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> On Thu, 2003-10-02 at 10:16, Tom Lane wrote:
>> Neil Conway <[EMAIL PROTECTED]> writes:
>>> nconway=# create view baz (a,b) as select 'hello', 'world';
>>> WARNING:  column "a" has type "unknown"
>>> DETAIL:  Proceeding with relation creation anyway.

>> I'd almost argue that we should change this message to an error

> I agree.

Motion proposed and seconded; any objections out there?

regards, tom lane

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Neil Conway
On Thu, 2003-10-02 at 10:16, Tom Lane wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> > nconway=# create view baz (a,b) as select 'hello', 'world';
> > WARNING:  column "a" has type "unknown"
> > DETAIL:  Proceeding with relation creation anyway.
> 
> It's always done that, although the spelling of the notice has
> varied over the years.

> I'd almost argue that we should change this message to an error

I agree.

-Neil



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [HACKERS] Large block size problems and notes...

2003-10-02 Thread Tom Lane
Sean Chittenden <[EMAIL PROTECTED]> writes:
> 2) While testing the above, I noted that 65K blocks fail and 32K is as
> big as PostgreSQL can handle, for one reason or another.

15-bit offsets in page item ids.
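To unpack that one-liner: line pointers (ItemIdData in the page header)
store each tuple's byte offset within the page in a 15-bit field, which
caps the usable block size:

```python
# A 15-bit offset can address at most 2**15 bytes within a page,
# so 32K is the largest workable block size and 65K blocks must fail.
OFFSET_BITS = 15
max_block_size = 2 ** OFFSET_BITS
print(max_block_size)          # 32768, i.e. 32K
print(65536 > max_block_size)  # True
```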

> but, it seems as though the test at the bottom of
> RelationGetBufferForTuple():
> should be changed to:
> if (len > MaxTupleSize)

No, it shouldn't.  That test would simply be redundant.

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread Hannu Krosing
James Rogers kirjutas N, 02.10.2003 kell 20:50:

> To give a real world example, a standard query on one of our tables that
> has not been CLUSTER-ed recently (i.e. within the last several days)
> generates an average of ~2,000 cache misses.  Recently CLUSTER-ed, it
> generates ~0 cache misses on average.  Needless to say, one is *much*
> faster than the other. 

So what you really need is the CLUSTER command to leave pages half-empty
and the tuple placement logic on inserts/updates to place new tuples
near the place where they would be placed by CLUSTER. I.e. the code that
does actual inserting should be aware of CLUSTERing.

I guess that similar behaviour (half-empty pages, or even each second
page empty which is better as it creates less dirty buffers) could also
significantly speed up updates on huge number of tuples, as then code
could always select a place near the old one and thus avoid needless
head-movements between reading and writing areas.

> In my case, not only does CLUSTER-ing increase the number of concurrent
> queries possible without disk thrashing by an integer factor, but the
> number of buffers touched by a query that generates cache misses is
> greatly reduced as well.  The problem is that CLUSTER-ing is costly and
> index-organizing some of the tables would reduce the buffer needs, since
> the index tuples in these cases are almost as large as the heap tuples
> they reference.

True, but my above suggestion would be much easier to implement
near-term. It seems to be a nice incremental improvement; it just needs
to touch two places:

1. selecting where new tuples go : 

  * updated ones go near old ones if not clustered and near the place
CLUSTER would place them if clustered. 

  * inserted ones go to the less than half-empty pages if not clustered
and near the place CLUSTER would place them if clustered. 

2. making reorganization code (CLUSTER and VACUUM FULL) to leave space 
in pages for clustered updates/inserts.

the "half" above could of course mean anything from 10% to 95% depending
on access patterns.

-
Hannu


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-02 Thread Bruno Wolff III
On Thu, Oct 02, 2003 at 10:47:06 -0700,
  "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
> Hello,
> 
>  Possible scenario for maintaining 7.3:
> 
>  Only one or two committers  using a two stage cvs... one stage for 
> testing (not including sandbox), one stage for commit.
>  Scheduled releases based on non-critical fixes. Quarterly? Of course 
> critical fixes should be released as soon as plausible.
> 
>  Separate mailing list for 7.3 issues, concerns etc... Which would help 
> develop its own temporary community.
> 
> Thoughts?

It might be better to split into two different trees. One just gets bug fixes,
the other gets bug fixes plus enhancements that won't require an initdb.

---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])


[HACKERS] Large block size problems and notes...

2003-10-02 Thread Sean Chittenden
Sorry, no benchmark results in this post, but I do have a few notes to
pass along for folks:

1) FreeBSD -devel port now has configurable block sizes
2) 65K blocks fail, I think erroneously
3) The size of the postmaster proc and friends explodes to 45MB
4) effective_cache_size is a bad name for a GUC
5) Larger databases with lots of seqscans should use 32K pages, IMHO

1) For the FreeBSD folks in the crowd, it is possible to test non 8K
block sizes with the databases/postgresql-devel port.

WITH_PGBLOCKSIZE    It's possible to change the blocksize to "16K"
                    or "32K" (defaults to 8K, and different
                    block sizes require a dump, initdb, reload!)

# cd /usr/ports/databases/postgresql-devel
# make WITH_PGBLOCKSIZE=32K

Should be easy enough for folks to do.  Values of 16K and 32K will
change the block size values.  For those with the paranoid hat on,
I've also included the following message as a warning:

==
* DEFAULT BLOCK SIZE CONFIGURABLE *

Starting with 7.4, PostgreSQL's block size for the -devel port can be
changed from the default 8K blocks to either 16K or 32K blocks by
setting WITH_PGBLOCKSIZE to either "16K" or "32K".  Changing block
sizes has uncertain performance implications: it can result in faster
operation in some cases and slower in others.  Benchmarking and testing
your installation is *highly recommended* before using these values in
any form of production!  When changing block sizes, it is necessary to
dump, initdb, and reload your data.  Be careful when switching between
databases/postgresql-devel and databases/postgresql7, or from
databases/postgresql-devel -> src tarball builds that have different
block sizes!
==



2) While testing the above, I noted that 65K blocks fail and 32K is as
big as PostgreSQL can handle, for one reason or another.  When you try
to initdb (postmaster dies in RelationGetBufferForTuple() in
src/backend/access/heap/hio.c, on line 274):

### BEGIN ###
Running in noclean mode.  Mistakes will not be cleaned up.
The files belonging to this database system will be owned by user "sean".
This user must also own the server process.

The database cluster will be initialized with locale C.

creating directory 
/usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data...
 ok
creating directory 
/usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/base...
 ok
creating directory 
/usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/global...
 ok
creating directory 
/usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/pg_xlog...
 ok
creating directory 
/usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/pg_clog...
 ok
selecting default shared_buffers... 1000
selecting default max_connections... 100
creating configuration files... ok
creating template1 database in 
/usr/ports/databases/postgresql-devel/work/postgresql-7.4.b3.2003.09.30/src/test/regress/./tmp_check/data/base/1...
 PANIC:  tuple is too big: size 268
Abort trap (core dumped)
### END ###

I'm not sure if this is a case of over-testing or not, however.
Someone with more PG heap foo than me will have to comment on this,
but, it seems as though the test at the bottom of
RelationGetBufferForTuple():

if (len > PageGetFreeSpace(pageHeader))
{
/* We should not get here given the test at the top */
elog(PANIC, "tuple is too big: size %lu", (unsigned long) len);
}

should be changed to:

if (len > MaxTupleSize)

based off of the comments.  *shrug* I've spent a few min looking at
the code, and I'm not sure where the problem is.  Jan or Tom?


3) Using 32K blocks, the postmaster takes a whopping 45MB of RAM!  From
   top(1):

  539 pgsql    96  0 44312K   12K select   0:00  0.00%  0.00% postgre
  548 pgsql    96  0 44340K   12K select   0:00  0.00%  0.00% postgre
  547 pgsql    96  0 45284K   12K select   0:00  0.00%  0.00% postgre

Nifty, huh?  I haven't been able to figure out where the bloat is
coming from other than to mention that in production, with 8K blocks,
postmaster processes only take up 8MB in RAM.  Anyone know where the
extra is coming from? I can only assume it's from some kind of page
cache, but I don't see where that's being set.  Even then, I'd expect
only a 4x increase in proportion from going from 8K to 32K, not a 5x
increase.
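One contributor consistent with the initdb output earlier ("selecting
default shared_buffers... 1000") is simply the shared buffer pool, which
scales with block size. A back-of-envelope sketch, guesswork rather than
a measured breakdown:

```python
def buffer_pool_bytes(shared_buffers, block_size):
    # the shared buffer pool is shared_buffers pages of block_size each
    return shared_buffers * block_size

mb = 1024 * 1024
print(buffer_pool_bytes(1000, 8192) / mb)    # 7.8125 -> ~8MB at 8K blocks
print(buffer_pool_bytes(1000, 32768) / mb)   # 31.25  -> ~31MB at 32K blocks
```

That accounts for most of the 8MB-to-45MB jump; the remainder would have
to come from other block-size-scaled allocations.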


4) effective_cache_size needs to be tuned accordingly because it is
   based off of pages, not bytes, which I think isn't portable across
   different installations with differing block sizes.
   effective_cache_size 
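The pages-vs-bytes point can be made concrete: because the GUC is
counted in pages of the build's block size, the same numeric setting
advertises 4x more cache on a 32K build than on an 8K one. Illustrative
numbers:

```python
def effective_cache_size_setting(cache_bytes, block_size):
    # effective_cache_size is counted in pages, not bytes, so the value
    # that describes the same amount of RAM differs per block size
    return cache_bytes // block_size

ram = 256 * 1024 * 1024            # say, 256MB of OS cache to advertise
print(effective_cache_size_setting(ram, 8192))    # 32768 on an 8K build
print(effective_cache_size_setting(ram, 32768))   # 8192 on a 32K build
```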

Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread James Rogers
On Wed, 2003-10-01 at 08:37, Tom Lane wrote:
> I think you'd need to do some basic architectural work first.  Right now
> we have a clean API for index access methods, but there is no comparable
> abstraction layer for heaps (tables).  It'd probably be necessary to
> create such a layer in order to allow different heap organizations to be
> supported.  Another point of confusion is that heaps and indexes are
> rigidly distinct.  Perhaps some heaps could be considered to be indexes
> as well, in order to support your idea of b-tree-organized tables.
> Doing that would undoubtedly break a few places though.


Ahhh, okay.  So it would require an architectural mod.

Still, it would probably be a worthwhile project (for me at least). 
There are a number of enterprise features that rely on a similar
architectural premise that would also be good to have eventually.  We
are at the point at this company where we need some of these features in
our databases for scalability and management purposes, but we aren't all
that eager to migrate to Oracle (the alternative).  From my standpoint,
it would be substantially cheaper to actively work on adding the
features needed to Postgres than to move to Oracle to get the same
features.

I clearly need to take a good look at the relevant source.


> > Both of these things really are attempts to address the same basic problem,
> > which is optimizing the number of buffers a given query uses by making the
> table layout reflect typical queries.
> 
> Hm, are you sure that smarter buffer management wouldn't serve the
> purpose?


No, the problem isn't really buffer management; the buffer usage
behaviors are largely valid for the situation.  The problem is that the
number of relevant tuples per buffer is low for a given query, so we
aren't getting much tuple bang for the buffer buck.  I suppose one could
look at it as trying to improve the intra-buffer hit rate for a given
query.

Cheers,

-James Rogers
 [EMAIL PROTECTED]



---(end of broadcast)---
TIP 8: explain analyze is your friend


Re: [HACKERS] Index/Function organized table layout

2003-10-02 Thread James Rogers
On Wed, 2003-10-01 at 09:29, Alvaro Herrera wrote:
> On Wed, Oct 01, 2003 at 11:37:38AM -0400, Tom Lane wrote:
> > Hm, are you sure that smarter buffer management wouldn't serve the
> > purpose?
> 
> It doesn't help when there is a lot of access locality in searching.  In my
> case I want to select some thousands of records that were inserted very
> apart from each other, but are logically very near.  Having this
> pseudoheap that is ordered by definition helps very much with the
> selection; the current heap requires me to bring to buffers lots of
> uninteresting tuples, whichever buffer management algorithm is used,
> because they are in the same page as interesting tuples.


Yes, what Alvaro said.

For very large tables that routinely run modest range queries, it can be
very expensive in terms of cache efficiency if tuples that are closely
grouped and ordered logically are scattered throughout the heap.  The
requirement to buffer a lot of unrelated data for typical case queries
can greatly reduce the cache hit rate if the active portion of the data
is already quite large relative to the physical RAM available.

To give a real world example, a standard query on one of our tables that
has not been CLUSTER-ed recently (i.e. within the last several days)
generates an average of ~2,000 cache misses.  Recently CLUSTER-ed, it
generates ~0 cache misses on average.  Needless to say, one is *much*
faster than the other.  The problem is that the number of buffers
required to satisfy this query with the tuples scattered is enough to
make it swap out the buffers of another competing query on another table
that is also running.  The result is that performance grinds to a halt
as processes are competing with each other and trying to swap out each
others buffers, resulting in a lot less *actual* buffering than should
be occurring given the amount of data actually being queried.

In my case, not only does CLUSTER-ing increase the number of concurrent
queries possible without disk thrashing by an integer factor, but the
number of buffers touched by a query that generates cache misses is
greatly reduced as well.  The problem is that CLUSTER-ing is costly and
index-organizing some of the tables would reduce the buffer needs, since
the index tuples in these cases are almost as large as the heap tuples
they reference.

The classic scenario for this is when you have a large collection of
time-series data stored in a table, with each series keyed to another
table.  The typical tuple distribution creates pathological
behaviors when buffer space becomes tight.

Cheers,

-James Rogers
 [EMAIL PROTECTED]





---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-02 Thread Joshua D. Drake
Hello,

 Possible scenario for maintaining 7.3:

 Only one or two committers  using a two stage cvs... one stage for 
testing (not including sandbox), one stage for commit.
 Scheduled releases based on non-critical fixes. Quarterly? Of course 
critical fixes should be released as soon as plausible.

 Separate mailing list for 7.3 issues, concerns etc... Which would help 
develop its own temporary community.

Thoughts?

Joshua D. Drake

Tom Lane wrote:
> Robert Treat <[EMAIL PROTECTED]> writes:
>> and the question as i thought was being discussed (or should be
>> discussed) was what is the level of interest in having this work kept in
>> the community cvs tree vs. someone else's quasi-forked branch...
>
> I see no reason that the maintenance shouldn't be done in the community
> CVS archive.  The problem is where to find the people who want to do it.
> Of course we have to trust those people enough to give them write access
> to the community archive, but if they can't be trusted with that, one
> wonders who's going to trust their work product either.
>
> 			regards, tom lane

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
Postgresql support, programming shared hosting and dedicated hosting.
+1-503-222-2783 - [EMAIL PROTECTED] - http://www.commandprompt.com
Editor-N-Chief - PostgreSQl.Org - http://www.postgresql.org


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [HACKERS] query plan different for "SELECT ..." and "DECLARE

2003-10-02 Thread Joe Conway
Tom Lane wrote:
David Blasby <[EMAIL PROTECTED]> writes:
The only real problem is that the user has to manually keep stats
up-to-date.  Is there any way to attach something to VACUUM ANALYSE?
The ANALYZE code is set up with the idea that there could be multiple
analysis methods and various kinds of stuff stored in pg_statistic.
Right now there isn't any way to physically plug in a different analysis
routine :-( but it seems like it'd be reasonable to add some hook of
that kind.  Perhaps CREATE TYPE could be extended to specify an analysis
routine for ANALYZE to call for columns of that type.
Please take a look at pg_statistic.h and commands/analyze.c and see if
this would do the job for you.  Obviously it's too late for 7.4 but we
could think about a solution in 7.5.
It would be interesting if a PL/R function could be plugged in for both 
the ANALYZE function and the selectivity function. There are quite a few 
spatial data related packages available for R; one of them might fit 
nicely for this application.

Joe



Re: [HACKERS] query plan different for "SELECT ..." and "DECLARE CURSOR ..."?

2003-10-02 Thread Tom Lane
David Blasby <[EMAIL PROTECTED]> writes:
> The only real problem is that the user has to manually keep stats
> up-to-date.  Is there any way to attach something to VACUUM ANALYSE?

The ANALYZE code is set up with the idea that there could be multiple
analysis methods and various kinds of stuff stored in pg_statistic.
Right now there isn't any way to physically plug in a different analysis
routine :-( but it seems like it'd be reasonable to add some hook of
that kind.  Perhaps CREATE TYPE could be extended to specify an analysis
routine for ANALYZE to call for columns of that type.

Please take a look at pg_statistic.h and commands/analyze.c and see if
this would do the job for you.  Obviously it's too late for 7.4 but we
could think about a solution in 7.5.

regards, tom lane



Re: [HACKERS] query plan different for "SELECT ..." and "DECLARE

2003-10-02 Thread David Blasby
Tom Lane wrote:

You may need to bite the bullet and try to devise some real selectivity
estimation techniques for your geometric operators.  The stuff in
src/backend/utils/adt/geo_selfuncs.c at the moment is all just stubs :-(
I've already done this - it actually gives pretty accurate estimates.
Basically, I do a two-stage pass through the geometry table.  The first 
pass just calculates the bounding box of all the geometries.   I make a 
2D histogram structure (much like a really really simple quad tree) that 
fits this bounding box.  The 2nd pass looks at each individual geometry 
in the table - it updates one (or more) cells in the 2D histogram.

When the planner asks for the selectivity of a '&&' query, I pull in the 
2D Histogram (its in the geometry_column metadata table) and look at the 
query's 'query bounding box' and see which cells of the 2D histogram it 
overlaps.  I make an estimate of the number of rows the query will 
return by looking at the % overlap of the query window and the number of 
geometries in those 2D histogram cells.  You'd think such a crude test
wouldn't give you good results, but in fact it's amazingly accurate.

If for some reason I cannot calculate an estimate, I fall back to the 
geo_selfuncs.c method and return some really small number.
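The estimate described above can be sketched in Python. This is an illustrative toy, not the actual PostGIS code: the function names, grid resolution, and cell-binning details are all assumptions.

```python
# Illustrative toy of the 2D-histogram selectivity estimate described
# above; names and binning details are assumptions, not PostGIS code.

def build_histogram(boxes, extent, nx, ny):
    # First pass found `extent`, the bounding box of all geometries;
    # this second pass bins each geometry's bounding box, possibly
    # updating more than one cell.
    x0, y0, x1, y1 = extent
    cw, ch = (x1 - x0) / nx, (y1 - y0) / ny
    grid = [[0] * nx for _ in range(ny)]
    for bx0, by0, bx1, by1 in boxes:
        ix0 = max(0, min(nx - 1, int((bx0 - x0) / cw)))
        ix1 = max(0, min(nx - 1, int((bx1 - x0) / cw)))
        iy0 = max(0, min(ny - 1, int((by0 - y0) / ch)))
        iy1 = max(0, min(ny - 1, int((by1 - y0) / ch)))
        for iy in range(iy0, iy1 + 1):
            for ix in range(ix0, ix1 + 1):
                grid[iy][ix] += 1
    return grid

def estimate_rows(grid, extent, nx, ny, qbox):
    # For an && (overlap) query, weight each cell's geometry count by
    # the fraction of the cell's area the query window covers.
    x0, y0, x1, y1 = extent
    cw, ch = (x1 - x0) / nx, (y1 - y0) / ny
    qx0, qy0, qx1, qy1 = qbox
    est = 0.0
    for iy in range(ny):
        for ix in range(nx):
            cx0, cy0 = x0 + ix * cw, y0 + iy * ch
            ox = max(0.0, min(qx1, cx0 + cw) - max(qx0, cx0))
            oy = max(0.0, min(qy1, cy0 + ch) - max(qy0, cy0))
            est += grid[iy][ix] * (ox * oy) / (cw * ch)
    return est
```

Note that a geometry touching several cells is counted in each of them, which inflates the totals slightly; as David says, the crude test still works well in practice.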

The PostGIS mailing list (postgis.refractions.net) has more technical 
details.

The only real problem is that the user has to manually keep stats
up-to-date.  Is there any way to attach something to VACUUM ANALYSE?

As an aside, PostGIS is now passing the Open GIS Consortium's "Simple 
Features For SQL" conformance test.  We're hoping to submit it for the 
OGC stamp of approval "very soon."

Thanks for the info on start-up cost being more heavily weighted.  The 
user who reported this problem didn't have the index-selectivity-stats 
package turned on in their database.  Once they turned it on, everything 
worked correctly.  I was just confused as to why the DECLARE and SELECT 
were making different plans.

dave





Re: [HACKERS] Weird locking situation

2003-10-02 Thread Hannu Krosing
Tom Lane kirjutas N, 02.10.2003 kell 17:30:
> Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> > What is going on here?  Surely getting a FOR UPDATE row lock should 
> > prevent another process getting an update lock?

> The behavior you describe would certainly be a bug, but you'll have to
> show a reproducible example to convince me it wasn't pilot error.  One
> idea that springs to mind is that maybe additional rows with id=1 were
> inserted (by some other transaction) between the SELECT FOR UPDATE and
> the UPDATE?

Perhaps he was looking for "key locking", so that "select ... where
key=1 for update" would also prevent inserts where key=1?


Hannu





Re: [HACKERS] Weird locking situation

2003-10-02 Thread Tom Lane
Christopher Kings-Lynne <[EMAIL PROTECTED]> writes:
> What is going on here?  Surely getting a FOR UPDATE row lock should 
> prevent another process getting an update lock?

I could not duplicate your results.  I did

regression=# create table tab(id int , blah int);
CREATE TABLE
regression=# insert into tab values(1,1);
INSERT 320558 1
regression=# insert into tab values(1,2);
INSERT 320559 1
regression=# insert into tab values(2,3);
INSERT 320560 1
regression=# BEGIN;
BEGIN
regression=# SELECT * FROM tab WHERE id=1 FOR UPDATE;
 id | blah
----+------
  1 |    1
  1 |    2
(2 rows)

<< in another window >>

regression=# UPDATE tab SET blah=1 WHERE id=1;
[waits]

<< back to first window >>

regression=# UPDATE tab SET blah=1 WHERE id=1;
UPDATE 2
regression=# end;
COMMIT

<< second window now reports >>

UPDATE 2
regression=#

The behavior you describe would certainly be a bug, but you'll have to
show a reproducible example to convince me it wasn't pilot error.  One
idea that springs to mind is that maybe additional rows with id=1 were
inserted (by some other transaction) between the SELECT FOR UPDATE and
the UPDATE?

regards, tom lane



Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> Is this a bug?

> nconway=# create view baz (a,b) as select 'hello', 'world';
> WARNING:  column "a" has type "unknown"
> DETAIL:  Proceeding with relation creation anyway.

It's always done that, although the spelling of the notice has
varied over the years.

These days we tend to force "unknown" to become "text" if a specific
data type is really required, and I suppose a case could be made that
CREATE VIEW should do that too.  But the consequences of guessing wrong
are probably worse here than elsewhere, since you can't really change
the view column type short of dropping and recreating the view.

I'd almost argue that we should change this message to an error:

ERROR: column "a" has type "unknown"
HINT: Explicitly cast the string literal to some specific type.

regards, tom lane



Re: [HACKERS] Thoughts on maintaining 7.3

2003-10-02 Thread Rod Taylor

> For example, if you have a timestamp index and you routinely clean out
> all entries older than N-days-ago, you won't have a problem in 7.4.
> If your pattern is to delete nine out of every ten entries (maybe you
> drop minute-by-minute entries and keep only hourly entries after awhile)
> then you might find the index loading getting unpleasantly low.  We'll
> have to see whether it's a problem in practice.  I'm willing to revisit
> the page-merging problem if it's proven to be a real practical problem,
> but it looked hard enough that I think it's more profitable to spend the
> development effort elsewhere until it's proven necessary.

A pattern I have on a few tables is to record daily data.  After a
period of time, create an entry for a week that is the sum of 7 days;
after another period of time, compress 4 weeks into a month.

Index is on the date representing the block. It's a new insert, but
would go onto the old page. Anyway, I don't have that much data (~20M
rows) -- but I believe it is a real-world example of this pattern.
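The rollup step can be sketched like this. It is an illustrative toy, not the actual schema or code: keying the weekly row by the Monday of its block, and the function name itself, are assumptions.

```python
# Illustrative toy of the daily-to-weekly rollup -- the week keying
# (Monday of the block) and the function name are assumptions.
from datetime import timedelta

def rollup_weekly(daily):
    # daily: {date: value}.  Sum the 7 daily rows of each week into one
    # row keyed by the week's start date, so the replacement insert
    # lands next to the old index entries it supersedes.
    weeks = {}
    for d, v in daily.items():
        week_start = d - timedelta(days=d.weekday())  # Monday of d's week
        weeks[week_start] = weeks.get(week_start, 0) + v
    return weeks
```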




Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Oliver Elphick
On Thu, 2003-10-02 at 08:40, Greg Stark wrote:
> Neil Conway <[EMAIL PROTECTED]> writes:
> 
> > Is this a bug?
> > 
> > (using CVS code from yesterday)
> > 
> > nconway=# create view baz (a,b) as select 'hello', 'world';
> > WARNING:  column "a" has type "unknown"
> > DETAIL:  Proceeding with relation creation anyway.
> > WARNING:  column "b" has type "unknown"
> > DETAIL:  Proceeding with relation creation anyway.
> > CREATE VIEW
> > nconway=# 
> 
> 7.3 does the same thing actually. I don't know what that means though.

junk=# \d baz
      View "public.baz"
 Column |   Type    | Modifiers
--------+-----------+-----------
 a      | "unknown" |
 b      | "unknown" |
View definition:
 SELECT 'hello' AS a, 'world' AS b;
 

There is no table behind the view, so there is no way for PostgreSQL to
derive the column types of a and b.  A quoted string (as supplied in the
view definition) could be one of text, varchar, char, date, time,
timestamp, cidr and so on.


-- 
Oliver Elphick[EMAIL PROTECTED]
Isle of Wight, UK http://www.lfix.co.uk/oliver
GPG: 1024D/3E1D0C1C: CA12 09E0 E8D5 8870 5839  932A 614D 4C34 3E1D 0C1C
 
 "Set your affection on things above, not on things on 
  the earth."  Colossians 3:2 




[HACKERS] Weird locking situation

2003-10-02 Thread Christopher Kings-Lynne
Hi guys,

I'm just trying to understand this situation:

Session 1
-
BEGIN;
SELECT * FROM tab WHERE id=1 FOR UPDATE;
Session 2
-
UPDATE tab SET blah=1 WHERE id=1;

Session 1
-
UPDATE tab SET blah=1 WHERE id=1;
ERROR: deadlock detected
Session 2
-
...update has gone through.
What is going on here?  Surely getting a FOR UPDATE row lock should 
prevent another process getting an update lock?

Chris





Re: [HACKERS] PREPARE/EXECUTE across backends?

2003-10-02 Thread Karel Zak

On Wed, Oct 01, 2003 at 09:01:23PM -0400, Neil Conway wrote:
> On Wed, 2003-10-01 at 20:25, Jingren Zhou wrote:
> > From the document, it seems that PREPARE/EXECUTE works only in the same 
> > session. I am wondering whether postgres can prepare a query (save the plan) 
> > for different backends.
> 
> The decision to store prepared statements per-backend, rather than in
> shared memory, was made deliberately. In fact, an early version of the
> PREPARE/EXECUTE patch (written by Karel Zak) stored prepared statements
> in shared memory. But I decided to remove this, because:

> That's all the reasons I can think of off the top of my head for doing
> things the way we do. However, I'm open to being convinced: if you think
> we should store prepared statements in shm, feel free to make a case for
> it.

 I think the current non-shared PREPARE/EXECUTE is the right solution.
 The shared version would require defining a new type of memory
 management routines, fully compatible with the standard and abstract
 PostgreSQL mmgr tier.  I worked on this because it was an interesting
 experiment, and now we know that writing something like this is
 possible :-)

 I think the final and right solution is persistent pre-forked backends
 that know how to keep cached PREPARE/EXECUTE stuff (and a lot of other
 things) in their own memory.  That is a nicer and simpler solution than
 using shared memory.
 
Karel

-- 
 Karel Zak  <[EMAIL PROTECTED]>
 http://home.zf.jcu.cz/~zakkr/



Re: [HACKERS] ADD FOREIGN KEY

2003-10-02 Thread Greg Stark

Christopher Browne <[EMAIL PROTECTED]> writes:

> I would, given an ideal world, prefer to be able to have a connection
> or two live during this to let me monitor the DB and even get an early
> peek at the data.  

On that note, how hard would it be to implement a read-dirty mode in postgres?
This would be useful for a few things; the main one I can think of is
progress indicators for long-running updates/inserts.

It seems like it falls naturally out of the MVCC algorithm, simply have it set
the transaction id of the current transaction to be a magic value that
compares greater than any transaction id. So all uncommitted transactions are
seen as having been committed in the past.

I don't see any real need for updates or inserts, but reasonable semantics for
them also fall out of MVCC. Any updates or inserts should be seen as being
committed infinitely far in the future. So they can only be seen by other
read-dirty transactions.
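A toy model of that visibility rule might look like the following. This is illustrative only, not PostgreSQL internals; the names and the "infinite xid" stand-in are made up.

```python
# Toy model of the read-dirty visibility rule sketched above -- not
# PostgreSQL internals; names and the "infinite xid" are made up.

INFINITE_XID = float("inf")  # the "magic value" that compares greater
                             # than any real transaction id

def visible(tuple_xmin, committed, snapshot_xid):
    # Normal MVCC read: see a tuple only if its inserting transaction
    # committed and is older than our snapshot.
    return tuple_xmin in committed and tuple_xmin < snapshot_xid

def visible_dirty(tuple_xmin, committed, in_progress):
    # Read-dirty: pretend our xid is INFINITE_XID, so inserts by
    # still-running transactions look like they committed in the past.
    return ((tuple_xmin in committed or tuple_xmin in in_progress)
            and tuple_xmin < INFINITE_XID)
```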

The main use for this that I see is doing select count(*) on tables being
imported or inserted into, or perhaps being able to peek at records being
updated by another session in a long-running job.

If nothing else it'll save the load on the mailing list every time people ask
how to calculate how much longer their data load is going to take based on the
size of the files in the postgres data directory.

I'm sure I'm skipping a few steps. What I said doesn't quite make sense on its
own. I think I'm missing some key elements of the postgres MVCC system.

-- 
greg




Re: [HACKERS] minor view creation weirdness

2003-10-02 Thread Greg Stark
Neil Conway <[EMAIL PROTECTED]> writes:

> Is this a bug?
> 
> (using CVS code from yesterday)
> 
> nconway=# create view baz (a,b) as select 'hello', 'world';
> WARNING:  column "a" has type "unknown"
> DETAIL:  Proceeding with relation creation anyway.
> WARNING:  column "b" has type "unknown"
> DETAIL:  Proceeding with relation creation anyway.
> CREATE VIEW
> nconway=# 

7.3 does the same thing actually. I don't know what that means though.


-- 
greg




[HACKERS] minor view creation weirdness

2003-10-02 Thread Neil Conway
Is this a bug?

(using CVS code from yesterday)

nconway=# create view baz (a,b) as select 'hello', 'world';
WARNING:  column "a" has type "unknown"
DETAIL:  Proceeding with relation creation anyway.
WARNING:  column "b" has type "unknown"
DETAIL:  Proceeding with relation creation anyway.
CREATE VIEW
nconway=# 

-Neil


