Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Josh Berkus
Gavin,

 Personally, I'd love to see some of these newer data analysis
 capabilities added to PostgreSQL -- or at least put out there as
 interesting patches.

I think if the code is good enough, and we can avoid horrible non-standard 
syntax extensions, they should go in.   We have to defend our title as most 
advanced database and having stuff like Skyline first (before DB2 or MS) 
goes a long way for that.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Bug: Buffer cache is not scan resistant

2007-03-07 Thread Marko Kreen

On 3/7/07, Hannu Krosing [EMAIL PROTECTED] wrote:

Do any of you know about a way to READ PAGE ONLY IF IN CACHE in *nix
systems?


Supposedly you could mmap() a file and then do mincore() on the
area to see which pages are cached.

But you were talking about the postgres cache before; there it should
be easy to implement.

--
marko



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Peter Eisentraut
On Wednesday, March 7, 2007 at 07:44, NikhilS wrote:
 Sure, but as Chris mentioned earlier, wouldn't it be useful to maintain
 uniqueness on a
 partition-by-partition basis too?

Many things might be useful, but the aim of the table partitioning venture 
is believed to be the provision of a transparent interface to the existing 
do-it-yourself partitioning facilities.  Therefore, the logical definition of  
a table must continue to work unchanged (or alternatively, the use of the 
feature must be prohibited if that cannot be guaranteed in a particular 
case).  Other features such as uniqueness on a partition basis might also be 
useful but you can do that today and you don't need partitioning for that.

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Simon Riggs
On Tue, 2007-03-06 at 18:31 +0530, NikhilS wrote:

 On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote:
 NikhilS wrote:
  iv) Based on the PRIMARY, UNIQUE, REFERENCES information
 specified,
  pass it on to the children tables.
 
 How will you maintain a primary key in such a table,
 considering that
 indexes can't span multiple tables?
 
 We will not (I know it's a hard thing to do :)); the intention is to
 use this information from the parent and make it a property of the
 child table. This will save the user from having to manually
 specify CREATE INDEX and the like on all the children tables
 one-by-one.

If you know that the constraints on each of the tables are distinct, then
building a UNIQUE index on each of the partitions is sufficient to prove
that all rows in the combined partitioned table are distinct also.

The hard part there is checking that the partition constraints are
distinct. If the partition constraints are added one at a time, you can
use the predicate testing logic to compare the to-be-added partition's
constraint against each of the already added constraints. That becomes
an O(N) problem.

What is really needed is a data structure that allows range partitions
to be accessed more efficiently. This could make adding partitions and
deciding in which partition a specific value goes an O(logN) operation.
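
To illustrate the first point (a sketch against the do-it-yourself
inheritance scheme of today; table and index names are made up):

CREATE TABLE measurement (id int, taken date);

CREATE TABLE measurement_2007_01 (
    CHECK (taken >= '2007-01-01' AND taken < '2007-02-01')
) INHERITS (measurement);

CREATE TABLE measurement_2007_02 (
    CHECK (taken >= '2007-02-01' AND taken < '2007-03-01')
) INHERITS (measurement);

-- One UNIQUE index per partition: because the CHECK ranges are provably
-- non-overlapping, (id, taken) is unique across the combined table too.
CREATE UNIQUE INDEX measurement_2007_01_uniq
    ON measurement_2007_01 (id, taken);
CREATE UNIQUE INDEX measurement_2007_02_uniq
    ON measurement_2007_02 (id, taken);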

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





[HACKERS] Grouped Index Tuples / Clustered Indexes

2007-03-07 Thread Heikki Linnakangas
I've updated the GIT patch at http://community.enterprisedb.com/git/. 
Bitrot caused by the findinsertloc-patch has been fixed, making that 
part of the GIT patch a little bit smaller and cleaner. I also did some 
refactoring, and minor cleanup and commenting.


Any comments on the design or patch? For your convenience, I copied the 
same text I added to access/nbtree/README to 
http://community.enterprisedb.com/git/git-readme.txt


Should we start playing the name game at this point? I've been thinking 
we should call this feature just Clustered Indexes, even though it's not 
exactly the same thing as clustered indexes in other DBMSs. From a user's
point of view, they behave similarly enough that it may be best to use
the existing term.


As a next step, I'm hoping to get the indexam API changes from the 
bitmap index patch committed soon, and in a way that supports GIT as well.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread NikhilS

Hi,

If you know that the constraints on each of the tables are distinct, then

building a UNIQUE index on each of the partitions is sufficient to prove
that all rows in the combined partitioned table are distinct also.

The hard part there is checking that the partition constraints are
distinct. If the partition constraints are added one at a time, you can
use the predicate testing logic to compare the to-be-added partition's
constraint against each of the already added constraints. That becomes
an O(N) problem.

What is really needed is a data structure that allows range partitions
to be accessed more efficiently. This could make adding partitions and
deciding in which partition a specific value goes an O(logN) operation.



If the unique constraint is supposed to be on a column which is NOT being
used for the partitioning, then all the above becomes much more difficult.
While partitioning, the additional onus is on the user to specify
non-conflicting CHECKs for the range/list partitions.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Grouped Index Tuples / Clustered Indexes

2007-03-07 Thread Grzegorz Jaskiewicz
My only question would be:
why isn't that in core already?



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Zeugswetter Andreas ADI SD

  iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified, 
  pass it on to the children tables.
 
 How will you maintain a primary key in such a table, 
 considering that indexes can't span multiple tables?

Many partitioning schemes have (or can be made to have) only one
possible target partition for the primary key. Thus if you create
separate unique indexes on each partition the problem is solved.
For a first version I suggest that it is sufficient to disallow creation
of a unique index on the master when the constraints (on columns of this
index) do not nail down a specific partition for each row (e.g. a hash
or a range on one of the index columns that does not overlap).
Later, global indexes, or indexes with separate partitioning rules, can
be implemented to cover the other cases.

Andreas



Re: [HACKERS] Grouped Index Tuples / Clustered Indexes

2007-03-07 Thread Luke Lonergan
+1


On 3/7/07 6:53 AM, Grzegorz Jaskiewicz [EMAIL PROTECTED] wrote:

 My only question would be:
 why isn't that in core already?
 
 





Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread NikhilS

Hi,

On 3/7/07, Zeugswetter Andreas ADI SD [EMAIL PROTECTED] wrote:



  iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,
  pass it on to the children tables.

 How will you maintain a primary key in such a table,
 considering that indexes can't span multiple tables?

Many partitioning schemes have (or can be made to have) only one
possible target partition for the primary key. Thus if you create
separate unique indexes on each partition the problem is solved.
For a first version I suggest that it is sufficient to disallow creation
of a unique index on the master when the constraints (on columns of this
index) do not nail down a specific partition for each row (e.g. a hash
or a range on one of the index columns that does not overlap).
Later, global indexes, or indexes with separate partitioning rules, can
be implemented to cover the other cases.

Andreas



Yes, I agree. For version 1, UNIQUE/PRIMARY indexes will cascade down to the
child tables, but only if the indexed column is present as part of the
partitioning rule.

Regards,
Nikhils

--
EnterpriseDB   http://www.enterprisedb.com


Re: [HACKERS] Patch license update to developer's FAQ

2007-03-07 Thread Zeugswetter Andreas ADI SD

 I don't think it is common.  I didn't add that part, so if
 you also think it is rare, I will remove that distinction.  New text:

 <li>PostgreSQL is licensed under a BSD license.  By posting a patch
 to the public PostgreSQL mailing lists, you are giving the PostgreSQL
 Global Development Group the non-revokable right to distribute your
 patch under the BSD license.  If the patch is not BSD-licensed, it
 will be rejected.</li>

I would remove the last sentence, since it puts the responsibility back
on the group (namely to actively reject).

Andreas



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Alvaro Herrera
I am wondering if we can implement unique indexes across several tables
(inheritance hierarchy) not by using a single, big index covering all
the tables, but rather by inserting a dummy entry into each partition's
unique index.  This dummy entry would have an expanded CTID which would
include the tableoid, so it's possible to check it (albeit there is a
problem in that we may require the opening of another heap to do the
actual checking).  These dummy entries could be removed by bulkcleanup
as soon as the inserting transaction is no longer running, to avoid
bloating the index too much.  All said dummy index entries would be
located at either the rightmost or the leftmost leaf, or close to it, so
another idea is to have future inserters reuse the entry for a different
key.

The obvious problem with this is, naturally, the excess I/O that extra
index traversing causes.  The not so obvious ones are locking,
deadlocking and the opening of other heaps and indexes while you do the
insertion, which may be too expensive.  On the other hand, maybe this
idea is easier to implement than full-fledged cross-table indexes, so we
could have richer partitioning earlier than when somebody finally bites
the bullet and implements cross-table indexes.

Or maybe this is just a dumb idea, but I had to let it out anyway :-)

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes:
 I think if the code is good enough, and we can avoid horrible non-standard 
 syntax extensions, they should go in.   We have to defend our title as most 
 advanced database and having stuff like Skyline first (before DB2 or MS) 
 goes a long way for that.

Well, whether it's horrible or not is in the eye of the beholder, but
this is certainly a non-standard syntax extension.

My questions about whether to adopt it have more to do with
cost/benefit.  I haven't seen the patch, but it sounds like it will be
large and messy; and it's for a feature that nobody ever heard of before,
let alone one that the community has developed a consensus it wants.
I'm not interested in adopting stuff just because DB2 hasn't got it.

It's also worth noting that what we've got here is a large patch
developed, by students, completely outside our normal development
process; so the odds that it's going to be anywhere near acceptable are
low.  I think the last time we applied a patch that met that description
was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember
what a fiasco that was, but I do.

Sorry to be a thrower of cold water, but I just don't see that this
comes anywhere near being something we should be eager to accept.

regards, tom lane



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Luke Lonergan
Simon,

On 3/7/07 5:26 AM, Simon Riggs [EMAIL PROTECTED] wrote:

 What is really needed is a data structure that allows range partitions
 to be accessed more efficiently. This could make adding partitions and
 deciding in which partition a specific value goes an O(logN) operation.

I think we need to re-evaluate the inheritance mechanism for partitioning
and implement something much closer to the storage layer, similar to Oracle.

Having the constraint exclusion occur in the planner is not flexible enough
to allow more advanced solutions.

- Luke 





Re: [HACKERS] GIST and TOAST

2007-03-07 Thread Gregory Stark
Teodor Sigaev [EMAIL PROTECTED] writes:

 input value. As I remember, only R-Tree emulation over boxes, contrib/seg
 and contrib/cube have a simple compress method.
 Hm, if they just return the original datum without detoasting it then it
 could be an issue. I'll check.
 seg and box aren't varlena types, but cube is, and it seems broken :(.
 g_cube_decompress and g_cube_compress don't detoast values. I'll fix that.

Also, all the cube operators like cube_union, cube_size, cube_cmp, etc.

Would you like me to do it, or have you already started?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Simon Riggs
On Wed, 2007-03-07 at 10:23 -0500, Luke Lonergan wrote:
 Simon,
 
 On 3/7/07 5:26 AM, Simon Riggs [EMAIL PROTECTED] wrote:
 
  What is really needed is a data structure that allows range partitions
  to be accessed more efficiently. This could make adding partitions and
  deciding in which partition a specific value goes an O(logN) operation.
 
 I think we need to re-evaluate the inheritance mechanism for partitioning
 and implement something much closer to the storage layer, similar to Oracle.

Oracle's implementation is fairly close to ours, actually. Each
partition is a segment. SQLServer's is fairly low level.

 Having the constraint exclusion occur in the planner is not flexible enough
 to allow more advanced solutions.

It depends what those advanced solutions are. I definitely want to be
able to do run-time exclusion, push down merge joins and parallelism,
but I also think that being able to index only the first 3 partitions is
a useful feature too.

ISTM that making the Append node responsible for exclusion might be a
way to go with this, but various ways are possible, I'm sure. The trick
is to find one that does everything you need and that will take some
deeper analysis.

However you do it, you'll still need a way of deciding which partitions
to include/exclude that doesn't involve a sequential scan of all
partition constraints. So my comments about a different index structure
are still going to be relevant, wherever that lives/executes.

I'm not doing anything in this area personally for 8.3 though.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Time-correlated columns in large tables

2007-03-07 Thread Luke Lonergan
Jeroen,

On 3/5/07 12:39 PM, Jeroen T. Vermeulen [EMAIL PROTECTED] wrote:

 I guess if you did simple run-length compression on these bitmaps you'd
 end up more or less where I came in.  But you wouldn't want to flip a bit
 somewhere in the middle of a compressed data stream, of course. :-

We handle that by doing a recompression in the page if possible, a page split
if not.

Jie/Gavin's work will initially be an equality-encoded bitmap as Heikki
indicates; soon after, we can implement range encoding, etc.

- Luke





Re: [HACKERS] GIST and TOAST

2007-03-07 Thread Teodor Sigaev
I've already started, don't worry about that. Cube has been broken since
TOAST was implemented :)


Gregory Stark wrote:

Teodor Sigaev [EMAIL PROTECTED] writes:


input value. As I remember, only R-Tree emulation over boxes, contrib/seg and
contrib/cube have a simple compress method.

Hm, if they just return the original datum without detoasting it then it could
be an issue. I'll check.

seg and box aren't varlena types, but cube is, and it seems broken :(.
g_cube_decompress and g_cube_compress don't detoast values. I'll fix that.


Also, all the cube operators like cube_union, cube_size, cube_cmp, etc.

Would you like me to do it, or have you already started?



--
Teodor Sigaev   E-mail: [EMAIL PROTECTED]
   WWW: http://www.sigaev.ru/



[HACKERS] Test report on GENERATED/IDENTITY

2007-03-07 Thread Zoltan Boszormenyi

Hi,

I made some tests to prove that GENERATED can help
boost performance. I created a table like this:

create table t1 (
    id serial,
    i1 integer,
    i2 integer,
    g1 integer generated always as (
        case when i1 is null then i2
             when i2 is null then i1
             else i1 + i2 end));

I inserted 1 million records into the table:

 for (i = 1; i <= 1000; i++)
    for (j = 1; j <= 1000; j++)
        INSERT INTO t1 (i1, i2) VALUES (i, j);
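
For reference, an equivalent bulk load can be written as a single SQL
statement using generate_series (a sketch, not the script actually used):

INSERT INTO t1 (i1, i2)
    SELECT i, j
    FROM generate_series(1, 1000) AS i,
         generate_series(1, 1000) AS j;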

After VACUUM FULL ANALYZE,
I timed SELECT id, i1, i2, g1 FROM t1
and SELECT id, i1, i2, <generation expression> FROM t1,
with the result redirected to /dev/null. Results of ten consecutive runs are:

SELECT id, i1, i2, g1 FROM t1
--

2.59user 0.18system 0:04.75elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33536minor)pagefaults 0swaps
2.57user 0.19system 0:04.59elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.60user 0.16system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33826minor)pagefaults 0swaps
2.59user 0.17system 0:04.82elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.16system 0:04.61elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34068minor)pagefaults 0swaps
2.61user 0.17system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33574minor)pagefaults 0swaps
2.59user 0.18system 0:04.60elapsed 60%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34067minor)pagefaults 0swaps
2.59user 0.18system 0:04.67elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32147minor)pagefaults 0swaps
2.58user 0.19system 0:04.63elapsed 59%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33584minor)pagefaults 0swaps
2.73user 0.16system 0:04.94elapsed 58%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+34066minor)pagefaults 0swaps

Average is 4.68 seconds.

SELECT id, i1, i2, <generation expression> FROM t1
--

2.76user 0.18system 0:05.49elapsed 53%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33945minor)pagefaults 0swaps
2.69user 0.17system 0:05.23elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33917minor)pagefaults 0swaps
2.60user 0.18system 0:05.04elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32027minor)pagefaults 0swaps
2.64user 0.16system 0:05.03elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32626minor)pagefaults 0swaps
2.60user 0.15system 0:05.03elapsed 54%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32550minor)pagefaults 0swaps
2.77user 0.18system 0:05.21elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32609minor)pagefaults 0swaps
2.71user 0.17system 0:05.07elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.66user 0.17system 0:05.12elapsed 55%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33042minor)pagefaults 0swaps
2.80user 0.16system 0:05.19elapsed 57%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+33043minor)pagefaults 0swaps
2.71user 0.16system 0:05.14elapsed 56%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (0major+32052minor)pagefaults 0swaps

Average is 5.16 seconds.

Using a single GENERATED column gains a good 9.4% in runtime.

With the IDENTITY, the picture is not so bright.
I expected some cost but not that much. Why is
[simple_]heap_update() so expensive? I created a table and
timed inserting 1 million rows into it:

create table t2 (id serial, i1 integer);
or
create table t2 (id serial generated always as identity, i1 integer);

Using a serial column gave me about 12 seconds
on average over 5 runs. With an IDENTITY column,
I got 61 seconds once and 66 seconds twice.
So, the strictness of the identity column gave me a 500-550%
performance penalty.

With a single unique index on i1, I got 24.4 seconds
with the serial column and 67 seconds for the identity
column. I ran these only once, so this last one isn't
representative.

I tried to use heap_inplace_update() to update the
newly updated or inserted tuple in place but it gave me

ERROR:  heap_inplace_update: wrong tuple length

even though I had already filled the IDENTITY column with
a constant Datum (an Int64 value 0 converted to
the type of the column).

If I read it correctly, the HOT patch would give me
a speedup for this case?

Best regards,
Zoltán Böszörményi




Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Zeugswetter Andreas ADI SD

  What is really needed is a data structure that allows range
partitions 
  to be accessed more efficiently. This could make adding partitions
and 
  deciding in which partition a specific value goes an O(logN)
operation.
 
 I think we need to re-evaluate the inheritance mechanism for 
 partitioning and implement something much closer to the 
 storage layer, similar to Oracle.
 
 Having the constraint exclusion occur in the planner is not 
 flexible enough to allow more advanced solutions.

Whoa, do you have anything to back that up?
You would need to elaborate what you actually mean, but I think it is
moot.
Sure, the constraint technique can be further extended (e.g. during
runtime), but imho the approach is very good.

Andreas



[HACKERS] Bug in VACUUM FULL ?

2007-03-07 Thread Pavan Deolasee


Hi,

I am currently working on getting HOT and VACUUM FULL to work
together. I hit upon a bug which I initially thought was
something that HOT had introduced. But I can reproduce
it with CVS HEAD as well.

Here is what I do: create a simple table with
three columns and one index. Insert a single row into the
table.

CREATE TABLE test (a int, b int, c char(512));
CREATE UNIQUE INDEX testindx ON  test (a);
INSERT INTO test VALUES (1, 2, 'test');

Now, I continuously UPDATE the same row with a simple
SQL script using pgbench with two clients.

$ cat test.sql
   
UPDATE test set b = b + 10 WHERE a = 1;

SELECT *, ctid FROM test;

$ ./pgbench -c 2 -t 5 -f ./test.sql postgres

Now, I run VACUUM FULL on the table in a loop with a
20-second sleep.

$ while (true); do
echo VACUUM FULL START;
./install/bin/psql -c 'vacuum full verbose test;' postgres;
echo VACUUM FULL COMPLETE;
sleep 20;
done;


After a few seconds pgbench fails with the following output:

starting vacuum...end.
Client 1 aborted in state 0: ERROR:  could not read block 650 of 
relation 1663/11467/16401: read only 0 of 8192 bytes
Client 0 aborted in state 0: ERROR:  could not read block 649 of 
relation 1663/11467/16401: read only 0 of 8192 bytes

transaction type: Custom query
scaling factor: 1
number of clients: 2
number of transactions per client: 5
number of transactions actually processed: 29445/10
tps = 459.980394 (including connections establishing)
tps = 460.040423 (excluding connections establishing)


Is this something which has been reported earlier? My first
guess would be some kind of race condition between the FSM
updates and the relation truncation.

It's too late for me to look into this now. If someone wants
to look into this, that would be great. I will otherwise
work on it tomorrow. Any clues/pointers are appreciated.


Thanks,
Pavan


EnterpriseDB    http://www.enterprisedb.com




Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread David Fuhry
The query Ranbeer gave - as with any skyline query - can be solved with 
just pure SQL:


select * from books b where not exists(
  select * from books b2 where
    b2.rating >= b.rating and b2.price <= b.price and
    (b2.rating > b.rating or b2.price < b.price)
);

     book_name     | rating | price
-------------------+--------+-------
 Prodigal Daughter |      3 |   250
 The Notebook      |      4 |   300
 Fountain Head     |      5 |   350
(3 rows)

The idea of the BNL (block nested loop) skyline algorithm is to avoid 
the nested loop by storing dominating records as the query proceeds - 
in the above example, records which are relatively high in rating and 
low in price - and comparing each candidate record to those first.


BNL is the most reasonable skyline algorithm in the absence of a 
multidimensional (usually R-Tree) index on the columns.  For answering 
skyline queries where such an index exists over all query columns, the 
most broadly used generalized algorithm is BBS [1].


Thanks,

Dave Fuhry

[1] Papadias, D., Tao, Y., Fu, G., and Seeger, B. 2005. Progressive 
skyline computation in database systems. ACM Trans. Database Syst. 30, 1 
(Mar. 2005), 41-82. DOI= http://doi.acm.org/10.1145/1061318.1061320


Gavin Sherry wrote:

On Tue, 6 Mar 2007, Alvaro Herrera wrote:


Also, keep in mind that there were plenty of changes in the executor.
This stuff is not likely to be very easy to implement efficiently using
our extant executor machinery; note that Ranbeer mentioned
implementation of block nested loop and other algorithms.  Not sure
how easy it would be to fold that stuff into the optimizer for multi-input
aggregates, instead of hardwiring it to the SKYLINE OF syntax.



Yes, there's been a lot of work on calculating skylines efficiently,
with different sorting techniques and so on. This is the most interesting
part of the idea. You could calculate the query Ranbeer gave using pure
SQL and, perhaps, use of some covariance aggregates or something already.
Of course, it gets harder when you want to calculate across many
dimensions.

Personally, I'd love to see some of these newer data analysis
capabilities added to PostgreSQL -- or at least put out there as
interesting patches.

Thanks,

Gavin





Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Josh Berkus
Tom,

 My questions about whether to adopt it have more to do with
 cost/benefit.  I haven't seen the patch, but it sounds like it will be
 large and messy; and it's for a feature that nobody ever heard of before,
 let alone one that the community has developed a consensus it wants.
 I'm not interested in adopting stuff just because DB2 hasn't got it.

OK, to make it a clearer case: we have an increasing user base using 
PostgreSQL for decision support.  One of the reasons for this is that PG is 
the *only* OSDB which does a decent job of DSS.  Adding unique DSS features 
will make PostgreSQL attractive to a lot more DSS application developers, and 
help make up for the things which we don't have yet (parallel query, async 
I/O, windowing functions).  

Approximate queries are something that DSS users *want*.  Jim Gray addressed
this in his ACM editorial on the databases of the future.  It's something
that *I* want, and if the Greenplum people aren't speaking up here, it's
because they're not paying attention.

Now, I don't know if this Skyline patch is our answer for approximate queries.  
Maybe I should pester Meredith about getting QBE free of its IP issues; it 
certainly looked more flexible than Skyline.  In either case, the code 
probably needs a complete refactor. 

But I think that approximate queries ought to be on our TODO list.

-- 
Josh Berkus
PostgreSQL @ Sun
San Francisco



Re: [HACKERS] Bug in VACUUM FULL ?

2007-03-07 Thread Tom Lane
Pavan Deolasee [EMAIL PROTECTED] writes:
 I am currently working on getting HOT and VACUUM FULL to work
 together. I hit upon a bug which I initially thought was
 something that HOT had introduced. But I can reproduce
 it with CVS HEAD as well.

I think we broke this in 8.2: vac_update_relstats needs to ensure that
it always sends a relcache invalidation event, but as of 8.2 it's set
up to conditionally update the pg_class entry only if it wrote new
values into the tuple.  If vacuum full is truncating the rel back to
the same length that it was on the previous cycle (as is always the
case in this test), then no update, hence no relcache flush, hence
clients still think their cached rd_targblock is good.  I think the
code in vac_update_relstats

if (dirty)
heap_inplace_update(rd, ctup);

needs to look more like what index_update_stats does:

if (dirty)
{
heap_inplace_update(pg_class, tuple);
/* the above sends a cache inval message */
}
else
{
/* no need to change tuple, but force relcache inval anyway */
CacheInvalidateRelcacheByTuple(tuple);
}

Please check if this makes it go away for you --- I'm a bit busy
at the moment.

regards, tom lane



Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Joris Dobbelsteen
-Original Message-
From: [EMAIL PROTECTED] 
[mailto:[EMAIL PROTECTED] On Behalf Of Alvaro Herrera
Sent: Wednesday, 7 March 2007 15:59
To: NikhilS
Cc: Zeugswetter Andreas ADI SD; Peter Eisentraut; 
pgsql-hackers@postgresql.org
Subject: Re: [HACKERS] Auto creation of Partitions

I am wondering if we can implement unique indexes across 
several tables (inheritance hierarchy) not by using a single, 
big index covering all the tables, but rather by inserting a 
dummy entry into each partition's unique index.

I think that one big index is much better in this case. With the dummy
entries you are replicating the data, and a lot of work is needed when
updating the partition structure (e.g. removing a partition).

[snip]

Rather:
If we have the partition relations (aka partitions), then we can define a
unique index on each. This guarantees that there is at most one tuple with
the same value (for the specified columns) in every individual
partition.

Now for the inserts.
We already guarantee that a row is unique within the partition it lives in,
so we must insert it first.
Next is to check the other partitions (in order!) for the existence of a
conflicting row. Of course we require full visibility of the
data.
We do the insert first, as this ensures others will find it. The order is
also important; otherwise there can be situations where we can't
guarantee the constraint.

Updates are exactly the same.
Deletes are trivial.

Perhaps you can do it without an index, but this is no different
than for a normal table.
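
A rough sketch of this insert path in plain SQL (hypothetical partition
names; a real implementation would do this internally, under the proper
visibility rules):

-- 1. Insert into the target partition first, so concurrent inserters
--    will find the new row.
INSERT INTO t_part1 (k) VALUES (42);

-- 2. Then check the remaining partitions, in a fixed order, for a
--    conflicting row; abort the transaction if one is found.
SELECT 1 FROM t_part2 WHERE k = 42;  -- must return no row
SELECT 1 FROM t_part3 WHERE k = 42;  -- must return no row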

- Joris Dobbelsteen



Re: [HACKERS] Synchronized Scan update

2007-03-07 Thread Jim Nasby

On Mar 6, 2007, at 9:43 AM, Josh Berkus wrote:

Don't get me wrong, I want things to be easily understandable as well
but the reason you cite above pretty much
makes us need to remove most of the postgresql.conf, including all
bgwriter, vacuum cost delay, and autovac settings.
Not to mention commit delay and others ;).


Wouldn't that be nice!

The explosion of GUC settings is primarily a result of not enough
information.  The reason there are 7 bgwriter settings, for example, is
that we have no idea what those settings should be and are hoping that
people will tinker with them and tell us.  Someday when I can fully
profile bgwriter, we'll just have one setting: bgwriter_aggressive, set
to a number between 0 and 9.


In the meantime, it would be great for these multiple-settings cases
to be listed somewhere, indicating that it's something we could use  
help with. I think that with some explanation of what we're looking  
for, there are any number of people who could do this kind of profiling.

--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] Grouped Index Tuples / Clustered Indexes

2007-03-07 Thread Simon Riggs
On Wed, 2007-03-07 at 10:32 +, Heikki Linnakangas wrote:
 I've been thinking 
 we should call this feature just Clustered Indexes 

Works for me.

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] WITH/RECURSIVE plans

2007-03-07 Thread Simon Riggs
On Fri, 2007-03-02 at 11:09 -0800, David Fetter wrote:
 On Fri, Mar 02, 2007 at 10:52:14AM -0800, Joshua D. Drake wrote:
  Hello,
  
  Unfortunately we (the community) will not have WITH/RECURSIVE for 8.3. 
  However I have spoken with Alexey and Alvaro and Command Prompt has
  decided to make WITH/RECURSIVE a priority for 8.4.
 
 Any chance we can get WITH without RECURSIVE?  That would be very
 handy all by itself.

I thought Greg had already submitted that?

-- 
  Simon Riggs 
  EnterpriseDB   http://www.enterprisedb.com





Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring

2007-03-07 Thread Jim Nasby

On Mar 6, 2007, at 10:11 PM, ITAGAKI Takahiro wrote:

I have some results showing that if we have plenty of time for checkpoints,
bgwriter_all_maxpages is not so important a parameter, because it is
adjusted to shared_buffers / duration of checkpoint.
Also, my recommended bgwriter_lru_maxpages is the average number of
recycled buffers per cycle, which is hardly tunable manually.


What do you mean by 'number of recycled buffers per cycle'?
--
Jim Nasby    [EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)





Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Mark Kirkwood

Josh Berkus wrote:


Now, I don't know if this Skyline patch is our answer for approximate queries.  
Maybe I should pester Meredith about getting QBE free of its IP issues; it 
certainly looked more flexible than Skyline.  In either case, the code 
probably needs a complete refactor. 


But I think that approximate queries ought to be on our TODO list.



+1





Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Gavin Sherry
On Wed, 7 Mar 2007, Josh Berkus wrote:

 Approximate queries are something that DSS users *want*.  Jim Gray addressed
 this in his ACM editorial on the databases of the future.  It's something
 that *I* want, and if the Greenplum people aren't speaking up here, it's
 because they're not paying attention.

 Now, I don't know if this Skyline patch is our answer for approximate queries.
 Maybe I should pester Meredith about getting QBE free of its IP issues; it
 certainly looked more flexible than Skyline.  In either case, the code
 probably needs a complete refactor.

What people want from approximate queries is different to this: the
desire is usually to balance run time with level of accuracy/quality
(sometimes the desire is to have accurate results as well as similar
results). Neither skyline nor QBE is about this. The only thing in the
spec which addresses this is 'tablesample'.

Thanks,

Gavin



Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Luke Lonergan
Yep - we're paying attention, Josh!

I like the category being explored with skyline; I'm not sure yet how it fits
with existing 'soft data' models and applications that use them.

If SKYLINE is interesting to app developers, maybe we should consider it for 
Bizgres?

- Luke

Msg is shrt cuz m on ma treo

 -Original Message-
From:   Gavin Sherry [mailto:[EMAIL PROTECTED]
Sent:   Wednesday, March 07, 2007 05:44 PM Eastern Standard Time
To: Josh Berkus
Cc: Tom Lane; pgsql-hackers@postgresql.org; Alvaro Herrera; Chris Browne
Subject: Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

On Wed, 7 Mar 2007, Josh Berkus wrote:

 Approximate queries are something that DSS users *want*.  Jim Gray addressed
 this in his ACM editorial on the databases of the future.  It's something
 that *I* want, and if the Greenplum people aren't speaking up here, it's
 because they're not paying attention.

 Now, I don't know if this Skyline patch is our answer for approximate queries.
 Maybe I should pester Meredith about getting QBE free of its IP issues; it
 certainly looked more flexible than Skyline.  In either case, the code
 probably needs a complete refactor.

What people want from approximate queries is different to this: the
desire is usually to balance run time with level of accuracy/quality
(sometimes the desire is to have accurate results as well as similar
results). Neither skyline nor QBE is about this. The only thing in the
spec which addresses this is 'tablesample'.

Thanks,

Gavin




[HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Alvaro Herrera
Hackers,

I want to propose some very simple changes to autovacuum in order to
move forward (a bit):

1. autovacuum_naptime semantics
2. limiting the number of workers: global, per database, per tablespace?

I still haven't received the magic bullet to solve the hot table
problem, but these at least mean we continue doing *something*.


Changing autovacuum_naptime semantics

Are we agreed on changing autovacuum_naptime semantics?  The idea is to
make it per-database instead of the current per-cluster, i.e., a nap
would be the minimum time that passes between starting one worker in a
database and starting another worker in the same database.

Currently, naptime is the time elapsed between two worker runs across
all databases.  So if you have 15 databases, autovacuuming each one
takes place every 15*naptime.

Eventually, we could have per-database naptime defined in pg_database,
and do away with the autovacuum_naptime GUC param (or maybe keep it as a
default value).  Say for database D1 you want to have workers every 60
seconds but for database D2 you want 1 hour.

Question:
Is everybody OK with changing the autovacuum_naptime semantics?


Limiting the number of workers

I was originally proposing having a GUC parameter which would limit the
cluster-wide maximum number of workers.  Additionally we could have a
per-database limit (stored in a pg_database column), being simple to
implement.  Josh Drake proposed getting rid of the GUC param, saying
that it would confuse users to set the per-database limit to some higher
value than the GUC setting and then finding the lower limit enforced
(presumably because of being unaware of it).

The problem is that we need to set shared memory up for workers, so we
really need a hard limit and it must be global.  Thus the GUC param is
not optional.

Other people also proposed having a per-tablespace limit.  This would
make a lot of sense, tablespaces being the natural I/O units.  However,
I'm not very sure it's too easy to implement, because you can put half
of database D1 and half of database D2 in tablespace T1, and the two
other halves in tablespace T2.  Then enforcing the limit becomes rather
complicated and will probably mean putting a worker to sleep.  I think
it makes more sense to skip implementing per-tablespace limits for now,
and have a plan to put per-tablespace IO throttles in the future.

Questions:
Is everybody OK with not putting a per-tablespace worker limit?
Is everybody OK with putting per-database worker limits on a pg_database
column?

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



[HACKERS] Proposed ProcessUtility() API additions

2007-03-07 Thread Tom Lane
I'd like to change ProcessUtility to take a couple of additional
parameters, which it in turn would pass down to those (relatively few)
utility statements that need one or both:

* query_string: source text of command, if known (can be NULL)
* is_top_level: TRUE if command is being driven directly from
  exec_simple_query or exec_execute_message, else FALSE (this would
  need to be passed through PortalRun, so it gets this parameter added
  too).

The point of adding query_string is that whenever parse analysis of
a sub-command is postponed until utility execution, we need to pass
the query string to parse_analyze if we want syntax error location.
This is already an issue for CREATE SCHEMA, and it's about to be
a problem for PREPARE.  There are also a couple of places that rely
on debug_query_string, which they really shouldn't be doing since
that's the current interactive command, not necessarily what's being
parsed at the moment.

The point of adding is_top_level is to provide a simpler, more reliable
means for PreventTransactionChain and related functions to detect
whether a function is trying to invoke a non-transaction-block-safe
command.  Currently we rely on an ugly test involving seeing if the
statement node is in the QueryContext, but that's always been a kluge,
and I'm not sure that it works 100% even today.  I'd like to get rid
of the QueryContext global altogether.

Comments, objections?

regards, tom lane



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Is everybody OK with changing the autovacuum_naptime semantics?

It seems already different from 8.2, so no objection to further change.

 Is everybody OK with not putting a per-tablespace worker limit?
 Is everybody OK with putting per-database worker limits on a pg_database
 column?

I don't think we need a new pg_database column.  If it's a GUC you can
do ALTER DATABASE SET, no?  Or was that what you meant?
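
Something like the following, say, assuming autovacuum_naptime were made
settable per-database (today it is a cluster-wide setting), using the D1/D2
example above:

ALTER DATABASE d1 SET autovacuum_naptime = '60s';
ALTER DATABASE d2 SET autovacuum_naptime = '1h';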

regards, tom lane



Re: [HACKERS] WITH/RECURSIVE plans

2007-03-07 Thread Gregory Stark
Simon Riggs [EMAIL PROTECTED] writes:

 On Fri, 2007-03-02 at 11:09 -0800, David Fetter wrote:
 On Fri, Mar 02, 2007 at 10:52:14AM -0800, Joshua D. Drake wrote:
  Hello,
  
  Unfortunately we (the community) will not have WITH/RECURSIVE for 8.3. 
  However I have spoken with Alexey and Alvaro and Command Prompt has
  decided to make WITH/RECURSIVE a priority for 8.4.
 
 Any chance we can get WITH without RECURSIVE?  That would be very
 handy all by itself.

 I thought Greg had already submitted that?

I thought so too. Except then I tested it on more complex examples and it
didn't seem to work. Somehow the common table expression name wasn't being
propagated into the scope of subqueries. If that were fixed then, yes,
basically it's done.

I think in the long term we want common table expressions not to be executed
multiple times when they're referred to multiple times. So the existing
implementation, which is basically just the syntax support and directly
inlines them wherever they're referenced, is inadequate. But there's nothing
(aside from the aforementioned failure to work) wrong with putting in
something that multiply evaluates now and avoiding the multiple evaluation
later as an optimisation.

Having said that, I suppose one could argue it's incorrect to multiply
evaluate if there's a volatile function in the common table expression. But I
could see an equally strong argument for the converse. I wouldn't be too
concerned about that.


So... 

The question I posed earlier was what types of data belong in the pstate and
what belongs in the parse node itself. I had the impression from the code that
the pstate was inherited when subqueries were planned (and restored after they
were done) so that made it the right place to keep the list of common table
expression names that were in scope.

Is that right? If so then I probably just have some mundane bug somewhere I
can track down now that I have some time. If not then what am I
misunderstanding about the pstate and where would be the right place to keep
this kind of parser namespace state?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Proposed ProcessUtility() API additions

2007-03-07 Thread Gregory Stark


Tom Lane [EMAIL PROTECTED] writes:

 * is_top_level: TRUE if command is being driven directly from
   exec_simple_query or exec_execute_message, else FALSE (this would
   need to be passed through PortalRun, so it gets this parameter added
   too).

...

 The point of adding is_top_level is to provide a simpler, more reliable
 means for PreventTransactionChain and related functions to detect
 whether a function is trying to invoke a non-transaction-block-safe
 command.  Currently we rely on an ugly test involving seeing if the
 statement node is in the QueryContext, but that's always been a kluge,
 and I'm not sure that it works 100% even today.  I'd like to get rid
 of the QueryContext global altogether.

I'm not exactly following. How does the exec_simple_query or
exec_execute_message tell you whether you're in a transaction?

Can't you exec_simple_query("BEGIN;") and then exec_simple_query a second
query in the same transaction?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] Grouped Index Tuples / Clustered Indexes

2007-03-07 Thread Gregory Stark

 On Wed, 2007-03-07 at 10:32 +, Heikki Linnakangas wrote:
 I've been thinking 
 we should call this feature just Clustered Indexes 

So we would have clustered tables, which are tables whose heap is ordered
according to an index, and separately clustered indexes, which are indexes
optimized for such tables?

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [HACKERS] WITH/RECURSIVE plans

2007-03-07 Thread Joshua D. Drake



Any chance we can get WITH without RECURSIVE?  That would be very
handy all by itself.



I thought Greg had already submitted that?
  

To my knowledge, it is not done. If it is, great!

Joshua D. Drake





Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Shane Ambler

Tom Lane wrote:

Josh Berkus josh@agliodbs.com writes:
I think if the code is good enough, and we can avoid horrible non-standard 
syntax extensions, they should go in.   We have to defend our title as most 
advanced database and having stuff like Skyline first (before DB2 or MS) 
goes a long way for that.


Well, whether it's horrible or not is in the eye of the beholder, but
this is certainly a non-standard syntax extension.


Being non-standard should not be the only reason to reject a worthwhile
feature. Do you really believe that the SQL standard covers every
feature that an RDBMS could ever want to implement? Do you think that the
current non-standard features of PostgreSQL should be removed?



My questions about whether to adopt it have more to do with
cost/benefit.  I haven't seen the patch, but it sounds like it will be
large and messy; and it's for a feature that nobody ever heard of before,
let alone one that the community has developed a consensus it wants.
I'm not interested in adopting stuff just because DB2 hasn't got it.


Partially agree, but I do think it is worth looking at to see if some or
all of the feature is worth implementing. The fact that several
different groups have been mentioned as working on this feature would
indicate that it is worth considering. Maybe one of the other groups
will have implemented it better than the first off the rank. Maybe our
core developers can work out a better way to implement these features.


A few people on this list have said they are interested in this.


It's also worth noting that what we've got here is a large patch
developed, by students, completely outside our normal development
process; so the odds that it's going to be anywhere near acceptable are
low.  I think the last time we applied a patch that met that description
was the INTERSECT/EXCEPT patch in 1999 ... maybe you don't remember
what a fiasco that was, but I do.


True, but the qualifications he has listed on his web pages look impressive
and probably give him a little reason to have his work considered/looked at.
He may just end up being a major PostgreSQL developer in the future.



Sorry to be a thrower of cold water, but I just don't see that this
comes anywhere near being something we should be eager to accept.


True, we shouldn't just say 'sounds good, let's put it in', but some
indication that this feature is along the lines of what users want
suggests that we should be asking:


Do we want this or a similar feature?
Is the theory behind this feature solid?
Can the same end results be gained with other existing methods?
Is the implementation offered worth considering?
Has it been developed to meet the PostgreSQL developer guidelines?
Is it reasonable to work on it to reach a level of quality/performance 
that we will be happy to include?

Can we implement this feature better ourselves?
Do we want to start this feature from scratch ourselves?



--

Shane Ambler
[EMAIL PROTECTED]

Get Sheeky @ http://Sheeky.Biz



Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Galy Lee
Alvaro,

Alvaro Herrera wrote:
 I still haven't received the magic bullet to solve the hot table
 problem, but these at least mean we continue doing *something*.

May I ask what your plan or idea for autovacuum improvement for 8.3 is
now? And also, what is the roadmap of autovacuum improvement for 8.4?

Thanks,

Galy Lee
lee.galy _at_ ntt.oss.co.jp
NTT Open Source Software Center





Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Luke Lonergan
Andreas,

On 3/7/07 11:45 AM, Zeugswetter Andreas ADI SD [EMAIL PROTECTED]
wrote:

 Whoa, do you have anything to back that up ?

Sure - when we start to consider designs that implement advanced data
management features, we run into problems with the architecture of
tables-tables-tables.  Here are some examples:
1 - people think of partitions as a logical building block for tables; they
would like to move partitions around underneath a table without the table
definition being involved.  In the current implementation, there are
explicit linkages between the table definition and the child tables -
imagine an ALTER TABLE foo_parent ADD COLUMN and how it would need to
cascade to 1,000 child tables and you get the beginning of it - this
connection should not exist.

2 - INSERT/UPDATE/DELETE processing through the SQL rewrite layer (rules) is
terribly slow and gets slower as you add more partitions.  If done closer to
the storage layer, this can be done in ways that use access methods shared
with other storage entities, e.g. Indices, and the code path would flow more
naturally.

3 - Parallel query can be accomplished more easily by separating scans
across relations split among tablespaces.  This is more natural than trying
to parallelize APPEND nodes within existing plans.

 You would need to elaborate what you actually mean, but I think it is
 moot.
 Sure, the constraint technique can be further extended (e.g. during
 runtime), but imho the approach is very good.

Well, it's being used and that's good, but it needs to be better IMO and I
think that before we go too far down the current path we should consider the
alternatives more carefully.

- Luke





Re: [HACKERS] Log levels for checkpoint/bgwriter monitoring

2007-03-07 Thread Greg Smith

On Wed, 7 Mar 2007, ITAGAKI Takahiro wrote:


Also, my recommended bgwriter_lru_maxpages is the average number of
recycled buffers per cycle, which is hardly tunable manually.


This is completely dependent on what percentage of your buffer cache is 
pinned.  If your load is something like the standard pgbench, the LRU 
writer will rarely find anything useful to write, so this entire line of 
thinking won't work.  The proper behavior for heavily pinned data is to 
turn off the LRU writer altogether so there's more time to run the all 
scan.


The job I'm trying to take on here is not to presume I can solve these 
problems myself yet.  I've instead recognized that people need usefully 
organized information in order to even move in that direction, and that 
information is not even close to being available right now.


What my latest work in progress patches do is summarize each scan of the 
buffer pool with information about how much was written by each of the two 
writers, along with noting what percentage of the pool was pinned data. 
I'm trying to get that one ready to submit this week.  Those three values 
suggest some powerful techniques for tuning, but it's not quite good 
enough to allow auto-tuning.  It also needs a feel for how much time is 
left before the next checkpoint.


What really needs to go along with all this is a sort of progress bar that 
estimates how long we are from a checkpoint based on both a) the timeout,
and b) how many segments have been written.  The timeout one is easy to 
work with that way (from what I read of your code, you've worked that 
angle).  The part I had trouble doing was getting the WAL writers to 
communicate a progress report on how many segments they filled back to the 
bgwriter.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [HACKERS] Proposed ProcessUtility() API additions

2007-03-07 Thread Tom Lane
Gregory Stark [EMAIL PROTECTED] writes:
 Tom Lane [EMAIL PROTECTED] writes:
 The point of adding is_top_level is to provide a simpler, more reliable
 means for PreventTransactionChain and related functions to detect
 whether a function is trying to invoke a non-transaction-block-safe
 command.

 I'm not exactly following. How does the exec_simple_query or
 exec_execute_message tell you whether you're in a transaction?

Read PreventTransactionChain and friends.  We can tell from state info
provided by xact.c whether we're inside a BEGIN block.  The problem
stems from cases like

SELECT myfunc()

where myfunc() tries to execute one of the verboten commands, via SPI or
some similar mechanism.  If we allowed that, then the function could try
to execute more commands within the same top-level transaction, which is
exactly what we don't want to allow.  If the command is being issued
directly by exec_simple_query or exec_execute_message, and it's not
inside a BEGIN block, then there's no way for an additional command to
be issued before commit.
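
To make that concrete, a minimal sketch (hypothetical function; the
exact error text varies across versions):

    CREATE FUNCTION myfunc() RETURNS void AS $$
    BEGIN
        VACUUM;  -- executes via SPI inside the function's transaction
    END;
    $$ LANGUAGE plpgsql;

    SELECT myfunc();
    -- ERROR:  VACUUM cannot be executed from a function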

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [HACKERS] Trivial HugeTLB Benchmark

2007-03-07 Thread Jim Nasby

On Mar 4, 2007, at 3:33 PM, Ryan Cumming wrote:

I did another 18 runs, 9 each for huge pages and normal shared memory.
The database was reinitialized before every third run with pgbench -i
-s 10. The runs themselves were done with pgbench -s 10 -c 5 -t  
1


Rather than doing that, I think you'd be much better off just running  
a very long benchmark and turning on autovaccum. That would at least  
be closer to real-world usage.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Jim Nasby

On Mar 6, 2007, at 9:13 AM, Shane Ambler wrote:

NikhilS wrote:

On 3/6/07, Peter Eisentraut [EMAIL PROTECTED] wrote:

1) Whether we should use triggers/rules for step number (iii) above.
 Maybe rules is the way to go.

Since this would basically be a case of the updatable rules problem,
you should review those discussions in the past to check whether the
issues mentioned there don't interfere with that plan.

The rules mentioned here will be to specify that all the
inserts/updates/deletes should go into proper children tables instead
of the parent. I do not see the updateable rules problem with regards
to this, but will check out the archives for discussion on this
related to partitioning.


I would think that a trigger would be a better choice as I see the  
need (or at least the possibility) for more than just a rewrite.  
When a value is inserted that is outside of a condition currently  
covered by an existing child table then a new child will need to be  
spawned to contain the new data.


There's no reason a new child has to be spawned, and I don't think  
that having a DML statement 'automagically' generating DDL is such a  
hot idea, either.


Also, there's nothing inherently wrong with having an 'overflow  
partition' (to use Oracle syntax) that holds values that don't fall  
in the range of any other tables. The obvious place to do that with  
our partitioning is in the parent table.


There are 2 other reasons to favor triggers though:

1) People (Josh Drake comes to mind) have found that if you get over  
a tiny number of partitions, the performance of rules is abysmal.


2) I believe it should be possible to construct an update trigger  
that moves a row into a different partition when an update changes  
its partitioning key. While I can see cases for simply disallowing  
updates to the partitioning key, I think there are also times when  
being able to do that would be very useful.
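
A rough sketch of such a trigger under today's inheritance-based
partitioning (all names are hypothetical; assumes the parent has an
INSERT trigger that routes rows to the right child):

    CREATE FUNCTION mytable_200703_move() RETURNS trigger AS $$
    BEGIN
        -- If the new key falls outside this partition's range,
        -- relocate the row instead of updating it in place.
        IF NEW.mydate < DATE '2007-03-01'
           OR NEW.mydate >= DATE '2007-04-01' THEN
            DELETE FROM mytable_200703 WHERE id = OLD.id;
            INSERT INTO mytable VALUES (NEW.*);  -- parent re-routes
            RETURN NULL;  -- suppress the original UPDATE
        END IF;
        RETURN NEW;
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER mytable_200703_move BEFORE UPDATE ON mytable_200703
        FOR EACH ROW EXECUTE PROCEDURE mytable_200703_move();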


Will ALTER TABLE be extended to handle partitions? This will allow  
partitioning existing tables (maybe without blocking access?) and  
allow things like ALTER TABLE mytable ADD PARTITION (mydate within  
200703)
and ALTER TABLE mytable DROP PARTITION (mydate within 199912) or  
would dropping be covered by DELETE FROM mytable where mydate =  
199912 ?


I think it'd be great to make adding and removing partitions as  
simple as ALTER TABLE. I don't think that DELETE should be the  
mechanism to drop a partition, though. Again, DML statements  
shouldn't be performing DDL.
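
For reference, the manual equivalents such syntax would wrap look
roughly like this today (table and column names hypothetical):

    -- 'ADD PARTITION': attach a child carrying its range constraint
    CREATE TABLE mytable_200704 (
        CHECK (mydate >= DATE '2007-04-01' AND mydate < DATE '2007-05-01')
    ) INHERITS (mytable);

    -- 'DROP PARTITION': DDL, not row-by-row DML
    DROP TABLE mytable_199912;

DROP TABLE is near-instant and leaves nothing to vacuum, which is a
big part of why dropping a partition shouldn't be spelled DELETE.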


Could such a syntax be devised for date columns? (month of mydate)  
or similar to auto create partitions based on the year and month of  
a date column? or will we just do CHECK(mydatecol >= '1/3/07' and  
mydatecol <= '31/3/07') for each month of data? Also (day of  
mydatecol) to partition based on year and day of year.


Another syntax possibility - range(myserialcol of 50) where new  
child tables are created every 50 rows?
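
Spelled out as today's CHECK constraints, those two ideas might look
like this (hypothetical column names):

    -- (month of mydatecol), for March 2007:
    CHECK (mydatecol >= DATE '2007-03-01' AND mydatecol < DATE '2007-04-01')

    -- range(myserialcol of 50), for the third chunk:
    CHECK (myserialcol >= 100 AND myserialcol < 150)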


Maybe I'm looking at auto-maintenance which is beyond any current  
planning?


I don't think it's worth it to burden the database with auto-creating  
time partitions; it's easy enough to setup a cron job to handle it.
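
As a sketch of that cron approach (the helper and column names are
made up), a monthly job could run
psql -d mydb -c "SELECT make_next_month_partition()":

    CREATE FUNCTION make_next_month_partition() RETURNS void AS $$
    DECLARE
        lo    date := date_trunc('month', now() + interval '1 month');
        hi    date := lo + interval '1 month';
        child text := 'mytable_' || to_char(lo, 'YYYYMM');
    BEGIN
        EXECUTE 'CREATE TABLE ' || child
             || ' (CHECK (mydate >= ' || quote_literal(lo::text)
             || ' AND mydate < ' || quote_literal(hi::text)
             || ')) INHERITS (mytable)';
    END;
    $$ LANGUAGE plpgsql;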


It might be more useful to have the database handle things like  
partitioning on a SERIAL column, though I agree with Nikhils that  
this should wait.


Does any other database support 'automagic' partition creation? I  
know Oracle 9i doesn't... not sure about 10g or DB2 or MSSQL...

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Jim Nasby

On Mar 6, 2007, at 4:57 AM, NikhilS wrote:
iv) Based on the PRIMARY, UNIQUE, REFERENCES information specified,  
pass it on to the children tables.


Since we want to eventually support 'global' indexes, I think we need  
to be really careful here. There's actually 3 possible cases:


1) Index* should be global (spanning multiple tables)
2) Index* should be inherited by all partitions as they're created
3) Index* should exist only on the parent table

* Note that there's really no reason this has to be limited to indexes;  
it could certainly apply to constraints, or even triggers.


IIRC, Oracle has a special syntax for global indexes; any other index  
defined on a partitioned table is picked up *by newly created  
partitions*. If you want to add indexes to existing partitions, you  
have to explicitly add it to each partition.


I'd like to eventually see us supporting all 3 options. I'm not sure  
how much we want to clutter the grammar, though.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Auto creation of Partitions

2007-03-07 Thread Jim Nasby

On Mar 7, 2007, at 3:26 AM, Simon Riggs wrote:
If you know that the constraints on each of the tables is distinct,
then building a UNIQUE index on each of the partitions is sufficient
to prove that all rows in the combined partitioned table are distinct
also.

The hard part there is checking that the partition constraints are
distinct. If the partition constraints are added one at a time, you
can use the predicate testing logic to compare the to-be-added
partition's constraint against each of the already added constraints.
That becomes an O(N) problem.

What is really needed is a data structure that allows range partitions
to be accessed more efficiently. This could make adding partitions and
deciding in which partition a specific value goes an O(logN)
operation.


Directing data to child tables with triggers pretty much necessitates  
having some way to codify what partition a particular row belongs in.  
IE: for partitioning by month, you'll see things like naming the  
partition tables parent_table_name_$YEAR_$MONTH, so the  
'partitioning function' takes a date or timestamp and then returns  
what partition it belongs to. Perhaps there is some way to use that  
mapping to drive the selection of what partitions could contain a  
given value?


One possibility would be to require 3 functions for a partitioned  
table: one accepts the partitioning key and tells you what partition  
it's in, one that tells you what the minimum partitioning key for a  
partition would be, and one that tells you what the maximum would be.  
If the user supplied those 3 functions, I think it would be possible  
to automatically generate code for the triggers and check  
constraints. The min/max partition key functions might allow you to  
more efficiently do partition elimination, too.
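
For monthly date partitioning, the three functions might look like
this sketch (names hypothetical; the min/max pair is keyed off a
sample value rather than a partition name, for brevity):

    -- which partition a given key belongs in
    CREATE FUNCTION part_for(d date) RETURNS text AS
        $$ SELECT 'mytable_' || to_char($1, 'YYYYMM') $$
        LANGUAGE sql IMMUTABLE;

    -- smallest key that partition holds
    CREATE FUNCTION part_min(d date) RETURNS date AS
        $$ SELECT date_trunc('month', $1)::date $$
        LANGUAGE sql IMMUTABLE;

    -- exclusive upper bound for that partition
    CREATE FUNCTION part_max(d date) RETURNS date AS
        $$ SELECT (date_trunc('month', $1) + interval '1 month')::date $$
        LANGUAGE sql IMMUTABLE;

From these, a CHECK constraint for any partition is just part_min and
part_max applied to a value in its range, and an INSERT trigger can
route rows on part_for().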

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [HACKERS] RFC: changing autovacuum_naptime semantics

2007-03-07 Thread Jim Nasby

On Mar 7, 2007, at 4:00 PM, Alvaro Herrera wrote:
Is everybody OK with putting per-database worker limits on a
pg_database column?


I'm worried that we would live to regret such a limit. I can't really  
see any reason to limit how many vacuums are occurring in a database,  
because there's no limiting factor there; you're either going to be  
IO bound (per-tablespace), or *maybe* CPU-bound (perhaps the  
Greenplum folks could enlighten us as to whether they run into vacuum  
being CPU-bound on thumpers).


Changing the naptime behavior to be database related makes perfect  
sense, because the minimum XID you have to worry about is a  
per-database thing; I just don't see limiting the number of vacuums as  
being per-database, though. I'm also skeptical that we'll be able to  
come up with a good way to limit the number of backends until we get  
the hot table issue addressed. Perhaps a decent compromise for now  
would be to limit how many 'small table' vacuums could run on each  
tablespace, and then limit how many 'unlimited table size' vacuums  
could run on each tablespace, where 'small table' would probably have  
to be configurable. I don't think it's the best final solution, but  
it should at least solve the immediate need.

--
Jim Nasby[EMAIL PROTECTED]
EnterpriseDB  http://enterprisedb.com  512.569.9461 (cell)



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] PostgreSQL - 'SKYLINE OF' clause added!

2007-03-07 Thread Tom Lane
Shane Ambler [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Well, whether it's horrible or not is in the eye of the beholder, but
 this is certainly a non-standard syntax extension.

 Being non-standard should not be the only reason to reject a worthwhile 
 feature.

No, but being non-standard is certainly an indicator that the feature
may not be of widespread interest --- if it were, the SQL committee
would've gotten around to including it; seems they've managed to include
everything but the kitchen sink already.  Add to that the complete lack
of any previous demand for the feature, and you have to wonder where the
market is.

 The fact that several 
 different groups have been mentioned to be working on this feature would 
 indicate that it is worth considering.

It looks to me more like someone published a paper that caught the
attention of a few profs looking for term projects for their students.

Now maybe it really is the best idea since sliced bread and will be seen
in the next SQL spec edition, but color me skeptical.  It seems to me
to be a very narrow-usage extension, as opposed to (eg) multi-input
aggregates or WITH/RECURSIVE, which provide general mechanisms applicable
to a multitude of problems.  Now even so it would be fine if the
implementation were similarly narrow in scope, but the published
description of the patch mentions a large chunk of additional executor
mechanisms.  If we're going to be adding as much code as that, I'd like
to see a wider scope of usage for it.

Basically, this patch isn't sounding like it has a reasonable
bang-to-the-buck ratio ...

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [HACKERS] Bug in VACUUM FULL ?

2007-03-07 Thread Pavan Deolasee

Tom Lane wrote:

 Please check if this makes it go away for you --- I'm a bit busy
 at the moment.


Thanks a lot, Tom. It seems to work fine for me. I will do some
more tests and report if I see any issue. Btw, the patch as per
your suggestion is attached.

Thanks,
Pavan

*** src/backend/commands/vacuum.c   2007-02-16 04:53:22.0 +0530
--- src/backend/commands/vacuum.c   2007-03-08 09:25:15.016991272 +0530
***************
*** 718,724 ****
--- 718,732 ----
 * If anything changed, write out the tuple
 */
if (dirty)
+   {
heap_inplace_update(rd, ctup);
+   /* the above sends a cache inval message */
+   }
+   else
+   {
+   /* no need to change tuple, but force relcache inval anyway */
+   CacheInvalidateRelcacheByTuple(ctup);
+   }
  
heap_close(rd, RowExclusiveLock);
  }

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq