Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Tom Lane
Jeffrey W. Baker [EMAIL PROTECTED] writes: I think the largest speedup will be to dump the multiphase merge and merge all tapes in one pass, no matter how large M. Currently M is capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over the tape. It could be done in a single

Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Tom Lane
I wrote: I seem to recall bringing up the question of whether we could find a less implementation-specific way of commanding this behavior, but I can't find it in the archives right now. Ah, here it is: http://archives.postgresql.org/pgsql-hackers/2004-03/msg00502.php No responses :-(

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Dann Corbit
-Original Message- From: [EMAIL PROTECTED] [mailto:pgsql-hackers- [EMAIL PROTECTED] On Behalf Of Tom Lane Sent: Friday, September 30, 2005 11:02 PM To: Jeffrey W. Baker Cc: Luke Lonergan; Josh Berkus; Ron Peacetree; pgsql- [EMAIL PROTECTED]; pgsql-performance@postgresql.org

Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Michael Fuhr
On Sat, Oct 01, 2005 at 02:19:06AM -0400, Tom Lane wrote: I wrote: I seem to recall bringing up the question of whether we could find a less implementation-specific way of commanding this behavior, but I can't find it in the archives right now. Ah, here it is:

Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes: On Sat, Oct 01, 2005 at 02:19:06AM -0400, Tom Lane wrote: Ah, here it is: http://archives.postgresql.org/pgsql-hackers/2004-03/msg00502.php Would an ALTER INDEX SET STATISTICS form be possible? It's not so much the table/index misnomer that's bothering

Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Michael Fuhr
On Sat, Oct 01, 2005 at 02:42:32AM -0400, Tom Lane wrote: Michael Fuhr [EMAIL PROTECTED] writes: Would an ALTER INDEX SET STATISTICS form be possible? It's not so much the table/index misnomer that's bothering me, it's the lack of a clean way to identify which column of the index you are

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Simon Riggs
On Wed, 2005-09-28 at 22:24 +0200, Martijn van Oosterhout wrote: On Wed, Sep 28, 2005 at 11:00:46PM +0400, Ilia Kantor wrote: Your suggestion is essentially the same as mine.. There exists tableoid, pretty much suited to tell between tables in the case of inheritance.. I can't see a real

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Simon Riggs
On Fri, 2005-09-30 at 18:30 -0500, Jim C. Nasby wrote: I thought char was actually stored variable-length...? I know there's a type that actually acts like char does on most databases, but I can't remember what it is off-hand (it should be mentioned in docs 8.3...) You are correct on that

Re: [HACKERS] Found small issue with OUT params

2005-10-01 Thread Dave Page
-Original Message- From: [EMAIL PROTECTED] on behalf of Bruce Momjian Sent: Sat 10/1/2005 1:16 AM To: Jim C. Nasby Cc: Joshua D. Drake; Tony Caduto; pgsql-hackers@postgresql.org Subject: Re: [HACKERS] Found small issue with OUT params fix pgxs for spaces in file names I posted a

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Simon Riggs
On Fri, 2005-09-30 at 13:41 -0700, Josh Berkus wrote: Yeah, that's what I thought too. But try sorting an 10GB table, and you'll see: disk I/O is practically idle, while CPU averages 90%+. We're CPU-bound, because sort is being really inefficient about something. I just don't know what

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Simon Riggs
On Sat, 2005-10-01 at 02:01 -0400, Tom Lane wrote: Jeffrey W. Baker [EMAIL PROTECTED] writes: I think the largest speedup will be to dump the multiphase merge and merge all tapes in one pass, no matter how large M. Currently M is capped at 6, so a sort of 60GB with 1GB sort memory needs 13

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Michael Stone
On Fri, Sep 30, 2005 at 01:41:22PM -0700, Josh Berkus wrote: Realistically, you can't do better than about 25MB/s on a single-threaded I/O on current Linux machines, What on earth gives you that idea? Did you drop a zero? Mike Stone ---(end of

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Martijn van Oosterhout
On Fri, Sep 30, 2005 at 06:30:10PM -0500, Jim C. Nasby wrote: On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: Include the Discriminator as a column in A and it will be inherited by all A1, A2, A3. e.g. concrete_class char(1) not null snip This will add 1 byte per row in

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Hannu Krosing
On R, 2005-09-30 at 13:38 -0700, Luke Lonergan wrote: Bulk loading speed is irrelevant here - that is dominated by parsing, which we have covered copiously (har har) previously and have sped up by 500%, which still makes Postgres 1/2 the loading speed of MySQL. Is this 1/2 of MySQL with

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread mark
On Fri, Sep 30, 2005 at 09:54:39PM +0100, Simon Riggs wrote: On Wed, 2005-09-28 at 22:24 +0200, Martijn van Oosterhout wrote: I wonder if it would be possible to tweak the constraints exclusion code so that if it sees something of the form tableoid = X to exclude other tables... You know,

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Ron Peacetree
*blink* Tapes?! I thought that was a typo... If our sort is code based on sorting tapes, we've made a mistake. HDs are not tapes, and Polyphase Merge Sort and it's brethren are not the best choices for HD based sorts. Useful references to this point: Knuth, Vol 3 section 5.4.9, (starts p356 of

Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Bruce Momjian
Tom Lane wrote: Michael Fuhr [EMAIL PROTECTED] writes: I've noticed that row count estimates for expression indexes appear to rely on default_statistics_target rather than on a column's actual statistics target. That is, if I use ALTER TABLE SET STATISTICS to increase a column's

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Martijn van Oosterhout
On Sat, Oct 01, 2005 at 10:05:22AM -0400, [EMAIL PROTECTED] wrote: It has the 'side or additional benefit' being requested here. The ability to filter the child table by some attribute. For example, if the child tables are used for partitioning, and the attribute were to keep a date range, the

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Andrew Dunstan
Ron Peacetree wrote: The good news is all this means it's easy to demonstrate that we can improve the performance of our sorting functionality. Assuming we get the abyssmal physical IO performance fixed... (because until we do, _nothing_ is going to help us as much) I for one would be

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Tom Lane
Josh Berkus josh@agliodbs.com writes: The biggest single area where I see PostgreSQL external sort sucking is on index creation on large tables. For example, for free version of TPCH, it takes only 1.5 hours to load a 60GB Lineitem table on OSDL's hardware, but over 3 hours to create

Re: [HACKERS] On Logging

2005-10-01 Thread Jim C. Nasby
On Fri, Sep 30, 2005 at 06:24:17PM -0700, David Fetter wrote: How would per-backend logging work? I'd suggest having settings for a per-backend 'debug' logging mode that could be triggered either via a SQL command or a signal to the backend. It would be useful to be able to log this to a

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Hold, I think I see an idea here. We have two cases, those for SERIAL and those for manual nextval() in DEFAULT. For SERIAL, we can easily map to a nextval(::regclass) call on pg_dump reload. For manual nextval, it will appear in the dump as

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Jim C. Nasby
To clarify, this is a hard-coded implementation of what I'm asking for: http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: CREATE TABLE log_other ( project_id smallint NOT NULL ... ) CREATE TABLE log_8 ( -- No project_id ... ) CREATE TABLE log_24, log_25,

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 09:43:03AM +0100, Simon Riggs wrote: On Fri, 2005-09-30 at 18:30 -0500, Jim C. Nasby wrote: I thought char was actually stored variable-length...? I know there's a type that actually acts like char does on most databases, but I can't remember what it is off-hand

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 02:13:09PM +0200, Martijn van Oosterhout wrote: On Fri, Sep 30, 2005 at 06:30:10PM -0500, Jim C. Nasby wrote: On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: Include the Discriminator as a column in A and it will be inherited by all A1, A2, A3.

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Tom Lane
Peter Eisentraut [EMAIL PROTECTED] writes: That was my thinking. The issue has probably been there since 7.3. I don't think we need to shove in a solution now, especially when there is so much disagreement about the behavior. Well, we have a new issue that has made the problem much worse

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Andrew Dunstan
Jim C. Nasby wrote: On Sat, Oct 01, 2005 at 02:13:09PM +0200, Martijn van Oosterhout wrote: On Fri, Sep 30, 2005 at 06:30:10PM -0500, Jim C. Nasby wrote: On Wed, Sep 28, 2005 at 07:25:46PM +0100, Simon Riggs wrote: Include the Discriminator as a column in A and it will be

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: This still wouldn't put us in a place where existing dumps are automatically fixed up during import. We'd parse the expressions as nextval('foo'::text::regclass), which will work but it's effectively still late-binding --- the

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes: Jeffrey W. Baker [EMAIL PROTECTED] writes: I think the largest speedup will be to dump the multiphase merge and merge all tapes in one pass, no matter how large M. Currently M is capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Martijn van Oosterhout
On Sat, Oct 01, 2005 at 10:22:40AM -0400, Ron Peacetree wrote: Assuming we get the abyssmal physical IO performance fixed... (because until we do, _nothing_ is going to help us as much) I'm still not convinced this is the major problem. For example, in my totally unscientific tests on an oldish

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Martijn van Oosterhout
Hmm, I'm trying to understand here. If every row in log_8 should have the same project_id, couldn't this be acheived by having each row in log_other contain the tableoid of the table it refers to. Then a join will return the info you're looking for. Or am I missing something? On Sat, Oct 01,

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Jim C. Nasby
On Sat, Oct 01, 2005 at 06:28:03PM +0200, Martijn van Oosterhout wrote: Hmm, I'm trying to understand here. If every row in log_8 should have the same project_id, couldn't this be acheived by having each row in log_other contain the tableoid of the table it refers to. Then a join will return

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Ron Peacetree
As I posted earlier, I'm looking for code to base a prototype on now. I'll test it outside pg to make sure it is bug free and performs as promised before I hand it off to the core pg developers. Someone else is going to have to merge it into the pg code base since I don't know the code intimately

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Hold, I think I see an idea here. We have two cases, those for SERIAL and those for manual nextval() in DEFAULT. For SERIAL, we can easily map to a nextval(::regclass) call on pg_dump reload. For manual nextval, it will

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Bruce Momjian
Tom Lane wrote: Peter Eisentraut [EMAIL PROTECTED] writes: That was my thinking. The issue has probably been there since 7.3. I don't think we need to shove in a solution now, especially when there is so much disagreement about the behavior. Well, we have a new issue that has made

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: This still wouldn't put us in a place where existing dumps are automatically fixed up during import. We'd parse the expressions as nextval('foo'::text::regclass), which will work but it's effectively still

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: Ripping out ALTER SCHEMA RENAME is not a solution unless you have a path to a solution later with more work. Well, we are only giving ourselves a few weeks to solve this, and I think a hack to make it work cleanly for users is

Re: [HACKERS] FW: PGBuildfarm member snake Branch HEAD Status changed

2005-10-01 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes: Alvaro Herrera wrote: Here, the culprits are tfind() and tsearch(). These apparently aren't portable enough, but they seem to exist on all other platforms. Maybe we could come up with a replacement on Windows? Are there simple btree/hash table

Re: [HACKERS] Expression index ignores column statistics target

2005-10-01 Thread Tom Lane
Michael Fuhr [EMAIL PROTECTED] writes: ALTER INDEX indexname ALTER COLUMN the expression SET STATISTICS 100; Yeah, that could probably be made to work. I do see that indexes allow multiple instances of the same expression, so this approach could be ambiguous. I can't think of an actual use

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Ron Peacetree
You have not said anything about what HW, OS version, and pg version used here, but even at that can't you see that something Smells Wrong? The most common CPUs currently shipping have clock rates of ~2-3GHz and have 8B-16B internal pathways. SPARCs and other like CPUs are clocked slower but

Re: [HACKERS] \d on database with a lot of tables is slow

2005-10-01 Thread Tom Lane
I wrote: It's presumably mostly in the pg_table_is_visible() calls. I did some profiling on a test case with 1 tables, and noticed that a big part of the problem is that the catalog caches become entirely useless: almost every catcache lookup ends up going to the underlying tables. This is

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: Ripping out ALTER SCHEMA RENAME is not a solution unless you have a path to a solution later with more work. Well, we are only giving ourselves a few weeks to solve this, and I think a hack to make it work

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: Does that sound like a workable compromise? Personally, I _love_ it. I hope others do as well. :-) OK, I'll work up a patch. regards, tom lane ---(end of

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Simon Riggs
On Sat, 2005-10-01 at 10:57 -0500, Jim C. Nasby wrote: To clarify, this is a hard-coded implementation of what I'm asking for: http://cvs.distributed.net/viewcvs.cgi/stats-sql/logdb/ in a nutshell: CREATE TABLE log_other ( project_id smallint NOT NULL ... ) CREATE TABLE log_8 (

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Bruce Momjian
Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: Does that sound like a workable compromise? Personally, I _love_ it. I hope others do as well. :-) OK, I'll work up a patch. Here is a query that shows nextval(::text) usage as defaults: SELECT

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Bruce Momjian
Bruce Momjian wrote: Tom Lane wrote: Bruce Momjian pgman@candle.pha.pa.us writes: Tom Lane wrote: Does that sound like a workable compromise? Personally, I _love_ it. I hope others do as well. :-) OK, I'll work up a patch. Here is a query that shows nextval(::text) usage

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Hannu Krosing
On L, 2005-10-01 at 19:59 +0100, Simon Riggs wrote: Jim, Your idea was noted before and actually; I mentioned it to show that I listen and take note of ideas from any source. For everybody, I would note that the current behaviour is exactly the way that List Partitioning works on other

Re: [HACKERS] [PATCHES] Proposed patch for sequence-renaming problems

2005-10-01 Thread Tom Lane
Bruce Momjian pgman@candle.pha.pa.us writes: Sorry, this is the right one: SELECT n.nspname, c.relname, a.attname, d.adsrc FROMpg_namespace n, pg_class c, pg_attribute a, pg_attrdef d WHERE n.oid = c.relnamespace AND c.oid = a.attrelid AND

Re: [PERFORM] [HACKERS] Query in SQL statement

2005-10-01 Thread Tom Lane
Roger Hand [EMAIL PROTECTED] writes: I suggest ditching the CamelCase and going with underline_seperators. I'd also not use the bareword id, instead using bad_user_id. And I'd name the table bad_user. But that's just me. :) I converted a db from MS SQL, where tables and fields were CamelCase,

[HACKERS] 8.1beta2 pg_dumpall inconsistencies

2005-10-01 Thread Joshua D. Drake
After initializing a cluster with initdb -D data --no-locale as the user jd. I created a table named foo in the template1 database. create table foo(bar text, baz bigserial); When executing pg_dumpall: bin/pg_dumpall -U jd -p5500 Looks as it should. However: [EMAIL PROTECTED] pgsqldev]$

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Martijn van Oosterhout
[removed -performance, not subscribed] On Sat, Oct 01, 2005 at 01:42:32PM -0400, Ron Peacetree wrote: You have not said anything about what HW, OS version, and pg version used here, but even at that can't you see that something Smells Wrong? Somewhat old machine running 7.3 on Linux 2.4. Not

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-10-01 Thread Tom Lane
Martijn van Oosterhout kleptog@svana.org writes: Anyway, to bring some real info I just profiled PostgreSQL 8.1beta doing an index create on a 2960296 row table (3 columns, table size 317MB). 3 columns in the index you mean? What were the column datatypes? Any null values? The number 1

Re: [HACKERS] effective SELECT from child tables

2005-10-01 Thread Greg Stark
Hannu Krosing [EMAIL PROTECTED] writes: I also often wish that this would be possible when someone adds a column with a default value to a multi-million row table on a 24/7 production system and insists on filling all existing columns with the default. A rule ON SELECT FROM table_x WHERE