[PERFORM] Planner doesn't look at LIMIT?
Hello, I have PostgreSQL 8.0.3 running on a "workstation" with 768 MB of RAM, under FreeBSD. And I have a 47-million-row table:

qnex=# explain select * from log;
                             QUERY PLAN
------------------------------------------------------------------
 Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
(1 row)

...which is joined with a few smaller ones, like:

qnex=# explain select * from useragents;
                             QUERY PLAN
------------------------------------------------------------------
 Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(1 row)

Relevant settings:

shared_buffers = 5000
random_page_cost = 3
work_mem = 102400
effective_cache_size = 6

Now, if I do a SELECT:

qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=15912.20..15912.31 rows=1 width=272)
   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
         Hash Cond: ("outer".useragent_id = "inner".useragent_id)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
         ->  Hash  (cost=9475.96..9475.96 rows=364896 width=96)
               ->  Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(6 rows)

Or:

qnex=# EXPLAIN SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=15912.20..15912.31 rows=1 width=272)
   ->  Hash Left Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
         Hash Cond: ("outer".useragent_id = "inner".useragent_id)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180)
         ->  Hash  (cost=9475.96..9475.96 rows=364896 width=96)
               ->  Seq Scan on useragents  (cost=0.00..9475.96 rows=364896 width=96)
(6 rows)

Time: 2.688 ms

...the query seems to last forever (it's hashing 47 million rows!)
If I set enable_hashjoin=false:

qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216 rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336 width=272) (actual time=74.204..74.204 rows=1 loops=1)
         ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=180) (actual time=23.270..23.270 rows=1 loops=1)
         ->  Index Scan using useragents_pkey on useragents  (cost=0.00..3.02 rows=1 width=96) (actual time=50.867..50.867 rows=1 loops=1)
               Index Cond: ("outer".useragent_id = useragents.useragent_id)
 Total runtime: 74.483 ms

...which is way faster. Of course, if I do:

qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents
       WHERE logid = (SELECT logid FROM log LIMIT 1);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop Left Join  (cost=0.04..6.09 rows=1 width=272) (actual time=61.403..61.419 rows=1 loops=1)
   InitPlan
     ->  Limit  (cost=0.00..0.04 rows=1 width=4) (actual time=0.029..0.032 rows=1 loops=1)
           ->  Seq Scan on log  (cost=0.00..1741852.36 rows=47044336 width=4) (actual time=0.023..0.023 rows=1 loops=1)
   ->  Index Scan using log_pkey on log  (cost=0.00..3.02 rows=1 width=180) (actual time=61.316..61.319 rows=1 loops=1)
         Index Cond: (logid = $0)
   ->  Index Scan using useragents_pkey on useragents  (cost=0.00..3.02 rows=1 width=96) (actual time=0.036..0.042 rows=1 loops=1)
         Index Cond: ("outer".useragent_id = useragents.useragent_id)
 Total runtime: 61.741 ms
(9 rows)

...it is fast as well. I tried tweaking cpu_*, work_mem, effective_cache and so on, but without any luck. The 47-million-row table is huge compared to useragents (I actually need to join log with three useragents-like tables, and create a view out of it). I also tried using LEFT/RIGHT JOINs instead of (inner) JOINs... Of course the database is freshly vacuum analyzed, and statistics are set at 50. My view of the problem is that the planner ignores the "LIMIT" part.
It assumes it _needs_ to return all 47 million rows joined with the useragents table, so the hash join looks like the only sane approach. But chances are that even with something like LIMIT 20, the nested loop would be much faster. Any ideas how to make it work (other than rewriting the query to use subselects, using explicit id rows, or disabling hashjoin completely)? Or is this a bug?

Regards,
   Dawid
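The arithmetic behind this expectation can be sketched in a few lines. The scaling rule below is a simplification of the planner's actual LIMIT handling (charge the startup cost plus a pro-rated share of the run cost), using the cost numbers from the EXPLAIN output above:

```python
# Sketch: how LIMIT 1 should discount the two plans' estimated costs.
# Numbers come from the EXPLAIN output above; the pro-rating rule is a
# simplified model of what the planner does, not its exact code.

def limit_cost(startup, total, rows, n):
    """Cost of fetching n of `rows` rows: startup plus a pro-rated run cost."""
    return startup + (total - startup) * min(n / rows, 1.0)

rows = 47044336

# Hash join: high startup (it must hash useragents first), huge total.
hash_join = limit_cost(15912.20, 5328368.96, rows, 1)

# Nested loop: zero startup, even bigger total -- but LIMIT 1 barely samples it.
nest_loop = limit_cost(0.00, 144295895.01, rows, 1)

print(round(hash_join, 2))  # 15912.31, matching the Limit node in the hash plan
print(round(nest_loop, 2))  # 3.07, matching the enable_hashjoin=off plan
```

So the planner's own numbers already say the nested loop should win by a factor of several thousand under LIMIT 1; the question in this thread is why it is not being chosen.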
Re: [PERFORM] Planner doesn't look at LIMIT?
Which row do you want? Do you want a row at random? I presume you want the N latest rows? In that case you should use an ORDER BY on an indexed field; the serial primary key will do nicely (ORDER BY id DESC). It's indexed, so the query will use the index and it will fly.

> Any ideas how to make it work (other than rewriting the query to use
> subselects, using explicit id rows, or disabling hashjoin completely)?
> Or is this a bug?
>
> Regards,
>    Dawid

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Planner doesn't look at LIMIT?
Dawid Kuroczko <[EMAIL PROTECTED]> writes:
> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;

> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

> If I set enable_hashjoin=false:

> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;

> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216 rows=1 loops=1)
>   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336 width=272) (actual time=74.204..74.204 rows=1 loops=1)

This is quite strange. The nestloop plan definitely should be preferred in the context of the LIMIT, considering that it has far lower estimated cost. And it is preferred in simple tests for me. It seems there must be something specific to your installation that's causing the planner to go wrong. Can you develop a self-contained test case that behaves this way for you?

I recall we saw a similar complaint a month or two back, but the complainant never followed up with anything useful for tracking down the problem.

			regards, tom lane
[PERFORM] Another index question
Hi all, I am trying to do an update on a table, but so far I can't seem to come up with a usable index.

After my last question/thread, the user 'PFC' recommended I store whether a file is to be backed up as either 't'(rue), 'f'(alse) or 'i'(nherit), to speed up changing files and sub-directories under a given directory when it is toggled. I've more or less finished implementing this and it is certainly a LOT faster, but I am hoping to make it just a little faster still with an index.

Tom Lane pointed out to me that I needed 'text_pattern_ops' on my 'file_parent_dir' column in the index if I wanted to do pattern matching (the C locale wasn't set). Now I have added an additional condition, and I think this might be my problem. Here is a sample query I am trying to create my index for:

UPDATE file_info_2 SET file_backup='i' WHERE file_backup!='i' AND file_parent_dir='/';

This would be an example of someone changing the backup state of the root of a partition. It could also be:

UPDATE file_info_2 SET file_backup='i' WHERE file_backup!='i' AND file_parent_dir='/usr';

if, for example, the user was toggling the backup state of the '/usr' directory.

I suspected that because I was using "file_backup!='i'" I might be running into the same problem as before, so I tried creating the index:

tle-bu=> CREATE INDEX file_info_2_mupdate_idx ON file_info_2 (file_backup bpchar_pattern_ops, file_parent_dir text_pattern_ops);

tle-bu=> EXPLAIN ANALYZE UPDATE file_info_2 SET file_backup='i' WHERE file_backup!='i' AND file_parent_dir~'^/';
                               QUERY PLAN
------------------------------------------------------------------------
 Seq Scan on file_info_2  (cost=0.00..13379.38 rows=1 width=134) (actual time=1623.819..1624.087 rows=4 loops=1)
   Filter: ((file_backup <> 'i'::bpchar) AND (file_parent_dir ~ '^/'::text))
 Total runtime: 1628.053 ms
(3 rows)

This index wasn't used, though, even when I set 'enable_seqscan' to 'OFF'. The column 'file_backup' is 'char(1)' and the column 'file_parent_dir' is 'text'.
tle-bu=> \d file_info_2; \di file_info_2_mupdate_idx;

            Table "public.file_info_2"
     Column      |     Type     |          Modifiers
-----------------+--------------+------------------------------
 file_group_name | text         |
 file_group_uid  | integer      | not null
 file_mod_time   | bigint       | not null
 file_name       | text         | not null
 file_parent_dir | text         | not null
 file_perm       | integer      | not null
 file_size       | bigint       | not null
 file_type       | character(1) | not null
 file_user_name  | text         |
 file_user_uid   | integer      | not null
 file_backup     | character(1) | not null default 'i'::bpchar
 file_display    | character(1) | not null default 'i'::bpchar
 file_restore    | character(1) | not null default 'i'::bpchar
Indexes:
    "file_info_2_mupdate_idx" btree (file_backup bpchar_pattern_ops, file_parent_dir text_pattern_ops)
    "file_info_2_supdate_idx" btree (file_parent_dir, file_name, file_type)

              List of relations
 Schema |          Name           | Type  |  Owner  |    Table
--------+-------------------------+-------+---------+-------------
 public | file_info_2_mupdate_idx | index | madison | file_info_2
(1 row)

Could it be that there needs to be a certain number of rows matching "file_backup!='i'" before the planner will use the index? I have also tried not defining an op class on both columns (and on one at a time), but I can't seem to figure this out. As always, thank you!

Madison
Re: [PERFORM] Planner doesn't look at LIMIT?
Dawid Kuroczko wrote:
> work_mem = 102400
> ...I tried tweaking cpu_*, work_mem, effective_cache and so on, but
> without any luck.

I'm hoping you didn't tweak it enough! I posted something similar to this a while ago, but haven't since got around to figuring out a useful test case to send to the list. Try reducing your work_mem down to 1000 or so and things should start doing what you expect.

Sam
Solved (was: Re: [PERFORM] Another index question)
Line noise, sorry... After posting I went back to reading the pgsql docs and saw the query:

SELECT am.amname AS index_method,
       opc.opcname AS opclass_name,
       opr.oprname AS opclass_operator
  FROM pg_am am, pg_opclass opc, pg_amop amop, pg_operator opr
 WHERE opc.opcamid = am.oid
   AND amop.amopclaid = opc.oid
   AND amop.amopopr = opr.oid
 ORDER BY index_method, opclass_name, opclass_operator;

which lists all the operator classes. I noticed none of the opclass operators supported '!=', so I wondered if that was simply an unindexable (is that a word?) operator. So I tried creating the index:

tle-bu=> CREATE INDEX file_info_2_mupdate_idx ON file_info_2 (file_backup, file_parent_dir text_pattern_ops);

And changing my query to:

tle-bu=> EXPLAIN ANALYZE UPDATE file_info_2 SET file_backup='i' WHERE file_backup='t' OR file_backup='f' AND file_parent_dir~'^/';
                               QUERY PLAN
------------------------------------------------------------------------
 Index Scan using file_info_2_mupdate_idx, file_info_2_mupdate_idx on file_info_2  (cost=0.00..10.04 rows=1 width=134) (actual time=0.112..0.718 rows=4 loops=1)
   Index Cond: ((file_backup = 't'::bpchar) OR ((file_backup = 'f'::bpchar) AND (file_parent_dir ~>=~ '/'::text) AND (file_parent_dir ~<~ '0'::text)))
   Filter: ((file_backup = 't'::bpchar) OR ((file_backup = 'f'::bpchar) AND (file_parent_dir ~ '^/'::text)))
 Total runtime: 60.359 ms
(4 rows)

Bingo! Hopefully someone might find this useful in the archives. :p

Madison
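One subtlety is visible in the Index Cond above: SQL's AND binds tighter than OR, so without parentheses the rewritten WHERE clause is evaluated as file_backup='t' OR (file_backup='f' AND file_parent_dir ~ '^/'). Since every file_parent_dir presumably starts with '/', the two readings coincide in practice, but a quick analogue in Python (which has the same AND/OR precedence; the sample values are hypothetical, not rows from the real table) shows where they would differ:

```python
import re

def unparenthesized(backup, parent_dir):
    # Mirrors: file_backup='t' OR file_backup='f' AND file_parent_dir ~ '^/'
    # -- AND binds tighter than OR, in SQL as in Python.
    return backup == 't' or backup == 'f' and re.search('^/', parent_dir) is not None

def intended(backup, parent_dir):
    # Mirrors: (file_backup='t' OR file_backup='f') AND file_parent_dir ~ '^/'
    return (backup == 't' or backup == 'f') and re.search('^/', parent_dir) is not None

# For rows whose parent dir starts with '/' (i.e. every real path) they agree:
assert unparenthesized('t', '/usr') == intended('t', '/usr') == True
assert unparenthesized('i', '/usr') == intended('i', '/usr') == False

# They diverge only on a path not starting with '/', which shouldn't occur here:
assert unparenthesized('t', 'relative') != intended('t', 'relative')
```

Harmless in this schema, but worth parenthesizing explicitly if such paths could ever appear.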
Re: [PERFORM] Planner doesn't look at LIMIT?
On 7/22/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> Dawid Kuroczko <[EMAIL PROTECTED]> writes:
> > qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
> > Limit  (cost=15912.20..15912.31 rows=1 width=272)
> >   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)
>
> This is quite strange. The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost. And it is preferred in simple tests for me. It seems there must
> be something specific to your installation that's causing the planner to
> go wrong. Can you develop a self-contained test case that behaves this
> way for you?

Why, certainly. I also tested it on Gentoo Linux with PostgreSQL 8.0.1 (yeah, a bit older one), and the behaviour is the same. The test looks like this:

-- First, let's make a "small" lookup table -- 40 rows.
CREATE TABLE lookup (
    lookup_id serial PRIMARY KEY,
    value integer NOT NULL
);
INSERT INTO lookup (value) SELECT * FROM generate_series(1, 40);
VACUUM ANALYZE lookup;

-- Then let's make a huge data table...
CREATE TABLE huge_data (
    huge_data_id serial PRIMARY KEY,
    lookup_id integer NOT NULL
);
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM lookup;
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --    800 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  1 600 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  3 200 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; --  6 400 000
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000

-- You may want to put ANALYZE and EXPLAIN between each of these
-- steps. In my case, at 12.8 mln rows PostgreSQL seems to go for hashjoin
-- in each case. YMMV, so you may try to push it up to 1024 mln rows.
INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 25 600 000
ANALYZE huge_data;
EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;

My EXPLAIN from Linux (SMP P-III box), with PostgreSQL 8.0.1, while building this test case:

qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=0.00..3.21 rows=1 width=12)
   ->  Nested Loop  (cost=0.00..19557596.04 rows=6094777 width=12)
         ->  Seq Scan on huge_data  (cost=0.00..95372.42 rows=6399942 width=8)
         ->  Index Scan using lookup_pkey on lookup  (cost=0.00..3.02 rows=1 width=8)
               Index Cond: ("outer".lookup_id = lookup.lookup_id)
(5 rows)

Time: 4,333 ms

qnex=# INSERT INTO huge_data (lookup_id) SELECT lookup_id FROM huge_data; -- 12 800 000
INSERT 0 640
Time: 501014,692 ms
qnex=# ANALYZE huge_data;
ANALYZE
Time: 4243,453 ms
qnex=# EXPLAIN SELECT * FROM huge_data NATURAL JOIN lookup LIMIT 1;
                               QUERY PLAN
------------------------------------------------------------------------
 Limit  (cost=11719.00..11719.09 rows=1 width=12)
   ->  Hash Join  (cost=11719.00..1212739.73 rows=12800185 width=12)
         Hash Cond: ("outer".lookup_id = "inner".lookup_id)
         ->  Seq Scan on huge_data  (cost=0.00..190747.84 rows=12800184 width=8)
         ->  Hash  (cost=5961.00..5961.00 rows=40 width=8)
               ->  Seq Scan on lookup  (cost=0.00..5961.00 rows=40 width=8)
(6 rows)

Regards,
   Dawid
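The growth of the test table follows a simple doubling schedule: one bulk INSERT ... SELECT FROM lookup, then each INSERT ... SELECT FROM huge_data doubles the row count. A small sketch of that schedule (the 400,000 initial size is an assumption inferred from the row counts in the comments above, not stated in the message):

```python
# Sketch: the doubling schedule used by the test case above.
# initial_rows is whatever the first INSERT ... SELECT FROM lookup loads;
# each subsequent INSERT ... SELECT FROM huge_data doubles the table.

def doubling_schedule(initial_rows, doublings):
    sizes = [initial_rows]
    for _ in range(doublings):
        sizes.append(sizes[-1] * 2)  # self-insert doubles the row count
    return sizes

# With a hypothetical 400,000-row initial load, six doublings reach 25.6M rows,
# matching the per-step comments in the test script.
print(doubling_schedule(400_000, 6))
# -> [400000, 800000, 1600000, 3200000, 6400000, 12800000, 25600000]
```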
Re: [PERFORM] Planner doesn't look at LIMIT?
I wrote:
> Dawid Kuroczko <[EMAIL PROTECTED]> writes:
>> qnex=# EXPLAIN SELECT * FROM log NATURAL JOIN useragents LIMIT 1;
>> Limit  (cost=15912.20..15912.31 rows=1 width=272)
>>   ->  Hash Join  (cost=15912.20..5328368.96 rows=47044336 width=272)

>> If I set enable_hashjoin=false:
>> qnex=# EXPLAIN ANALYZE SELECT * FROM log NATURAL LEFT JOIN useragents LIMIT 1;
>> Limit  (cost=0.00..3.07 rows=1 width=272) (actual time=74.214..74.216 rows=1 loops=1)
>>   ->  Nested Loop Left Join  (cost=0.00..144295895.01 rows=47044336 width=272) (actual time=74.204..74.204 rows=1 loops=1)

> This is quite strange. The nestloop plan definitely should be preferred
> in the context of the LIMIT, considering that it has far lower estimated
> cost. And it is preferred in simple tests for me.

After a suitable period of contemplating my navel, I figured out what is going on here: the total costs involved are large enough that the still-fairly-high startup cost of the hash is disregarded by compare_fuzzy_path_costs(), and so the nestloop is discarded as not having any significant potential advantage in startup time.

I think that this refutes the original scheme of using the same fuzz factor for both startup and total cost comparisons, and therefore propose the attached patch.

Comments?

			regards, tom lane

*** src/backend/optimizer/util/pathnode.c.orig	Fri Jul 15 13:09:25 2005
--- src/backend/optimizer/util/pathnode.c	Fri Jul 22 12:08:25 2005
***************
*** 98,157 ****
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
-     Cost        fuzz;
-
      /*
!      * The fuzz factor is set at one percent of the smaller total_cost,
!      * but not less than 0.01 cost units (just in case total cost is
!      * zero).
       *
       * XXX does this percentage need to be user-configurable?
       */
-     fuzz = Min(path1->total_cost, path2->total_cost) * 0.01;
-     fuzz = Max(fuzz, 0.01);
-
      if (criterion == STARTUP_COST)
      {
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      else
      {
!         if (Abs(path1->total_cost - path2->total_cost) > fuzz)
!         {
!             if (path1->total_cost < path2->total_cost)
!                 return -1;
!             else
!                 return +1;
!         }

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (Abs(path1->startup_cost - path2->startup_cost) > fuzz)
!         {
!             if (path1->startup_cost < path2->startup_cost)
!                 return -1;
!             else
!                 return +1;
!         }
      }
      return 0;
  }
--- 98,138 ----
  static int
  compare_fuzzy_path_costs(Path *path1, Path *path2, CostSelector criterion)
  {
      /*
!      * We use a fuzz factor of 1% of the smaller cost.
       *
       * XXX does this percentage need to be user-configurable?
       */
      if (criterion == STARTUP_COST)
      {
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;

          /*
           * If paths have the same startup cost (not at all unlikely),
           * order them by total cost.
           */
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;
      }
      else
      {
!         if (path1->total_cost > path2->total_cost * 1.01)
!             return +1;
!         if (path2->total_cost > path1->total_cost * 1.01)
!             return -1;

          /*
           * If paths have the same total cost, order them by startup cost.
           */
!         if (path1->startup_cost > path2->startup_cost * 1.01)
!             return +1;
!         if (path2->startup_cost > path1->startup_cost * 1.01)
!             return -1;
      }
      return 0;
  }
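The effect of the patch can be paraphrased in a few lines of Python (an illustrative sketch of the before/after logic, not the planner's code), using the startup/total costs of the two plans from this thread. The return convention mirrors the C function: -1 if path1 is cheaper, +1 if path2 is cheaper, 0 if fuzzily equal.

```python
# Sketch of compare_fuzzy_path_costs() for criterion == STARTUP_COST,
# before and after the patch, applied to the costs from this thread.

def old_compare_startup(s1, t1, s2, t2):
    # Pre-patch: a single absolute fuzz factor, 1% of the smaller *total* cost.
    fuzz = max(min(t1, t2) * 0.01, 0.01)
    if abs(s1 - s2) > fuzz:
        return -1 if s1 < s2 else +1
    if abs(t1 - t2) > fuzz:          # startups "equal": fall through to totals
        return -1 if t1 < t2 else +1
    return 0

def new_compare_startup(s1, t1, s2, t2):
    # Post-patch: a relative 1% fuzz on each cost actually being compared.
    if s1 > s2 * 1.01: return +1
    if s2 > s1 * 1.01: return -1
    if t1 > t2 * 1.01: return +1
    if t2 > t1 * 1.01: return -1
    return 0

nest = (0.00, 144295895.01)     # nested loop: (startup, total)
hashj = (15912.20, 5328368.96)  # hash join:   (startup, total)

# Old code: fuzz = 1% of 5,328,368.96, about 53,283 -- which swallows the whole
# 15,912 startup-cost difference, so the comparison falls through to total cost
# and the hash join wins even on the STARTUP_COST criterion.
print(old_compare_startup(*nest, *hashj))  # prints 1 (hash join preferred)

# New code: 15912.20 > 0.00 * 1.01, so the nested loop's startup advantage is
# seen and it wins, as it should under a LIMIT.
print(new_compare_startup(*nest, *hashj))  # prints -1 (nested loop preferred)
```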
Re: [PERFORM] [PATCHES] COPY FROM performance improvements
Here is the SCSI output:

Web Server:

SCSI subsystem driver Revision: 1.00
megaraid: v1.18j (Release Date: Mon Jul  7 14:39:55 EDT 2003)
megaraid: found 0x1028:0x000f:idx 0:bus 4:slot 3:func 0
scsi0 : Found a MegaRAID controller at 0xf883f000, IRQ: 18
scsi0 : Enabling 64 bit support
megaraid: [412W:H406] detected 1 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
scsi0 : LSI Logic MegaRAID 412W 254 commands 15 targs 5 chans 7 luns

Database Server:

SCSI subsystem driver Revision: 1.00
megaraid: v1.18j (Release Date: Mon Jul  7 14:39:55 EDT 2003)
megaraid: found 0x101e:0x1960:idx 0:bus 5:slot 0:func 0
scsi0 : Found a MegaRAID controller at 0xf883f000, IRQ: 21
scsi0 : Enabling 64 bit support
megaraid: [196T:3.33] detected 1 logical drives
megaraid: supports extended CDBs.
megaraid: channel[1] is raid.
megaraid: channel[2] is raid.
scsi0 : LSI Logic MegaRAID 196T 254 commands 15 targs 5 chans 7 luns
Starting timer : 0 0
blk: queue c5f2d218, I/O limit 4095Mb (mask 0x)
scsi0: scanning virtual channel 0 for logical drives.
  Vendor: MegaRAID  Model: LD 0 RAID5 86G  Rev: 196T
  Type:   Direct-Access                    ANSI SCSI revision: 02
Starting timer : 0 0

The webserver is a 1U and it actually performs better on IO than the database server, even though the database server is running 6 disks versus 3. The database server is a PE (PowerEdge) 6600.

Database Server IO:

[EMAIL PROTECTED] root]# /sbin/hdparm -tT /dev/sda
/dev/sda:
 Timing buffer-cache reads:   1888 MB in  2.00 seconds = 944.00 MB/sec
 Timing buffered disk reads:    32 MB in  3.06 seconds =  10.46 MB/sec

Second Database Server IO:

[EMAIL PROTECTED] root]# /sbin/hdparm -tT /dev/sda
/dev/sda:
 Timing buffer-cache reads:   1816 MB in  2.00 seconds = 908.00 MB/sec
 Timing buffered disk reads:    26 MB in  3.11 seconds =   8.36 MB/sec

Which is just horrible.

Sincerely,
Joshua D. Drake

Patrick Welche wrote:
> On Thu, Jul 21, 2005 at 09:19:04PM -0700, Luke Lonergan wrote:
>> Joshua,
>>
>> On 7/21/05 7:53 PM, "Joshua D. Drake" <[EMAIL PROTECTED]> wrote:
>>> Well I know that isn't true, at least not with ANY of the Dells my
>>> customers have purchased in the last 18 months. They are still really,
>>> really slow.
>>
>> That's too bad, can you cite some model numbers? SCSI?
>
> I would be interested too, given
> http://www.netbsd.org/cgi-bin/query-pr-single.pl?number=30531
>
> Cheers,
> Patrick

-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/
Re: [HACKERS] [PERFORM] Planner doesn't look at LIMIT?
On Fri, 2005-07-22 at 12:20 -0400, Tom Lane wrote:
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Looks good. I think it explains a few other weird perf reports also.

Best Regards, Simon Riggs
Re: [HACKERS] [PERFORM] Planner doesn't look at LIMIT?
Simon Riggs <[EMAIL PROTECTED]> writes:
> Looks good. I think it explains a few other weird perf reports also.

Could be. I went back to look at Sam Mason's report of about three weeks ago, and it definitely seems to explain his issue. The "fuzzy cost comparison" logic is new in 8.0, so it hasn't had all that much testing...

			regards, tom lane
Re: [PERFORM] [PATCHES] COPY FROM performance improvements
On a single spindle:

$ time dd if=/dev/zero of=bigfile bs=8k count=200
200+0 records in
200+0 records out

real    2m8.569s
user    0m0.725s
sys     0m19.633s

None of my drives are partitioned big enough for me to create 2x RAM-sized files on a single disk. I have 16GB RAM and only 36GB drives. But here are some numbers for my 12-disk lvm2 striped volume:

$ time dd if=/dev/zero of=bigfile3 bs=8k count=400
400+0 records in
400+0 records out

real    1m17.059s
user    0m1.479s
sys     0m41.293s

Mark

On Thu, 21 Jul 2005 16:14:47 -0700 "Luke Lonergan" <[EMAIL PROTECTED]> wrote:
> Cool!
>
> At what rate does your disk setup write sequential data, e.g.:
>   time dd if=/dev/zero of=bigfile bs=8k count=50
>
> (sized for 2x RAM on a system with 2GB)
>
> BTW - the Compaq smartarray controllers are pretty broken on Linux from a
> performance standpoint in our experience. We've had disastrously bad
> results from the SmartArray 5i and 6 controllers on kernels from 2.4 ->
> 2.6.10, on the order of 20MB/s.
>
> For comparison, the results on our dual opteron with a single LSI SCSI
> controller with software RAID0 on a 2.6.10 kernel:
>
> [EMAIL PROTECTED] dbfast]$ time dd if=/dev/zero of=bigfile bs=8k count=50
> 50+0 records in
> 50+0 records out
>
> real    0m24.702s
> user    0m0.077s
> sys     0m8.794s
>
> Which calculates out to about 161MB/s.
>
> - Luke
>
> On 7/21/05 2:55 PM, "Mark Wong" <[EMAIL PROTECTED]> wrote:
>
> > I just ran through a few tests with the v14 patch against 100GB of data
> > from dbt3 and found a 30% improvement; 3.6 hours vs 5.3 hours. Just to
> > give a few details, I only loaded data and started a COPY in parallel
> > for each of the data files:
> > http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/
> >
> > Here's a visual of my disk layout, for those familiar with the database
> > schema:
> > http://www.testing.osdl.org/projects/dbt3testing/results/fast_copy/layout-dev4-010-dbt3.html
> >
> > I have 6 arrays of fourteen 15k rpm drives in a split-bus configuration
> > attached to a 4-way itanium2 via 6 compaq smartarray pci-x controllers.
> >
> > Let me know if you have any questions.
> >
> > Mark
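The throughput arithmetic behind figures like the "about 161MB/s" above is just bytes written over wall-clock time. A small sketch (the 500,000-block count is an assumption chosen to match "bs=8k, sized for 2x RAM on a system with 2GB", i.e. 4 GB written; dd's own record counts in the quoted output look truncated):

```python
# Sketch: deriving MB/s from a dd run -- bs * count bytes over elapsed seconds.

def dd_throughput_mb_s(block_size, block_count, seconds):
    """Decimal megabytes per second for a dd write of block_count blocks."""
    return block_size * block_count / seconds / 1_000_000

# Hypothetical run: 500,000 blocks of 8 kB (4 GB total) in 24.702 s of wall time.
rate = dd_throughput_mb_s(8192, 500_000, 24.702)
print(round(rate))  # roughly 166 MB/s, in the ballpark of the quoted 161 MB/s
```

Small differences from the quoted figure come down to rounding and whether one counts 8k as 8000 or 8192 bytes.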
Re: [PERFORM] Planner doesn't look at LIMIT?
On 7/22/05, Tom Lane <[EMAIL PROTECTED]> wrote:
> > This is quite strange. The nestloop plan definitely should be preferred
> > in the context of the LIMIT, considering that it has far lower estimated
> > cost. And it is preferred in simple tests for me.
>
> After a suitable period of contemplating my navel, I figured out
> what is going on here: the total costs involved are large enough that
> the still-fairly-high startup cost of the hash is disregarded by
> compare_fuzzy_path_costs(), and so the nestloop is discarded as not
> having any significant potential advantage in startup time.
>
> I think that this refutes the original scheme of using the same fuzz
> factor for both startup and total cost comparisons, and therefore
> propose the attached patch.
>
> Comments?

Works great!!! With LIMIT below 4 000 000 rows (it's a 47-million-row table) it prefers nested loops; above that it starts to introduce merge joins.

Regards,
   Dawid