Re: [PERFORM] sum of all values
Madison Kelly wrote: Richard Huxton wrote: Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems. This is a program for general consumption (hopefully... eventually...) so I want to leave the psql config alone. Once I am happier with the program I will try different tuning options and write a faq though I expect 9 out of 10 users won't read it. PostgreSQL is not FireFox, and you can't expect it to work efficiently without doing at least some configuration. The settings to support 100 simultaneous connections on a dual-Opteron with 8GB RAM are not the same as on a single-user laptop. Take half an hour to read through the performance-tuning guide here: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough. tle-bu=> EXPLAIN ANALYZE SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; QUERY PLAN Aggregate (cost=2202.54..2202.54 rows=1 width=8) (actual time=5078.744..5078.748 rows=1 loops=1) -> Merge Join (cost=724.94..2202.51 rows=11 width=8) (actual time=3281.677..4969.719 rows=12828 loops=1) Merge Cond: (("outer".file_parent_dir = "inner".fs_parent_dir) AND ("outer".file_name = "inner".fs_name) AND ("outer".file_type = "inner".fs_type)) -> Index Scan using file_info_1_search_idx on file_info_1 a (cost=0.00..1317.11 rows=12828 width=104) (actual time=0.042..116.825 rows=12828 loops=1) -> Sort (cost=724.94..740.97 rows=6414 width=96) (actual time=3281.516..3350.640 rows=12828 loops=1) Sort Key: b.fs_parent_dir, b.fs_name, b.fs_type -> Seq Scan on file_set_1 b (cost=0.00..319.35 rows=6414 width=96) (actual time=0.029..129.129 rows=12828 loops=1) Filter: (fs_backup = true) Total runtime: 5080.729 ms Well, it's slow, but that's probably your settings. Run VACUUM ANALYSE on the tables though, it looks like you've got default statistics (It's expecting exactly 1/2 the fs_backup values to be true - 6414 out of 12828). Here are the schemas, in case they help: tle-bu=> \d file_info_1Table "public.file_info_1" Column | Type | Modifiers -+-+ file_acc_time | bigint | not null file_group_name | text| not null file_group_uid | integer | not null file_mod_time | bigint | not null file_name | text| not null file_parent_dir | text| not null file_perm | text| not null file_size | bigint | not null file_type | text| not null default 'f'::text file_user_name | text| not null file_user_uid | integer | not null Indexes: "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type) "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type) tle-bu=> \d file_set_1 Table "public.file_set_1" Column | Type | Modifiers ---+-+ fs_backup | boolean | not null default true fs_display| boolean | not null default false fs_name | text| not null fs_parent_dir | text| not null fs_restore| boolean | not null default false fs_type | text| not null default 'f'::text Indexes: "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type) 1. WHERE ARE YOUR PRIMARY KEYS??? 2. Why do you have two identical indexes on file_info_1 3. WHERE ARE YOUR PRIMARY KEYS??? 4. Am I right in thinking that always, file_name==fs_name (i.e. they represent the same piece of information) and if so, why are you storing it twice? Same for _parent_dir too 5. file_type/fs_type are being held as unbounded text? Not an index into some lookup table or a varchar(N)? Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd This is where I have to admit my novice level of knowledge. Until now I have been more concerned with "making it work". It is only now that I have finished (more or less) the program that I have started going back and trying to find ways to speed it up. I have not used postgres (or perl or anything) before this program. I hope my questions aren't too basic. ^.^; There's a rule of thumb about throwing the
Re: [PERFORM] sum of all values
Richard Huxton wrote: Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems. This is a program for general consumption (hopefully... eventually...) so I want to leave the psql config alone. Once I am happier with the program I will try different tuning options and write a faq though I expect 9 out of 10 users won't read it. SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough. tle-bu=> EXPLAIN ANALYZE SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; QUERY PLAN Aggregate (cost=2202.54..2202.54 rows=1 width=8) (actual time=5078.744..5078.748 rows=1 loops=1) -> Merge Join (cost=724.94..2202.51 rows=11 width=8) (actual time=3281.677..4969.719 rows=12828 loops=1) Merge Cond: (("outer".file_parent_dir = "inner".fs_parent_dir) AND ("outer".file_name = "inner".fs_name) AND ("outer".file_type = "inner".fs_type)) -> Index Scan using file_info_1_search_idx on file_info_1 a (cost=0.00..1317.11 rows=12828 width=104) (actual time=0.042..116.825 rows=12828 loops=1) -> Sort (cost=724.94..740.97 rows=6414 width=96) (actual time=3281.516..3350.640 rows=12828 loops=1) Sort Key: b.fs_parent_dir, b.fs_name, b.fs_type -> Seq Scan on file_set_1 b (cost=0.00..319.35 rows=6414 width=96) (actual time=0.029..129.129 rows=12828 loops=1) Filter: (fs_backup = true) Total runtime: 5080.729 ms (9 rows) Here are the schemas, in case they help: tle-bu=> \d file_info_1Table "public.file_info_1" Column | Type | Modifiers -+-+ file_acc_time | bigint | not null file_group_name | text| not null file_group_uid | integer | not null file_mod_time | bigint | not null file_name | text| not null file_parent_dir | text| not null file_perm | text| not null file_size | bigint | not null file_type | text| not null default 'f'::text file_user_name | text| not null file_user_uid | integer | not null Indexes: "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type) "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type) tle-bu=> \d file_set_1 Table "public.file_set_1" Column | Type | Modifiers ---+-+ fs_backup | boolean | not null default true fs_display| boolean | not null default false fs_name | text| not null fs_parent_dir | text| not null fs_restore| boolean | not null default false fs_type | text| not null default 'f'::text Indexes: "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type) 1. WHERE ARE YOUR PRIMARY KEYS??? 2. Why do you have two identical indexes on file_info_1 3. WHERE ARE YOUR PRIMARY KEYS??? 4. Am I right in thinking that always, file_name==fs_name (i.e. they represent the same piece of information) and if so, why are you storing it twice? Same for _parent_dir too 5. file_type/fs_type are being held as unbounded text? Not an index into some lookup table or a varchar(N)? Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd This is where I have to admit my novice level of knowledge. Until now I have been more concerned with "making it work". It is only now that I have finished (more or less) the program that I have started going back and trying to find ways to speed it up. I have not used postgres (or perl or anything) before this program. I hope my questions aren't too basic. ^.^; I keep hearing about Primary Keys but I can't say that I know what they are or how they are used. If I do understand, it is a way to reference another table's entry (using a foreign key)? The two matching indexes is a typo in my program that I hadn't noticed, I'll fix that asap. Here is what the database is used for: This is a backup program and I use the DB to store extended information on all selected files and directories on a partition. Each partition has it's own 'file_info_#' and 'file_set_#' tables where '#' matches the ID stored for that partition in the DB in another table. The 'file_info_#' table stored the data that
Re: [PERFORM] sum of all values
Madison Kelly wrote: Hi all, Is there a fast(er) way to get the sum of all integer values for a certain condition over many thousands of rows? What I am currently doing is this (which takes ~5-10sec.): OK, I'm assuming you've configured PG to your satisfaction and this is the only query giving you problems. SELECT SUM (a.file_size) FROM file_info_1 a, file_set_1 b WHERE a.file_name=b.fs_name AND a.file_parent_dir=b.fs_parent_dir AND a.file_type=b.fs_type AND b.fs_backup='t'; You'll want to run EXPLAIN ANALYSE SELECT SUM... and post the output of that, although the query looks straightforward enough. Here are the schemas, in case they help: tle-bu=> \d file_info_1Table "public.file_info_1" Column | Type | Modifiers -+-+ file_acc_time | bigint | not null file_group_name | text| not null file_group_uid | integer | not null file_mod_time | bigint | not null file_name | text| not null file_parent_dir | text| not null file_perm | text| not null file_size | bigint | not null file_type | text| not null default 'f'::text file_user_name | text| not null file_user_uid | integer | not null Indexes: "file_info_1_display_idx" btree (file_parent_dir, file_name, file_type) "file_info_1_search_idx" btree (file_parent_dir, file_name, file_type) tle-bu=> \d file_set_1 Table "public.file_set_1" Column | Type | Modifiers ---+-+ fs_backup | boolean | not null default true fs_display| boolean | not null default false fs_name | text| not null fs_parent_dir | text| not null fs_restore| boolean | not null default false fs_type | text| not null default 'f'::text Indexes: "file_set_1_sync_idx" btree (fs_parent_dir, fs_name, fs_type) 1. WHERE ARE YOUR PRIMARY KEYS??? 2. Why do you have two identical indexes on file_info_1 3. WHERE ARE YOUR PRIMARY KEYS??? 4. Am I right in thinking that always, file_name==fs_name (i.e. they represent the same piece of information) and if so, why are you storing it twice? Same for _parent_dir too 5. file_type/fs_type are being held as unbounded text? Not an index into some lookup table or a varchar(N)? Can you explain what you're trying to do here - it might be you want to alter your database design. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send "unregister YourEmailAddressHere" to [EMAIL PROTECTED])