[PERFORM] Index not used on group by
Hello all,

I have a table ma_data that contains over 300,000 rows. This table has primary key id and a field alias_id, on which I created a btree index. I set statistics:

    ALTER TABLE public.ma_data ALTER COLUMN alias_id SET STATISTICS 998;

So, when I do something like

    SELECT alias_id FROM ma_data GROUP BY alias_id

I have (with enable_seqscan off):

 Group  (cost=0.00..1140280.63 rows=32 width=4) (actual time=0.159..2640.090 rows=32 loops=1)
   ->  Index Scan using reference_9_fk on ma_data  (cost=0.00..1139526.57 rows=301624 width=4) (actual time=0.120..1471.128 rows=301624 loops=1)
 Total runtime: 2640.407 ms
(3 rows)

As I understand it, there are some problems with visibility of records, but other DBMSs use indexes here without problems (for example Firebird)? Or maybe some other information would be helpful for me and the community.

--
Best regards,
Andrey Repko
mailto:[EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
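For reference, the plan above was presumably obtained by disabling sequential scans for the session; a sketch of that experiment (table and column names as in the post, values illustrative only) looks like:

```sql
-- Force the planner to consider only index-based plans, for comparison.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT alias_id
FROM ma_data
GROUP BY alias_id;

-- Restore the default before running real workloads; enable_seqscan
-- is a debugging knob, not a tuning knob.
SET enable_seqscan = on;
```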
Re: [PERFORM] Query seem to slow if table have more than 200 million rows
Hi Qingqing,

I don't know whether the statistics I got are good or bad, but here they are:

scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname, c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-#        a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-#        a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read, a.tidx_blks_hit, b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b, pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and b.indexrelid=c.indexrelid and a.relname=b.relname and
scooby-#       a.relname=c.relname and a.relname='fti_dict1';

  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
----------+-----------+------------+--------------+----------+--------------+---------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0 |             0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799 |        592799
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0 |             0
(3 rows)

The remaining columns (heap_blks_read, heap_blks_hit, idx_blks_read, idx_blks_hit) are 0 for every row, and the toast and tidx counters are NULL.

I have tried the query below several times with different keywords, but only idx_tup_read and idx_tup_fetch change; the others stay zero. The indexes are: ids (idx_dict1), keywords (idx_dict2, varchar_ops), keywords (idx_dict3, varchar_pattern_ops). I use that last index for queries like...
keywords LIKE 'blabla%', just for testing purposes.

Regards,
ahmad fajar

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Qingqing Zhou
Sent: Tuesday, 27 September 2005 8:43
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200 million rows

Ahmad Fajar [EMAIL PROTECTED] wrote:
> Select ids, keywords from dict where keywords='blabla' ('blabla' is a
> single word). The table has 200 million rows, and I have indexed the
> keywords field. The first time, my query seems slow: it takes about
> 15-60 sec to get the result. But if I repeat the query I get a fast
> result. My question is why the query is so slow the first time.
> Table structure is quite simple: ids bigint, keywords varchar(150),
> weight varchar(1), dpos int.

The first slowness is obviously caused by disk IOs. The second time is faster because all the data pages it requires are already in the buffer pool. 200 million rows is not a problem for a btree index; even if your client tool appends some spaces to your keywords at insertion time, the ideal btree is 5 to 6 layers high at most.

Can you show the iostats of the index from your statistics view?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS

Regards,
Qingqing
Re: [PERFORM] Index not used on group by
Andrey Repko wrote:
> I have a table ma_data that contains over 300,000 rows. This table has
> primary key id and a field alias_id, on which I created a btree index.
> I set statistics:
>     ALTER TABLE public.ma_data ALTER COLUMN alias_id SET STATISTICS 998;
> So, when I do something like
>     SELECT alias_id FROM ma_data GROUP BY alias_id

Why are you using GROUP BY without any aggregate functions?

What happens if you use something like
    SELECT DISTINCT alias_id FROM ma_data;

--
Richard Huxton
Archonet Ltd
Re: [PERFORM] Index not used on group by
Hello Richard,

Tuesday, September 27, 2005, 1:48:15 PM, you wrote:

RH> Andrey Repko wrote:
RH>> I have a table ma_data that contains over 300,000 rows. This table has
RH>> primary key id and a field alias_id, on which I created a btree index.
RH>> I set statistics:
RH>>     ALTER TABLE public.ma_data ALTER COLUMN alias_id SET STATISTICS 998;
RH>> So, when I do something like
RH>>     SELECT alias_id FROM ma_data GROUP BY alias_id
RH> Why are you using GROUP BY without any aggregate functions?
RH> What happens if you use something like
RH>     SELECT DISTINCT alias_id FROM ma_data;

sart_ma=# EXPLAIN ANALYZE SELECT DISTINCT alias_id FROM ma_data;
                                  QUERY PLAN
---------------------------------------------------------------------------
 Unique  (cost=65262.63..66770.75 rows=32 width=4) (actual time=16780.214..18250.761 rows=32 loops=1)
   ->  Sort  (cost=65262.63..66016.69 rows=301624 width=4) (actual time=16780.204..17255.129 rows=301624 loops=1)
         Sort Key: alias_id
         ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=6.896..15321.023 rows=301624 loops=1)
 Total runtime: 18292.542 ms
(5 rows)

sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
                                  QUERY PLAN
---------------------------------------------------------------------------
 HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
   ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
 Total runtime: 15991.244 ms
(3 rows)

--
Best regards,
Andrey Repko
mailto:[EMAIL PROTECTED]
Re: [PERFORM] Index not used on group by
Hello Richard,

Tuesday, September 27, 2005, 2:08:31 PM, you wrote:

>> sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
>>  HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
>>    ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
>>  Total runtime: 15991.244 ms

RH> OK - the planner thinks it's doing the right thing; your cost estimates
RH> are way off. If you look back at where you got an index scan, its cost
RH> was 1.1 million.
RH>     Index Scan using reference_9_fk on ma_data  (cost=0.00..1139526.57

But why does PG scan _all_ the records in the table? As I understand it, we could just read the information from the index rather than scanning the whole table. Of course, if we select ALL records from the table the index can't help us.

If I write something like:

SELECT (SELECT alias_id FROM ma_data WHERE alias_id=1 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=2 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=3 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=4 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=5 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=6 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=7 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=8 LIMIT 1)
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=9 LIMIT 1)
...
UNION ALL
SELECT (SELECT alias_id FROM ma_data WHERE alias_id=max_alias_id LIMIT 1)

it works better, much better.

RH> That's way above the numbers for seq-scan+hash/sort, so if the cost
RH> estimate was right PG would be making the right choice. Looks like you
RH> need to check your configuration settings. Have you read:
RH>     http://www.powerpostgresql.com/PerfList
RH> or
RH>     http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

Thanks.

--
Best regards,
Andrey Repko
mailto:[EMAIL PROTECTED]
Re: [PERFORM] PostgreSQL overall design
On 9/27/05, Abhijit Menon-Sen [EMAIL PROTECTED] wrote:
> At 2005-09-27 15:20:05 +0530, [EMAIL PROTECTED] wrote:
>> Can anyone please tell/point me where I can get the postgresql system
>> layout (I've an interest to contribute).
>
> http://www.postgresql.org/developer/coding
>
> And, in particular:
> http://www.postgresql.org/docs/faqs.FAQ_DEV.html
>
> -- ams

Thanks. I'll go through the documentation.

--
with regards,
S.Gnanavel
Re: [PERFORM] Index not used on group by
Andrey Repko wrote:
> RH> What happens if you use something like
> RH>     SELECT DISTINCT alias_id FROM ma_data;
>
> sart_ma=# EXPLAIN ANALYZE SELECT DISTINCT alias_id FROM ma_data;
>  Unique  (cost=65262.63..66770.75 rows=32 width=4) (actual time=16780.214..18250.761 rows=32 loops=1)
>    ->  Sort  (cost=65262.63..66016.69 rows=301624 width=4) (actual time=16780.204..17255.129 rows=301624 loops=1)
>          Sort Key: alias_id
>          ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=6.896..15321.023 rows=301624 loops=1)
>  Total runtime: 18292.542 ms
>
> sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
>  HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
>    ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
>  Total runtime: 15991.244 ms

OK - the planner thinks it's doing the right thing; your cost estimates are way off. If you look back at where you got an index scan, its cost was 1.1 million:

    Index Scan using reference_9_fk on ma_data  (cost=0.00..1139526.57

That's way above the numbers for seq-scan+hash/sort, so if the cost estimate were right PG would be making the right choice. Looks like you need to check your configuration settings. Have you read:
http://www.powerpostgresql.com/PerfList
or
http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

--
Richard Huxton
Archonet Ltd
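The configuration settings Richard is pointing at can be inspected and experimented with per session before touching postgresql.conf. A hedged sketch (the numbers shown are illustrative assumptions, not recommendations):

```sql
-- The knobs that most influence the seq-scan vs index-scan decision:
SHOW random_page_cost;       -- default 4.0; often too pessimistic for cached data
SHOW effective_cache_size;   -- should reflect the OS cache available to PostgreSQL

-- Session-level experiment only (values are assumed, tune for your hardware):
SET random_page_cost = 2.0;
SET effective_cache_size = 65536;   -- 65536 * 8KB pages = 512MB, an assumed figure

-- Re-check whether the planner now prices the index scan realistically:
EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
```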
Re: [PERFORM] Index not used on group by
Andrey Repko wrote:
> Hello Richard,
>
> RH> OK - the planner thinks it's doing the right thing; your cost estimates
> RH> are way off. If you look back at where you got an index scan, its cost
> RH> was 1.1 million.
> RH>     Index Scan using reference_9_fk on ma_data  (cost=0.00..1139526.57
>
> But why does PG scan _all_ the records in the table? As I understand it,
> we could just read the information from the index rather than scanning
> the whole table. Of course, if we select ALL records from the table the
> index can't help us.

Actually, if you select more than 5-10% of the rows (in general) you are better off using a seq-scan. PostgreSQL estimates the total cost of possible query plans and picks the cheapest. In your case your configuration settings seem to be pushing the cost of an index scan much higher than it really is, so it picks the sequential scan.

> If I write something like:
>
> SELECT (SELECT alias_id FROM ma_data WHERE alias_id=1 LIMIT 1)
> UNION ALL
> SELECT (SELECT alias_id FROM ma_data WHERE alias_id=2 LIMIT 1)
> UNION ALL
> ...
> UNION ALL
> SELECT (SELECT alias_id FROM ma_data WHERE alias_id=max_alias_id LIMIT 1)
>
> it works better, much better.

Of course - it will always choose index queries here; it can see you are only fetching one row in each subquery. Correct your configuration settings so PG estimates the cost of an index query correctly and all should be well.

--
Richard Huxton
Archonet Ltd
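The hand-written UNION ALL chain above is essentially a "loose index scan" done by hand. On PostgreSQL versions more recent than this thread (8.4 and later, which added recursive CTEs), the same trick can be expressed once, without enumerating every alias_id; a sketch using the thread's table and column names:

```sql
-- Emulates a skip scan over the btree index on alias_id: each step
-- does one cheap index probe for the next distinct value.
WITH RECURSIVE distinct_aliases AS (
    (SELECT alias_id FROM ma_data ORDER BY alias_id LIMIT 1)
    UNION ALL
    SELECT (SELECT alias_id FROM ma_data
            WHERE alias_id > d.alias_id
            ORDER BY alias_id LIMIT 1)
    FROM distinct_aliases d
    WHERE d.alias_id IS NOT NULL
)
SELECT alias_id FROM distinct_aliases WHERE alias_id IS NOT NULL;
```

This performs roughly one index descent per distinct value (32 here) instead of reading all 301,624 rows, which is why the manual UNION ALL version was so much faster.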
Re: [PERFORM] slow database, queries accumulating
I have read that 600 connections are a LOT (somebody correct me please if I'm wrong), since each connection requires a process and your server must serve them all. Besides the overhead involved, you will end up with 1200 megabytes of sort_mem allocated (probably idle most of the time)... pgpool allows you to reuse processes (similar to Oracle shared servers). Fact: I haven't needed to use it myself. AFAICS, it's easy to use. (I'll try to make it work and I'll share tests, but I don't know when.)

long life, little spam and prosperity

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] on behalf of Anjan Dave
Sent: Friday, 23 September 2005 13:02
To: pgsql-performance@postgresql.org
Subject: [PERFORM] slow database, queries accumulating

Hi,

We are experiencing consistent slowness on the database for one application. This is more a reporting type of application, heavy on bytea data type usage (gets rendered into PDFs in the app server). A lot of queries, mostly selects and a few random updates, get accumulated on the server with increasing volume of users on the application. Below is a snapshot of top, with about 80 selects and 3 or 4 updates. Things get better eventually if I cancel (SIGINT) some of the oldest queries. I also see a few instances of shared locks not being granted during this time. I don't even see high iowait or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f
       |          | 77922136 | 16761 | ShareLock | f

We (development) are looking into query optimization (explain analyze, indexes, etc), and my understanding is that the queries execute fast when run under explain analyze, but during busy times they become quite slow, taking from a few seconds to a few minutes to execute. I do see in the log that almost all queries have either ORDER BY, GROUP BY, or DISTINCT. Does it hurt to up the sort_mem to 3MB or 4MB? Should I up the effective_cache_size to 5 or 6GB?
The app does not need a lot of connections on the database; I can reduce it down from 600. Based on the description above and the configuration below, does anything appear bad in the config? Is there anything I can try in the configuration to improve performance? The database size is about 4GB.

This is PG 7.4.7, RHAS3.0 (u5), local 4-spindle RAID10 (15KRPM), with logs on a separate set of drives, RAID10. 6650 server, 4 x XEON, 12GB RAM. Vacuum is done every night; a full vacuum is done once a week. I had increased shared_buffers and sort_mem recently, which didn't help.

Thanks,
Anjan

 10:44:51 up 14 days, 13:38, 2 users, load average: 0.98, 1.14, 1.12
264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:          user   nice  system   irq  softirq  iowait   idle
          total     14.4%   0.0%    7.4%  0.0%     0.0%    0.0%  77.9%
          cpu00     15.7%   0.0%    5.7%  0.0%     0.1%    0.0%  78.2%
          cpu01     15.1%   0.0%    7.5%  0.0%     0.0%    0.1%  77.0%
          cpu02     10.5%   0.0%    5.9%  0.0%     0.0%    0.0%  83.4%
          cpu03      9.9%   0.0%    5.9%  0.0%     0.0%    0.0%  84.0%
          cpu04      7.9%   0.0%    3.7%  0.0%     0.0%    0.0%  88.2%
          cpu05     19.3%   0.0%   12.3%  0.0%     0.0%    0.0%  68.3%
          cpu06     20.5%   0.0%    9.5%  0.0%     0.0%    0.1%  69.7%
          cpu07     16.1%   0.0%    8.5%  0.0%     0.1%    0.3%  74.7%
Mem:  12081736k av, 7881972k used, 4199764k free, 0k shrd, 82372k buff
      4823496k actv, 2066260k in_d, 2036k in_c
Swap:  4096532k av, 0k used, 4096532k free, 6888900k cached

  PID USER     PRI NI SIZE  RSS  SHARE STAT %CPU %MEM  TIME CPU COMMAND
16773 postgres  15  0 245M 245M  240M S     0.0  2.0  1:16   7 postmaster
16880 postgres  15  0 245M 245M  240M S     0.1  2.0  0:49   6 postmaster
16765 postgres  15  0 245M 245M  240M S     0.0  2.0  1:16   0 postmaster
16825 postgres  15  0 245M 245M  240M S     0.0  2.0  1:02   5 postmaster
16774 postgres  15  0 245M 245M  240M S     0.1  2.0  1:16   0 postmaster
16748 postgres  15  0 245M 245M  240M S     0.0  2.0  1:19   5 postmaster
16881 postgres  15  0 245M 245M  240M S     0.1  2.0  0:50   7 postmaster
16762 postgres  15  0 245M 245M  240M S     0.0  2.0  1:14   4 postmaster
...
max_connections = 600
shared_buffers = 30000          # = 234MB, up from 21760 = 170MB
                                # min 16, at least max_connections*2, 8KB each
sort_mem = 2048                 # min 64, size in KB
vacuum_mem = 32768              # up from 16384; min 1024, size in KB

# - Free Space Map -
#max_fsm_pages = 20000          # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000       # min 100, ~50 bytes each

#fsync = true                   # turns forced synchronization on or off
#wal_sync_method = fsync        # the default varies across platforms:
                                # fsync,
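The worst case implied by these settings can be worked out directly: sort_mem is allocated per sort, per backend, so 600 connections * 2048KB is roughly the 1.2GB mentioned above. A hedged configuration sketch better matched to a 12GB machine, assuming the connection count is actually reduced as suggested (the specific numbers are illustrative assumptions, not a tuned recommendation):

```
# postgresql.conf sketch, PostgreSQL 7.4-era parameter names.
# All values below are assumptions for illustration.
max_connections = 150           # the app reportedly needs far fewer than 600
sort_mem = 8192                 # 8MB per sort; 150 * 8MB = ~1.2GB worst case,
                                # the same ceiling as 600 * 2MB but per-query faster
effective_cache_size = 655360   # ~5GB expressed in 8KB pages, matching the
                                # "5 or 6GB" figure asked about in the question
```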
Re: [HACKERS] [PERFORM] A Better External Sort?
Ron,

I've somehow missed part of this thread, which is a shame since this is an area of primary concern for me.

Your suggested algorithm seems to be designed to relieve I/O load by making more use of the CPU (if I followed it correctly). However, that's not PostgreSQL's problem; currently for us external sort is a *CPU-bound* operation, half of which is value comparisons. (oprofiles available if anyone cares)

So we need to look, instead, at algorithms which make better use of work_mem to lower CPU activity, possibly even at the expense of I/O.

--Josh Berkus
Re: [PERFORM] VACUUM FULL vs CLUSTER
You can see that at the end of the vacuum log (sorry for my English):

...
INFO: free space map: 930 relations, 48827 pages stored; 60240 total pages needed   <-- NEEDED!
DETAIL: Allocated FSM size: 1000 relations + 7 pages = 475 kB shared memory.   <-- allocated according to max_fsm_pages, etc. (I have already configured this in postgresql.conf, see below)
VACUUM

You probably must adjust your shared memory, because the database needs it, but it depends on your database... (I could be wrong; I'm learning PostgreSQL, so please feel free to correct me.)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] on behalf of Stef
Sent: Friday, 23 September 2005 14:18
To: Bruno Wolff III
CC: Markus Benne; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] VACUUM FULL vs CLUSTER

Bruno Wolff III mentioned:
> If you have a proper FSM setting you shouldn't need to do vacuum fulls
> (unless you have an older version of postgres where index bloat might
> be an issue).

Thanks Alvaro and Bruno,

I just want to clarify something that I also couldn't find a clear-cut answer for before. What is a proper FSM setting?

Someone told me to set max_fsm_relations to the number of relations in pg_class plus a few more to allow for new relations, and max_fsm_pages to the number of rows in the biggest table I want to vacuum, plus a few thousand for extra room?

Where does this free space map sit? On the disk somewhere, or in memory, or both? I once set max_fsm_pages very high by mistake, and postgres then started up and used a _lot_ of shared memory, and I had to increase shmmax. Is there anything to watch out for when bumping this setting up a lot?

Kind Regards
Stefan
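The two inputs Stef asks about can be measured rather than guessed. A sketch (the FSM itself lives in shared memory, which is why an oversized max_fsm_pages forced the shmmax increase he describes):

```sql
-- Lower bound for max_fsm_relations: count tables, indexes and toast tables.
SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');

-- VACUUM VERBOSE prints the totals the FSM actually needs; the final lines
-- look like the log quoted above:
--   "free space map: N relations, M pages stored; K total pages needed"
-- Set max_fsm_pages comfortably above K and max_fsm_relations above N.
VACUUM VERBOSE;
```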
[PERFORM] Delphi connection ADO is slow
Hello,

My Delphi connection via ADO is very, very, very slow. Every SQL statement I execute has a big delay; I tested the same queries in pgAdmin and the reply is instantaneous. Is the problem in Delphi?

Thanks!
Re: [PERFORM] PostgreSQL overall design
Have you read the developers FAQ?

---------------------------------------------------------------------------

Gnanavel S wrote:
> Hi All,
>
> Can anyone please tell/point me where I can get the postgresql system
> layout (I've an interest to contribute). I would also like to know the
> files involved for performing each task (for eg when doing a select
> operation what is exactly happening in postgres along with the files).
> I was wandering inside the source for a while and I couldn't get a
> starting point to go with.
>
> Need a clarification in copydir.c file of src/port directory. In the
> following snippet the destination directory is created first, then the
> source directory is read. Suppose I don't have permission to read the
> source; even then the destination directory would be created. I just
> want to know whether there is any reason for doing so?
>
>     if (mkdir(todir, S_IRUSR | S_IWUSR | S_IXUSR) != 0)
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not create directory \"%s\": %m", todir)));
>
>     xldir = AllocateDir(fromdir);
>     if (xldir == NULL)
>         ereport(ERROR,
>                 (errcode_for_file_access(),
>                  errmsg("could not open directory \"%s\": %m", fromdir)));
>
> --
> with thanks & regards,
> S.Gnanavel
> Satyam Computer Services Ltd.

--
Bruce Momjian          | http://candle.pha.pa.us
pgman@candle.pha.pa.us | (610) 359-1001
+ If your life is a hard drive, | 13 Roberts Road
+ Christ can be your backup.    | Newtown Square, Pennsylvania 19073
Re: [PERFORM] PostgreSQL overall design
On 9/27/05, Bruce Momjian pgman@candle.pha.pa.us wrote:
> Have you read the developers FAQ?

Thanks Bruce. I'm going through that.

> Gnanavel S wrote:
>> Can anyone please tell/point me where I can get the postgresql system
>> layout (I've an interest to contribute). I would also like to know the
>> files involved for performing each task (for eg when doing a select
>> operation what is exactly happening in postgres along with the files).
>> I was wandering inside the source for a while and I couldn't get a
>> starting point to go with.
>>
>> Need a clarification in copydir.c file of src/port directory. In the
>> following snippet the destination directory is created first, then the
>> source directory is read. Suppose I don't have permission to read the
>> source; even then the destination directory would be created. I just
>> want to know whether there is any reason for doing so?
>>
>>     if (mkdir(todir, S_IRUSR | S_IWUSR | S_IXUSR) != 0)
>>         ereport(ERROR,
>>                 (errcode_for_file_access(),
>>                  errmsg("could not create directory \"%s\": %m", todir)));
>>
>>     xldir = AllocateDir(fromdir);
>>     if (xldir == NULL)
>>         ereport(ERROR,
>>                 (errcode_for_file_access(),
>>                  errmsg("could not open directory \"%s\": %m", fromdir)));

--
with regards,
S.Gnanavel
Re: [HACKERS] [PERFORM] A Better External Sort?
From: Josh Berkus josh@agliodbs.com
Sent: Sep 27, 2005 12:15 PM
To: Ron Peacetree [EMAIL PROTECTED]
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

> I've somehow missed part of this thread, which is a shame since this is
> an area of primary concern for me.
>
> Your suggested algorithm seems to be designed to relieve I/O load by
> making more use of the CPU. (if I followed it correctly)

The goal is to minimize all IO load. Not just HD IO load, but also RAM IO load. Particularly random access IO load of any type (for instance: the pointer chasing problem).

In addition, the design replaces explicit data or explicit key manipulation with the creation of a smaller, far more CPU- and IO-efficient data structure (essentially a CPU cache friendly Btree index) of the sorted order of the data.

That Btree can be used to generate a physical reordering of the data in one pass, but that's the weakest use for it. The more powerful uses involve allowing the Btree to persist and using it for more efficient re-searches, or combining it with other such Btrees (either as a step in task distribution across multiple CPUs, or as a more efficient way to do things like joins by manipulating these Btrees rather than the actual records).

> However, that's not PostgreSQL's problem; currently for us external
> sort is a *CPU-bound* operation, half of which is value comparisons.
> (oprofiles available if anyone cares)
>
> So we need to look, instead, at algorithms which make better use of
> work_mem to lower CPU activity, possibly even at the expense of I/O.

I suspect that even the highly efficient sorting code we have is suffering more pessimal CPU IO behavior than what I'm presenting. Jim Gray's external sorting contest web site points out that memory IO has become a serious problem for most of the contest entries.

Also, I'll bet the current code manipulates more data.

Finally, there's the possibility of reusing the product of this work to a degree and in ways that we can't with our current sorting code.
Now all we need is resources and time to create a prototype. Since I'm not likely to have either any time soon, I'm hoping that I'll be able to explain this well enough that others can test it.

*sigh* I _never_ have enough time or resources any more...

Ron
Re: [HACKERS] [PERFORM] A Better External Sort?
On Tue, 2005-09-27 at 13:15 -0400, Ron Peacetree wrote:
> That Btree can be used to generate a physical reordering of the data
> in one pass, but that's the weakest use for it. The more powerful
> uses involve allowing the Btree to persist and using it for more
> efficient re-searches or combining it with other such Btrees (either
> as a step in task distribution across multiple CPUs or as a more
> efficient way to do things like joins by manipulating these Btrees
> rather than the actual records.)

Maybe you could describe some concrete use cases. I can see what you are getting at, and I can imagine some advantageous uses, but I'd like to know what you are thinking.

Specifically I'd like to see some cases where this would beat sequential scan. I'm thinking that in your example of a terabyte table with a column having only two values, all the queries I can think of would be better served with a sequential scan.

Perhaps I believe this because you can now buy as much sequential I/O as you want. Random I/O is the only real savings.

-jwb
[PERFORM] PostgreSQL overall design
Hi All,

Can anyone please tell/point me where I can get the postgresql system layout (I've an interest to contribute). I would also like to know the files involved for performing each task (for eg when doing a select operation what is exactly happening in postgres along with the files). I was wandering inside the source for a while and I couldn't get a starting point to go with.

Need a clarification in copydir.c file of src/port directory. In the following snippet the destination directory is created first, then the source directory is read. Suppose I don't have permission to read the source; even then the destination directory would be created. I just want to know whether there is any reason for doing so?

    if (mkdir(todir, S_IRUSR | S_IWUSR | S_IXUSR) != 0)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not create directory \"%s\": %m", todir)));

    xldir = AllocateDir(fromdir);
    if (xldir == NULL)
        ereport(ERROR,
                (errcode_for_file_access(),
                 errmsg("could not open directory \"%s\": %m", fromdir)));

--
with thanks & regards,
S.Gnanavel
Satyam Computer Services Ltd.
[PERFORM] The need for full vacuum / reindex
As it happened, we dropped the whole production database and refreshed it from a database backup - and all our performance problems seem to have gone. I suppose this means that to keep the database efficient, one eventually does have to do reindexing and/or full vacuum from time to time?
Re: [PERFORM] The need for full vacuum / reindex
On Wed, Sep 28, 2005 at 05:33:27 +0200, Tobias Brox [EMAIL PROTECTED] wrote:
> As it happened, we dropped the whole production database and refreshed it
> from a database backup - and all our performance problems seem to have
> gone. I suppose this means that to keep the database efficient, one
> eventually does have to do reindexing and/or full vacuum from time to time?

Normally you only need to do that if you didn't vacuum often enough, or with a high enough FSM setting, and bloat has gotten out of hand to the point that you need to recover some space.
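When bloat has already gotten out of hand, the one-time recovery steps discussed in this thread can be sketched as follows (the table name is a placeholder; both commands take exclusive locks, so run them during a maintenance window):

```sql
-- Compacts the table and returns the freed space to the operating system:
VACUUM FULL some_bloated_table;

-- VACUUM FULL does not shrink bloated indexes; rebuild them separately:
REINDEX TABLE some_bloated_table;

-- Afterwards, frequent plain VACUUMs plus adequate max_fsm_pages /
-- max_fsm_relations settings should keep the bloat from returning.
```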