[PERFORM] Index not used on group by

2005-09-27 Thread Andrey Repko
Hello all,

I have a table ma_data that contains over 300,000 rows.
   This table has a primary key id and a field alias_id.
   I created a btree index on this field,
   and set its statistics target:

 ALTER TABLE public.ma_data
   ALTER COLUMN alias_id SET STATISTICS 998;

   So, when I do something like
 SELECT alias_id FROM ma_data GROUP BY alias_id
   I get (with enable_seqscan off):
   
 Group  (cost=0.00..1140280.63 rows=32 width=4) (actual time=0.159..2640.090 rows=32 loops=1)
   ->  Index Scan using reference_9_fk on ma_data  (cost=0.00..1139526.57 rows=301624 width=4) (actual time=0.120..1471.128 rows=301624 loops=1)
 Total runtime: 2640.407 ms
(3 rows)

   As I understand it, the problem is the visibility of records (the index
   alone cannot tell whether a row is visible), but some other DBMSs (for
   example Firebird) use indexes for this without problems. Or maybe some
   other information would be helpful for me and the community.
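
For reference, a minimal psql session that reproduces the comparison above
(a sketch; it assumes the ma_data table and the btree index on alias_id
already exist):

SET enable_seqscan = off;  -- discourage sequential scans for this session
EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
SET enable_seqscan = on;   -- restore the default planner behaviour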

-- 
With best regards,
  Andrey Repko   mailto:[EMAIL PROTECTED]




Re: [PERFORM] Query seem to slow if table have more than 200 million rows

2005-09-27 Thread Ahmad Fajar
Hi Qingqing,

I don't know whether the statistics I got are bad or good; here they are:
scooby=# select a.relid, a.relname, b.indexrelid, b.indexrelname,
scooby-# c.idx_scan, c.idx_tup_read, c.idx_tup_fetch,
scooby-# a.heap_blks_read, a.heap_blks_hit, a.idx_blks_read, a.idx_blks_hit,
scooby-# a.toast_blks_read, a.toast_blks_hit, a.tidx_blks_read, a.tidx_blks_hit,
scooby-# b.idx_blks_read, b.idx_blks_hit
scooby-# from pg_statio_user_tables a, pg_statio_user_indexes b, pg_stat_all_indexes c
scooby-# where a.relid=b.relid and a.relid=c.relid and b.indexrelid=c.indexrelid
scooby-# and a.relname=b.relname and a.relname=c.relname and a.relname='fti_dict1';
  relid   |  relname  | indexrelid | indexrelname | idx_scan | idx_tup_read | idx_tup_fetch
----------+-----------+------------+--------------+----------+--------------+---------------
 22880226 | fti_dict1 |   22880231 | idx_dict3    |        0 |            0 |             0
 22880226 | fti_dict1 |   22880230 | idx_dict2    |        7 |       592799 |        592799
 22880226 | fti_dict1 |   22880229 | idx_dict1    |        0 |            0 |             0
(3 rows)

(The remaining columns - heap_blks_read, heap_blks_hit, idx_blks_read and
idx_blks_hit - are all zero for the three indexes; the toast and tidx
counters are NULL.)

I have tried the query below several times with different keywords, but only
idx_tup_read and idx_tup_fetch changed; the others stayed at zero.
The indexes are:
ids (idx_dict1),
keywords (idx_dict2 varchar_ops),
keywords (idx_dict3 varchar_pattern_ops) -- I use this index for queries like
keywords like 'blabla%', just for testing purposes.
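
For context, here is hypothetical DDL matching the indexes described above
(a sketch; the dict table and its columns come from the quoted message below):

CREATE INDEX idx_dict1 ON dict (ids);
CREATE INDEX idx_dict2 ON dict (keywords varchar_ops);
CREATE INDEX idx_dict3 ON dict (keywords varchar_pattern_ops);
-- idx_dict3 is the one that can support prefix searches such as:
--   SELECT ids, keywords FROM dict WHERE keywords LIKE 'blabla%';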

Regards,
ahmad fajar

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Qingqing Zhou
Sent: Tuesday, 27 September 2005 8:43
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Query seem to slow if table have more than 200
million rows


Ahmad Fajar <[EMAIL PROTECTED]> wrote:

> Select ids, keywords from dict where keywords='blabla' ('blabla' is a single
> word);
>
> The table has 200 million rows, and I have indexed the keywords field. The
> first time, my query seems slow to get the result, about 15-60 sec. But if I
> repeat the query I get a fast result. My question is why the query seems
> very slow the first time.
>
> Table structure is quite simple:
>
> Ids bigint, keywords varchar(150), weight varchar(1), dpos int.


The first slowness is obviously caused by disk I/O. The second time is
faster because all the data pages it requires are already in the buffer pool.
200 million rows is not a problem for a btree index: even if your client tool
appends some spaces to your keywords at insertion time, the ideal btree is 5
to 6 layers high at most. Can you show the I/O stats of the index from your
statistics views?
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-VIEWS
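
A minimal follow-up query against those views (a sketch, using the index
names from the reply above):

SELECT indexrelname, idx_blks_read, idx_blks_hit
FROM pg_statio_user_indexes
WHERE indexrelname IN ('idx_dict1', 'idx_dict2', 'idx_dict3');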

Regards,
Qingqing





Re: [PERFORM] Index not used on group by

2005-09-27 Thread Richard Huxton

Andrey Repko wrote:


> I have a table ma_data that contains over 300,000 rows.
>    This table has a primary key id and a field alias_id.
>    I created a btree index on this field,
>    and set its statistics target:
>
>  ALTER TABLE public.ma_data
>    ALTER COLUMN alias_id SET STATISTICS 998;
>
>    So, when I do something like
>  SELECT alias_id FROM ma_data GROUP BY alias_id


Why are you using GROUP BY without any aggregate functions?

What happens if you use something like
  SELECT DISTINCT alias_id FROM ma_data;

--
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Index not used on group by

2005-09-27 Thread Андрей Репко
Hello Richard,

Tuesday, September 27, 2005, 1:48:15 PM, you wrote:

RH> Andrey Repko wrote:
>>
>> I have a table ma_data that contains over 300,000 rows.
>>    This table has a primary key id and a field alias_id.
>>    I created a btree index on this field,
>>    and set its statistics target:
>>
>>  ALTER TABLE public.ma_data
>>    ALTER COLUMN alias_id SET STATISTICS 998;
>>
>>    So, when I do something like
>>  SELECT alias_id FROM ma_data GROUP BY alias_id

RH> Why are you using GROUP BY without any aggregate functions?

RH> What happens if you use something like
RH>   SELECT DISTINCT alias_id FROM ma_data;
sart_ma=# EXPLAIN ANALYZE SELECT DISTINCT alias_id FROM ma_data;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Unique  (cost=65262.63..66770.75 rows=32 width=4) (actual time=16780.214..18250.761 rows=32 loops=1)
   ->  Sort  (cost=65262.63..66016.69 rows=301624 width=4) (actual time=16780.204..17255.129 rows=301624 loops=1)
         Sort Key: alias_id
         ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=6.896..15321.023 rows=301624 loops=1)
 Total runtime: 18292.542 ms
(5 rows)

sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
                                  QUERY PLAN
------------------------------------------------------------------------------
 HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
   ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
 Total runtime: 15991.244 ms
(3 rows)

-- 
With best regards,
  Andrey Repko   mailto:[EMAIL PROTECTED]




Re: [PERFORM] Index not used on group by

2005-09-27 Thread Андрей Репко
Hello Richard,

Tuesday, September 27, 2005, 2:08:31 PM, you wrote:


>> sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
>>    ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
>>  Total runtime: 15991.244 ms

RH> OK - the planner thinks it's doing the right thing, but your cost estimates
RH> are way off. If you look back at where you got an index scan, its cost
RH> was 1.1 million:
RH>   Index Scan using reference_9_fk on ma_data (cost=0.00..1139526.57
But why does PG scan _all_ the records in the table? As I understand it, we
could take the information from the index alone, without scanning the whole
table. Of course, if we select ALL records from the table, the index can't
help us.
If I write something like:
SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =1 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =2 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =3 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =4 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =5 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =6 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =7 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =8 LIMIT 1)
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =9 LIMIT 1)
...
UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id=max_alias_id LIMIT 1)
It works better, much better.

RH> That's way above the numbers for seq-scan+hash/sort, so if the cost
RH> estimate was right PG would be making the right choice. Looks like you
RH> need to check your configuration settings. Have you read:
RH>   http://www.powerpostgresql.com/PerfList
RH> or
RH>   http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
Thanks.


-- 
With best regards,
  Andrey Repko   mailto:[EMAIL PROTECTED]




Re: [PERFORM] PostgreSQL overall design

2005-09-27 Thread Gnanavel S
On 9/27/05, Abhijit Menon-Sen <[EMAIL PROTECTED]> wrote:
> At 2005-09-27 15:20:05 +0530, [EMAIL PROTECTED] wrote:
> > Can anyone please tell/point me where I can get the postgresql system
> > layout (I've an interest to contribute).
>
> http://www.postgresql.org/developer/coding
>
> And, in particular:
>
> http://www.postgresql.org/docs/faqs.FAQ_DEV.html
>
> -- ams

Thanks. I'll go thru' the documentation.

--
with regards,
S.Gnanavel



Re: [PERFORM] Index not used on group by

2005-09-27 Thread Richard Huxton

Андрей Репко wrote:

> RH> What happens if you use something like
> RH>   SELECT DISTINCT alias_id FROM ma_data;
> sart_ma=# EXPLAIN ANALYZE SELECT DISTINCT alias_id FROM ma_data;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  Unique  (cost=65262.63..66770.75 rows=32 width=4) (actual time=16780.214..18250.761 rows=32 loops=1)
>    ->  Sort  (cost=65262.63..66016.69 rows=301624 width=4) (actual time=16780.204..17255.129 rows=301624 loops=1)
>          Sort Key: alias_id
>          ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=6.896..15321.023 rows=301624 loops=1)
>  Total runtime: 18292.542 ms



> sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
>                                   QUERY PLAN
> ------------------------------------------------------------------------------
>  HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
>    ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
>  Total runtime: 15991.244 ms


OK - the planner thinks it's doing the right thing, but your cost estimates
are way off. If you look back at where you got an index scan, its cost
was 1.1 million:

  Index Scan using reference_9_fk on ma_data (cost=0.00..1139526.57

That's way above the numbers for seq-scan+hash/sort, so if the cost 
estimate was right PG would be making the right choice. Looks like you 
need to check your configuration settings. Have you read:

  http://www.powerpostgresql.com/PerfList
or
  http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html
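
As a sketch of the kind of settings to review (the values here are purely
illustrative, not recommendations for this machine):

SHOW random_page_cost;        -- the 8.0 default of 4 assumes mostly-cold data
SHOW effective_cache_size;    -- should reflect the OS cache available to PG
SET random_page_cost = 2;     -- a lower value makes index scans look cheaper
EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;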
--
  Richard Huxton
  Archonet Ltd




Re: [PERFORM] Index not used on group by

2005-09-27 Thread Richard Huxton

Андрей Репко wrote:

> Hello Richard,
>
> Tuesday, September 27, 2005, 2:08:31 PM, you wrote:




>> sart_ma=# EXPLAIN ANALYZE SELECT alias_id FROM ma_data GROUP BY alias_id;
>>                                   QUERY PLAN
>> ------------------------------------------------------------------------------
>>  HashAggregate  (cost=38565.30..38565.62 rows=32 width=4) (actual time=15990.863..15990.933 rows=32 loops=1)
>>    ->  Seq Scan on ma_data  (cost=0.00..37811.24 rows=301624 width=4) (actual time=3.446..14572.141 rows=301624 loops=1)
>>  Total runtime: 15991.244 ms



> RH> OK - the planner thinks it's doing the right thing, but your cost estimates
> RH> are way off. If you look back at where you got an index scan, its cost
> RH> was 1.1 million:
> RH>   Index Scan using reference_9_fk on ma_data (cost=0.00..1139526.57
> But why does PG scan _all_ the records in the table? As I understand it, we
> could take the information from the index alone, without scanning the whole
> table. Of course, if we select ALL records from the table, the index can't
> help us.


Actually, if you select more than 5-10% of the rows (in general) you are 
better off using a seq-scan.


PostgreSQL estimates the total cost of the possible query plans and picks
the cheapest. In your case, your configuration settings seem to be
pushing the estimated cost of an index scan much higher than it really is.
So, it picks the sequential scan.



> If I write something like:
> SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =1 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =2 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =3 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =4 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =5 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =6 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =7 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =8 LIMIT 1)
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id =9 LIMIT 1)
> ...
> UNION ALL SELECT (SELECT  alias_id  FROM ma_data WHERE alias_id=max_alias_id LIMIT 1)
> It works better, much better.


Of course - it will always choose index queries here - it can see you 
are only fetching one row in each subquery.


Correct your configuration settings so PG estimates the cost of an index
query correctly, and all should be well.

--
  Richard Huxton
  Archonet Ltd




Re: [PERFORM] slow database, queries accumulating

2005-09-27 Thread Dario
I have read that 600 connections are a LOT (somebody correct me please if
I'm wrong), since each connection requires a process and your server must
serve them all. Besides that overhead, you will end up with 1200 megabytes
of sort_mem allocated in the worst case (600 connections x 2048 kB), probably
idle most of the time...

pgpool allows you to reuse processes (similar to Oracle shared servers).
For the record, I haven't needed to use it myself; AFAICS, it's easy to use.
(I'll try to make it work and share tests, but I don't know when.)

long life, little spam and prosperity


-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Anjan Dave
Sent: Friday, 23 September 2005 13:02
To: pgsql-performance@postgresql.org
Subject: [PERFORM] slow database, queries accumulating


Hi

We are experiencing consistent slowness on the database for one application.
This is more of a reporting type of application, heavy on the bytea data type
(the data gets rendered into PDFs in the app server). A lot of queries,
mostly selects and a few random updates, accumulate on the server as the
volume of users on the application increases. Below is a snapshot of top,
with about 80 selects and 3 or 4 updates. Things get better eventually if I
cancel (SIGINT) some of the oldest queries. I also see a few instances of
shared locks not being granted during this time. I don't even see high iowait
or memory starvation during these times, as indicated by top.

-bash-2.05b$ psql -c "select * from pg_locks;" dbname | grep f
  |  |77922136 | 16761 | ShareLock| f



We (development) are looking into query optimization (explain analyze,
indexes, etc), and my understanding is that the queries execute quickly when
run under explain analyze, but during busy times they become quite slow,
taking from a few seconds to a few minutes to execute. I do see in the log
that almost all queries have either ORDER BY, GROUP BY, or DISTINCT. Does it
hurt to up the sort_mem to 3MB or 4MB? Should I up the effective_cache_size
to 5 or 6GB? The app does not need a lot of connections to the database, so I
can reduce it down from 600.

Based on the description above and the configuration below, does anything
appear bad in the config? Is there anything I can try in the configuration to
improve performance?
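
For illustration, those two changes would look like this in postgresql.conf
(a sketch with illustrative values only, not a tuning recommendation for
this box):

sort_mem = 4096                 # 4MB per sort; note each of the 600
                                # connections can allocate this per sort
effective_cache_size = 655360   # ~5GB expressed in 8KB pages (7.4 syntax)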


The database size is about 4GB.
This is PG 7.4.7, RHAS3.0 (u5), Local 4 spindle RAID10 (15KRPM), and logs on
a separate set of drives, RAID10. 6650 server, 4 x XEON, 12GB RAM.
Vacuum is done every night, full vacuum done once a week.
I had increased the shared_buffers and sort_memory recently, which didn't
help.

Thanks,
Anjan




10:44:51  up 14 days, 13:38,  2 users,  load average: 0.98, 1.14, 1.12
264 processes: 257 sleeping, 7 running, 0 zombie, 0 stopped
CPU states:  cpu    user    nice  system    irq  softirq  iowait    idle
           total   14.4%    0.0%    7.4%   0.0%     0.0%    0.0%   77.9%
           cpu00   15.7%    0.0%    5.7%   0.0%     0.1%    0.0%   78.2%
           cpu01   15.1%    0.0%    7.5%   0.0%     0.0%    0.1%   77.0%
           cpu02   10.5%    0.0%    5.9%   0.0%     0.0%    0.0%   83.4%
           cpu03    9.9%    0.0%    5.9%   0.0%     0.0%    0.0%   84.0%
           cpu04    7.9%    0.0%    3.7%   0.0%     0.0%    0.0%   88.2%
           cpu05   19.3%    0.0%   12.3%   0.0%     0.0%    0.0%   68.3%
           cpu06   20.5%    0.0%    9.5%   0.0%     0.0%    0.1%   69.7%
           cpu07   16.1%    0.0%    8.5%   0.0%     0.1%    0.3%   74.7%
Mem:  12081736k av, 7881972k used, 4199764k free, 0k shrd, 82372k buff
      4823496k actv, 2066260k in_d, 2036k in_c
Swap: 4096532k av, 0k used, 4096532k free, 6888900k cached

  PID USER PRI  NI  SIZE  RSS SHARE STAT %CPU %MEM   TIME CPU COMMAND
16773 postgres  15   0  245M 245M  240M S 0.0  2.0   1:16   7 postmaster
16880 postgres  15   0  245M 245M  240M S 0.1  2.0   0:49   6 postmaster
16765 postgres  15   0  245M 245M  240M S 0.0  2.0   1:16   0 postmaster
16825 postgres  15   0  245M 245M  240M S 0.0  2.0   1:02   5 postmaster
16774 postgres  15   0  245M 245M  240M S 0.1  2.0   1:16   0 postmaster
16748 postgres  15   0  245M 245M  240M S 0.0  2.0   1:19   5 postmaster
16881 postgres  15   0  245M 245M  240M S 0.1  2.0   0:50   7 postmaster
16762 postgres  15   0  245M 245M  240M S 0.0  2.0   1:14   4 postmaster
.
.


max_connections = 600

shared_buffers = 30000  # = 234MB, up from 21760 = 170MB; min 16, at least
                        # max_connections*2, 8KB each
sort_mem = 2048         # min 64, size in KB
vacuum_mem = 32768      # up from 16384; min 1024, size in KB

# - Free Space Map -

#max_fsm_pages = 20000      # min max_fsm_relations*16, 6 bytes each
#max_fsm_relations = 1000   # min 100, ~50 bytes each

#fsync = true   # turns forced synchronization on or off
#wal_sync_method = fsync# the default varies across platforms:
# fsync, 

Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-27 Thread Josh Berkus

Ron,

I've somehow missed part of this thread, which is a shame since this is 
an area of primary concern for me.


Your suggested algorithm seems to be designed to relieve I/O load by
making more use of the CPU (if I followed it correctly). However,
that's not PostgreSQL's problem; currently, for us, external sort is a
*CPU-bound* operation, half of which is value comparisons. (oprofiles
available if anyone cares)


So we need to look, instead, at algorithms which make better use of 
work_mem to lower CPU activity, possibly even at the expense of I/O.


--Josh Berkus



Re: [PERFORM] VACUUM FULL vs CLUSTER

2005-09-27 Thread Dario
You can see that at the end of the vacuum log (sorry for my English):
...
INFO:  free space map: 930 relations, 48827 pages stored; 60240 total pages
needed  -- NEEDED!
-- I have already configured this in postgresql.conf; you can see it below
DETAIL:  Allocated FSM size: 1000 relations + 70000 pages = 475 kB shared
memory.  -- ALLOCATED ACCORDING TO max_fsm_pages, etc.
VACUUM

You probably need to adjust your shared memory, because the database needs
it, but it depends on your database...

(I could be wrong - I'm learning postgresql - so please feel free to correct me)

-----Original Message-----
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] on behalf of Stef
Sent: Friday, 23 September 2005 14:18
To: Bruno Wolff III
CC: Markus Benne; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] VACUUM FULL vs CLUSTER



Bruno Wolff III mentioned:
=> => If you have a proper FSM setting you shouldn't need to do vacuum fulls
=> => (unless you have an older version of postgres where index bloat might
=> => be an issue).

Thanks Alvaro and Bruno

I just want to clarify something that I also couldn't
find a clear-cut answer for before.

What is a proper fsm setting?

Someone told me to set max_fsm_relations to the number of
relations in pg_class, plus a few more to allow for new relations,
and max_fsm_pages to the number of rows in the biggest table I
want to vacuum, plus a few thousand for extra room?
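
A rough way to gather those numbers (a sketch; the headroom figures are
illustrative):

-- candidate max_fsm_relations: relations in the cluster, plus headroom
SELECT count(*) + 100 AS fsm_relations_guess FROM pg_class;
-- for max_fsm_pages, the page counts of the biggest tables are a coarse
-- proxy; the "total pages needed" line from VACUUM VERBOSE is more direct
SELECT relname, relpages FROM pg_class ORDER BY relpages DESC LIMIT 5;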

Where does this free space map sit? On the disk somewhere,
or in memory, or both?

I once set max_fsm_pages very high by mistake, and postgres
then started up and used a _lot_ of shared memory, and I had to
increase shmmax. Is there anything to watch out for when bumping this
setting up a lot?

Kind Regards
Stefan



[PERFORM] Delphi connection ADO is slow

2005-09-27 Thread Everton



Hello,

My ADO connection is very, very, very slow...

My Delphi connection via ADO is very slow: every SQL statement I execute has
a big delay, although when I test the same statements in pgAdmin the response
is instantaneous. Is the problem in Delphi?

Thanks!



Re: [PERFORM] PostgreSQL overall design

2005-09-27 Thread Bruce Momjian

Have you read the developers FAQ?

---

Gnanavel S wrote:
> Hi All,
>
> Can anyone please tell/point me where I can get the postgresql system layout
> (I've an interest to contribute)? I would also like to know the files
> involved in performing each task (e.g. when doing a select operation,
> what exactly happens in postgres, along with the files).
>
> I was wandering inside the source for a while and I couldn't find a
> starting point to go with.
>
> Need a clarification in the copydir.c file of the src/port directory. In the
> following snippet the destination directory is created first, then the
> source directory is read. So if I don't have permission to read the source,
> the destination directory would still be created.
> I just want to know whether there is any reason for doing so?
>
> if (mkdir(todir, S_IRUSR | S_IWUSR | S_IXUSR) != 0)
>     ereport(ERROR,
>             (errcode_for_file_access(),
>              errmsg("could not create directory \"%s\": %m", todir)));
>
> xldir = AllocateDir(fromdir);
> if (xldir == NULL)
>     ereport(ERROR,
>             (errcode_for_file_access(),
>              errmsg("could not open directory \"%s\": %m", fromdir)));
>
> --
> with thanks & regards,
> S.Gnanavel
> Satyam Computer Services Ltd.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  pgman@candle.pha.pa.us   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] PostgreSQL overall design

2005-09-27 Thread Gnanavel S
On 9/27/05, Bruce Momjian <pgman@candle.pha.pa.us> wrote:
> Have you read the developers FAQ?

Thanks Bruce. I'm going through that.

--
with regards,
S.Gnanavel


Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-27 Thread Ron Peacetree
From: Josh Berkus <josh@agliodbs.com>
Sent: Sep 27, 2005 12:15 PM
To: Ron Peacetree <[EMAIL PROTECTED]>
Subject: Re: [HACKERS] [PERFORM] A Better External Sort?

> I've somehow missed part of this thread, which is a shame since this is
> an area of primary concern for me.
>
> Your suggested algorithm seems to be designed to relieve I/O load by
> making more use of the CPU (if I followed it correctly).

The goal is to minimize all IO load.  Not just HD IO load, but also RAM
IO load.  Particularly random access IO load of any type (for instance:
the pointer chasing problem).

In addition, the design replaces explicit data or explicit key manipulation
with the creation of a smaller, far more CPU and IO efficient data
structure (essentially a CPU cache friendly Btree index) of the sorted
order of the data.

That Btree can be used to generate a physical reordering of the data
in one pass, but that's the weakest use for it.  The more powerful
uses involve allowing the Btree to persist and using it for more
efficient re-searches or combining it with other such Btrees (either as
a step in task distribution across multiple CPUs or as a more efficient
way to do things like joins by manipulating these Btrees rather than
the actual records.)


> However, that's not PostgreSQL's problem; currently for us external
> sort is a *CPU-bound* operation, half of which is value comparisons.
> (oprofiles available if anyone cares)
>
> So we need to look, instead, at algorithms which make better use of
> work_mem to lower CPU activity, possibly even at the expense of I/O.

I suspect that even the highly efficient sorting code we have is
suffering more pessimal CPU IO behavior than what I'm presenting.
Jim Gray's external sorting contest web site points out that memory IO
has become a serious problem for most of the contest entries.

Also, I'll bet the current code manipulates more data.

Finally, there's the possibility of reusing the product of this work to a
degree and in ways that we can't with our current sorting code.


Now all we need is resources and time to create a prototype.
Since I'm not likely to have either any time soon, I'm hoping that
I'll be able to explain this well enough that others can test it.

*sigh* I _never_ have enough time or resources any more...
Ron
 
  



Re: [HACKERS] [PERFORM] A Better External Sort?

2005-09-27 Thread Jeffrey W. Baker
On Tue, 2005-09-27 at 13:15 -0400, Ron Peacetree wrote:

> That Btree can be used to generate a physical reordering of the data
> in one pass, but that's the weakest use for it.  The more powerful
> uses involve allowing the Btree to persist and using it for more
> efficient re-searches or combining it with other such Btrees (either as
> a step in task distribution across multiple CPUs or as a more efficient
> way to do things like joins by manipulating these Btrees rather than
> the actual records.)

Maybe you could describe some concrete use cases.  I can see what you
are getting at, and I can imagine some advantageous uses, but I'd like
to know what you are thinking.

Specifically I'd like to see some cases where this would beat sequential
scan.  I'm thinking that in your example of a terabyte table with a
column having only two values, all the queries I can think of would be
better served with a sequential scan.

Perhaps I believe this because you can now buy as much sequential I/O as
you want.  Random I/O is the only real savings.

-jwb





[PERFORM] PostgreSQL overall design

2005-09-27 Thread Gnanavel S
Hi All,

Can anyone please tell/point me where I can get the postgresql system
layout (I've an interest to contribute)? I would also like to know the
files involved in performing each task (e.g. when doing a select
operation, what exactly happens in postgres, along with the files).

I was wandering inside the source for a while and I couldn't find a
starting point to go with.

Need a clarification in the copydir.c file of the src/port directory. In
the following snippet the destination directory is created first, then
the source directory is read. So if I don't have permission to read the
source, the destination directory would still be created.
I just want to know whether there is any reason for doing so?

if (mkdir(todir, S_IRUSR | S_IWUSR | S_IXUSR) != 0)
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not create directory \"%s\": %m", todir)));

xldir = AllocateDir(fromdir);
if (xldir == NULL)
    ereport(ERROR,
            (errcode_for_file_access(),
             errmsg("could not open directory \"%s\": %m", fromdir)));


--
with thanks & regards,
S.Gnanavel
Satyam Computer Services Ltd.


[PERFORM] The need for full vacuum / reindex

2005-09-27 Thread Tobias Brox
On one occasion we dropped the whole production database and refreshed it
from a database backup - and all our performance problems seem to have gone.
I suppose this means that to keep the database efficient, one eventually does
have to do reindexing and/or a full vacuum from time to time?
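
For reference, a sketch of the kind of periodic maintenance being described
(the table name is a placeholder):

VACUUM FULL ANALYZE;        -- reclaims dead space; takes exclusive locks
REINDEX TABLE some_table;   -- rebuilds that table's indexes if they are bloated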

-- 
Notice of Confidentiality: This email is sent unencrypted over the network,
and may be stored on several email servers; it can be read by third parties
as easily as a postcard.  Do not rely on email for confidential information.



Re: [PERFORM] The need for full vacuum / reindex

2005-09-27 Thread Bruno Wolff III
On Wed, Sep 28, 2005 at 05:33:27 +0200,
  Tobias Brox <[EMAIL PROTECTED]> wrote:
> On one occasion we dropped the whole production database and refreshed it
> from a database backup - and all our performance problems seem to have gone.
> I suppose this means that to keep the database efficient, one eventually does
> have to do reindexing and/or a full vacuum from time to time?

Normally you only need to do that if you didn't vacuum often enough, or with
a high enough fsm setting, and bloat has gotten out of hand to the point that
you need to recover some space.
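
A quick way to check whether the FSM is keeping up (a sketch; run it as
superuser on 7.4/8.0):

VACUUM VERBOSE;
-- the last lines report, for example:
--   INFO:  free space map: N relations, M pages stored; K total pages needed
-- if K exceeds max_fsm_pages, raise max_fsm_pages (and vacuum more often)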
