[PERFORM] expression (functional) index use in joins

2003-11-26 Thread Roger Ging
I just installed v7.4 and restored a database from v7.3.4.  I have an index
based on a function that the planner is using on the old version, but on the
new version it is doing seq scans on left joins instead.  I have run ANALYZE
on the table post-restore.  The query returns in less than 1 second on
version 7.3.4 and takes over 10 seconds on version 7.4.  Any help will be
appreciated.

Roger Ging

Query:

SELECT L.row_id FROM music.logfile L LEFT JOIN music.program P ON
music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
WHERE  L.station = UPPER('kabc')::VARCHAR
AND L.air_date = '04/12/2002'::TIMESTAMP
AND P.cutoff_date IS NULL
ORDER BY L.chron_start,L.chron_end;
planner results on 7.4:

 Sort  (cost=17595.99..17608.23 rows=4894 width=12)
   Sort Key: l.chron_start, l.chron_end
   ->  Merge Left Join  (cost=17135.92..17296.07 rows=4894 width=12)
         Merge Cond: ("outer"."?column5?" = "inner"."?column3?")
         Filter: ("inner".cutoff_date IS NULL)
         ->  Sort  (cost=1681.69..1682.73 rows=414 width=21)
               Sort Key: (l.program_id)::text
               ->  Index Scan using idx_logfile_station_air_date on logfile l  (cost=0.00..1663.70 rows=414 width=21)
                     Index Cond: (((station)::text = 'KABC'::text) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone))
         ->  Sort  (cost=15454.22..15465.06 rows=4335 width=20)
               Sort Key: (music.fn_mri_id_no_program(p.mri_id_no))::text
               ->  Seq Scan on program p  (cost=0.00..15192.35 rows=4335 width=20)

planner results on 7.3.4:

 Sort  (cost=55765.51..55768.33 rows=1127 width=41)
   Sort Key: l.chron_start, l.chron_end
   ->  Nested Loop  (cost=0.00..55708.36 rows=1127 width=41)
         Filter: ("inner".cutoff_date IS NULL)
         ->  Index Scan using idx_logfile_station_air_date on logfile l  (cost=0.00..71.34 rows=17 width=21)
               Index Cond: ((station = 'KABC'::character varying) AND (air_date = '2002-04-12 00:00:00'::timestamp without time zone))
         ->  Index Scan using idx_program_mri_id_no_program on program p  (cost=0.00..3209.16 rows=870 width=20)
               Index Cond: (music.fn_mri_id_no_program(p.mri_id_no) = "outer".program_id)

table Program details:

     Column     |            Type             | Modifiers
----------------+-----------------------------+-----------
 record_id      | integer                     |
 title          | character varying(40)       |
 mri_id_no      | character varying(8)        |
 ascap_cat      | character varying(1)        |
 ascap_mult     | numeric(5,3)                |
 ascap_prod     | character varying(10)       |
 npa_ind        | character varying(3)        |
 non_inc_in     | character varying(1)        |
 as_pr_su       | character varying(1)        |
 as_1st_run     | character varying(1)        |
 as_cue_st      | character varying(1)        |
 bmi_cat        | character varying(2)        |
 bmi_mult       | numeric(6,2)                |
 bmi_prod       | character varying(7)        |
 year           | integer                     |
 prog_type      | character varying(1)        |
 total_ep       | integer                     |
 last_epis      | character varying(3)        |
 syndicator     | character varying(6)        |
 station        | character varying(4)        |
 syn_loc        | character varying(1)        |
 spdb_ver       | character varying(4)        |
 as_filed       | character varying(4)        |
 bmidb_ver      | character varying(4)        |
 cutoff_date    | timestamp without time zone |
 effective_date | timestamp without time zone |
 program_id     | character varying(5)        |
Indexes:
    idx_program_mri_id_no btree (mri_id_no)
    idx_program_mri_id_no_program btree (music.fn_mri_id_no_program(mri_id_no))
    idx_program_program_id btree (program_id)
    program_mri_id_no btree (mri_id_no)
    program_oid btree (oid)
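
For reference, an expression index like idx_program_mri_id_no_program above is
created along these lines (a sketch; the body of music.fn_mri_id_no_program is
not shown in the thread):

-- Expression (functional) index matching idx_program_mri_id_no_program above.
-- The function must be declared IMMUTABLE for it to be usable in an index.
CREATE INDEX idx_program_mri_id_no_program
    ON music.program (music.fn_mri_id_no_program(mri_id_no));

-- Refresh planner statistics after building or rebuilding the index.
ANALYZE music.program;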



Re: [PERFORM] Maximum Possible Insert Performance?

2003-11-26 Thread Tom Lane
William Yu [EMAIL PROTECTED] writes:
 I then tried to put the WAL directory onto a ramdisk. I turned off 
 swapping, created a tmpfs mount point and copied the pg_xlog directory 
 over. Everything looked fine as far as I could tell but Postgres just 
 panic'd with a file permissions error. Anybody have thoughts to why 
 tmpfs would not work?

I'd say you got the file or directory ownership or permissions wrong.

regards, tom lane



Re: [PERFORM] expression (functional) index use in joins

2003-11-26 Thread Richard Huxton
On Wednesday 26 November 2003 16:38, Roger Ging wrote:
 I just installed v7.4 and restored a database from v7.3.4.
[snip]

Hmm - you seem to be getting different row estimates in the plan. Can you 
re-analyse both versions and post EXPLAIN ANALYSE rather than just EXPLAIN?

 ->  Seq Scan on program p  (cost=0.00..15192.35 rows=4335 width=20)

 planner results on 7.3.4:

 ->  Index Scan using idx_program_mri_id_no_program on program p  (cost=0.00..3209.16 rows=870 width=20)

-- 
  Richard Huxton
  Archonet Ltd



Re: [PERFORM] Maximum Possible Insert Performance?

2003-11-26 Thread William Yu
Tom Lane wrote:
William Yu [EMAIL PROTECTED] writes:

I then tried to put the WAL directory onto a ramdisk. I turned off 
swapping, created a tmpfs mount point and copied the pg_xlog directory 
over. Everything looked fine as far as I could tell but Postgres just 
panic'd with a file permissions error. Anybody have thoughts to why 
tmpfs would not work?


I'd say you got the file or directory ownership or permissions wrong.
I did a mv instead of a cp, which duplicates ownership & permissions exactly.



Re: [PERFORM] Maximum Possible Insert Performance?

2003-11-26 Thread Dror Matalon

But the permissions on the base ramdisk might be wrong. I'd su to the user
that you run postgres as (probably postgres), go to the directory where the
log and the database files live, and make sure that user can actually read
the files.

On Wed, Nov 26, 2003 at 10:03:47AM -0800, William Yu wrote:
 Tom Lane wrote:
 William Yu [EMAIL PROTECTED] writes:
 
 I then tried to put the WAL directory onto a ramdisk. I turned off 
 swapping, created a tmpfs mount point and copied the pg_xlog directory 
 over. Everything looked fine as far as I could tell but Postgres just 
 panic'd with a file permissions error. Anybody have thoughts to why 
 tmpfs would not work?
 
 
 I'd say you got the file or directory ownership or permissions wrong.
 
 I did a mv instead of a cp, which duplicates ownership & permissions exactly.
 
 

-- 
Dror Matalon
Zapatec Inc 
1700 MLK Way
Berkeley, CA 94709
http://www.fastbuzz.com
http://www.zapatec.com



Re: [PERFORM] Followup - expression (functional) index use in joins

2003-11-26 Thread Richard Huxton
On Wednesday 26 November 2003 18:39, Roger Ging wrote:
 version 7.4 results:

 explain analyse SELECT L.row_id FROM music.logfile L LEFT JOIN
 music.program P ON
 music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
 WHERE  L.station = UPPER('kabc')::VARCHAR
 AND L.air_date = '04/12/2002'::TIMESTAMP
 AND P.cutoff_date IS NULL
 ORDER BY L.chron_start,L.chron_end;

 ->  Seq Scan on program p  (cost=0.00..15192.35 rows=4335 width=20) (actual time=109.045..1955.882 rows=173998 loops=1)

The estimated number of rows here (4335) is *way* off (173998 actually). If 
you only had 4335 rows, then this might be a more sensible plan.

First step is to run:
  VACUUM ANALYSE program;
Then, check the definition of your function fn_mri_id_no_program() and make 
sure it is marked immutable/stable (depending on what it does) and that it's 
returning a varchar.
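
Roughly what that check might look like (the real body of fn_mri_id_no_program
is not shown in this thread, so the CREATE OR REPLACE below uses a placeholder
body):

-- Current declaration: provolatile is 'i' = immutable, 's' = stable, 'v' = volatile
SELECT proname, provolatile, prorettype::regtype AS return_type
  FROM pg_proc
 WHERE proname = 'fn_mri_id_no_program';

-- If it shows up as volatile, redefine it with the correct volatility
-- (placeholder body; substitute the real definition):
CREATE OR REPLACE FUNCTION music.fn_mri_id_no_program(varchar) RETURNS varchar AS '
    SELECT substr($1, 1, 5)::varchar  -- placeholder
' LANGUAGE sql IMMUTABLE;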


-- 
  Richard Huxton
  Archonet Ltd



[PERFORM] For full text indexing, which is better, tsearch2 or fulltextindex

2003-11-26 Thread LIANHE SHAO
Hi all,
Which one is better (performance / ease of use): tsearch2 or fulltextindex?
There is an example of how to use fulltextindex in the techdocs, but when I
checked the contrib/fulltextindex package there is a WARNING that
fulltextindex is much slower than tsearch2. tsearch2, however, seems complex
to use, and I cannot find a good example. Which one should I use? Any
suggestions?

thanks and Regards,
William

- Original Message -
From: Hannu Krosing [EMAIL PROTECTED]
Date: Wednesday, November 26, 2003 5:33 pm
Subject: Re: [PERFORM] why index scan not working when using 'like'?

 Tom Lane kirjutas T, 25.11.2003 kell 23:29:
  Josh Berkus [EMAIL PROTECTED] writes:
   In regular text fields containing words, your problem is solvable with
   full text indexing (FTI).  Unfortunately, FTI is not designed for
   arbitrary non-language strings.  It could be adapted, but would require
   a lot of hacking.
 
  I'm not sure why you say that FTI isn't a usable solution.  As long as
  the gene symbols are separated by whitespace or some other non-letters
  (eg, 'foo mif bar' not 'foomifbar'), I'd think FTI would work.

 If he wants to search on arbitrary substrings, he could change the
 tokeniser in FTI to produce trigrams, so that 'foomifbar' would be indexed
 as if it were the text 'foo oom omi mif ifb fba bar', and a search for
 something like %mifb% should first do an FTI search for 'mif' AND 'ifb'
 and then a simple LIKE '%mifb%' to weed out things like 'mififb'.

 There are ways to use trigrams for 1 and 2 letter matches as well.

 -
 Hannu
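
The trigram idea quoted above was later packaged as the contrib/pg_trgm
module; on a release that ships it, the approach looks roughly like this (a
sketch with a made-up table and column, not something 7.4's contrib provides):

CREATE EXTENSION pg_trgm;

CREATE TABLE genes (symbols text);

-- A trigram index lets LIKE '%mifb%' use the index instead of a seq scan.
CREATE INDEX genes_symbols_trgm ON genes USING gin (symbols gin_trgm_ops);

-- The index finds candidate rows by trigram overlap; the LIKE recheck then
-- weeds out false matches such as 'mififb', as described above.
SELECT * FROM genes WHERE symbols LIKE '%mifb%';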
 
 


[PERFORM] cross table indexes or something?

2003-11-26 Thread Jeremiah Jahn
I was wondering if there is something I can do that would act similar to an
index over more than one table.

I have about 3 million people in my DB at the moment; they all have roles,
and many of them have more than one name.

For example, a Judge will only have one name, but a Litigant could have
multiple aliases. Things go far too slow when I do a query on a judge named
Smith. Does anyone know a possible way to speed this up?

I would think that in a perfect world there would be a way to create an
index on commonly used joins, or something of that nature. I've tried
partial indexes, but the optimizer feels that it would be quicker to do an
index scan for 'SMITH%' and then join using the pkey of the person to get
their role. For litigants this makes sense; for non-litigants it doesn't.

thanx for any insight,
-jj-

the basic schema

actor
actor_id PK
role_class_code

identity
actor_id FK
identity_id PK
full_name

event
event_date_time
event_id PK

event_actor
event_id FK
actor_id FK


explain select distinct actor.actor_id, court.id, court.name, role_class_code, full_name
from actor, identity, court, event, event_actor
where role_class_code = 'Judge'
  and full_name like 'SMITH%'
  and identity.actor_id = actor.actor_id
  and identity.court_ori = actor.court_ori
  and actor.court_ori = court.id
  and actor.actor_id = event_actor.actor_id
  and event_actor.event_id = event.event_id
  and event_date_time > '20021126'
order by full_name;
                                                              QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
 Unique  (cost=726.57..726.58 rows=1 width=92)
   ->  Sort  (cost=726.57..726.57 rows=1 width=92)
         Sort Key: identity.full_name, actor.actor_id, court.id, court.name, actor.role_class_code
         ->  Nested Loop  (cost=3.02..726.56 rows=1 width=92)
               ->  Nested Loop  (cost=3.02..720.72 rows=1 width=144)
                     ->  Nested Loop  (cost=3.02..9.62 rows=1 width=117)
                           Join Filter: (("outer".court_ori)::text = ("inner".court_ori)::text)
                           ->  Hash Join  (cost=3.02..4.18 rows=1 width=93)
                                 Hash Cond: (("outer".id)::text = ("inner".court_ori)::text)
                                 ->  Seq Scan on court  (cost=0.00..1.10 rows=10 width=34)
                                 ->  Hash  (cost=3.01..3.01 rows=1 width=59)
                                       ->  Index Scan using name_speed on identity  (cost=0.00..3.01 rows=1 width=59)
                                             Index Cond: (((full_name)::text >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
                                             Filter: ((full_name)::text ~~ 'SMITH%'::text)
                           ->  Index Scan using actor_speed on actor  (cost=0.00..5.43 rows=1 width=50)
                                 Index Cond: (("outer".actor_id)::text = (actor.actor_id)::text)
                                 Filter: ((role_class_code)::text = 'Judge'::text)
                     ->  Index Scan using event_actor_speed on event_actor  (cost=0.00..695.15 rows=1275 width=73)
                           Index Cond: ((event_actor.actor_id)::text = ("outer".actor_id)::text)
               ->  Index Scan using event_pkey on event  (cost=0.00..5.83 rows=1 width=52)
                     Index Cond: (("outer".event_id)::text = (event.event_id)::text)
                     Filter: (event_date_time > '20021126'::bpchar)

-- 
You can't make a program without broken egos.
-- 
Jeremiah Jahn [EMAIL PROTECTED]




Re: [PERFORM] very large db performance question

2003-11-26 Thread Neil Conway
LIANHE SHAO [EMAIL PROTECTED] writes:
 We will have a very large database to store microarray data (may
 exceed 80-100G some day). now we have 1G RAM, 2G Hz Pentium 4, 1
 CPU. and enough hard disk.

 Could anybody tell me that our hardware is an issue or not?

IMHO the size of the DB is less relevant than the query workload. For
example, if you're storing 100GB of data but only doing a single
index scan on it every 10 seconds, any modern machine with enough HD
space should be fine.

If you give us an idea of the # of queries you expect per second, the
approximate mix of reads and writes, and some idea of how complex the
queries are, we might be able to give you some better advice.

-Neil





Re: [PERFORM] cross table indexes or something?

2003-11-26 Thread Hannu Krosing
Jeremiah Jahn kirjutas K, 26.11.2003 kell 22:14:
 I was wondering if there is something I can do that would act similar to
 a index over more than one table. 
 
 I have about 3 million people in my DB at the moment, they all have
 roles, and many of them have more than one name. 
 
 for example, a Judge will only have one name, but a Litigant could have
 multiple aliases. Things go far to slow when I do a query on a judge
 named smith.

If you don't need all the judges named Smith, you could try to use LIMIT.

Have you run ANALYZE? Why does the DB think that there is only one judge
with a name like 'SMITH%'?
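
Roughly what that advice amounts to, using the schema posted above (the LIMIT
value is chosen arbitrarily for illustration):

-- Refresh statistics so the row estimates for 'SMITH%' are realistic.
ANALYZE identity;
ANALYZE actor;
ANALYZE court;
ANALYZE event_actor;
ANALYZE event;

-- If only the first few judges named Smith are needed, cap the result with
-- LIMIT, and post the EXPLAIN ANALYZE output rather than plain EXPLAIN.
EXPLAIN ANALYZE
SELECT DISTINCT actor.actor_id, court.id, court.name, role_class_code, full_name
  FROM actor, identity, court, event, event_actor
 WHERE role_class_code = 'Judge'
   AND full_name LIKE 'SMITH%'
   AND identity.actor_id = actor.actor_id
   AND identity.court_ori = actor.court_ori
   AND actor.court_ori = court.id
   AND actor.actor_id = event_actor.actor_id
   AND event_actor.event_id = event.event_id
   AND event_date_time > '20021126'
 ORDER BY full_name
 LIMIT 25;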

-
Hannu

P.S. 
Always send EXPLAIN ANALYZE output if asking for advice on [PERFORM]

-
Hannu



Re: [PERFORM] very large db performance question

2003-11-26 Thread LIANHE SHAO
Thanks for the reply. Actually our database will only be used by a few
scientists (we predict that), so there is no workload problem. There are
only very infrequent updates, and the query is not complex. The problem is
that we have one table that stores most of the data (with 200 million rows).
In this table there is a text column on which we need to do a full text
search for each row. The result is then joined with data from another table
which has 30,000 rows. Right now the query runs almost forever.

I tried a small table with 2 million rows using the following simple
command, and it takes about 6 seconds to get the result back. So I am
confused, and that is why I ask: is it a hardware problem or something else?
(I just vacuumed the whole database yesterday.)
 
PGA=> select count (*) from expressiondata ;
  count
---------
 2197497
(1 row)


PGA=> explain select count (*) from expressiondata ;
                                  QUERY PLAN
-------------------------------------------------------------------------------
 Aggregate  (cost=46731.71..46731.71 rows=1 width=0)
   ->  Seq Scan on expressiondata  (cost=0.00..41237.97 rows=2197497 width=0)
(2 rows)
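
Worth noting: count(*) in PostgreSQL has to scan the whole table, since MVCC
means there is no maintained exact row count, so a few seconds for 2 million
rows is not unusual. If an approximate figure is enough, the planner's own
estimate can be read from the catalog instead, for example:

-- Approximate row count from the statistics kept by VACUUM/ANALYZE;
-- cheap to read, but only as fresh as the last VACUUM or ANALYZE.
SELECT relname, reltuples
  FROM pg_class
 WHERE relname = 'expressiondata';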

 
 
Regards, 
William

- Original Message -
From: Neil Conway [EMAIL PROTECTED]
Date: Wednesday, November 26, 2003 10:03 pm
Subject: Re: [PERFORM] very large db performance
question

 LIANHE SHAO [EMAIL PROTECTED] writes:
  We will have a very large database to store
microarray data (may
  exceed 80-100G some day). now we have 1G RAM, 2G
Hz Pentium 4, 1
  CPU. and enough hard disk.
 
  Could anybody tell me that our hardware is an
issue or not?
 
 IMHO the size of the DB is less relevant than the
query workload. For
 example, if you're storing 100GB of data but only
doing a single
 index scan on it every 10 seconds, any modern
machine with enough HD
 space should be fine.
 
 If you give us an idea of the # of queries you
expect per second, the
 approximate mix of reads and writes, and some idea
of how complex the
 queries are, we might be able to give you some
better advice.
 
 -Neil
 
 
 




Re: [PERFORM] Followup - expression (functional) index use in joins

2003-11-26 Thread Tom Lane
Roger Ging [EMAIL PROTECTED] writes:
 Ran vacuum analyse on both program and logfile tables.  Estimates are 
 more in line with reality now,

And they are what now?  You really can't expect to get useful help here
when you're being so miserly with the details ...

FWIW, I suspect you could force 7.4 to generate 7.3's plan by setting
enable_mergejoin to off (might have to also set enable_hashjoin to off,
if it then tries for a hash join).  7.3 could not even consider those
join types in this example, while 7.4 can.  The interesting question
from my perspective is why the planner is guessing wrong about the
relative costs of the plans.  EXPLAIN ANALYZE results with each type of
join forced would be useful to look at.
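
Concretely, that experiment looks something like this (session-local settings,
reset afterwards):

-- Disable merge (and, if necessary, hash) joins for this session only,
-- forcing the planner back toward the 7.3-style nested-loop plan.
SET enable_mergejoin TO off;
SET enable_hashjoin TO off;

EXPLAIN ANALYZE
SELECT L.row_id
  FROM music.logfile L
  LEFT JOIN music.program P
    ON music.fn_mri_id_no_program(P.mri_id_no) = L.program_id
 WHERE L.station = UPPER('kabc')::VARCHAR
   AND L.air_date = '04/12/2002'::TIMESTAMP
   AND P.cutoff_date IS NULL
 ORDER BY L.chron_start, L.chron_end;

-- Put the settings back when done.
RESET enable_mergejoin;
RESET enable_hashjoin;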

regards, tom lane



Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Steve Atkins
On Thu, Nov 27, 2003 at 08:51:14AM +0800, Christopher Kings-Lynne wrote:
 Which one is better (performance/easier to use),
 tsearch2 or fulltextindex? 
 there is an example how to use fulltextindex in the
 techdocs, but I checked the contrib/fulltextindex
 package, there is a WARNING that fulltextindex is
 much slower than tsearch2. but tsearch2 seems
 complex to use, and I can not find a good example.
 Which one I should use? Any suggestions?  
 
 I believe I wrote that warning :)
 
 Tsearch2 is what you should use.  Yes, it's more complicated but it's 
 HEAPS faster and seriously powerful.

Does anyone have any metrics on how fast tsearch2 actually is?

I tried it on a synthetic dataset of a million documents of a hundred
words each, and while insertions were impressively fast, I gave up on
the search after 10 minutes.

Broken? Unusably slow? This was on the last 7.4 release candidate.

Cheers,
  Steve



Re: [PERFORM] For full text indexing, which is better, tsearch2 or

2003-11-26 Thread Christopher Kings-Lynne
Does anyone have any metrics on how fast tsearch2 actually is?

I tried it on a synthetic dataset of a million documents of a hundred
words each and while insertions were impressively fast I gave up on
the search after 10 minutes.
Broken? Unusably slow? This was on the last 7.4 release candidate.
I just created a 1.1 million row dataset by copying one of our 3 row
production tables and just taking out the txtidx column.  Then I inserted
it into itself until it had 1.1 million rows.

Then I created the GiST index - THAT took forever - seriously, like 20 mins
or half an hour or something.

Now, to find a word:

select * from tsearchtest where ftiidx ## 'curry';
Time: 9760.75 ms

The AND of two words:
Time: 103.61 ms

The AND of three words:
select * from tsearchtest where ftiidx ## 'curry&green&thai';
Time: 61.86 ms

And now a one-word query, now that buffers are cached:
select * from tsearchtest where ftiidx ## 'curry';
Time: 444.89 ms

So I have no idea why you think it's slow.  Perhaps you forgot the
'create index using gist' step?
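
For anyone trying to reproduce this, the steps described above look roughly
like the sketch below, written against the full text search API that later
became built in (the contrib tsearch2 module on 7.4 used equivalent
functions); the table and column names here are made up:

CREATE TABLE tsearchtest (id serial PRIMARY KEY, body text);

-- The index build is the expensive step mentioned above; without it every
-- search is a sequential scan over the whole table.
CREATE INDEX tsearchtest_fti ON tsearchtest
    USING gin (to_tsvector('english', body));

-- A one-word search, and a three-word AND search:
SELECT id FROM tsearchtest
 WHERE to_tsvector('english', body) @@ to_tsquery('english', 'curry');

SELECT id FROM tsearchtest
 WHERE to_tsvector('english', body) @@ to_tsquery('english', 'curry & green & thai');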

Also, if you use the NOT (!) operand, you can get yourself into a really 
slow situation.

Chris


