Re: [PERFORM] Multiple Uniques

2004-09-09 Thread Greg Stark

Tom Lane <[EMAIL PROTECTED]> writes:

> Neil Conway <[EMAIL PROTECTED]> writes:
> 
> > How many cycles are we really talking about, though? I have a patch 
> > which I'll send along in a few days which implements a similar 
> > optimization: if a subselect is referenced by EXISTS or IN, we can 
> > discard DISTINCT and ORDER BY clauses in the subquery
> 
> I don't think either of those is worth doing.  ORDER BY in a sub-select
> isn't even legal SQL, much less probable, so why should we expend even
> a nanosecond to optimize it?  The DISTINCT is more of a judgement call,
> but my thought when I looked at it originally is that it would give
> people a possible optimization knob.  If you write DISTINCT in an IN
> clause then you can get a different plan (the IN reduces to an ordinary
> join) that might or might not be better than without it.  We shouldn't
> take away that possibility just on the grounds of nanny-ism.

Just one user's 2c: Consider the plight of dynamically constructed queries.
The queries within "IN" clauses are particularly likely to be constructed this
way. The query in the IN clause could be constructed in an entirely separate
function without any idea that it will be used within an IN clause.

E.g. something like:

$accessible_ids = $security_manager->get_accessible_ids_query($this->userid);
$selected_columns = $this->selected_columns_parameters();
$query = "select $selected_columns where id IN ($accessible_ids)";

In an ideal world, functionally equivalent queries would always generate
identical plans. Of course there are limitations and it's not an ideal world,
but as far as possible one should be able to write code without having to
worry about whether the optimizer will be able to figure it out.
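
For instance (table and column names hypothetical), the subquery handed back
by the security manager might already carry its own DISTINCT or ORDER BY,
written with no knowledge that it will end up inside an IN:

```sql
-- What get_accessible_ids_query() might return, on its own a perfectly
-- reasonable query:
--   SELECT DISTINCT doc_id FROM acl WHERE user_id = 42 ORDER BY doc_id
-- The caller then mechanically produces:
select id, title
  from documents
 where id in (select distinct doc_id
                from acl
               where user_id = 42
               order by doc_id);
```

The inner function has no way to know it should strip the DISTINCT and
ORDER BY for this caller; ideally the planner would do it.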

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] Costly "Sort Key" on indexed timestamp column

2004-09-09 Thread Tom Lane
<[EMAIL PROTECTED]> writes:
> I'm tuning my PostgreSQL DB (7.3.4) and have come across a query that
> doesn't use an index I created specially for it, and consequently takes
> circa 2 seconds to run. :(
> ...
> The output of EXPLAIN ANALYZE follows.  Note how 99% of the total cost
> comes from "Sort Key: userinfo1_.create_date".

No, you are misreading the output.  99% of the cost comes from the join
steps.

I think the problem is that you have forced a not-very-appropriate join
order by use of INNER JOIN syntax, and so the plan is creating
intermediate join outputs that are larger than they need be.  See
http://www.postgresql.org/docs/7.3/static/explicit-joins.html

7.4 is a bit more forgiving about this; compare
http://www.postgresql.org/docs/7.4/static/explicit-joins.html
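
To illustrate (hypothetical tables): in 7.3 the first form pins the join
order to the textual order of the JOIN clauses, while the second leaves the
planner free to reorder:

```sql
-- Explicit JOIN syntax: 7.3 joins a to b first, then the result to c,
-- even if joining a to c first would give a smaller intermediate result.
select *
  from a inner join b on a.b_id = b.id
         inner join c on a.c_id = c.id;

-- Comma-list FROM: the planner may pick whichever join order looks cheapest.
select *
  from a, b, c
 where a.b_id = b.id
   and a.c_id = c.id;
```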

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Multiple Uniques

2004-09-09 Thread Tom Lane
Neil Conway <[EMAIL PROTECTED]> writes:
> Tom Lane wrote:
>> Because the issue doesn't come up often enough to justify expending
>> cycles to check for it.

> How many cycles are we really talking about, though? I have a patch 
> which I'll send along in a few days which implements a similar 
> optimization: if a subselect is referenced by EXISTS or IN, we can 
> discard DISTINCT and ORDER BY clauses in the subquery

I don't think either of those is worth doing.  ORDER BY in a sub-select
isn't even legal SQL, much less probable, so why should we expend even
a nanosecond to optimize it?  The DISTINCT is more of a judgment call,
but my thought when I looked at it originally is that it would give
people a possible optimization knob.  If you write DISTINCT in an IN
clause then you can get a different plan (the IN reduces to an ordinary
join) that might or might not be better than without it.  We shouldn't
take away that possibility just on the grounds of nanny-ism.
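
Concretely (hypothetical tables), the two spellings of the knob:

```sql
-- Without DISTINCT: planned as an IN sub-select.
select * from orders
 where customer_id in (select id from customers where region = 'EU');

-- With DISTINCT: the subquery's output is provably unique, so the IN
-- can reduce to an ordinary join, which may or may not be cheaper.
select * from orders
 where customer_id in (select distinct id from customers where region = 'EU');
```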

regards, tom lane

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Multiple Uniques

2004-09-09 Thread Neil Conway
Tom Lane wrote:
> Markus Schaber <[EMAIL PROTECTED]> writes:
>> So, now my question is, why does the query optimizer not recognize that
>> it can throw away those "non-unique" Sort/Unique passes?
> Because the issue doesn't come up often enough to justify expending
> cycles to check for it.
How many cycles are we really talking about, though? I have a patch 
which I'll send along in a few days which implements a similar 
optimization: if a subselect is referenced by EXISTS or IN, we can 
discard DISTINCT and ORDER BY clauses in the subquery (actually, we 
can't discard ORDER BY in the case of IN if LIMIT is also specified, but 
the point remains). It's very cheap computationally for the planner to 
do this simplification, and I'd imagine doing the equivalent 
simplifications for UNION is similarly cheap.
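
To spell out the exception (hypothetical table): without LIMIT the ORDER BY
cannot change which values the subquery yields, but with LIMIT it can:

```sql
-- The ORDER BY here is discardable: set membership ignores ordering.
select * from t
 where id in (select ref_id from links order by score);

-- Here it is not: the ORDER BY decides WHICH ten ref_ids survive the LIMIT.
select * from t
 where id in (select ref_id from links order by score desc limit 10);
```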

While I understand what you're saying WRT it being a silly query, in 
the real world people make mistakes...

-Neil
---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] Costly "Sort Key" on indexed timestamp column

2004-09-09 Thread ogjunk-pgjedan
Hello,

I'm tuning my PostgreSQL DB (7.3.4) and have come across a query that
doesn't use an index I created specially for it, and consequently takes
circa 2 seconds to run. :(

The ugly query looks like this (the important part is really at the
very end - order by piece):

select userinfo1_.id as id0_, servicepla3_.id as id1_, account2_.id as id2_,
       passwordhi4_.id as id3_, userdemogr5_.id as id4_,
       userinfo1_.first_name as first_name0_, userinfo1_.last_name as last_name0_,
       userinfo1_.email as email0_, userinfo1_.href as href0_,
       userinfo1_.last_login_date as last_log6_0_,
       userinfo1_.login_count as login_co7_0_,
       userinfo1_.password_hint_answer as password8_0_,
       userinfo1_.create_date as create_d9_0_, userinfo1_.exp_date as exp_date0_,
       userinfo1_.type as type0_, userinfo1_.account_id as account_id0_,
       userinfo1_.plan_id as plan_id0_,
       userinfo1_.password_hint_id as passwor14_0_,
       userinfo1_.user_demographic_id as user_de15_0_,
       servicepla3_.name as name1_, servicepla3_.max_links as max_links1_,
       account2_.username as username2_, account2_.password as password2_,
       account2_.status as status2_, passwordhi4_.question as question3_,
       userdemogr5_.city as city4_, userdemogr5_.postal_code as postal_c3_4_,
       userdemogr5_.country_id as country_id4_, userdemogr5_.state_id as state_id4_,
       userdemogr5_.gender_id as gender_id4_
  from user_preference userprefer0_
       inner join user_info userinfo1_ on userprefer0_.user_id = userinfo1_.id
       inner join account account2_ on userinfo1_.account_id = account2_.id
       inner join service_plan servicepla3_ on userinfo1_.plan_id = servicepla3_.id
       left outer join password_hint passwordhi4_
         on userinfo1_.password_hint_id = passwordhi4_.id
       inner join user_demographic userdemogr5_
         on userinfo1_.user_demographic_id = userdemogr5_.id,
       preference preference6_, preference_value preference7_
 where (preference6_.name = 'allow_subscribe'
        and userprefer0_.preference_id = preference6_.id)
   and (preference7_.value = 1
        and userprefer0_.preference_value_id = preference7_.id)
 order by userinfo1_.create_date desc
 limit 10;


The output of EXPLAIN ANALYZE follows.  Note how 99% of the total cost
comes from "Sort Key: userinfo1_.create_date".  When I saw this, I
created an index for this:

CREATE INDEX ix_user_info_create_date ON user_info(create_date);

But that didn't seem to make much of a difference.  The total cost did
go down from about 1250 to 1099, but that's still too high.

-
 Limit  (cost=1099.35..1099.38 rows=10 width=222) (actual time=1914.13..1914.17 rows=10 loops=1)
   ->  Sort  (cost=1099.35..1099.43 rows=31 width=222) (actual time=1914.12..1914.14 rows=11 loops=1)
         Sort Key: userinfo1_.create_date
         ->  Hash Join  (cost=90.71..1098.60 rows=31 width=222) (actual time=20.34..1908.41 rows=767 loops=1)
               Hash Cond: ("outer".preference_value_id = "inner".id)
               ->  Hash Join  (cost=89.28..1092.58 rows=561 width=218) (actual time=19.92..1886.59 rows=768 loops=1)
                     Hash Cond: ("outer".preference_id = "inner".id)
                     ->  Hash Join  (cost=88.10..1045.14 rows=7850 width=214) (actual time=19.44..1783.47 rows=9984 loops=1)
                           Hash Cond: ("outer".user_demographic_id = "inner".id)
                           ->  Hash Join  (cost=72.59..864.51 rows=8933 width=190) (actual time=14.83..1338.15 rows=9984 loops=1)
                                 Hash Cond: ("outer".password_hint_id = "inner".id)
                                 ->  Hash Join  (cost=71.50..726.87 rows=8933 width=161) (actual time=14.53..1039.69 rows=9984 loops=1)
                                       Hash Cond: ("outer".plan_id = "inner".id)
                                       ->  Hash Join  (cost=70.42..569.46 rows=8933 width=144) (actual time=14.26..700.80 rows=9984 loops=1)
                                             Hash Cond: ("outer".account_id = "inner".id)
                                             ->  Hash Join  (cost=53.83..390.83 rows=10073 width=116) (actual time=9.67..373.71 rows=9984 loops=1)
                                                   Hash Cond: ("outer".user_id = "inner".id)
                                                   ->  Seq Scan on user_preference userprefer0_  (cost=0.00..160.73 rows=10073 width=12) (actual time=0.09..127.64 rows=9984 loops=1)
                                                   ->  Hash  (cost=51.66..51.66 rows=866 width=104) (actual time=9.40..9.40 rows=0 loops=1)
                                                         ->  Seq Scan on user_info userinfo1_  (cost=0.00..51.66 rows=866 width=104) (actual time=0.12..7.15 rows=768 loops=1)
                                             ->  Hash  (cost=14.68..14.68 rows=768 width=28) (actual time=4.45..4.45 rows=0 loops=1)
                                                   ->  Seq Scan on account account2_  (cost=0.00..14.68 rows=768 width=28) (actual time=0.10..2.56 rows=768 loops=1)
    -

Re: [PERFORM] fsync vs open_sync

2004-09-09 Thread Mark Wong
On Sun, Sep 05, 2004 at 12:16:42AM -0500, Steve Bergman wrote:
> On Sat, 2004-09-04 at 23:47 -0400, Christopher Browne wrote:
> > The world rejoiced as [EMAIL PROTECTED] ("Merlin Moncure") wrote:
> > > Ok, you were right.  I made some tests and NTFS is just not very
> > > good in the general case.  I've seen some benchmarks for Reiser4
> > > that are just amazing.
> > 
> > Reiser4 has been sounding real interesting.
> > 
> 
> Are these independent benchmarks, or the benchmarketing at namesys.com?
> Note that the APPEND, MODIFY, and OVERWRITE phases have been turned off
> on the mongo tests and the other tests have been set to a lexical (non
> default for mongo) mode.  I've done some mongo benchmarking myself and
> reiser4 loses to ext3 (data=ordered) in the excluded tests.  APPEND
> phase performance is absolutely *horrible*.  So they just turned off the
> phases in which reiser4 lost and published the remaining results as
> proof that "reiser4 is the fastest filesystem".
> 
> See: http://marc.theaimsgroup.com/?l=reiserfs&m=109363302000856
> 
> 
> -Steve Bergman
> 
> 
> 

Reiser4 also isn't optimized for lots of fsyncs (unless that's been done
recently).  I believe they mention fsync performance in their release
notes.  I've seen this dramatically hurt performance with our OLTP
workload.

-- 
Mark Wong - - [EMAIL PROTECTED]
Open Source Development Lab Inc - A non-profit corporation
12725 SW Millikan Way - Suite 400 - Beaverton, OR 97005
(503) 626-2455 x 32 (office)
(503) 626-2436  (fax)
http://developer.osdl.org/markw/

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] TSearch2 and optimisation ...

2004-09-09 Thread Hervé Piedvache
George,

On Thursday 26 August 2004 at 19:58, George Essig wrote:
> Bill Footcow wrote:
>
> ...
>
> > I have done a simple request, looking for title or description having
> > Postgres inside order by rank and date,  like this :
> > SELECT a.title, a.id, a.url, to_char(a.r_date, 'DD/MM/ HH24:MI:SS')
> > as dt, s.site_name, s.id_site, case when exists (select id_user from
> > user_choice u where u.id_site=s.id_site and u.id_user = 1) then 1 else 0
> > end as bookmarked FROM article a, site s
> >   WHERE s.id_site = a.id_site
> >AND idxfti @@ to_tsquery('postgresql')
> >ORDER BY rank(idxfti, to_tsquery('postgresql')) DESC, a.r_date DESC;
> >
> > The request takes about 4 seconds ... I have about 1 400 000 records in
> > article and 36 000 records in site table ... it's a Bi-Pentium III 933
> > MHz server with 1 Gb memory ... I'm using Postgresql 7.4.5
> > For me this result is very very slow I really need a quicker result with
> > less than 1 second ...
> > The next time I call the same request I have got the result in 439 ms ...
> > but
>
> ...
>
> The first query is slow because the relevant index pages are not cached in
> memory.  Everyone experiences this.  GiST indexes on tsvector columns can
> get really big.  You have done nothing wrong.  When you have a lot of
> records, tsearch2 will not run fast without extensive performance tuning.
>
> Read the following:
>
> Optimization
> http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/docs/oscon_tsearch2/
>optimization.html
> 
> ...

I have read many pages on this subject ... but so far I have not found 
anything that really helps me ...
What can I do to optimize my PostgreSQL configuration for this particular use 
of Tsearch2?
I'm a little disappointed: the PostgreSQL Russian search engine using 
Tsearch2 is really quick ... why can't I get the same result with a 
bi-Pentium III 933 and 1 Gb of RAM, with full-text indexing of 1 500 000 
records?
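
For context, the setup in play is the standard tsearch2 pattern that the
query above relies on (index name hypothetical):

```sql
-- A GiST index on the tsvector column makes "idxfti @@ to_tsquery(...)"
-- an index search, and fresh statistics help the planner choose it.
CREATE INDEX idx_article_idxfti ON article USING gist (idxfti);
VACUUM ANALYZE article;
```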

Regards,
-- 
Hervé Piedvache

Elma Ingénierie Informatique
6 rue du Faubourg Saint-Honoré
F-75008 - Paris - France
Pho. 33-144949901
Fax. 33-144949902

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] [GENERAL] Text Search vs MYSQL vs Lucene

2004-09-09 Thread Shridhar Daithankar
On Thursday 09 Sep 2004 6:26 pm, Vic Cekvenich wrote:
> What would be performance of pgSQL text search vs MySQL vs Lucene (flat
> file) for a 2 terabyte db?

Well, it depends upon a lot of factors. There are a few questions to be asked 
here..
- What is your hardware and OS configuration?
- What type of data are you dealing with? Mostly static or frequently updated?
- What type of queries are you running? Aggregates, table scans, or selective 
retrieval, etc.?

Unfortunately there is no single good answer. If you could provide details, it 
would help a lot..

 Shridhar

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster