Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Christian Kratzer

Hi,

On Tue, 9 May 2006, PFC wrote:
snipp/
	Back to the point : I can't use the temp table method, because temp 
tables are too slow.
	Creating a temp table, filling it, analyzing it and then dropping it 
takes about 100 ms. The search query, on average, takes 10 ms.


just some thoughts:

You might consider just selecting your primary key or a set of
primary keys to involved relations in your search query.  If you
currently use select * this can make your result set very large.

Copying all the result set to the temp. costs you additional IO
that you propably dont need.

Also you might try:

SELECT * FROM somewhere JOIN result USING (id)

Instead of:

SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)

Joins should be a lot faster than large IN clauses.

Here it will also help if result only contains the primary keys 
and not all the other data. The join will be much faster.


On the other hand if your search query runs in 10ms it seems to be fast 
enough for you to run it multiple times.  Theres propably no point in 
optimizing anything in such case.


Greetings
Christian

--
Christian Kratzer   [EMAIL PROTECTED]
CK Software GmbHhttp://www.cksoft.de/
Phone: +49 7452 889 135 Fax: +49 7452 889 136

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [HACKERS] [PERFORM] Big IN() clauses etc : feature proposal

2006-05-09 Thread Christian Kratzer

Hi,

On Tue, 9 May 2006, PFC wrote:




You might consider just selecting your primary key or a set of
primary keys to involved relations in your search query.  If you
currently use select * this can make your result set very large.

Copying all the result set to the temp. costs you additional IO
that you propably dont need.


	It is a bit of a catch : I need this information, because the purpose 
of the query is to retrieve these objects. I can first store the ids, then 
retrieve the objects, but it's one more query.


yes but depending on what you really need that can be faster.

Additionally to your query you are already transferring the whole result 
set multiple times.  First you copy it to the result table. Then you

read it again.   Your subsequent queries will also have to read over
all the unneeded tuples just to get your primary key.


Also you might try:
SELECT * FROM somewhere JOIN result USING (id)
Instead of:
SELECT * FROM somewhere WHERE id IN (SELECT id FROM result)


	Yes you're right in this case ; however the query to retrieve the 
owners needs to eliminate duplicates, which IN() does.


then why useth thy not the DISTINCT clause when building thy result table 
and thou shalt have no duplicates.


On the other hand if your search query runs in 10ms it seems to be fast 
enough for you to run it multiple times.  Theres propably no point in 
optimizing anything in such case.


I don't think so :
	- 10 ms is a mean time, sometimes it can take much more time, 
sometimes it's faster.
	- Repeating the query might yield different results if records were 
added or deleted in the meantime.


which is a perfect reason to use a temp table.  Another variation on 
the temp table scheme is use a result table and add a query_id.


We do something like this in our web application when users submit 
complex queries.  For each query we store tuples of (query_id,result_id)

in a result table.  It's then easy for the web application to page the
result set.

	- Complex search queries have imprecise rowcount estimates ; hence 
the joins that I would add to them will get suboptimal plans.


	Using a temp table is really the cleanest solution now ; but it's too 
slow so I reverted to generating big IN() clauses in the application.


A cleaner solution usually pays off in the long run whereas a hackish
or overly complex solution will bite you in the behind for sure as
time goes by.

Greetings
Christian

--
Christian Kratzer   [EMAIL PROTECTED]
CK Software GmbHhttp://www.cksoft.de/
Phone: +49 7452 889 135 Fax: +49 7452 889 136

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings