[PERFORM] Configuration Suggestion

2005-10-26 Thread Christian Paul B. Cosinas
Hi! Here are the specifications of my server.
I would really appreciate the best configuration of postgresql.conf for my
server.

I have tried so many values for the parameters, but it seems that I cannot
get the speed I want.

OS: Redhat Linux
CPU: Dual Xeon
Memory: 6 gigabyte
PostgreSQL Version 8.0

Most of my queries have ORDER BY and GROUP BY clauses, and create
temporary tables.

The biggest table is about 3-5 million rows, and I query it almost every 5 seconds.

I'm just wondering, is it normal to have this result in my memory usage:

             total       used       free     shared    buffers     cached
Mem:       6192460    6172488      19972          0      39904    5890824
-/+ buffers/cache:     241760    5950700
Swap:      2096472          0    2096472

What does this mean?
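(The free output above is normal: Linux counts page cache as "used", and the
-/+ buffers/cache line shows only about 240 MB truly in use by processes, with
about 5.6 GB acting as disk cache. As for postgresql.conf, a hedged
starting-point sketch for a 6 GB machine on 8.0 — the exact values here are
assumptions to benchmark, not a definitive tuning — might look like:

# a starting-point sketch only; values are assumptions to benchmark
shared_buffers = 50000          # ~400 MB in 8 kB pages
work_mem = 16384                # 16 MB per sort/hash step (ORDER BY, GROUP BY)
maintenance_work_mem = 262144   # 256 MB for VACUUM and CREATE INDEX
effective_cache_size = 655360   # ~5 GB, roughly the "cached" figure from free
checkpoint_segments = 16
wal_buffers = 64

EXPLAIN ANALYZE output for the slow queries would make a tighter answer
possible.)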




---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] blue prints please

2005-10-26 Thread Jean-Max Reymond
2005/10/26, Sidar López Cruz [EMAIL PROTECTED]:
 where can I find best practices for tuning postgresql?

http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

--
Jean-Max Reymond
CKR Solutions Open Source
Nice France
http://www.ckr-solutions.com

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Outer join query plans and performance

2005-10-26 Thread Rich Doughty

Tom Lane wrote:

> Rich Doughty [EMAIL PROTECTED] writes:
>> Tom Lane wrote:
>>> The reason these are different is that the second case constrains only
>>> the last-to-be-joined table, so the full cartesian product of t and h1
>>> has to be formed.  If this wasn't what you had in mind, you might be
>>> able to rearrange the order of the LEFT JOINs, but bear in mind that
>>> in general, changing outer-join ordering changes the results.  (This
>>> is why the planner won't fix it for you.)
>
>> FWIW mysql 4.1 (and i'm no fan at all of mysql) completes both these
>> queries in approximately 3 seconds.
>
> Does mysql get the correct answer, though?  It's hard to see how they do
> this fast unless they (a) are playing fast and loose with the semantics,
> or (b) have very substantially more analysis logic for OUTER JOIN semantics
> than we do.  Perhaps mysql 5.x is better about this sort of thing, but
> for 4.x I'd definitely find theory (a) more plausible than (b).

I would assume so. I'll re-run my testcase later and verify the results of
the two side by side.

> The cases that would be interesting are those where rearranging the
> outer join order actually does change the correct answer --- it may not
> in this particular case, I haven't thought hard about it.  It seems
> fairly likely to me that they are rearranging the join order here, and
> I'm just wondering whether they have the logic needed to verify that
> such a transformation is correct.
>
> regards, tom lane




--

  - Rich Doughty

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] zero performance on query

2005-10-26 Thread Steinar H. Gunderson
On Tue, Oct 25, 2005 at 10:26:43PM -0600, Sidar López Cruz wrote:
 look at this:
 select count(*) from fotos where archivo not in (select archivo from
 archivos)

 Aggregate  (cost=4899037992.36..4899037992.37 rows=1 width=0)
   ->  Seq Scan on fotos  (cost=22598.78..4899037338.07 rows=261716 width=0)
         Filter: (NOT (subplan))
         SubPlan
           ->  Materialize  (cost=22598.78..39304.22 rows=805344 width=58)
                 ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58)

Now, this is interesting; it seems to trigger exactly the same oddity as my
query did (at least one of them; the materialized sequential scan).

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] zero performance on query

2005-10-26 Thread Merlin Moncure
 look at this:
 select count(*) from fotos where archivo not in (select archivo from
 archivos)

 Aggregate  (cost=4899037992.36..4899037992.37 rows=1 width=0)
   ->  Seq Scan on fotos  (cost=22598.78..4899037338.07 rows=261716 width=0)
         Filter: (NOT (subplan))
         SubPlan
           ->  Materialize  (cost=22598.78..39304.22 rows=805344 width=58)
                 ->  Seq Scan on archivos  (cost=0.00..13141.44 rows=805344 width=58)

 I WILL DIE WAITING FOR QUERY RESPONSE !!!

Try:

select count(*) from fotos f where not exists (select archivo from archivos a
where a.archivo = f.archivo);

or:

select count(*) from
(
select archivo from fotos
except
select archivo from archivos
) x;
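Also, if archivos.archivo has no index, adding one may help either form (a
hedged suggestion, since the schema wasn't posted):

create index archivos_archivo_idx on archivos (archivo);
analyze archivos;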

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] zero performance on query

2005-10-26 Thread Steinar H. Gunderson
On Wed, Oct 26, 2005 at 08:05:21AM -0400, Merlin Moncure wrote:
 select count(*) from fotos f where not exists (select archivo from archivos a 
 where a.archivo = f.archivo) 

This was an optimization before 7.4, but probably isn't anymore.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


[PERFORM] Performance of views

2005-10-26 Thread Svenne Krap

Hi there.

I am currently building a system where it would be nice to use multiple
levels of views upon each other (it is a statistics system, where
traceability is important).


Is there any significant performance reduction in, say, 10 levels of views
instead of one giant, nested sql-statement? I am especially thinking
execution planner-wise.


The data mainly comes from one small to medium sized table (< 5 million
rows) and a handful of small (< 5000 rows) support tables.
The hardware will be okay for the job, but nothing really fancy (specs
are Xeon, 2G of memory, 6 SCSI-disks in a RAID1+0). The base will be
version 8.1, provided that it gets out of beta around end-of-year.
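For reference, a minimal sketch of the pattern in question (table and column
names invented): the rewriter expands each view inline, so a stack of views
and the equivalent hand-nested statement should normally reach the planner in
the same form, which EXPLAIN can confirm:

create view v_base as
    select region_id, value from measurements where value is not null;
create view v_sums as
    select region_id, sum(value) as total from v_base group by region_id;

explain select * from v_sums;  -- compare against the hand-inlined statement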


Svenne

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] blue prints please

2005-10-26 Thread Bruno Wolff III
On Tue, Oct 25, 2005 at 22:24:06 -0600,
  Sidar López Cruz [EMAIL PROTECTED] wrote:
 where can I find best practices for tuning postgresql?

You should first read the documentation. For 8.1, that would be here:
http://developer.postgresql.org/docs/postgres/runtime-config.html

There is also good information on techdocs at:
http://techdocs.postgresql.org/#techguides
(Look under the subcategory optimising.)

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


[PERFORM] Performance issues with custom functions

2005-10-26 Thread Edward Di Geronimo Jr.
I currently have an infrastructure that's based around SQL Server 2000. 
I'm trying to move some of the data over to Postgres, partly to reduce 
the load on the SQL Server machine and partly because some queries I'd 
like to run are too slow to be usable on SQL Server. Most likely, over 
time more and more data will move to Postgres. To help with this 
transition, I created a Postgres plugin which queries the contents of 
SQL Server tables via ODBC and returns a recordset. I then create views 
around the function, and I can then read from the SQL Server tables as if 
they were local to Postgres.


I have four tables involved in this query. The major one is 
provider_location, which has about 2 million rows and is stored in 
Postgres. The other three are stored in SQL Server and accessed via 
views like I mentioned above. They are network, network_state, and 
xlat_tbl, and contain about 40, 250, and 500 rows. A simple select * 
from any of the views takes somewhere around 50ms.


The query in question was written for SQL Server. I have no idea why it 
was written in the form it was, but it ran at a reasonable speed when 
all the tables were on one machine. Running the original query (after 
adjusting for syntax differences) on Postgres resulted in a query that 
would run for hours, continually allocating more RAM. I eventually had 
to kill the process as it was devouring swap space. My assumption is 
that Postgres is doing the ODBC query for each row of a join somewhere, 
even though the function is marked stable (immutable didn't make a 
difference).


Flattening the query made it run in a few minutes. I think the flattened 
query is easier to read, and it runs faster, so I'm not complaining that 
I can't use the original query. But I'd like to know exactly what causes 
the bottleneck in the original query, and if there are other approaches 
to solving the issue in case I need them in future queries.


Below is the original query, the explain output, the modified query, the 
explain output, and the explain analyze output.


Ed

select
    pl.network_id, n.name as network_name, pl.state_cd,
    count(pl.state_cd) as provider_count
from development.provider_location pl, development.network n
where pl.network_id in (select ns.network_id
                        from development.network_state ns
                        where ns.from_date < current_time
                          and (ns.thru_date > current_time or
                               ns.thru_date is null)
                          and (ns.state_cd = pl.state_cd or ns.state_cd = ''))
  and pl.network_id = n.network_id
  and pl.state_cd is not null
  and pl.state_cd in (select field_value from development.xlat_tbl
                      where field_name = 'State_CD')
group by pl.state_cd, n.name, pl.network_id
order by pl.state_cd, network_name;

Explain:

 GroupAggregate  (cost=80548547.83..80549256.80 rows=47265 width=52)
   ->  Sort  (cost=80548547.83..80548665.99 rows=47265 width=52)
         Sort Key: pl.state_cd, odbc_select.name, pl.network_id
         ->  Hash Join  (cost=30.01..80543806.14 rows=47265 width=52)
               Hash Cond: (("outer".network_id)::text = ("inner".network_id)::text)
               ->  Hash IN Join  (cost=15.01..80540931.61 rows=9453 width=20)
                     Hash Cond: (("outer".state_cd)::text = ("inner".field_value)::text)
                     ->  Seq Scan on provider_location pl  (cost=0.00..80535150.29 rows=1890593 width=20)
                           Filter: ((state_cd IS NOT NULL) AND (subplan))
                           SubPlan
                             ->  Function Scan on odbc_select  (cost=0.00..42.50 rows=2 width=32)
                                   Filter: (((from_date)::text < (('now'::text)::time(6) with time zone)::text) AND (((thru_date)::text > (('now'::text)::time(6) with time zone)::text) OR (thru_date IS NULL)) AND (((state_cd)::text = ($0)::text) OR ((state_cd)::text = ''::text)))
                     ->  Hash  (cost=15.00..15.00 rows=5 width=32)
                           ->  Function Scan on odbc_select  (cost=0.00..15.00 rows=5 width=32)
                                 Filter: ((field_name)::text = 'State_CD'::text)
               ->  Hash  (cost=12.50..12.50 rows=1000 width=64)
                     ->  Function Scan on odbc_select  (cost=0.00..12.50 rows=1000 width=64)



Flattened query:

select
    pl.network_id,
    n.name as network_name,
    pl.state_cd,
    count(pl.state_cd) as provider_count
from
    development.network n,
    development.network_state ns,
    development.xlat_tbl xt,
    development.provider_location pl
where
    xt.field_name = 'State_CD'
    and n.network_id = ns.network_id
    and ns.from_date < current_timestamp
    and (ns.thru_date > current_timestamp or ns.thru_date is null)
    and (ns.state_cd = pl.state_cd or ns.state_cd = '')
    and pl.network_id = n.network_id
    and pl.state_cd is not null
    and pl.state_cd = xt.field_value
group by
    pl.state_cd,
    n.name,
    pl.network_id
order by
    pl.state_cd,
    network_name;

[PERFORM] browsing table with 2 million records

2005-10-26 Thread aurora
I am running PostgreSQL 7.4 on FreeBSD. The main table has 2 million
records (we would like to do at least 10 million or more). It is mainly a
FIFO structure with maybe 200,000 new records coming in each day that
displace the older records.

We have a GUI that lets users browse through the records page by page,
about 25 records at a time. (Don't ask me why, but we have to have this
GUI.) This translates to something like

 select count(*) from table -- to give feedback about the DB size
 select * from table order by date limit 25 offset 0

Tables seem properly indexed, with vacuum and analyze run regularly.
Still, these very basic SQLs take up to a minute to run.

I read some recent messages that select count(*) would need a table
scan in PostgreSQL. That's disappointing. But I can accept an
approximation if there is some way to get one. But how can I optimize
select * from table order by date limit x offset y? One minute response
time is not acceptable.

Any help would be appreciated.

Wy




Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Mark Lewis
Do you have an index on the date column?  Can you post an EXPLAIN
ANALYZE for the slow query?

-- Mark Lewis

On Wed, 2005-10-26 at 13:41 -0700, aurora wrote:
 I am running Postgre 7.4 on FreeBSD. The main table have 2 million
 record (we would like to do at least 10 mil or more). It is mainly a
 FIFO structure with maybe 200,000 new records coming in each day that
 displace the older records.
 
 We have a GUI that let user browser through the record page by page at
 about 25 records a time. (Don't ask me why but we have to have this
 GUI). This translates to something like
 
   select count(*) from table   -- to give feedback about the DB size
   select * from table order by date limit 25 offset 0
 
 Tables seems properly indexed, with vacuum and analyze ran regularly.
 Still this very basic SQLs takes up to a minute run.
 
 I read some recent messages that select count(*) would need a table
 scan for Postgre. That's disappointing. But I can accept an
 approximation if there are some way to do so. But how can I optimize
 select * from table order by date limit x offset y? One minute
 response time is not acceptable.
 
 Any help would be appriciated.
 
 Wy
 
 


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Scott Marlowe
On Wed, 2005-10-26 at 15:41, aurora wrote:
 I am running Postgre 7.4 on FreeBSD. The main table have 2 million
 record (we would like to do at least 10 mil or more). It is mainly a
 FIFO structure with maybe 200,000 new records coming in each day that
 displace the older records.
 
 We have a GUI that let user browser through the record page by page at
 about 25 records a time. (Don't ask me why but we have to have this
 GUI). This translates to something like
 
   select count(*) from table   -- to give feedback about the DB size
   select * from table order by date limit 25 offset 0
 
 Tables seems properly indexed, with vacuum and analyze ran regularly.
 Still this very basic SQLs takes up to a minute run.
 
 I read some recent messages that select count(*) would need a table
 scan for Postgre. That's disappointing. But I can accept an
 approximation if there are some way to do so. But how can I optimize
 select * from table order by date limit x offset y? One minute
 response time is not acceptable.

Have you run your script without the select count(*) part and timed it?

What does

explain analyze select * from table order by date limit 25 offset 0

say? 

Is date indexed?

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Joshua D. Drake

 We have a GUI that let user browser through the record page by page at
 about 25 records a time. (Don't ask me why but we have to have this
 GUI). This translates to something like
 
   select count(*) from table   -- to give feedback about the DB size

Do you have an integer field that is an ID that increments, e.g. serial?

   select * from table order by date limit 25 offset 0

You could use a cursor.
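For instance (a sketch with illustrative names; a cursor lives inside a
transaction):

begin;
declare page_cur scroll cursor for
    select * from mytable order by date_col;
fetch 25 from page_cur;          -- first page
move absolute 50 in page_cur;    -- jump past two pages
fetch 25 from page_cur;          -- third page
commit;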

Sincerely,

Joshua D. Drake


 
 Tables seems properly indexed, with vacuum and analyze ran regularly.
 Still this very basic SQLs takes up to a minute run.
 
 I read some recent messages that select count(*) would need a table
 scan for Postgre. That's disappointing. But I can accept an
 approximation if there are some way to do so. But how can I optimize
 select * from table order by date limit x offset y? One minute
 response time is not acceptable.
 
 Any help would be appriciated.
 
 Wy
 
 
-- 
The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Alex Turner
You could also create your own index, so to speak: a table that
simply contains a list of primary keys and an order value field that
you can use as your offset.  This can be kept in sync with the master
table using triggers pretty easily (a sketch follows below).  2 million
is not very much if you only have an integer pkey and an integer order
value; then you can join it against the main table.

create table my_index_table (
primary_key_value int,
order_val int,
primary key (primary_key_value));

create index my_index_table_order_val_i on my_index_table (order_val);

select * from main_table a, my_index_table b where b.order_val >= 25 and
b.order_val < 50 and a.primary_key_id = b.primary_key_value;

If the data updates a lot then this won't work as well, though, as the
index table will require frequent updates to a potentially large number
of records (although a small number of pages, so it still won't be
horrible).
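A hedged sketch of that trigger sync (insert-only; deletes and updates would
need similar handling, names are illustrative, and plpgsql must be installed):

create or replace function sync_order_val() returns trigger as '
begin
    -- append the new row at the end of the ordering
    insert into my_index_table (primary_key_value, order_val)
    values (new.primary_key_id,
            (select coalesce(max(order_val), 0) + 1 from my_index_table));
    return new;
end;
' language plpgsql;

create trigger main_table_sync after insert on main_table
    for each row execute procedure sync_order_val();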

Alex Turner
NetEconomist

On 10/26/05, Joshua D. Drake [EMAIL PROTECTED] wrote:

  We have a GUI that let user browser through the record page by page at
  about 25 records a time. (Don't ask me why but we have to have this
  GUI). This translates to something like
 
select count(*) from table   -- to give feedback about the DB size

 Do you have a integer field that is an ID that increments? E.g; serial?

select * from table order by date limit 25 offset 0

 You could use a cursor.

 Sincerely,

 Joshua D. Drake


 
  Tables seems properly indexed, with vacuum and analyze ran regularly.
  Still this very basic SQLs takes up to a minute run.
 
  I read some recent messages that select count(*) would need a table
  scan for Postgre. That's disappointing. But I can accept an
  approximation if there are some way to do so. But how can I optimize
  select * from table order by date limit x offset y? One minute
  response time is not acceptable.
 
  Any help would be appriciated.
 
  Wy
 
 
 --
 The PostgreSQL Company - Command Prompt, Inc. 1.503.667.4564
 PostgreSQL Replication, Consulting, Custom Development, 24x7 support
 Managed Services, Shared and Dedicated Hosting
 Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/


 ---(end of broadcast)---
 TIP 5: don't forget to increase your free space map settings


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Tom Lane
aurora [EMAIL PROTECTED] writes:
 It would still be helpful if select count(*) can perform well.

If you can settle for an approximate count, pg_class.reltuples might
help you.
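That is, something along these lines (only as fresh as the last VACUUM or
ANALYZE):

select reltuples::bigint as approx_rows
from pg_class
where relname = 'mytable';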

regards, tom lane

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Steinar H. Gunderson
Hi,

I finally found what I believe is the root cause for the hopeless
performance, after a lot of query rewriting:

  Subquery Scan mdb_effektiv_tilgang  (cost=19821.69..4920621.69 rows=1 width=48)
    Filter: ((NOT (hashed subplan)) AND (NOT (subplan)))

The problem here is simply that 8.1 refuses to hash this part of the plan:

  ->  Materialize  (cost=546.45..742.37 rows=19592 width=38)
        ->  Seq Scan on rita_tilgang  (cost=0.00..526.86 rows=19592 width=38)
  ->  Seq Scan on personer_nylig_slettet  (cost=0.00..31.40 rows=2140 width=4)

probably because of the NOT IN with a function inside; I rewrote it to an
EXCEPT (which is not equivalent, but good enough for my use), and it
instantly hashed the other subplan, and the query ran speedily. Well, at
least in four seconds and not several hours...

Any good ideas why 8.1 would refuse to do this, when 7.4 would? It does
not matter how high I set my work_mem; even at 2,000,000 it refused to hash
the subplan.

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread PFC



 I am running Postgre 7.4 on FreeBSD. The main table have 2 million record
 (we would like to do at least 10 mil or more). It is mainly a FIFO
 structure with maybe 200,000 new records coming in each day that displace
 the older records.


I'm so sorry, but I have to rant XDDD

People who present a list of 100 items, paginated with 10 items per page
so that it fits on half a 800x600 screen, should be shot. I can scroll with
my mousewheel and use text search in my browser...

People who present a paginated view with 100,000 pages, where you have to
apply bisection search by hand to find that records starting with F are on
page 38651, should be forced to use a keyboard with just 1 key and type in
morse code.

The problem with pagination is that the page number is meaningless and
rather useless to the user. It is also meaningless to the database, which
means you have to use slow kludges like count() and limit/offset. And as
people insert stuff into the table while you browse, when you hit next page
you will see on top half of what was on the previous page, because it was
pushed down by new records. Or you might miss records.

So, rather than using a meaningless record offset as a page number, you
can use something meaningful, like a date, first letter of a name, region,
etc.

Of course, MySQL, always eager to encourage sucky-sucky practices,
provides a neat SQL_CALC_FOUND_ROWS hack which, while not being super SQL
standard compliant, allows you to retrieve the number of rows the query
would have returned had you not used LIMIT, so you can compute the number
of pages and grab one page with only one query.

So people use paginators instead of intelligent solutions, like
xmlhttp+javascript enabled autocompletion in forms, etc. And you have to
scroll to page 38651 to find the letter F.

So if you need to paginate on your site:

CHEAT

Who needs a paginated view with 100,000 pages?

- Select min(date) and max(date) from your table
- Present a nifty date selector to choose the records from any day, hour,
  minute, second
- Show them, with next day and previous day buttons
- It's more useful to the user (most likely he wants to know what happened
  on 01/05/2005 rather than view page 2857)
- It's faster (no more limit/offset! just date BETWEEN a AND b, indexed of
  course)
- No more new items pushing old ones to the next page while you browse
- You can pretend to your boss it's just like a paginated list
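In SQL, the cheat is just this shape (invented names; the date column is
assumed indexed):

-- bounds for the date selector
select min(event_date), max(event_date) from events;

-- one page = one day; next/previous buttons merely shift the range
select * from events
where event_date >= '2005-05-01' and event_date < '2005-05-02'
order by event_date;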
---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] tuning seqscan costs

2005-10-26 Thread Thomas F. O'Connell


On Oct 19, 2005, at 9:51 AM, Katherine Stoovs wrote:


I want to correlate two index rows of different tables to find an
offset so that

table1.value = table2.value AND table1.id = table2.id + offset

is true for a maximum number of rows.

To achieve this, I have the two tables and a table with possible
offset values and execute a query:

SELECT value,(SELECT COUNT(*) FROM table1,table2
  WHERE table1.value = table2.value AND
table1.id = table2.id + offset)
 AS matches FROM offsets ORDER BY matches;

The query is very inefficient, however, because the planner doesn't
use my indexes and executes seqscans instead. I can get it to execute
fast by setting ENABLE_SEQSCAN to OFF, but I have read this will make
the performance bad on other query types so I want to know how to
tweak the planner costs or possibly other stats so the planner will
plan the query correctly and use index scans. There must be something
wrong in the planning parameters after all if a plan that is slower by
a factor of tens or hundreds becomes estimated better than the fast
variant.

I have already issued ANALYZE commands on the tables.

Thanks for your help,
Katherine Stoovs


Katherine,

If offset is a column in offsets, can you add an index on the
expression table2.id + offset?


http://www.postgresql.org/docs/8.0/static/indexes-expressional.html
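A generic example in the spirit of that page, hedged: an index expression can
only reference columns of the table being indexed, so this helps when the
offset is fixed per index, not when it must come from the offsets table at
run time:

create index table2_id_plus_3 on table2 ((id + 3));  -- 3 is an illustrative constant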

--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC

Open Source Solutions. Optimized Web Development.

http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-469-5150
615-469-5151 (fax)

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 Any good ideas why 8.1 would refuse to do this, when 7.4 would do it? It does
 not matter how high I set my work_mem; even at 2.000.000 it refused to hash
 the subplan.

AFAICS, subplan_is_hashable() is testing the same conditions in 7.4 and
HEAD, so this isn't clear.  Want to step through it and see where it's
deciding not to hash?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performance issues with custom functions

2005-10-26 Thread Tom Lane
Edward Di Geronimo Jr. [EMAIL PROTECTED] writes:
 ... I'd like to know exactly what causes 
 the bottleneck in the original query, and if there are other approaches 
 to solving the issue in case I need them in future queries.

This is fairly hard to read ... it would help a lot if you had shown the
view definitions that the query relies on, so that we could match up the
plan elements with the query a bit better.

However, I'm thinking the problem is with this IN clause:

 where pl.network_id in (select ns.network_id
                         from development.network_state ns
                         where ns.from_date < current_time
                           and (ns.thru_date > current_time or
                                ns.thru_date is null)
                           and (ns.state_cd = pl.state_cd or ns.state_cd=''))

Because the sub-SELECT references pl.state_cd (an outer variable
reference), there's no chance of optimizing this into a join-style IN.
So the sub-SELECT has to be re-executed for each row of the outer query.
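One possible workaround for the ODBC case (a sketch, not something from the
original post): snapshot the small remote tables into local temp tables once
per session, so that even a re-executed subplan scans local data:

create temp table network_state_local as
    select * from development.network_state;
analyze network_state_local;
-- then reference network_state_local in place of the ODBC-backed view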

BTW, it's not apparent to me that your flattened query gives the same
answers as the original.  What if a pl row can join to more than one
row of the ns output?

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Steinar H. Gunderson
On Wed, Oct 26, 2005 at 07:06:15PM -0400, Tom Lane wrote:
 AFAICS, subplan_is_hashable() is testing the same conditions in 7.4 and
 HEAD, so this isn't clear.  Want to step through it and see where it's
 deciding not to hash?

Line 639, i.e.:

635     if (!optup->oprcanhash || optup->oprcom != opid ||
636         !func_strict(optup->oprcode))
637     {
638         ReleaseSysCache(tup);
639         return false;
640     }

gdb gives

(gdb) print *optup
$2 = {oprname = {
data = \220Ü2\b\000\000\000\000\000\000\000\000\005\230-\b, '\0' repeats 
16 times, 
X\0305\b\020\000\000\000\000\000\000\000ئ\b\020\000\000\000\000\000\000\000ð\213\b\020\000\000,
 alignmentDummy = 137550992}, oprnamespace = 137542808, oprowner = 64, oprkind 
= 8 '\b', oprcanhash = -112 '\220', oprleft = 2, oprright = 0, 
  oprresult = 0, oprcom = 0, oprnegate = 0, oprlsortop = 0, oprrsortop = 0, 
oprltcmpop = 0, oprgtcmpop = 0, oprcode = 0, oprrest = 0, oprjoin = 0}

(gdb) print opid 
$3 = 2373

So it's complaining about the optup->oprcom != opid part. This is of course
on the third run through the loop, i.e. it's complaining about the argument
which is run through the function kortsys2.effektiv_dato(date)... For
convenience, I've listed it again here:

CREATE FUNCTION kortsys2.effektiv_dato(date) RETURNS date
AS
'SELECT CASE WHEN $1 < CURRENT_DATE THEN CURRENT_DATE ELSE $1 END'
LANGUAGE SQL STABLE;

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] performance on query

2005-10-26 Thread Sidar López Cruz

I DON'T KNOW WHAT TO DO WITH THESE QUERIES...
Compared with SQL Server, SQL Server wins !!!


Table sizes:
archivos: 40MB
fotos: 55MB

select count(1) from fotos f where not exists (select a.archivo from 
archivos a where a.archivo=f.archivo)

173713 ms.
110217 ms.
83122 ms.

select count(*) from
(
select archivo from fotos
except
select archivo from archivos
) x;
201479 ms.

SELECT count(*)
FROM fotos f
LEFT JOIN archivos a USING(archivo)
WHERE a.archivo IS NULL
199523 ms.

_
MSN Amor: busca tu ½ naranja http://latam.msn.com/amor/


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 On Wed, Oct 26, 2005 at 07:06:15PM -0400, Tom Lane wrote:
 AFAICS, subplan_is_hashable() is testing the same conditions in 7.4 and
 HEAD, so this isn't clear.  Want to step through it and see where it's
 deciding not to hash?

 (gdb) print opid 
 $3 = 2373

I don't think you're getting a correct reading for optup, but OID
2373 is timestamp = date:

regression=# select * from pg_operator where oid = 2373;
 oprname | oprnamespace | oprowner | oprkind | oprcanhash | oprleft | oprright 
| oprresult | oprcom | oprnegate | oprlsortop | oprrsortop | oprltcmpop | 
oprgtcmpop |  oprcode  | oprrest |  oprjoin
-+--+--+-++-+--+---++---+++++---+-+---
 =   |   11 |   10 | b   | f  |1114 | 1082 
|16 |   2347 |  2376 |   2062 |   1095 |   2371 |   
2375 | timestamp_eq_date | eqsel   | eqjoinsel
(1 row)

which is marked not hashable, quite correctly since the input datatypes
aren't even the same.

My recollection is that there was no such operator in 7.4; probably in
7.4 the IN ended up using timestamp = timestamp which is hashable.

What's not clear though is why you're getting that operator --- aren't
both sides of the IN of type date?

regards, tom lane

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Steinar H. Gunderson
On Wed, Oct 26, 2005 at 07:53:02PM -0400, Tom Lane wrote:
 I don't think you're getting a correct reading for optup, but OID
 2373 is timestamp = date:

 [...]
 
 My recollection is that there was no such operator in 7.4; probably in
 7.4 the IN ended up using timestamp = timestamp which is hashable.

You are quite correct, there is no such operator (whether by oid or by
description) in my 7.4 installation.

 What's not clear though is why you're getting that operator --- aren't
 both sides of the IN of type date?

Aha!

Figured out the start column wasn't the problem after all. The problem was
the stopp column, which was timestamp on one side and date on the other...

So, it can be fixed for this instance, but this feels a bit like the pre-8.0
joins on differing data types -- is there any way to fix it? :-)
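For this instance the fix is just to make the types agree, e.g. (hypothetical
names; note the cast drops the time-of-day part):

select count(*)
from tilgang t
where t.stopp::date not in (select s.stopp from slettet s);  -- s.stopp is date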

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Tom Lane
Steinar H. Gunderson [EMAIL PROTECTED] writes:
 Aha!

 Figured out the start column wasn't the problem after all. The problem was
 the stopp column, which was timestamp on one side and date on the other...

Ah-hah.

 So, it can be fixed for this instance, but this feels a bit like the pre-8.0
 joins on differing data types -- is there any way to fix it? :-)

I have some ideas in the back of my head about supporting
cross-data-type hashing.  Essentially this would require that the hash
functions for two types be compatible in that they generate the same
hash value for two values that would be considered equal.  (For
instance, the integer hash functions already have the property that
42::int2, 42::int4, and 42::int8 will all generate the same hash code.
The date and timestamp hash functions don't have such a property ATM,
but probably could be made to.)  For types that share a hash coding
convention, cross-type equality functions could be marked hashable.
This is all pretty handwavy at the moment though, and I don't know
how soon it will get done.
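The integer property is easy to check, since the internal hash functions are
callable from SQL:

select hashint2(42::int2), hashint4(42::int4), hashint8(42::int8);
-- all three should return the same value, per the property described above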

regards, tom lane

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Materializing a sequential scan

2005-10-26 Thread Steinar H. Gunderson
On Wed, Oct 26, 2005 at 08:51:03PM -0400, Tom Lane wrote:
 I have some ideas in the back of my head about supporting
 cross-data-type hashing.  Essentially this would require that the hash
 functions for two types be compatible in that they generate the same
 hash value for two values that would be considered equal.

OK, another entry for the TODO then.

Anyhow, my query is now at about the same performance level with 8.1 as it
was with 7.4 (or rather, a bit faster), so it's no longer an 8.1 blocker for
us. Thanks. :-)

/* Steinar */
-- 
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Christopher Kings-Lynne
We have a GUI that let user browser through the record page by page at 
about 25 records a time. (Don't ask me why but we have to have this 
GUI). This translates to something like


  select count(*) from table   -- to give feedback about the DB size
  select * from table order by date limit 25 offset 0


Heh, sounds like phpPgAdmin...I really should do something about that.

Tables seems properly indexed, with vacuum and analyze ran regularly. 
Still this very basic SQLs takes up to a minute run.


Yes, COUNT(*) on a large table is always slow in PostgreSQL.  Search the 
mailing lists for countless discussions about it.
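The usual workaround, when an exact count is really needed, is a
trigger-maintained counter; a hedged sketch (single-row side table, so
concurrent writers will serialize on it; names are illustrative):

create table mytable_count (n bigint);
insert into mytable_count select count(*) from mytable;  -- seed once

create or replace function mytable_count_adj() returns trigger as '
begin
    if tg_op = ''INSERT'' then
        update mytable_count set n = n + 1;
    else
        update mytable_count set n = n - 1;
    end if;
    return null;
end;
' language plpgsql;

create trigger mytable_count_trig after insert or delete on mytable
    for each row execute procedure mytable_count_adj();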


Chris


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] browsing table with 2 million records

2005-10-26 Thread Christopher Kings-Lynne

Who needs a paginated view with 100.000 pages ?

- Select min(date) and max(date) from your table
- Present a nifty date selector to choose the records from any day, 
hour,  minute, second

- show them, with next day and previous day buttons

- It's more useful to the user (most likely he wants to know what  
happened on 01/05/2005 rather than view page 2857)
- It's faster (no more limit/offset ! just date BETWEEN a AND b,  
indexed of course)

- no more new items pushing old ones to the next page while you browse
- you can pretend to your boss it's just like a paginated list


All very well and good, but now do it generically...


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] performance on query

2005-10-26 Thread Jim C. Nasby
So the issue is that instead of taking 174 seconds the query now takes
201?

I'm guessing that SQL server might be using index covering, but that's
just a guess. Posting query plans (preferably with actual timing info:
EXPLAIN ANALYZE on PostgreSQL and whatever the equivalent would be for
MSSQL) might give us some idea.
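On the PostgreSQL side that is simply:

explain analyze
select count(1) from fotos f
where not exists (select a.archivo from archivos a where a.archivo = f.archivo);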

On Wed, Oct 26, 2005 at 05:47:31PM -0600, Sidar López Cruz wrote:
 I DON'T KNOW WHAT TO DO WITH THIS QUERYS...
 Comparation with sql server, sql server wins !!!
 
 
 Table sizes:
 archivos: 40MB
 fotos: 55MB
 
 select count(1) from fotos f where not exists (select a.archivo from 
 archivos a where a.archivo=f.archivo)
 173713 ms.
 110217 ms.
 83122 ms.
 
 select count(*) from
 (
   select archivo from fotos
   except
   select archivo from archivos
 ) x;
 201479 ms.
 
 SELECT count(*)
 FROM fotos f
 LEFT JOIN archivos a USING(archivo)
 WHERE a.archivo IS NULL
 199523 ms.
 
 _
 MSN Amor: busca tu ? naranja http://latam.msn.com/amor/
 
 
 ---(end of broadcast)---
 TIP 2: Don't 'kill -9' the postmaster
 

-- 
Jim C. Nasby, Sr. Engineering Consultant      [EMAIL PROTECTED]
Pervasive Software    http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf     cell: 512-569-9461

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings