Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Qingqing Zhou

Alvaro Herrera [EMAIL PROTECTED] writes

 Interesting; do they use an overwriting storage manager like Oracle, or
 a non-overwriting one like Postgres?


They call this MVCC RLV (row-level versioning). I think they use a rollback
segment like Oracle (a.k.a. the version store, held in tempdb in SQL Server).
Some details are explained in their white paper: "Database concurrency and row
level versioning in SQL Server 2005".

Regards,
Qingqing





Re: [PERFORM] extremely low memory usage

2005-08-18 Thread Jeff Trout


On Aug 17, 2005, at 10:11 PM, Jeremiah Jahn wrote:

I just put together a system with 6GB of RAM on a 14-disk RAID 10
array.

When I run my usual big painful queries, I get very little to no
memory usage. My production box (RAID 5, 4GB RAM) hovers at 3.9GB used
most of the time. The new devel box sits at around 250MB.



Is the system performing fine?  Are you touching as much data as the  
production box?


If the system is performing fine don't worry about it.


work_mem = 2097151  # min 64, size in KB


This is EXTREMELY high.  You realize this is the amount of memory
that can be used per sort and per hash build in a query? You can end
up with multiples of this in a single query.  If you have some big
queries that are run infrequently, have them set it manually.
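
A minimal sketch of that per-session pattern (the query is a hypothetical
stand-in and the value illustrative; in this era the setting is in KB):

-- raise work_mem only around the rare big query, for this session only
SET work_mem = 262144;         -- 256MB
SELECT actor_id, count(*)      -- stand-in for the big sort/hash query
  FROM litigant_details
 GROUP BY actor_id
 ORDER BY count(*) DESC;
RESET work_mem;                -- back to the postgresql.conf default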


effective_cache_size = 360   <- this is a little out of
control, but would it have any real effect?


This doesn't allocate anything - it is a hint to the planner about  
how much data it can assume is cached.
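
So a value that roughly matches what the OS is actually caching is all
that is wanted. A hedged example (numbers illustrative; the unit in this
era is 8kB disk pages):

-- say ~4.5GB of a 6GB box ends up as OS cache: 4.5GB / 8kB ~= 590000 pages
SET effective_cache_size = 590000;   -- planner hint only, allocates nothing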


--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/





Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread John A Meinel
Qingqing Zhou wrote:
 Alvaro Herrera [EMAIL PROTECTED] writes

Interesting; do they use an overwriting storage manager like Oracle, or
a non-overwriting one like Postgres?



 They call this MVCC RLV (row-level versioning). I think they use a rollback
 segment like Oracle (a.k.a. the version store, held in tempdb in SQL Server).
 Some details are explained in their white paper: "Database concurrency and
 row level versioning in SQL Server 2005".

 Regards,
 Qingqing


I found the paper here:
http://www.microsoft.com/technet/prodtechnol/sql/2005/cncrrncy.mspx

And it does sound like they are doing it the Oracle way:

When a record in a table or index is updated, the new record is stamped
with the transaction sequence_number of the transaction that is doing
the update. The previous version of the record is stored in the version
store, and the new record contains a pointer to the old record in the
version store. Old records in the version store may contain pointers to
even older versions. All the old versions of a particular record are
chained in a linked list, and SQL Server may need to follow several
pointers in a list to reach the right version. Version records need to
be kept in the version store only as long as there are operations that
might require them.

John
=:-




[PERFORM] FW: Tx forecast improving hardware capabilities.

2005-08-18 Thread Sebastian Lallana

Hello:

We are having serious performance problems using JBoss and PostgreSQL.

I'm sure the problem lies in the application itself (and neither in JBoss
nor in PostgreSQL), but the fact is that we are running both JBoss and
PostgreSQL on desktop equipment (an Athlon 2600, 1GB RAM, an IDE HDD with
a 60MB/sec transfer rate), and the questions arise:

If we upgrade our hardware to a dual processor, would the transactions per
second increase significantly? Would PostgreSQL take advantage of SMP?
Presumably yes, but can we forecast the number of TPS? What we need is a
paper with some figures showing the expected performance in different
environments: some study of the degree of correlation between TPS and
number of processors, cache, frequency, word size, architecture, etc.

Does something like this exist? Does anybody have experience with this
subject?

Thanks in advance and best regards.

P.S. I've been looking at www.tpc.org but I couldn't find anything valuable.


Re: [PERFORM] extremely low memory usage

2005-08-18 Thread Jeremiah Jahn
here's an example standard query. I really have to make the first hit go
faster. The table is also clustered on full_name. 'Smith%' took 87 seconds
on the first hit. I wonder if I set up my array wrong. I remember seeing
something about DMA access versus something else, and chose DMA access.
LVM maybe?

explain analyze select distinct 
case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data,
 to_number(trim(leading case_data.type_code from trim(leading 
case_data.case_year from case_data.case_id)),'99') as seq from 
identity,court,litigant_details,case_data where identity.court_ori = 
litigant_details.court_ori and identity.case_id = litigant_details.case_id and 
identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori 
and identity.court_ori = case_data.court_ori and case_data.case_id = 
identity.case_id  and identity.court_ori = 'IL081025J' and full_name like 
'MILLER%' order by  full_name;


                                                             QUERY PLAN
---------------------------------------------------------------------------
 Unique  (cost=20411.84..20411.91 rows=2 width=173) (actual
time=38340.231..38355.120 rows=4906 loops=1)
   ->  Sort  (cost=20411.84..20411.84 rows=2 width=173) (actual
time=38340.227..38343.667 rows=4906 loops=1)
         Sort Key: identity.full_name, case_data.case_category,
identity.identity_id, court.name, litigant_details.case_id,
case_data.case_year, identity.date_of_birth,
litigant_details.assigned_case_role, litigant_details.court_ori,
litigant_details.actor_id, case_data.type_code, case_data.subtype_code,
litigant_details.impound_litigant_data,
to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text),
(case_data.type_code)::text), '99'::text)
         ->  Nested Loop  (cost=0.00..20411.83 rows=2 width=173) (actual
time=12.891..38317.017 rows=4906 loops=1)
               ->  Nested Loop  (cost=0.00..20406.48 rows=1 width=159) (actual
time=12.826..23232.106 rows=4906 loops=1)
                     ->  Nested Loop  (cost=0.00..20403.18 rows=1 width=138)
(actual time=12.751..22885.439 rows=4906 loops=1)
                           Join Filter: (("outer".case_id)::text =
("inner".case_id)::text)
                           ->  Index Scan using name_speed on identity
(cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915
loops=1)
                                 Index Cond: (((full_name)::text >=
'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character
varying))
                                 Filter: (((court_ori)::text =
'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
                           ->  Index Scan using lit_actor_speed on
litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635
rows=1 loops=4915)
                                 Index Cond: (("outer".actor_id)::text =
(litigant_details.actor_id)::text)
                                 Filter: ('IL081025J'::text = (court_ori)::text)
                     ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=33)
(actual time=0.053..0.062 rows=1 loops=4906)
                           Filter: ('IL081025J'::text = (id)::text)
               ->  Index Scan using case_speed on case_data  (cost=0.00..5.29
rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
                     Index Cond: (('IL081025J'::text =
(case_data.court_ori)::text) AND ((case_data.case_id)::text =
("outer".case_id)::text))
 Total runtime: 38359.722 ms
(18 rows)

copa= explain analyze select distinct 
case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data,
 to_number(trim(leading case_data.type_code from trim(leading 
case_data.case_year from case_data.case_id)),'99') as seq from 
identity,court,litigant_details,case_data where identity.court_ori = 
litigant_details.court_ori and identity.case_id = litigant_details.case_id and 
identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori 
and identity.court_ori = case_data.court_ori 

Re: [PERFORM] extremely low memory usage

2005-08-18 Thread John Arbash Meinel
Jeremiah Jahn wrote:

here's an example standard query. I really have to make the first hit go
faster. The table is also clustered on full_name. 'Smith%' took 87 seconds
on the first hit. I wonder if I set up my array wrong. I remember seeing
something about DMA access versus something else, and chose DMA access.
LVM maybe?
  

It would be nice if you would format your queries to be a little bit
easier to read before posting them.
However, I believe I am reading it correctly in saying that the index scan
on identity is not your slow point. In fact, as near as I can tell, it
only takes 52ms to complete.

The expensive parts are the 4915 lookups into the litigant_details (each
one takes approx 4ms for a total of ~20s).
And then you do it again on case_data (average 3ms each * 4906 loops =
~15s).

So there is no need for preloading your indexes on the identity table.
It is definitely not the bottleneck.

So a few design bits, which may help your database.
Why is actor_id a text field instead of a number?
You could try creating an index on litigant_details (actor_id,
court_ori) so that it can do just an index lookup, rather than an index
+ filter, as sketched below.
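
A sketch of that suggestion (the index name is invented):

CREATE INDEX lit_actor_court_speed
    ON litigant_details (actor_id, court_ori);
ANALYZE litigant_details;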

More importantly, though, the planner seems to think the join of
identity to litigant_details will only return 1 row, not 5000.
Do you regularly vacuum analyze your tables?
Just as a test, try running:
set enable_nested_loop to off;
And then run EXPLAIN ANALYZE again, just to see if it is faster.

You probably need to increase some statistics targets, so that the
planner can design better plans.
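
For example (columns and target value picked for illustration, since the
misestimate is on the case_id join):

ALTER TABLE identity ALTER COLUMN case_id SET STATISTICS 100;
ALTER TABLE litigant_details ALTER COLUMN case_id SET STATISTICS 100;
ANALYZE identity;
ANALYZE litigant_details;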

  ->  Nested Loop  (cost=0.00..20411.83 rows=2 width=173)
 (actual time=12.891..38317.017 rows=4906 loops=1)
        ->  Nested Loop  (cost=0.00..20406.48 rows=1 width=159)
 (actual time=12.826..23232.106 rows=4906 loops=1)
              ->  Nested Loop  (cost=0.00..20403.18 rows=1
 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
                    Join Filter: (("outer".case_id)::text =
 ("inner".case_id)::text)
                    ->  Index Scan using name_speed on
 identity  (cost=0.00..1042.34 rows=4868 width=82) (actual
 time=0.142..52.538 rows=4915 loops=1)
                          Index Cond: (((full_name)::text >=
 'MILLER'::character varying) AND ((full_name)::text <
 'MILLES'::character varying))
                          Filter: (((court_ori)::text =
 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
                    ->  Index Scan using lit_actor_speed on
 litigant_details  (cost=0.00..3.96 rows=1 width=81) (actual
 time=4.631..4.635 rows=1 loops=4915)
                          Index Cond: (("outer".actor_id)::text
 = (litigant_details.actor_id)::text)
                          Filter: ('IL081025J'::text =
 (court_ori)::text)
              ->  Seq Scan on court  (cost=0.00..3.29 rows=1
 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
                    Filter: ('IL081025J'::text = (id)::text)
        ->  Index Scan using case_speed on case_data
 (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1
 loops=4906)
              Index Cond: (('IL081025J'::text =
 (case_data.court_ori)::text) AND ((case_data.case_id)::text =
 ("outer".case_id)::text))


John
=:-





Re: [PERFORM] limit number of concurrent callers to a stored proc?

2005-08-18 Thread Merlin Moncure
Christopher wrote:
 You could use a 1 column/1 row table perhaps.  Use some sort of
 locking mechanism.

 Also, check out contrib/userlock

userlock is definitely the way to go for this type of problem.

They are really the only way to provide locking facilities that live
outside transactions.

You are provided with 48 bits of lock space in the form of an offset/block
pair in a 32 bit field plus a 16 bit field.  The 16 bit field could be the
pid of the locker and the 32 bit field the oid of the function.

Unfortunately, userlocks are not really easy to query via the pg_locks
view.  However, this has been addressed for 8.1.  In 8.1 it will be
trivial to create a function which checks the number of lockers on the
function oid and acquires a lock if there are fewer than a certain number.
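
A hedged sketch of the counting half on 8.1, where userlocks appear in
pg_locks with locktype 'userlock' (whether your key lands in objid or
classid is an assumption to verify against your lock-tagging scheme):

-- how many backends currently hold a userlock keyed on this function's oid?
SELECT count(*)
  FROM pg_locks
 WHERE locktype = 'userlock'
   AND objid = 'my_limited_proc'::regproc::oid;   -- hypothetical function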

Merlin



Re: [PERFORM] FW: Tx forecast improving hardware capabilities.

2005-08-18 Thread David Hodgkinson


On 18 Aug 2005, at 16:01, Sebastian Lallana wrote:


Does something like this exist? Does anybody have experience with
this subject?


I've just been through this with a client, with both a badly tuned Pg and
an application that was less than optimal.

First, find a benchmark. Just something you can hold on to. For us, it
was the generation time of the site's home page. In this case, 7
seconds.

We looked hard at postgresql.conf, planned the memory usage, sort_mem
and all that. That was a boost. Then we looked at the queries that were
being thrown at the database. Over 200 to build one page! So, a layer
of caching was built into the web server layer. Finally, some frequently
occurring combinations of queries were pushed down into stored procs.
We got the page gen time down to 1.5 seconds AND the server stayed stable
under extreme stress. So, a fair win.

Thanks to cms for several clues.

So, without understanding your application and where it's taking the
time, you can't begin to estimate hardware usage.





Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Stéphane COEZ
John Arbash Meinel wrote : 
 
 You might also try a different query, something like:

 SELECT DISTINCT cod FROM mytable GROUP BY cod ORDER BY cod;
 (You may or may not want the ORDER BY or the GROUP BY; try the different
 combinations.)
 It might be possible to have the planner realize that all you
 want is unique rows; just doing a GROUP BY doesn't give you that.
 
 John
 =:-
 
Thanks John, but using SELECT DISTINCT, with or without ORDER BY or GROUP BY,
is no better...
30 sec (with index) - stopped at 200 sec without index...

So Hash Aggregate is much better than index scan ...


 
Thanks for the help.

Stéphane COEZ
 
 


Re: [PERFORM] [JDBC] Data Selection Slow From VB 6.0

2005-08-18 Thread Oliver Jowett

Mahesh Shinde wrote:

Hi
I am using Postgres version
PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC gcc
(GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)
for a multi-user desktop application using VB 6.0 as a front-end
tool.

To connect to PostgreSQL I am using the PostgreSQL Win32 ODBC and
OLEDB client drivers 1.0.0.2


pgsql-jdbc isn't relevant, then -- the JDBC driver is not involved.

-O



Re: [PERFORM] [JDBC] Performance problem using V3 protocol in jdbc driver

2005-08-18 Thread Csaba Nagy
Barry,

I have had a similar experience, moving a big Oracle database to
Postgres 8.0.3 on Linux.
The first impact was similar: huge performance problems.
The main problem was bad planner choices. The cause in our case: bad
parameter types in the JDBC set methods (I guess you use Java). For
Oracle we used the NUMERIC type to set primary keys, but the Postgres id
type used was BIGINT, and it simply refused to use the index in this case.
Imagine that kicking in on a 100 million row table... a sequential scan
started a few times a second; that made the DB unusable.
So we fixed the code so that for Oracle it continues to use NUMERIC and
for Postgres it uses BIGINT, and that is very important on setNull calls
too.
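
The effect is easy to demonstrate in plain SQL (table invented for
illustration); on those versions a NUMERIC-typed parameter compared
against a BIGINT column planned as a sequential scan:

CREATE TABLE accounts (id bigint PRIMARY KEY);
EXPLAIN SELECT * FROM accounts WHERE id = 42::bigint;   -- index scan
EXPLAIN SELECT * FROM accounts WHERE id = 42::numeric;  -- seq scan: the
  -- bigint = numeric comparison could not use the bigint index back then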

One very useful tool was the following query:

prepare ps as
SELECT procpid, substring(current_query for 97),
to_char((now()-query_start), 'HH24:MI:SS') as t
FROM pg_stat_activity
where current_query not like '%insufficient%'
  and current_query not like '%<IDLE>%' order by t desc;

Then you just run "execute ps;" in psql, and it will show you the queries
which have already been running for a while.

Other problems were caused by complex queries, where more than 2 tables
were joined. For Oracle we were giving hints in the form of special
comments to point to the right index or plan, but that's not an option
for Postgres (yet?). So the fix in this case was to use explicit joins,
which do influence the Postgres planner's choices. This fixed another
class of issues for us...

Another problem: if you want to avoid worst-case plans and can live with
a generic plan for all cases, then you might force the usage of server-side
prepared statements in all cases. I had to do that; a lot of queries
were performing very badly without it. Now maybe that could be solved
by raising the statistics targets where needed, but in my case the
generic plan was always good enough, by design. We rely on the DB
picking a good generic plan in all cases. One typical example for us
would be: a limit query which selects 20 rows out of 100 million, with a
where clause which actually selects 1 row out of it for the last
chunk... it was going for an index scan, but on the wrong index. The
right index would have selected exactly that 1 row; the wrong one had to
cruise through a few million rows... the limit fooled the planner into
thinking it would get 20 rows quickly. When I forced the usage of a
prepared statement, it went for the right index and all was good.
I actually set this in our connection pool:
((PGConnection)connection).setPrepareThreshold(1);
but it is possible to set/reset it at the statement level; I just haven't
found any query I should do it for yet... the DB is steady now.

Another issue was that we had some functional indexes on Oracle
returning null for uninteresting rows, to lower the index size. This is
easier to implement on Postgres using a partial index, which has a lot
simpler syntax than the Oracle hack, and it is easier to handle. The
catch was that we needed to change the where clause compared to Oracle
so that Postgres indeed picks the partial index. There are cases where
the planner can't figure out that it can use the index, especially if
you use prepared statements and one of the parameters is used in the
index condition. In this case you need to add the proper restriction
to the where clause to point Postgres to the partial index. Using
partial indexes speeds up the inserts and updates on those tables, and
could speed up some selects too.
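
A hedged example of the pattern (table and predicate invented):

-- index only the interesting rows; boring rows never touch the index
CREATE INDEX orders_open_idx ON orders (customer_id) WHERE status = 'open';

-- the query must repeat the predicate, or the planner won't match the index
SELECT * FROM orders WHERE customer_id = 42 AND status = 'open';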

Hmmm... that's about all I recall now... besides the Postgres admin
stuff: have you analyzed your data after the import? I forgot to do that
at first, and almost reverted back to Oracle... and then after a few
days it was very clear that running the auto-vacuum daemon is also a
must :-)
And: for big data sets it is important to tweak all performance settings
in the config file, otherwise you get surprises. We had been running a
smaller instance of the same code on Postgres for quite a while before
deciding to migrate a big one, and that was cruising along happily with
the default settings, so the first time we needed to do optimizations
was when using a data set with a lot of data in it...

HTH,
Csaba.


On Wed, 2005-08-17 at 06:42, Barry Lind wrote:
 We just moved a large production instance of ours from Oracle to
 Postgres 8.0.3 on linux.  When running on Oracle the machine hummed
 along using about 5% of the CPU easily handling the fairly constant
 load, after moving the data to Postgres the machine was pretty much
 maxed out on CPU and could no longer keep up with the transaction
 volume.  On a hunch I switched the jdbc driver to using the V2
 protocol and the load on the machine dropped down to what it was when
 using Oracle and everything was fine.
 
  
 
 Now obviously I have found a work around for the performance problem,
 but I really don’t want to rely on using the V2 protocol forever, and
 don’t want to have to recommend to our customers that they need 

Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Stéphane COEZ
 From: Magnus Hagander [mailto:[EMAIL PROTECTED]]
 Out of curiosity, what plan do you get from SQLServer? I bet 
 it's a clustered index scan...
 
 
 //Magnus
 

I have a Table Scan and a HashAggregate...
Stephane
 






Re: [PERFORM] Need for speed

2005-08-18 Thread Roger Hand
 Ulrich Wisser wrote:
 
  one of our services is click counting for on line advertising. We do
  this by importing Apache log files every five minutes. This results in a
  lot of insert and delete statements. 
...
 If you are doing mostly inserting, make sure you are in a transaction,

Well, yes, but you may need to make sure that a single transaction doesn't have 
too many inserts in it.
I was having a performance problem when doing transactions with a huge number 
of inserts
(tens of thousands), and I solved the problem by putting a simple counter in 
the loop (in the Java import code, 
that is) and doing a commit every 100 or so inserts.
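
In SQL terms the shape is simply this (table and batch size illustrative):

CREATE TABLE clicks (ts timestamptz, url text);   -- hypothetical click log

BEGIN;
INSERT INTO clicks VALUES (now(), '/ad/1');
-- ... roughly 100 more inserts ...
COMMIT;
-- then open a fresh transaction for the next batch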

-Roger




Re: [PERFORM] Performance pb vs SQLServer.

2005-08-18 Thread Stéphane COEZ
 
 
 One little thing. Did you shut down SQL2000 while testing
 postgresql? Remember that postgresql uses the system cache.
 SQL2000 uses a large part of memory as a buffer and it will not
 be available to the operating system. I must say that, probably,
 the results will be the same, but it will be a better test.
 

Shutting down SQL2000 has no effect on PG performance.

Stephane.






[PERFORM] Query plan looks OK, but slow I/O - settings advice?

2005-08-18 Thread Roger Hand
Summary
===
We are writing to the db pretty much 24 hours a day.
Recently the amount of data we write has increased, and the query speed, 
formerly okay, has taken a dive.
The query is using the indexes as expected, so I don't _think_ I have a query 
tuning issue, just an io problem. 
The first time a query is done it takes about 60 seconds. The second time it 
runs in about 6 seconds.
What I know I need advice on is io settings and various buffer settings. 
I may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.

Details
===
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared statement. 
Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent 
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR 
logfielddatatype = 69) 
electric-# AND graphtargetlog = 1327 
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;


                                  QUERY PLAN
------------------------------------------------------------------------------
 Sort  (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232
rows=2625 loops=1)
   Sort Key: public.logdata_recent.timestamp
   ->  Result  (cost=0.00..82.41 rows=6 width=14) (actual
time=52.483..60200.868 rows=2625 loops=1)
         ->  Append  (cost=0.00..82.41 rows=6 width=14) (actual
time=52.476..60189.929 rows=2625 loops=1)
               ->  Seq Scan on logdata_recent  (cost=0.00..46.25 rows=1
width=14) (actual time=0.003..0.003 rows=0 loops=1)
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype =
71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >=
1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123085306_ix_t_fld_gtl,
logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl
on logdata_recent_stale logdata_recent  (cost=0.00..18.08 rows=3 width=14)
(actual time=52.465..60181.624 rows=2625 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <=
1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR
((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND
(logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >=
1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND
(graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype =
71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >=
1123052400) AND (timestamp <= 1123138800))
               ->  Index Scan using logdata_recent_1123139634_ix_t_fld_gtl,
logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl
on logdata_recent_active logdata_recent  (cost=0.00..18.08 rows=2 width=14)
(actual time=0.178..0.178 rows=0 loops=1)
                     Index Cond: (((timestamp >= 1123052400) AND (timestamp <=
1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR
((timestamp >= 1123052400) AND (timestamp <= 1123138800) AND
(logfielddatatype = 71) AND (graphtargetlog = 1327)) OR ((timestamp >=
1123052400) AND (timestamp <= 1123138800) AND (logfielddatatype = 69) AND
(graphtargetlog = 1327)))
                     Filter: (((logfielddatatype = 70) OR (logfielddatatype =
71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND (timestamp >=
1123052400) AND (timestamp <= 1123138800))
 Total runtime: 60214.545 ms
(13 rows)

60 seconds is much longer than it used to be. I would guess it used to be under 
10 seconds. The second time the above query is run we see the magic of caching 
as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of 
logdata_recent, which never has any data. (This is pseudo-partitioning in 
action!)
So the very quick seq_scan on the empty logdata_recent parent table is okay 
with me.
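
For readers unfamiliar with the trick, the shape is roughly this (column
types assumed from the query above):

CREATE TABLE logdata_recent (
    datavalue        float8,
    logfielddatatype int4,
    "timestamp"      int4,
    graphtargetlog   int4
);
-- the children hold the actual rows; the parent stays empty
CREATE TABLE logdata_recent_active () INHERITS (logdata_recent);
CREATE TABLE logdata_recent_stale  () INHERITS (logdata_recent);
-- a select on logdata_recent transparently appends parent and children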

The index is built on timestamp, logfielddatatype, graphtargetlog. I am curious 
as to why the same index shows up 3 times in the using clause, but can live 
without knowing the details as long as it doesn't indicate that something's 
wrong.

The logdata_recent_stale table has 5 million rows. The size of the table
itself, on disk, is 324MB. The size of the index is 210MB.

The disks are ext3 with journalling type of ordered, but this was later changed 
to 

Re: [PERFORM] I'm configuring a new system (Bigish) and need some advice.

2005-08-18 Thread Jeremiah Jahn
7.4 is the pg version BTW... going to switch to 8 if it's worth it.


Ingrate, n.: A man who bites the hand that feeds him, and then complains
of indigestion.
-- 
Don't say yes until I finish talking.
-- Darryl F. Zanuck




[PERFORM] I'm configuring a new system (Bigish) and need some advice.

2005-08-18 Thread Jeremiah Jahn
The system is a dual Xeon with 6GB of RAM and 14 73GB 15K U320 SCSI
drives, plus 2 RAID 1 system drives.

RedHat EL ES4 is the OS.


Anyone have any suggestions as to the configuration? The database is about
60GB and should jump to 120 here quite soon. Most of the searches involve
people's names, through a website. My current setup just doesn't seem to
have resulted in the performance kick I wanted. I don't know if it's LVM
or what. The strange thing is that my memory usage stays very LOW for some
reason, while on my current production server it stays very high. Also
looking for ideas on stripe and extent size. The query below was run off
of a RAID 10. I have not moved my WAL files yet, but there were no
incoming transactions at the time the query was run. My statistics target
on the identity table is set to 1000.



 explain analyze select distinct 
 case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data,
  to_number(trim(leading case_data.type_code from trim(leading 
 case_data.case_year from case_data.case_id)),'99') as seq from 
 identity,court,litigant_details,case_data where identity.court_ori = 
 litigant_details.court_ori and identity.case_id = litigant_details.case_id 
 and identity.actor_id = litigant_details.actor_id and court.id = 
 identity.court_ori and identity.court_ori = case_data.court_ori and 
 case_data.case_id = identity.case_id  and identity.court_ori = 'IL081025J' 
 and full_name like 'SMITH%' order by  full_name;
   
   
   
 QUERY PLAN
 ---------------------------------------------------------------------------
  Unique  (cost=34042.46..34042.57 rows=3 width=173) (actual
 time=63696.896..63720.193 rows=8086 loops=1)
    ->  Sort  (cost=34042.46..34042.47 rows=3 width=173) (actual
 time=63696.892..63702.239 rows=8086 loops=1)
          Sort Key: identity.full_name, case_data.case_category,
 identity.identity_id, court.name, litigant_details.case_id,
 case_data.case_year, identity.date_of_birth,
 litigant_details.assigned_case_role, litigant_details.court_ori,
 litigant_details.actor_id, case_data.type_code, case_data.subtype_code,
 litigant_details.impound_litigant_data,
 to_number(ltrim(ltrim((case_data.case_id)::text,
 (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
          ->  Nested Loop  (cost=0.00..34042.43 rows=3 width=173) (actual
 time=135.498..63655.542 rows=8086 loops=1)
                ->  Nested Loop  (cost=0.00..34037.02 rows=1 width=159)
 (actual time=95.760..34637.611 rows=8086 loops=1)
                      ->  Nested Loop  (cost=0.00..34033.72 rows=1 width=138)
 (actual time=89.222..34095.763 rows=8086 loops=1)
                            Join Filter: (("outer".case_id)::text =
 ("inner".case_id)::text)
                            ->  Index Scan using name_speed on identity
 (cost=0.00..1708.26 rows=8152 width=82) (actual time=42.589..257.818
 rows=8092 loops=1)
                                  Index Cond: (((full_name)::text >=
 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character
 varying))
                                  Filter: (((court_ori)::text =
 'IL081025J'::text) AND ((full_name)::text ~~ 'SMITH%'::text))
                            ->  Index Scan using lit_actor_speed on
 litigant_details  (cost=0.00..3.95 rows=1 width=81) (actual
 time=4.157..4.170 rows=1 loops=8092)
                                  Index Cond: (("outer".actor_id)::text =
 (litigant_details.actor_id)::text)
                                  Filter: ('IL081025J'::text =
 (court_ori)::text)
                      ->  Seq Scan on court  (cost=0.00..3.29 rows=1 width=33)
 (actual time=0.051..0.058 rows=1 loops=8086)
                            Filter: ('IL081025J'::text = (id)::text)
                ->  Index Scan using case_data_pkey on case_data
 (cost=0.00..5.36 rows=2 width=53) (actual time=3.569..3.572 rows=1
 loops=8086)
                      Index Cond: (('IL081025J'::text =
 (case_data.court_ori)::text) AND ((case_data.case_id)::text =
 ("outer".case_id)::text))
  Total runtime: 63727.873 ms
 
 



 tcpip_socket = true
 max_connections = 100
 shared_buffers = 5  # min 16, at least max_connections*2, 8KB each
 sort_mem =