Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?
On Tue, 2005-08-16 at 10:46 -0700, Roger Hand wrote:
> The disks are ext3 with journalling type of ordered, but this was later
> changed to writeback with no apparent change in speed.
>
> They're on a Dell PowerEdge 6650 with an LSI RAID card, set up as follows:
> 4 disks RAID 10 for indexes (145GB) - sdc1
> 6 disks RAID 10 for data (220GB) - sdd1
> 2 mirrored disks for logs - sdb1
>
> stripe size is 32k
> cache policy: cached I/O (am told the controller has a BBU)
> write policy: write-back
> read policy: readahead

I assume you are using Linux 2.6. Have you considered booting your machine with elevator=deadline? You can also change this at runtime using sysfs.

These read speeds are not too impressive. Perhaps this is a slow controller. Alternatively, you might need faster CPUs. There are a lot of possibilities, obviously :) I'd start with the elevator, since that's easily tested.

-jwb
[PERFORM] I'm configuring a new system (Biggish) and need some advice.
The system is a dual Xeon with 6 GB of RAM and 14 73GB 15K U320 SCSI drives, plus 2 RAID 1 system drives. RedHat EL ES4 is the OS. Anyone have any suggestions as to the configuration? The database is about 60 GB, and should jump to 120 here quite soon. Most of the searches involve people's names, through a website. My current setup just doesn't seem to have resulted in the performance kick I wanted. I don't know if it's LVM or what. The strange thing is that my memory usage stays very LOW for some reason, while on my current production server it stays very high. Also looking for ideas on stripe and extent size.

The below is run off of a RAID 10. I have not moved my WAL file yet, but there were no incoming transactions at the time the query was run. My stats on the identity table are set to 1000.

> explain analyze select distinct
> case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data,
> to_number(trim(leading case_data.type_code from trim(leading
> case_data.case_year from case_data.case_id)),'99') as seq from
> identity,court,litigant_details,case_data where identity.court_ori =
> litigant_details.court_ori and identity.case_id = litigant_details.case_id
> and identity.actor_id = litigant_details.actor_id and court.id =
> identity.court_ori and identity.court_ori = case_data.court_ori and
> case_data.case_id = identity.case_id and identity.court_ori = 'IL081025J'
> and full_name like 'SMITH%' order by full_name;
>
> QUERY PLAN
> ---
> Unique (cost=34042.46..34042.57 rows=3 width=173) (actual time=63696.896..63720.193 rows=8086 loops=1)
>   -> Sort (cost=34042.46..34042.47 rows=3 width=173) (actual time=63696.892..63702.239 rows=8086 loops=1)
>         Sort Key: identity.full_name, case_data.case_category, identity.identity_id, court.name, litigant_details.case_id, case_data.case_year, identity.date_of_birth, litigant_details.assigned_case_role, litigant_details.court_ori, litigant_details.actor_id, case_data.type_code, case_data.subtype_code, litigant_details.impound_litigant_data, to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
>         -> Nested Loop (cost=0.00..34042.43 rows=3 width=173) (actual time=135.498..63655.542 rows=8086 loops=1)
>               -> Nested Loop (cost=0.00..34037.02 rows=1 width=159) (actual time=95.760..34637.611 rows=8086 loops=1)
>                     -> Nested Loop (cost=0.00..34033.72 rows=1 width=138) (actual time=89.222..34095.763 rows=8086 loops=1)
>                           Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>                           -> Index Scan using name_speed on identity (cost=0.00..1708.26 rows=8152 width=82) (actual time=42.589..257.818 rows=8092 loops=1)
>                                 Index Cond: (((full_name)::text >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
>                                 Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'SMITH%'::text))
>                           -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.95 rows=1 width=81) (actual time=4.157..4.170 rows=1 loops=8092)
>                                 Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>                                 Filter: ('IL081025J'::text = (court_ori)::text)
>                     -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.051..0.058 rows=1 loops=8086)
>                           Filter: ('IL081025J'::text = (id)::text)
>               -> Index Scan using case_data_pkey on case_data (cost=0.00..5.36 rows=2 width=53) (actual time=3.569..3.572 rows=1 loops=8086)
>                     Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
> Total runtime: 63727.873 ms
>
> tcpip_socket = true
> max_connections = 100
> shared_buffers
Re: [PERFORM] I'm configuring a new system (Biggish) and need some advice.
7.4 is the pg version, BTW; going to switch to 8 if it's worth it.
[PERFORM] Query plan looks OK, but slow I/O - settings advice?
Summary
===
We are writing to the db pretty much 24 hours a day. Recently the amount of data we write has increased, and the query speed, formerly okay, has taken a dive. The query is using the indexes as expected, so I don't _think_ I have a query tuning issue, just an io problem. The first time a query is done it takes about 60 seconds. The second time it runs in about 6 seconds.

What I know I need advice on is io settings and various buffer settings. I may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.

Details
===
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared statement. Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR logfielddatatype = 69)
electric-# AND graphtargetlog = 1327
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;

QUERY PLAN
--
Sort (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
  Sort Key: public.logdata_recent."timestamp"
  -> Result (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
        -> Append (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
              -> Seq Scan on logdata_recent (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                    Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
              -> Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                    Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                    Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
              -> Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                    Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                    Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
Total runtime: 60214.545 ms
(13 rows)

60 seconds is much longer than it used to be. I would guess it used to be under 10 seconds. The second time the above query is run we see the magic of caching as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of logdata_recent, which never has any data. (This is pseudo-partitioning in action!) So the very quick seq scan on the empty logdata_recent parent table is okay with me. The index is built on timestamp, logfielddatatype, graphtargetlog. I am curious as to why the same index shows up 3 times in the "using" clause, but can live without knowing the details as long as it doesn't indicate that something's wrong.

The logdata_recent_stale table has 5 million rows. The size of the table itself, on disk, is 324MB. The size of the index is 210MB.

The disks are ext3 with journalling type of ordered, but this was later changed to writeback with no apparent change in speed.
Re: [PERFORM] Performance pb vs SQLServer.
> One little thing. Did you shut down SQL2000 while testing
> postgresql? Remember that postgresql uses the system cache.
> SQL2000 uses a large part of memory as buffer and it will not
> be available to the operating system. I must say that, probably,
> results will be the same, but it will be a better test.

Shutting down SQL2000 has no effect on PG performance.

Stephane.
Re: [PERFORM] Need for speed
> Ulrich Wisser wrote:
> > one of our services is click counting for on line advertising. We do
> > this by importing Apache log files every five minutes. This results in a
> > lot of insert and delete statements. ...
>
> If you are doing mostly inserting, make sure you are in a transaction,

Well, yes, but you may need to make sure that a single transaction doesn't have too many inserts in it. I was having a performance problem when doing transactions with a huge number of inserts (tens of thousands), and I solved the problem by putting a simple counter in the loop (in the Java import code, that is) and doing a commit every 100 or so inserts.

-Roger

> John
>
> > Ulrich
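The counter itself lives in the Java import code, but purely as an illustration of the batching idea, the stream of statements the server ends up seeing looks roughly like the sketch below (the clicks table and its columns are made up for the example):

-- batch of ~100 inserts per transaction instead of one giant transaction
BEGIN;
INSERT INTO clicks (ad_id, ts) VALUES (1001, '2005-08-19 12:00:00');
INSERT INTO clicks (ad_id, ts) VALUES (1002, '2005-08-19 12:00:01');
-- ... roughly 100 inserts ...
COMMIT;
BEGIN;
-- ... next batch of ~100 inserts ...
COMMIT;

Keeping each transaction to a bounded size avoids the pathological behaviour seen with tens of thousands of inserts in a single transaction, while still amortizing the per-commit cost over many rows.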
Re: [PERFORM] Performance pb vs SQLServer.
> From: Magnus Hagander [mailto:[EMAIL PROTECTED]
> Out of curiosity, what plan do you get from SQLServer? I bet
> it's a clustered index scan...
>
> //Magnus

I have a Table scan and a Hashaggregate...

Stephane
Re: [PERFORM] [JDBC] Performance problem using V3 protocol in jdbc driver
Barry,

I have had a similar experience, moving a big Oracle database to Postgres 8.0.3 on linux. The first impact was similar: huge performance problems. The main problem was bad planner choices. The cause in our case: bad parameter types in the jdbc set methods (I guess you use Java). For Oracle we used the NUMERIC type to set primary keys, but the postgres id type used was BIGINT, and it just refused to use the index in this case. Imagine that kicking in on a 100 million row table... a sequential scan started a few times a second, now that made the DB unusable. So we fixed the code so that for Oracle it continues to use NUMERIC and for postgres it uses BIGINT, and that is very important on setNull calls too.

One very useful tool was the following query:

prepare ps as SELECT procpid, substring(current_query for 97), to_char((now()-query_start), 'HH24:MI:SS') as t FROM pg_stat_activity where current_query not like '%

> We just moved a large production instance of ours from Oracle to
> Postgres 8.0.3 on linux. When running on Oracle the machine hummed
> along using about 5% of the CPU easily handling the fairly constant
> load, after moving the data to Postgres the machine was pretty much
> maxed out on CPU and could no longer keep up with the transaction
> volume. On a hunch I switched the jdbc driver to using the V2
> protocol and the load on the machine dropped down to what it was when
> using Oracle and everything was fine.
>
> Now obviously I have found a work around for the performance problem,
> but I really don’t want to rely on using the V2 protocol forever, and
> don’t want to have to recommend to our customers that they need to run
> with the V2 protocol. So I would like to resolve the problem and be
> able to move back to a default configuration with the V3 protocol and
> the benefits thereof.
>
> The problem is that I don’t really know where to begin to debug a
> problem like this. In development environments and testing
> environments we have not seen performance problems with the V3
> protocol in the jdbc driver. But they don’t come close to approaching
> the transaction volume of this production instance.
>
> What I see when running the V3 protocol under ‘top’ is that the
> postgres processes are routinely using 15% or more of the CPU each,
> when running the V2 protocol they use more like 0.3%.
>
> Does anyone have any suggestions on an approach to debug a problem
> like this?
>
> Thanks,
> --Barry
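The plan flip being described can be seen directly in SQL. A minimal sketch, using a hypothetical table rather than Barry's schema (the plan fragments in the comments are indicative, not actual output): when the parameter arrives as NUMERIC the comparison is done in numeric and the btree index on the bigint column cannot be used, whereas a BIGINT parameter gets a plain index scan.

-- hypothetical stand-in for the 100-million-row table
CREATE TABLE accounts (id bigint PRIMARY KEY, payload text);

-- parameter bound as NUMERIC (what the Oracle-oriented code did):
EXPLAIN SELECT * FROM accounts WHERE id = 12345::numeric;
--   Seq Scan on accounts
--     Filter: ((id)::numeric = 12345::numeric)

-- parameter bound as BIGINT (the fix):
EXPLAIN SELECT * FROM accounts WHERE id = 12345::bigint;
--   Index Scan using accounts_pkey on accounts
--     Index Cond: (id = 12345)

In JDBC terms this is the difference between setBigDecimal/NUMERIC and setLong/BIGINT (including the type constant passed to setNull), which is exactly the fix described above.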
Re: [PERFORM] [JDBC] Data Selection Slow From VB 6.0
Mahesh Shinde wrote:
> Hi, I am using Postgres version PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5), for a multi-user desktop application using VB 6.0 as a front end tool. To connect to PostgreSQL I am using PostgreSQL Win32 ODBC and OLEDB client drivers 1.0.0.2.

pgsql-jdbc isn't relevant, then -- the JDBC driver is not involved.

-O
Re: [PERFORM] Performance pb vs SQLServer.
John Arbash Meinel wrote:
> You might also try a different query, something like:
>
> SELECT DISTINCT cod FROM mytable GROUP BY cod ORDER BY cod;
> (You may or may not want order by, or group by, try the different
> combinations.)
> It might be possible to have the planner realize that all you
> want is unique rows, just doing a group by doesn't give you that.
>
> John
> =:->

Thanks John, but SELECT DISTINCT, with or without ORDER BY or GROUP BY, takes about 30 sec (with the index) - I stopped it at 200 sec without the index... So the HashAggregate plan is much better than the index scan...

> Thanks for help.
>
> Stéphane COEZ
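For what it's worth, the reason the GROUP BY formulation can win on these Postgres versions is that SELECT DISTINCT is planned as a sort plus unique pass, while GROUP BY is allowed to use a HashAggregate. A minimal sketch (mytable/cod as in the thread; the plan in the comments is indicative only, not actual output):

EXPLAIN ANALYZE SELECT cod FROM mytable GROUP BY cod;
--   HashAggregate
--     ->  Seq Scan on mytable

If the ordered result is still wanted, the small grouped output can then be sorted cheaply, rather than sorting or index-walking the whole table first.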
Re: [PERFORM] FW: Tx forecast improving hardware capabilities.
On 18 Aug 2005, at 16:01, Sebastian Lallana wrote:
> Does something like this exist? Does anybody have experience with this subject?

I've just been through this with a client with both a badly tuned Pg and an application being less than optimal.

First, find a benchmark. Just something you can hold on to. For us, it was the generation time of the site's home page. In this case, 7 seconds.

We looked hard at postgresql.conf, planned the memory usage, sort_memory and all that. That was a boost. Then we looked at the queries that were being thrown at the database. Over 200 to build one page! So, a layer of caching was built into the web server layer. Finally, some frequently occurring combinations of queries were pushed down into stored procs.

We got the page gen time down to 1.5 seconds AND the server stayed stable under extreme stress. So, a fair win. Thanks to cms for several clues.

So, without understanding your application and where it's taking the time, you can't begin to estimate hardware usage.
Re: [PERFORM] extremely low memory usage
At 01:55 PM 8/18/2005, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > here's an example standard query. I really have to make the first hit go
> > faster. The table is clustered on full_name as well. 'Smith%'
> > took 87 seconds on the first hit. I wonder if I set up my array wrong.
> > I remember seeing something about DMA access versus something else, and
> > chose DMA access. LVM maybe?
>
> It would be nice if you would format your queries to be a little bit easier to read before posting them. However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete. The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

How big are litigant_details and case_data? If they can fit in RAM, preload them using methods like the "cat to /dev/null" trick and those table lookups will be ~100-1000x faster. If they won't fit into RAM but the machine can be expanded to hold enough RAM to fit the tables, it's well worth the ~$75-$150/GB to upgrade the server so that the tables will fit into RAM.

If they can't be made to fit into RAM as atomic entities, you have a few choices:

A= Put the data tables and indexes on separate dedicated spindles and put litigant_details and case_data each on their own dedicated spindles. This will lower seek conflicts. Again, if this requires buying some more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables (based on something sane like the first n characters of the primary key) such that the smaller tables easily fit into RAM. Given that you've said only 10GB/60GB is "hot", this could work very well. Combine it with "A" above (put all the litigant_details sub tables on one dedicated spindle set and all the case_data sub tables on another spindle set) for added oomph.

C= Buy a SSD big enough to hold litigant_details and case_data and put them there. Again, this can be combined with "A" and "B" above to lessen the size of the SSD needed.

> So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.
>
> So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

Yes, that certainly sounds like it would be more efficient.

> More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running: set enable_nestloop to off; and then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.
> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>    -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>       -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>             Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>             Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>             Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>             Filter: ('IL081025J'::text = (court_ori)::text)
>       -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>          Filter: ('IL081025J'::text = (id)::text)
>    -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>       Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
Re: [PERFORM] limit number of concurrent callers to a stored proc?
Christopher wrote:
> You could use a 1 column/1 row table perhaps. Use some sort of locking
> mechanism.
>
> Also, check out contrib/userlock

userlock is definitely the way to go for this type of problem. They are really the only way to provide locking facilities that live outside transactions. You are provided with 48 bits of lock space in the form of an offset/block in a 32 bit field plus a 16 bit field. The 16 bit field could be the pid of the locker and the 32 bit field the oid of the function.

Unfortunately, userlocks are not really easy to query via the pg_locks view. However, this has been addressed for 8.1. In 8.1, it will be trivial to create a function which checks the number of lockers on the function oid and acquires a lock if there are fewer than a certain amount.

Merlin
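The exact contrib/userlock call names aren't shown in the thread, so here is a sketch of the same "limited slots" idea written against the built-in advisory locks that later replaced contrib/userlock (PostgreSQL 8.2 and up); the key value 42, the function name, and the slot scheme are all invented for illustration:

-- limit concurrent callers of an expensive procedure to max_callers;
-- returns the slot number acquired, or NULL if all slots are busy
CREATE OR REPLACE FUNCTION acquire_proc_slot(max_callers int) RETURNS int AS $$
DECLARE
    slot int;
BEGIN
    FOR slot IN 1 .. max_callers LOOP
        -- session-level lock, held until pg_advisory_unlock(42, slot) or disconnect
        IF pg_try_advisory_lock(42, slot) THEN
            RETURN slot;
        END IF;
    END LOOP;
    RETURN NULL;  -- all slots taken, caller should back off or queue
END;
$$ LANGUAGE plpgsql;

A caller would run SELECT acquire_proc_slot(3) before invoking the expensive stored proc and SELECT pg_advisory_unlock(42, slot) afterwards, giving at most three concurrent executions without holding anything open across transactions.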
Re: [PERFORM] extremely low memory usage
Jeremiah Jahn wrote:
> here's an example standard query. I really have to make the first hit go
> faster. The table is clustered on full_name as well. 'Smith%'
> took 87 seconds on the first hit. I wonder if I set up my array wrong.
> I remember seeing something about DMA access versus something else, and
> chose DMA access. LVM maybe?

It would be nice if you would format your queries to be a little bit easier to read before posting them. However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete. The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running: set enable_nestloop to off; and then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.

> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>    -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>       -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>             Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>             Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>             Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>             Filter: ('IL081025J'::text = (court_ori)::text)
>       -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>          Filter: ('IL081025J'::text = (id)::text)
>    -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>       Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))

John
=:->
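As a sketch, the two experiments John suggests look like the following; the statistics value of 1000 and the choice of columns are only guesses based on the plan above, not something stated in the thread:

-- raise the statistics target on the join/filter columns, then re-analyze
ALTER TABLE identity ALTER COLUMN court_ori SET STATISTICS 1000;
ALTER TABLE litigant_details ALTER COLUMN actor_id SET STATISTICS 1000;
ANALYZE identity;
ANALYZE litigant_details;

-- and, purely as a session-local experiment, see whether a
-- non-nested-loop plan is faster
SET enable_nestloop TO off;
EXPLAIN ANALYZE SELECT ...;   -- the query from the thread
SET enable_nestloop TO on;

Better row estimates from the larger statistics sample are the real fix; disabling nested loops is only a diagnostic to confirm that the planner's 1-row estimate is what pushes it into the slow plan.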
Re: [PERFORM] extremely low memory usage
here's an example standard query. I really have to make the first hit go faster. The table is clustered on full_name as well. 'Smith%' took 87 seconds on the first hit. I wonder if I set up my array wrong. I remember seeing something about DMA access versus something else, and chose DMA access. LVM maybe?

explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq from identity,court,litigant_details,case_data where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = case_data.court_ori and case_data.case_id = identity.case_id and identity.court_ori = 'IL081025J' and full_name like 'MILLER%' order by full_name;

QUERY PLAN
---
Unique (cost=20411.84..20411.91 rows=2 width=173) (actual time=38340.231..38355.120 rows=4906 loops=1)
  -> Sort (cost=20411.84..20411.84 rows=2 width=173) (actual time=38340.227..38343.667 rows=4906 loops=1)
        Sort Key: identity.full_name, case_data.case_category, identity.identity_id, court.name, litigant_details.case_id, case_data.case_year, identity.date_of_birth, litigant_details.assigned_case_role, litigant_details.court_ori, litigant_details.actor_id, case_data.type_code, case_data.subtype_code, litigant_details.impound_litigant_data, to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
        -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
              -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
                    -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
                          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
                          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
                                Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
                                Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
                          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
                                Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
                                Filter: ('IL081025J'::text = (court_ori)::text)
                    -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
                          Filter: ('IL081025J'::text = (id)::text)
              -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
                    Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
Total runtime: 38359.722 ms
(18 rows)

copa=> explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq from identity,court,litigant_details,case_data where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = c
Re: [PERFORM] extremely low memory usage
Jeremiah Jahn wrote: >On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote: > > >>Jeremiah Jahn wrote: >> >> >>>I just put together a system with 6GB of ram on a 14 disk raid 10 array. >>>When I run my usual big painful queries, I get very little to know >>>memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used >>>most of the time. the new devel box sits at around 250MB. >>> >>>I've switched to an 8.0 system on the new devel box, but the .conf >>>really didn't change. Index usage is the same. Something seems wrong and >>>I'm not sure why. >>> >>> >>> >>How big is your actual database on disk? And how much of it is actually >>touched by your queries? >> >> >The DB is about 60GB. About 10GB is actually used in real queries, >versus get me this single record with this ID. I have a large query that >finds court cases based on certain criteria that is name based. I get a >full seq scan on the name table in about 7 seconds, This table has about >6 million names (most being 'smith, something'). The index scan takes >much less time of course, once it's been cached (somewhere but not >apparently memory). The really query can take 60 seconds on a first run. >And 1.3 seconds on a second run. I'm very happy with the cached results, >just not really sure where that caching is happening since it doesn't >show up as memory usage. I do know that the caching that happens seems >to be independent of the DB. I can restart the DB and my speeds are >still the same as the cached second query. Is there some way to >pre-cache some of the tables/files on the file system? If I switch my >query to search for 'jones%' instead of 'smith%', I take a hit. But if I >then rerun the smith search, I still get cached speed. I only have two >tables essentially names and events that have to do any real work ie. >not very atomic data. I'd love to be able to force these two tables into >a cache somewhere. This is a linux system (RHEL ES4) by the way. > > I think what is happening is that *some* of the index pages are being cached, just not all of them. Most indexes (if you didn't specify anything special) are btree, so that you load the root page, and then determine what pages need to be loaded from there. So the "jones%" pages aren't anywhere near the "smith%" pages. And don't need to be loaded if you aren't accessing them. So the required memory usage might be smaller than you think. At least until all of the index pages have been accessed. The reason it is DB independent is because the OS is caching a file access (you read a file, it keeps the old pages in RAM in case you ask for it again). Part of the trick, is that as you use the database, it will cache what has been used. So you may not need to do anything. It should sort itself out with time. However, if you have to have cached performance as soon as your machine reboots, you could figure out what files on disk represent your indexes and tables, and then just "cat $files >/dev/null" That should cause a read on those files, which should pull them into the memory cache. *However* this will fail if the size of those files is greater than available memory, so you may want to be a little bit stingy about what you preload. Alternatively, you could just write an SQL script which runs a bunch of indexed queries to make sure all the pages get loaded. Something like: FOR curname IN SELECT DISTINCT name FROM users LOOP SELECT name FROM users WHERE name=curname; END LOOP; That should make the database go through the entire table, and load the index for every user. 
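The loop above is pseudocode; to actually run it, it can be wrapped in a throwaway plpgsql function. In this sketch users/name are John's illustrative names (not Jeremiah's real schema), and selecting count(*) into a dummy variable is just a cheap way to force each indexed probe:

-- warm the index and heap by probing every distinct value once
CREATE OR REPLACE FUNCTION preload_users() RETURNS void AS $$
DECLARE
    curname text;
    dummy   bigint;
BEGIN
    FOR curname IN SELECT DISTINCT name FROM users LOOP
        SELECT count(*) INTO dummy FROM users WHERE name = curname;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT preload_users();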
This is overkill, and will probably take a long time to execute. But you could do it if you wanted.

>> It seems that your tough queries might only be exercising a portion of
>> the database. If you really want to make memory usage increase try
>> something like:
>> find . -type f -print0 | xargs -0 cat >/dev/null
>> Which should read all the files. After doing that, does the memory usage
>> increase?
>>
>>> any thoughts,
>>> -jj-
>>>
>>> shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
>>> work_mem = 2097151 # min 64, size in KB
>>
>> This seems awfully high. 2GB per sort? This might actually be flushing
>> some of your ram, since it would get allocated and filled, and then
>> freed when finished. Remember, depending on what you are doing, this
>> amount can get allocated more than once per query.
>
> What's a good way to determine the optimal size?

Practice. :) A few questions, I guess... How many concurrent connections are you expecting? How many joins does a standard query have? How big are the joins?

In general, I would tend to make this a smaller number, so that the os has more room to cache tables, rather than having big buffers for joins. If someone is requesting a join that requires a lot of rows, I would rather *that* query be slower than impacting everyone else. I would put it more with a maximum in the 20-100MB range.
Re: [PERFORM] FW: Tx forecast improving hardware capabilities.
Sebastian,

> We are having serious performance problems using JBOSS and PGSQL.

How about some information about your application? Performance tuning approaches vary widely according to what you're doing with the database. Also, read this: http://www.powerpostgresql.com/PerfList

> I'm sure the problem has to do with the application itself (and neither
> with JBOSS nor PGSQL) but the fact is that we are using desktop
> equipment to run both Jboss and Postgresql (An Athlon 2600, 1 Gb Ram,
> IDE HDD with 60 Mb/sec Transfer Rate), and the answers arise:

Well, first off, the IDE HDD is probably killing performance unless your application is 95% read or greater.

> If we upgrade our hardware to a Dual Processor would the transactions
> per second increase significantly? Would Postgresql take advantage from
> SMP? Presumably yes, but can we do a forecast about the number of tps?

If this is an OLTP application, chances are that nothing is going to improve performance until you get decent disk support.

> What we need is a paper with some figures showing the expected
> performance in different environments. Some study about the "degree of
> correlation" between TPS and Number of Processors, Cache, Frequency,
> Word Size, Architecture, etc.

I don't think such a thing exists even for Oracle. Hardware configuration for maximum performance is almost entirely dependent on your application.

If it helps, running DBT2 (an OLTP test devised by OSDL after TPC-C), I can easily get 1700 new orders per minute (NOTPM) (about 3000 total multiple-write transactions per minute) on a quad-Pentium-III with 4GB RAM and 14 drives, and 6500 NOTPM on a dual-Itanium machine.

> P.S. I've been looking at www.tpc.org but I couldn't find anything
> valuable.

Nor would you for any real-world situation, even if we had a TPC benchmark (which are involved and expensive; give us a couple of years). The TPC benchmarks are more of a litmus test that your database system & platform are "competitive"; they don't really relate to real-world performance (unless you have budget for a 112-disk system!)

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] extremely low memory usage
On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> > When I run my usual big painful queries, I get very little to no
> > memory usage. My production box (raid 5, 4GB ram) hovers at 3.9GB used
> > most of the time. the new devel box sits at around 250MB.
> >
> > I've switched to an 8.0 system on the new devel box, but the .conf
> > really didn't change. Index usage is the same. Something seems wrong and
> > I'm not sure why.
>
> How big is your actual database on disk? And how much of it is actually
> touched by your queries?

The DB is about 60GB. About 10GB is actually used in real queries, versus "get me this single record with this ID". I have a large query that finds court cases based on certain criteria that is name based. I get a full seq scan on the name table in about 7 seconds. This table has about 6 million names (most being 'smith, something'). The index scan takes much less time of course, once it's been cached (somewhere, but not apparently memory). The real query can take 60 seconds on a first run, and 1.3 seconds on a second run. I'm very happy with the cached results, just not really sure where that caching is happening since it doesn't show up as memory usage. I do know that the caching that happens seems to be independent of the DB. I can restart the DB and my speeds are still the same as the cached second query. Is there some way to pre-cache some of the tables/files on the file system? If I switch my query to search for 'jones%' instead of 'smith%', I take a hit. But if I then rerun the smith search, I still get cached speed. I only have two tables essentially, names and events, that have to do any real work, i.e. not very atomic data. I'd love to be able to force these two tables into a cache somewhere. This is a linux system (RHEL ES4) by the way.

> It seems that your tough queries might only be exercising a portion of
> the database. If you really want to make memory usage increase try
> something like:
> find . -type f -print0 | xargs -0 cat >/dev/null
> Which should read all the files. After doing that, does the memory usage
> increase?
>
> > any thoughts,
> > -jj-
> >
> > shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
> > work_mem = 2097151 # min 64, size in KB
>
> This seems awfully high. 2GB per sort? This might actually be flushing
> some of your ram, since it would get allocated and filled, and then
> freed when finished. Remember, depending on what you are doing, this
> amount can get allocated more than once per query.

What's a good way to determine the optimal size?

> > maintenance_work_mem = 819200 # min 1024, size in KB
> > max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
> > checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
> > effective_cache_size = 360 <- this is a little out of control, but
> > would it have any real effect?
>
> It should just tell the planner that it is more likely to have buffers
> in cache, so index scans are slightly cheaper than they would otherwise be.
>
> > random_page_cost = 2 # units are one sequential page fetch cost
> > log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
> > lc_messages = 'C' # locale for system error message strings
> > lc_monetary = 'C' # locale for monetary formatting
> > lc_numeric = 'C' # locale for number formatting
> > lc_time = 'C' # locale for time formatting
>
> John
> =:->
[PERFORM] FW: Tx forecast improving hardware capabilities.
Hello:

We are having serious performance problems using JBOSS and PGSQL. I'm sure the problem has to do with the application itself (and neither with JBOSS nor PGSQL), but the fact is that we are using desktop equipment to run both JBoss and PostgreSQL (an Athlon 2600, 1 GB RAM, IDE HDD with 60 MB/sec transfer rate), and the questions arise:

If we upgrade our hardware to a dual processor, would the transactions per second increase significantly? Would PostgreSQL take advantage of SMP? Presumably yes, but can we make a forecast about the number of tps?

What we need is a paper with some figures showing the expected performance in different environments. Some study about the "degree of correlation" between TPS and number of processors, cache, frequency, word size, architecture, etc. Does something like this exist? Does anybody have experience with this subject?

Thanks in advance and best regards.

P.S. I've been looking at www.tpc.org but I couldn't find anything valuable.
Re: [PERFORM] Performance pb vs SQLServer.
Qingqing Zhou wrote:
> "Alvaro Herrera" <[EMAIL PROTECTED]> writes
> >> Interesting; do they use an overwriting storage manager like Oracle, or
> >> a non-overwriting one like Postgres?
>
> They call this MVCC "RLV (row level versioning)". I think they use a rollback
> segment like Oracle (a.k.a. "version store" or tempdb in SQL Server). Some
> details are explained in their white paper: "Database concurrency and row
> level versioning in SQL Server 2005".
>
> Regards,
> Qingqing

I found the paper here:
http://www.microsoft.com/technet/prodtechnol/sql/2005/cncrrncy.mspx

And it does sound like they are doing it the Oracle way:

"When a record in a table or index is updated, the new record is stamped with the transaction sequence_number of the transaction that is doing the update. The previous version of the record is stored in the version store, and the new record contains a pointer to the old record in the version store. Old records in the version store may contain pointers to even older versions. All the old versions of a particular record are chained in a linked list, and SQL Server may need to follow several pointers in a list to reach the right version. Version records need to be kept in the version store only as long as there are operations that might require them."

John
=:->
Re: [PERFORM] extremely low memory usage
On Aug 17, 2005, at 10:11 PM, Jeremiah Jahn wrote:
> I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> When I run my usual big painful queries, I get very little to no
> memory usage. My production box (raid 5, 4GB ram) hovers at 3.9GB used
> most of the time. the new devel box sits at around 250MB.

Is the system performing fine? Are you touching as much data as the production box? If the system is performing fine, don't worry about it.

> work_mem = 2097151 # min 64, size in KB

This is EXTREMELY high. You realize this is the amount of memory that can be used per-sort and per-hash build in a query? You can end up with multiples of this on a single query. If you have some big queries that are run infrequently, have them set it manually.

> effective_cache_size = 360 <- this is a little out of control, but
> would it have any real effect?

This doesn't allocate anything - it is a hint to the planner about how much data it can assume is cached.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
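Setting it manually for the occasional big query just means issuing a SET around that statement; a minimal sketch (the 512MB figure is only an example value, and the query is a placeholder):

-- per-session override for one expensive report, then back to the default
SET work_mem = 524288;          -- value is in KB on 8.0, so this is 512MB
SELECT ...;                     -- the big, infrequent query
RESET work_mem;

Everything else on the system keeps running with the modest global setting, so the memory is only committed while the one query that genuinely needs a large sort or hash is executing.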
Re: [PERFORM] Performance pb vs SQLServer.
"Alvaro Herrera" <[EMAIL PROTECTED]> writes > > Interesting; do they use an overwriting storage manager like Oracle, or > a non-overwriting one like Postgres? > They call this MVCC "RLV(row level versioning)". I think they use rollback segment like Oracle (a.k.a "version store" or tempdb in SQL Server). Some details are explained in their white paper:"Database concurrency and row level versioning in SQL Server 2005". Regards, Qingqing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly