Re: [PERFORM] Query plan looks OK, but slow I/O - settings advice?
On Tue, 2005-08-16 at 10:46 -0700, Roger Hand wrote:
> The disks are ext3 with journalling type of ordered, but this was later
> changed to writeback with no apparent change in speed.
>
> They're on a Dell PowerEdge 6650 with an LSI RAID card, set up as follows:
> 4 disks RAID 10 for indexes (145GB) - sdc1
> 6 disks RAID 10 for data (220GB) - sdd1
> 2 mirrored disks for logs - sdb1
>
> stripe size is 32k
> cache policy: cached I/O (am told the controller has a BBU)
> write policy: write-back
> read policy: readahead

I assume you are using Linux 2.6. Have you considered booting your machine with elevator=deadline? You can also change this at runtime using sysfs.

These read speeds are not too impressive. Perhaps this is a slow controller. Alternatively, you might need faster CPUs. There are a lot of possibilities, obviously :) I'd start with the elevator, since that's easily tested.

-jwb
[PERFORM] I'm configuring a new system (Biggish) and need some advice.
The system is a dual Xeon with 6 GB of RAM and 14 73GB 15K U320 SCSI drives, plus 2 RAID 1 system drives. RedHat EL ES4 is the OS. Anyone have any suggestions as to the configuration? The database is about 60 GB, and should jump to 120 here quite soon. Most of the searches involve people's names, through a website. My current setup just doesn't seem to have resulted in the performance kick I wanted. I don't know if it's LVM or what. The strange thing is that my memory usage stays very LOW for some reason, while on my current production server it stays very high. Also looking for ideas on stripe and extent size.

The below is run off of a RAID 10. I have not moved my WAL file yet, but there were no incoming transactions at the time the query was run. My stats on the identity table are set to 1000.

> explain analyze select distinct
> case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data,
> to_number(trim(leading case_data.type_code from trim(leading
> case_data.case_year from case_data.case_id)),'99') as seq from
> identity,court,litigant_details,case_data where identity.court_ori =
> litigant_details.court_ori and identity.case_id = litigant_details.case_id
> and identity.actor_id = litigant_details.actor_id and court.id =
> identity.court_ori and identity.court_ori = case_data.court_ori and
> case_data.case_id = identity.case_id and identity.court_ori = 'IL081025J'
> and full_name like 'SMITH%' order by full_name;
>
> QUERY PLAN
> ---
> Unique (cost=34042.46..34042.57 rows=3 width=173) (actual time=63696.896..63720.193 rows=8086 loops=1)
>   -> Sort (cost=34042.46..34042.47 rows=3 width=173) (actual time=63696.892..63702.239 rows=8086 loops=1)
>         Sort Key: identity.full_name, case_data.case_category, identity.identity_id, court.name, litigant_details.case_id, case_data.case_year, identity.date_of_birth, litigant_details.assigned_case_role, litigant_details.court_ori, litigant_details.actor_id, case_data.type_code, case_data.subtype_code, litigant_details.impound_litigant_data, to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
>         -> Nested Loop (cost=0.00..34042.43 rows=3 width=173) (actual time=135.498..63655.542 rows=8086 loops=1)
>               -> Nested Loop (cost=0.00..34037.02 rows=1 width=159) (actual time=95.760..34637.611 rows=8086 loops=1)
>                     -> Nested Loop (cost=0.00..34033.72 rows=1 width=138) (actual time=89.222..34095.763 rows=8086 loops=1)
>                           Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>                           -> Index Scan using name_speed on identity (cost=0.00..1708.26 rows=8152 width=82) (actual time=42.589..257.818 rows=8092 loops=1)
>                                 Index Cond: (((full_name)::text >= 'SMITH'::character varying) AND ((full_name)::text < 'SMITI'::character varying))
>                                 Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'SMITH%'::text))
>                           -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.95 rows=1 width=81) (actual time=4.157..4.170 rows=1 loops=8092)
>                                 Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>                                 Filter: ('IL081025J'::text = (court_ori)::text)
>                     -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.051..0.058 rows=1 loops=8086)
>                           Filter: ('IL081025J'::text = (id)::text)
>               -> Index Scan using case_data_pkey on case_data (cost=0.00..5.36 rows=2 width=53) (actual time=3.569..3.572 rows=1 loops=8086)
>                     Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
> Total runtime: 63727.873 ms
>
> tcpip_socket = true
> max_connections = 100
> shared_buffers
Re: [PERFORM] I'm configuring a new system (Biggish) and need some advice.
7.4 is the pg version, BTW; going to switch to 8 if it's worth it.
[PERFORM] Query plan looks OK, but slow I/O - settings advice?
Summary
===
We are writing to the db pretty much 24 hours a day. Recently the amount of data we write has increased, and the query speed, formerly okay, has taken a dive. The query is using the indexes as expected, so I don't _think_ I have a query tuning issue, just an io problem. The first time a query is done it takes about 60 seconds. The second time it runs in about 6 seconds.

What I know I need advice on is io settings and various buffer settings. I may also need advice on other things, but just don't know it yet!

Below is ...
- an explain analyze
- details of the db setup and hardware
- some vmstat and iostat output showing the disks are very busy
- the SHOW ALL output for the db config.

Details
===
Postgres 8.0.3

Below is a sample query. (This is actually implemented as a prepared statement. Here I fill in the '?'s with actual values.)

electric=# EXPLAIN ANALYZE
electric-# SELECT datavalue, logfielddatatype, timestamp FROM logdata_recent
electric-# WHERE (logfielddatatype = 70 OR logfielddatatype = 71 OR logfielddatatype = 69)
electric-# AND graphtargetlog = 1327
electric-# AND timestamp >= 1123052400 AND timestamp <= 1123138800
electric-# ORDER BY timestamp;

QUERY PLAN
--
Sort (cost=82.48..82.50 rows=6 width=14) (actual time=60208.968..60211.232 rows=2625 loops=1)
  Sort Key: public.logdata_recent."timestamp"
  -> Result (cost=0.00..82.41 rows=6 width=14) (actual time=52.483..60200.868 rows=2625 loops=1)
        -> Append (cost=0.00..82.41 rows=6 width=14) (actual time=52.476..60189.929 rows=2625 loops=1)
              -> Seq Scan on logdata_recent (cost=0.00..46.25 rows=1 width=14) (actual time=0.003..0.003 rows=0 loops=1)
                    Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
              -> Index Scan using logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl, logdata_recent_1123085306_ix_t_fld_gtl on logdata_recent_stale logdata_recent (cost=0.00..18.08 rows=3 width=14) (actual time=52.465..60181.624 rows=2625 loops=1)
                    Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                    Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
              -> Index Scan using logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl, logdata_recent_1123139634_ix_t_fld_gtl on logdata_recent_active logdata_recent (cost=0.00..18.08 rows=2 width=14) (actual time=0.178..0.178 rows=0 loops=1)
                    Index Cond: ((("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 70) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 71) AND (graphtargetlog = 1327)) OR (("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800) AND (logfielddatatype = 69) AND (graphtargetlog = 1327)))
                    Filter: (((logfielddatatype = 70) OR (logfielddatatype = 71) OR (logfielddatatype = 69)) AND (graphtargetlog = 1327) AND ("timestamp" >= 1123052400) AND ("timestamp" <= 1123138800))
Total runtime: 60214.545 ms
(13 rows)

60 seconds is much longer than it used to be. I would guess it used to be under 10 seconds. The second time the above query is run we see the magic of caching as the time goes down to 6 seconds.

logdata_recent_active and logdata_recent_stale are inherited tables of logdata_recent, which never has any data. (This is pseudo-partitioning in action!) So the very quick seq scan on the empty logdata_recent parent table is okay with me. The index is built on timestamp, logfielddatatype, graphtargetlog. I am curious as to why the same index shows up 3 times in the "using" clause, but can live without knowing the details as long as it doesn't indicate that something's wrong.

The logdata_recent_stale table has 5 million rows. The size of the table itself, on disk, is 324MB. The size of the index is 210MB.

The disks are ext3 with journalling type of ordered, but this was later changed to writeback with no apparent change in speed.
Re: [PERFORM] Performance pb vs SQLServer.
> One little thing. Did you shut down SQL2000 while testing
> postgresql? Remember that postgresql uses the system cache.
> SQL2000 uses a large part of memory as buffer and it will not
> be available to the operating system. I must say that, probably,
> results will be the same, but it will be a better test.

Shutting down SQL2000 has no effect on PG performance.

Stephane.
Re: [PERFORM] Need for speed
> Ulrich Wisser wrote:
> > one of our services is click counting for on line advertising. We do
> > this by importing Apache log files every five minutes. This results in a
> > lot of insert and delete statements. ...
>
> If you are doing mostly inserting, make sure you are in a transaction,

Well, yes, but you may need to make sure that a single transaction doesn't have too many inserts in it. I was having a performance problem when doing transactions with a huge number of inserts (tens of thousands), and I solved the problem by putting a simple counter in the loop (in the Java import code, that is) and doing a commit every 100 or so inserts.

-Roger

> John
>
> > Ulrich
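The counter itself lives in the Java import code, but purely as an illustration of the batching idea, the stream of statements the server ends up seeing looks roughly like the sketch below (the clicks table and its columns are made up for the example):

-- batch of ~100 inserts per transaction instead of one giant transaction
BEGIN;
INSERT INTO clicks (ad_id, ts) VALUES (1001, '2005-08-19 12:00:00');
INSERT INTO clicks (ad_id, ts) VALUES (1002, '2005-08-19 12:00:01');
-- ... roughly 100 inserts ...
COMMIT;
BEGIN;
-- ... next batch of ~100 inserts ...
COMMIT;

Keeping each transaction to a bounded size avoids the pathological behaviour seen with tens of thousands of inserts in a single transaction, while still amortizing the per-commit cost over many rows.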
Re: [PERFORM] Performance pb vs SQLServer.
> From: Magnus Hagander [mailto:[EMAIL PROTECTED]
> Out of curiosity, what plan do you get from SQLServer? I bet
> it's a clustered index scan...
>
> //Magnus

I have a Table scan and a Hashaggregate...

Stephane
Re: [PERFORM] [JDBC] Performance problem using V3 protocol in jdbc driver
Barry,

I have had a similar experience, moving a big Oracle database to Postgres 8.0.3 on linux. The first impact was similar: huge performance problems. The main problem was bad planner choices. The cause in our case: bad parameter types in the jdbc set methods (I guess you use Java). For Oracle we used the NUMERIC type to set primary keys, but the postgres id type used was BIGINT, and it just refused to use the index in this case. Imagine that kicking in on a 100 million row table... a sequential scan started a few times a second, now that made the DB unusable. So we fixed the code so that for Oracle it continues to use NUMERIC and for postgres it uses BIGINT, and that is very important on setNull calls too.

One very useful tool was the following query:

prepare ps as SELECT procpid, substring(current_query for 97), to_char((now()-query_start), 'HH24:MI:SS') as t FROM pg_stat_activity where current_query not like '%

> We just moved a large production instance of ours from Oracle to
> Postgres 8.0.3 on linux. When running on Oracle the machine hummed
> along using about 5% of the CPU easily handling the fairly constant
> load, after moving the data to Postgres the machine was pretty much
> maxed out on CPU and could no longer keep up with the transaction
> volume. On a hunch I switched the jdbc driver to using the V2
> protocol and the load on the machine dropped down to what it was when
> using Oracle and everything was fine.
>
> Now obviously I have found a work around for the performance problem,
> but I really don’t want to rely on using the V2 protocol forever, and
> don’t want to have to recommend to our customers that they need to run
> with the V2 protocol. So I would like to resolve the problem and be
> able to move back to a default configuration with the V3 protocol and
> the benefits thereof.
>
> The problem is that I don’t really know where to begin to debug a
> problem like this. In development environments and testing
> environments we have not seen performance problems with the V3
> protocol in the jdbc driver. But they don’t come close to approaching
> the transaction volume of this production instance.
>
> What I see when running the V3 protocol under ‘top’ is that the
> postgres processes are routinely using 15% or more of the CPU each,
> when running the V2 protocol they use more like 0.3%.
>
> Does anyone have any suggestions on an approach to debug a problem
> like this?
>
> Thanks,
> --Barry
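The plan flip being described can be seen directly in SQL. A minimal sketch, using a hypothetical table rather than Barry's schema (the plan fragments in the comments are indicative, not actual output): when the parameter arrives as NUMERIC the comparison is done in numeric and the btree index on the bigint column cannot be used, whereas a BIGINT parameter gets a plain index scan.

-- hypothetical stand-in for the 100-million-row table
CREATE TABLE accounts (id bigint PRIMARY KEY, payload text);

-- parameter bound as NUMERIC (what the Oracle-oriented code did):
EXPLAIN SELECT * FROM accounts WHERE id = 12345::numeric;
--   Seq Scan on accounts
--     Filter: ((id)::numeric = 12345::numeric)

-- parameter bound as BIGINT (the fix):
EXPLAIN SELECT * FROM accounts WHERE id = 12345::bigint;
--   Index Scan using accounts_pkey on accounts
--     Index Cond: (id = 12345)

In JDBC terms this is the difference between setBigDecimal/NUMERIC and setLong/BIGINT (including the type constant passed to setNull), which is exactly the fix described above.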
Re: [PERFORM] [JDBC] Data Selection Slow From VB 6.0
Mahesh Shinde wrote:
> Hi, I am using Postgres version PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.2.2 20030222 (Red Hat Linux 3.2.2-5), for a multi-user desktop application using VB 6.0 as a front end tool. To connect to PostgreSQL I am using PostgreSQL Win32 ODBC and OLEDB client drivers 1.0.0.2.

pgsql-jdbc isn't relevant, then -- the JDBC driver is not involved.

-O
Re: [PERFORM] Performance pb vs SQLServer.
John Arbash Meinel wrote:
> You might also try a different query, something like:
>
> SELECT DISTINCT cod FROM mytable GROUP BY cod ORDER BY cod;
> (You may or may not want order by, or group by, try the different
> combinations.)
> It might be possible to have the planner realize that all you
> want is unique rows, just doing a group by doesn't give you that.
>
> John
> =:->

Thanks John, but SELECT DISTINCT, with or without ORDER BY or GROUP BY, takes about 30 sec (with the index) - I stopped it at 200 sec without the index... So the HashAggregate plan is much better than the index scan...

> Thanks for help.
>
> Stéphane COEZ
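For what it's worth, the reason the GROUP BY formulation can win on these Postgres versions is that SELECT DISTINCT is planned as a sort plus unique pass, while GROUP BY is allowed to use a HashAggregate. A minimal sketch (mytable/cod as in the thread; the plan in the comments is indicative only, not actual output):

EXPLAIN ANALYZE SELECT cod FROM mytable GROUP BY cod;
--   HashAggregate
--     ->  Seq Scan on mytable

If the ordered result is still wanted, the small grouped output can then be sorted cheaply, rather than sorting or index-walking the whole table first.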
Re: [PERFORM] FW: Tx forecast improving hardware capabilities.
On 18 Aug 2005, at 16:01, Sebastian Lallana wrote:
> Does something like this exist? Does anybody have experience with this subject?

I've just been through this with a client with both a badly tuned Pg and an application being less than optimal.

First, find a benchmark. Just something you can hold on to. For us, it was the generation time of the site's home page. In this case, 7 seconds.

We looked hard at postgresql.conf, planned the memory usage, sort_memory and all that. That was a boost. Then we looked at the queries that were being thrown at the database. Over 200 to build one page! So, a layer of caching was built into the web server layer. Finally, some frequently occurring combinations of queries were pushed down into stored procs.

We got the page gen time down to 1.5 seconds AND the server stayed stable under extreme stress. So, a fair win. Thanks to cms for several clues.

So, without understanding your application and where it's taking the time, you can't begin to estimate hardware usage.
Re: [PERFORM] extremely low memory usage
At 01:55 PM 8/18/2005, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > here's an example standard query. I really have to make the first hit go
> > faster. The table is clustered on full_name as well. 'Smith%'
> > took 87 seconds on the first hit. I wonder if I set up my array wrong.
> > I remember seeing something about DMA access versus something else, and
> > chose DMA access. LVM maybe?
>
> It would be nice if you would format your queries to be a little bit easier to read before posting them. However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete. The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

How big are litigant_details and case_data? If they can fit in RAM, preload them using methods like the "cat to /dev/null" trick and those table lookups will be ~100-1000x faster. If they won't fit into RAM but the machine can be expanded to hold enough RAM to fit the tables, it's well worth the ~$75-$150/GB to upgrade the server so that the tables will fit into RAM.

If they can't be made to fit into RAM as atomic entities, you have a few choices:

A= Put the data tables and indexes on separate dedicated spindles and put litigant_details and case_data each on their own dedicated spindles. This will lower seek conflicts. Again, if this requires buying some more HDs, it's well worth it.

B= Break litigant_details and case_data into a set of smaller tables (based on something sane like the first n characters of the primary key) such that the smaller tables easily fit into RAM. Given that you've said only 10GB/60GB is "hot", this could work very well. Combine it with "A" above (put all the litigant_details sub tables on one dedicated spindle set and all the case_data sub tables on another spindle set) for added oomph.

C= Buy a SSD big enough to hold litigant_details and case_data and put them there. Again, this can be combined with "A" and "B" above to lessen the size of the SSD needed.

> So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.
>
> So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

Yes, that certainly sounds like it would be more efficient.

> More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running: set enable_nestloop to off; and then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.
> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>    -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>       -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>             Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>             Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>             Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>             Filter: ('IL081025J'::text = (court_ori)::text)
>       -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>          Filter: ('IL081025J'::text = (id)::text)
>    -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>       Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
Re: [PERFORM] limit number of concurrent callers to a stored proc?
Christopher wrote:
> You could use a 1 column/1 row table perhaps. Use some sort of locking
> mechanism.
>
> Also, check out contrib/userlock

userlock is definitely the way to go for this type of problem. They are really the only way to provide locking facilities that live outside transactions. You are provided with 48 bits of lock space in the form of an offset/block in a 32 bit field plus a 16 bit field. The 16 bit field could be the pid of the locker and the 32 bit field the oid of the function.

Unfortunately, userlocks are not really easy to query via the pg_locks view. However, this has been addressed for 8.1. In 8.1, it will be trivial to create a function which checks the number of lockers on the function oid and acquires a lock if there are fewer than a certain amount.

Merlin
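The exact contrib/userlock call names aren't shown in the thread, so here is a sketch of the same "limited slots" idea written against the built-in advisory locks that later replaced contrib/userlock (PostgreSQL 8.2 and up); the key value 42, the function name, and the slot scheme are all invented for illustration:

-- limit concurrent callers of an expensive procedure to max_callers;
-- returns the slot number acquired, or NULL if all slots are busy
CREATE OR REPLACE FUNCTION acquire_proc_slot(max_callers int) RETURNS int AS $$
DECLARE
    slot int;
BEGIN
    FOR slot IN 1 .. max_callers LOOP
        -- session-level lock, held until pg_advisory_unlock(42, slot) or disconnect
        IF pg_try_advisory_lock(42, slot) THEN
            RETURN slot;
        END IF;
    END LOOP;
    RETURN NULL;  -- all slots taken, caller should back off or queue
END;
$$ LANGUAGE plpgsql;

A caller would run SELECT acquire_proc_slot(3) before invoking the expensive stored proc and SELECT pg_advisory_unlock(42, slot) afterwards, giving at most three concurrent executions without holding anything open across transactions.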
Re: [PERFORM] extremely low memory usage
Jeremiah Jahn wrote:
> here's an example standard query. I really have to make the first hit go
> faster. The table is clustered on full_name as well. 'Smith%'
> took 87 seconds on the first hit. I wonder if I set up my array wrong.
> I remember seeing something about DMA access versus something else, and
> chose DMA access. LVM maybe?

It would be nice if you would format your queries to be a little bit easier to read before posting them. However, I believe I am reading it correctly, to say that the index scan on identity is not your slow point. In fact, as near as I can tell, it only takes 52ms to complete. The expensive parts are the 4915 lookups into litigant_details (each one takes approx 4ms for a total of ~20s). And then you do it again on case_data (average 3ms each * 4906 loops = ~15s).

So there is no need for preloading your indexes on the identity table. It is definitely not the bottleneck.

So a few design bits, which may help your database. Why is "actor_id" a text field instead of a number? You could try creating an index on "litigant_details (actor_id, court_ori)" so that it can do just an index lookup, rather than an index + filter.

More importantly, though, the planner seems to think the join of identity to litigant_details will only return 1 row, not 5000. Do you regularly vacuum analyze your tables? Just as a test, try running: set enable_nestloop to off; and then run EXPLAIN ANALYZE again, just to see if it is faster. You probably need to increase some statistics targets, so that the planner can design better plans.

> -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
>    -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
>       -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
>          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
>          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
>             Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
>             Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
>          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
>             Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
>             Filter: ('IL081025J'::text = (court_ori)::text)
>       -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
>          Filter: ('IL081025J'::text = (id)::text)
>    -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
>       Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))

John
=:->
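As a sketch, the two experiments John suggests look like the following; the statistics value of 1000 and the choice of columns are only guesses based on the plan above, not something stated in the thread:

-- raise the statistics target on the join/filter columns, then re-analyze
ALTER TABLE identity ALTER COLUMN court_ori SET STATISTICS 1000;
ALTER TABLE litigant_details ALTER COLUMN actor_id SET STATISTICS 1000;
ANALYZE identity;
ANALYZE litigant_details;

-- and, purely as a session-local experiment, see whether a
-- non-nested-loop plan is faster
SET enable_nestloop TO off;
EXPLAIN ANALYZE SELECT ...;   -- the query from the thread
SET enable_nestloop TO on;

Better row estimates from the larger statistics sample are the real fix; disabling nested loops is only a diagnostic to confirm that the planner's 1-row estimate is what pushes it into the slow plan.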
Re: [PERFORM] extremely low memory usage
here's an example standard query. I really have to make the first hit go faster. The table is clustered on full_name as well. 'Smith%' took 87 seconds on the first hit. I wonder if I set up my array wrong. I remember seeing something about DMA access versus something else, and chose DMA access. LVM maybe?

explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq from identity,court,litigant_details,case_data where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = case_data.court_ori and case_data.case_id = identity.case_id and identity.court_ori = 'IL081025J' and full_name like 'MILLER%' order by full_name;

QUERY PLAN
---
Unique (cost=20411.84..20411.91 rows=2 width=173) (actual time=38340.231..38355.120 rows=4906 loops=1)
  -> Sort (cost=20411.84..20411.84 rows=2 width=173) (actual time=38340.227..38343.667 rows=4906 loops=1)
        Sort Key: identity.full_name, case_data.case_category, identity.identity_id, court.name, litigant_details.case_id, case_data.case_year, identity.date_of_birth, litigant_details.assigned_case_role, litigant_details.court_ori, litigant_details.actor_id, case_data.type_code, case_data.subtype_code, litigant_details.impound_litigant_data, to_number(ltrim(ltrim((case_data.case_id)::text, (case_data.case_year)::text), (case_data.type_code)::text), '99'::text)
        -> Nested Loop (cost=0.00..20411.83 rows=2 width=173) (actual time=12.891..38317.017 rows=4906 loops=1)
              -> Nested Loop (cost=0.00..20406.48 rows=1 width=159) (actual time=12.826..23232.106 rows=4906 loops=1)
                    -> Nested Loop (cost=0.00..20403.18 rows=1 width=138) (actual time=12.751..22885.439 rows=4906 loops=1)
                          Join Filter: (("outer".case_id)::text = ("inner".case_id)::text)
                          -> Index Scan using name_speed on identity (cost=0.00..1042.34 rows=4868 width=82) (actual time=0.142..52.538 rows=4915 loops=1)
                                Index Cond: (((full_name)::text >= 'MILLER'::character varying) AND ((full_name)::text < 'MILLES'::character varying))
                                Filter: (((court_ori)::text = 'IL081025J'::text) AND ((full_name)::text ~~ 'MILLER%'::text))
                          -> Index Scan using lit_actor_speed on litigant_details (cost=0.00..3.96 rows=1 width=81) (actual time=4.631..4.635 rows=1 loops=4915)
                                Index Cond: (("outer".actor_id)::text = (litigant_details.actor_id)::text)
                                Filter: ('IL081025J'::text = (court_ori)::text)
                    -> Seq Scan on court (cost=0.00..3.29 rows=1 width=33) (actual time=0.053..0.062 rows=1 loops=4906)
                          Filter: ('IL081025J'::text = (id)::text)
              -> Index Scan using case_speed on case_data (cost=0.00..5.29 rows=3 width=53) (actual time=3.049..3.058 rows=1 loops=4906)
                    Index Cond: (('IL081025J'::text = (case_data.court_ori)::text) AND ((case_data.case_id)::text = ("outer".case_id)::text))
Total runtime: 38359.722 ms
(18 rows)

copa=> explain analyze select distinct case_category,identity_id,court.name,litigant_details.case_id,case_year,date_of_birth,assigned_case_role,litigant_details.court_ori,full_name,litigant_details.actor_id,case_data.type_code,case_data.subtype_code,litigant_details.impound_litigant_data, to_number(trim(leading case_data.type_code from trim(leading case_data.case_year from case_data.case_id)),'99') as seq from identity,court,litigant_details,case_data where identity.court_ori = litigant_details.court_ori and identity.case_id = litigant_details.case_id and identity.actor_id = litigant_details.actor_id and court.id = identity.court_ori and identity.court_ori = c
Re: [PERFORM] extremely low memory usage
Jeremiah Jahn wrote: >On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote: > > >>Jeremiah Jahn wrote: >> >> >>>I just put together a system with 6GB of ram on a 14 disk raid 10 array. >>>When I run my usual big painful queries, I get very little to know >>>memory usage. My production box (raid 5 4GB ram) hovers at 3.9GB used >>>most of the time. the new devel box sits at around 250MB. >>> >>>I've switched to an 8.0 system on the new devel box, but the .conf >>>really didn't change. Index usage is the same. Something seems wrong and >>>I'm not sure why. >>> >>> >>> >>How big is your actual database on disk? And how much of it is actually >>touched by your queries? >> >> >The DB is about 60GB. About 10GB is actually used in real queries, >versus get me this single record with this ID. I have a large query that >finds court cases based on certain criteria that is name based. I get a >full seq scan on the name table in about 7 seconds, This table has about >6 million names (most being 'smith, something'). The index scan takes >much less time of course, once it's been cached (somewhere but not >apparently memory). The really query can take 60 seconds on a first run. >And 1.3 seconds on a second run. I'm very happy with the cached results, >just not really sure where that caching is happening since it doesn't >show up as memory usage. I do know that the caching that happens seems >to be independent of the DB. I can restart the DB and my speeds are >still the same as the cached second query. Is there some way to >pre-cache some of the tables/files on the file system? If I switch my >query to search for 'jones%' instead of 'smith%', I take a hit. But if I >then rerun the smith search, I still get cached speed. I only have two >tables essentially names and events that have to do any real work ie. >not very atomic data. I'd love to be able to force these two tables into >a cache somewhere. This is a linux system (RHEL ES4) by the way. > > I think what is happening is that *some* of the index pages are being cached, just not all of them. Most indexes (if you didn't specify anything special) are btree, so that you load the root page, and then determine what pages need to be loaded from there. So the "jones%" pages aren't anywhere near the "smith%" pages. And don't need to be loaded if you aren't accessing them. So the required memory usage might be smaller than you think. At least until all of the index pages have been accessed. The reason it is DB independent is because the OS is caching a file access (you read a file, it keeps the old pages in RAM in case you ask for it again). Part of the trick, is that as you use the database, it will cache what has been used. So you may not need to do anything. It should sort itself out with time. However, if you have to have cached performance as soon as your machine reboots, you could figure out what files on disk represent your indexes and tables, and then just "cat $files >/dev/null" That should cause a read on those files, which should pull them into the memory cache. *However* this will fail if the size of those files is greater than available memory, so you may want to be a little bit stingy about what you preload. Alternatively, you could just write an SQL script which runs a bunch of indexed queries to make sure all the pages get loaded. Something like: FOR curname IN SELECT DISTINCT name FROM users LOOP SELECT name FROM users WHERE name=curname; END LOOP; That should make the database go through the entire table, and load the index for every user. 
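The loop above is pseudocode; to actually run it, it can be wrapped in a throwaway plpgsql function. In this sketch users/name are John's illustrative names (not Jeremiah's real schema), and selecting count(*) into a dummy variable is just a cheap way to force each indexed probe:

-- warm the index and heap by probing every distinct value once
CREATE OR REPLACE FUNCTION preload_users() RETURNS void AS $$
DECLARE
    curname text;
    dummy   bigint;
BEGIN
    FOR curname IN SELECT DISTINCT name FROM users LOOP
        SELECT count(*) INTO dummy FROM users WHERE name = curname;
    END LOOP;
END;
$$ LANGUAGE plpgsql;

SELECT preload_users();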
This is overkill, and will probably take a long time to execute. But you could do it if you wanted.

>> It seems that your tough queries might only be exercising a portion of
>> the database. If you really want to make memory usage increase try
>> something like:
>> find . -type f -print0 | xargs -0 cat >/dev/null
>> Which should read all the files. After doing that, does the memory usage
>> increase?
>>
>>> any thoughts,
>>> -jj-
>>>
>>> shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
>>> work_mem = 2097151 # min 64, size in KB
>>
>> This seems awfully high. 2GB per sort? This might actually be flushing
>> some of your ram, since it would get allocated and filled, and then
>> freed when finished. Remember, depending on what you are doing, this
>> amount can get allocated more than once per query.
>
> What's a good way to determine the optimal size?

Practice. :) A few questions, I guess... How many concurrent connections are you expecting? How many joins does a standard query have? How big are the joins?

In general, I would tend to make this a smaller number, so that the os has more room to cache tables, rather than having big buffers for joins. If someone is requesting a join that requires a lot of rows, I would rather *that* query be slower than impacting everyone else. I would put it more with a maximum in the 20-100MB range.
Re: [PERFORM] FW: Tx forecast improving hardware capabilities.
Sebastian,

> We are having serious performance problems using JBOSS and PGSQL.

How about some information about your application? Performance tuning approaches vary widely according to what you're doing with the database. Also, read this: http://www.powerpostgresql.com/PerfList

> I'm sure the problem has to do with the application itself (and neither
> with JBOSS nor PGSQL) but the fact is that we are using desktop
> equipment to run both Jboss and Postgresql (An Athlon 2600, 1 Gb Ram,
> IDE HDD with 60 Mb/sec Transfer Rate), and the answers arise:

Well, first off, the IDE HDD is probably killing performance unless your application is 95% read or greater.

> If we upgrade our hardware to a Dual Processor would the transactions
> per second increase significantly? Would Postgresql take advantage from
> SMP? Presumably yes, but can we do a forecast about the number of tps?

If this is an OLTP application, chances are that nothing is going to improve performance until you get decent disk support.

> What we need is a paper with some figures showing the expected
> performance in different environments. Some study about the "degree of
> correlation" between TPS and Number of Processors, Cache, Frequency,
> Word Size, Architecture, etc.

I don't think such a thing exists even for Oracle. Hardware configuration for maximum performance is almost entirely dependent on your application.

If it helps, running DBT2 (an OLTP test devised by OSDL after TPC-C), I can easily get 1700 new orders per minute (NOTPM) (about 3000 total multiple-write transactions per minute) on a quad-Pentium-III with 4GB RAM and 14 drives, and 6500 NOTPM on a dual-Itanium machine.

> P.S. I've been looking at www.tpc.org but I couldn't find anything
> valuable.

Nor would you for any real-world situation, even if we had a TPC benchmark (which are involved and expensive; give us a couple of years). The TPC benchmarks are more of a litmus test that your database system & platform are "competitive"; they don't really relate to real-world performance (unless you have budget for a 112-disk system!)

--
Josh Berkus
Aglio Database Solutions
San Francisco
Re: [PERFORM] extremely low memory usage
On Wed, 2005-08-17 at 21:21 -0500, John A Meinel wrote:
> Jeremiah Jahn wrote:
> > I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> > When I run my usual big painful queries, I get very little to no
> > memory usage. My production box (raid 5, 4GB ram) hovers at 3.9GB used
> > most of the time. the new devel box sits at around 250MB.
> >
> > I've switched to an 8.0 system on the new devel box, but the .conf
> > really didn't change. Index usage is the same. Something seems wrong and
> > I'm not sure why.
>
> How big is your actual database on disk? And how much of it is actually
> touched by your queries?

The DB is about 60GB. About 10GB is actually used in real queries, versus "get me this single record with this ID". I have a large query that finds court cases based on certain criteria that is name based. I get a full seq scan on the name table in about 7 seconds. This table has about 6 million names (most being 'smith, something'). The index scan takes much less time of course, once it's been cached (somewhere, but not apparently memory). The real query can take 60 seconds on a first run, and 1.3 seconds on a second run. I'm very happy with the cached results, just not really sure where that caching is happening since it doesn't show up as memory usage. I do know that the caching that happens seems to be independent of the DB. I can restart the DB and my speeds are still the same as the cached second query. Is there some way to pre-cache some of the tables/files on the file system? If I switch my query to search for 'jones%' instead of 'smith%', I take a hit. But if I then rerun the smith search, I still get cached speed. I only have two tables essentially, names and events, that have to do any real work, i.e. not very atomic data. I'd love to be able to force these two tables into a cache somewhere. This is a linux system (RHEL ES4) by the way.

> It seems that your tough queries might only be exercising a portion of
> the database. If you really want to make memory usage increase try
> something like:
> find . -type f -print0 | xargs -0 cat >/dev/null
> Which should read all the files. After doing that, does the memory usage
> increase?
>
> > any thoughts,
> > -jj-
> >
> > shared_buffers = 32768 # min 16, at least max_connections*2, 8KB each
> > work_mem = 2097151 # min 64, size in KB
>
> This seems awfully high. 2GB per sort? This might actually be flushing
> some of your ram, since it would get allocated and filled, and then
> freed when finished. Remember, depending on what you are doing, this
> amount can get allocated more than once per query.

What's a good way to determine the optimal size?

> > maintenance_work_mem = 819200 # min 1024, size in KB
> > max_fsm_pages = 8 # min max_fsm_relations*16, 6 bytes each
> > checkpoint_segments = 30 # in logfile segments, min 1, 16MB each
> > effective_cache_size = 360 <- this is a little out of control, but
> > would it have any real effect?
>
> It should just tell the planner that it is more likely to have buffers
> in cache, so index scans are slightly cheaper than they would otherwise be.
>
> > random_page_cost = 2 # units are one sequential page fetch cost
> > log_min_duration_statement = 1 # -1 is disabled, in milliseconds.
> > lc_messages = 'C' # locale for system error message strings
> > lc_monetary = 'C' # locale for monetary formatting
> > lc_numeric = 'C' # locale for number formatting
> > lc_time = 'C' # locale for time formatting
>
> John
> =:->
[PERFORM] FW: Tx forecast improving hardware capabilities.
Hello:

We are having serious performance problems using JBOSS and PGSQL. I'm sure the problem has to do with the application itself (and neither with JBOSS nor PGSQL), but the fact is that we are using desktop equipment to run both JBoss and PostgreSQL (an Athlon 2600, 1 GB RAM, IDE HDD with 60 MB/sec transfer rate), and the questions arise:

If we upgrade our hardware to a dual processor, would the transactions per second increase significantly? Would PostgreSQL take advantage of SMP? Presumably yes, but can we make a forecast about the number of tps?

What we need is a paper with some figures showing the expected performance in different environments. Some study about the "degree of correlation" between TPS and number of processors, cache, frequency, word size, architecture, etc. Does something like this exist? Does anybody have experience with this subject?

Thanks in advance and best regards.

P.S. I've been looking at www.tpc.org but I couldn't find anything valuable.
Re: [PERFORM] Performance pb vs SQLServer.
Qingqing Zhou wrote:
> "Alvaro Herrera" <[EMAIL PROTECTED]> writes
> >> Interesting; do they use an overwriting storage manager like Oracle, or
> >> a non-overwriting one like Postgres?
>
> They call this MVCC "RLV (row level versioning)". I think they use a rollback
> segment like Oracle (a.k.a. "version store" or tempdb in SQL Server). Some
> details are explained in their white paper: "Database concurrency and row
> level versioning in SQL Server 2005".
>
> Regards,
> Qingqing

I found the paper here:
http://www.microsoft.com/technet/prodtechnol/sql/2005/cncrrncy.mspx

And it does sound like they are doing it the Oracle way:

"When a record in a table or index is updated, the new record is stamped with the transaction sequence_number of the transaction that is doing the update. The previous version of the record is stored in the version store, and the new record contains a pointer to the old record in the version store. Old records in the version store may contain pointers to even older versions. All the old versions of a particular record are chained in a linked list, and SQL Server may need to follow several pointers in a list to reach the right version. Version records need to be kept in the version store only as long as there are operations that might require them."

John
=:->
Re: [PERFORM] extremely low memory usage
On Aug 17, 2005, at 10:11 PM, Jeremiah Jahn wrote:
> I just put together a system with 6GB of ram on a 14 disk raid 10 array.
> When I run my usual big painful queries, I get very little to no
> memory usage. My production box (raid 5, 4GB ram) hovers at 3.9GB used
> most of the time. the new devel box sits at around 250MB.

Is the system performing fine? Are you touching as much data as the production box? If the system is performing fine, don't worry about it.

> work_mem = 2097151 # min 64, size in KB

This is EXTREMELY high. You realize this is the amount of memory that can be used per-sort and per-hash build in a query? You can end up with multiples of this on a single query. If you have some big queries that are run infrequently, have them set it manually.

> effective_cache_size = 360 <- this is a little out of control, but
> would it have any real effect?

This doesn't allocate anything - it is a hint to the planner about how much data it can assume is cached.

--
Jeff Trout <[EMAIL PROTECTED]>
http://www.jefftrout.com/
http://www.stuarthamm.net/
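Setting it manually for the occasional big query just means issuing a SET around that statement; a minimal sketch (the 512MB figure is only an example value, and the query is a placeholder):

-- per-session override for one expensive report, then back to the default
SET work_mem = 524288;          -- value is in KB on 8.0, so this is 512MB
SELECT ...;                     -- the big, infrequent query
RESET work_mem;

Everything else on the system keeps running with the modest global setting, so the memory is only committed while the one query that genuinely needs a large sort or hash is executing.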
Re: [PERFORM] Performance pb vs SQLServer.
"Alvaro Herrera" <[EMAIL PROTECTED]> writes > > Interesting; do they use an overwriting storage manager like Oracle, or > a non-overwriting one like Postgres? > They call this MVCC "RLV(row level versioning)". I think they use rollback segment like Oracle (a.k.a "version store" or tempdb in SQL Server). Some details are explained in their white paper:"Database concurrency and row level versioning in SQL Server 2005". Regards, Qingqing ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly