Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?
The intuitive thing would be to put pg into a file system.

/Aaron

On Thu, 21 Oct 2004 12:44:10 +0200, Leeuw van der, Tim [EMAIL PROTECTED] wrote:

Hi,

I guess the difference is in 'severe hacking inside PG' vs. 'some unknown amount of hacking that doesn't touch PG code'.

Hacking PG internally to handle raw devices will meet with strong resistance from large portions of the development team. I don't expect (m)any core devs of PG will be excited about rewriting the entire I/O architecture of PG and duplicating large amounts of OS-type code inside the application, just to try to attain an unknown performance benefit.

PG doesn't use one big file, as some databases do, but many small files. PG would then need to be able to do its own file management if you put the PG database on a raw disk partition! That's icky stuff, and you'll find much resistance against putting such code inside PG.

So why not try to have the external FS know a bit about PG and its directory layout and its I/O requirements? That type of code can at least be maintained outside the application, and will not be as much of a burden to the rest of the application. (I'm not sure if it's a good idea to create a PG-specific FS in your OS of choice, but it's certainly gonna be easier than getting FS code inside of PG.)

cheers,

--Tim

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Steinar H. Gunderson
Sent: Thursday, October 21, 2004 12:27 PM
To: [EMAIL PROTECTED]
Subject: Re: [PERFORM] Anything to be gained from a 'Postgres Filesystem'?

On Thu, Oct 21, 2004 at 08:58:01AM +0100, Matt Clark wrote:

I suppose I'm just idly wondering really. Clearly it's against PG philosophy to build an FS or direct I/O management into PG, but now that it's so relatively easy to plug filesystems into the main open-source OSes, it struck me that there might be some useful changes to, say, XFS or ext3 that could be made that would help PG out.
This really sounds to me like a poor replacement for just making PostgreSQL use raw devices. (I have no idea why that isn't done already, but presumably it isn't all that easy to get right. :-) )

/* Steinar */
--
Homepage: http://www.sesse.net/

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

--
Regards,
/Aaron
Re: [PERFORM] Simple machine-killing query!
Sounds like you need some way to match a subset of the data first, rather than try indices that are bigger than the data. Can you add operation indices, perhaps on the first 10 bytes of the keys in both tables, or on an integer hash of all of the strings? If so, you could join on the exact set difference over the set difference of the operation match.

/Aaron

On Thu, 21 Oct 2004 17:34:17 +0300, Victor Ciurus [EMAIL PROTECTED] wrote:

Hi all,

I'm writing this because I've reached the limit of my imagination and patience! So here it is...

2 tables:
- 1 containing 27 million variable-length, alphanumeric records (strings) in 1 (one) field (10-145 char length per record)
- 1 containing 2.5 million variable-length, alphanumeric records (strings) in 1 (one) field

The tables were created using:

CREATE TABLE public.BIGMA (string VARCHAR(255) NOT NULL) WITH OIDS;
CREATE INDEX BIGMA_INDEX ON public.BIGMA USING btree (string);

and

CREATE TABLE public.DIRTY (string VARCHAR(128) NOT NULL) WITH OIDS;
CREATE INDEX DIRTY_INDEX ON public.DIRTY USING btree (string);

What I am requested to do is to keep all records from 'BIGMA' that do not appear in 'DIRTY'. So far I have tried solving this by going for:

[explain] select * from BIGMA where string not in (select * from DIRTY);
                               QUERY PLAN
 Seq Scan on bigma  (cost=0.00..24582291.25 rows=500 width=145)
   Filter: (NOT (subplan))
   SubPlan
     ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
(4 rows)

AND

[explain] select * from bigma,dirty where bigma.email!=dirty.email;
                               QUERY PLAN
 Nested Loop  (cost=20.00..56382092.13 rows=2491443185 width=227)
   Join Filter: ((inner.email)::text <> (outer.email)::text)
   ->  Seq Scan on dirty  (cost=0.00..42904.63 rows=2503963 width=82)
   ->  Materialize  (cost=20.00..30.00 rows=1000 width=145)
         ->  Seq Scan on bigma  (cost=0.00..20.00 rows=1000 width=145)
(5 rows)

Now the problem is that both of my previous tries seem to last forever! I'm not a pgsql guru, so that's why I'm asking you fellas to guide me right!
I've tried this on MySQL previously, but there seems to be no way MySQL can handle this large query either.

QUESTIONS:
What can I do in order to make this work? Where am I making mistakes? Is there a way I can improve performance through table design, query style, or server settings so that I can get this monster going and producing a result?

Thanks all for your precious time and answers!

Victor C.

--
Regards,
/Aaron
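For what it's worth, the set difference Victor describes is usually written as an anti-join. A hedged sketch against his actual tables (NOT EXISTS lets the planner probe DIRTY_INDEX for each outer row instead of re-scanning DIRTY):

```sql
-- Keep every BIGMA string that never appears in DIRTY.
-- Unlike NOT IN with a large subquery, NOT EXISTS can resolve each
-- outer row with a single index probe into DIRTY.
SELECT b.string
  FROM public.BIGMA b
 WHERE NOT EXISTS (SELECT 1
                     FROM public.DIRTY d
                    WHERE d.string = b.string);
```

EXPLAIN on this form should show an index scan on DIRTY_INDEX inside the subplan. Note that the bigma.email != dirty.email cross-join variant is not equivalent at all: it returns each BIGMA row once per non-matching DIRTY row.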
Re: [PERFORM] Free PostgreSQL Training, Philadelphia, Oct 30
I'm driving from Tenafly, NJ and going to both sessions. If you're able to get to the George Washington Bridge (A train to 178th Street [Port Authority North] and a bus over the bridge), I can drive you down. I'm not sure right now about the return, because I have confused plans to meet someone.

/Aaron

On Tue, 19 Oct 2004 14:43:29 -0400, Max Baker [EMAIL PROTECTED] wrote:

On Wed, Oct 13, 2004 at 12:21:27PM -0400, Aaron Mulder wrote:

All,

My company (Chariot Solutions) is sponsoring a day of free PostgreSQL training by Bruce Momjian (one of the core PostgreSQL developers). The day is split into 2 sessions (plus a Q&A session):

* Mastering PostgreSQL Administration
* PostgreSQL Performance Tuning

Registration is required, and space is limited. The location is Malvern, PA (a suburb of Philadelphia) and it's on Saturday, Oct 30. For more information or to register, see http://chariotsolutions.com/postgresql.jsp

I'm up in New York City and would be taking the train down to Philly. Is anyone coming from Philly or New York who would be able to give me a lift to/from the train station? Sounds like a great event.

Cheers,
-m

--
Regards,
/Aaron
Re: [PERFORM] Performance vs Schemas
Right - if you split a table into a lot of more selective tables, it can often dramatically change the plan options (e.g., in a single table, selectivity for a query may be 1% and require an expensive nested loop, while in the more restrictive table it may match 14% of the data and do a cheaper scan). Also - don't forget that just rebuilding a database cleanly can dramatically improve performance.

The only DBMS I know of that indexes views is MS SQL Server 2000, where it is a limited form of materialized queries. pg doesn't do MQs, but check out functional indices.

/Aaron

- Original Message -
From: Gregory S. Williamson [EMAIL PROTECTED]
To: Igor Maciel Macaubas [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Thursday, October 14, 2004 2:45 PM
Subject: Re: [PERFORM] Performance vs Schemas

Igor,

I'm not sure if it is proper to state that schemas are themselves speeding things up. As an example, we have data that is usually accessed by county; when we put all of the data into one big table and select from it using a code for a county of interest, the process is fairly slow, as there are several hundred thousand candidate rows from that county in a table with many millions of rows. When we broke certain aspects of the data out into schemas (one per county), the searches became very fast indeed, because we can skip the search for a specific county code in the relevant tables and there is less (unneeded) data in the table being searched.

As always, EXPLAIN ANALYZE ... is your friend in understanding what the planner is doing with a given query. See http://www.varlena.com/varlena/GeneralBits/Tidbits/ for some useful information, especially under the performance tips section.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From: Igor Maciel Macaubas [mailto:[EMAIL PROTECTED]
Sent: Thu 10/14/2004 11:38 AM
To: [EMAIL PROTECTED]
Cc:
Subject: [PERFORM] Performance vs Schemas

Hi all,

I recently migrated my database from the single 'public' schema to multiple schemas.
I have around 100 tables, divided them into 14 different schemas, and then adapted my application to use schemas as well. I noticed that query / insert / update times got pretty much faster than when I was using the old single schema, and I'd just like to confirm with you whether using schemas speeds things up. Is that true?

What else can I do to speed up query processing - best practices, recommendations...? What about indexed views, does PostgreSQL support them?

Regards,
Igor
--
[EMAIL PROTECTED]
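On Aaron's pointer to functional indices: a minimal sketch of an expression index, with hypothetical table and column names (not from Igor's schema):

```sql
-- Index an expression instead of a bare column; any query that
-- filters on the exact same expression can use the index.
CREATE INDEX pages_lower_url_idx ON pages (lower(url));

-- This predicate matches the indexed expression, so the planner
-- may use pages_lower_url_idx instead of a sequential scan:
SELECT * FROM pages WHERE lower(url) = 'http://example.com/';
```

This is the closest built-in substitute for an indexed view: precompute the expensive expression once per row, at index build time, instead of per query.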
Re: [PERFORM] Normal case or bad query plan?
Makes sense. See the DB2 8.2 information on their new implementation of cross-column statistics. If this pattern is common and you're willing to change code, you can fake it by adding an operation index on some hash function of both columns, and searching on both columns and the hash.

- Original Message -
From: Kris Jurka [EMAIL PROTECTED]
To: Gabriele Bartolini [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Monday, October 11, 2004 5:17 PM
Subject: Re: [PERFORM] Normal case or bad query plan?

On Mon, 11 Oct 2004, Gabriele Bartolini wrote:

 Seq Scan on ip2location  (cost=0.00..30490.65 rows=124781 width=8) (actual time=5338.120..40237.283 rows=1 loops=1)
   Filter: ((1040878301::bigint >= ip_address_from) AND (1040878301::bigint <= ip_address_to))
 Total runtime: 40237.424 ms

I believe the problem is that pg's lack of cross-column statistics is producing the poor number-of-rows estimate. The number of rows matching just the first 1040878301::bigint >= ip_address_from condition is 122774, which is roughly 10% of the table. I imagine the query planner believes that the other condition alone will match the other 90% of the table. The problem is that it doesn't know that these two ranges' intersection is actually tiny. The planner assumes a complete or nearly complete overlap, so it thinks it will need to fetch 10% of the rows from both the index and the heap, and chooses a seqscan.

Kris Jurka
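Aaron's hash-function workaround applies when queries constrain both columns by equality. A hedged sketch with hypothetical names (md5 over a delimited concatenation stands in for "some hash function"):

```sql
-- The expression index gives the planner one highly selective
-- equality condition in place of two weakly selective ones.
CREATE INDEX t_ab_hash_idx ON t (md5(a::text || ':' || b::text));

-- Queries must repeat the identical expression to use the index:
SELECT * FROM t
 WHERE a = 42 AND b = 7
   AND md5(a::text || ':' || b::text) = md5('42:7');
```

Note this would not help Gabriele's query as written, since his predicate is a range containment test on both columns, not a pair of equalities.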
Re: [PERFORM] Data warehousing requirements
Consider how the fact table is going to be used, and review hacking it up based on usage. Fact tables should be fairly narrow, so if there are extra columns beyond keys and dimension keys, consider breaking it into parallel tables (vertical partitioning).

Horizontal partitioning is your friend, especially if the table is large - consider slicing the data into chunks. If the fact table is date-driven, it might be worthwhile to break it into separate tables based on date key. This wins in reducing the working set of queries and in buffering. If there is a real hotspot, such as the current month's activity, you might want to keep a separate table with just the (most) active data. Static tables of unchanged data can simplify backups, etc., as well.

Consider summary tables if you know what type of queries you'll hit. Especially here, MVCC is not your friend because it has extra work to do for aggregate functions. CLUSTER helps if you bulk load.

In most warehouses, the data is downstream data from existing operational systems. Because of that, you're not able to use database features to preserve integrity. In most cases the data goes through an extract/transform/load process - and the output is considered acceptable. So, no RI is correct for star or snowflake designs. Pretty much nothing else that adds intelligence either - no triggers, no objects, no constraints of any sort. Many designers try hard to avoid NULLs.

On the hardware side - RAID5 might work here, because of the low volume, if you can pay the write performance penalty. To size hardware you need to estimate load in terms of transaction type (I usually make bucket categories of small, medium, and large effort needs) and transaction rate. Then try to estimate how much CPU and I/O they'll use.

/Aaron

"Let us not speak of them; but look, and pass on."
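The date-keyed horizontal partitioning above can be sketched like this (hypothetical fact-table layout; the Postgres of that era had no native partitioning, so a UNION ALL view glues the monthly slices together):

```sql
-- One physical table per month keeps each query's working set small
-- and lets old months be backed up once and left alone.
CREATE TABLE fact_2004_09 (date_key date, visitor_key int, hits int);
CREATE TABLE fact_2004_10 (date_key date, visitor_key int, hits int);

-- Reporting queries see one logical table.
CREATE VIEW fact_all AS
    SELECT * FROM fact_2004_09
    UNION ALL
    SELECT * FROM fact_2004_10;
```

Loads go straight into the current month's table; the hot "current month" table stays small enough to remain buffered.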
- Original Message -
From: Gabriele Bartolini [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Sent: Wednesday, October 06, 2004 5:36 PM
Subject: [PERFORM] Data warehousing requirements

Hi guys,

I just discussed my problem on IRC. I am building a Web usage mining system based on Linux, PostgreSQL and C++, made up of an OLTP database which feeds several multi-purpose data warehouses about users' behaviour on HTTP servers. I modelled every warehouse using the star schema, with a fact table and then 'n' dimension tables linked using a surrogate ID.

Discussing with the guys on the chat, I came to these conclusions regarding the warehouse's performance:

1) don't use referential integrity in the fact table
2) use INTEGER and avoid SMALLINT and NUMERIC types for dimension IDs
3) use an index for every dimension ID in the fact table

As far as administration is concerned: run VACUUM ANALYSE daily and VACUUM FULL periodically. Is there anything else I should keep in mind?

Also, I was looking for advice regarding hardware requirements for a data warehouse system that needs to satisfy online queries. I have indeed no idea at the moment. I can only predict about 4 million records a month in the fact table; does that make sense, or is it too much? Data needs to be easily backed up and eventually replicated. With this in mind, what hardware architecture should I look for? How many hard disks do I need, what kind, and what RAID solution do you suggest I adopt (5 or 10, I think)?

Thank you so much,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check maintainer
Current Location: Prato, Toscana, Italia
[EMAIL PROTECTED] | http://www.prato.linux.it/~gbartolini | ICQ#129221447
"Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The Inferno
Re: [PERFORM] Caching of Queries
The context of the discussion was a hack to speed up queries against static tables, so MVCC is not relevant. As soon as any work unit against a referenced table commits, the cache is invalid, and in fact the table shouldn't be a candidate for this caching for a while. In fact, this cache would reduce some of the MVCC 'select count(*) from us_states' type of horrors.

(The attraction of a server-side cache is obviously that it could - with no server or app changes - dramatically improve performance. A materialized view is a specialized denormalization-ish mechanism to optimize a category of queries, and it requires the DBA to sweat the details. It is very hard to cache things stochastically without writing a server. Trigger-managed extracts won't help you execute 1,000 programs issuing the query "select sec_level from sec where division=23" each second, or a big table loaded monthly.)

- Original Message -
From: Jeff [EMAIL PROTECTED]
To: Mitch Pirtle [EMAIL PROTECTED]
Cc: Aaron Werman [EMAIL PROTECTED]; Scott Kirkwood [EMAIL PROTECTED]; Neil Conway [EMAIL PROTECTED]; [EMAIL PROTECTED]; Tom Lane [EMAIL PROTECTED]
Sent: Monday, September 27, 2004 2:25 PM
Subject: Re: [PERFORM] Caching of Queries

[ discussion of server-side result caching ]

...and let's not forget PG's major fork it will throw into things: MVCC. The results of query A may hold true for txn 1, but not txn 2, and so on. That would have to be taken into account as well, and would greatly complicate things.

It is always possible to do a poor man's query cache with triggers... which would just leave you with basically a materialized view.

--
Jeff Trout [EMAIL PROTECTED]
http://www.jefftrout.com/
http://www.stuarthamm.net/
Re: [PERFORM] Slow update/insert process
Some quick notes:

- Using a side effect of a function to update the database feels bad to me
- How long does the "SELECT INTO varQueryRecord md5(upc.keyp_upc ..." query inside the function take, and what does its EXPLAIN look like?
- There are a lot of non-indexed columns on that delta master table, such as keyf_upc. I'm guessing you're doing 90,000 x {a lot of slow scans}
- My temptation would be to rewrite the processing to do a pass of updates, a pass of inserts, and then the SELECT

- Original Message -
From: Patrick Hatcher
To: [EMAIL PROTECTED]
Sent: Friday, October 01, 2004 2:14 PM
Subject: [PERFORM] Slow update/insert process

Pg: 7.4.5
RH 7.3
8g RAM
200g drive space
RAID 0+1
Tables vacuumed on a nightly basis

The following process takes 8 hours to run on 90k records, and I'm not sure where to begin to look for the bottleneck. This isn't the only update on this database that seems to take a long time to complete. Is there something I should be looking for in my conf settings?

TIA
Patrick

SQL:
--- Bring back only selected records to run through the update process.
--Without the function, the SQL takes 10 secs to return 90,000 records

SELECT count(pm.pm_delta_function_amazon(upc.keyp_upc,'amazon'))
FROM mdc_upc upc
JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
JOIN public.mdc_price_post_inc price ON prod.keyp_products = price.keyf_product
JOIN public.mdc_attribute_product ap ON ap.keyf_products = prod.keyp_products AND keyf_attribute = 22
WHERE upper(trim(ap.attributevalue)) NOT IN ('ESTEE LAUDER', 'CLINIQUE', 'ORGINS', 'PRESCRIPTIVES', 'LANCÔME', 'CHANEL', 'ARAMIS', 'M.A.C', 'TAG HEUER')
AND keyf_producttype <> 222
AND prod.action_publish = 1;

Function:

CREATE OR REPLACE FUNCTION pm.pm_delta_function_amazon(int4, "varchar") RETURNS bool AS '
DECLARE
    varkeyf_upc      ALIAS FOR $1;
    varPassword      ALIAS FOR $2;
    varRealMD5       varchar;
    varDeltaMD5      varchar;
    varLastTouchDate date;
    varQuery         text;
    varQuery1        text;
    varQueryMD5      text;
    varQueryRecord   record;
    varFuncStatus    boolean := false;
BEGIN
    -- Check the password
    IF varPassword <> \'amazon\' THEN
        RETURN false;
    END IF;

    -- Get the md5 hash for this product
    SELECT INTO varQueryRecord
           md5(upc.keyp_upc || prod.description || pm.pm_price_post_inc(prod.keyp_products)) AS md5
      FROM public.mdc_upc upc
      JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products
      JOIN public.mdc_price_post_inc price ON price.keyf_product = prod.keyp_products
     WHERE upc.keyp_upc = varkeyf_upc
     LIMIT 1;

    IF NOT FOUND THEN
        RAISE EXCEPTION \'varRealMD5 is NULL. UPC ID is %\', varkeyf_upc;
    ELSE
        varRealMD5 := varQueryRecord.md5;
    END IF;

    -- Check that the product is in the delta table and return its hash for comparison
    SELECT INTO varQueryRecord md5_hash, last_touch_date
      FROM pm.pm_delta_master_amazon
     WHERE keyf_upc = varkeyf_upc
     LIMIT 1;

    IF NOT FOUND THEN
        -- ADD and exit
        INSERT INTO pm.pm_delta_master_amazon (keyf_upc, status, md5_hash, last_touch_date)
        VALUES (varkeyf_upc, \'add\', varRealMD5, CURRENT_DATE);
        varFuncStatus := true;
        RETURN varFuncStatus;
    ELSE
        -- Update the record:
        -- if the hash matches, then set the record to HOLD
        IF varRealMD5 = varQueryRecord.md5_hash THEN
            UPDATE pm.pm_delta_master_amazon
               SET status = \'hold\', last_touch_date = CURRENT_DATE
             WHERE keyf_upc = varkeyf_upc AND last_touch_date <> CURRENT_DATE;
            varFuncStatus := true;
        ELSE
            -- else mark the item as ADD
            UPDATE pm.pm_delta_master_amazon
               SET status = \'add\', last_touch_date = CURRENT_DATE
             WHERE keyf_upc = varkeyf_upc;
            varFuncStatus := true;
        END IF;
    END IF;

    RETURN varFuncStatus;
END;
' LANGUAGE 'plpgsql' IMMUTABLE;

TableDef:

CREATE TABLE pm.pm_delta_master_amazon (
    keyf_upc        int4,
    status          varchar(6),
    md5_hash        varchar(40),
    last_touch_date date
)
GO

CREATE INDEX status_idx ON pm.pm_delta_master_amazon(status)
GO

CONF:

#---------------------------------------------------------------------------
# WRITE AHEAD LOG
#---------------------------------------------------------------------------
# - Settings -
#fsync = true             # turns forced synchronization on or off
#wal_sync_method = fsync  # the default varies across platforms:
                          # fsync, fdatasync, open_sync, or open_datasync
wal_buffers = 32          # min 4, 8KB each

# - Checkpoints -
checkpoint_segments = 50  # in logfile segments, min 1, 16MB each
checkpoint_timeout = 600  # range 30-3600, in seconds
#checkpoint_warning = 30  # 0 is off, in seconds
#commit_delay = 0         # range 0-100000, in microseconds
#commit_siblings = 5      # range 1-1000

Patrick Hatcher
Macys.Com
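Aaron's "pass of updates, pass of inserts" suggestion might look roughly like this. A sketch only: it reuses the hash expression from pm_delta_function_amazon, and leaves out the password check and the attribute filters from the driving SELECT:

```sql
-- Compute every product's current hash once, set-wise, instead of
-- calling a function 90,000 times.
CREATE TEMP TABLE cur AS
SELECT upc.keyp_upc AS keyf_upc,
       md5(upc.keyp_upc || prod.description
           || pm.pm_price_post_inc(prod.keyp_products)) AS md5_hash
  FROM public.mdc_upc upc
  JOIN public.mdc_products prod ON upc.keyf_products = prod.keyp_products;

-- Pass 1: unchanged rows go to 'hold'.
UPDATE pm.pm_delta_master_amazon d
   SET status = 'hold', last_touch_date = CURRENT_DATE
  FROM cur
 WHERE d.keyf_upc = cur.keyf_upc AND d.md5_hash = cur.md5_hash;

-- Pass 2: changed rows go to 'add'.
UPDATE pm.pm_delta_master_amazon d
   SET status = 'add', last_touch_date = CURRENT_DATE
  FROM cur
 WHERE d.keyf_upc = cur.keyf_upc AND d.md5_hash <> cur.md5_hash;

-- Pass 3: brand-new rows are inserted as 'add'.
INSERT INTO pm.pm_delta_master_amazon (keyf_upc, status, md5_hash, last_touch_date)
SELECT c.keyf_upc, 'add', c.md5_hash, CURRENT_DATE
  FROM cur c
 WHERE NOT EXISTS (SELECT 1 FROM pm.pm_delta_master_amazon d
                    WHERE d.keyf_upc = c.keyf_upc);
```

An index on pm_delta_master_amazon(keyf_upc) would help all three passes; the status_idx index alone doesn't.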
Re: [PERFORM] Caching of Queries
I'm not sure I understand your requirement fully. If the same request is repeatedly made with the same parameters, you could implement a proxy web server with a cron'd script to purge stale pages. If substantially the same data is being summarized, doing your own summary tables works; if accessed enough, they're in memory. I've interleaved some notes into your posting.

- Original Message -
From: Josh Berkus [EMAIL PROTECTED]
To: Postgresql Performance [EMAIL PROTECTED]
Sent: Friday, October 01, 2004 1:10 PM
Subject: Re: [PERFORM] Caching of Queries

People:

Transparent query caching is the industry standard for how these things are handled. However, Postgres' lack of this feature has made me consider other approaches, and I'm starting to wonder if the standard query caching -- where a materialized query result, or some reduction thereof, is cached in database memory -- isn't the best way to cache things. I'm going to abbreviate it SQC for the rest of this e-mail.

Obviously, the draw of SQC is its transparency to developers. With it, the Java/Perl/PHP programmers and the DBA don't have to communicate at all -- you set it up, give it some RAM, and it just works. As someone who frequently has to consult based on limited knowledge, I can understand the appeal.

My sense is that pg is currently unique among popular DBMSes in having the majority of its applications be homegrown (a chicken / egg / advocacy issue - if I install a CMS, I'm not the DBA or the PHP programmer, and I don't want to change the code; we'll see more about this when native WinPg happens).

However, one of the problems with SQC, aside from the ones already mentioned of stale data and/or cache-clearing, is that (at least in implementations like MySQL's) it is indiscriminate and caches, at least briefly, unique queries as readily as common ones. Possibly Oracle's implementation is more sophisticated; I've not had an opportunity.

I'm not sure I agree here.
Stale data and caching choices are optimizer/buffer manager choices, and the implementation can decide whether to allow stale data. These are design choices involving development effort and choices of where to spend server cycles and memory. All buffering choices cache unique objects; I'm not sure why this is bad (but I sense you want control of the choices).

FWIW, this is my impression of other DBMSes. In MySQL, a global cache can be specified with a size, and globally, locally, or through statement hints in queries you can suggest caching results. I don't believe these can be used as common subexpressions (with the exception of MERGE table component results). The optimizer knows nothing about the cached results - SQL select statements are hashed, and can be replaced by the cached statement/results on a match.

In DB2 and Oracle, result sets are not cached. They have rich sets of materialized view features (that match your requirements). They allow a materialized view to be synchronous with table updates, or asynchronous. Synchronous is often an unrealistic option, and asynchronous materialized views are refreshed on a specified schedule. The optimizers allow query rewrite (in Oracle it is a session option), so one can connect to the database and specify that the optimizer is allowed to replace subexpressions with data from (possibly stale) materialized views. SQL Server 2K has more restrictive synchronous MVs, but I've never used them.

So, in your example use in Oracle, you would need to define appropriate MVs with a ½ hour refresh frequency, and hope that the planner would use them in your queries. The only change in the app is that on connection you would allow the use of asynchronous stale data.

You're suggesting an alternative involving identifying common, but expensive, subexpressions and generating MVs for them. This is a pretty sophisticated undertaking, and probably requires some theory research to determine if it's viable.
The other half of that problem is that an entire query is cached, rather than just the relevant data to uniquely identify the request to the application. This is bad in two respects: one, that the entire query needs to be parsed to see if a new query is materially equivalent, and two, that two materially different queries which could utilize overlapping ranges of the same underlying result set must instead cache their results separately, eating up yet more memory.

There are two separate issues here: the cost of parse/optimization and the cost of results retrieval. Other DBMSes hash statement text. This is a good thing, and probably 3 orders of magnitude faster than parse and optimization. (Oracle also has options to replace literals with parameters and match parse trees instead of text, expecting parse costs to be less than planning costs.) On a match, MySQL simply returns the result set. Oracle and DB2 attempt to rewrite queries to use the DBA-selected extracts. The MySQL approach seems to be almost what you're describing: all it needs
Re: [PERFORM] Caching of Queries
- Original Message -
From: Tom Lane [EMAIL PROTECTED]
To: Aaron Werman [EMAIL PROTECTED]
Cc: Iain [EMAIL PROTECTED]; Jim C. Nasby [EMAIL PROTECTED]; [EMAIL PROTECTED]
Sent: Tuesday, September 28, 2004 9:58 AM
Subject: Re: [PERFORM] Caching of Queries

Aaron Werman [EMAIL PROTECTED] writes:

 I imagine a design where a shared plan cache would consist of the plans, indexed by a statement hash and again by dependent objects. A statement to be planned would be hashed and matched to the cache. DDL would need to synchronously destroy all dependent plans. If each plan maintains a validity flag, changing the cache wouldn't have to block, so I don't see where there would be contention.

You have contention to access a shared data structure *at all* -- for instance, readers must lock out writers. Or didn't you notice the self-contradictions in what you just said? Our current scalability problems dictate reducing such contention, not adding whole new sources of it.

You're right - that seems unclear. What I meant is that there can be a global hash table that is never locked; the hashes point to chains of plans that are only locally locked for maintenance, such as GC and chaining hash collisions. If maintenance is relatively rare and only local, my assumption is that it wouldn't have global impact.

The nice thing about plan caching is that it can be sloppy, unlike the block cache, because it is only an optimization tweak. So, for example, if the plan has atomic reference times or counts, there is no need to block, since overwriting is not so bad. If the multiprocessing planner chains the same plan twice, the second one would ultimately age out.

/Aaron

regards, tom lane
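Purely as an illustration of the lock-free-readers idea Aaron describes (not PG code, and in Python only for brevity): a statement-hash plan cache where DDL merely flips validity flags, so a stale read costs at most a missed optimization, never a wrong result:

```python
import hashlib


class Plan:
    """A cached plan with a validity flag flipped by DDL."""

    def __init__(self, text, plan):
        self.text = text
        self.plan = plan
        self.valid = True  # readers simply skip invalid entries


class PlanCache:
    """Global hash table of statement text -> chains of plans.

    The table itself is never restructured, so readers need no global
    lock; only individual chains would need local locking for
    maintenance (GC, collision chaining) in a real multiprocess setting.
    """

    def __init__(self, nbuckets=1024):
        self.buckets = [[] for _ in range(nbuckets)]

    def _bucket(self, text):
        h = int(hashlib.md5(text.encode()).hexdigest(), 16)
        return self.buckets[h % len(self.buckets)]

    def lookup(self, text):
        for p in self._bucket(text):
            if p.valid and p.text == text:
                return p.plan
        return None  # cache miss: caller plans from scratch

    def insert(self, text, plan):
        # Duplicate inserts are harmless: the extra copy just ages out.
        self._bucket(text).append(Plan(text, plan))

    def invalidate(self, pred):
        # DDL path: flip flags on dependent plans; no reader blocking.
        for bucket in self.buckets:
            for p in bucket:
                if pred(p):
                    p.valid = False
```

The key property is that every race degrades to "re-plan unnecessarily", which is exactly the sloppiness Aaron argues a plan cache can afford.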
Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
Mark Cotner wrote:

 The time has come to reevaluate/rearchitect an application which I built about 3 years ago. There are no performance concerns with MySQL, but it would benefit greatly from stored procedures, views, etc.

From: Mischa Sandberg [EMAIL PROTECTED]

 If your company is currently happy with MySQL, there probably are other (nontechnical) reasons to stick with it. I'm impressed that you'd consider reconsidering PG.

I'd like to second Mischa on that issue.

In general, if you migrate an *existing* application from one RDBMS to another, you should expect performance to decrease significantly. This is always true in a well-performing system, even if the replacement technology is more sophisticated, because of several factors.

Even if you try to develop in totally agnostic, generic SQL, you are always customizing to a feature set, namely the one in the current system. Any existing application has had substantial tuning and tweaking, and the new one is at a disadvantage. Moreover, an existing system is a Skinnerian reward/punishment system for the developers and DBAs, rewarding or punishing them for very environment-specific choices - resulting in an application, DBMS, OS, and platform that are both explicitly and unconsciously customized to work together in a particular manner.

The net effect is a rule of thumb that I use: NEVER reimplement an existing system unless the project includes substantial functional improvement.

Every time I've broken that rule, I've found that users' expectations, based on the application they are used to, are locked in. Any place where the new system is slower, the users are dissatisfied; where it exceeds expectations, it isn't appreciated: the users are used to the old system's quirks, and the improvements only leave them uncomfortable since the system acts differently. (I've broken the rule on occasion for standardization conversions.)

My expectation is that pg will not get a fair shake here.
If you do it, I'd like to see the results anyway.

/Aaron
Re: [PERFORM] Data Warehouse Reevaluation - MySQL vs Postgres -- merge tables
From: Harald Lau (Sector-X) [EMAIL PROTECTED]

 ...
 From: Mischa Sandberg [EMAIL PROTECTED]
  If your company is currently happy with MySQL, there probably are other (nontechnical) reasons to stick with it. I'm impressed that you'd consider reconsidering PG.
 I'd like to second Mischa on that issue.

 Though both of you are right from my point of view, I don't think it's very useful to discuss this item here.

It is kinda windy for the list, but the point is that a big part of performance is developer expectation and user expectation. I'd hope to lower expectations before we see an article in eWeek. Perhaps this thread should move to the advocacy list until the migration needs specific advice.
Re: [PERFORM] Looking for ideas on how to speed up warehouse loading
By definition, it is equivalent to:

SELECT t1.id, t2.url
  FROM referral_temp t2
  LEFT /*OUTER*/ JOIN d_referral t1 ON t2.url = t1.referral_raw_url
UNION ALL
SELECT null, url
  FROM referral_temp
 WHERE url IS NULL
ORDER BY 1;

/Aaron

- Original Message -
From: Joe Conway [EMAIL PROTECTED]
To: Sean Shanny [EMAIL PROTECTED]
Cc: [EMAIL PROTECTED]
Sent: Friday, April 23, 2004 12:38 AM
Subject: Re: [PERFORM] Looking for ideas on how to speed up warehouse loading

Sean Shanny wrote:

 explain analyze SELECT t1.id, t2.url FROM referral_temp t2 LEFT OUTER JOIN d_referral t1 ON t2.url = t1.referral_raw_url ORDER BY t1.id;

 What I would like to know is if there are better ways to do the join? I need to get all the rows back from the referral_temp table, as they are used for assigning FKs for the fact table later in processing. When I iterate over the values that I get back, those with t1.id = null I assign a new FK and push both into the d_referral table as new entries, as well as into a text file for later use. The matching records are written to a text file for later use.

Would something like this work any better (without disabling index scans)?

SELECT t1.id, t2.url
  FROM referral_temp t2, d_referral t1
 WHERE t1.referral_raw_url = t2.url;

(process rows with a match)

SELECT t2.url
  FROM referral_temp t2
 WHERE NOT EXISTS (SELECT 1 FROM d_referral t1
                    WHERE t1.referral_raw_url = t2.url);

(process rows without a match)

?

Joe
Re: [PERFORM] Help with performance problems
Your second server has queuing (load averages are highish), only 2 processes running, and almost all cycles are idle. You need to track down your bottleneck. Have you looked at iostat/vmstat? I think it would be useful to post these, ideally both before and after full vacuum analyze. /Aaron - Original Message - From: Chris Hoover [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Friday, April 23, 2004 9:31 AM Subject: [PERFORM] Help with performance problems I need some help. I have 5 db servers running our database servers, and they all are having various degrees of performance problems. The problems we are experiencing are: 1. General slowness 2. High loads All of our db's are running on Dell Poweredge 2650 with 2 P4 Xeons (2.8 - 3.06 GHz) with 8 to 12 GB of memory. The databases are running on attached Dell Powervault 220s running raid5. The databases were created and taken into production before I started working here and are very flat. Most of the major tables have a combined primary key using an int field and a single char field. There are some additional indexes on some tables. Most queries I see in the logs are running at less than .01 seconds with many significantly slower. We are trying to narrow down the performance problem to either the db or the hardware. As the dba, I need to try and get these db's tuned to the best possible way considering the current db state. We are in the beginning of a complete db redesign and application re-write, but the completion and deployment of the new db and app are quite a ways off. 
Anyway, we are running the following: PE 2650 w/ 2 cpus (2.8-3.06) - HT on 8-12 GB memory OS on raid 0 DB's on Powervaults 220S using raid 5 (over 6 disks) Each Postgresql cluster has 2 db up to almost 170db's (project to level out the num of db's/cluster is being started) DB's are no bigger than a few GB in size (largest is about 11GB according to a du -h) Running RH ES 2.1 Here is the postgresql.conf from the server with the 11GB db: max_connections = 64 shared_buffers = 32768 # 256MB=32768(buffs)*8192(bytes/buff) max_fsm_relations = 1000 # min 10, fsm is free space map, ~40 bytes max_fsm_pages = 1 # min 1000, fsm is free space map, ~6 bytes sort_mem = 4096 # 256MB=4096(bytes/proc)*64(procs or conns) checkpoint_segments = 16 # in logfile segments, min 1, 16MB each checkpoint_timeout = 30 # range 30-3600, in seconds effective_cache_size = 131072 # typically 8KB each log_connections = true log_pid = true log_statement = true log_duration = true log_timestamp = true stats_start_collector = true stats_reset_on_server_start = true stats_command_string = true stats_row_level = true stats_block_level = true LC_MESSAGES = 'en_US' LC_MONETARY = 'en_US' LC_NUMERIC = 'en_US' LC_TIME = 'en_US' Here is top (server running pretty good right now) 9:28am up 25 days, 16:02, 2 users, load average: 0.54, 0.33, 0.22 94 processes: 91 sleeping, 3 running, 0 zombie, 0 stopped CPU0 states: 64.0% user, 0.1% system, 0.0% nice, 34.0% idle CPU1 states: 29.0% user, 9.0% system, 0.0% nice, 60.0% idle CPU2 states: 2.0% user, 0.1% system, 0.0% nice, 96.0% idle CPU3 states: 0.0% user, 0.0% system, 0.0% nice, 100.0% idle Mem: 7720072K av, 7711648K used,8424K free, 265980K shrd, 749888K buff Swap: 2096440K av, 22288K used, 2074152K free 6379304K cached Here is top from another server (with the most db's): 9:31am up 25 days, 16:05, 5 users, load average: 2.34, 3.39, 4.28 147 processes: 145 sleeping, 2 running, 0 zombie, 0 stopped CPU0 states: 6.0% user, 1.0% system, 0.0% nice, 91.0% idle CPU1 
states: 9.0% user, 4.0% system, 0.0% nice, 85.0% idle CPU2 states: 9.0% user, 3.0% system, 0.0% nice, 86.0% idle CPU3 states: 9.0% user, 4.0% system, 0.0% nice, 85.0% idle Mem: 7721096K av, 7708040K used, 13056K free, 266132K shrd, 3151336K buff Swap: 2096440K av, 24208K used, 2072232K free 3746596K cached Thanks for any help/advice, Chris
Re: [PERFORM] possible improvement between G4 and G5
There are a few things that you can do to help force yourself to be I/O bound. These include: - RAID 5 for write intensive applications, since multiple writes per synch write is good. (There is a special case for logging or other streaming sequential writes on RAID 5) - Data journaling file systems are helpful in stress testing your checkpoints - Using midsized battery backed up write through buffering controllers. In general, if you have a small cache, you see the problem directly, and a huge cache will balance out load and defer writes to quieter times. That is why a midsized cache is so useful in showing stress in your system only when it is being stressed. Only partly in jest, /Aaron BTW - I am truly curious about what happens to your system if you use separate RAID 0+1 for your logs, disk sorts, and at least the most active tables. This should reduce I/O load by an order of magnitude. Vivek Khera [EMAIL PROTECTED] wrote in message news:[EMAIL PROTECTED] JB == Josh Berkus [EMAIL PROTECTED] writes: JB Aaron, I do consulting, so they're all over the place and tend to be complex. Very few fit in RAM, but still are very buffered. These are almost all backed with very high end I/O subsystems, with dozens of spindles with battery backed up writethrough cache and gigs of buffers, which may be why I worry so much about CPU. I have had this issue with multiple servers. JB Aha, I think this is the difference. I never seem to be able to JB get my clients to fork out for adequate disk support. They are JB always running off single or double SCSI RAID in the host server; JB not the sort of setup you have. Even when I upgraded my system to a 14-spindle RAID5 with 128M cache and 4GB RAM on a dual Xeon system, I still wind up being I/O bound quite often. I think it depends on what your working set turns out to be. My workload really spans a lot more of the DB than I can end up caching. 
-- =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= Vivek Khera, Ph.D. Khera Communications, Inc. Internet: [EMAIL PROTECTED] Rockville, MD +1-301-869-4449 x806 AIM: vivekkhera Y!: vivek_khera http://www.khera.org/~vivek/
Re: [PERFORM] Moving postgres to FC disks
I agree on not linking and adding non-SAN disk dependency to your DB. I'm trying to understand your FS reasoning. I have never seen XFS run faster than ReiserFS in any situation (or for that matter beat any FS in performance except JFS). XFS has some nifty very large file features, but we're talking about 30G, and all modern FSs support files well beyond 2G. My tendency would be to stay on ext3, since it is the default RH FS. I would review site preference and the SAN-recommended FS and see if they add any compelling points. /Aaron - Original Message - From: Joshua D. Drake To: Anjan Dave Cc: [EMAIL PROTECTED] Sent: Tuesday, April 20, 2004 8:27 PM Subject: Re: [PERFORM] Moving postgres to FC disks -With the db size being as big as, say, 30+GB, how do I move it on the new logical drive? (stop postgresql, and simply move it over somehow and make a link?) I would stop the database, move the data directory to the new volume using rsync, then start up postgresql pointed at the new data directory. Providing everything is working correctly you can then remove the old data directory. -Currently, the internal RAID volume is ext3 filesystem. Any recommendations for the filesystem on the new FC volume? ReiserFS? XFS? DBs are 7.4.1 (RH9), and 7.2.3 (RH8). Appreciate any pointers. Thanks, Anjan -- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming, shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL
Re: [PERFORM] good pc but bad performance,why?
Re fsync: I'm thinking 50 inserts, if autocommitting, is 50 TPS = ~100 IOs per second (50 WAL + checkpoint IO) = roughly the I/O rate of a single drive. Huang - Are you using a single drive for pg? If so, there is a safety problem of having both the data and the logs used for recovery on the same drive. If the drive crashes, there is nothing left for recovery. Also, there is a big contention issue, since the log is a fast sequential write, and checkpointing is random. If the log is on a separate drive, you'll probably see insert speed at disk sequential write speed, since the other drive(s) should hopefully be able to keep up when checkpointing. If they share the same drive, you'll see an initial burst of inserts, then an order of magnitude performance drop-off as soon as you checkpoint - because the disk is interleaving the log and data writes. fsync off is only appropriate for externally recoverable processes, such as loading an empty server from a file. /Aaron - Original Message - From: Richard Huxton [EMAIL PROTECTED] To: huang yaqin [EMAIL PROTECTED]; Tom Lane [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Wednesday, April 07, 2004 4:33 AM Subject: Re: [PERFORM] good pc but bad performance,why? On Wednesday 07 April 2004 05:00, huang yaqin wrote: hello Thanks, you are right. I use postmaster -o -F to start my PG and performance improved greatly. I don't think Tom was recommending turning fsync off. If you have a system crash/power glitch then the database can become corrupted. If you are happy with the possibility of losing your data, write performance will improve noticeably. -- Richard Huxton Archonet Ltd
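A quick back-of-envelope makes the ~100 IO/s figure concrete. The constants below are the post's assumptions (one synchronous WAL write per autocommitted insert, plus roughly one deferred data-page write each at checkpoint), not measurements.

```python
# Rough model of the single-drive bottleneck described above.
inserts_per_sec = 50                 # 50 single-row transactions per second
wal_writes = inserts_per_sec         # one fsync'd WAL write per commit
checkpoint_writes = inserts_per_sec  # deferred data-page writes, amortized
total_io_per_sec = wal_writes + checkpoint_writes

print(total_io_per_sec)  # 100 -- about the random-I/O budget of one drive
```

With the log moved to its own spindle, the 50 sequential WAL writes and the 50 random checkpoint writes stop competing for the same head, which is the whole point of the post.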
Re: [PERFORM] possible improvement between G4 and G5
- Original Message - From: Tom Lane [EMAIL PROTECTED] To: Qing Zhao [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, April 06, 2004 1:47 AM Subject: Re: [PERFORM] possible improvement between G4 and G5 Qing Zhao [EMAIL PROTECTED] writes: We have got a G5 64-bit processor to replace an old G4 32-bit processor. Given everything else equal, should we see a big improvement on PG's performance? Nope. Database performance typically depends on disk performance first, and RAM size second. I'm surprised by this thought. I tend to hit CPU bottlenecks more often than I/O ones. In most applications, db I/O is a combination of buffer misses and logging, which are both reasonably constrained. RAM size seems to me to be the best way to improve performance, and then CPU which is needed to perform the in-memory searching, locking, versioning, and processing, and finally I/O (this is not the case in small I/O subsystems - if you have less than a dozen drives, you're easily I/O bound). I/O is often the thing I tune first, because I can do it in place without buying hardware. Conceptually, an RDBMS converts slow random I/O into in memory processing and sequential logging writes. If successful, it should reduce the I/O overhead. /Aaron ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] possible improvement between G4 and G5
- Original Message - From: Josh Berkus [EMAIL PROTECTED] To: Aaron Werman [EMAIL PROTECTED]; Qing Zhao [EMAIL PROTECTED]; Tom Lane [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Tuesday, April 06, 2004 2:52 PM Subject: Re: [PERFORM] possible improvement between G4 and G5 Aaron, I'm surprised by this thought. I tend to hit CPU bottlenecks more often than I/O ones. In most applications, db I/O is a combination of buffer misses and logging, which are both reasonably constrained. Not my experience at all. In fact, the only times I've seen modern platforms max out the CPU was when: a) I had bad queries with bad plans, or b) I had reporting queries that did a lot of calculation for display (think OLAP). Otherwise, on the numerous servers I administrate, RAM spikes, and I/O bottlenecks, but the CPU stays almost flat. Of course, most of my apps are large databases (i.e. too big for RAM) with a heavy transaction-processing component. What kind of applications are you running? -- -Josh Berkus Aglio Database Solutions San Francisco hot air I do consulting, so they're all over the place and tend to be complex. Very few fit in RAM, but still are very buffered. These are almost all backed with very high end I/O subsystems, with dozens of spindles with battery backed up writethrough cache and gigs of buffers, which may be why I worry so much about CPU. I have had this issue with multiple servers. Consider an analysis db with 10G data. Of that, 98% of the access is read and only 2% write (that is normal for almost anything that is not order entry, even transaction processing with thorough cross validation). Almost all the queries access 10%, or 1G of the data. Of the reads, they average ~3 level b-trees, with the first 2 levels certainly cached, and the last ones often cached. Virtually all the I/O activity is logical reads against buffer.
A system with 100 transactions which on average access 200 rows does 98% of 200 rows x 100 transactions x 3 logical I/Os per read = 58,800 logical reads, of which actually maybe a hundred are physical reads. It also does 2% of 200 rows x 100 transactions x (1 table logical I/O and say 2 index logical writes) per write = 1,200 logical writes to log, of which there are 100 transaction commit synch writes, and in reality less than that because of queuing against logs (there are also 1,200 logical writes deferred to checkpoint, of which it is likely to only be 40 physical writes because of page overlaps). Transaction processing is a spectrum between activity logging and database centric design. The former, where actions are stored in the database, is totally I/O bound, with the engine acting as a thin layer of logical to physical mapping. Database centric processing makes the engine a functional server of discrete actions - and is a big CPU hog. What my CPU tends to be doing is a combination of general processing and complex SQL processing: nested loops and sorting and hashing and triggers and SPs. I'm curious about you having flat CPU, which is not my experience. Are your apps mature and stable? /hot air /Aaron
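The arithmetic above can be spelled out directly. Every input below is the thread's assumed workload (100 transactions of 200 rows, ~3-level b-trees, 98/2 read/write split), not a measurement.

```python
# Aaron's logical-I/O arithmetic, restated.
transactions = 100
rows_per_txn = 200
btree_levels = 3                      # logical I/Os per row read
read_fraction, write_fraction = 0.98, 0.02

logical_reads = round(read_fraction * rows_per_txn * transactions * btree_levels)
# writes: 1 table page plus ~2 index entries of logical I/O per written row
logical_writes = round(write_fraction * rows_per_txn * transactions * 3)

print(logical_reads)   # 58800
print(logical_writes)  # 1200
```

The gap between 58,800 logical reads and maybe a hundred physical reads is exactly why Aaron sees these workloads go CPU-bound rather than I/O-bound.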
Re: [PERFORM] single index on more than two coulumns a bad thing?
You're absolutely correct that the general rule is to lead a composite index with the highest cardinality index columns for fastest selectivity. Indices and all physical design are based on usage. In this case of unique indices supporting primary keys in a hierarchy, it depends. For selection of small sets of arbitrary rows, your arrangement is best. For hierarchy based queries, such as for grandparent of foo, and parent of bar, give average age of sons - the hierarchy based index is often more efficient. Surrogate keys have a role, and can improve performance, but also carry an enormous penalty of intentionally obfuscating logical keys and data semantics, and almost always lead to data errors not being caught because they obscure irrational relationships. I hate them, but use them frequently in high transaction rate operational systems where there is much functional validation outside the dbms (and the apps behave therefore like object databases and surrogate keys are network database pointers) and in data warehousing (where downstream data cannot be corrected anyway). /Aaron - Original Message - From: Leeuw van der, Tim [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Sunday, April 04, 2004 5:06 PM Subject: Re: [PERFORM] single index on more than two coulumns a bad thing? 
Hi Aaron, -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] Behalf Of Aaron Werman Sent: vrijdag 2 april 2004 13:57 another thing that I have all over the place is a hierarchy: index on grandfather_table(grandfather) index on father_table(grandfather, father) index on son_table(grandfather, father, son) It depends on your data-distribution, but I find that in almost all cases it's beneficial to have your indexes the other way round in such cases: index on grandfather_table(grandfather) index on father_table(father, grandfather) index on son_table(son, father, grandfather) That usually gives a less common, more selective value at the start of the index, making the initial selection in the index smaller. And AFAIK I don't have to rewrite my queries for that; the planner doesn't care about the order of expressions in the query that are on the same level. That said, I tend to use 'surrogate keys'; keys generated from sequences or auto-number columns for my tables. It makes the tables less readable, but the indexes remain smaller. Greetings, --Tim ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.
Almost any cross dbms migration shows a drop in performance. The engine effectively trains developers and administrators in what works and what doesn't. The initial migration thus compares a tuned to an untuned version. /Aaron - Original Message - From: Josh Berkus [EMAIL PROTECTED] To: Gary Doades [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Saturday, April 03, 2004 1:59 PM Subject: Re: [PERFORM] PostgreSQL and Linux 2.6 kernel. Gary, There are no indexes on the columns involved in the update, they are not required for my usual select statements. This is an attempt to slightly denormalise the design to get the performance up comparable to SQL Server 2000. We hope to move some of our databases over to PostgreSQL later in the year and this is part of the ongoing testing. SQLServer's query optimiser is a bit smarter than PostgreSQL's (yet), so I am hand optimising some of the more frequently used SQL and/or tweaking the database design slightly. Hmmm ... that hasn't been my general experience on complex queries. However, it may be due to a difference in ANALYZE statistics. I'd love to see you increase your default_stats_target, re-analyze, and see if PostgreSQL gets smarter. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] single index on more than two coulumns a bad thing?
another thing that I have all over the place is a hierarchy: index on grandfather_table(grandfather) index on father_table(grandfather, father) index on son_table(grandfather, father, son) almost all of my indices are composite. Are you thinking about composite indices with low cardinality leading columns? /Aaron - Original Message - From: Josh Berkus [EMAIL PROTECTED] To: Palle Girgensohn [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Thursday, April 01, 2004 7:35 PM Subject: Re: [PERFORM] single index on more than two coulumns a bad thing? Palle, Is it always bad to create index xx on yy (field1, field2, field3); No, it's seldom bad, in fact. I have some indexes that run up to seven columns, because they are required for unique keys. Indexes of 3-4 columns are often *required* for many-to-many join tables. I'm afraid that you've been given some misleading advice. I guess the problem is that the index might often grow bigger than the table, or at least big enough not to speed up the queries? Well, yes ... a 4-column index on a 5-column table could be bigger than the table if allowed to bloat and not re-indexed. But that's just a reason for better maintenance. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux
Are you talking about http://www.potentialtech.com/wmoran/postgresql.php#conclusion - Original Message - From: Subbiah, Stalin [EMAIL PROTECTED] To: [EMAIL PROTECTED]; Matt Clark [EMAIL PROTECTED]; Subbiah, Stalin [EMAIL PROTECTED]; 'Andrew Sullivan' [EMAIL PROTECTED]; [EMAIL PROTECTED] Sent: Tuesday, March 23, 2004 3:42 PM Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux Has anyone done performance benchmark testing with Solaris/SPARC vs. Intel/Linux? I once read a post here which had benchmarking test results for using different filesystems like xfs, ext3, ext2, ufs etc. I couldn't find that link anymore and Google is failing on me, so does anyone have the link handy? Thanks! -Original Message- From: Josh Berkus [mailto:[EMAIL PROTECTED] Sent: Tuesday, March 23, 2004 12:13 PM To: Matt Clark; Subbiah, Stalin; 'Andrew Sullivan'; [EMAIL PROTECTED] Subject: Re: [PERFORM] [ADMIN] Benchmarking postgres on Solaris/Linux Matt, Stalin, As for the compute intensive side (complex joins sorts etc), the Dell will most likely beat the Sun by some distance, although what the Sun lacks in CPU power it may make up a bit in memory bandwidth/latency. Personally, I've been unimpressed by Dell/Xeon; I think the Sun might do better than you think, comparatively. On all the Dell servers I've used so far, I've not seen performance that comes even close to the hardware specs. -- -Josh Berkus Aglio Database Solutions San Francisco
Re: [PERFORM] atrocious update performance
Bulk updates are generally dogs (not just in pg), so I avoid doing them by doing faster selects and inserts. You can create a new table using 'create table as' to produce your target results. This is real fast - avoiding the row iteration in insert, allowing the select optimizer to run and no index overhead. Then alter/rename, add indexes and whatever else hangs off the table (or if you're lazy do an insert/select into the original target table). I often see 2 orders of magnitude improvement doing this, and no need to vacuum. /Aaron - Original Message - From: Rosser Schwarz [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Monday, March 15, 2004 3:28 PM Subject: [PERFORM] atrocious update performance We're in the throes of an MS SQL to PostgreSQL migration; our databases include a number of ~5M row tables. We decided to take this opportunity to clean up and slightly re-normalize our schemas, given what we've learned about the data over its lifetime and such, else we wouldn't be experiencing any of the following (we could instead just dump and `copy from`). We have a temporary table, public.tempprod, containing 4.7M rows, one for each row in account.cust. account.cust has, among others, two columns, prod and subprod, which we're trying to update from tempprod joined against prod. The update tends to take unnecessarily long-- rather, we've had to finally kill it after its taking obscenely too long. 
The table:

# \d account.cust
                             Table account.cust
  Column    |            Type             |                 Modifiers
------------+-----------------------------+-------------------------------------
 custid     | bigint                      | not null default nextval('account.custid_seq'::text)
 ownerid    | integer                     | not null
 origid     | text                        | not null
 pname      | text                        |
 fname      | text                        |
 mname      | text                        |
 lname      | text                        |
 suffix     | text                        |
 addr1      | text                        |
 addr2      | text                        |
 addr3      | text                        |
 city       | text                        |
 state      | text                        |
 zip        | text                        |
 zipplus    | text                        |
 homeph     | text                        |
 workph     | text                        |
 otherph    | text                        |
 ssn        | text                        |
 isactive   | boolean                     | default true
 createddt  | timestamp without time zone | default now()
 prodid     | bigint                      |
 subprodid  | bigint                      |
Indexes:
    cust_pkey primary key, btree (custid)
    ix_addr1 btree (addr1) WHERE (addr1 IS NOT NULL)
    ix_addr2 btree (addr2) WHERE (addr2 IS NOT NULL)
    ix_city btree (city) WHERE (city IS NOT NULL)
    ix_fname btree (fname) WHERE (fname IS NOT NULL)
    ix_homeph btree (homeph) WHERE (homeph IS NOT NULL)
    ix_lname btree (lname) WHERE (lname IS NOT NULL)
    ix_mname btree (mname) WHERE (mname IS NOT NULL)
    ix_origid btree (origid)
    ix_ssn btree (ssn) WHERE (ssn IS NOT NULL)
    ix_state btree (state) WHERE (state IS NOT NULL)
    ix_workph btree (workph) WHERE (workph IS NOT NULL)
    ix_zip btree (zip) WHERE (zip IS NOT NULL)

We're currently running on a dual Xeon 700 (I know, I know; it's what we've got) with 2.5GB RAM and 4x36GB SCSI in hardware RAID 5 (Dell Perc3 something-or-other controller). If we can demonstrate that PostgreSQL will meet our needs, we'll be going production on a dual Opteron, maxed memory, with a 12-disk Fibre Channel array. The query is: update account.cust set prodid = (select p.prodid from account.prod p join public.tempprod t on t.pool = p.origid where custid = t.did) And then, upon its completion, s/prod/subprod/. That shouldn't run overnight, should it, let alone for -days-? In experimenting with ways of making the updates take less time, we tried adding product and subproduct columns to tempprod, and updating those.
That seemed to work marginally better:

explain analyze update public.tempprod set prodid = (select account.prod.prodid::bigint from account.prod where public.tempprod.pool::text = account.prod.origid::text)

Seq Scan on tempprod (cost=0.00..9637101.35 rows=4731410 width=56) (actual time=24273.467..16090470.438 rows=4731410 loops=1)
  SubPlan
    ->  Limit (cost=0.00..2.02 rows=2 width=8) (actual time=0.134..0.315 rows=1 loops=4731410)
          ->  Seq Scan on prod (cost=0.00..2.02 rows=2 width=8) (actual time=0.126..0.305 rows=1 loops=4731410)
                Filter: (($0)::text = (origid)::text)
Total runtime: 2284551.962 ms

But then going from public.tempprod to account.cust again takes days. I just cancelled an update that's been running since last Thursday. Alas,
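Aaron's 'create table as' alternative from earlier in the thread can be sketched end to end. This uses SQLite (Python's stdlib) in place of PostgreSQL, with toy table names and data; in Postgres you would follow with ALTER TABLE ... RENAME and recreate the indexes and anything else hanging off the table.

```python
# Build the updated result as a brand-new table instead of updating in place:
# no per-row index maintenance, no dead tuples, no vacuum afterwards.
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE cust (custid INTEGER, prodid INTEGER)")
cur.execute("CREATE TABLE tempprod (custid INTEGER, prodid INTEGER)")
cur.executemany("INSERT INTO cust VALUES (?, ?)", [(1, None), (2, None)])
cur.executemany("INSERT INTO tempprod VALUES (?, ?)", [(1, 10), (2, 20)])

# Instead of UPDATE cust SET prodid = (SELECT ...), compute the result once:
cur.execute("""
    CREATE TABLE cust_new AS
    SELECT c.custid, t.prodid
    FROM cust c LEFT JOIN tempprod t ON t.custid = c.custid
""")
rows = cur.execute(
    "SELECT custid, prodid FROM cust_new ORDER BY custid").fetchall()
print(rows)  # [(1, 10), (2, 20)]
```

The select optimizer gets to pick a bulk join strategy here, which is where the claimed orders-of-magnitude win over row-at-a-time correlated-subquery updates comes from.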
Re: [PERFORM] atrocious update performance
The original point was about a very slow update of an entire table with a plan that looped, and over a dozen conditional indices - vs. a 'create as' in a CPU starved environment. I stand by my statement about observing the orders of magnitude difference. In theory I agree that the update should be in the same order of magnitude as the create as, but in practice I disagree. I also think something is wrong on the logical side (besides FKs, are there any triggers?) but was responding to the Gordian knot issue of bailing out of pg. Can you post a sample extract, Rosser? Otherwise, I'll try to put together a sample of a slow mass join update. /Aaron - Original Message - From: Tom Lane [EMAIL PROTECTED] To: Rosser Schwarz [EMAIL PROTECTED] Cc: [EMAIL PROTECTED] Sent: Monday, March 15, 2004 7:08 PM Subject: Re: [PERFORM] atrocious update performance Rosser Schwarz [EMAIL PROTECTED] writes: You can create a new table using 'create table as' to produce your target results. This is real fast ... I often see 2 orders of magnitude improvement doing this, and no need to vacuum. Indeed: Query returned successfully with no result in 582761 ms. Though I must say, ten minutes is nominally more than two orders of magnitude performance improvement, versus several days. Hm. There is no way that inserting a row is two orders of magnitude faster than updating a row --- they both require storing a new row and making whatever index entries are needed. The only additional cost of the update is finding the old row (not a very big deal AFAICS in the examples you gave) and marking it deleted (definitely cheap). So there's something awfully fishy going on here. I'm inclined to suspect an issue with foreign-key checking. You didn't give us any details about foreign key relationships your cust table is involved in --- could we see those? And the schemas of the other tables involved? Also, exactly which PG version is this?
regards, tom lane
Re: [PERFORM] Scaling further up
Sorry about not chiming in before - I've been too swamped to think. I agree with most of the points, but a lot of these posts are interesting and seem to describe systems from an SA perspective to my DBA-centric view. - Original Message - From: Marty Scholes [EMAIL PROTECTED] To: [EMAIL PROTECTED] Sent: Wednesday, March 10, 2004 6:29 PM Subject: Re: [PERFORM] Scaling further up I have some suggestions based on my anecdotal experience. 1. This is a relatively small DB -- the working set will likely be in RAM at any moment in time, making read I/O time mostly irrelevant. 2. The killer will be write times -- specifically log writes. Small and heavily synchronized writes, log and data writes, will drag down an impressive hardware RAID setup. We run mirrored hardware RAID 5 arrays with write back cache and are constantly seeking ways to improve write performance. We do a lot of batch processing, though, so we do a lot of write I/Os. My experience with RAID5 for streaming sequential writes is bad. This is sometimes helped by the hardware caching to cover the cost of the additional I/Os for striping (write through RAID5 + big cache acts like RAID 1+0 until you run out of cache). Batch processing is different from high concurrency transactions because it needs faster volume streaming, while TP is dependant on the speed of ack'ing (few big writes with less synchronous waits vs. lots of small writes which serialize everyone). (RAID 3 worked for me in the past for logging, but I haven't used it in years.) 3. Be very careful with battery backed write cache. It usually works as advertised. More than once in the past decade I have seen spontaneous cache corruption after power losss. The corruption usually happens when some admin, including me, has assumed that the cache will ALWAYS survive a power failure unblemished and has no plan B. Make sure you have a contingency plan for corruption, or don't enable the cache. I agree strongly. 
There is also the same problem with disk write back cache and even with SCSI controllers with write through enabled. PITR would help here. A lot of these problems are due to procedural error post crash. 4. RAID 10 will likely have bigger stripe sizes on the RAID 0 portion of the setup, and might hinder, not help small write I/O performance. In a high volume system without write caching you are almost always going to see queuing, which can make the larger buffer mostly irrelevant, if it's not huge. Write caching thrives on big block sizes (which is a key reason why Symmetrix doesn't do worse than it does) by reducing I/O counts. Most shops I've set up or seen use mirroring or RAID 10 for logs. Note also that many RAID 10 controllers in a non-write cached setup allows having a race between the two writers, acknowledging when the first of the two completes - increasing throughput by about 1/4. 5. Most (almost all) of the I/O time will be due to the access time (head seek + head settle + rotational latency) and very little of the I/O time will due to data transfer time. In other words, getting drives that provide faster transfer rates will barely improve performance. The secret is lowering the access time. True. This is very much a latency story. Even in volume batch, you can see access time that clearly shows some other system configuration bottleneck that happens elsewhere before hitting I/O capacity. 6. A relatively cheap way to drastically drop the access time is to get large drive(s) and only use a portion of them for storage. The less space used on the drive, the less area the heads need to cover for seeks. At one extreme, you could make the partition the size of a single cylinder. This would make access time (ignoring OS and controller overhead) identical to rotational latency, which is as low as 4.2 ms for a cheap 7200 RPM drive. This is a good strategy for VLDB, and may not be relevant in this case. 
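The rotational-latency floor in point 6 is simple to derive; the 7200 RPM figure is the thread's assumed cheap drive, and seek plus head-settle time are ignored (which is exactly what short-stroking a large drive tries to achieve).

```python
# Access time = seek + settle + rotational latency; shrinking the partition
# attacks the seek term, leaving (on average) half a rotation of latency.
rpm = 7200
ms_per_rotation = 60_000 / rpm            # 8.33 ms per full turn
avg_rotational_latency = ms_per_rotation / 2

print(round(avg_rotational_latency, 1))   # 4.2 ms, matching the figure above
```

The same formula gives ~2 ms for a 15K RPM drive, which is why the faster spindles matter even when seeks are already minimized.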
Also - big sequential writes and 15K rpm drives, in the case of writethrough, are a beautiful thing - they look like a manufacturers' demo. A primary performance role of a RDBMS is to convert random I/O to sequential (by buffering reads and using a streaming log to defer random writes to checkpoints). RDBMSs are the prime beneficiaries of the drive speed improvements - since logging, backups, and copies are about the only things (ignoring bad perl scripts and find commands) that generate loads of 50+ MB/sec. /Aaron