Re: [PERFORM] Why we don't want hints
robertmh...@gmail.com (Robert Haas) writes: On Thu, Feb 10, 2011 at 11:45 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Well, I'm comfortable digging in my heels against doing *lame* hints just because it's what all the other kids are doing, which I think is the only thing which would have satisfied the OP on this thread. From both on-list posts and ones exchanged off-list with me, it seems he was stubbornly resistant to properly tuning the server to see if any problems remained, or posting particular problems to see how they would be most effectively handled in PostgreSQL. We obviously can't be drawn into dumb approaches because of ill-informed demands like that. Nor was I proposing any such thing. But that doesn't make we don't want hints an accurate statement. Despite the impression that OP went away with, the real situation is a lot more nuanced than that, and the statement on the Todo list gives the wrong impression, IMHO. I have added the following comment to the ToDo: We are not interested to implement hints in ways they are commonly implemented on other databases, and proposals based on because they've got them will not be welcomed. If you have an idea that avoids the problems that have been observed with other hint systems, that could lead to valuable discussion. That seems to me to characterize the nuance. -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://www3.sympatico.ca/cbbrowne/languages.html If only women came with pull-down menus and online help. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] getting the most of out multi-core systems for repeated complex SELECT statements
gnuo...@rcn.com writes: Time for my pet meme to wiggle out of its hole (next to Phil's, and a day later). For PG to prosper in the future, it has to embrace the multi-core/processor/SSD machine at the query level. It has to. And it has to because the Big Boys already do so, to some extent, and they've realized that the BCNF schema on such machines is supremely efficient. PG/MySql/OSEngineOfChoice will get left behind simply because the efficiency offered will be worth the price. I know this is far from trivial, and my C skills are such that I can offer no help. These machines have been the obvious current machine in waiting for at least 5 years, and those applications which benefit from parallelism (servers of all kinds, in particular) will filter out the winners and losers based on exploiting this parallelism. Much as it pains me to say it, but the MicroSoft approach to software: write to the next generation processor and force users to upgrade, will be the winning strategy for database engines. There's just way too much to gain. I'm not sure how true that is, really. (e.g. - too much to gain.) I know that Jan Wieck and I have been bouncing thoughts on valid use of threading off each other for *years*, now, and it tends to be interesting but difficult to the point of impracticality. But how things play out are quite fundamentally different for different usage models. It's useful to cross items off the list, so we're left with the tough ones that are actually a problem. 1. For instance, OLTP applications, that generate a lot of concurrent connections, already do perfectly well in scaling on multi-core systems. Each connection is a separate process, and that already harnesses multi-core systems perfectly well. Things have improved a lot over the last 10 years, and there may yet be further improvements to be found, but it seems pretty reasonable to me to say that the OLTP scenario can be treated as solved in this context. The scenario where I can squint and see value in trying to multithread is the contrast to that, of OLAP. The case where we only use a single core, today, is where there's only a single connection, and a single query, running. But that can reasonably be further constrained; not every single-connection query could be improved by trying to spread work across cores. We need to add some further assumptions: 2. The query needs to NOT be I/O-bound. If it's I/O bound, then your system is waiting for the data to come off disk, rather than to do processing of that data. That condition can be somewhat further strengthened... It further needs to be a query where multi-processing would not increase the I/O burden. Between those two assumptions, that cuts the scope of usefulness to a very considerable degree. And if we *are* multiprocessing, we introduce several new problems, each of which is quite troublesome: - How do we decompose the query so that the pieces are processed in ways that improve processing time? In effect, how to generate a parallel query plan? It would be more than stupid to consider this to be obvious. We've got 15-ish years worth of query optimization efforts that have gone into Postgres, and many of those changes were not obvious until after they got thought through carefully. This multiplies the complexity, and opportunity for error. - Coordinating processing Becomes quite a bit more complex. 
Multiple threads/processes are accessing parts of the same data concurrently, so a parallelized query that harnesses 8 CPUs might generate 8x as many locks and analogous coordination points. - Platform specificity Threading is a problem in that each OS platform has its own implementation, and even when they claim to conform to common standards, they still have somewhat different interpretations. This tends to go in one of the following directions: a) You have to pick one platform to do threading on. Oops. There's now PostgreSQL-Linux, that is the only platform where our multiprocessing thing works. It could be worse than that; it might work on a particular version of a particular OS... b) You follow some apparently portable threading standard And find that things are hugely buggy because the platforms follow the standard a bit differently. And perhaps this means that, analogous to a), you've got a set of platforms where this works (for some value of works), and others where it can't. That's almost as evil as a). c) You follow some apparently portable threading standard And need to wrap things in a pretty thick safety blanket to make sure it is compatible with all the bugs in interpretation and implementation. Complexity++, and performance probably suffers. None of these are particularly palatable, which is why threading proposals get a lot of pushback. At the end of the day, if this is
Re: [HACKERS] [PERFORM] Slow count(*) again...
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: Hints are not even that complicated to program. The SQL parser should compile the list of hints into a table and optimizer should check whether any of the applicable access methods exist in the table. If it does - use it. If not, ignore it. This looks to me like a philosophical issue, not a programming issue. It's worth looking back to what has already been elaborated on in the ToDo. http://wiki.postgresql.org/wiki/Todo --- Optimizer hints (not wanted) Optimizer hints are used to work around problems in the optimizer and introduce upgrade and maintenance issues. We would rather have the problems reported and fixed. We have discussed a more sophisticated system of per-class cost adjustment instead, but a specification remains to be developed. --- The complaint is that kludging hints into a particular query attacks the problem from the wrong direction. The alternative recommended is to collect some declarative information, that *won't* be part of the query, that *won't* be processed by the parser, and that *won't* kludge up the query with information that is liable to turn into crud over time. Tom Lane was pretty specific about some kinds of declarative information that seemed useful: http://archives.postgresql.org/pgsql-hackers/2006-10/msg00663.php On Jeapordy, participants are expected to phrase one's answers in the form of a question, and doing so is rewarded. Based on the presence of query hints on the Not Wanted portion of the ToDo list, it's pretty clear that participants here are expected to propose optimizer hints in ways that do NOT involve decorating queries with crud. You'll get a vastly friendlier response if you at least make an attempt to attack the problem in the declarative information fashion. Perhaps we're all wrong in believing that pushing query optimization information into application queries by decorating the application with hints, is the right idea but it's a belief that certainly seems to be regularly agreed upon by gentle readers. -- cbbrowne,@,linuxdatabases.info The people's revolutionary committee has decided that the name e is retrogressive, unmulticious and reactionary, and has been flushed. Please update your abbrevs. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [HACKERS] [PERFORM] Slow count(*) again...
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I must say that this purist attitude is extremely surprising to me. All the major DB vendors support optimizer hints, yet in the Postgres community, they are considered bad with almost religious fervor. Postgres community is quite unique with the fatwa against hints. Well, the community declines to add hints until there is actual consensus on a good way to add hints. Nobody has ever proposed a way to add hints where consensus was arrived at that the way was good, so... -- http://linuxfinances.info/info/nonrdbms.html Rules of the Evil Overlord #192. If I appoint someone as my consort, I will not subsequently inform her that she is being replaced by a younger, more attractive woman. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] the XID question
kevin.gritt...@wicourts.gov (Kevin Grittner) writes: Filip Rembia*kowskiplk.zu...@gmail.com wrote: 2011/1/19 Charles.Hou giveme...@gmail.com: select * from mybook SQL command also increase the XID ? Yes. Single SELECT is a transaction. Hence, it needs a transaction ID. No, not in recent versions of PostgreSQL. There's virtual transaction ID, too; which is all that's needed unless the transaction writes something. Also, as a fine point, if you use explicit database transactions (with BEGIN or START TRANSACTION) then you normally get one XID for the entire transaction, unless you use SAVEPOINTs. Erm, not *necessarily* in recent versions of PostgreSQL. A read-only transaction won't consume XIDs, but if you don't expressly declare it read-only, they're still liable to get eaten... -- (format nil ~S@~S cbbrowne gmail.com) http://www3.sympatico.ca/cbbrowne/lisp.html Parenthesize to avoid ambiguity. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
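A quick way to see that lazy-XID behaviour for yourself on 8.3 or later (the demo table is illustrative, since mybook's definition isn't shown): run this in two psql sessions and watch pg_locks.

-- session 1:
create table xid_demo (id integer primary key, note text);
begin;
select note from xid_demo;    -- purely a read, so far

-- session 2: the open transaction holds a virtualxid lock, but no transactionid
select locktype, virtualxid, transactionid
  from pg_locks
 where locktype in ('virtualxid', 'transactionid');

-- session 1: only the first actual write forces a real XID to be assigned
insert into xid_demo values (1, 'now there is an XID');
commit;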
Re: [PERFORM] COPY TO stdout statements occurrence in log files
msakre...@truviso.com (Maciek Sakrejda) writes: Is this normal? I'm afraid because my application doesn't run this kind of statement, so how can I know what is doing these commands? Maybe pg_dump? I think pg_dump is likely, yes, if you have that scheduled. I don't think anything in the log file will identify it as pg_dump explicitly (I believe as far as the server is concerned, pg_dump is just another client), but if you're concerned about this, you can add the client pid (%p) to log_line_prefix in postgresql.conf, log the pg_dump pid through whatever mechanism manages that, and compare. That's an option... More are possible... 1. Our DBAs have been known to create users specifically for doing backups (dumpy). It doesn't seem like a *huge* proliferation of users to have some 'utility' user names for common processes. 2. In 9.1, there will be a new answer, as there's a GUC to indicate the application_name. -- Programming today is a race between software engineers striving to build bigger and better idiot-proof programs, and the Universe trying to produce bigger and better idiots. So far, the Universe is winning. -- Rich Cook -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
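A sketch of option 1 above, with illustrative role and database names: give the backup job its own login, and put the user name and PID into log_line_prefix so the COPY statements are attributed to it.

-- as superuser; 'dumpy' and 'mydb' are placeholders
create role dumpy login password 'change-me';
grant connect on database mydb to dumpy;
-- the role also needs SELECT on whatever it dumps (per table before 9.0,
-- or GRANT SELECT ON ALL TABLES IN SCHEMA ... from 9.0 on)

-- postgresql.conf:  log_line_prefix = '%t %u %p '   (timestamp, user, pid)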
Re: [PERFORM] best db schema for time series data?
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes: On Tue, Nov 16, 2010 at 11:35:24AM -0500, Chris Browne wrote: vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes: I have to collect lots of prices from web sites and keep track of their changes. What is the best option? 1) one 'price' row per price change: create table price ( id_price primary key, id_product integer references product, price integer ); 2) a single 'price' row containing all the changes: create table price ( id_price primary key, id_product integer references product, price integer[] -- prices are 'pushed' on this array as they change ); Which is bound to give the best performance, knowing I will often need to access the latest and next-to-latest prices? I'd definitely bias towards #1, but with a bit of a change... create table product ( id_product serial primary key ); create table price ( id_product integer references product, as_at timestamptz default now(), primary key (id_product, as_at), price integer ); Hi Chris, So an id_price serial on the price table is not necessary in your opinion? I am using order by id_price limit X or max(id_price) to get at the most recent prices. It (id_price) is an extra piece of information that doesn't reveal an important fact, namely when the price was added. I'm uncomfortable with adding data that doesn't provide much more information, and it troubles me when people put a lot of interpretation into the meanings of SERIAL columns. I'd like to set up some schemas (for experiment, if not necessarily to get deployed to production) where I'd use DCE UUID values rather than sequences, so that people wouldn't make the error of imagining meanings in the values that aren't really there. And I suppose that there lies a way to think about it... If you used UUIDs rather than SERIAL, how would your application break? And of the ways in which it would break, which of those are errors that fall from: a) Ignorant usage, assuming order that isn't really there? (e.g. - a SERIAL might capture some order information, but UUID won't!) b) Inadequate data capture, where you're using the implicit data collection from SERIAL to capture, poorly, information that should be expressly captured? When I added the timestamp to the price table, that's intended to address b), capturing the time that the price was added. The query to get the last 5 prices for a product should be splendidly efficient: select price, as_at from price where id_product = 17 order by as_at desc limit 5; (That'll use the PK index perfectly nicely.) If you needed higher performance, for latest price, then I'd add a secondary table, and use triggers to copy latest price into place: create table latest_prices ( id_product integer primary key references product, price integer ); I did the same thing with a 'price_dispatch' trigger and partitioned tables (inheritance). It's definitely needed when the price database grow into the millions. Thanks, The conversations are always interesting! Cheers! -- output = (cbbrowne @ gmail.com) http://www3.sympatico.ca/cbbrowne/x.html FLORIDA: If you think we can't vote, wait till you see us drive. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
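For what it's worth, the UUID-keyed variant mentioned above is only a small change to the earlier schema; a sketch, assuming the contrib uuid-ossp module supplies uuid_generate_v4():

create table product (
   id_product uuid primary key default uuid_generate_v4()
);
create table price (
   id_product uuid references product,
   as_at timestamptz default now(),
   primary key (id_product, as_at),
   price integer
);
-- the "latest prices" query is unchanged; nothing ever orders on the key itself
select price, as_at
  from price
 where id_product = '00000000-0000-0000-0000-000000000000'::uuid
 order by as_at desc limit 5;

The point being that as_at carries the temporal ordering expressly, so nothing is lost by giving up SERIAL.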
Re: [PERFORM] best db schema for time series data?
vindex+lists-pgsql-performa...@apartia.org (Louis-David Mitterrand) writes: I have to collect lots of prices from web sites and keep track of their changes. What is the best option? 1) one 'price' row per price change: create table price ( id_price primary key, id_product integer references product, price integer ); 2) a single 'price' row containing all the changes: create table price ( id_price primary key, id_product integer references product, price integer[] -- prices are 'pushed' on this array as they change ); Which is bound to give the best performance, knowing I will often need to access the latest and next-to-latest prices? I'd definitely bias towards #1, but with a bit of a change... create table product ( id_product serial primary key ); create table price ( id_product integer references product, as_at timestamptz default now(), primary key (id_product, as_at), price integer ); The query to get the last 5 prices for a product should be splendidly efficient: select price, as_at from price where id_product = 17 order by as_at desc limit 5; (That'll use the PK index perfectly nicely.) If you needed higher performance, for latest price, then I'd add a secondary table, and use triggers to copy latest price into place: create table latest_prices ( id_product integer primary key references product, price integer ); create or replace function capture_latest_price () returns trigger as $$ declare begin delete from latest_prices where id_product = NEW.id_product; insert into latest_prices (id_product,price) values (NEW.id_product, NEW.price); return NEW; end $$ language plpgsql; create trigger price_capture after insert on price execute procedure capture_latest_price(); This captures *just* the latest price for each product. (There's a bit of race condition - if there are two concurrent price updates, one will fail, which wouldn't happen without this trigger in place.) -- ... Turns out that JPG was in fact using his brain... and I am inclined to encourage him to continue the practice even if it isn't exactly what I would have done myself. -- Alan Bawden (way out of context) -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
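One way to narrow that race, as a sketch rather than a drop-in replacement: update the existing row in place and only insert when there was nothing to update. Two concurrent first-ever prices for a product can still collide, but ordinary price changes no longer delete and re-insert.

create or replace function capture_latest_price () returns trigger as $$
begin
   update latest_prices set price = NEW.price
    where id_product = NEW.id_product;
   if not found then
      insert into latest_prices (id_product, price)
      values (NEW.id_product, NEW.price);
   end if;
   return NEW;
end
$$ language plpgsql;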
Re: [PERFORM] Running PostgreSQL as fast as possible no matter the consequences
gentosa...@gmail.com (A B) writes: If you just wanted PostgreSQL to go as fast as possible WITHOUT any care for your data (you accept 100% dataloss and datacorruption if any error should occur), what settings should you use then? Use /dev/null. It is web scale, and there are good tutorials. But seriously, there *are* cases where blind speed is of use. Loading data into a fresh database is a good time for this; if things fall over, it may be pretty acceptable to start from scratch with mkfs/initdb. I'd: - turn off fsync - turn off synchronous commit - put as much as possible onto ramdisk/tmpfs/similar -- output = reverse(moc.liamg @ enworbbc) http://linuxfinances.info/info/lsf.html 43% of all statistics are worthless. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
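As postgresql.conf settings, the first two items above look like the sketch below; full_page_writes is one more knob in the same spirit that isn't mentioned above, and the ramdisk/tmpfs part is an operating-system matter rather than a configuration line.

fsync = off                 # never force WAL out to disk
synchronous_commit = off    # 8.3+: COMMIT does not wait for the WAL flush
full_page_writes = off      # pointless protection once crash recovery is a non-goal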
Re: [PERFORM] Slow count(*) again...
sgend...@ideasculptor.com (Samuel Gendler) writes: Geez. I wish someone would have written something quite so bold as 'xfs is always faster than ext3' in the standard tuning docs. I couldn't find anything that made a strong filesystem recommendation. How does xfs compare to ext4? I wound up on ext4 on a dell perc6 raid card when an unexpected hardware failure on a production system caused my test system to get thrown into production before I could do any serious testing of xfs. If there is a strong consensus that xfs is simply better, I could afford the downtime to switch. It's news to me (in this thread!) that XFS is actually getting some developer love, which is a pretty crucial factor to considering it relevant. XFS was an SGI creation, and, with: a) the not-scintillating performance of the company, b) the lack of a lot of visible work going into the filesystem, c) the paucity of support by Linux vendors (for a long time, if you told RHAT you were having problems, and were using XFS, the next step would be to park the ticket awaiting your installing a supported filesystem) it didn't look like XFS was a terribly good bet. Those issues were certainly causing concern a couple of years ago. Faster raw performance isn't much good if it comes with a risk of: - Losing data - Losing support from vendors If XFS now *is* getting support from both the development and support perspectives, then the above concerns may have been invalidated. It would be very encouraging, if so. -- output = (cbbrowne @ gmail.com) Rules of the Evil Overlord #228. If the hero claims he wishes to confess in public or to me personally, I will remind him that a notarized deposition will serve just as well. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] large dataset with write vs read clients
cr...@postnewspapers.com.au (Craig Ringer) writes: Hey, maybe I should try posting YouTube video answers to a few questions for kicks, see how people react ;-) And make sure it uses the same voice as is used in the MongoDB is web scale video, to ensure that people interpret it correctly :-). -- output = (cbbrowne @ gmail.com) http://linuxdatabases.info/info/nonrdbms.html The *Worst* Things to Say to a Police Officer: Hey, is that a 9 mm? That's nothing compared to this .44 magnum. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] large dataset with write vs read clients
mladen.gog...@vmsinfo.com (Mladen Gogala) writes: I have a logical problem with asynchronous commit. The commit command should instruct the database to make the outcome of the transaction permanent. The application should wait to see whether the commit was successful or not. Asynchronous behavior in the commit statement breaks the ACID rules and should not be used in a RDBMS system. If you don't need ACID, you may not need RDBMS at all. You may try with MongoDB. MongoDB is web scale: http://www.youtube.com/watch?v=b2F-DItXtZs The client always has the option of connecting to a set of databases, and stowing parts of the data hither and thither. That often leads to the relaxation called BASE. (And IBM has been selling that relaxation as MQ-Series since the early '90s!) There often *ARE* cases where it is acceptable for some of the data to not be as durable, because that data is readily reconstructed. This is particularly common for calculated/cached/aggregated data. Many things can get relaxed for a data warehouse data store, where the database is not authoritative, but rather aggregates data drawn from other authoritative sources. In such applications, neither the A, C, I, nor the D are pointedly crucial, in the DW data store. - We don't put the original foreign key constraints into the DW database; they don't need to be enforced a second time. Ditto for constraints of all sorts. - Batching of the loading of updates is likely to break several of the letters. And I find it *quite* acceptable to lose D if the data may be safely reloaded into the DW database. I don't think this is either cavalier nor that it points to MongoDB is web scale. -- cbbrowne,@,gmail.com Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
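Note, too, that from 8.3 on the relaxation can be scoped to exactly the reconstructible data: synchronous_commit can be turned off per transaction, leaving everything else on the same server fully durable. A minimal sketch, with an illustrative cache table:

create table daily_totals (day date primary key, total bigint);

begin;
set local synchronous_commit = off;   -- applies to this transaction only
insert into daily_totals values (current_date, 42);
commit;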
Re: [PERFORM] read only transactions
jnelson+pg...@jamponi.net (Jon Nelson) writes: Are there any performance implications (benefits) to executing queries in a transaction where SET TRANSACTION READ ONLY; has been executed? Directly? No. Indirectly, well, a *leetle* bit... Transactions done READ ONLY do not generate actual XIDs, which reduces the amount of XID generation (pretty tautological!), which reduces the need to do VACUUM to protect against XID wraparound. http://www.postgresql.org/docs/8.4/static/routine-vacuuming.html#VACUUM-BASICS If you process 50 million transactions, that chews thru 50 million XIDs. If 45 million of those were processed via READ ONLY transactions, then the same processing only chews thru 5 million XIDs, meaning that the XID-relevant vacuums can be done rather less frequently. This only terribly much matters if: a) your database is so large that there are tables on which VACUUM would run for a very long time, and b) you are chewing through XIDs mighty quickly. If either condition isn't true, then the indirect effect isn't important either. -- let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;; I'm not switching from slrn. I'm quite confident that anything that *needs* to be posted in HTML is fatuous garbage not worth my time. -- David M. Cook davec...@home.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
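A small sketch of putting that into practice (role and query are illustrative): reporting sessions can be made read-only by default, so they never force a real XID to be assigned, or individual transactions can be declared read-only as in the question.

create role reporting login;
alter role reporting set default_transaction_read_only = on;

-- or per transaction:
begin;
set transaction read only;
select count(*) from pg_class;   -- any reporting query
commit;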
Re: [PERFORM] GPU Accelerated Sorting
david_l...@boreham.org (David Boreham) writes: Feels like I fell through a worm hole in space/time, back to inmos in 1987, and a guy from marketing has just walked in the office going on about there's a customer who wants to use our massively parallel hardware to speed up databases... ... As long as you're willing to rewrite PostgreSQL in Occam 2... -- http://projects.cs.kent.ac.uk/projects/tock/trac/ The statistics on sanity are that one out of every four Americans is suffering from some form of mental illness. Think of your three best friends. If they're okay, then it's you. -- Rita Mae Brown -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Testing Sandforce SSD
g...@2ndquadrant.com (Greg Smith) writes: Yeb Havinga wrote: * What filesystem to use on the SSD? To minimize writes and maximize chance for seeing errors I'd choose ext2 here. I don't consider there to be any reason to deploy any part of a PostgreSQL database on ext2. The potential for downtime if the fsck doesn't happen automatically far outweighs the minimal performance advantage you'll actually see in real applications. Ah, but if the goal is to try to torture the SSD as cruelly as possible, these aren't necessarily downsides (important or otherwise). I don't think ext2 helps much in maximizing chances of seeing errors in notably useful ways, as the extra torture that takes place as part of the post-remount fsck isn't notably PG-relevant. (It's not obvious that errors encountered would be readily mapped to issues relating to PostgreSQL.) I think the WAL-oriented test would be *way* more useful; inducing work whose brokenness can be measured in one series of files in one directory should be way easier than trying to find changes across a whole PG cluster. I don't expect the filesystem choice to be terribly significant to that. -- cbbrowne,@,gmail.com Heuristics (from the French heure, hour) limit the amount of time spent executing something. [When using heuristics] it shouldn't take longer than an hour to do something. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Testing Sandforce SSD
j...@commandprompt.com (Joshua D. Drake) writes: On Sat, 2010-07-24 at 16:21 -0400, Greg Smith wrote: Greg Smith wrote: Note that not all of the Sandforce drives include a capacitor; I hope you got one that does! I wasn't aware any of the SF drives with a capacitor on them were even shipping yet, all of the ones I'd seen were the chipset that doesn't include one still. Haven't checked in a few weeks though. Answer my own question here: the drive Yeb got was the brand spanking new OCZ Vertex 2 Pro, selling for $649 at Newegg for example: http://www.newegg.com/Product/Product.aspx?Item=N82E16820227535 and with the supercacitor listed right in the main production specifications there. This is officially the first inexpensive (relatively) SSD with a battery-backed write cache built into it. If Yeb's test results prove it works as it's supposed to under PostgreSQL, I'll be happy to finally have a moderately priced SSD I can recommend to people for database use. And I fear I'll be out of excuses to avoid buying one as a toy for my home system. That is quite the toy. I can get 4 SATA-II with RAID Controller, with battery backed cache, for the same price or less :P Sure, but it: - Fits into a single slot - Is quiet - Consumes little power - Generates little heat - Is likely to be about as quick as the 4-drive array It doesn't have the extra 4TB of storage, but if you're building big-ish databases, metrics have to change anyways. This is a pretty slick answer for the small OLTP server. -- output = reverse(moc.liamg @ enworbbc) http://linuxfinances.info/info/postgresql.html Chaotic Evil means never having to say you're sorry. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] PostgreSQL as a local in-memory cache
swamp...@noao.edu (Steve Wampler) writes: Or does losing WAL files mandate a new initdb? Losing WAL would mandate initdb, so I'd think this all fits into the set of stuff worth putting onto ramfs/tmpfs. Certainly it'll all be significant to the performance focus. -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://cbbrowne.com/info/internet.html MS apparently now has a team dedicated to tracking problems with Linux and publicizing them. I guess eventually they'll figure out this back fires... ;) -- William Burrow aa...@delete.fan.nb.ca -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
reeds...@rice.edu (Ross J. Reedstrom) writes: http://www.mythtv.org/wiki/PostgreSQL_Support That's a pretty hostile presentation... The page has had two states: a) In 2008, someone wrote up... After some bad experiences with MySQL (data loss by commercial power failure, very bad performance deleting old records and more) I would prefer to have a MythTV Application option to use PostgreSQL. I never saw such bad database behaviour at any other RDBMS than MySQL. I'm ready to contribute at any activity going that direction (I'm developer for commercial database applications). b) Deleted by GBee in 2009, indicating (Outdated, messy and unsupported) -- let name=cbbrowne and tld=gmail.com in String.concat @ [name;tld];; http://linuxfinances.info/info/spreadsheets.html A language that doesn't affect the way you think about programming, is not worth knowing. -- Alan J. Perlis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] mysql to postgresql, performance questions
t...@sss.pgh.pa.us (Tom Lane) writes: Ross J. Reedstrom reeds...@rice.edu writes: On Sat, Mar 20, 2010 at 10:47:30PM -0500, Andy Colson wrote: (I added the and trust as an after thought, because I do have one very important 100% uptime required mysql database that is running. Its my MythTV box at home, and I have to ask permission from my GF before I take the box down to upgrade anything. And heaven forbid if it crashes or anything. So I do have experience with care and feeding of mysql. And no, I'm not kidding.) Andy, you are so me! I have the exact same one-and-only-one mission critical mysql DB, but the gatekeeper is my wife. And experience with that instance has made me love and trust PostgreSQL even more. So has anyone looked at porting MythTV to PG? It has come up several times on the MythTV list. http://david.hardeman.nu/files/patches/mythtv/mythletter.txt http://www.mythtv.org/pipermail/mythtv-dev/2004-August/025385.html http://www.mythtv.org/pipermail/mythtv-users/2006-July/141191.html Probably worth asking David Härdeman and Danny Brow who have proposed such to the MythTV community what happened. (It's possible that they will get cc'ed on this.) If there's a meaningful way to help, that would be cool. If not, then we might as well not run slipshot across the same landmines that blew the idea up before. -- Transported to a surreal landscape, a young girl kills the first woman she meets and then teams up with three complete strangers to kill again. -- Unknown, Marin County newspaper's TV listing for _The Wizard of Oz_ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Is DBLINK transactional
cr...@postnewspapers.com.au (Craig Ringer) writes: On 13/03/2010 5:54 AM, Jeff Davis wrote: On Fri, 2010-03-12 at 12:07 -0500, Merlin Moncure wrote: of course. You can always explicitly open a transaction on the remote side over dblink, do work, and commit it at the last possible moment. Your transactions aren't perfectly synchronized...if you crash in the precise moment between committing the remote and the local you can get in trouble. The chances of this are extremely remote though. If you want a better guarantee than that, consider using 2PC. Translation in case you don't know: 2PC = two phase commit. Note that you have to monitor lost transactions that were prepared for commit then abandoned by the controlling app and periodically get rid of them or you'll start having issues. There can be issues even if they're not abandoned... Note that prepared transactions establish, and maintain, until removed, all the appropriate locks on the underlying tables and tuples. As a consequence, maintenance-related activities may be somewhat surprisingly affected. foo=# begin; set transaction isolation level serializable; BEGIN SET foo=# insert into my_table (date_time, hostname, duration, diag) values (now(), 'foo', 1, 2); INSERT 0 1 foo=# prepare transaction 'foo'; PREPARE TRANSACTION [then, I quit the psql session...] foo=# select * from pg_locks where relation = (select oid from pg_class where relname = 'my_table'); -[ RECORD 1 ]--+- locktype | relation database | 308021 relation | 308380 page | tuple | virtualxid | transactionid | classid| objid | objsubid | virtualtransaction | -1/433653 pid| mode | RowExclusiveLock granted| t If I try to truncate the table... foo=# truncate my_table; [hangs, waiting on the lock...] [looking at another session...] foo=# select * from pg_locks where relation = (select oid from pg_class where relname = 'my_table'); -[ RECORD 1 ]--+ locktype | relation database | 308021 relation | 308380 page | tuple | virtualxid | transactionid | classid| objid | objsubid | virtualtransaction | -1/433653 pid| mode | RowExclusiveLock granted| t -[ RECORD 2 ]--+ locktype | relation database | 308021 relation | 308380 page | tuple | virtualxid | transactionid | classid| objid | objsubid | virtualtransaction | 2/13 pid| 3749 mode | AccessExclusiveLock granted| f Immediately upon submitting commit prepared 'foo';, both locks are resolved quite quickly. The problem with things that are extremely remote possibilities are that they tend to be less remote than we expect ;) ... and they know just when they can happen despite all the odds to maximise the pain and chaos caused. A lot of these kinds of things only come up as race conditions. The trouble is that a lot of races do wind up synchronizing themselves. In sporting events, this is intended and desired; an official fires the starter pistol or activates the horn, or what have you, with the intended result that athletes begin very nearly simultaneously. And at the end of Olympic races, their times frequently differ only by miniscule intervals. In my example up above, there's a possibly unexpected synchronization point; the interweaving of the PREPARE TRANSACTION and TRUNCATE requests lead to a complete lock against the table. Supposing 15 processes then try accessing that table, they'll be blocked until the existing locks get closed out. Which takes place the very instant after the COMMIT PREPARED request comes in. At that moment, 15 racers are released very nearly simultaneously. 
If there is any further mischief to be had in the race, well, they're set up to tickle it... -- let name=cbbrowne and tld=gmail.com in name ^ @ ^ tld;; http://linuxdatabases.info/info/nonrdbms.html Barf, what is all this prissy pedantry? Groups, modules, rings, ufds, patent-office algebra. Barf! -- R. William Gosper -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
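The periodic monitoring mentioned above can be as simple as the following sketch; the one-hour threshold is arbitrary. Anything it returns is still holding its locks and eventually needs a COMMIT PREPARED or ROLLBACK PREPARED by name (the gid column).

select gid, prepared, owner, database
  from pg_prepared_xacts
 where prepared < now() - interval '1 hour'
 order by prepared;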
Re: [PERFORM] Master/Slave, DB separation or just spend $$$?
kelv...@gmail.com (Kelvin Quee) writes: I will go look at Slony now. It's worth looking at, but it is not always to be assumed that replication will necessarily improve scalability of applications; it's not a magic wand to wave such that presto, it's all faster! Replication is helpful from a performance standpoint if there is a lot of query load where it is permissible to look at *somewhat* out of date information. For instance, replication can be quite helpful for pushing load off for processing accounting data where you tend to be doing analysis on data from {yesterday, last week, last month, last year}, and where the data tends to be inherently temporal (e.g. - you're looking at transactions with dates on them). On the other hand, any process that anticipates *writing* to the master database will be more or less risky to try to shift over to a possibly-somewhat-behind 'slave' system, as will be anything that needs to be consistent with the master state. -- (reverse (concatenate 'string ofni.secnanifxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/spiritual.html Nondeterminism means never having to say you're wrong. -- Unknown -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] superlative missuse
cl...@uah.es (Angel Alvarez) writes: more optimal plan... morreoptimal configuration... we suffer a 'more optimal' superlative missuse there is not so 'more optimal' thing but a simple 'better' thing. im not native english speaker but i think it still applies. If I wanted to be pedantic about it, I'd say that the word nearly is missing. That is, it would be strictly correct if one instead said more nearly optimal. I don't imagine people get too terribly confused by the lack of the word nearly, so I nearly don't care :-). -- select 'cbbrowne' || '@' || 'acm.org'; http://linuxfinances.info/info/languages.html Bureaucracies interpret communication as damage and route around it -- Jamie Zawinski -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] I have a fusion IO drive available for testing
craig_ja...@emolecules.com (Craig James) writes: Dave Cramer wrote: So I tried writing directly to the device, gets around 250MB/s, reads at around 500MB/s The client is using redhat so xfs is not an option. I'm using Red Hat and XFS, and have been for years. Why is XFS not an option with Red Hat? If you report practically any kind of problem, and you're using XFS, or JFS, or such, their support offering is to tell you to use a supported filesystem. -- let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;; http://linuxdatabases.info/info/linuxxian.html The only thing better than TV with the sound off is Radio with the sound off. -- Dave Moon -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] scheduling autovacuum at lean hours only.
mallah.raj...@gmail.com (Rajesh Kumar Mallah) writes: why is it not a good idea to give end users control over when they want to run it ? It's not a particularly good idea to give end users things that they are likely then to *immediately* use to shoot themselves in the foot. Turning off vacuuming all day is the sort of thing that is indeed pretty certain to hurt you when you imagined it was going to help you. In particular, if you shut off autovac all day, heavily updated tables with certain sorts of (pretty common!) update patterns are certain to bloat up, to the point that you'll need to do CLUSTER/VACUUM FULL on them. In effect, the practical effect of autovacuum at lean hours only is more reasonably described as cancel autovacuum and revert to the elder policy of requiring users to do manual vacuuming. It's worth looking at how autovacuum has been evolving over time... - When it was introduced, 8.0-ish (maybe it was 8.1 when it became official), it was pretty primitive. Autovac was a single process, where you had three controls over behaviour: - You could run autovac, or not; - You could exclude specific tables from being processed by autovac - There is a capability to vacuum less aggressively by using delays to reduce autovac I/O usage - In 8.3, it was enhanced to add the notion of having multiple vacuum workers There was discussion about having one of those workers restrict itself to small tables, so that you'd never have the scenario where the workers were all busy and a small table that needed vacuuming was left unvacuumed for a long time. It appears that didn't happen, which seems unfortunate, but that's life... You should look at all the knobs that *are* offered before deciding a policy that may be harmful to performance. As things stand now, there are a couple of ways I could see tuning this: - You might check on the GUC variables autovacuum_vacuum_cost_delay and autovacuum_vacuum_cost_limit, which would allow you to restrict the I/O cost. This might allow you to run autovacuum all the time without adversely affecting performance. - You might come up with a list of the *LARGE* tables that you don't want vacuumed during the day, and set up a cron job that adds/drops them from the pg_autovacuum table at the appropriate times. This is definitely going to be more finicky, and requires a great deal more awareness of the tables being updated by your applications. It makes autovacuum a whole lot less automatic. There are known enhancements coming up: - In 8.4, there is a capability for VACUUM to only process the portions of the table known to have been altered. That ought to be a better answer than *any* of the fiddling suggested, to date. Right now, a VACUUM on public.my_huge_table, a table 18GB in size, will walk through the entire table, even though there were only a handful of pages where tuples were invalidated. This is almost certainly the single best improvement possible to resolve your issue; it seems likely to *directly* address the problem, and has the considerable merit of not requiring much if any configuration/reconfiguration/scheduling. -- (reverse (concatenate 'string gro.mca @ enworbbc)) http://linuxdatabases.info/info/ what would we do without C? we would have PASAL, BASI, OBOL, and Ommon Lisp. -- #Erik -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
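As a sketch of the second suggestion above, here is what a pair of cron-driven statements against pg_autovacuum (the 8.1-8.3 catalog; 8.4 moves this into per-table storage options) might look like for the 18GB table mentioned in the post. The -1 values mean "use the system default"; the last two are the 8.3 freeze-age columns and are absent in 8.1/8.2.

-- morning job: tell autovacuum to leave the big table alone during the day
insert into pg_autovacuum
values ('public.my_huge_table'::regclass, false, -1, -1, -1, -1, -1, -1, -1, -1);

-- evening job: drop the override so normal autovacuum processing resumes
delete from pg_autovacuum
 where vacrelid = 'public.my_huge_table'::regclass;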
Re: [PERFORM] PG performance in high volume environment (many INSERTs and lots of aggregation reporting)
phoenix.ki...@gmail.com (Phoenix Kiula) writes: [Ppsted similar note to PG General but I suppose it's more appropriate in this list. Apologies for cross-posting.] Hi. Further to my bafflement with the count(*) queries as described in this thread: http://archives.postgresql.org/pgsql-general/2009-01/msg00804.php It seems that whenever this question has come up, Postgresql comes up very short in terms of count(*) functions. The performance is always slow, because of the planner's need to guess and such. I don't fully understand how the statistics work (and the explanation on the PG website is way too geeky) but he columns I work with already have a stat level of 100. Not helping at all. That's definitely *NOT* due to planner's need to guess; it's due to there being some *specific* work that PostgreSQL needs to do that some other databases can avoid due to different storage strategies. The matter is quite succinctly described here: http://wiki.postgresql.org/wiki/Why_PostgreSQL_Instead_of_MySQL:_Comparing_Reliability_and_Speed_in_2007#Counting_rows_in_a_table I'll just take one excerpt: --- It is worth observing that it is only this precise form of aggregate that must be so pessimistic; if augmented with a WHERE clause like SELECT COUNT(*) FROM table WHERE status = 'something' PostgreSQL, MySQL, and most other database implementations will take advantage of available indexes against the restricted field(s) to limit how many records must be counted, which can greatly accelerate such queries. --- It is common for systems where it is necessary for aggregation reporting to be fast to do pre-computation of the aggregates, and that is in no way specific to PostgreSQL. If you need *really* fast aggregates, then it will be worthwhile to put together triggers or procedures or something of the sort to help pre-compute the aggregates. -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://linuxfinances.info/info/wp.html When you have eliminated the impossible, whatever remains, however improbable, must be the truth. -- Sir Arthur Conan Doyle (1859-1930), English author. Sherlock Holmes, in The Sign of Four, ch. 6 (1889). [...but see the Holmesian Fallacy, due to Bob Frankston... http://www.frankston.com/public/Essays/Holmesian%20Fallacy.asp] -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
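A minimal sketch of that pre-computation, with illustrative names; note that the single counter row becomes a serialization point under heavy concurrent writes, which is the usual price of this technique:

create table mytable (id serial primary key, status text);
create table mytable_count (n bigint not null);
insert into mytable_count values (0);

create or replace function track_mytable_count() returns trigger as $$
begin
   if TG_OP = 'INSERT' then
      update mytable_count set n = n + 1;
   elsif TG_OP = 'DELETE' then
      update mytable_count set n = n - 1;
   end if;
   return null;   -- return value is ignored for AFTER triggers
end
$$ language plpgsql;

create trigger mytable_count_trig
   after insert or delete on mytable
   for each row execute procedure track_mytable_count();

-- now this is a single-row lookup instead of a full scan:
select n from mytable_count;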
Re: [PERFORM] Intel's X25-M SSD
[EMAIL PROTECTED] (Merlin Moncure) writes: I think the SSD manufacturers made a tactical error chasing the notebook market when they should have been chasing the server market... That's a very good point; I agree totally! -- output = reverse(moc.enworbbc @ enworbbc) http://www3.sympatico.ca/cbbrowne/nonrdbms.html We are all somehow dreadfully cracked about the head, and sadly need mending. --/Moby-Dick/, Ch 17 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] OVERLAPS is slow
I'm doing some analysis on temporal usages, and was hoping to make use of OVERLAPS, but it does not appear that it makes use of indices. Couching this in an example... I created a table, t1, thus: metadata=# \d t1 Table public.t1 Column | Type | Modifiers +--+--- id | integer | not null default nextval('t1_id_seq'::regclass) t1 | timestamp with time zone | not null default now() t2 | timestamp with time zone | not null default 'infinity'::timestamp with time zone data | text | not null Indexes: t1_pkey PRIMARY KEY, btree (id) f2 btree (id) WHERE t2 = 'infinity'::timestamp with time zone t1t1 btree (t1) t1t2 btree (t2) When entries go in, they default to having an effective date range from now() until 'infinity'. I then went off and seeded a bunch of data into the table, inserting values: for i in `cat /etc/dictionaries-common/words | head 2000`; do psql -d metadata -c insert into t1 (data) values ('$i'); done Then, I started doing temporal updates, thus: for i in `cat /etc/dictionaries-common/words`; do psql -d metadata -c insert into t1 (data) values ('$i');update t1 set t2 = now() where t2 = 'infinity' and id in (select id from t1 where t2 = 'infinity' order by random() limit 1); done This terminates many of those entries, and creates a new one that is effective to infinity. After running this for a while, I have a reasonably meaningful amount of data in the table: metadata=# select count(*) from t1; select count(*) from t1 where t2 = 'infinity'; count 125310 (1 row) count --- 2177 (1 row) Searching for the active items in the table, via a constructed 'overlap': metadata=# explain analyze select count(*) from t1 where t1 = now() and t2 = now(); QUERY PLAN -- Aggregate (cost=98.13..98.14 rows=1 width=0) (actual time=8.104..8.105 rows=1 loops=1) - Index Scan using t1t2 on t1 (cost=0.00..93.95 rows=1671 width=0) (actual time=0.116..6.374 rows=2177 loops=1) Index Cond: (t2 = now()) Filter: (t1 = now()) Total runtime: 8.193 ms (5 rows) Note, that makes use of the index on column t2, and runs nice and quick. (And notice that the rows found, 2177, agrees with the earlier count.) Unfortunately, when I try using OVERLAPS, it reverts to a Seq Scan. metadata=# explain analyze select * from t1 where (t1,t2) overlaps (now(), now()); QUERY PLAN --- Seq Scan on t1 (cost=0.00..3156.59 rows=43135 width=24) (actual time=171.248..205.941 rows=2177 loops=1) Filter: overlaps(t1, t2, now(), now()) Total runtime: 207.508 ms (3 rows) I would surely think that I have enough data in the table for the stats to be good, and the first query certainly does harness the index on t2 to determine if records are overlapping (now(),now()). Is it possible that we need to have some improvement to the optimizer so that OVERLAPS could make use of the indices? -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://linuxfinances.info/info/lsf.html Very little is known about the War of 1812 because the Americans lost it. -- Eric Nicol -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
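For reference, the two formulations side by side: the constructed-overlap test, which the planner can match to the index on t2, and the OVERLAPS spelling, which it currently cannot.

-- indexable:
select count(*) from t1 where t1 <= now() and t2 >= now();

-- not indexable today:
select count(*) from t1 where (t1, t2) overlaps (now(), now());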
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: Basically we have some background process which updates table1 and we don't want the application to make any changes to table1 while vacuum. Vacuum requires exclusive lock on table1 and if any of the background or application is ON vacuum don't kick off. Thats the reason we need to get the site down. VACUUM has not required an exclusive lock on tables since version 7.1. What version of PostgreSQL are you running? -- output = (cbbrowne @ acm.org) http://linuxdatabases.info/info/sap.html Rules of the Evil Overlord #192. If I appoint someone as my consort, I will not subsequently inform her that she is being replaced by a younger, more attractive woman. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Where do a novice do to make it run faster?
[EMAIL PROTECTED] (A B) writes: So, it is time to improve performance, it is running to slow. AFAIK (as a novice) there are a few general areas: 1) hardware 2) rewriting my queries and table structures 3) using more predefined queries 4) tweek parameters in the db conf files Of these points: 1) is nothing I can do about right now, but in the future perhaps. 2) will be quite hard right now since there is more code than time. 3) almost like 2 but perhaps more do-able with the current constraints. 4) This seems to be the easiest one to start with... So what should I do/read concerning point 4? If you have other good suggestions I'd be very interested in that. Thank you :-) In the order of ease of implementation, it tends to be... 1. Tweak postgresql.conf 2. Make sure you ran VACUUM + ANALYZE 3. Find some expensive queries and try to improve them, which might involve changing the queries and/or adding relevant indices 4. Add RAM to your server 5. Add disk to your server 6. Redesign your application's DB schema so that it is more performant by design URL below may have some material of value... -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://linuxfinances.info/info/postgresqlperformance.html It is usually a good idea to put a capacitor of a few microfarads across the output, as shown. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Replication Syatem
[EMAIL PROTECTED] (Gauri Kanekar) writes: We have a table table1 which get insert and updates daily in high numbers, bcoz of which its size is increasing and we have to vacuum it every alternate day. Vacuuming table1 take almost 30min and during that time the site is down. We need to cut down on this downtime.So thought of having a replication system, for which the replicated DB will be up during the master is getting vacuumed. Can anybody guide which will be the best suited replication solution for this. The only reason that it would be necessary for VACUUM to take the site down would be if you are running version 7.1, which was obsoleted in 2002, which, it should be noted, was SIX YEARS AGO. As has been noted, you seem to be presupposing a remarkably complex solution to resolve a problem which is likely to be better handled via running VACUUM rather more frequently. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://www3.sympatico.ca/cbbrowne/postgresql.html Rules of the Evil Overlord #181. I will decree that all hay be shipped in tightly-packed bales. Any wagonload of loose hay attempting to pass through a checkpoint will be set on fire. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Message queue table..
[EMAIL PROTECTED] (Jesper Krogh) writes: I have this message queue table.. currently with 8m+ records. Picking the top priority messages seem to take quite long.. it is just a matter of searching the index.. (just as explain analyze tells me it does). Can anyone digest further optimizations out of this output? (All records have funcid=4) # explain analyze SELECT job.jobid, job.funcid, job.arg, job.uniqkey, job.insert_time, job.run_after, job.grabbed_until, job.priority, job.coalesce FROM workqueue.job WHERE (job.funcid = 4) AND (job.run_after <= 1208442668) AND (job.grabbed_until <= 1208442668) AND (job.coalesce = 'Efam') ORDER BY funcid, priority ASC LIMIT 1 ; There might be value in having one or more extra indices... Here are *plausible* candidates: 1. If funcid = 4 is highly significant (e.g. - you are always running this query, and funcid is often <> 4), then you might add a partial index such as: create index job_funcid_run_after on workqueue.job (run_after) where funcid = 4; create index job_funcid_grabbeduntil on workqueue.job (grabbed_until) where funcid = 4; 2. Straight indices like the following: create index job_run_after on workqueue.job(run_after); create index job_grabbed_until on workqueue.job(grabbed_until); create index job_funcid on workqueue.job(funcid); create index job_coalesce on workqueue.job(coalesce); Note that it is _possible_ (though by no means guaranteed) that all three might prove useful, if you're running 8.1+ where PostgreSQL supports bitmap index scans. Another possibility... 3. You might change your process to process multiple records in a run so that you might instead run the query (perhaps via a cursor?) with LIMIT [Something Bigger than 1]. It does seem mighty expensive to run a 245ms query to find just one record. It seems quite likely that you could return the top 100 rows (LIMIT 100) without necessarily finding that it runs in any more time. Returning 100 tuples in 245ms seems rather more acceptable, no? :-) -- (format nil [EMAIL PROTECTED] cbbrowne linuxfinances.info) http://linuxdatabases.info/info/linuxdistributions.html Rules of the Evil Overlord #32. I will not fly into a rage and kill a messenger who brings me bad news just to illustrate how evil I really am. Good messengers are hard to come by. http://www.eviloverlord.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
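A sketch of suggestion 3: grab a batch per round trip instead of paying roughly 245ms per single row. The literal timestamps and 'Efam' come from the original query; how the application then marks the rows as grabbed is left out here.

select j.jobid
  from workqueue.job j
 where j.funcid = 4
   and j.run_after <= 1208442668
   and j.grabbed_until <= 1208442668
   and j.coalesce = 'Efam'
 order by j.priority
 limit 100;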
Re: [PERFORM] Oddly slow queries
[EMAIL PROTECTED] (Thomas Spreng) writes: On 16.04.2008, at 01:24, PFC wrote: The queries in question (select's) occasionally take up to 5 mins even if they take ~2-3 sec under normal conditions, there are no sequencial scans done in those queries. There are not many users connected (around 3, maybe) to this database usually since it's still in a testing phase. I tried to hunt down the problem by playing around with resource usage cfg options but it didn't really made a difference. Could that be caused by a CHECKPOINT ? actually there are a few log (around 12 per day) entries concerning checkpoints: LOG: checkpoints are occurring too frequently (10 seconds apart) HINT: Consider increasing the configuration parameter checkpoint_segments. But wouldn't that only affect write performance? The main problems I'm concerned about affect SELECT queries. No, that will certainly NOT just affect write performance; if the postmaster is busy writing out checkpoints, that will block SELECT queries that are accessing whatever is being checkpointed. When we were on 7.4, we would *frequently* see SELECT queries that should be running Very Quick that would get blocked by the checkpoint flush. We'd periodically see hordes of queries of the form: select id from some_table where unique_field = 'somevalue.something'; which would normally run in less than 1ms running for (say) 2s. And the logs would show something looking rather like the following: 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 952ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 742ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 1341ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 911ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 1244ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 2311ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'another.something'; - 1799ms 2008-04-03 09:01:52 LOG select id from some_table where unique_field = 'somevalue.something'; - 1992ms This was happening because the checkpoint was flushing those two tuples, and hence blocking 8 SELECTs that came in during the flush. There are two things worth considering: 1. If the checkpoints are taking place too frequently, then that is clear evidence that something is taking place that is injecting REALLY heavy update load on your database at those times. If the postmaster is checkpointing every 10s, that implies Rather Heavy Load, so it is pretty well guaranteed that performance of other activity will suck at least somewhat because this load is sucking up all the I/O bandwidth that it can. So, to a degree, there may be little to be done to improve on this. 2. On the other hand, if you're on 8.1 or so, you may be able to configure the Background Writer to incrementally flush checkpoint data earlier, and avoid the condition of 1. Mind you, you'd have to set BgWr to be pretty aggressive, based on the 10s periodicity that you describe; that may not be a nice configuration to have all the time :-(. -- output = reverse(ofni.sesabatadxunil @ enworbbc) http://cbbrowne.com/info/multiplexor.html Nagging is the repetition of unpalatable truths. 
--Baroness Edith Summerskill -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
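To make the checkpoint_segments hint and point 2 concrete, here is a sketch of the postgresql.conf knobs involved on an 8.1-era server; the numbers are purely illustrative and would need to be sized against the actual write load:

# Give WAL enough room that checkpoints are driven by checkpoint_timeout,
# not by running out of segments every 10 seconds:
checkpoint_segments = 30        # default is 3
checkpoint_warning = 300        # still log a warning if checkpoints bunch up

# Make the background writer trickle dirty buffers out between checkpoints,
# so each checkpoint has less to flush in one burst (8.1+ parameters):
bgwriter_lru_percent = 10.0
bgwriter_lru_maxpages = 200
bgwriter_all_percent = 5.0
bgwriter_all_maxpages = 200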
Re: [PERFORM] Background writer underemphasized ...
[EMAIL PROTECTED] (Marinos Yannikos) writes: This helped with our configuration: bgwriter_delay = 10000ms # 10-10000ms between rounds bgwriter_lru_maxpages = 1000 # 0-1000 max buffers written/round FYI, I'd be inclined to reduce both of those numbers, as it should reduce the variability of behaviour. Rather than cleaning 1K pages every 10s, I would rather clean 100 pages every 1s, as that will have much the same effect, but spread the work more evenly. Or perhaps 10 pages every 100ms... Cut the delay *too* low and this might make the background writer, in effect, poll *too* often, and start chewing resources, but there's doubtless some sweet spot in between... -- cbbrowne,@,cbbrowne.com http://linuxdatabases.info/info/oses.html For systems, the analogue of a face-lift is to add to the control graph an edge that creates a cycle, not just an additional node. -- Alan J. Perlis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
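In postgresql.conf terms, the even-spreading suggestion above looks something like the following; the figures are illustrative only, and the right rate still has to match how quickly the workload dirties buffers:

# same overall cleaning rate (~100 pages/second), spread into smaller rounds
bgwriter_delay = 1000           # milliseconds between rounds
bgwriter_lru_maxpages = 100     # buffers cleaned per round

# or finer-grained still:
# bgwriter_delay = 100
# bgwriter_lru_maxpages = 10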
Re: [PERFORM] which is more important? freq of checkpoints or the duration of them?
[EMAIL PROTECTED] (Douglas J Hunley) writes: Subject about says it all. Should I be more concerned about checkpoints happening 'frequently' or lasting 'longer'? In other words, is it ok to checkpoint say, every 5 minutes, if it only last a second or three or better to have checkpoints every 10 minutes that last half a minute? Stupid examples probably, but you get my point I hope :) Well, with the (new-in-8.1) background writer, you should be able to have whatever combination might appear attractive, as the point of the background writer is to push out dirty pages. Pre-8.1, your choice would be either to: a) Flush frequently, and so have the checkpoints be of short duration, or b) Flush infrequently, so that the checkpoint flushes would have a long duration. Now, if you have reasonable settings (I'm not sure how well its tuning is documented :-(), checkpoint flushes should be able to be short, however infrequent they may be. In effect, the oops, the database got blocked by checkpoint flushing issue should now be gone... The issue that then remains is whether to checkpoint often, in which case crash recovery will tend to be be quicker, or whether to checkpoint seldom, in which case crash recovery will have fewer checkpoints to choose from, and hence will run somewhat longer. If your systems don't crash much, and recovery time isn't a big deal, then this probably doesn't much matter... -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://linuxfinances.info/info/sap.html I don't plan to maintain it, just to install it. -- Richard M. Stallman -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your Subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.orgextra=pgsql-performance
Re: [PERFORM] Making the most of memory?
[EMAIL PROTECTED] (Scott Marlowe) writes: On Jan 23, 2008 1:57 PM, Guy Rouillier [EMAIL PROTECTED] wrote: Scott Marlowe wrote: I assume you're talking about solid state drives? They have their uses, but for most use cases, having plenty of RAM in your server will be a better way to spend your money. For certain high throughput, relatively small databases (i.e. transactional work) the SSD can be quite useful. Unless somebody has changes some physics recently, I'm not understanding the recent discussions of SSD in the general press. Flash has a limited number of writes before it becomes unreliable. On good quality consumer Actually, I was referring to all SSD systems, some of which are based on flash memory, some on DRAM, sometimes backed by hard drives. There's always a use case for a given piece of tech. Yeah, I could see an SSD making use of a mixture of technologies... - Obviously, it needs a pile of RAM. - Then, have a battery that can keep the RAM backed up for [a while]. - If power goes out, then contents of RAM get copied out to the flash memory. In this context, flash has several merits over disk drives. Notably, the absence of moving mechanical parts means: - Hopefully lower power consumption than a disk drive - Less fragility than a disk drive - Quite likely the flash will be smaller than a disk drive The fact that the number of writes may be limited should only be an important factor if power goes out *INCREDIBLY* frequently, as data only gets written upon power loss. The combination of RAM + battery + flash looks like a real winner, when they are combined using a protocol that takes advantage of their strengths, and which doesn't rest on their weaknesses. -- let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];; http://cbbrowne.com/info/advocacy.html Roses are red Violets are blue Some poems rhyme But this one doesn't. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Making the most of memory?
[EMAIL PROTECTED] (Florian Weimer) writes: So, that web site seems to list products starting at about 32GB in a separate rack-mounted box with redundant everything. I'd be more interested in just putting the WAL on an SSD device, so 500MB or 1GB would be quite sufficient. Can anyone point me towards such a device? A dedicated RAID controller with battery-backed cache of sufficient size and two mirrored disks should not perform that bad, and has the advantage of easy availability. That won't provide performance as souped-up as WAL on SSD, but it is from technical people wishing for such things that some of those things actually emerge... It appears that the SSD marketplace is going pretty nuts right now as vendors are hawking flash-based SSD devices that are specifically targeted at replacing disk drives for laptops. I agree that there would be considerable value for DBMS applications in the availability of a device that combines the strengths of both Flash (persistence) and DRAM (sheer volume of IOPs) to provide something better than either offers alone. I expect that the standard size for this is more likely to be 32GB than 1GB, what with modern shrinkage of physical sizing... -- let name=cbbrowne and tld=linuxfinances.info in String.concat @ [name;tld];; http://www3.sympatico.ca/cbbrowne/spiritual.html When we write programs that learn, it turns out that we do and they don't. -- Alan J. Perlis ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] viewing source code
[EMAIL PROTECTED] (Roberts, Jon) writes: I think it is foolish to not make PostgreSQL as feature rich when it comes to security as the competition because you are idealistic when it comes to the concept of source code. PostgreSQL is better in many ways to MS SQL Server and equal to many features of Oracle but when it comes to security, it is closer to MS Access. I don't think that's quite fair. There most certainly *is* a rich set of security features in PostgreSQL, with some not-unreasonable defaults, to the point that it has been pointed at as being 'more secure out of the box' than pretty well any DBMS. When people try to put security measures into the database that are intended to secure it from, yea, verily, even the DBAs, it often appears that once the feature list gets long enough, the critical faculties of peoples' brains seem to shut off. They seem to imagine that since there's a named set of features, that: a) They are actually usable, and b) They actually accomplish what they claim to be intended for. Frequently, neither condition is true. We've run into cases where attempts to manage fairly complex sets of role-based security pretty much falls apart (e.g. - they are not usable) because for it to work, it's necessary that too many people understand and follow the security design. When *reality* is that the developers build things in an ad-hoc fashion without regard to security, then you've got a ball of mud, from a security standpoint, that no amount of pounding will force into the rigidly-defined security hole. Note that ad-hoc reporting and analysis will always tend to fall into this ball of mud category. They don't know what data they need until they start exploring the problem they're given, and that tends to fit Really Badly with any attempt to strictly define security access. Usability (item a) is troublesome :-(. When you write about trying to hide source code and the likes, we start thinking of item b), the matter of whether it actually accomplishes what is claimed. -- [Vizzini has just cut the rope The Dread Pirate Roberts is climbing up] Vizzini: HE DIDN'T FALL? INCONCEIVABLE. Inigo Montoya: You keep using that word. I do not think it means what you think it means. -- People seem to think that adding passwords, encrypting things, whether via private or public key encryption, or other obfuscation provides security. Rephrasing Inigo Montoy, I am not so sure that provides security means what you think it means. I worked one place where I heard a tale of Payroll of Years Past. They used to manage executive payroll (for a Fortune 500 company, hence with some multi-million dollar paycheques!) via temporarily adding the data into the peons' system. They had this clever idea: - We want to keep the execs' numbers secret from the peons who run the system. - Ergo, we'll load the data in, temporarily, run the cheques, whilst having someone watch that the peons aren't reading anything they shouldn't. - Then we'll reverse that data out, and the peons won't know what they shouldn't know. Unfortunately, the joker that thought this up didn't realize that the transactional system would record those sets of changes multiple times. So anyone looking over the audit logs would see the Secret Values listed, not once, but twice. And they couldn't purge those audit logs without bringing down the wrath of the auditors; to do so would be to invalidate internal controls that they spent more money than those executive salaries on. Duh. 
They quickly shifted Executive Payroll to be managed, by hand, by certain members of the executives' administrative staff. That's much the same kind of problem that pops up here. You may *imagine* that you're hiding the stored procedures, but if they're sufficiently there that they can be run, they obviously aren't hidden as far as the DBMS is concerned, and there can't be *too* much of a veil between DBA and DBMS, otherwise you have to accept that the system is not intended to be manageable. We've done some thinking about how to try to hide this information; unfortunately, a whole lot of the mechanisms people think of simply don't work. Vendors may *claim* that their products are secure, but that may be because they know their customers neither know nor truly care what the word means; they merely feel reassured because it's inconceivable (in roughly the _Princess Bride_ sense!) to break the security of the product. -- let name=cbbrowne and tld=linuxfinances.info in name ^ @ ^ tld;; http://cbbrowne.com/info/spreadsheets.html Rules of the Evil Overlord #109. I will see to it that plucky young lads/lasses in strange clothes and with the accent of an outlander shall REGULARLY climb some monument in the main square of my capital and denounce me, claim to know the secret of my power, rally the masses to rebellion, etc. That way, the citizens will be jaded in case the real
Re: [PERFORM] Need to run CLUSTER to keep performance
[EMAIL PROTECTED] (Rafael Martinez) writes: Heikki Linnakangas wrote: On a small table like that you could run VACUUM every few minutes without much impact on performance. That should keep the table size in check. Ok, we run VACUUM ANALYZE only one time a day, every night. But we would espect the performance to get ok again after running vacuum, and it doesn't. Only CLUSTER helps. You have characterized the shape of the problem Right There. If you only VACUUM that table once a day, then it has a whole day to get cluttered with dead tuples, which increases its size to encompass 651 pages, and NOTHING ever allows it to shrink back to a small size. Plain VACUUM (or VACUUM ANALYZE) does not attempt to shrink table sizes. Only VACUUM FULL and CLUSTER do that. Here are some options to parameterize your choices: - If you vacuum the table often enough that only 10% of the table consists of dead tuples, then you can expect the table to perpetually have 10% of dead space. - If you vacuum the table seldom enough that 90% of the table may be expected to consist of dead tuples, then you can expect this table to consistently have 90% of its space be dead. It sounds like this particular table needs to be vacuumed quite a bit more frequently than once a day. On our systems, we have certain tables where tuples get killed off so frequently that we find it worthwhile to vacuum those tables once every two to three minutes. If we didn't, we'd see application performance bog down until it forced us to CLUSTER or VACUUM FULL the table. -- cbbrowne,@,acm.org http://linuxfinances.info/info/linux.html How much more helpful could I be than to provide you with the appropriate e-mail address? I could engrave it on a clue-by-four and deliver it to you in Chicago, I suppose. -- Seen on Slashdot... ---(end of broadcast)--- TIP 6: explain analyze is your friend
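As a sketch of what "vacuum it a lot more frequently" amounts to in practice, assuming the bloating table is called ticket_queue (a hypothetical name), something along these lines run from cron or the application every couple of minutes is usually enough:

-- cheap when run often: only the one small, hot table is touched, and
-- plain VACUUM does not take locks that block ordinary readers or writers
VACUUM ANALYZE ticket_queue;

Once the table has already ballooned, a one-time CLUSTER or VACUUM FULL is still needed to give the space back; the frequent plain VACUUM then keeps it from growing again.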
Re: [PERFORM] Bunching transactions
[EMAIL PROTECTED] (Jean-David Beyer) writes: But what is the limitation on such a thing? In this case, I am just populating the database and there are no other users at such a time. I am willing to lose the whole insert of a file if something goes wrong -- I would fix whatever went wrong and start over anyway. But at some point, disk IO would have to be done. Is this just a function of how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it have to do with wal_buffers and checkpoint_segments? I have done bulk data loads where I was typically loading hundreds of thousands of rows in as a single transaction, and it is worth observing that loading in data from a pg_dump will do exactly the same thing, where, in general, each table's data is loaded as a single transaction. It has tended to be the case that increasing the number of checkpoint segments is helpful, though it's less obvious that this is the case in 8.2 and later versions, what with the ongoing changes to checkpoint flushing. In general, this isn't something that typically needs to get tuned really finely; if you tune your DB, in general, pretty big transactions should generally work fine, up to rather large sizes of pretty big. -- cbbrowne,@,acm.org http://linuxdatabases.info/info/languages.html Why use Windows, since there is a door? -- [EMAIL PROTECTED] Andre Fachat ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Bunching transactions
[EMAIL PROTECTED] (Jean-David Beyer) writes: Chris Browne wrote: [EMAIL PROTECTED] (Jean-David Beyer) writes: But what is the limitation on such a thing? In this case, I am just populating the database and there are no other users at such a time. I am willing to lose the whole insert of a file if something goes wrong -- I would fix whatever went wrong and start over anyway. But at some point, disk IO would have to be done. Is this just a function of how big /pgsql/data/postgresql.conf's shared_buffers is set to? Or does it have to do with wal_buffers and checkpoint_segments? I have done bulk data loads where I was typically loading hundreds of thousands of rows in as a single transaction, and it is worth observing that loading in data from a pg_dump will do exactly the same thing, where, in general, each table's data is loaded as a single transaction. I guess a reasonable standard of performance would be that if my initial population of the database takes only a little longer than a restore of the database using pg_restore, I am pretty close, and that is good enough. Of course, the restore depends on how fast my tape drive can pull the tape -- it claims up to 12 MB/sec transfer rate, so it looks as though it will be tape-limited rather than postgreSQL-limited. That's quite possible. There is a further factor, which is that grouping things into larger transactions has very clearly diminishing returns. Supposing you have a stream of 50,000 operations updating one tuple (those could be UPDATE, DELETE, or INSERT; it is not, at first order, material what sort they are), then the effects of grouping are thus... - With none... Cost = cost of doing 50,000 updates + cost of doing 50,000 COMMITs - If you COMMIT after every 2 updates Cost = cost of doing 50,000 updates + cost of doing 25,000 COMMITs - If you COMMIT after every 10 updates Cost = cost of doing 50,000 updates + cost of doing 5,000 COMMITs - If you COMMIT after every 100 updates Cost = cost of doing 50,000 updates + cost of doing 500 COMMITs The amount of work that COMMIT does is fairly much constant, regardless of the number of updates in the transaction, so that the cost, in that equation, of COMMITs pretty quickly evaporates to irrelevancy. And increasing the sizes of the transactions does not give you *increasing* performance improvements; the improvements will tend to decline. I wouldn't worry about trying to strictly minimize the number of transactions COMMITted; once you have grouped enough data into one transaction, that should be good enough. Further, the Right Thing is to group related data together, and come up with a policy that is driven primarily by the need for data consistency. If things work well enough, then don't go off trying to optimize something that doesn't really need optimization, and perhaps break the logic of the application. -- output = (cbbrowne @ acm.org) http://cbbrowne.com/info/unix.html Users should cultivate an ability to make the simplest molehill into a mountain by finding controversial interpretations of innocuous sounding statements that the sender never intended or imagined. -- from the Symbolics Guidelines for Sending Mail ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
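For illustration, a sketch of what the grouping looks like at the SQL level, using a hypothetical load_target table; the point is simply that the per-COMMIT cost is paid once per batch rather than once per row:

BEGIN;
INSERT INTO load_target (id, payload) VALUES (1, 'first row of the batch');
INSERT INTO load_target (id, payload) VALUES (2, 'second row of the batch');
-- ... a few hundred to a few thousand more single-row INSERTs ...
COMMIT;   -- one WAL flush amortized across the whole batch

Past a few hundred rows per transaction the curve flattens out, which is the diminishing-returns effect described above.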
Re: [PERFORM] how to improve the performance of creating index
[EMAIL PROTECTED] (Yinan Li) writes: I am trying to improve the performance of creating an index. I've set shared_buffers = 1024MB, effective_cache_size = 1024MB, work_mem = 1GB, maintenance_work_mem = 512MB (I'm sure that the server process has received the SIGHUP signal). However, when creating the index, I found that the memory used by Postgres is only 50MB, and it is very slow. How can I make it faster? All help is appreciated. Those values seem rather large, with the exception of the effective cache size, which I would expect to be somewhat bigger, based on the other values. Note that the values for work_mem and maintenance_work_mem get used each time something is sorted or maintained. So if those values get set high, this can pretty easily lead to scary amounts of swapping, which would tend to lead to things getting very slow. You may want to do a census as to how much memory and I/O capacity you actually have on the server. Knowing that would help people make more rational evaluations of whether your parameters are sensible or not. -- (reverse (concatenate 'string moc.enworbbc @ enworbbc)) http://linuxdatabases.info/info/languages.html All syllogisms have three parts, therefore this is not a syllogism. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
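A sketch of a more conservative, session-local arrangement for a one-off index build (table and column names are hypothetical, and the unit suffixes assume 8.2 or later); this keeps the permanent settings modest while still giving the index build a large sort area:

-- affects only the current session
SET maintenance_work_mem = '512MB';   -- used by CREATE INDEX's sort
SET work_mem = '32MB';                -- per-sort allowance for ordinary queries
CREATE INDEX orders_customer_idx ON orders (customer_id);
RESET maintenance_work_mem;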
Re: [PERFORM] performance of like queries
[EMAIL PROTECTED] (Kevin Kempter) writes: any suggestions for improving LIKE '%text%' queries? If you know that the 'text' portion of that query won't change, then you might create a partial index on the boolean condition. That is, create index index_foo_text on my_table (tfield) where (tfield like '%text%'); I somehow doubt that is the case; more likely you want to be able to search for: select * from my_table where tfield like '%this%'; select * from my_table where tfield like '%that%'; select * from my_table where tfield like '%the other thing%'; There are basically three choices, at that point: 1. Get more memory, and hope that you can have all the data get cached in memory. 2. Get more better disk, so that you can scan the table faster on disk. 3. Look into tsearch2, which provides a full text search capability. -- (format nil [EMAIL PROTECTED] cbbrowne linuxdatabases.info) http://cbbrowne.com/info/x.html We're born with a number of powerful instincts, which are found across all cultures. Chief amongst these are a dislike of snakes, a fear of falling, and a hatred of popup windows -- Vlatko Juric-Kokic ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
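For option 3, a minimal sketch of the full text approach, assuming 8.3's integrated text search (earlier releases need the tsearch2 contrib module, with slightly different spellings); the table and column names are the hypothetical ones used above:

CREATE INDEX my_table_tfield_fts ON my_table
    USING gin (to_tsvector('english', tfield));

SELECT * FROM my_table
 WHERE to_tsvector('english', tfield) @@ to_tsquery('english', 'that');

Note that this matches whole words rather than arbitrary substrings, which is frequently what such '%text%' searches actually want.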
Re: [PERFORM] Clustered tables improves perfs ?
[EMAIL PROTECTED] (Patrice Castet) writes: I wonder if clustering a table improves perfs somehow ? Any example/ideas about that ? ref : http://www.postgresql.org/docs/8.2/interactive/sql-cluster.html Sometimes. 1. It compacts the table, which may be of value, particularly if the table is not seeing heavy UPDATE/DELETE traffic. VACUUM and VACUUM FULL do somewhat similar things; if you are using VACUUM frequently enough, this is not likely to have a material effect. 2. It transforms the contents of the table into some specified order, which will improve efficiency for any queries that use that specific ordering. -- output = reverse(moc.enworbbc @ enworbbc) http://linuxdatabases.info/info/emacs.html You can swear at the keyboard and it won't be offended. It was going to treat you badly anyway -- Arthur Norman ---(end of broadcast)--- TIP 6: explain analyze is your friend
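A sketch of what point 2 looks like in practice, using the 8.2 syntax from the page cited; the table and index names are hypothetical, and note that CLUSTER rewrites the whole table under an exclusive lock while it runs:

-- rewrite orders in the physical order of the chosen index
CLUSTER orders_by_customer_idx ON orders;
-- the physical layout changed, so refresh planner statistics
ANALYZE orders;

Queries that read ranges of rows in that same order (say, all rows for one customer) then touch far fewer pages.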
Re: [PERFORM] 8.2 Autovacuum BUG ?
[EMAIL PROTECTED] (Pallav Kalva) writes: Tom Lane wrote: Pallav Kalva [EMAIL PROTECTED] writes: We turned on autovacuums on 8.2 and we have a database which is read only , it is basically a USPS database used only for address lookups (only SELECTS, no updates/deletes/inserts). This database has about 10gig data and yesterday autovacuum started on this database and all of a sudden I see lot of archive logs generated during this time, I guess it might have generated close to 3-4gig data during this period. Probably represents freezing of old tuples, which is a WAL-logged operation as of 8.2. Is it likely that the data is 200M transactions old? If nothing changed on these tables how can it freeze old tuples ? It does so very easily, by changing the XID from whatever it was to 2 (which indicates that a tuple has been frozen.) I don't imagine you were wondering how it is done - more likely you were wondering why. Why is to prevent transaction ID wraparound failures. Does it mean that once it reaches 200M transactions it will do the same thing all over again ? It won't freeze those same tuples again, as they're obviously already frozen, but a vacuum next week may be expected to freeze tuples that are roughly a week newer. If I am doing just SELECTS on these tables ? how can there be any transactions ? or SELECTS considered transactions too ? Every query submitted comes in the context of a transaction. If there wasn't a BEGIN submitted somewhere, then yes, every SELECT could potentially invoke a transaction, irrespective of whether it writes data or not. If you submit a million SELECT statements, yes, that could, indeed, indicate a million transactions. -- let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;; http://cbbrowne.com/info/nonrdbms.html How much deeper would the ocean be if sponges didn't live there? ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] 8.2 Autovacuum BUG ?
[EMAIL PROTECTED] (Pallav Kalva) writes: Mark Lewis wrote: On Fri, 2007-08-31 at 12:25 -0400, Pallav Kalva wrote: Can you please correct me if I am wrong, I want to understand how this works. Based on what you said, it will run autovacuum again when it passes 200M transactions, as SELECTS are transactions too and are going on these tables. But the next time when it runs autovacuum, it shouldnt freeze the tuples again as they are already frozen and wont generate lot of archive logs ? Or is this because of it ran autovacuum for the first time on this db ? just the first time it does this process ? That is correct. The tuples are now frozen, which means that they will not need to be frozen ever again unless you insert/update any records. My main concern is filling up my disk with archive logs, so from all the replies I get is that since tuples are already frozen, next time when it runs autovacuum it wont generate any archive logs. Is my assumption right ? No, your assumption is wrong. Later vacuums will not generate archive files for the tuples that were *previously* frozen, but if you have additional tuples that have gotten old enough to reach the freeze point, THOSE tuples will get frozen, and so you'll continue to see archive logs generated. And this is Certainly Not A Bug. If the system did not do this, those unfrozen tuples would eventually disappear when your current transaction XID rolls over. The freezing is *necessary.* -- let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;; http://linuxdatabases.info/info/unix.html Rules of the Evil Overlord #86. I will make sure that my doomsday device is up to code and properly grounded. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
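For anyone wanting to watch this happen, a sketch of a monitoring query, assuming 8.2 (where pg_class carries relfrozenxid); it shows how many transactions each table has accumulated since it was last frozen:

SELECT relname, age(relfrozenxid) AS xids_since_frozen
  FROM pg_class
 WHERE relkind = 'r'
 ORDER BY age(relfrozenxid) DESC;

-- tables approaching this threshold are the ones whose next vacuum
-- will do a round of freezing, and hence generate archive logs
SHOW autovacuum_freeze_max_age;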
Re: [PERFORM] Update table performance
[EMAIL PROTECTED] (Mark Makarowsky) writes: I have a table with 4,889,820 records in it. The table also has 47 fields. I'm having problems with update performance. Just as a test, I issued the following update: update valley set test='this is a test' This took 905641 ms. Isn't that kind of slow? There aren't any indexes, triggers, constraints or anything on this table. The version of Postgres is PostgreSQL 8.2.4 on i686-pc-mingw32, compiled by GCC gcc.exe (GCC) 3.4.2 (mingw-special). The operating environment is Windows 2003 Standard Edition w/service pack 2. It is 2.20 Ghz with 1.0 GB of RAM. Here is the results from Explain: Seq Scan on valley (cost=0.00..1034083.57 rows=4897257 width=601) Here are the settings in the postgresql.conf. Any ideas or is this the expected speed? Hmm. - You asked to update 4,889,820 records. - It's a table consisting of 8.5GB of data (based on the cost info) For this to take 15 minutes doesn't seem particularly outrageous. -- output = (cbbrowne @ acm.org) http://cbbrowne.com/info/oses.html Rules of the Evil Overlord #65. If I must have computer systems with publically available terminals, the maps they display of my complex will have a room clearly marked as the Main Control Room. That room will be the Execution Chamber. The actual main control room will be marked as Sewage Overflow Containment. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] performance of postgresql in replication using slony
[EMAIL PROTECTED] (Jeff Davis) writes: On Thu, 2007-07-26 at 01:44 -0700, angga erwina wrote: Hi all, whats the benefits of replication by using slony in postgresql?? My office is separate in several difference place..its about hundreds branch office in the difference place..so any one can help me to replicate our dbase by using slony?? and why slony?? This question should be asked on the slony1-general list, you'll get more responses there. The benefit of using slony is that you can read from many servers rather than just one. Indeed. It would be worth taking a peek at the documentation, notably the introductory material, as that will give some idea as to whether Slony-I is suitable at all for the desired purpose. One thing that tweaks my antennae a bit is the mention of having hundreds branch office; there are two things worth mentioning that would be relevant to that: - Slony-I is a single-master replication system, *not* a multimaster system. If someone is expecting to do updates at branch offices, and that this will propagate everywhere, that is likely not to work out easily or well. - If it *is* fair to assess that there will only be one master, Slony-I is intended to support a relatively limited number of nodes. There are no absolute restrictions on numbers of subscribers, but there are enough communications costs that grow in a polynomial fashion that I would be quite disinclined to have more than a dozen nodes in a cluster. -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/slony.html HELP! I'm being attacked by a tenured professor! ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] best use of an EMC SAN
[EMAIL PROTECTED] (Dave Cramer) writes: On 11-Jul-07, at 10:05 AM, Gregory Stark wrote: Dave Cramer [EMAIL PROTECTED] writes: Assuming we have 24 73G drives is it better to make one big metalun and carve it up and let the SAN manage the where everything is, or is it better to specify which spindles are where. This is quite a controversial question with proponents of both strategies. I would suggest having one RAID-1 array for the WAL and throw the rest of the This is quite unexpected. Since the WAL is primarily all writes, isn't a RAID 1 the slowest of all for writing ? The thing is, the disk array caches this LIKE CRAZY. I'm not quite sure how many batteries are in there to back things up; there seems to be multiple levels of such, which means that as far as fsync() is concerned, the data is committed very quickly even if it takes a while to physically hit disk. One piece of the controversy will be that the disk being used for WAL is certain to be written to as heavily and continuously as your heavy load causes. A fallout of this is that those disks are likely to be worked harder than the disk used for storing plain old data, with the result that if you devote disk to WAL, you'll likely burn thru replacement drives faster there than you do for the POD disk. It is not certain whether it is more desirable to: a) Spread that wear and tear across the whole array, or b) Target certain disks for that wear and tear, and expect to need to replace them somewhat more frequently. At some point, I'd like to do a test on a decent disk array where we take multiple configurations. Assuming 24 drives: - Use all 24 to make one big filesystem as the base case - Split off a set (6?) for WAL - Split off a set (6? 9?) to have a second tablespace, and shift indices there My suspicion is that the use all 24 for one big filesystem scenario is likely to be fastest by some small margin, and that the other cases will lose a very little bit in comparison. Andrew Sullivan had a somewhat similar finding a few years ago on some old Solaris hardware that unfortunately isn't at all relevant today. He basically found that moving WAL off to separate disk didn't affect performance materially. What's quite regrettable is that it is almost sure to be difficult to construct a test that, on a well-appointed modern disk array, won't basically stay in cache. -- let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;; http://linuxdatabases.info/info/nonrdbms.html 16-inch Rotary Debugger: A highly effective tool for locating problems in computer software. Available for delivery in most major metropolitan areas. Anchovies contribute to poor coding style. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Very long SQL strings
[EMAIL PROTECTED] (Tom Lane) writes: PS: for the record, there is a hard limit at 1GB of query text, owing to restrictions built into palloc. But I think you'd hit other memory limits or performance bottlenecks before that one. It would be much funnier to set a hard limit of 640K of query text. The reasoning should be obvious :-). I once ran into the situation where Slony-I generated a query that made the parser blow out (some sort of memory problem / running out of stack space somewhere thing); it was just short of 640K long, and so we figured that evidently it was wrong to conclude that 640K ought to be enough for anybody. Neil Conway was an observer; he was speculating that, with some (possibly nontrivial) change to the parser, we should have been able to cope with it. The query consisted mostly of a NOT IN clause where the list had some atrocious number of entries in it (all integers). (Aside: I wound up writing a query compressor (now in 1.2) which would read that list and, if it was at all large, try to squeeze any sets of consecutive integers into sets of NOT BETWEEN clauses. Usually, the lists, of XIDs, were more or less consecutive, and frequently, in the cases where the query got to MBs in size, there would be sets of hundreds or even thousands of consecutive integers such that we'd be left with a tiny query after this...) -- select 'cbbrowne' || '@' || 'linuxfinances.info'; http://linuxfinances.info/info/linux.html As of next Monday, MACLISP will no longer support list structure. Please downgrade your programs. ---(end of broadcast)--- TIP 6: explain analyze is your friend
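To illustrate the compression described above, a hypothetical before-and-after; the table name and XID values are invented:

-- before: a huge, mostly-consecutive NOT IN list
SELECT * FROM some_log
 WHERE xid NOT IN (1000001, 1000002, 1000003, 1000004, 1000005, 1003500, 1003502);

-- after: consecutive runs squeezed into range predicates
SELECT * FROM some_log
 WHERE xid NOT BETWEEN 1000001 AND 1000005
   AND xid <> 1003500
   AND xid <> 1003502;

With lists that are mostly consecutive, a multi-megabyte query collapses into a handful of NOT BETWEEN clauses.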
Re: [PERFORM] [doc patch] a slight VACUUM / VACUUM FULL doc improvement proposal
[EMAIL PROTECTED] (Michael Stone) writes: On Wed, May 16, 2007 at 12:09:26PM -0400, Alvaro Herrera wrote: Maybe, but we should also mention that CLUSTER is a likely faster workaround. Unless, of course, you don't particularly care about the order of the items in your table; you might end up wasting vastly more time rewriting tables due to unnecessary clustering than for full vacuums on a table that doesn't need it. Actually, this is irrelevant. If CLUSTER is faster than VACUUM FULL (and if it isn't faster in all cases, it *frequently* is, and will probably soon be in nearly all of them), then it's a faster workaround. -- output = (cbbrowne @ linuxfinances.info) http://cbbrowne.com/info/oses.html What if you slept? And what if, in your sleep, you dreamed? And what if, in your dream, you went to heaven and there plucked a strange and beautiful flower? And what if, when you awoke, you had the flower in your hand? Ah, what then? --Coleridge ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] What`s wrong with JFS configuration?
[EMAIL PROTECTED] (Paweł Gruszczyński) writes: To test I use pgBench with the default database schema, run for 25, 50, 75 users at one time. Every test I run 5 times to take an average. Unfortunately my results show that ext2 is fastest, while ext3 and jfs are very similar. I can understand that ext2 without journaling is faster than ext3, but it is said that jfs is 40-60% faster, and I can't see the difference. Part of my results (transaction type | scaling factor | num of clients | tpl | num of transactions | tps including connection time | tps excluding connection time):

EXT2:
TPC-B (sort of),50,75,13,975|975,338.286682,358.855582
TPC-B (sort of),50,75,133,9975|9975,126.777438,127.023687
TPC-B (sort of),50,75,1333,99975|99975,125.612325,125.636193

EXT3:
TPC-B (sort of),50,75,13,975|975,226.139237,244.619009
TPC-B (sort of),50,75,133,9975|9975,88.678922,88.935371
TPC-B (sort of),50,75,1333,99975|99975,79.126892,79.147423

JFS:
TPC-B (sort of),50,75,13,975|975,235.626369,255.863271
TPC-B (sort of),50,75,133,9975|9975,88.408323,88.664584
TPC-B (sort of),50,75,1333,99975|99975,81.003394,81.024297

Can anyone tell me what's wrong with my test? Or maybe it is normal? For one thing, this test is *probably* staying mostly in memory. That will be skewing results away from measuring anything about the filesystem. When I did some testing of comparative Linux filesystem performance, back in 2003, I found that JFS was maybe 20 percent faster on a write-only workload than XFS, which was a few percent faster than ext3. The differences weren't terribly large. If you're seeing such huge differences with pgbench (which includes read load, which should be virtually unaffected by one's choice of filesystem), then I can only conclude that something about your testing methodology is magnifying the differences. -- let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];; http://cbbrowne.com/info/oses.html On the Internet, no one knows you're using Windows NT -- Ramiro Estrugo, [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Configuration Advice
[EMAIL PROTECTED] (Steve) writes: I'm wondering what we can do to make this better if anything; would it be better to leave the indexes on? It doesn't seem to be. Definitely NOT. Generating an index via a bulk sort is a LOT faster than loading data into an index one tuple at a time. We saw a BIG increase in performance of Slony-I when, in version 1.1.5, we added a modification that shuts off indexes during COPY and then does a reindex. Conceivably, you might look at how Slony-I does that, and try doing the same thing; it might well be faster than doing a bunch of reindexes serially. (Or not...) Would it be better to use INSERTs instead of copies? Doesn't seem to be. I'd be mighty surprised. - The load process itself takes about 6 gigs of memory, the rest is free for postgres because this is basically all the machine does. The settings you have do not seem conspicuously wrong in any way. The one thought which comes to mind is that if you could turn this into a *mostly* incremental change, that might help. The thought: - Load the big chunk of data into a new table - Generate some minimal set of indices on the new table - Generate four queries that compare old to new: q1 - See which tuples are unchanged from yesterday to today q2 - See which tuples have been deleted from yesterday to today q3 - See which tuples have been added q4 - See which tuples have been modified If the unchanged set is extremely large, then you might see benefit to doing updates based on deleting the rows indicated by q2, inserting rows based on q3, and updating based on q4. In principle, computing and applying those 4 queries might be quicker than rebuilding from scratch. In principle, applying q2, then q4, then vacuuming, then q3, ought to be optimal. -- let name=cbbrowne and tld=cbbrowne.com in name ^ @ ^ tld;; http://linuxdatabases.info/info/linux.html A 'Cape Cod Salsa' just isn't right. -- Unknown ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
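A sketch of the four comparison queries, assuming the newly-loaded table is new_data, yesterday's copy is old_data, both keyed on id, with the comparable columns represented here by a single payload column (all names hypothetical):

-- q1: tuples unchanged from yesterday to today
SELECT n.id FROM new_data n JOIN old_data o USING (id)
 WHERE n.payload = o.payload;     -- NULLs need extra care here

-- q2: tuples deleted since yesterday
SELECT o.id FROM old_data o LEFT JOIN new_data n USING (id) WHERE n.id IS NULL;

-- q3: tuples added
SELECT n.id FROM new_data n LEFT JOIN old_data o USING (id) WHERE o.id IS NULL;

-- q4: tuples modified
SELECT n.id FROM new_data n JOIN old_data o USING (id)
 WHERE n.payload IS DISTINCT FROM o.payload;

Applying q2 as DELETEs, q4 as UPDATEs, a VACUUM, and then q3 as INSERTs against the live table touches only the rows that actually changed.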
Re: [PERFORM] New to PostgreSQL, performance considerations
[EMAIL PROTECTED] (Daniel van Ham Colchete) writes: You are right Christopher. Okay. Let's solve this matter. What PostgreSQL benchmark software should I use??? pgbench is one option. There's a TPC-W at pgFoundry (http://pgfoundry.org/projects/tpc-w-php/). There's the Open Source Database Benchmark. (http://osdb.sourceforge.net/) Those are three reasonable options. I'll test PostgreSQL 8.1 on a Fedora Core 6 and on a Gentoo. I'll get the same version FC6 uses and install it at my Gentoo. I'll use the same hardware (diferent partitions to each). Wrong approach. You'll be comparing apples to oranges, because Gentoo and Fedora pluck sources from different points in the source code stream. In order to prove what you want to prove, you need to run the benchmarks all on Gentoo, where you run with 4 categorizations: 1. Where you run PostgreSQL and GLIBC without any processor-specific optimizations 2. Where you run PostgreSQL and GLIBC with all relevant processor-specific optimizations 3. Where you run PostgreSQL with, and GLIBC without processor-specific optimizations 4. Where you run PostgreSQL without, and GLIBC with processor-specific optimizations That would allow one to clearly distinguish which optimizations are particularly relevant. I'm not saying that Gentoo is faster than FC6. I just want to prove that if you compile your software to make better use of your processor, it will run faster. It might take a few days because I'm pretty busy right now at my job. I expect that you'll discover, if you actually do these tests, that this belief is fairly much nonsense. - Modern CPUs do a huge amount of on-CPU self-tuning. - CPU features that could have a material effect tend to be unusable when compiling general purpose libraries and applications. GCC doesn't generate MMX-like instructions. - Database application performance tends to be I/O driven. - When database application performance *isn't* I/O driven, it is likely to be driven by cache management, which compiler options won't affect. -- output = reverse(ofni.secnanifxunil @ enworbbc) http://linuxdatabases.info/info/sgml.html very few people approach me in real life and insist on proving they are drooling idiots. -- Erik Naggum, comp.lang.lisp ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] One table is very slow, but replicated table (same data) is fine
[EMAIL PROTECTED] writes: If anyone knows what may cause this problem, or has any other ideas, I would be grateful. Submit the command VACUUM ANALYZE VERBOSE locations; on both servers, and post the output of that. That might help us tell for sure whether the table is bloated (and needs VACUUM FULL/CLUSTER). The query plans are suggestive; on the 'master', the cost is 113921.40, whereas on the 'slave' it's 2185.09; I'll bet that those numbers are proportional to the number of pages assigned to the table on the respective servers... -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://cbbrowne.com/info/lsf.html We use Linux for all our mission-critical applications. Having the source code means that we are not held hostage by anyone's support department. -- Russell Nelson, President of Crynwr Software ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Optimization of this SQL sentence
[EMAIL PROTECTED] (Merlin Moncure) writes: On 10/17/06, Mario Weilguni [EMAIL PROTECTED] wrote: Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: Lastly, note that in PostgreSQL these length declarations are not necessary: contacto varchar(255), fuente varchar(512), prefijopais varchar(10) Instead, use: contacto text, fuente text, prefijopais text See the PostgreSQL manual for an explanation of varchar vs. text. Enforcing length constraints with varchar(xyz) is good database design, not a bad one. Using text everywhere might be tempting because it works, but it's not a good idea. while you are correct, i think the spirit of the argument is wrong becuase there is no constraint to be enforced in those fields. a length constraint of n is only valid is n + 1 characters are an error and should be rejected by the database. anything else is IMO bad form. There are practial exceptions to this rule though, for example client technology that might require a length. so, imo alexander is correct: contacto varchar(255) ...is a false constraint, why exactly 255? is that were the dart landed? Yeah, 255 seems silly to me. If I'm going to be arbitrary, there are two better choices: 1. 80, because that's how many characters one can fit across a piece of paper whilst keeping things pretty readable; 2. 64, because that will fit on a screen, and leave some space for a field name/description. specifically limiting text fields so users 'don't enter too much data' is a manifestation c programmer's disease :) No, I can't agree. I'm pretty accustomed to languages that don't pinch you the ways C does, and I still dislike having over-wide columns because it makes it more difficult to generate readable reports. -- output = (cbbrowne @ linuxfinances.info) http://linuxdatabases.info/info/unix.html Instant coffee is like pouring hot water over the cremated remains of a good friend. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Optimization of this SQL sentence
[EMAIL PROTECTED] (Alexander Staubo) writes: On Oct 17, 2006, at 17:29 , Mario Weilguni wrote: Am Dienstag, 17. Oktober 2006 11:52 schrieb Alexander Staubo: Lastly, note that in PostgreSQL these length declarations are not necessary: contacto varchar(255), fuente varchar(512), prefijopais varchar(10) Enforcing length constraints with varchar(xyz) is good database design, not a bad one. Using text everywhere might be tempting because it works, but it's not a good idea. Enforcing length constraints is generally a bad idea because it assumes you know the data domain as expressed in a quantity of characters. Off the top of your head, do you know the maximum length of a zip code? A street address? The name of a city? In the case of a zip code? Sure. US zip codes are integer values either 5 or 9 characters long. In the case of some of our internal applications, we need to conform to some IETF and ITU standards which actually do enforce some maximum lengths on these sorts of things. In almost all cases the limit you invent is arbitrary, and the probability of being incompatible with any given input is inversely proportional to that arbitrary limit. I'd be quite inclined to limit things like addresses to somewhat smaller sizes than you might expect. If addresses are to be used to generate labels for envelopes, for instance, it's reasonably important to limit sizes to those that might fit on a label or an envelope. Encoding specific length constraints in the database makes sense when they relate explicitly to business logic, but I can think of only a few cases where it would make sense: restricting the length of passwords, user names, and so on. In a few cases you do know with 100% certainty the limit of your field, such as with standardized abbreviations: ISO 3166 country codes, for example. And sometimes you want to cap data due to storage or transmission costs. There's another reason: Open things up wide, and some people will fill the space with rubbish. -- cbbrowne,@,acm.org http://linuxfinances.info/info/internet.html The Amiga is proof that if you build a better mousetrap, the rats will gang up on you. -- Bill Roberts [EMAIL PROTECTED] ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Simple join optimized badly?
[EMAIL PROTECTED] (Craig A. James) writes: Mark Kirkwood wrote: The result? I can't use my function in any WHERE clause that involves any other conditions or joins. Only by itself. PG will occasionally decide to use my function as a filter instead of doing the join or the other WHERE conditions first, and I'm dead. this is an argument for cost-for-functions rather than hints AFAICS. Perhaps you scanned past what I wrote a couple paragraphs farther down. I'm going to repeat it because it's the KEY POINT I'm trying to make: Craig James wrote: Now you might argue that function-cost needs to be added to the optimizer's arsenal of tricks. And I'd agree with you: That WOULD be a better solution than hints. But I need my problem solved TODAY, not next year. Hints can help solve problems NOW that can be brought to the PG team's attention later, and in the mean time let me get my application to work. Unfortunately, that hint language also needs to mandate a temporal awareness of when hints were introduced so that it doesn't worsen things down the road. e.g. - Suppose you upgrade to 8.4, where the query optimizer becomes smart enough (perhaps combined with entirely new kinds of scan strategies) to make certain of your hints obsolete and/or downright wrong. Those hints (well, *some* of them) ought to be ignored, right? The trouble is that the hint language will be painfully large and complex. Its likely-nonstandard interaction with SQL will make query parsing worse. All we really have, at this point, is a vague desire for a hint language, as opposed to any clear direction as to what it should look like, and how it needs to interact with other system components. That's not nearly enough; there needs to be a clear design. -- (format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com) http://cbbrowne.com/info/advocacy.html 'Typos in FINNEGANS WAKE? How could you tell?' -- Kim Stanley Robinson ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Simple join optimized badly?
[EMAIL PROTECTED] (Tom Lane) writes: Another thing we've been beat up about in the past is that loading a pg_dump script doesn't ANALYZE the data afterward... Do I misrecall, or were there not plans (circa 7.4...) to for pg_dump to have an option to do an ANALYZE at the end? I seem to remember some dispute as to whether the default should be to include the ANALYZE, with an option to suppress it, or the opposite... -- (reverse (concatenate 'string ofni.sesabatadxunil @ enworbbc)) http://www3.sympatico.ca/cbbrowne/wp.html You can measure a programmer's perspective by noting his attitude on the continuing vitality of FORTRAN. -- Alan J. Perlis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] BUG #2658: Query not using index
[EMAIL PROTECTED] (Graham Davis) writes: Adding DESC to both columns in the SORT BY did not make the query use the multikey index. So both SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid, ts DESC; and SELECT DISTINCT ON (assetid) assetid, ts FROM asset_positions ORDER BY assetid DESC, ts DESC; use the same query plans and both do sequential scans without using either the (assetid, ts) or (ts) indexes. Any other ideas on how to make this query use an index? Thanks, Why do you want to worsen performance by forcing the use of an index? You are reading through the entire table, after all, and doing so via a sequential scan is normally the fastest way to do that. An index scan would only be more efficient if you don't have enough space in memory to store all assetid values. -- (reverse (concatenate 'string gro.mca @ enworbbc)) http://www3.sympatico.ca/cbbrowne/emacs.html Expect the unexpected. -- The Hitchhiker's Guide to the Galaxy, page 7023 ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] BUG #2658: Query not using index
[EMAIL PROTECTED] (Graham Davis) writes: 40 seconds is much too slow for this query to run and I'm assuming that the use of an index will make it much faster (as seen when I removed the GROUP BY clause). Any tips? Assumptions are dangerous things. An aggregate like this has *got to* scan the entire table, and given that that is the case, an index scan is NOT optimal; a seq scan is. An index scan is just going to be slower. -- let name=cbbrowne and tld=linuxdatabases.info in String.concat @ [name;tld];; http://cbbrowne.com/info/linux.html The computer is the ultimate polluter: its feces are indistinguishable from the food it produces. -- Alan J. Perlis ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] XFS filessystem for Datawarehousing -2
[EMAIL PROTECTED] (Denis Lussier) writes: I have no personal experience with XFS, but, I've seen numerous internal edb-postgres test results that show that of all file systems... OCFS 2.0 seems to be quite good for PG update intensive apps (especially on 64 bit machines). I have been curious about OCFS for some time; it sounded like a case where there could possibly be some useful semantic changes to filesystem functionality, notably that: - atime is pretty irrelevant; - it might try to work with pretty fixed block sizes (8K, perhaps?) rather than try to be efficient at handling tiny files It sounds like it ought to be able to be a good fit. Of course, with a big warning sticker of what is required for Oracle to work properly is implemented, anything more is not a guarantee on it, who's going to trust it? -- select 'cbbrowne' || '@' || 'cbbrowne.com'; http://www.ntlug.org/~cbbrowne/oses.html There isn't any reason why Linux can't be implemented as an enterprise computing solution. Find out what you've been missing while you've been rebooting Windows NT. - Infoworld ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] XFS filessystem for Datawarehousing
[EMAIL PROTECTED] (Milen Kulev) writes: I am pretty exited whether XFS will clearly outpertform ETX3 (no default setups for both are planned !). I am not sure whether is it worth to include JFS in comparison too ... I did some benchmarking about 2 years ago, and found that JFS was a few percent faster than XFS which was a few percent faster than ext3, on a huge amounts of writes workload. That the difference was only a few percent made us draw the conclusion that FS performance was fairly much irrelevant. It is of *vastly* more importance whether the filesystem will survive power outages and the like, and, actually, Linux hasn't fared as well with that as I'd like. :-( The differences are small enough that what you should *actually* test for is NOT PERFORMANCE. You should instead test for reliability. - Turn off the power when the DB is under load, and see how well it survives. - Pull the fibrechannel cable, and see if the filesystem (and database) survives when under load. If you find that XFS is 4% faster, that's likely a *terrible* trade-off if it only survives power outage half as often as (say) ext3. -- (reverse (concatenate 'string gro.gultn @ enworbbc)) http://cbbrowne.com/info/wp.html C combines the power of assembler language with the convenience of assembler language. -- Unknown ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Is it possible to start two instances of postgresql?
[EMAIL PROTECTED] writes: Is it possible to start two instances of postgresql with different port and directory which run simultaneously? Certainly. We have one HACMP cluster which hosts 14 PostgreSQL instances across two physical boxes. (If one went down, they'd all migrate to the survivor...) If can then will this cause any problem or performance drop down? There certainly can be; the databases will be sharing disks, memory, and CPUs, so if they are avidly competing for resources, the competition is sure to have some impact on performance. Flip side: That 14 database cluster has several databases that are known to be very lightly used; they *aren't* competing, and aren't a problem. Consider it obvious that if you haven't enough memory or I/O bandwidth to cover your two PG instances, you'll find performance sucks... If you have enough, then it can work fine... -- select 'cbbrowne' || '@' || 'acm.org'; http://cbbrowne.com/info/linuxxian.html At Microsoft, it doesn't matter which file you're compiling, only which flags you #define. -- Colin Plumb ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Which processor runs better for Postgresql?
[EMAIL PROTECTED] (Steve Poe) writes: I have a client who is running Postgresql 7.4.x series database (required to use 7.4.x). They are planning an upgrade to a new server. They are insistent on Dell. Then they're being insistent on poor performance. If you search for dell postgresql performance you'll find plenty of examples of people who have been disappointed when they insisted on Dell for PostgreSQL. Here is a *long* thread on the matter... http://archives.postgresql.org/pgsql-performance/2004-12/msg00022.php I am hoping the client is willing to wait for Dell to ship a AMD Opeteron-based server. Based on Dell's history, I would neither: a) Hold my breath, nor b) Expect an Opteron-based Dell server to perform as well as seemingly-equivalent servers provisioned by other vendors. We got burned by some Celestica-built Opterons that didn't turn out quite as hoped. We have had somewhat better results with some HP Opterons; they appear to be surviving less-than-ideal 3rd world data centre situations with reasonable aplomb. (Based on the amount of dust in their diet, I'm somewhat surprised the disk drives are still running...) -- let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;; http://cbbrowne.com/info/nonrdbms.html We are Pentium of Borg. Division is futile. You will be approximated. (seen in someone's .signature) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Sun Fire T2000 and PostgreSQL 8.1.3
josh@agliodbs.com (Josh Berkus) writes: Juan, When I hit this pgsql on this laptop with a large query I can see the load spike up really high on both of my virtual processors. Whatever pgsql is doing, it looks like both CPUs are being used independently. Nope, sorry, you're being deceived. Postgres is strictly one process, one query. It's not entirely deception; there is indeed independent use of both CPUs, it's just that it isn't from multithreading... -- output = reverse(gro.mca @ enworbbc) http://www.ntlug.org/~cbbrowne/internet.html Don't use C; In my opinion, C is a library programming language not an app programming language. -- Owen Taylor (GTK+ and ORBit developer) ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Decide between Postgresql and Mysql (help of
[EMAIL PROTECTED] (Craig A. James) writes: Gorshkov wrote: /flame on if you were *that* worried about performance, you wouldn't be using PHP or *any* interpreted language /flame off sorry - couldn't resist it :-) I hope this was just a joke. You should be sure to clarify - there might be some newbie out there who thinks you are seriously suggesting coding major web sites in some old-fashioned compiled language. Actually, this seems not so bad a point... If people are so interested in micro-managing certain bits of how performance works, then it seems an excellent question to ask why NOT write all the CGIs in C. After all, CGI in C *won't* suffer from the performance troubles associated with repetitively loading in Perl/PHP frameworks (which is why things like FastCGI, mod_perl, and such came about), and you can get a fair level of assurance that the compiled C won't be the performance bottleneck. And yes, it does become natural to ask why not write CGIs in ASM? ;-) -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www3.sympatico.ca/cbbrowne/lisp.html When I was a boy of fourteen, my father was so ignorant I could hardly stand to have the old man around. But when I got to be twenty-one, I was astonished at how much the old man had learned in seven years. -- Mark Twain ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
[EMAIL PROTECTED] (Jim C. Nasby) writes: On Thu, Mar 23, 2006 at 09:22:34PM -0500, Christopher Browne wrote: Martha Stewart called it a Good Thing when [EMAIL PROTECTED] (Scott Marlowe) wrote: On Thu, 2006-03-23 at 10:43, Joshua D. Drake wrote: Has someone been working on the problem of splitting a query into pieces and running it on multiple CPUs / multiple machines? Yes. Bizgress has done that. I believe that is limited to Bizgress MPP yes? Yep. I hope that someday it will be released to the postgresql global dev group for inclusion. Or at least parts of it. Question: Does the Bizgress/MPP use threading for this concurrency? Or forking? If it does so via forking, that's more portable, and less dependent on specific complexities of threading implementations (which amounts to non-portability ;-)). Most times Jan comes to town, we spend a few minutes musing about the splitting queries across threads problem, and dismiss it again; if there's the beginning of a split across processes, that's decidedly neat :-). Correct me if I'm wrong, but there's no way to (reasonably) accomplish that without having some dedicated extra processes laying around that you can use to execute the queries, no? In other words, the cost of a fork() during query execution would be too prohibitive... Counterexample... The sort of scenario we keep musing about is where you split off a (thread|process) for each partition of a big table. There is in fact a natural such partitioning, in that tables get split at the 1GB mark, by default. Consider doing a join against 2 tables that are each 8GB in size (e.g. - they consist of 8 data files). Let's assume that the query plan indicates doing seq scans on both. You *know* you'll be reading through 16 files, each 1GB in size. Spawning a process for each of those files doesn't strike me as prohibitively expensive. A naive read on this is that you might start with one backend process, which then spawns 16 more. Each of those backends is scanning through one of those 16 files; they then throw relevant tuples into shared memory to be aggregated/joined by the central one. That particular scenario is one where the fork()s would hardly be noticeable. FWIW, DB2 executes all queries in a dedicated set of processes. The process handling the connection from the client will pass a query request off to one of the executor processes. I can't remember which process actually plans the query, but I know that the executor runs it. It seems to me that the kinds of cases where extra processes/threads would be warranted are quite likely to be cases where fork()ing may be an immaterial cost. -- let name=cbbrowne and tld=ntlug.org in String.concat @ [name;tld];; http://www.ntlug.org/~cbbrowne/languages.html TECO Madness: a moment of convenience, a lifetime of regret. -- Dave Moon ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
[EMAIL PROTECTED] (Luke Lonergan) writes: Christopher, On 3/23/06 6:22 PM, Christopher Browne [EMAIL PROTECTED] wrote: Question: Does the Bizgress/MPP use threading for this concurrency? Or forking? If it does so via forking, that's more portable, and less dependent on specific complexities of threading implementations (which amounts to non-portability ;-)). OK - I'll byte: It's process based, we fork backends at slice points in the execution plan. By slice points, do you mean that you'd try to partition tables (e.g. - if there's a Seq Scan on a table with 8 1GB segments, you could spawn as many as 8 processes), or that two scans that are then merge joined means a process for each scan, and a process for the merge join? Or perhaps both :-). Or perhaps something else entirely ;-). To take care of the startup latency problem, we persist sets of these backends, called gangs. They appear, persist for connection scope for reuse, then are disbanded. If only that could happen to more gangs... -- output = (cbbrowne @ cbbrowne.com) http://cbbrowne.com/info/multiplexor.html I'm sorry, the teleportation booth you have reached is not in service at this time. Please hand-reassemble your molecules or call an operator to help you ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Scaling up PostgreSQL in Multiple CPU / Dual Core
[EMAIL PROTECTED] (Michael Stone) writes: On Fri, Mar 24, 2006 at 01:21:23PM -0500, Chris Browne wrote: A naive read on this is that you might start with one backend process, which then spawns 16 more. Each of those backends is scanning through one of those 16 files; they then throw relevant tuples into shared memory to be aggregated/joined by the central one. Of course, table scanning is going to be IO limited in most cases, and having every query spawn 16 independent IO threads is likely to slow things down in more cases than it speeds them up. It could work if you have a bunch of storage devices, but at that point it's probably easier and more direct to implement a clustered approach. All stipulated, yes. It obviously wouldn't be terribly useful to scan more aggressively than I/O bandwidth can support. The point is that this is one of the kinds of places where concurrent processing could do some good... -- let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;; http://cbbrowne.com/info/spiritual.html Save the whales. Collect the whole set. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Large Table With Only a Few Rows
Nik [EMAIL PROTECTED] writes: I have a table that has only a few records in it at any given time, and they get deleted every few seconds and new records are inserted. Table never has more than 5-10 records in it. However, I noticed a deteriorating performance in deletes and inserts on it. So I performed vacuum analyze on it three times (twice in a row, and once two days later). In the statistics it says that the table size is 863Mb, toast table size is 246Mb, and indexes size is 134Mb, even though the table has only 5-10 rows in it. I was wondering how I can reclaim all this space and improve the performance? You need to run VACUUM ANALYZE on this table very frequently. Based on what you describe, very frequently should be on the order of at least once per minute. Schedule a cron job specifically to vacuum this table, with a cron entry like the following: * * * * * /usr/local/bin/vacuumdb -z -t my_table -p 5432 my_database Of course, you need to bring it back down to size, first. You could run CLUSTER on the table to bring it back down to size; that's probably the fastest way... cluster my_table_pk on my_table; VACUUM FULL would also do the job, but probably not as quickly. -- (reverse (concatenate 'string gro.gultn @ enworbbc)) http://cbbrowne.com/info/sgml.html Now they can put you in jail if they *THINK* you're gonna commit a crime. Let me say that again, because it sounds vaguely important --george carlin ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
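Following up on the reclaim question above: a quick way to confirm how badly bloated the table is (and, later, that the CLUSTER actually shrank it) is to look at the planner statistics. A minimal sketch, reusing the hypothetical my_table / my_table_pk names from above:

-- relpages/reltuples are only as fresh as the last VACUUM or ANALYZE
ANALYZE my_table;
SELECT relname, relpages, reltuples,
       relpages * 8 AS approx_kb   -- assumes the default 8K block size
  FROM pg_class
 WHERE relname IN ('my_table', 'my_table_pk');

A table that really holds 5-10 rows should report a handful of pages; if it still reports tens of thousands, the dead space has not been reclaimed.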
Re: [PERFORM] The trigger can be specified to fire on time condition?
[EMAIL PROTECTED] (Jamal Ghaffour) writes: Hi All, I'm using the PostgreSQL database to store cookies. These cookies become invalid after 30 minutes and have to be deleted. I have defined a procedure that will delete all invalid cookies, but I don't know how to call it periodically (for example, each hour). I think it should be possible, because this behaviour is the same as the autovacuum process, which handles vacuuming on a schedule (every 60s by default). After reading the documentation, it seems that triggers can't handle this. How can I resolve the problem? Time-based event scheduling is done using cron, external to the database. -- output = reverse(gro.mca @ enworbbc) http://cbbrowne.com/info/sgml.html Even in the area of anticompetitive conduct, Microsoft is mainly an imitator. -- Ralph Nader (1998/11/11) ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
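To make the cron suggestion above concrete, here is a minimal sketch; the table and column names are hypothetical, since the original schema wasn't shown:

-- Hypothetical schema: cookies(cookie_id, created_at timestamp)
DELETE FROM cookies
 WHERE created_at < now() - interval '30 minutes';

Wrap that statement in a psql -c call from a crontab entry running every few minutes, and the invalid cookies get swept out without any trigger machinery; follow it with a plain VACUUM on the table so the space from the deleted rows gets reused.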
Re: [PERFORM] Default autovacuum settings too conservative
matthew@zeut.net (Matthew T. O'Connor) writes: I think the default settings should be designed to minimize the impact autovacuum has on the system while preventing the system from ever getting wildly bloated (also protect xid wraparound, but that doesn't have anything to do with the thresholds). That would suggest setting the base threshold autovacuum_vacuum_threshold relatively low, and the scale factor autovacuum_vacuum_scale_factor fairly high. -- let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;; http://cbbrowne.com/info/nonrdbms.html I think it may be possible to simplify and condense the content of this thread somewhat: GX is an ex-API. It is no longer supported - The Rest of Us No it isn't. It's just pining for the fjords! - Lawson -- Michael Paquette ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
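Expressed as a postgresql.conf sketch for the 8.1 integrated autovacuum (the numbers are purely illustrative guesses, not recommendations):

autovacuum = on
autovacuum_vacuum_threshold = 200      # low base threshold
autovacuum_vacuum_scale_factor = 0.4   # high scale factor: wait for ~40% of the table to change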
Re: [PERFORM] Investigating IO Saturation
[EMAIL PROTECTED] (Tom Lane) writes: Brad Nicholson [EMAIL PROTECTED] writes: I'm investigating a potential IO issue. We're running 7.4 on AIX 5.1. During periods of high activity (reads, writes, and vacuums), we are seeing iostat reporting 100% disk usage. I have a feeling that the iostat numbers are misleading. I can make iostat usage jump from less than 10% to greater than 95% by running a single vacuum against a moderate-sized table (no noticeable change in the other activity). That's not particularly surprising, and I see no reason to think that iostat is lying to you. More recent versions of PG include parameters that you can use to throttle vacuum's I/O demand ... but unthrottled, it's definitely an I/O hog. I believe it's 7.4 where the cost-based vacuum parameters came in, so that would, in principle, already be an option. [rummaging around...] Hmm... There was a patch for 7.4, but it's only standard as of 8.0... The vmstat numbers suggest that vacuum is not completely killing you, but you could probably get some improvement in foreground query performance by throttling it back. There are other good reasons to consider an update, anyway. I'd have reservations about throttling it back because that would lead to VACUUMs running, and holding transactions open, for 6 hours instead of 2. That is consistent with benchmarking; there was a report of the default policy cutting I/O load by ~80% at the cost of vacuums taking 3x as long to complete. The real answer is to move to 8.x, where VACUUM doesn't chew up shared memory cache as it does in 7.4 and earlier. But in the interim, we need to make sure we tilt at the right windmills, or something of the sort :-). -- output = reverse(gro.gultn @ enworbbc) http://www3.sympatico.ca/cbbrowne/linuxxian.html Women and cats will do as they please, and men and dogs should relax and get used to the idea. -- Robert A. Heinlein ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
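For reference, on 8.0 and later the throttling in question is just a pair of settings, and can be tried per-session before committing anything to postgresql.conf; a minimal sketch (table name hypothetical):

-- 0 disables throttling; larger delays mean gentler, but longer, vacuums
SET vacuum_cost_delay = 20;   -- milliseconds to sleep when the cost limit is reached
SET vacuum_cost_limit = 200;  -- accumulated page cost that triggers the sleep
VACUUM VERBOSE ANALYZE some_big_table;

That makes the trade-off described above easy to measure directly: less I/O pressure on foreground queries, at the price of the vacuum (and its open transaction) running longer.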
Re: [PERFORM] Autovacuum / full vacuum
[EMAIL PROTECTED] (Mindaugas) writes: Even a database-wide vacuum does not take locks on more than one table. The table locks are acquired and released one by one, as the operation proceeds. Has that changed recently? I have always seen vacuumdb or SQL VACUUM (without table specifications) running as one long transaction which doesn't release the locks that it is granted until the end of the transaction. You sure? It's not supposed to, and watching a database-wide vacuum with select * from pg_locks doesn't look to me like it ever has locks on more than one table (plus the table's indexes and toast table). Are there some plans to remove vacuum altogether? I don't see that on the TODO list... http://www.postgresql.org/docs/faqs.TODO.html To the contrary, there is a whole section on what functionality to *ADD* to VACUUM. -- let name=cbbrowne and tld=acm.org in String.concat @ [name;tld];; http://www.ntlug.org/~cbbrowne/finances.html There are two types of hackers working on Linux: those who can spell, and those who can't. There is a constant, pitched battle between the two camps. --Russ Nelson (Linux Kernel Summary, Ver. 1.1.75 - 1.1.76) ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
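An easy way to settle this empirically is to watch the lock table from another session while a database-wide vacuumdb runs; a minimal sketch:

-- Which relations do other backends hold locks on right now?
SELECT c.relname, l.pid, l.mode, l.granted
  FROM pg_locks l
  JOIN pg_class c ON c.oid = l.relation
 WHERE l.pid <> pg_backend_pid()
 ORDER BY l.pid, c.relname;

If the locks are being released table by table, the vacuum's entries should move from one relation (plus its indexes and toast table) to the next, rather than piling up for the duration.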
Re: [PERFORM] Autovacuum / full vacuum (off-topic?)
[EMAIL PROTECTED] (Michael Crozier) writes: On Wednesday 18 January 2006 08:54 am, Chris Browne wrote: To the contrary, there is a whole section on what functionality to *ADD* to VACUUM. Near but not quite off the topic of VACUUM and new features... I've been thinking about parsing the vacuum output and storing it in Postgresql. All the tuple, page, cpu time, etc... information would be inserted into a reasonably flat set of tables. The benefits I would expect from this are: * monitoring ability - I could routinely monitor the values in the table to warn when vacuums are failing or reclaimed space has risen dramatically. I find it easier to write and maintain monitoring agents that perform SQL queries than ones that need to routinely parse log files and coordinate with cron. * historical perspective on tuple use - with a relatively small amount of storage, I could use the vacuum output to get an idea of usage levels over time, which is beneficial for planning additional capacity * historical information could theoretically inform the autovacuum, though I assume there are better alternatives planned. * it could cut down on traffic on this list if admins could see routine maintenance in a historical context. Assuming this isn't a fundamentally horrible idea, it would be nice if there were ways to do this without parsing the pretty-printed vacuum text (i.e., callbacks, triggers, a GUC variable). I'd like to know if anybody does this already, thinks it's a bad idea, or can knock me on the noggin with the pg manual and say, it's already there!. We had someone working on that for a while; I don't think it got to the point of being something ready to unleash on the world. I certainly agree that it would be plenty useful to have this sort of information available. Having a body of historical information could lead to having some more informed suggestions for heuristics. -- (reverse (concatenate 'string gro.mca @ enworbbc)) http://cbbrowne.com/info/unix.html Bad command. Bad, bad command! Sit! Stay! Staaay... ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Autovacuum / full vacuum
[EMAIL PROTECTED] (Andrew Sullivan) writes: On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. That's a bit more than what autovacuum would probably do... autovacuum does things table by table, so that what would be locked should just be one table. Even so, I'd not be keen on having anything that runs automatically take an exclusive lock on even as much as a table. activity. Increasing the FSM so that even during these bursts most space would be reused would mean to reduce the available memory for all other database tasks. I don't believe the hit is enough that you should even notice it. You'd have to post some pretty incredible use cases to show that the tiny loss of memory to FSM is worth (a) an exclusive lock and (b) the loss of efficiency you get from having some preallocated pages in tables. There is *a* case for setting up full vacuums of *some* objects. If you have a table whose tuples all get modified in the course of some common query, that will lead to a pretty conspicuous bloating of *that table.* Even with a big FSM, the pattern of how updates take place will lead to that table having ~50% of its space being dead/free, which is way higher than the desirable stable proportion of 10-15%. For that sort of table, it may be attractive to run VACUUM FULL on a regular basis. Of course, it may also be attractive to try to come up with an update process that won't kill the whole table's contents at once ;-). -- let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];; http://cbbrowne.com/info/x.html As long as each individual is facing the TV tube alone, formal freedom poses no threat to privilege. --Noam Chomsky ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
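So the selective alternative amounts to scheduling something like the following for just the known-bloated table (name hypothetical), at a time when an exclusive lock on it can be tolerated:

-- Off-hours maintenance for one chronically bloated table only
VACUUM FULL VERBOSE hot_table;   -- compacts the heap; takes an exclusive lock on hot_table
REINDEX TABLE hot_table;         -- the indexes tend to stay bloated after VACUUM FULL
ANALYZE hot_table;

That keeps the heavyweight lock confined to the one object that actually needs it.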
Re: [PERFORM] Autovacuum / full vacuum
[EMAIL PROTECTED] (Alvaro Herrera) writes: Chris Browne wrote: [EMAIL PROTECTED] (Andrew Sullivan) writes: On Tue, Jan 17, 2006 at 11:18:59AM +0100, Michael Riess wrote: hi, I'm curious as to why autovacuum is not designed to do full vacuum. I Because nothing that runs automatically should ever take an exclusive lock on the entire database, which is what VACUUM FULL does. That's a bit more than what autovacuum would probably do... autovacuum does things table by table, so that what would be locked should just be one table. Even a database-wide vacuum does not take locks on more than one table. The table locks are acquired and released one by one, as the operation proceeds. And as you know, autovacuum (both 8.1's and contrib) does issue database-wide vacuums, if it finds a database close to an xid wraparound. Has that changed recently? I have always seen vacuumdb or SQL VACUUM (without table specifications) running as one long transaction which doesn't release the locks that it is granted until the end of the transaction. -- cbbrowne,@,acm.org http://cbbrowne.com/info/spiritual.html My nostalgia for Icon makes me forget about any of the bad things. I don't have much nostalgia for Perl, so its faults I remember. -- Scott Gilbert comp.lang.python ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] 15,000 tables
Michael Riess [EMAIL PROTECTED] writes: On 12/1/05, Michael Riess [EMAIL PROTECTED] wrote: we are currently running a postgres server (upgraded to 8.1) which has one large database with approx. 15,000 tables. Unfortunately performance suffers from that, because the internal tables (especially that which holds the attribute info) get too large. (We NEED that many tables, please don't recommend reducing them) Have you ANALYZEd your database? VACUUMing? Of course ... before 8.1 we routinely did a vacuum full analyze each night. As of 8.1 we use autovacuum. VACUUM FULL was probably always overkill, unless always includes versions prior to 7.3... BTW, are you using some kind of weird ERP? I have one that treats Informix as a fool and doesn't let me get all of Informix's potential... maybe the same is true in your case... No. Our database contains tables for web content management systems. The server hosts approx. 500 CMS applications, and each of them has approx. 30 tables. That's why I'm asking if it would be better to have 500 databases with 30 tables each. In previous Postgres versions this led to even worse performance ... This has the feeling of fitting with Alan Perlis' dictum below... Supposing you have 500 databases, each with 30 tables, each with 4 indices, then you'll find you have, on disk... # of files = 500 x 30 x 5 = 75000 files If each is regularly being accessed, that's bits of 75000 files getting shoved through OS and shared memory caches. Oh, yes, and you'll also have regular participation of some of the pg_catalog files, with ~500 instances of THOSE, multiplied some number of ways... An application with 15000 frequently accessed tables doesn't strike me as being something that can possibly turn out well. You have, in effect, more tables than (arguably) bloated ERP systems like SAP R/3; it only has a few thousand tables, and since many are module-specific, and nobody ever implements *all* the modules, it is likely only a few hundred that are hot spots. No 15000 there... -- (format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com) http://www3.sympatico.ca/cbbrowne/languages.html It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures. -- Alan J. Perlis ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
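A quick way to see how many on-disk relations (and hence files) an installation is juggling is to ask the catalogs; a minimal sketch, run once per database:

-- Tables, indexes, and toast tables each get at least one file on disk
SELECT relkind, count(*)
  FROM pg_class
 WHERE relkind IN ('r', 'i', 't')
 GROUP BY relkind;

With 500 databases of 30 tables each, those per-database counts (plus ~500 copies of the system catalogs) add up to the 75000-odd files estimated above.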
Re: [PERFORM] index auto changes after copying data ?
[EMAIL PROTECTED] ([EMAIL PROTECTED]) writes: I know in mysql, index will auto change after copying data Of course, index will change after inserting a line in postgresql, but what about copying data? Do you mean, by this, something like... Are indexes affected by loading data using the COPY command just as they are if data is loaded using INSERT? If so, then the answer is Yes, certainly. Indexes are updated whichever statement you use to load in data. -- let name=cbbrowne and tld=acm.org in name ^ @ ^ tld;; http://www.ntlug.org/~cbbrowne/finances.html Rules of the Evil Overlord #160. Before being accepted into my Legions of Terror, potential recruits will have to pass peripheral vision and hearing tests, and be able to recognize the sound of a pebble thrown to distract them. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Status of Opteron vs Xeon
[EMAIL PROTECTED] (Jeff Frost) writes: What's the current status of how much faster the Opteron is compared to the Xeons? I know the Opterons used to be close to 2x faster, but is that still the case? I understand much work has been done to reduce the context switching storms on the Xeon architecture, is this correct? Work has gone into 8.1 to try to help with the context switch storms; that doesn't affect previous versions. Furthermore, it does not do anything to address the consideration that memory access on Opterons seems to be intrinsically faster than on Xeon due to differences in the memory bus architecture. The only evident ways to address that are: a) For Intel to deploy chips with better memory buses; b) For Intel to convince people to deploy compilers that optimize badly on AMD to make Intel chips look better... -- (format nil [EMAIL PROTECTED] cbbrowne ntlug.org) http://cbbrowne.com/info/lsf.html A mathematician is a machine for converting caffeine into theorems. ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Ultra-cheap NVRAM device
[EMAIL PROTECTED] (Dan Harris) writes: On Oct 3, 2005, at 5:02 AM, Steinar H. Gunderson wrote: I thought this might be interesting, not the least due to the extremely low price ($150 + the price of regular DIMMs): Replying before my other post came through.. It looks like their benchmarks are markedly improved since the last article I read on this. There may be more interest now.. It still needs a few more generations worth of improvement. 1. It's still limited to SATA speed 2. It's not ECC smart What I'd love to see would be something that much smarter, or, at least, that pushes the limits of SATA speed, and which has both a battery on board and enough CF storage to cope with outages. -- output = reverse(gro.mca @ enworbbc) http://www.ntlug.org/~cbbrowne/linuxxian.html We all live in a yellow subroutine, a yellow subroutine, a yellow subroutine... ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] int2 vs int4 in Postgres
[EMAIL PROTECTED] (Announce) writes: I KNOW that I am not going to have anywhere near 32,000+ different genres in my genre table so why use int4? Would that squeeze a few more milliseconds of performance out of a LARGE song table query with a genre lookup? By the way, I see a lot of queries on tables NOT optimized in this fashion that run in less than a millisecond, so it would seem remarkable to me if there were milliseconds to be squeezed out in the first place... -- output = reverse(moc.enworbbc @ enworbbc) http://www.ntlug.org/~cbbrowne/sap.html Why do we drive on parkways and park on driveways? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] int2 vs int4 in Postgres
[EMAIL PROTECTED] (Announce) writes: I KNOW that I am not going to have anywhere near 32,000+ different genres in my genre table so why use int4? Would that squeeze a few more milliseconds of performance out of a LARGE song table query with a genre lookup? If the field is immaterial in terms of the size of the table, then it won't help materially. If you were going to index on it, however, THAT would make it significant for indices involving the genre column. Fitting more tuples into each page is a big help, and this would help. I doubt it'll be material, but I'd think it a good thing to apply what restrictions to your data types that you can, a priori, so I'd be inclined to use int2 for this... -- let name=cbbrowne and tld=cbbrowne.com in String.concat @ [name;tld];; http://cbbrowne.com/info/nonrdbms.html Rules of the Evil Overlord #136. If I build a bomb, I will simply remember which wire to cut if it has to be deactivated and make every wire red. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
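As a sketch of applying that a-priori restriction (the table layout here is hypothetical, just following the genre example from the question):

CREATE TABLE genre (
    genre_id  int2 PRIMARY KEY,   -- smallint: ample for a few hundred genres
    name      text NOT NULL
);

CREATE TABLE song (
    song_id   int4 PRIMARY KEY,
    title     text NOT NULL,
    genre_id  int2 NOT NULL REFERENCES genre (genre_id)
);

-- It is the index on the narrow column where int2 vs int4 has any real effect
CREATE INDEX song_genre_idx ON song (genre_id);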
Re: [PERFORM] Advice on RAID card
[EMAIL PROTECTED] (Joshua D. Drake) writes: There is a huge advantage to software raid on all kinds of levels. If you have the CPU then I suggest it. However you will never get the performance out of software raid on the high level (think 1 gig of cache) that you would on a hardware raid setup. This appears to be a case where the ludicrous MHz increases on desktop CPUs have actually provided a material benefit. The sorts of embedded controllers typically used on RAID controllers are StrongARMs and i960s, and, well, 250MHz is actually fast for these. When AMD and Intel fight over adding gigahertz and megabytes of cache to their chips, this means that the RAID work can get pushed over to one's main CPU without chewing up terribly much of its bandwidth. That says to me that in the absence of battery-backed cache, it's not worth having a bottom-end RAID controller. Particularly if the death of the controller would be likely to kill your data. Battery-backed cache changes the value proposition, of course... -- select 'cbbrowne' || '@' || 'acm.org'; http://cbbrowne.com/info/linuxdistributions.html All generalizations are false, including this one. ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] VACUUM FULL vs CLUSTER
[EMAIL PROTECTED] (Stef) writes: Bruno Wolff III mentioned : = If you have a proper FSM setting you shouldn't need to do vacuum fulls = (unless you have an older version of postgres where index bloat might = be an issue). What version of postgres was the last version that had the index bloat problem? I believe that was fixed in 7.3; it was certainly resolved by 7.4... -- (format nil [EMAIL PROTECTED] cbbrowne cbbrowne.com) http://www.ntlug.org/~cbbrowne/spiritual.html MICROS~1 has brought the microcomputer OS to the point where it is more bloated than even OSes from what was previously larger classes of machines altogether. This is perhaps Bill's single greatest accomplishment. ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] please comment on cpu 32 bit or 64 bit
[EMAIL PROTECTED] (wisan watcharinporn) writes: Please help me: comment on postgresql (8.x.x) performance on AMD vs. Intel CPUs, and why I should use a 32-bit or 64-bit CPU? (What is the performance difference?) Generally speaking, the width of your I/O bus will be more important to performance than the width of the processor bus. That is, having more and better disk will have more impact on performance than getting a better CPU. That being said, if you plan to have a system with significantly more than 2GB of memory, there seem to be pretty substantial benefits to the speed of AMD memory bus access, and that can be quite significant, given that if you have a lot of memory, and thus are often operating out of cache, and are slinging around big queries, THAT implies a lot of shoving data around in memory. AMD/Opteron has a way faster memory bus than the Intel/Xeon systems. But this is only likely to be significant if you're doing processing intense enough that you commonly have 4GB of memory in use. If not, then you'd better focus on I/O speed, which is typically pretty independent of the CPU... -- (format nil [EMAIL PROTECTED] cbbrowne ntlug.org) http://www.ntlug.org/~cbbrowne/rdbms.html Anyway I know how to not be bothered by consing on the fly. -- Dave Moon ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] When to do a vacuum for highly active table
[EMAIL PROTECTED] (Rigmor Ukuhe) writes: Markus Benne wrote: We have a highly active table that has virtually all entries updated every 5 minutes. Typical size of the table is 50,000 entries, and entries have grown fat. We are currently vacuuming hourly, and towards the end of the hour we are seeing degradation, when compared to the top of the hour. Vacuum is slowly killing our system, as it is starting to take up to 10 minutes, and load at the time of vacuum is 6+ on a Linux box. During the vacuum, the overall system is going unresponsive, then comes back once vacuum completes. Play with vacuum_cost_delay option. In our case it made a BIG difference (going from very heavy hitting to almost unnoticed vacuuming.) That helps only if the ONLY problem you're having is from the direct I/O of the vacuum. If part of the problem is that the table is so large that it takes 4h for VACUUM to complete, thereby leaving a transaction open for 4h, thereby causing other degradations, then vacuum_cost_delay will have a NEGATIVE impact, as it will mean that the vacuum on that table will take even /more/ than 4h. :-( For the above scenario, it is almost certain that the solution comes in two pieces: 1. VACUUM FULL / CLUSTER to bring the size down. The table has grown fat, and no number of repetitions of plain vacuum will fix this. 2. Do plain vacuum on the table VASTLY more frequently, probably every 5 minutes, possibly more often than that. By doing this, you prevent things from getting so bad again. By the way, in this sort of situation, _ANY_ transaction that runs more than about 5 minutes represents a serious enemy to performance, as it will tend to cause the hot table to get fatter. -- (reverse (concatenate 'string gro.gultn @ enworbbc)) http://www.ntlug.org/~cbbrowne/linux.html TECO Madness: a moment of regret, a lifetime of convenience. -- Kent Pitman ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
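Concretely, the two pieces above might look like this (names hypothetical; CLUSTER needs an existing index and holds an exclusive lock while it rewrites the table):

-- Piece 1: one-time shrink of the fattened table
CLUSTER hot_table_pkey ON hot_table;
ANALYZE hot_table;    -- CLUSTER does not refresh the planner statistics

-- Piece 2: keep it small from then on; e.g. a crontab entry every 5 minutes:
-- */5 * * * * /usr/local/bin/vacuumdb -z -t hot_table -p 5432 my_database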
Re: [PERFORM] When to do a vacuum for highly active table
[EMAIL PROTECTED] (Markus Benne) writes: We have a highly active table that has virtually all entries updated every 5 minutes. Typical size of the table is 50,000 entries, and entries have grown fat. We are currently vacuuming hourly, and towards the end of the hour we are seeing degradation, when compared to the top of the hour. You're not vacuuming the table nearly often enough. You should vacuum this table every five minutes, and possibly more often than that. [We have some tables like that, albeit smaller than 50K entries, which we vacuum once per minute in production...] We are thinking of splitting the table in two: the part that updates often, and the part that updates infrequently, as we suspect that record size impacts vacuum. There's *some* merit to that. You might discover that there's a hot spot that needs to be vacuumed once per minute. But it may be simpler to just hit the table with a vacuum once every few minutes even though some tuples are seldom updated. -- output = reverse(gro.gultn @ enworbbc) http://cbbrowne.com/info/spreadsheets.html Signs of a Klingon Programmer #3: By filing this TPR you have challenged the honor of my family. Prepare to die! ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Read/Write block sizes
[EMAIL PROTECTED] (Ron) writes: At 03:45 PM 8/25/2005, Josh Berkus wrote: Ask me sometime about my replacement for GNU sort. It uses the same sorting algorithm, but it's an order of magnitude faster due to better I/O strategy. Someday, in my infinite spare time, I hope to demonstrate that kind of improvement with a patch to pg. Since we desperately need some improvements in sort performance, I do hope you follow up on this. I'll generalize that. IMO we desperately need any and all improvements in IO performance. Even more so than we need improvements in sorting or sorting IO performance. That's frankly a step backwards. Feel free to specialise that instead. A patch that improves some specific aspect of performance is a thousand times better than any sort of desperate desire for any and all improvements in I/O performance. The latter is unlikely to provide any usable result. The specialized patch is also pointedly better in that a *confidently submitted* patch is likely to be way better than any sort of desperate clutching at whatever may come to hand. Far too often, I see people trying to address performance problems via the desperate clutching at whatever seems near to hand, and that generally turns out very badly as a particular result of the whole desperate clutching part. If you can get a sort improvement submitted, that's a concrete improvement... -- select 'cbbrowne' || '@' || 'ntlug.org'; http://www3.sympatico.ca/cbbrowne/lisp.html Appendium to the Rules of the Evil Overlord #1: I will not build excessively integrated security-and-HVAC systems. They may be Really Cool, but are far too vulnerable to breakdowns. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Performance for relative large DB
tobbe [EMAIL PROTECTED] writes: Hi Chris. Thanks for the answer. Sorry that I was a bit unclear. 1) We update around 20.000 posts per night. No surprise there; I would have been surprised to see 100/nite or 6M/nite... 2) What i meant was that we suspect that the DBMS called PervasiveSQL that we are using today is much too small. That's why we're looking for alternatives. Today we base our solution largely on using query-specific tables created at night, so instead of doing queries directly on the post table (with 4-6M rows) at daytime, we have the data pre-aligned in several much smaller tables. This is just to make the current DBMS cope with our amount of data. What I am particularly interested in is if we can expect to run all our select queries directly from the post table with PostgreSQL. Given a decent set of indices, I'd expect that to work OK... Whether 4M or 6M rows, that's pretty moderate in size. If there are specific states that rows are in which are of interest, then you can get big wins out of having partial indices... Consider... create index partial_post_status on posts (status) where status in ('Active', 'Pending', 'Locked'); -- When processing of postings is completely finished, they wind up with 'Closed' status We have some 'stateful' tables in our environment where the interesting states are 'P' (where work is pending) and 'C' (where all the work has been completed and the records are never of interest again except as ancient history); the partial index where status = 'P' winds up being incredibly helpful. It's worth your while to dump data out from Pervasive and load it into a PostgreSQL instance and to do some typical sorts of queries on the PostgreSQL side. Do EXPLAIN ANALYZE [some select statement]; and you'll get a feel for how PostgreSQL is running the queries. Fiddling with indices to see how that affects things will also be a big help. You may find there are columns with large cardinalities (quite a lot of unique values) where you want to improve the stats analysis via... alter table posts alter column [whatever] set statistics 100; -- Default is 10 bins analyze posts; -- then run ANALYZE to update statistics 3) How well does postgres work in load balancing environments? Is it built in? Load balancing means too many things. Can you be more specific about what you consider it to mean? For Internet registry operations, we use replication (Slony-I) to create replicas used to take particular sorts of load off the master systems. But you might be referring to something else... For instance, connection pools, whether implemented inside applications (everyone doing Java has one or more favorite Java connection pool implementations) or in web servers (Apache has a DB connection pool manager) or in an outside application (pgpool, a C-based connection pool manager) are also sometimes used for load balancing. -- (reverse (concatenate 'string gro.mca @ enworbbc)) http://www3.sympatico.ca/cbbrowne/postgresql.html In case you weren't aware, ad homineum is not latin for the user of this technique is a fine debater. -- Thomas F. Burdick ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
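To see whether a partial index like the one sketched above actually gets picked up, the EXPLAIN ANALYZE check looks like this (status values follow the hypothetical schema above):

-- The query's WHERE clause must imply the index predicate for the planner to use it
EXPLAIN ANALYZE
SELECT * FROM posts WHERE status = 'Pending';

If the plan shows an index scan on partial_post_status rather than a seq scan over all 4-6M rows, the partial index is doing its job.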
Re: [PERFORM] Read/Write block sizes
[EMAIL PROTECTED] (Steve Poe) writes: Chris, Unless I am wrong, you're making the assumption that the amount of time spent and ROI are known. Maybe those who've been down this path know how to get that additional 2-4% in 30 minutes or less? While each person and business' performance gains (or not) could vary, someone spending the 50-100h to gain 2-4% over the course of a month for a 24x7 operation would seem worth the investment? What we *do* know is that adding these knobs would involve a significant amount of effort, as the values are widely used throughout the database engine. Making them dynamic (e.g. - so they could be tuned on a tablespace-by-tablespace basis) would undoubtedly require rather a lot of development effort. They are definitely NOT 30 minute changes. Moreover, knowing how to adjust them is almost certainly also NOT a 30 minute configuration change; significant benchmarking effort for the individual application is almost sure to be needed. It's not much different from the reason why PostgreSQL doesn't use threading... The problem with using threading is that introducing it to the code base would require a pretty enormous amount of effort (I'll bet multiple person-years), and it wouldn't provide *any* benefit until you get rather a long ways down the road. Everyone involved in development seems to me to have a reasonably keen understanding as to what the potential benefits of threading are; the value is that there fall out plenty of opportunities to parallelize the evaluation of portions of queries. Alas, it wouldn't be until *after* all the effort goes in that we would get any idea as to what kinds of speedups this would provide. In effect, there has to be a year invested in *breaking* PostgreSQL (because this would initially break a lot, since thread programming is a really tough skill) where you don't actually see any benefits. I would assume that dbt2 with STP helps minimize the amount of hours someone has to invest to determine performance gains with configurable options? That's going to help in constructing a default knob value. And if we find an optimal default, that encourages sticking with the current approach, of using #define to apply that value... If someone spends 100h working on one of these items, and gets a 2% performance improvement, that's almost certain to be less desirable than spending 50h on something else that gets a 4% improvement. And we might discover that memory management improvements in Linux 2.6.16 or FreeBSD 5.5 allow some OS kernels to provide some such improvements for free behind our backs without *any* need to write database code. :-) -- let name=cbbrowne and tld=ntlug.org in String.concat @ [name;tld];; http://www.ntlug.org/~cbbrowne/lisp.html For those of you who are into writing programs that are as obscure and complicated as possible, there are opportunities for... real fun here -- Arthur Norman ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Caching by Postgres
[EMAIL PROTECTED] (Donald Courtney) writes: I mean well with this comment - This whole issue of data caching is a troubling issue with postreSQL in that even if you ran postgreSQL on a 64 bit address space with larger number of CPUs you won't see much of a scale up and possibly even a drop. I am not alone in having the *expectation* that a database should have some cache size parameter and the option to skip the file system. If I use oracle, sybase, mysql and maxdb they all have the ability to size a data cache and move to 64 bits. Is this a crazy idea - that a project be started to get this adopted? Is it too big and structural to contemplate? This project amounts to Implement Your Own Operating System, because it requires that the DBMS take over the things that operating systems normally do, like: a) Managing access to filesystems and b) Managing memory The world is already sufficiently filled up with numerous variations of Linux, BSD 4.4 Lite, and UNIX System V; I can't see justification for reinventing this wheel still again. -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://cbbrowne.com/info/multiplexor.html Rules of the Evil Overlord #196. I will hire an expert marksman to stand by the entrance to my fortress. His job will be to shoot anyone who rides up to challenge me. http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Caching by Postgres
[EMAIL PROTECTED] (Michael Stone) writes: On Tue, Aug 23, 2005 at 12:38:04PM -0700, Josh Berkus wrote: which have a clear and measurable effect on performance and are fixable without bloating the PG code. Some of these issues (COPY path, context switching Does that include increasing the size of read/write blocks? I've noticed that with a large enough table it takes a while to do a sequential scan, even if it's cached; I wonder if the fact that it takes a million read(2) calls to get through an 8G table is part of that. But behind the scenes, the OS is still going to have to evaluate the is this in cache? question for each and every one of those pages. (Assuming the kernel's page size is 8K; if it's smaller, the number of evaluations will be even higher...) Grouping the read(2) calls together isn't going to have any impact on _that_ evaluation. -- let name=cbbrowne and tld=ntlug.org in name ^ @ ^ tld;; http://www3.sympatico.ca/cbbrowne/finances.html People who don't use computers are more sociable, reasonable, and ... less twisted -- Arthur Norman ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Cheap RAM disk?
[EMAIL PROTECTED] (John A Meinel) writes: I saw a review of a relatively inexpensive RAM disk over at anandtech.com, the Gigabyte i-RAM http://www.anandtech.com/storage/showdoc.aspx?i=2480 And the review shows that it's not *all* that valuable for many of the cases they looked at. Basically, it is a PCI card, which takes standard DDR RAM, and has a SATA port on it, so that to the system, it looks like a normal SATA drive. The card costs about $100-150, and you fill it with your own ram, so for a 4GB (max size) disk, it costs around $500. Looking for solid state storage devices, the cheapest I found was around $5k for 2GB. Gigabyte claims that the battery backup can last up to 16h, which seems decent, if not really long (the $5k solution has a built-in harddrive so that if the power goes out, it uses the battery power to copy the ramdisk onto the harddrive for more permanent storage). Anyway, would something like this be reasonable as a drive for storing pg_xlog? With 4GB you could have as many as 256 checkpoint segments. I'm a little leary as it is definitely a version 1.0 product (it is still using an FPGA as the controller, so they were obviously pushing to get the card into production). What disappoints me is that nobody has tried the CF/RAM answer; rather than putting a hard drive on the board, you put on some form of flash device (CompactFlash or such), where if power fails, it pushes data onto the CF. That ought to be cheaper (both in terms of hardware cost and power consumption) than using a hard disk. But it seems like this might be a decent way to improve insert performance, without setting fsync=false. That's the case which might prove Ludicrously Quicker than any of the sample cases in the review. Probably it should see some serious testing (as in power spikes/pulled plugs, etc). I know the article made some claim that if you actually pull out the card it goes into high consumption mode which is somehow greater than if you leave it in the slot with the power off. Which to me seems like a lot of bull, and really means the 16h is only under best-case circumstances. But even 1-2h is sufficient to handle a simple power outage. Certainly. Anyway, I thought I would mention it to the list, to see if anyone else has heard of it, or has any thoughts on the matter. I'm sure there are some people who are using more expensive ram disks, maybe they have some ideas about what this device is missing. (other than costing about 1/10th the price) Well, if it hits a 2.0 version, it may get interesting... -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Cheap RAM disk?
[EMAIL PROTECTED] (Jeffrey W. Baker) writes: I haven't tried this product, but the microbenchmarks seem truly slow. I think you would get a similar benefit by simply sticking a 1GB or 2GB DIMM -- battery-backed, of course -- in your RAID controller. Well, the microbenchmarks were pretty pre-sophomoric, essentially trying to express how the device would be useful to a Windows user that *might* play games... I'm sure it's hurt by the fact that it's using a SATA (version 1) interface rather than something faster. Mind you, I'd like to see the product succeed, because they might come up with a version 2 of it that is what I'd really like... -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Mount database on RAM disk?
[EMAIL PROTECTED] (Stuart Bishop) writes: I'm putting together a road map on how our systems can scale as our load increases. As part of this, I need to look into setting up some fast read only mirrors of our database. We should have more than enough RAM to fit everything into memory. I would like to find out if I could expect better performance by mounting the database from a RAM disk, or if I would be better off keeping that RAM free and increasing the effective_cache_size appropriately. If you were willing to take on a not-inconsiderable risk, I'd think that storing WAL files on a RAMDISK would be likely to be the fastest improvement imaginable. If I could get and deploy some SSD (Solid State Disk) devices that would make this sort of thing *actually safe,* I'd expect that to be a pretty fabulous improvement, at least for write-heavy database activity. I'd also be interested in knowing if this is dependant on whether I am running 7.4, 8.0 or 8.1. Behaviour of all three could be somewhat different, as management of the shared cache has been in flux... -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Prefetch - OffTopic
[EMAIL PROTECTED] (Mohan, Ross) writes: for time-series and insane fast, nothing beats kdB, I believe www.kx.com ... Which is well and fine if you're prepared to require that all of the staff that interact with data are skilled APL hackers. Skilled enough that they're all ready to leap into Whitney's ASCII-based variant, K. -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/functional.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] PGSQL Capacity
[EMAIL PROTECTED] writes: How can I know the capacity of a pg database? How many records can my table have? I saw in a message that someone has 50 000 records; is that possible in a table? (My table has 8 string fields, each 32 characters long.) Thanks for your response. The capacity is much more likely to be limited by the size of the disk drives and filesystems you have available to you than by anything else. If your table consists of 8 32-character strings, then each tuple will consume around 256 bytes of memory, and you will be able to fit on the order of 30 tuples into each 8K page. By default, you can extend a single table file to up to 1GB before it splits off to another piece. That would mean each file can have about 3.9M tuples. From there, you can have as many 1GB pieces as the disk will support. So you could have (plenty * 3.9M tuples), which could add up to be rather large. If you're expecting 50K records, that will be no big deal at all. -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] batch inserts are slow
[EMAIL PROTECTED] (Christopher Petrilli) writes: On 5/2/05, Tim Terlegård [EMAIL PROTECTED] wrote: Howdy! I'm converting an application to be using postgresql instead of oracle. There seems to be only one issue left, batch inserts in postgresql seem significantly slower than in oracle. I have about 200 batch jobs, each consisting of about 14 000 inserts. Each job takes 1.3 seconds in postgresql and 0.25 seconds in oracle. With 200 jobs this means several more minutes to complete the task. By fixing this I think the application using postgresql overall would be faster than when using oracle. Just as on Oracle you would use SQL*Loader for this application, you should use the COPY syntax for PostgreSQL. You will find it a lot faster. I have used it by building the input files and executing 'psql' with a COPY command, and also by using it with a subprocess, both are quite effective. I'd suggest taking a peek at the pgFoundry project pgloader, http://pgfoundry.org/projects/pgloader/. This is intended to provide somewhat analogous functionality to SQL*Loader; a particularly useful thing about it is that it will load those records that it can, and generate a file consisting of just the failures. It uses COPY internally, so it does run reasonably fast. To the extent to which it is inadequate, it would be neat to see some enhancements... -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
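For comparison with row-at-a-time INSERTs, a minimal COPY sketch (the file path and table layout are hypothetical):

-- One COPY replaces thousands of individual INSERT statements
COPY batch_target (id, payload, loaded_at)
  FROM '/tmp/batch_0001.dat'
  WITH DELIMITER ',';

Note that a server-side COPY FROM 'file' requires superuser rights and a file visible to the backend; from a client, psql's \copy (or COPY FROM STDIN over the connection) does the same job.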
Re: [PERFORM] Index bloat problem?
josh@agliodbs.com (Josh Berkus) writes: Bill, What about if an out-of-the-ordinary number of rows were deleted (say 75% of rows in the table, as opposed to the normal 5%) followed by a 'VACUUM ANALYZE'? Could things get out of whack because of that situation? Yes. You'd want to run REINDEX after an event like that. As you should now. Based on Tom's recent comments, I'd be inclined to handle this via doing a CLUSTER, which has the triple heroism effect of: a) Reorganizing the entire table to conform with the relevant index order, b) Having the effect of VACUUM FULL, and c) Having the effect of REINDEX all in one command. It has all of the oops, that blocked me for 20 minutes effect of REINDEX and VACUUM FULL, but at least it doesn't have the effect twice... -- (format nil [EMAIL PROTECTED] cbbrowne acm.org) http://www.ntlug.org/~cbbrowne/sap.html Rules of the Evil Overlord #78. I will not tell my Legions of Terror And he must be taken alive! The command will be: ``And try to take him alive if it is reasonably practical.'' http://www.eviloverlord.com/ ---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq