Re: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?)
>From: "Jeffrey W. Baker" <[EMAIL PROTECTED]>
>Sent: Sep 29, 2005 12:33 AM
>Subject: Sequential I/O Cost (was Re: [PERFORM] A Better External Sort?)
>
>On Wed, 2005-09-28 at 12:03 -0400, Ron Peacetree wrote:
>>>From: "Jeffrey W. Baker" <[EMAIL PROTECTED]>
>>>Perhaps I believe this because you can now buy as much sequential I/O
>>>as you want. Random I/O is the only real savings.
>>>
>> 1= No, you can not "buy as much sequential IO as you want". Even with
>> an infinite budget, there are physical and engineering limits. Long
>> before you reach those limits, you will pay exponentially increasing costs
>> for linearly increasing performance gains. So even if you _can_ buy a
>> certain level of sequential IO, it may not be the most efficient way to
>> spend money.
>
>This is just false. You can buy sequential I/O for linear money up to
>and beyond your platform's main memory bandwidth. Even 1GB/sec will
>severely tax memory bandwidth of mainstream platforms, and you can
>achieve this rate for a modest cost.
>
I don't think you can prove this statement.

A= www.pricewatch.com lists 7200rpm 320GB SATA II HDs for ~$160. ASTR according to www.storagereview.com is ~50MBps. Average access time is ~12-13ms. Absolute TOTL 15Krpm 147GB U320 or FC HDs cost ~4x as much per GB, yet only deliver ~80-90MBps ASTR and average access times of ~5.5-6.0ms. Your statement is clearly false in terms of atomic raw HD performance.

B= Low-end RAID controllers can be obtained for a few $100's. But even amongst them, a $600+ card does not perform 3-6x better than a $100-$200 card. When the low-end HW is not enough, the next step in price is to ~$10K+ (ie Xyratex), and the ones after that are to ~$100K+ (ie NetApp) and ~$1M+ (ie EMC, IBM, etc). None of these ~10x steps in price results in a ~10x increase in performance. Your statement is clearly false in terms of HW based RAID performance.

C= A commodity AMD64 mainboard with a dual channel DDR PC3200 RAM subsystem has 6.4GBps of bandwidth. These are as common as weeds and almost as cheap: www.pricewatch.com. Your statement about commodity systems' main memory bandwidth being "severely taxed at 1GBps" is clearly false.

D= Xyratex makes RAID HW for NetApp and EMC. NONE of their current HW can deliver 1GBps. More like 600-700MBps. Engenio and Dot Hill have similar limitations on their current products. No PCI or PCI-X based HW could ever do more than ~800-850MBps since that's the real-world limit of those busses. Next-gen products are likely to double those limits and cross the 1GBps barrier based on ~90MBps SAS or FC HDs and PCI-E x8 (2GBps max) and PCI-E x16 (4GBps max). Note that not even next gen or 2 gens from now RAID HW will be able to match the memory bandwidth of the current commodity memory subsystem mentioned in "C" above. Your statement that one can achieve a HD IO rate that will tax RAM bandwidth at modest cost is clearly false.

QED: Your statement is false on all counts and in all respects.

>I have one array that can supply this rate and it has only 15 disks. It
>would fit on my desk. I think your dire talk about the limits of
>science and engineering may be a tad overblown.
>
Name it and post its BOM, configuration specs, price and ordering information. Then tell us what it's plugged into and all the same details on _that_. If all 15 HD's are being used for one RAID set, then you can't be using RAID 10, which means any claims re: write performance in particular should be closely examined.
A 15 volume RAID 5 made of the fastest 15Krpm U320 or FC HDs, each with ~85.9MBps ASTR, could in theory do ~14*85.9= ~1.2GBps raw ASTR for at least reads, but no one I know of makes commodity RAID HW that can keep up with this, nor can any one PCI-X bus support it even if such commodity RAID HW did exist. Hmmm. SW RAID on at least a PCI-Ex8 bus might be able to do it if we can multiplex enough 4Gbps FC lines (4Gbps= 400MBps => max of 4 of the above HDs per line and 4 FC lines) with low enough latency and have enough CPU driving it...Won't be easy nor cheap though. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Comparative performance
It appears that PostgreSQL is two to three times slower than MySQL. For example, some pages that have some 30,000 characters (when saved as HTML) take 1 to 1 1/2 seconds with MySQL but 3 to 4 seconds with PostgreSQL. I had read that MySQL was generally faster than PostgreSQL, particularly for simple web applications, but I was hoping that Postgres' performance would not be that noticeably slower.

From my experience, the postgres libraries in PHP are a piece of crap, and add a lot of overhead even for small queries. For instance, a simple query like "SELECT * FROM table WHERE primary_key_id=1234" can take the following time, on my laptop, with data in the filesystem cache of course:

  EXPLAIN ANALYZE      <0.1 ms
  python + psycopg 2    0.1 ms (damn fast)
  php + mysql           0.3 ms
  php + postgres        1-2 ms (damn slow)

So, if your pages are designed in The PHP Way (ie. a large number of small queries), I might suggest using a language with a decent postgres interface (python, among others), or rewriting your bunches of small queries as Stored Procedures or Joins, which will provide large speedups. Doing >50 queries on a page is always a bad idea, but it's tolerable in php-mysql, not in php-postgres.

If it's only one large query, there is a problem, as postgres is usually a lot smarter about query optimization.

If you use the usual mysql techniques (like storing a page counter in a row in a table, or storing sessions in a table), beware: these are no-nos for postgres. These things should NOT be done with a database anyway; try memcached for instance.

---(end of broadcast)---
TIP 6: explain analyze is your friend
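To make the "Stored Procedures or Joins" suggestion concrete, here is a minimal sketch of collapsing a burst of small per-row lookups into a single round trip using a set-returning SQL function; the table and column names (page_item, page_id, sort_order) are illustrative, not from the application discussed here:

  CREATE OR REPLACE FUNCTION page_contents(int)
  RETURNS SETOF page_item AS '
      SELECT * FROM page_item WHERE page_id = $1 ORDER BY sort_order;
  ' LANGUAGE sql STABLE;

  -- The page then issues one query instead of N small ones:
  SELECT * FROM page_contents(42);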
Re: [HACKERS] [PERFORM] A Better External Sort?
>>> Your main example seems to focus on a large table where a key column has constrained values. This case is interesting in proportion to the number of possible values. If I have billions of rows, each having one of only two values, I can think of a trivial and very fast method of returning the table "sorted" by that key: make two sequential passes, returning the first value on the first pass and the second value on the second pass. This will be faster than the method you propose.
>>
>> 1= No, that was not my main example. It was the simplest example used to frame the later more complicated examples. Please don't get hung up on it.
>>
>> 2= You are incorrect. Since IO is the most expensive operation we can do, any method that makes two passes through the data at top scanning speed will take at least 2x as long as any method that only takes one such pass.

You do not get the point. Once you have the sorted references to the tuples, you still need to fetch the tuples themselves, check their visibility, etc., and return them to the client. So, if there are only 2 values in the column of a big table that is larger than available RAM, two seq scans of the table without any sorting is the fastest solution.

Cordialement,
Jean-Gérard Pailloncy

---(end of broadcast)---
TIP 6: explain analyze is your friend
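A sketch of the two-pass idea being debated, assuming a hypothetical table big_table with a two-valued column flag; each pass is a plain sequential scan and no sort step is ever performed:

  -- Pass 1: stream out all rows carrying the first value
  SELECT * FROM big_table WHERE flag = 0;
  -- Pass 2: stream out all rows carrying the second value
  SELECT * FROM big_table WHERE flag = 1;

  -- versus asking for an explicit ordering of the whole table
  SELECT * FROM big_table ORDER BY flag;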
Re: [PERFORM] Comparative performance
Magnus Hagander wrote: That actually depends a lot on *how* you use it. I've seen pg-on-windows deployments that come within a few percent of the linux performance. I've also seen those that are absolutely horrible compared. One sure way to kill the performance is to do a lot of small connections. Using persistent connection is even more important on Windows than it is on Unix. It could easily explain a difference like this. I just tried using pg_pconnect() and I didn't notice any significant improvement. What bothers me most is that with Postgres I tend to see jerky behavior on almost every page: the upper 1/2 or 2/3 of the page is displayed first and you can see a blank bottom (or you can see a half-filled completion bar). With MySQL each page is generally displayed in one swoop. Joe ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Comparative performance
On Thu, Sep 29, 2005 at 08:16:11AM -0400, Joe wrote: > I just tried using pg_pconnect() and I didn't notice any significant > improvement. PHP persistent connections are not really persistent -- or so I've been told. Anyhow, what was discussed here was pg_query, not pg_connect. You really want to reduce the number of pg_query() calls in any case; you haven't told us how many there are yet, but it sounds like there are a lot of them. > What bothers me most is that with Postgres I tend to see jerky behavior on > almost every page: the upper 1/2 or 2/3 of the page is displayed first and > you can see a blank bottom (or you can see a half-filled completion bar). > With MySQL each page is generally displayed in one swoop. This might just be your TCP/IP stack finding out that the rest of the page isn't likely to come anytime soon, and start sending it out... or something else. I wouldn't put too much weight on it, it's likely to go away as soon as you fix the rest of the problem. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] Comparative performance
On Thu, 29 Sep 2005, Joe wrote: > Magnus Hagander wrote: > > That actually depends a lot on *how* you use it. I've seen pg-on-windows > > deployments that come within a few percent of the linux performance. > > I've also seen those that are absolutely horrible compared. > > > > One sure way to kill the performance is to do a lot of small > > connections. Using persistent connection is even more important on > > Windows than it is on Unix. It could easily explain a difference like > > this. > > I just tried using pg_pconnect() and I didn't notice any significant > improvement. What bothers me most is that with Postgres I tend to see jerky > behavior on almost every page: the upper 1/2 or 2/3 of the page is displayed > first and you can see a blank bottom (or you can see a half-filled completion > bar). With MySQL each page is generally displayed in one swoop. Please post the table definitions, queries and explain analyze results so we can tell you why the performance is poor. Gavin ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Comparative performance
PFC wrote:
> From my experience, the postgres libraries in PHP are a piece of crap, and add a lot of overhead even from small queries. For instance, a simple query like "SELECT * FROM table WHERE primary_key_id=1234" can take the following time, on my laptop, with data in the filesystem cache of course:
>
>   EXPLAIN ANALYZE      <0.1 ms
>   python + psycopg 2    0.1 ms (damn fast)
>   php + mysql           0.3 ms
>   php + postgres        1-2 ms (damn slow)

As a Trac user I was considering moving to Python, so it's good to know that, but the rewrite is a longer term project.

> So, if your pages are designed in The PHP Way (ie. a large number of small queries), I might suggest using a language with a decent postgres interface (python, among others), or rewriting your bunches of small queries as Stored Procedures or Joins, which will provide large speedups. Doing >50 queries on a page is always a bad idea, but it's tolerable in php-mysql, not in php-postgres.

The pages do use a number of queries to collect all the data for display but nowhere near 50. I'd say it's probably less than a dozen. As an aside, one of my tasks (before the conversion) was to analyze the queries and see where they could be tweaked for performance, but with MySQL that was never a top priority.

The schema is fairly simple having two main tables: topic and entry (sort of like account and transaction in an accounting scenario). There are two additional tables that perhaps could be merged into the entry table (and that would reduce the number of queries) but I do not want to make major changes to the schema (and the app) for the PostgreSQL conversion.

Joe

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Comparative performance
Gavin Sherry wrote:
> Please post the table definitions, queries and explain analyze results so we can tell you why the performance is poor.

I did try to post that last night but apparently my reply didn't make it to the list. Here it is again:

Matthew Nuzum wrote:
> This is the right list. Post detail and I'm sure you'll get some suggestions.

Thanks, Matthew (and Chris and Gavin). The main table used in the query is defined as follows:

CREATE TABLE entry (
  entry_id serial PRIMARY KEY,
  title VARCHAR(128) NOT NULL,
  subtitle VARCHAR(128),
  subject_type SMALLINT,
  subject_id INTEGER REFERENCES topic,
  actor_type SMALLINT,
  actor_id INTEGER REFERENCES topic,
  actor VARCHAR(64),
  actor_role VARCHAR(64),
  rel_entry_id INTEGER,
  rel_entry VARCHAR(64),
  description VARCHAR(255),
  quote text,
  url VARCHAR(255),
  entry_date CHAR(10),
  created DATE NOT NULL DEFAULT CURRENT_DATE,
  updated TIMESTAMP WITH TIME ZONE NOT NULL DEFAULT CURRENT_TIMESTAMP
) WITHOUT OIDS;

CREATE INDEX entry_actor_id ON entry (actor_id);
CREATE INDEX entry_subject_id ON entry (subject_id);

It has 3422 rows at this time.

The query for one of the pages is the following:

SELECT entry_id, subject_type AS type, subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, quote AS main_quote, NULL AS rel_quote, substring(entry_date from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created, updated
FROM entry
WHERE subject_id = 1079
UNION
SELECT entry_id, actor_type AS type, subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, quote AS main_quote, NULL AS rel_quote, substring(entry_date from 8) AS dom, substring(entry_date from 1) AS date_ymd, substring(entry_date from 1 for 7) AS date_ym, substring(entry_date from 1 for 4) AS date_y, created, updated
FROM entry
WHERE actor_id = 1079
ORDER BY type, title, subtitle;

The output of EXPLAIN ANALYZE is:

 Sort  (cost=158.98..159.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
   Sort Key: "type", title, subtitle
   ->  Unique  (cost=153.57..157.14 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
         ->  Sort  (cost=153.57..153.73 rows=62 width=568) (actual time=16.000..16.000 rows=59 loops=1)
               Sort Key: entry_id, "type", subject_type, subject_id, actor_type, actor_id, actor, actor_role, rel_entry_id, rel_entry, title, subtitle, description, url, main_quote, rel_quote, dom, date_ymd, date_ym, date_y, created, updated
               ->  Append  (cost=0.00..151.73 rows=62 width=568) (actual time=0.000..16.000 rows=59 loops=1)
                     ->  Subquery Scan "*SELECT* 1"  (cost=0.00..17.21 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                           ->  Index Scan using entry_subject_id on entry  (cost=0.00..17.17 rows=4 width=568) (actual time=0.000..0.000 rows=3 loops=1)
                                 Index Cond: (subject_id = 1079)
                     ->  Subquery Scan "*SELECT* 2"  (cost=0.00..134.52 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                           ->  Seq Scan on entry  (cost=0.00..133.94 rows=58 width=568) (actual time=0.000..16.000 rows=56 loops=1)
                                 Filter: (actor_id = 1079)
 Total runtime: 16.000 ms
(13 rows)

What I don't quite understand is why it's doing a sequential scan on actor_id instead of using the entry_actor_id index. Note that actor_id has 928 non-null values (27%), whereas subject_id has 3089 non-null values (90%).
Note that the entry_date column was originally a MySQL date but it had partial dates, i.e., some days and months are set to zero. Eventually I hope to define a PostgreSQL datatype for it and to simplify the substring retrievals. However, I don't think the extra computational time should affect the overall runtime significantly. Gavin, I'm using PostgreSQL 8.0.3, Apache 1.3.28, PHP 4.3.4, MySQL 4.0.16 and I'm comparing both databases on XP (on a Pentium 4, 1.6 GHz, 256 MB RAM). Thanks for any feedback. Joe ---(end of broadcast)--- TIP 6: explain analyze is your friend
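Two quick checks that may explain the seq scan choice on actor_id: make sure the planner has fresh statistics, and see what plan and timing you get when sequential scans are discouraged for one session (testing only; don't leave it off). A sketch:

  ANALYZE entry;

  SET enable_seqscan = off;
  EXPLAIN ANALYZE
    SELECT entry_id, title FROM entry WHERE actor_id = 1079;
  RESET enable_seqscan;

If the index plan is not actually faster, the planner's choice is reasonable: on a table of only 3422 rows a sequential scan is often the cheapest option anyway.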
Re: [PERFORM] Comparative performance
Joe wrote: The pages do use a number of queries to collect all the data for display but nowhere near 50. I'd say it's probably less than a dozen. The schema is fairly simple having two main tables: topic and entry (sort of like account and transaction in an accounting scenario). There are two additional tables that perhaps could be merged into the entry table Hm, if you only have 4 tables, why do you need 12 queries? To reduce queries, join them in the query; no need to merge them physically. If you have only two main tables, I'd bet you only need 1-2 queries for the whole page. Regards, Andreas ---(end of broadcast)--- TIP 6: explain analyze is your friend
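A minimal sketch of the join Andreas suggests, against Joe's topic/entry tables; the topic column names (topic_id, name) are assumptions, since only the entry definition was posted:

  -- One round trip returns the topic header together with its entries
  SELECT t.topic_id, t.name, e.entry_id, e.title, e.subtitle
  FROM topic t
  LEFT JOIN entry e ON e.subject_id = t.topic_id
  WHERE t.topic_id = 1079
  ORDER BY e.title, e.subtitle;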
Re: [PERFORM] Delete query takes exorbitant amount of time
Mark Lewis wrote: > I imported my test dataset > and was almost immediately able to track down the cause of my > performance problem. Why don't you tell us what the problem was :-) ? Regards Gaetano Mendola ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [HACKERS] Query in SQL statement
CREATE SEQUENCE ai_id;

CREATE TABLE badusers (
  id int DEFAULT nextval('ai_id') NOT NULL,
  UserName varchar(30),
  Date datetime DEFAULT '0000-00-00 00:00:00' NOT NULL,
  Reason varchar(200),
  Admin varchar(30) DEFAULT '-',
  PRIMARY KEY (id),
  KEY UserName (UserName),
  KEY Date (Date)
);

I am always getting the following errors:

ERROR: relation "ai_id" already exists
ERROR: syntax error at or near "(" at character 240

You have just copied the MySQL code to PostgreSQL. It will in no way work. Your default for 'Date' is illegal in PostgreSQL and hence it must allow NULLs. There is no such thing as a 'datetime' type. There is no such thing as 'KEY'. Also, your mixed-case identifiers won't be preserved. You want:

CREATE TABLE badusers (
  id SERIAL PRIMARY KEY,
  UserName varchar(30),
  Date timestamp,
  Reason varchar(200),
  Admin varchar(30) DEFAULT '-'
);

CREATE INDEX UserName_Idx ON badusers(Username);
CREATE INDEX Date_Idx ON badusers(Date);

---(end of broadcast)---
TIP 6: explain analyze is your friend
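For reference, the SERIAL shorthand in the corrected version covers what the hand-rolled sequence was attempting; a rough sketch of what it expands to (the sequence name is normally generated automatically):

  CREATE SEQUENCE badusers_id_seq;
  CREATE TABLE badusers (
      id integer NOT NULL DEFAULT nextval('badusers_id_seq') PRIMARY KEY,
      UserName varchar(30),
      Date     timestamp,
      Reason   varchar(200),
      Admin    varchar(30) DEFAULT '-'
  );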
Re: [PERFORM] Monitoring Postgresql performance
Arnau wrote: > Hi all, > > I have been "googling" a bit searching info about a way to monitor > postgresql (CPU & Memory, num processes, ... ) and I haven't found > anything relevant. I'm using munin to monitor others parameters of my > servers and I'd like to include postgresql or have a similar tool. Any > of you is using anything like that? all kind of hints are welcome :-) > > Cheers! We use Cricket + Nagios ( new Netsaint release ). Regards Gaetano Mendola ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Advice on RAID card
I think the answer is simple if the question is low end Raid card or software ? go on the software and youll get better performance. If this is a high end server i wouldnt think twice. HW RAID is a must and not only because the performance but because the easynes ( hot swap and such ) and the battery Ron Peacetree wrote: While I understand being economical, at some point one crosses the line to being penny wise and pound foolish. How much is the data on this server going to be worth? How much much will it cost you to recover or restore it (assuming that is even possible if you lose it)? If your data is worth nothing or the cost to recover or restore it is negligible, then you don't need (nor should want) a DB server. You'll get higher performance at less cost via a number of other methods. OTOH, if you data _does_ have value by any of the above metrics, then it is worth it to pay attention to reliable, safe, fast, physical IO. Battery backed HD caches of appropriate size are usually well worth the $, as they pay for themselves (and then some) with the first data loss they prevent. RAID 5 means you are _always_ only 2 HDs from data loss, and 1 HD from a serious performance hit. Part of the trade-off with using SATA HDs that cost 1/3-1/4 their U320 15Krpm brethren is that such circumstances are +FAR+ more likely with SATA HDs. If you are not going to use RAID 10 because of cost issues, then spend the $ to get the biggest battery backed cache you can afford and justify as being cheaper than what the proper RAID 6 or RAID 10 setup would cost you. Even if you are going to use SW RAID and the controller will just be a JBOD controller. On the general subject of costs... At this writing, SODIMM RAM costs ~$100 (US) per GB. Standard DIMMs cost ~$75 per GB unless you buy 4GB ones, in which case they cost ~$100 per GB. The "sweet spot" in SATA HD pricing is ~$160 for 320GB at 7200rpm (don't buy the 36GB or 74GB WD Raptors, they are no longer worth it). If you are careful you can get SATA HD's with 16MB rather than 8MB buffers for that price. Each such HD will give you ~50MB/s of raw Average Sustained Transfer Rate. Decent x86 compatible CPUs are available for ~$200-$400 apiece. Rarely will a commodity HW DB server need a more high end CPU. Some of the above numbers rate to either fall to 1/2 cost or 2x in value for the dollar within the next 6-9 months, and all of them will within the next 18 months. And so will RAID controller costs. Your salary will hopefully not degrade at that rate, and it is unlikely that your value for the dollar will increase at that rate. Nor is it likely that data worth putting on a DB server will do so. Figure out what your required performance and reliability for the next 18 months is going to be, and buy the stuff from the above list that will sustain that. No matter what. Anything less rates _highly_ to end up costing you and your organization more money within the next 18months than you will "save" in initial acquisition cost. Ron -Original Message- From: PFC <[EMAIL PROTECTED]> Sent: Sep 24, 2005 12:27 PM Subject: Re: [PERFORM] Advice on RAID card It looks like a rebranded low end Adaptec 64MB PCI-X <-> SATA RAID card. Looks like the 64MB buffer is not upgradable. Looks like it's SATA, not SATA II Yeah, that's exactly what it is. I can get one for 150 Euro, the Areca is at least 600. This is for a budget server so while it would be nice to have all the high-tech stuff, it's not the point. 
My question was raher, is it one of the crap RAID5 cards which are actually SLOWER than plain IDE disks, or is it decent, even though low-end (and cheap), and worth it compared to software RAID5 ? Assuming you are not building 1U boxes, get one of the full height cards and order it with the maximum size buffer you can afford. The cards take 1 SODIMM, so that will be a max of 1GB or 2GB depending on whether 2GB SODIMMs are available to you yet. It's for a budget dev server which should have RAID5 for reliability, but not necessarily stellar performance (and price). I asked about this card because I can get one at a good price. Thanks for taking the time to answer. ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org -- -- Canaan Surfing Ltd. Internet Service Providers Ben-Nes Michael - Manager Tel: 972-4-6991122 Cel: 972-52-8555757 Fax: 972-4-6990098 http://www.canaan.net.il -- ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] Comparative performance
I just tried using pg_pconnect() and I didn't notice any significant improvement. What bothers me most is that with Postgres I tend to see jerky behavior on almost every page: the upper 1/2 or 2/3 of the page is displayed first and you can see a blank bottom (or you can see a half-filled completion bar). With MySQL each page is generally displayed in one swoop. Persistent connections are useful when your page is fast and the connection time is an important part of your page time. It is mandatory if you want to serve more than 20-50 hits/s without causing unnecessary load on the database. This is not your case, which is why you don't notice any improvement... ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] [PERFORM] A Better External Sort?
Just to add a little anarchy in your nice debate...

Who really needs all the results of a sort on your terabyte table? I guess not many people do a SELECT from such a table and want all the results. So, this leaves:
- Really wanting all the results, to fetch using a cursor,
- CLUSTER type things, where you really want everything in order,
- Aggregates (Sort->GroupAggregate), which might really need to sort the whole table,
- Complex queries where the whole dataset needs to be examined, in order to return a few values,
- Joins (again, the whole table is probably not going to be selected),
- And the ones I forgot.

However, most likely you only want to SELECT N rows, in some ordering:
- the first N (ORDER BY x LIMIT N)
- the last N (ORDER BY x DESC LIMIT N)
- WHERE x > value ORDER BY x LIMIT N
- WHERE x < value ORDER BY x DESC LIMIT N

Or, you are doing a Merge JOIN against some other table; in that case, yes, you might need the whole sorted terabyte table, but most likely there are WHERE clauses in the query that restrict the set, and thus, maybe we can get some conditions or limit values on the column to sort. Also the new, optimized hash join, which is more memory efficient, might cover this case.

Point is, sometimes, you only need part of the results of your sort. And the bigger the sort, the more likely it becomes that you only want part of the results. So, while we're in the fun hand-waving, new-algorithm-trying mode, why not consider this right from the start? (I know I'm totally in hand-waving mode right now, so slap me if needed.)

I'd say your new, fancy sort algorithm needs a few more input values:
- Range of values that must appear in the final result of the sort: none, minimum, maximum, both, or even a set of values from the other side of the join, hashed, or sorted.
- LIMIT information (first N, last N, none)
- Enhanced LIMIT information (first/last N values of the second column to sort, for each value of the first column) (the infamous "top 10 by category" query)
- etc.

With this, the amount of data that needs to be kept in memory is dramatically reduced, from the whole table (even using your compressed keys, that's big) to something more manageable which will be closer to the size of the final result set which will be returned to the client, and avoid a lot of effort.

So, this would not be useful in all cases, but when it applies, it would be really useful.

Regards!

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
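A sketch of the kind of limited sorts described above, showing how little of the ordering is actually needed in practice; the table and column names (events, x, category, payload) are illustrative:

  -- First N / last N: with a suitable index on x the planner can often
  -- satisfy these with a bounded index scan instead of a full sort.
  SELECT * FROM events ORDER BY x LIMIT 10;
  SELECT * FROM events ORDER BY x DESC LIMIT 10;

  -- Range-restricted slice
  SELECT * FROM events WHERE x > 1000 ORDER BY x LIMIT 10;

  -- "Top row per category" without sorting and returning the whole table
  SELECT DISTINCT ON (category) category, x, payload
  FROM events
  ORDER BY category, x DESC;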
Re: [PERFORM] Comparative performance
Total runtime: 16.000 ms Even though this query isn't that optimized, it's still only 16 milliseconds. Why does it take this long for PHP to get the results ? Can you try pg_query'ing this exact same query, FROM PHP, and timing it with getmicrotime() ? You can even do an EXPLAIN ANALYZE from pg_query and display the results in your webpage, to check how long the query takes on the server. You can also try it on a Linux box. This smells like a TCP communication problem. ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [HACKERS] [PERFORM] A Better External Sort?
Jeff, Ron,

First off, Jeff, please take it easy. We're discussing 8.2 features at this point and there's no reason to get stressed out at Ron. You can get plenty stressed out when 8.2 is near feature freeze. ;-)

Regarding use cases for better sorts:

The biggest single area where I see PostgreSQL's external sort sucking is index creation on large tables. For example, for the free version of TPCH, it takes only 1.5 hours to load a 60GB Lineitem table on OSDL's hardware, but over 3 hours to create each index on that table. This means that, overall, our load into TPCH takes 4 times as long to create the indexes as it did to bulk load the data.

Anyone restoring a large database from pg_dump is in the same situation. Even worse is having to create a new index on a large table in a production database that is in use, because the I/O from the index creation swamps everything.

Following an index creation, we see that 95% of the time required is the external sort, which averages 2MB/s. This is with separate drives for the WAL, the pg_tmp, the table and the index. I've confirmed that increasing work_mem beyond a small minimum (around 128MB) had no benefit on the overall index creation speed.

--Josh Berkus

---(end of broadcast)---
TIP 6: explain analyze is your friend
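One practical thing worth ruling out when index builds crawl: from 8.0 on, the sort performed by CREATE INDEX is sized by maintenance_work_mem, not work_mem, so raising the latter alone does not help the build. A minimal sketch (the value and index name are illustrative; before 8.2 the setting is an integer number of kilobytes):

  -- Give this session's index build roughly 512MB of sort memory
  SET maintenance_work_mem = 524288;
  CREATE INDEX lineitem_l_orderkey_idx ON lineitem (l_orderkey);
  RESET maintenance_work_mem;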
Re: [HACKERS] [PERFORM] A Better External Sort?
Josh, On 9/29/05 9:54 AM, "Josh Berkus" wrote: > Following an index creation, we see that 95% of the time required is the > external sort, which averages 2mb/s. This is with seperate drives for > the WAL, the pg_tmp, the table and the index. I've confirmed that > increasing work_mem beyond a small minimum (around 128mb) had no benefit > on the overall index creation speed. Yp! That about sums it up - regardless of taking 1 or 2 passes through the heap being sorted, 1.5 - 2 MB/s is the wrong number. This is not necessarily an algorithmic problem, but is a optimization problem with Postgres that must be fixed before it can be competitive. We read/write to/from disk at 240MB/s and so 2 passes would run at a net rate of 120MB/s through the sort set if it were that efficient. Anyone interested in tackling the real performance issue? (flame bait, but for a worthy cause :-) - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PERFORM] A Better External Sort?
On Thu, 2005-09-29 at 10:06 -0700, Luke Lonergan wrote: > Josh, > > On 9/29/05 9:54 AM, "Josh Berkus" wrote: > > > Following an index creation, we see that 95% of the time required is the > > external sort, which averages 2mb/s. This is with seperate drives for > > the WAL, the pg_tmp, the table and the index. I've confirmed that > > increasing work_mem beyond a small minimum (around 128mb) had no benefit > > on the overall index creation speed. > > Yp! That about sums it up - regardless of taking 1 or 2 passes through > the heap being sorted, 1.5 - 2 MB/s is the wrong number. Yeah this is really bad ... approximately the speed of GNU sort. Josh, do you happen to know how many passes are needed in the multiphase merge on your 60GB table? Looking through tuplesort.c, I have a couple of initial ideas. Are we allowed to fork here? That would open up the possibility of using the CPU and the I/O in parallel. I see that tuplesort.c also suffers from the kind of postgresql-wide disease of calling all the way up and down a big stack of software for each tuple individually. Perhaps it could be changed to work on vectors. I think the largest speedup will be to dump the multiphase merge and merge all tapes in one pass, no matter how large M. Currently M is capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over the tape. It could be done in a single pass heap merge with N*log(M) comparisons, and, more importantly, far less input and output. I would also recommend using an external processes to asynchronously feed the tuples into the heap during the merge. What's the timeframe for 8.2? -jwb ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [HACKERS] [PERFORM] A Better External Sort?
Jeff, > Josh, do you happen to know how many passes are needed in the multiphase > merge on your 60GB table? No, any idea how to test that? > I think the largest speedup will be to dump the multiphase merge and > merge all tapes in one pass, no matter how large M. Currently M is > capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over > the tape. It could be done in a single pass heap merge with N*log(M) > comparisons, and, more importantly, far less input and output. Yes, but the evidence suggests that we're actually not using the whole 1GB of RAM ... maybe using only 32MB of it which would mean over 200 passes (I'm not sure of the exact match). Just fixing our algorithm so that it used all of the work_mem permitted might improve things tremendously. > I would also recommend using an external processes to asynchronously > feed the tuples into the heap during the merge. > > What's the timeframe for 8.2? Too far out to tell yet. Probably 9mo to 1 year, that's been our history. -- --Josh Josh Berkus Aglio Database Solutions San Francisco ---(end of broadcast)--- TIP 6: explain analyze is your friend
[PERFORM] SQL Function performance
Hi All,

I have a SQL function like:

CREATE OR REPLACE FUNCTION fn_get_yetkili_inisyer_listesi(int4, int4)
  RETURNS SETOF kod_adi_liste_type AS
$BODY$
  SELECT Y.KOD, Y.ADI
  FROM T_YER Y
  WHERE EXISTS (SELECT 1
                FROM T_GUZER G
                WHERE (G.BIN_YER_KOD = $1 OR COALESCE($1,0)=0)
                  AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,$2) = 1
                  AND G.IN_YER_KOD = Y.KOD)
    AND Y.IPTAL = 'H';
$BODY$
  LANGUAGE 'sql' VOLATILE;

When I run it like "SELECT * FROM fn_get_yetkili_inisyer_listesi(1, 3474)", the planner result is:

 Function Scan on fn_get_yetkili_inisyer_listesi  (cost=0.00..12.50 rows=1000 width=36)
(1 row)

and it runs very slowly. But when I run the equivalent query directly, like "SELECT Y.KOD,Y.ADI FROM T_YER Y WHERE EXISTS (SELECT 1 FROM T_GUZER G WHERE (G.BIN_YER_KOD = 1 OR COALESCE(1,0)=0) AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO,3474) = 1 AND G.IN_YER_KOD = Y.KOD) AND Y.IPTAL = 'H';", the planner result is:

                                        QUERY PLAN
------------------------------------------------------------------------------------------
 Seq Scan on t_yer y  (cost=0.00..3307.79 rows=58 width=14)
   Filter: (((iptal)::text = 'H'::text) AND (subplan))
   SubPlan
     ->  Index Scan using t_guzer_ucret_giris_performans_idx on t_guzer g  (cost=0.00..28.73 rows=1 width=0)
           Index Cond: ((bin_yer_kod = 1) AND (in_yer_kod = $0))
           Filter: (fn_firma_isvisible(firma_no, 3474) = 1)
(6 rows)

and it runs very fast. Any idea?

Adnan DURSUN
ASRIN Bilişim Hiz.Ltd.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
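A likely reason for the difference: inside the SQL function the comparison values are the parameters $1 and $2, so the embedded query is planned generically and the (G.BIN_YER_KOD = $1 OR COALESCE($1,0)=0) branch cannot be simplified away the way it is when literal constants are supplied, as in the fast hand-written query. One common workaround in this era is a PL/pgSQL function that builds the query with the argument values interpolated as literals, so each call is planned with constants. This is only a sketch, untested against the original schema, and it assumes a NULL argument may be treated as 0:

  CREATE OR REPLACE FUNCTION fn_get_yetkili_inisyer_listesi(int4, int4)
    RETURNS SETOF kod_adi_liste_type AS
  $BODY$
  DECLARE
      r kod_adi_liste_type;
  BEGIN
      -- Interpolate the arguments as literals; COALESCE keeps a NULL argument
      -- from turning the whole query string into NULL.
      FOR r IN EXECUTE
          'SELECT Y.KOD, Y.ADI FROM T_YER Y'
          || ' WHERE EXISTS (SELECT 1 FROM T_GUZER G'
          || '   WHERE (G.BIN_YER_KOD = ' || COALESCE($1,0)::text
          || '          OR ' || COALESCE($1,0)::text || ' = 0)'
          || '   AND FN_FIRMA_ISVISIBLE(G.FIRMA_NO, ' || COALESCE($2,0)::text || ') = 1'
          || '   AND G.IN_YER_KOD = Y.KOD)'
          || ' AND Y.IPTAL = ''H'''
      LOOP
          RETURN NEXT r;
      END LOOP;
      RETURN;
  END;
  $BODY$ LANGUAGE plpgsql;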
Re: [PERFORM] Monitoring Postgresql performance
On 9/28/05, Matthew Nuzum <[EMAIL PROTECTED]> wrote:
> On 9/28/05, Arnau <[EMAIL PROTECTED]> wrote:
> > Hi all,
> >
> > I have been "googling" a bit searching info about a way to monitor
> > postgresql (CPU & Memory, num processes, ... ) and I haven't found
> > anything relevant. I'm using munin to monitor others parameters of my
> > servers and I'd like to include postgresql or have a similar tool. Any
> > of you is using anything like that? all kind of hints are welcome :-)

We are also using cricket + nagios.

On each DB server: Setup snmpd and use snmpd.conf to set disk quotas and mark processes that need to be running (like postmaster, syslog, sshd).

On the monitoring server(s): Use cricket for long term trends & graphs. Use nagios for current status and alerting and some trending. (Nagios has plugins over SNMP for load, cpu, memory, disk and processes.)

Here are the nagios plugins I have hacked up over the past few months and what they do. I'd imagine some could use better names. I can provide these or package them up if anyone is interested.

check_pgconn.pl - Shows percentage of connections available. It uses "SELECT COUNT(*) FROM pg_stat_activity" / "SHOW max_connections". It can also alert when less than a certain number of connections are available.

check_pgqueries.pl - If you have query logging enabled this summarizes the types of queries running (SELECT, INSERT, DELETE, UPDATE, ALTER, CREATE, TRUNCATE, VACUUM, COPY) and warns if any queries have been running longer than 5 minutes (configurable).

check_pglocks.pl - Look for locks that block and for baselining lock activity.

check_pgtime.pl - Makes sure that postgresql's time is in sync with the monitoring server.

check_pgqueries.pl - Whines if any queries are in the "waiting" state. The script that runs on each DB server does "ps auxww | grep postgres | grep -i "[W]aiting"" and exposes that through SNMP using the exec functionality. Nagios then alerts if queries are being blocked.

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
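For anyone who wants the same information without the Perl wrappers, the underlying SQL is simple; this assumes the stats collector is running and stats_command_string is enabled, and uses the 8.0-era view and column names (procpid, current_query):

  -- Connections in use vs. the configured ceiling (cf. check_pgconn.pl)
  SELECT (SELECT count(*) FROM pg_stat_activity) AS in_use,
         current_setting('max_connections') AS max_connections;

  -- Queries that have been running for more than 5 minutes
  SELECT procpid, usename, query_start, current_query
  FROM pg_stat_activity
  WHERE current_query <> '<IDLE>'
    AND now() - query_start > interval '5 minutes';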
Re: [PERFORM] Comparative performance
Andreas Pflug wrote: Hm, if you only have 4 tables, why do you need 12 queries? To reduce queries, join them in the query; no need to merge them physically. If you have only two main tables, I'd bet you only need 1-2 queries for the whole page. There are more than four tables and the queries are not functionally overlapping. As an example, allow me to refer to the page www.freedomcircle.com/topic.php/Economists. The top row of navigation buttons (Life, Liberty, etc.) is created from a query of the 'topic' table. It could've been hard-coded as a PHP array, but with less flexibility. The alphabetical links are from a SELECT DISTINCT substring from topic. It could've been generated by a PHP for loop (originally implemented that way) but again with less flexibility. The listing of economists is another SELECT from topic. The subheadings (Articles, Books) come from a SELECT of an entry_type table --which currently has 70 rows-- and is read into a PHP array since we don't know what headings will be used in a given page. The detail of the entries comes from that query that I posted earlier, but there are three additional queries that are used for specialized entry types (relationships between topics --e.g., Prof. Williams teaches at George Mason, events, and multi-author or multi-subject articles and books). And there's yet another table for the specific book information. Once the data is retrieved it's sorted internally with PHP, at the heading level, before display. Maybe there is some way to merge all the queries (some already fairly complex) that fetch the data for the entries box but I believe it would be a monstrosity with over 100 lines of SQL. Thanks, Joe ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] Comparative performance
PFC wrote: Even though this query isn't that optimized, it's still only 16 milliseconds. Why does it take this long for PHP to get the results ? Can you try pg_query'ing this exact same query, FROM PHP, and timing it with getmicrotime() ? Thanks, that's what I was looking for. It's microtime(), BTW. It'll take me some time to instrument it, but that way I can pinpoint what is really slow. You can even do an EXPLAIN ANALYZE from pg_query and display the results in your webpage, to check how long the query takes on the server. You can also try it on a Linux box. My current host only supports MySQL. I contacted hub.org to see if they could assist in this transition but I haven't heard back. This smells like a TCP communication problem. I'm puzzled by that remark. How much does TCP get into the picture in a local Windows client/server environment? Joe ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Monitoring Postgresql performance
Hi, impressive But you forgot to include those scipts as attachment or they got lost somehow ;-) could you post them (again)? thanx, Juraj Am Donnerstag, den 29.09.2005, 13:02 -0700 schrieb Tony Wasson: > On 9/28/05, Matthew Nuzum <[EMAIL PROTECTED]> wrote: > > On 9/28/05, Arnau <[EMAIL PROTECTED]> wrote: > > > Hi all, > > > > > >I have been "googling" a bit searching info about a way to monitor > > > postgresql (CPU & Memory, num processes, ... ) and I haven't found > > > anything relevant. I'm using munin to monitor others parameters of my > > > servers and I'd like to include postgresql or have a similar tool. Any > > > of you is using anything like that? all kind of hints are welcome :-) > > We are also using cricket + nagios. > > On each DB server: Setup snmpd and use snmpd.conf to set disk quotas > and mark processes that need to be running (like > postmaster,syslog,sshd) > > On the monitoring server(s): Use cricket for long term trends & > graphs. Use nagios for current status and alerting and some trending. > (Nagios has plugins over SNMP for load,cpu,memory,disk and processes) > > Here's the nagios plugins I have hacked up over the past few months > and what they do. I'd imagine some could use better names. I can > provide these of package them up if anyone is interested. > > check_pgconn.pl - Shows percentage of connections available. It uses > "SELECT COUNT(*) FROM pg_stat_activity" / "SHOW max_connections". It > can also alert when less than a certain number of connections are > available. > > check_pgqueries.pl - If you have query logging enabled this summarizes > the types of queries running (SELECT ,INSERT ,DELETE ,UPDATE ,ALTER > ,CREATE ,TRUNCATE, VACUUM, COPY) and warns if any queries have been > running longer than 5 minutes (configurable). > > check_pglocks.pl - Look for locks that block and for baselining lock activity. > > check_pgtime.pl - Makes sure that postgresql's time is in sync with > the monitoring server. > > check_pgqueries.pl - Whines if any queries are in the "waiting" state. > The script that runs on each DB server does "ps auxww | grep postgres > | grep -i "[W]aiting"" and exposes that through SNMP using the exec > functionality. Nagios then alerts if queries are being blocked. > > ---(end of broadcast)--- > TIP 1: if posting/reading through Usenet, please send an appropriate >subscribe-nomail command to [EMAIL PROTECTED] so that your >message can get through to the mailing list cleanly > ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] Monitoring Postgresql performance
Arnau wrote: Hi all, I have been "googling" a bit searching info about a way to monitor postgresql (CPU & Memory, num processes, ... ) and I haven't found anything relevant. I'm using munin to monitor others parameters of my servers and I'd like to include postgresql or have a similar tool. Any of you is using anything like that? all kind of hints are welcome :-) Probably, as you said, this is not so much relevant, as it is something at *early* stages of usability :-) but have you looked at pgtop? The basic requirement is that you enable your postmaster stats collector and query command strings. Here is the first announcement email: http://archives.postgresql.org/pgsql-announce/2005-05/msg0.php And its home on the CPAN: http://search.cpan.org/dist/pgtop/pgtop -- Cosimo ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
[PERFORM] How to Trigger An Automatic Vacuum on Selected Tables
I am running version 8.0.1 on Windows 2003. I have an application that subjects PostgreSQL to sudden bursts of activity at times which cannot be predicted. The bursts are significant enough to cause performance degradation, which can be fixed by a 'vacuum analyze'. I am aware of the existence and contents of tables like pg_class. QUESTION: I would like to trigger a vacuum analyze process on a table whenever it gets a large enough burst of activity to warrant it. Using the data in pg_class (like the number of pages the system found the last time it was vacuumed / analyzed), I would like to compare those statistics to current size, and trigger a vacuum/analyze on a table if needed. Does anyone know of any available tools, or an approach I could use, to determine what the CURRENT SIZE is ? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
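One way to approximate "how much has changed since the last VACUUM ANALYZE" from inside the database, without extra tools: pg_class.relpages and reltuples reflect the state as of the last VACUUM/ANALYZE, while pg_stat_user_tables counts row changes since the statistics were last reset (stats collector required). A sketch with an illustrative 20% threshold; the contrib pg_autovacuum daemon automates essentially this check:

  -- Tables whose update/delete churn looks large relative to the row count
  -- recorded at the last VACUUM/ANALYZE; candidates for a VACUUM ANALYZE.
  SELECT c.relname, c.relpages, c.reltuples,
         s.n_tup_upd + s.n_tup_del AS changed_rows
  FROM pg_class c
  JOIN pg_stat_user_tables s ON s.relid = c.oid
  WHERE s.n_tup_upd + s.n_tup_del > 0.2 * c.reltuples;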
[PERFORM] Is There Any Way ....
... to do the following: (1) Make a table memory-resident only ? (2) Set up user variables in memory that are persistent across all sessions, for as long as the database is up and running ? (3) Assure that a disk-based table is always in memory (outside of keeping it in memory buffers as a result of frequent activity which would prevent LRU operations from taking it out) ? ---(end of broadcast)--- TIP 6: explain analyze is your friend
Re: [PERFORM] How to Trigger An Automatic Vacuum on Selected Tables
Autovacuum does exactly what I understood you want :-) -Mensaje original- De: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] nombre de Lane Van Ingen Enviado el: jueves, 29 de septiembre de 2005 20:06 Para: pgsql-performance@postgresql.org Asunto: [PERFORM] How to Trigger An Automtic Vacuum on Selected Tables I am running version 8.0.1 on Windows 2003. I have an application that subjects PostgreSQL to sudden bursts of activity at times which cannot be predicted. The bursts are significant enough to cause performance degradation, which can be fixed by a 'vacuum analyze'. I am aware of the existence and contents of tables like pg_class. QUESTION: I would like to trigger a vacuum analyze process on a table whenever it gets a large enough burst of activity to warrant it. Using the data in pg_class (like the number of pages the system found the last time it was vacuumed / analyzed), I would like to compare those statistics to current size, and trigger a vacuum/analyze on a table if needed. Does anyone know of any available tools, or an approach I could use, to determine what the CURRENT SIZE is ? ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match ---(end of broadcast)--- TIP 3: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faq
Re: [PERFORM] Is There Any Way ....
1) AFAIK, no. Just in case you are thinking "There should be a way coz I know it will be used all the time", you must know that postgresql philosophy is "I'm smarter than you". If table is used all the time, it will be in memory, if not, it won't waste memory. 2) don't know. 3) see number 1) Of course, you could run into a pathological case where table is queried just before being taken out of memory. But it means, the table isn't queried all the time... Greetings... -Mensaje original- De: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] nombre de Lane Van Ingen Enviado el: jueves, 29 de septiembre de 2005 20:21 Para: pgsql-performance@postgresql.org Asunto: [PERFORM] Is There Any Way ... to do the following: (1) Make a table memory-resident only ? (2) Set up user variables in memory that are persistent across all sessions, for as long as the database is up and running ? (3) Assure that a disk-based table is always in memory (outside of keeping it in memory buffers as a result of frequent activity which would prevent LRU operations from taking it out) ? ---(end of broadcast)--- TIP 6: explain analyze is your friend ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Is There Any Way ....
Quoting Lane Van Ingen <[EMAIL PROTECTED]>: > ... to do the following: > (1) Make a table memory-resident only ? Put it on a RAM filesystem. On Linux, shmfs. On *BSD, mfs. Solaris, tmpfs. > (2) Set up user variables in memory that are persistent across all > sessions, for > as long as the database is up and running ? This sounds like a client thing? Dunno. > (3) Assure that a disk-based table is always in memory (outside of > keeping > it in > memory buffers as a result of frequent activity which would prevent > LRU > operations from taking it out) ? > Put on RAM fs (like question 1). Basically, RAM filesystems are on RAM, meaning you need to have enough physical memory to support them. And of course their contents completely disappear between reboots, so you'll need a way to populate them on bootup and make sure that your updates go to a real nonvolatile storage medium (like disks). And you might get swapping on some types of memory filesystems--Solaris' tmpfs is carved out of virtual memory, which means it will cause swapping if tmpfs contents plus the rest of your applications exceed physical memory. > > > ---(end of broadcast)--- > TIP 6: explain analyze is your friend > ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
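If a memory-resident table is really wanted on 8.0 or later, the RAM-filesystem idea combines with tablespaces; the contents vanish on reboot, so this only suits data that can be rebuilt at startup. A sketch, where /mnt/ramdisk is an assumed tmpfs/shmfs/mfs mount point owned by the postgres user:

  CREATE TABLESPACE ramspace LOCATION '/mnt/ramdisk/pgdata';

  CREATE TABLE hot_lookup (
      id  integer PRIMARY KEY,
      val text
  ) TABLESPACE ramspace;

Keep in mind that the table's WAL and catalog entries still live on regular disk, and the server expects the tablespace directory to exist again after a reboot, so it has to be recreated and repopulated before use.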
Re: [PERFORM] Is There Any Way ....
On Thu, Sep 29, 2005 at 07:21:08PM -0400, Lane Van Ingen wrote: > (1) Make a table memory-resident only ? You might want to look into memcached, but it's impossible to say whether it will fit your needs or not without more details. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] A Better External Sort?
>From: Pailloncy Jean-Gerard <[EMAIL PROTECTED]> >Sent: Sep 29, 2005 7:11 AM >Subject: Re: [HACKERS] [PERFORM] A Better External Sort? >>>Jeff Baker: >>>Your main example seems to focus on a large table where a key >>>column has constrained values. This case is interesting in >>>proportion to the number of possible values. If I have billions >>>of rows, each having one of only two values, I can think of a >>>trivial and very fast method of returning the table "sorted" by >>>that key: make two sequential passes, returning the first value >>>on the first pass and the second value on the second pass. >>> This will be faster than the method you propose. >> >>Ron Peacetree: >>1= No that was not my main example. It was the simplest example >>used to frame the later more complicated examples. Please don't >>get hung up on it. >> >>2= You are incorrect. Since IO is the most expensive operation we >>can do, any method that makes two passes through the data at top >>scanning speed will take at least 2x as long as any method that only >>takes one such pass. > >You do not get the point. >As the time you get the sorted references to the tuples, you need to >fetch the tuples themself, check their visbility, etc. and returns >them to the client. > As PFC correctly points out elsewhere in this thread, =maybe= you have to do all that. The vast majority of the time people are not going to want to look at a detailed record by record output of that much data. The most common usage is to calculate or summarize some quality or quantity of the data and display that instead or to use the tuples or some quality of the tuples found as an intermediate step in a longer query process such as a join. Sometimes there's a need to see _some_ of the detailed records; a random sample or a region in a random part of the table or etc. It's rare that there is a RW need to actually list every record in a table of significant size. On the rare occasions where one does have to return or display all records in such large table, network IO and/or display IO speeds are the primary performance bottleneck. Not HD IO. Nonetheless, if there _is_ such a need, there's nothing stopping us from rearranging the records in RAM into sorted order in one pass through RAM (using at most space for one extra record) after constructing the cache conscious Btree index. Then the sorted records can be written to HD in RAM buffer sized chunks very efficiently. Repeating this process until we have stepped through the entire data set will take no more HD IO than one HD scan of the data and leave us with a permanent result that can be reused for multiple purposes. If the sorted records are written in large enough chunks, rereading them at any later time can be done at maximum HD throughput In a total of two HD scans (one to read the original data, one to write out the sorted data) we can make a permanent rearrangement of the data. We've essentially created a cluster index version of the data. >So, if there is only 2 values in the column of big table that is larger >than available RAM, two seq scans of the table without any sorting >is the fastest solution. > If you only need to do this once, yes this wins. OTOH, if you have to do this sort even twice, my method is better. regards, Ron ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [HACKERS] [PERFORM] A Better External Sort?
>From: Zeugswetter Andreas DAZ SD <[EMAIL PROTECTED]> >Sent: Sep 29, 2005 9:28 AM >Subject: RE: [HACKERS] [PERFORM] A Better External Sort? > >>In my original example, a sequential scan of the 1TB of 2KB >>or 4KB records, => 250M or 500M records of data, being sorted >>on a binary value key will take ~1000x more time than reading >>in the ~1GB Btree I described that used a Key+RID (plus node >>pointers) representation of the data. > >Imho you seem to ignore the final step your algorithm needs of >collecting the data rows. After you sorted the keys the collect >step will effectively access the tuples in random order (given a >sufficiently large key range). > "Collecting" the data rows can be done for each RAM buffer full of of data in one pass through RAM after we've built the Btree. Then if desired those data rows can be read out to HD in sorted order in essentially one streaming burst. This combination of index build + RAM buffer rearrangement + write results to HD can be repeat as often as needed until we end up with an overall Btree index and a set of sorted sublists on HD. Overall HD IO for the process is only two effectively sequential passes through the data. Subsequent retrieval of the sorted information from HD can be done at full HD streaming speed and whatever we've decided to save to HD can be reused later if we desire. Hope this helps, Ron ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PERFORM] A Better External Sort?
Jeff, On 9/29/05 10:44 AM, "Jeffrey W. Baker" <[EMAIL PROTECTED]> wrote: > On Thu, 2005-09-29 at 10:06 -0700, Luke Lonergan wrote: > Looking through tuplesort.c, I have a couple of initial ideas. Are we > allowed to fork here? That would open up the possibility of using the > CPU and the I/O in parallel. I see that tuplesort.c also suffers from > the kind of postgresql-wide disease of calling all the way up and down a > big stack of software for each tuple individually. Perhaps it could be > changed to work on vectors. Yes! > I think the largest speedup will be to dump the multiphase merge and > merge all tapes in one pass, no matter how large M. Currently M is > capped at 6, so a sort of 60GB with 1GB sort memory needs 13 passes over > the tape. It could be done in a single pass heap merge with N*log(M) > comparisons, and, more importantly, far less input and output. Yes again, see above. > I would also recommend using an external processes to asynchronously > feed the tuples into the heap during the merge. Simon Riggs is working this idea a bit - it's slightly less interesting to us because we already have a multiprocessing executor. Our problem is that 4 x slow is still far too slow. > What's the timeframe for 8.2? Let's test it out in Bizgres! - Luke ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [HACKERS] [PERFORM] A Better External Sort?
>From: Josh Berkus
>Sent: Sep 29, 2005 12:54 PM
>Subject: Re: [HACKERS] [PERFORM] A Better External Sort?
>
>The biggest single area where I see PostgreSQL external
>sort sucking is on index creation on large tables. For
>example, for free version of TPCH, it takes only 1.5 hours to
>load a 60GB Lineitem table on OSDL's hardware, but over 3
>hours to create each index on that table. This means that
>over all our load into TPCH takes 4 times as long to create
>the indexes as it did to bulk load the data.
>
Hmmm. 60GB/5400secs= 11MBps. That's ssllooww. So the first problem is evidently our physical layout and/or HD IO layer sucks.

Creating the table and then creating the indexes on the table is going to require more physical IO than if we created the table and the indexes concurrently in chunks and then combined the indexes on the chunks into the overall indexes for the whole table, so there's a potential speed-up. The method I've been talking about is basically a recipe for creating indexes as fast as possible with as few IO operations, HD or RAM, as possible and nearly no random ones, so it could help as well.

OTOH, HD IO rate is the fundamental performance metric. As long as our HD IO rate is pessimal, so will the performance of everything else be. Why can't we load a table at closer to the peak IO rate of the HDs?

>Anyone restoring a large database from pg_dump is in the
>same situation. Even worse, if you have to create a new
>index on a large table on a production database in use,
>because the I/O from the index creation swamps everything.
>
Fix for this in the works ;-)

>Following an index creation, we see that 95% of the time
>required is the external sort, which averages 2mb/s.
>
Assuming decent HD HW, this is HORRIBLE. What kind of instrumenting and profiling has been done of the code involved?

>This is with seperate drives for the WAL, the pg_tmp, the table
>and the index. I've confirmed that increasing work_mem
>beyond a small minimum (around 128mb) had no benefit on
>the overall index creation speed.
>
No surprise. The process is severely limited by the abysmally slow HD IO.

Ron

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Comparative performance
> > This smells like a TCP communication problem. > > I'm puzzled by that remark. How much does TCP get into the > picture in a local Windows client/server environment? Windows has no Unix Domain Sockets (no surprise there), so TCP connections over the loopback interface are used to connect to the server. //Magnus ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster