Re: [PERFORM] 121+ million record table perf problems
I've got a table with ~121 million records in it. Select count on it currently takes ~45 minutes, and an update to the table to set a value on one of the columns I finally killed after it ran 17 hours and had still not completed. Queries into the table are butt slow, and the update query that started this all I had to kill after 17 hours. It should have updated all 121+ million records. That brought my select count down to 19 minutes, but still a far cry from acceptable.

If you have a column that needs to be updated often for all rows, separate it into a different table, and create a view that joins it back to the main table so that your application still sees the old schema. This will greatly speed up your update, since (in Postgres) an update is the same as a delete+insert. By updating that one column, you're re-writing your entire 121 million rows. If you separate it, you're only rewriting that one column. Don't forget to vacuum/analyze and reindex when you're done.

Better yet, if you can stand a short down time, you can drop indexes on that column, truncate, then do 121 million inserts, and finally reindex. That will be MUCH faster.

Craig

---(end of broadcast)---
TIP 6: explain analyze is your friend
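A minimal sketch of the split-table idea described above. All names here are hypothetical (the post doesn't give the real schema); assume a main table `big_table(id, ..., status)` where `status` is the frequently-updated column:

```sql
-- Hypothetical schema; adapt names to your own tables.
-- Move the hot column into its own narrow table:
CREATE TABLE big_table_status (
    id     integer PRIMARY KEY REFERENCES big_table(id),
    status integer
);

INSERT INTO big_table_status (id, status)
    SELECT id, status FROM big_table;

ALTER TABLE big_table DROP COLUMN status;

-- A view that joins it back, so applications still see the old schema:
CREATE VIEW big_table_v AS
    SELECT b.*, s.status
    FROM big_table b JOIN big_table_status s USING (id);

-- A whole-table update now rewrites only the narrow table's rows,
-- not all 121 million wide rows:
UPDATE big_table_status SET status = 0;
```

The win comes from MVCC: each updated row is rewritten in full, so shrinking the row that gets updated shrinks the write volume proportionally.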
[PERFORM] Auto-ANALYZE?
Auto-vacuum has made Postgres a much more friendly system. Is there some reason the planner can't also auto-ANALYZE in some situations? Here's an example I ran into:

  create table my_tmp_table (...);
  insert into my_tmp_table (select some stuff from here and there);
  select ... from my_tmp_table join another_table on (...);

The last statement generated a horrible plan, because the planner had no idea what was in the temporary table (which only had about 100 rows in it). Simply inserting an ANALYZE before the SELECT improved performance by a factor of 100 or so.

There are several situations where you could automatically analyze the data.

1. Any time you have to do a full table scan, you might as well throw in an ANALYZE of the data you're scanning. If I understand things, ANALYZE takes a random sample anyway, so a full table scan should be able to produce even better statistics than a normal ANALYZE.

2. If you have a table with NO statistics, the chances of generating a sensible plan are pretty random. Since ANALYZE is quite fast, if the planner encounters no statistics, why not ANALYZE it on the spot? (This might need to be a configurable feature, though.)

3. A user-configurable update threshold, such as, "When 75% of the rows have changed since the last ANALYZE, trigger an auto-analyze." The user-configurable part would account for the fact that some tables' stats don't change much even after many updates, but others may need to be reanalyzed after a modest number of updates.

Auto-vacuum, combined with auto-analyze, would eliminate many of the problems that plague neophyte (and sometimes experienced) users of Postgres. A substantial percentage of the questions to this list are answered with, "Have you ANALYZED?"

Craig
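The manual workaround the post describes, written out concretely (table and column names are hypothetical stand-ins for the "..." in the original):

```sql
-- Hypothetical tables; the point is the explicit ANALYZE.
CREATE TABLE my_tmp_table AS
    SELECT id, name FROM source_table WHERE name LIKE 'a%';

-- Without this, the freshly-filled table has no statistics and the
-- planner may badly misestimate its size, producing a horrible join plan:
ANALYZE my_tmp_table;

SELECT t.id, a.payload
FROM my_tmp_table t
JOIN another_table a ON (a.id = t.id);
```

One ANALYZE on a ~100-row table costs almost nothing and, as noted above, can change the join plan by a factor of 100.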
Re: [PERFORM] LIKE search and performance
Mark Lewis wrote:
> PG could scan the index looking for matches first and only load the actual rows if it found a match, but that could only be a possible win if there were very few matches, because the difference in cost between a full index scan and a sequential scan would need to be greater than the cost of randomly fetching all of the matching data rows from the table to look up the visibility information.

Just out of curiosity: Does Postgres store a duplicate of the data in the index, even for long strings? I thought indexes only had to store the string up to the point where there was no ambiguity. For example, if I have "missing", "mississippi" and "misty", the index only needs "missin", "missis" and "mist" in the actual index. This would make it impossible to use a full index scan for a LIKE query.

Craig
Re: [PERFORM] LIKE search and performance
Alvaro Herrera wrote:
>> Just out of curiosity: Does Postgres store a duplicate of the data in the index, even for long strings? I thought indexes only had to store the string up to the point where there was no ambiguity. For example, if I have "missing", "mississippi" and "misty", the index only needs "missin", "missis" and "mist" in the actual index.
> What would happen when you inserted a new tuple with just "miss"? You would need to expand all the other tuples in the index.

That's right. This technique, used by some index implementations, is a tradeoff between size and update speed. Most words in most natural languages can be distinguished by the first few characters. The chance of having to modify more than a few surrounding nodes when you insert "miss" is small, so some implementations choose this method. Other implementations choose to store the full string. I was just curious which method Postgres uses.

Craig
[PERFORM] ECC RAM really needed?
We're thinking of building some new servers. We bought some a while back that have ECC (error correcting) RAM, which is absurdly expensive compared to the same amount of non-ECC RAM. Does anyone have any real-life data about the error rate of non-ECC RAM, and whether it matters or not? In my long career, I've never once had a computer that corrupted memory, or at least I never knew if it did. ECC sounds like a good idea, but is it solving a non-problem?

Thanks,
Craig
Re: [PERFORM] How PostgreSQL handles multiple DDBB instances?
On Fri, 2007-05-25 at 20:16 +0200, Arnau wrote:
> The point I'm worried about is performance. Do you think the performance would be better executing exactly the same queries, only adding an extra column to all the tables (e.g. customer_id), than opening a connection to the one customer's DB and executing the query there?

There is no simple answer to this question; it depends too much on your data. In many cases, adding a customer_id to every table, and perhaps also per-customer views (per Jeff's suggestion), can work really well. However, performance is not the only consideration, or even the main consideration. We operate with about 150 separate databases. In our case, administration issues and software design outweighed performance issues.

For example, with separate databases, security is simpler, *and* it's easy to convince the customer that their data is protected. Creating views only helps for read-only access. When customers want to modify their data, how will you keep them from accessing and overwriting one another's data? Even with views, can you convince the customer you've done it right? With separate databases, you use the built-in security of Postgres, and don't have to duplicate it in your schema and apps.

With separate databases, it's really easy to discard a customer. This can be particularly important for a big customer with millions of linked records. In a database-for-everyone design, you'll have lots of foreign keys, indexes, etc. that make deleting a whole customer a REALLY big job. Contrast that with just discarding a whole database, which typically takes a couple of seconds.

But even more important (to us) is the simplicity of the applications and management. It's far more than just an extra "... and customer = xyz" added to every query. Throwing the customers together means every application has to understand security, and many operations that would be simple become horribly tangled. Want to back up a customer's data?
You can't use pg_dump; you have to write your own dump app. Want to restore a customer's data? Same. Want to do a big update? Your whole database is affected and probably needs to be vacuum/analyzed. On and on; at every turn, management and applications are more complex.

If you have hundreds of separate databases, it's also easy to scale: just buy more servers, and move some of the databases. With a single monster database, as load increases, you may hit the wall sooner or later. Postgres is really good at maintaining many separate databases. Why do it yourself?

There are indeed performance issues, but even that's not black and white. Depending on the specifics of your queries and the load on your servers, you may get better performance from a single monster database, or from hundreds of separate databases. So, your question has no simple answer. You should indeed evaluate the performance, but other issues may dominate your decision.

Craig
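To make the per-customer management point concrete, here is a sketch of the one-liners it buys you. Database and path names are hypothetical, and these assume a running Postgres server with the standard client tools:

```shell
# Back up a single customer's data (hypothetical names):
pg_dump -Fc customer_acme > /backup/customer_acme.dump

# Restore just that customer, touching nobody else:
pg_restore -d customer_acme /backup/customer_acme.dump

# Discard a customer entirely -- a couple of seconds,
# regardless of row counts, foreign keys, or indexes:
dropdb customer_acme
```

In the shared-schema design, every one of these becomes a custom application that has to filter on customer_id and worry about referential integrity.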
[PERFORM] Autodetect of software RAID1+0 fails
Apologies for a somewhat off-topic question, but... The Linux kernel doesn't properly detect my software RAID1+0 when I boot up. It detects the two RAID1 arrays, the partitions of which are marked properly. But it can't find the RAID0 on top of that, because there's no corresponding device to auto-detect. The result is that it creates /dev/md0 and /dev/md1 and assembles the RAID1 devices on bootup, but /dev/md2 isn't created, so the RAID0 can't be assembled at boot time. Here's what it looks like:

  $ cat /proc/mdstat
  Personalities : [raid0] [raid1]
  md2 : active raid0 md0[0] md1[1]
        234436224 blocks 64k chunks
  md1 : active raid1 sde1[1] sdc1[2]
        117218176 blocks [2/2] [UU]
  md0 : active raid1 sdd1[1] sdb1[0]
        117218176 blocks [2/2] [UU]

  $ uname -r
  2.6.12-1.1381_FC3

After a reboot, I always have to do this:

  mknod /dev/md2 b 9 2
  mdadm --assemble /dev/md2 /dev/md0 /dev/md1
  mount /dev/md2

What am I missing here?

Thanks,
Craig
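One common fix for this situation is to list all three arrays explicitly in mdadm's config file, so the init scripts assemble md2 even though a RAID0 built on md devices has no partition type for the kernel to autodetect. A sketch only: device names are taken from the /proc/mdstat output above, and on a real system you would generate the ARRAY lines (with UUIDs) via `mdadm --detail --scan` rather than trust this:

```
# /etc/mdadm.conf (or /etc/mdadm/mdadm.conf on Debian-family systems)
DEVICE partitions /dev/md0 /dev/md1
ARRAY /dev/md0 devices=/dev/sdb1,/dev/sdd1
ARRAY /dev/md1 devices=/dev/sdc1,/dev/sde1
ARRAY /dev/md2 devices=/dev/md0,/dev/md1
```

With the arrays declared, `mdadm --assemble --scan` at boot can build the whole stack, including the nested RAID0, without relying on kernel autodetection.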
Re: [PERFORM] Thousands of tables versus on table?
[EMAIL PROTECTED] wrote:
> various people (not database experts) are pushing to install Oracle cluster so that they can move all of these to one table with a customerID column.

They're blowing smoke if they think Oracle can do this. One of my applications had this exact same problem -- table-per-customer versus big-table-for-everyone. Oracle fell over dead, even with the best indexing possible, tuned by the experts, and using partitions keyed to the customerID. We ended up breaking it up into table-per-customer because Oracle fell over dead when we had to do a big update on a customer's entire dataset. All other operations were slowed by the additional index on the customer-ID, especially complex joins.

With a table-for-everyone, you're forced to create tricky partitioning or clustering and clever indexes, and even with that, big updates are problematic. And once you do this, you become heavily tied to one RDBMS and your applications are no longer portable, because clustering, indexing, partitioning and other DB tuning tricks are very specific to each RDBMS.

When we moved to Postgres, we never revisited this issue, because both Oracle and Postgres are able to handle thousands of tables well. As I wrote in a previous message on a different topic, often the design of your application is more important than the performance. In our case, table-per-customer makes the applications simpler, and security is MUCH easier.

Oracle is simply not better than Postgres in this regard. As far as I know, there is only one specific situation (discussed frequently here) where Oracle is faster: the count(), min() and max() functions, and I know significant progress has been made since I started using Postgres. I have not found any other query where Oracle is significantly better, and I've found several where Postgres is the clear winner.

It's telling that Oracle's license contract prohibits you from publishing comparisons and benchmarks. You have to wonder why.
Craig
Re: [PERFORM] Thousands of tables versus on table?
Scott Marlowe wrote:
> OTOH, there are some things, like importing data, which are MUCH faster in pgsql than in the big database.

An excellent point, I forgot about this. The COPY command is the best thing since the invention of a shirt pocket. We have a database-per-customer design, and one of the monstrous advantages of Postgres is that we can easily do backups. A pg_dump, then scp to a backup server, and in just a minute or two we have a full backup. For recovery, pg_restore is equally fast and amazing. Last time I checked, Oracle didn't have anything close to this.

Craig
Re: [PERFORM] Thousands of tables versus on table?
Jonah H. Harris wrote:
> On 6/6/07, Craig James [EMAIL PROTECTED] wrote:
>> They're blowing smoke if they think Oracle can do this.
> Oracle could handle this fine.
>> Oracle fell over dead, even with the best indexing possible, tuned by the experts, and using partitions keyed to the customerID.
> I don't think so, whoever tuned this likely didn't know what they were doing.

Wrong on both counts. You didn't read my message. I said that *BOTH* Oracle and Postgres performed well with table-per-customer. I wasn't Oracle bashing. In fact, I was doing the opposite: someone's coworker claimed ORACLE was the miracle cure for all problems, and I was simply pointing out that there are no miracle cures. (I prefer Postgres for many reasons, but Oracle is a fine RDBMS that I have used extensively.)

The technical question is simple: table-per-customer or big-table-for-everyone. The answer is, "it depends." It depends on your application, your read-versus-write ratio, the table size, the design of your application software, and a dozen other factors. There is no simple answer, but there are important technical insights which, I'm happy to report, various people contributed to this discussion. Perhaps you have some technical insight too, because it really is an important question.

The reason I assert (and stand by this) that "they're blowing smoke" when they claim Oracle has the magic cure is that Oracle and Postgres are both relational databases: they write their data to disks, and they both have indexes with O(log(N)) retrieval/update times. Oracle doesn't have a magical workaround to these facts, nor does Postgres.

Craig
Re: [PERFORM] Best way to delete unreferenced rows?
Tyrrill, Ed wrote:
> I have a table, let's call it A, whose primary key, a_id, is referenced in a second table, let's call it B. For each unique A.a_id there are generally many rows in B with the same a_id. My problem is that I want to delete a row in A when the last row in B that references it is deleted. Right now I just query for rows in A that aren't referenced by B, and that worked great when the tables were small, but it takes over an hour now that the tables have grown larger (over 200 million rows in B and 14 million in A). The delete has to do a sequential scan of both tables since I'm looking for what's not in the indexes. I was going to try creating a trigger after delete on B for each row to check for more rows in B with the same a_id, and delete the row in A if none found. In general I will be deleting 10's of millions of rows from B and 100's of thousands of rows from A on a daily basis. What do you think? Does anyone have any other suggestions on different ways to approach this?

Essentially what you're doing is taking the one-hour job and spreading it out in little chunks over thousands of queries. If you have 10^7 rows in B and 10^5 rows in A, then on average you have 100 references from B to A. That means that 99% of the time, your trigger will scan B and find that there's nothing to do. This could add a lot of overhead to your ordinary transactions, costing a lot more in the long run than just doing the once-a-day big cleanout.

You didn't send the specifics of the query you're using, along with an EXPLAIN ANALYZE of it in operation. It may also be that your SQL is not optimal, and that somebody could suggest a more efficient query. It's also possible that it's not the sequential scans that are the problem, but rather that it just takes a long time to delete 100,000 rows from table A because you have a lot of indexes. Or it could be a combination of performance problems. You haven't given us enough information to really analyze your problem.
Send more details!

Craig
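For reference, the trigger idea under discussion would look roughly like this. It's a hypothetical sketch using the generic table names A and B from the post, and, as argued above, it may well cost more in aggregate than the once-a-day batch delete:

```sql
CREATE FUNCTION delete_unreferenced_a() RETURNS trigger AS $$
BEGIN
    -- After a row of B is deleted, drop its parent row in A
    -- if no other row of B still references it.
    DELETE FROM a
    WHERE a.a_id = OLD.a_id
      AND NOT EXISTS (SELECT 1 FROM b WHERE b.a_id = OLD.a_id);
    RETURN OLD;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER b_delete_cleanup
    AFTER DELETE ON b
    FOR EACH ROW EXECUTE PROCEDURE delete_unreferenced_a();
```

Note that the NOT EXISTS probe is an index lookup on b(a_id), so each firing is cheap, but with tens of millions of deletes per day, even cheap firings add up, which is exactly the overhead concern raised above.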
Re: [PERFORM] Best way to delete unreferenced rows?
Tyrrill, Ed wrote:
> QUERY PLAN
> ---------------------------------------------------------------
>  Merge Left Join  (cost=38725295.93..42505394.70 rows=13799645 width=8) (actual time=6503583.342..8220629.311 rows=93524 loops=1)
>    Merge Cond: ("outer".record_id = "inner".record_id)
>    Filter: ("inner".record_id IS NULL)
>    ->  Index Scan using backupobjects_pkey on backupobjects  (cost=0.00..521525.10 rows=13799645 width=8) (actual time=15.955..357813.621 rows=13799645 loops=1)
>    ->  Sort  (cost=38725295.93..39262641.69 rows=214938304 width=8) (actual time=6503265.293..7713657.750 rows=214938308 loops=1)
>          Sort Key: backup_location.record_id
>          ->  Seq Scan on backup_location  (cost=0.00..3311212.04 rows=214938304 width=8) (actual time=11.175..1881179.825 rows=214938308 loops=1)
>  Total runtime: 8229178.269 ms
> (8 rows)
>
> I ran vacuum analyze after the last time any inserts, deletes, or updates were done, and before I ran the query above. I've attached my postgresql.conf. The machine has 4 GB of RAM.

I thought maybe someone with more expertise than me might answer this, but since they haven't, I'll just make a comment. It looks to me like the sort of 214 million rows is what's killing you. I suppose you could try to increase the sort memory, but that's a lot of memory. It seems to me an index merge of a relation this large would be faster, but that's a topic for the experts.

On a theoretical level, the problem is that it's sorting the largest table. Perhaps you could re-cast the query so that it only has to sort the smaller table, something like

  select a.id from a where a.id not in (select distinct b.id from b)

where b is the smaller table. There's still no guarantee that it won't do a sort on a, though. In fact one of the clever things about Postgres is that it can convert a query like the one above into a regular join, unless you do something like "select ... offset 0", which blocks the optimizer from doing the rearrangement.

But I think the first approach is to try to tune for a better plan using your original query.
Craig
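One possible concrete rendering of the re-cast suggested above, mapped onto the table names visible in the plan (this mapping is an assumption; the post used generic names a and b). The NOT EXISTS variant is also worth comparing, since the planner can sometimes execute it as an anti-join without fully sorting either input:

```sql
-- Find backupobjects rows no longer referenced by backup_location.
-- NOT IN form (beware: behaves surprisingly if the subquery can yield NULLs):
SELECT a.record_id
FROM backupobjects a
WHERE a.record_id NOT IN (SELECT DISTINCT b.record_id
                          FROM backup_location b);

-- Anti-join form of the same question:
SELECT a.record_id
FROM backupobjects a
WHERE NOT EXISTS (SELECT 1 FROM backup_location b
                  WHERE b.record_id = a.record_id);
```

Comparing EXPLAIN ANALYZE output for both forms against the original LEFT JOIN / IS NULL query is the quickest way to see which plan avoids the 214-million-row sort.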
Re: [PERFORM] test / live environment, major performance difference
On 2007-06-11 Christo Du Preez wrote:
> I really hope someone can shed some light on my problem. I'm not sure if this is a postgres or postgis issue. Anyway, we have 2 development laptops and one live server. Somehow I managed to get the same query to perform very well on my laptop, but on both the server and the other laptop it's really performing badly.

One simple possibility that bit me in the past: If you do pg_dump/pg_restore to create a copy of the database, you have to ANALYZE the newly-restored database. I mistakenly assumed that pg_restore would do this, but you have to run ANALYZE explicitly after a restore.

Craig
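The restore-then-analyze sequence sketched out (database and file names are hypothetical, and a running server is assumed):

```shell
# Restore the copy:
pg_restore -d mydb mydb.dump

# pg_restore loads data but does not gather planner statistics,
# so do it explicitly:
psql -d mydb -c 'ANALYZE;'

# or, equivalently, with the bundled wrapper:
vacuumdb --analyze mydb
```

Until this runs, the planner is working from empty statistics on every restored table, which is exactly the kind of thing that makes the same query fast on one machine and terrible on another.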
[PERFORM] Replication
Looking for replication solutions, I find:

Slony-I
  Seems good. Single master only; the master is a single point of failure; no good failover system for electing a new master or having a failed master rejoin the cluster. Slave databases are mostly for safety or for parallelizing queries for performance. Suffers from O(N^2) communications (N = cluster size).

Slony-II
  Seems brilliant: a solid theoretical foundation, at the forefront of computer science. But I can't find the project status -- when will it be available? Is it a pipe dream, or a nearly-ready reality?

PGReplication
  Appears to be a page that someone forgot to erase from the old GBorg site.

PGCluster
  Seems pretty good, but the web site is not current, there are releases in use that are not on the web site, and it also seems to always be a couple of steps behind the current release of Postgres. Two single-point-of-failure spots: the load balancer and the data replicator.

Is this a good summary of the status of replication? Have I missed any important solutions or mischaracterized anything?

Thanks!
Craig
Re: [PERFORM] Replication
Thanks to all who replied and filled in the blanks. The problem with the web is you never know if you've missed something.

Joshua D. Drake wrote:
>> Looking for replication solutions, I find... Slony-II
> Dead

Wow, I'm surprised. Is it dead for lack of need, lack of resources, too complex, or all of the above? It sounded like such a promising theoretical foundation.

Ben wrote:
> Which replication problem are you trying to solve?

Most of our data is replicated offline using custom tools tailored to our loading pattern, but we have a small amount of global information, such as user signups, system configuration, advertisements, and such, that goes into a single small (~5-10 MB) global database used by all servers.

We need nearly-real-time replication, and instant failover. That is, it's far more important for the system to keep working than it is to not lose a little data. Transactional integrity is not important. Actual hardware failures are rare, and if a user just happens to sign up, or save preferences, at the instant the global-database server goes down, it's not a tragedy. But it's not OK for the entire web site to go down when the one global-database server fails.

Slony-I can keep several slave databases up to date, which is nice. And I think I can combine it with a PGPool instance on each server, with the master as primary and a few Slony copies as secondary. That way, if the master goes down, the PGPool servers all switch to their secondary Slony slaves, and read-only access can continue. If the master crashes, users will be able to do most activities, but new users can't sign up, and existing users can't change their preferences, until either the master server comes back, or one of the slaves is promoted to master.

The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected.
So I'd have to write a bunch of SOAP services or something to do all of this.

I would consider PGCluster, but it seems to be a patch to Postgres itself. I'm reluctant to introduce such a major piece of technology into our entire system when only one tiny part of it needs the replication service.

Thanks,
Craig
Re: [PERFORM] Replication
Andreas Kostyrka wrote:
> Slony provides near instantaneous failovers (in the single digit seconds range). You can script an automatic failover if the master server becomes unreachable.

But Slony slaves are read-only, correct? So the system isn't fully functional once the master goes down.

> That leaves you the problem of restarting your app (or making it reconnect) to the new master.

Don't you have to run a Slony app to convert one of the slaves into the master?

> 5-10MB data implies such a fast initial replication, that making the server rejoin the cluster by setting it up from scratch is not an issue.

The problem is to PREVENT it from rejoining the cluster. If you have some semi-automatic process that detects the dead server and converts a slave to the master, and in the meantime the dead server manages to reboot itself (or its network gets fixed, or whatever the problem was), then you have two masters sending out updates, and you're screwed.

>> The problem is, there don't seem to be any "vote a new master" type of tools for Slony-I, and also, if the original master comes back online, it has no way to know that a new master has been elected. So I'd have to write a bunch of SOAP services or something to do all of this.
> You don't need SOAP services, and you do not need to elect a new master. If dbX goes down, dbY takes over; you should be able to decide on a static takeover pattern easily enough.

I can't see how that is true. Any self-healing distributed system needs something like the following:

- A distributed system of nodes that check each other's health
- A way to detect that a node is down and to transmit that information across the nodes
- An election mechanism that nominates a new master if the master fails
- A way for a node coming online to determine if it is a master or a slave

Anything less than this can cause corruption, because you can have two nodes that both think they're master, or end up with no master and no process for electing a master.
As far as I can tell, Slony doesn't do any of this. Is there a simpler solution? I've never heard of one.

> The point here is, that the servers need to react to a problem, but you probably want to get the admin on duty to look at the situation as quickly as possible anyway.

No, our requirement is no administrator interaction. We need instant, automatic recovery from failure so that the system stays online.

> Furthermore, you need to check out pgpool, I seem to remember that it has some bad habits in routing queries. (E.g. it wants to apply write queries to all nodes, but slony makes the other nodes readonly. Furthermore, anything inside a BEGIN is sent to the master node, which is bad with some ORMs, that by default wrap any access into a transaction.)

I should have been more clear about this. I was planning to use PGPool in the PGPool-1 mode (not the new PGPool-2 features that allow replication). So it would only be acting as a failover mechanism. Slony would be used as the replication mechanism.

I don't think I can use PGPool as the replicator, because then it becomes a new single point of failure that could bring the whole system down. If you're using it for INSERT/UPDATE, then there can only be one PGPool server.

I was thinking I'd put a PGPool server on every machine, in failover mode only. It would have the Slony master as the primary connection, and a Slony slave as the failover connection. The applications would route all INSERT/UPDATE statements directly to the Slony master, and all SELECT statements to the PGPool on localhost. When the master failed, all of the PGPool servers would automatically switch to one of the Slony slaves.

This way, the system would keep running on the Slony slaves (so it would be read-only), until a sysadmin could get the master Slony back online. And when the master came online, the PGPool servers would automatically reconnect and write access would be restored.

Does this make sense?
Craig
Re: [PERFORM] Replication
Markus Schiltknecht wrote:
> Not quite... there's still Postgres-R, see www.postgres-r.org And I'm continuously working on it, despite not having updated the website for almost a year now... I planned on releasing the next development snapshot together with 8.3; as that seems to be delayed, that seems realistic ;-)

Is Postgres-R the same thing as Slony-II? There's a lot of info and news around about Slony-II, but your web page doesn't seem to mention it. While researching replication solutions, I had a heck of a time sorting out the dead or outdated web pages (like the stuff on gborg) from the active projects.

Either way, it's great to know you're working on it.

Craig
Re: [PERFORM] rtree/gist index taking enormous amount of space in 8.2.3
Dolafi, Tom wrote:
> min(fmin) | max(fmin) | avg(fmin)
>         1 |  55296469 |  11423945
> min(fmax) | max(fmax) | avg(fmax)
>        18 |      3288 |  11424491
> There are 5,704,211 rows in the table.

When you're looking for weird index problems, it's more interesting to know if there are certain numbers that occur a LOT. From your statistics above, each number occurs about 10 times in the table. But do some particular numbers occur thousands, or even millions, of times?

Here is a query that will print a list of the highest-occurring values. You might expect a few occurrences of 20, and maybe 30, but if you have thousands or millions of occurrences of certain numbers, then that can screw up an index.

  select fmax, c
  from (select fmax, count(fmax) as c from your_table group by fmax) as foo
  where c > 3
  order by c desc;

Craig
[PERFORM] Join with lower/upper limits doesn't scale well
I have the same schema in two different databases. In "smalldb", the two tables of interest have about 430,000 rows; in "bigdb", the two tables each contain about 5.5 million rows. I'm processing the data, and for various reasons it works out well to process it in 100,000-row chunks. However, it turns out that for the big schema, selecting 100,000 rows is the longest single step of the processing.

Below is the explain/analyze output of the query from each database. Since both tables are indexed on the joined columns, I don't understand why the big table should be so much slower -- I hoped this would scale well, or at least O(log(N)), not O(N). What's going on here? I don't know if I'm reading this right, but it looks like the sort is taking all the time, but that doesn't make sense, because in both cases it's sorting 100,000 rows.

Thanks,
Craig

bigdb=> explain analyze
bigdb->   select r.row_num, m.molkeys from my_rownum r
bigdb->   join my_molkeys m on (r.version_id = m.version_id)
bigdb->   where r.row_num >= 10 AND r.row_num < 20
bigdb->   order by r.row_num;

 Sort  (cost=431000.85..431248.23 rows=98951 width=363) (actual time=46306.748..46417.448 rows=10 loops=1)
   Sort Key: r.row_num
   ->  Hash Join  (cost=2583.59..422790.68 rows=98951 width=363) (actual time=469.010..45752.131 rows=10 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Seq Scan on my_molkeys m  (cost=0.00..323448.30 rows=5472530 width=363) (actual time=11.243..33299.933 rows=5472532 loops=1)
         ->  Hash  (cost=2336.21..2336.21 rows=98951 width=8) (actual time=442.260..442.260 rows=10 loops=1)
               ->  Index Scan using i_chm_rownum_row_num on my_rownum r  (cost=0.00..2336.21 rows=98951 width=8) (actual time=47.551..278.736 rows=10 loops=1)
                     Index Cond: ((row_num >= 10) AND (row_num < 20))
 Total runtime: 46543.163 ms

smalldb=> explain analyze
smalldb->   select r.row_num, m.molkeys from my_rownum r
smalldb->   join my_molkeys m on (r.version_id = m.version_id)
smalldb->   where r.row_num >= 10 AND r.row_num < 20
smalldb->   order by r.row_num;

 Sort
 (cost=43598.23..43853.38 rows=102059 width=295) (actual time=4097.180..4207.733 rows=10 loops=1)
   Sort Key: r.row_num
   ->  Hash Join  (cost=2665.09..35107.41 rows=102059 width=295) (actual time=411.635..3629.756 rows=10 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Seq Scan on my_molkeys m  (cost=0.00..23378.90 rows=459590 width=295) (actual time=8.563..2011.455 rows=459590 loops=1)
         ->  Hash  (cost=2409.95..2409.95 rows=102059 width=8) (actual time=402.867..402.867 rows=10 loops=1)
               ->  Index Scan using i_chm_rownum_row_num_8525 on my_rownum r  (cost=0.00..2409.95 rows=102059 width=8) (actual time=37.122..242.528 rows=10 loops=1)
                     Index Cond: ((row_num >= 10) AND (row_num < 20))
 Total runtime: 4333.501 ms

Table "bigdb.my_rownum"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer |
 parent_id  | integer |
 row_num    | integer |
Indexes:
    "i_chm_rownum_row_num" UNIQUE, btree (row_num)
    "i_chm_rownum_version_id" UNIQUE, btree (version_id)
    "i_chm_rownum_parent_id" btree (parent_id)

Table "bigdb.my_molkeys"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer |
 molkeys    | text    |
Indexes:
    "i_chm_molkeys_version_id" UNIQUE, btree (version_id)
[PERFORM] Equivalent queries produce different plans
The two queries below produce different plans.

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and r.version_id < 330
order by r.version_id;

select r.version_id, r.row_num, m.molkeys from my_rownum r
join my_molkeys m on (r.version_id = m.version_id)
where r.version_id >= 320
and r.version_id < 330
and m.version_id >= 320
and m.version_id < 330
order by r.version_id;

I discovered this while looking at the plans for the first query. It seemed to be ignoring the fact that it could push the "between" condition along to the second table, since the condition and the join are on the same indexed columns. So I added a redundant condition, and bingo, it was a lot faster. In the analysis shown below, the timings (about 1.0 and 1.5 seconds respectively) are for a hot database that's been queried a couple of times. In real life on a cold database, the times are more like 10 seconds and 21 seconds, so it's quite significant.

Thanks,
Craig

db=> explain analyze
db-> select r.version_id, r.row_num, m.molkeys from my_rownum r
db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and r.version_id < 330
db-> order by r.version_id;

 Sort  (cost=264979.51..265091.06 rows=44620 width=366) (actual time=1424.126..1476.048 rows=46947 loops=1)
   Sort Key: r.version_id
   ->  Nested Loop  (cost=366.72..261533.64 rows=44620 width=366) (actual time=41.649..1186.331 rows=46947 loops=1)
         ->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.616..431.783 rows=46947 loops=1)
               Recheck Cond: ((version_id >= 320) AND (version_id < 330))
               ->  Bitmap Index Scan on i_chm_rownum_version_id_4998  (cost=0.00..366.72 rows=44620 width=0) (actual time=21.244..21.244 rows=46947 loops=1)
                     Index Cond: ((version_id >= 320) AND (version_id < 330))
         ->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  (cost=0.00..4.93 rows=1 width=362) (actual time=0.009..0.010 rows=1 loops=46947)
               Index Cond: ("outer".version_id = m.version_id)
 Total runtime: 1534.638 ms
(10 rows)

db=> explain analyze
db-> select r.version_id, r.row_num, m.molkeys from my_rownum r
db-> join my_molkeys m on (r.version_id = m.version_id)
db-> where r.version_id >= 320
db-> and r.version_id < 330
db-> and m.version_id >= 320
db-> and m.version_id < 330
db-> order by r.version_id;

 Sort  (cost=157732.20..157732.95 rows=298 width=366) (actual time=985.383..1037.423 rows=46947 loops=1)
   Sort Key: r.version_id
   ->  Hash Join  (cost=41279.92..157719.95 rows=298 width=366) (actual time=502.875..805.402 rows=46947 loops=1)
         Hash Cond: ("outer".version_id = "inner".version_id)
         ->  Index Scan using i_chm_molkeys_version_id on my_molkeys m  (cost=0.00..115717.85 rows=47947 width=362) (actual time=0.023..117.270 rows=46947 loops=1)
               Index Cond: ((version_id >= 320) AND (version_id < 330))
         ->  Hash  (cost=41168.37..41168.37 rows=44620 width=8) (actual time=502.813..502.813 rows=46947 loops=1)
               ->  Bitmap Heap Scan on my_rownum r  (cost=366.72..41168.37 rows=44620 width=8) (actual time=41.621..417.508 rows=46947 loops=1)
                     Recheck Cond: ((version_id >= 320) AND (version_id < 330))
                     ->  Bitmap Index Scan on i_chm_rownum_version_id_4998  (cost=0.00..366.72 rows=44620 width=0) (actual time=21.174..21.174 rows=46947 loops=1)
                           Index Cond: ((version_id >= 320) AND (version_id < 330))
 Total runtime: 1096.031 ms
(12 rows)

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?
       http://www.postgresql.org/docs/faq
Re: [PERFORM] Equivalent queries produce different plans
Sorry, I forgot to mention: This is 8.1.4, with a fairly ordinary configuration on a 4 GB system.

Craig

Craig James wrote:
[snip - the queries and EXPLAIN ANALYZE output, quoted verbatim from the previous message]
[PERFORM] pg_restore causes 100% CPU
Here's an oddity. I have 10 databases, each with about a dozen connections to Postgres (about 120 connections total), and at midnight they're all idle. These are mod_perl programs (like a FastCGI -- they stay connected so they're ready for instant service). Using ps -ef and grep, we find one of the databases looks like this:

postgres 22708  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(46915) idle
postgres 22709  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(46916) idle
postgres 22710  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(46917) idle
postgres 22711  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(46918) idle
postgres 22712  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(46919) idle
postgres 22724  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(42440) idle
postgres 22725  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(42441) idle
postgres 22726  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(42442) idle
postgres 22727  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(42443) idle
postgres 22728  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(42444) idle
postgres 22731  7619  0 Jul11 ?  00:00:06 postgres: chemmega chemmega 192.168.10.192(42447) idle

Now here's the weird thing. I'm running a pg_restore of a database (on the order of 4GB compressed, maybe 34M rows of ordinary data, and 15M rows in one BLOB table that's typically 2K per blob). When I do this, ALL of the postgres backends start working at about 1% CPU apiece. This means that the 120 idle postgres backends are together using almost 100% of one CPU on top of the 100% CPU being used by pg_restore. See the output of top(1) below. Is this normal? All I can guess is that something's going on in shared memory that every Postgres backend has to respond to.

Thanks,
Craig

Tasks: 305 total, 1 running, 304 sleeping, 0 stopped, 0 zombie
Cpu(s): 33.5% us, 1.5% sy, 0.0% ni, 57.8% id, 6.6% wa, 0.2% hi, 0.4% si
Mem:  4151456k total, 4011020k used,  140436k free,   10096k buffers
Swap: 2104504k total,   94136k used, 2010368k free, 3168596k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 6681 postgres  16   0  217m 188m 161m D 50.4  4.6   4:29.30 postmaster
 1577 root      10  -5     0    0    0 S  1.0  0.0 108:01.97 md0_raid1
 8487 postgres  15   0  187m 8704 4996 S  1.0  0.2   0:06.56 postmaster
 8506 postgres  15   0  187m 8604 4892 S  1.0  0.2   0:06.37 postmaster
 8507 postgres  15   0  187m 8708 5004 S  1.0  0.2   0:06.42 postmaster
 8512 postgres  15   0  187m 8612 4904 S  1.0  0.2   0:06.65 postmaster
 8751 postgres  15   0  187m  10m 7520 S  1.0  0.3   0:07.95 postmaster
 8752 postgres  15   0  187m  10m 7492 S  1.0  0.3   0:07.84 postmaster
14053 postgres  15   0  187m 8752 5044 S  1.0  0.2   0:06.53 postmaster
16515 postgres  15   0  187m 8156 4452 S  1.0  0.2   0:06.33 postmaster
25351 postgres  15   0  187m 9772 6064 S  1.0  0.2   0:06.75 postmaster
25387 postgres  15   0  187m 8444 4752 S  1.0  0.2   0:06.45 postmaster
Re: [PERFORM] select on 1milion register = 6s
Bruno Rodrigues Siqueira wrote:
Who can help me? My SELECT on a table with 1 million records, using an expression index, takes 6 seconds...

Run your query using

  EXPLAIN ANALYZE SELECT ... your query ...

and then post the results to this newsgroup. Nobody can help until they see the results of EXPLAIN ANALYZE. Also include all other relevant information, such as the Postgres version, operating system, amount of memory, and any changes you have made to the Postgres configuration file.

Craig

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
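For readers new to the workflow being recommended here, a minimal sketch of capturing a plan programmatically. This uses SQLite's EXPLAIN QUERY PLAN through Python as a stand-in, since it runs anywhere; on Postgres you would run EXPLAIN ANALYZE through psql instead, and the table and index names below are made up:

```python
import sqlite3

# Hypothetical table standing in for the poster's 1-million-row table.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE registers (id INTEGER PRIMARY KEY, val TEXT)")
cur.execute("CREATE INDEX registers_val ON registers(val)")

# Capture the plan so it can be posted along with the question.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM registers WHERE val = 'x'"
).fetchall()
for row in plan:
    print(row)  # shows whether the index is actually used
```

The plan, not the query alone, is what tells you whether the expression index is being used at all.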
Re: [PERFORM] Slow query with backwards index scan
Tilmann Singer wrote:
* [EMAIL PROTECTED] [20070728 21:05]:
Let's try putting the sort/limit in each piece of the UNION to speed them up separately.

SELECT * FROM (
  (SELECT * FROM large_table lt
   WHERE lt.user_id = 12345
   ORDER BY created_at DESC LIMIT 10)
  UNION
  (SELECT * FROM large_table lt
   WHERE user_id IN (SELECT contact_id FROM relationships WHERE user_id=12345)
   ORDER BY created_at DESC LIMIT 10)
) AS q
ORDER BY created_at DESC LIMIT 10;

It's not possible to use ORDER BY or LIMIT within unioned queries.
http://www.postgresql.org/docs/8.2/static/sql-select.html#SQL-UNION

If I'm reading this documentation correctly, it *is* possible, as long as they're inside of a sub-select, as in this case.

Craig
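The per-arm sort/limit pattern is easy to check mechanically. A small sketch (SQLite via Python as a stand-in for Postgres; the table and column names are made up) where each limited query is wrapped in its own sub-select, and the outer query merges the two small result sets:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE posts (user_id INTEGER, created_at INTEGER)")
cur.executemany("INSERT INTO posts VALUES (?, ?)",
                [(i % 3, i) for i in range(30)])

# Each arm is limited inside its own sub-select; the outer query
# unions the two 5-row results and applies the final sort/limit.
rows = cur.execute("""
    SELECT * FROM (SELECT * FROM posts WHERE user_id = 0
                   ORDER BY created_at DESC LIMIT 5)
    UNION
    SELECT * FROM (SELECT * FROM posts WHERE user_id = 1
                   ORDER BY created_at DESC LIMIT 5)
    ORDER BY created_at DESC LIMIT 5
""").fetchall()
print(rows)
```

The final ORDER BY/LIMIT applies to the whole compound query; the inner ones apply only to their own sub-select, which is exactly the distinction the documentation draws.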
[PERFORM] What to vacuum after deleting lots of tables
If I delete a whole bunch of tables (like 10,000 tables), should I vacuum system tables, and if so, which ones? (This system is still on 8.1.4 and isn't running autovacuum). Thanks, Craig ---(end of broadcast)--- TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Tablespaces and NFS
Carlos Moreno wrote: Anyone has tried a setup combining tablespaces with NFS-mounted partitions? There has been some discussion of this recently, you can find it in the archives (http://archives.postgresql.org/). The word seems to be that NFS can lead to data corruption. Craig ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] Low CPU Usage
Luiz K. Matsumura wrote:
It is connected at full 100Mb; it transfers many things quickly to clients. It is running Apache and JBoss, the transfer rate is good, and I did scp to copy many archives and it is as quick as the old server. I have no idea how to continue researching this problem. Now I'm going to do some network tests.

Any chance this is your desktop machine, and you're also using it for audio? Microsoft built in a feature (!) that reduces network speed by 90% when music is playing:

http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss

Craig
[PERFORM] Is ANALYZE transactional?
If I do:

begin;
update some_table set foo = newvalue where a_bunch_of_rows_are_changed;
analyze some_table;
rollback;

does it roll back the statistics? (I think the answer is yes, but I need to be sure.)

Thanks,
Craig
Re: [PERFORM] Curious about dead rows.
Alvaro Herrera wrote:
To recap:
- your app only does inserts
- there has been no rollback lately
- there are no updates
- there are no deletes

The only other source of dead rows I can think of is triggers ... do you have any? (Not necessarily on this table -- perhaps triggers on other tables can cause updates on this one.) Oh, a rolled-back COPY can cause dead rows too.

What about an unreliable network that causes lots of disconnects? Wouldn't the server process do a rollback?

Craig
Re: [PERFORM] Query only slow on first run
tmp wrote:
what exactly is that random_number column

A random float that is initialized when the row is created and never modified afterwards. The physical row ordering will clearly not match the random_number ordering. However, other queries use a row ordering by the primary key, so I don't think it would make much sense to make the index on random_number a clustering index just in order to speed up this single query.

and why are you desirous of ordering by it?

In order to simulate a random pick of K rows. See [1].

A trick that I used is to sample the random column once, and create a much smaller table of the first N rows, where N is the sample size you want, and use that. If you need a different N samples each time, you can create a temporary table, put your random N rows into that, do an ANALYZE, and then join to this smaller table. The overall performance can be MUCH faster, even though you're creating and populating a whole table, than the plan that Postgres comes up with. This seems wrong-headed (why shouldn't Postgres be able to be as efficient on its own?), but it works.

Craig
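A minimal sketch of the temp-table trick described above (SQLite via Python as a stand-in for Postgres; the table names are hypothetical). The point is that the expensive ORDER BY on the random column happens once, and later queries join against the small, freshly-analyzed sample:

```python
import sqlite3
import random

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE big (id INTEGER PRIMARY KEY, random_number REAL)")
cur.executemany("INSERT INTO big VALUES (?, ?)",
                [(i, random.random()) for i in range(10_000)])

# Pay for the random-order sort once, keeping only the N rows we need...
cur.execute("""CREATE TEMPORARY TABLE sample AS
               SELECT * FROM big ORDER BY random_number LIMIT 100""")
# ...then refresh statistics so joins against the sample plan sensibly.
cur.execute("ANALYZE sample")

n = cur.execute("SELECT count(*) FROM sample").fetchone()[0]
print(n)  # 100
```

From here, any query that needed "100 random rows" joins against `sample` instead of re-sorting the big table each time.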
[PERFORM] libgcc double-free, backend won't die
This is driving me crazy. I have some Postgres C function extensions in a shared library. They've been working fine. I upgraded to Fedora Core 6 and gcc4, and now every time psql(1) disconnects from the server, the serverlog gets this message:

*** glibc detected *** postgres: mydb mydb [local] idle: double free or corruption! (!prev): 0x08bfcde8

and the backend process won't die. Every single connection that executes one of my functions leaves an idle process, like this:

$ ps -ef | grep postgres
postgres 12938 12920 0 23:24 ? 00:00:00 postgres: mydb mydb [local] idle

This error only happens on disconnect; as long as I keep the connection open, everything works. Worse, these zombie Postgres processes won't die, which means I can't shut down and restart Postgres unless I kill -9 all of them, and I can't use this at all because I get zillions of these dead processes.

I've used valgrind on a test application that runs all of my functions outside of the Postgres environment, and not a single problem shows up even after hours of processing. I tried setting MALLOC_CHECK_ to various values, so that I could trap the abort() call using gdb, but once MALLOC_CHECK_ is set, the double-free error never occurs. (But malloc slows down too much this way.) I even read through the documentation for C functions again, and carefully examined my code. Nothing is amiss; some of the functions are quite simple yet still exhibit this problem.

Anyone seen this before? It's driving me nuts.

Postgres 8.1.4
Linux kernel 2.6.22
gcc 4.1.1

Thanks,
Craig

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
Re: [PERFORM] libgcc double-free, backend won't die
Alvaro Herrera wrote:
Craig James wrote:
[snip - original report of the "double free or corruption!" message on disconnect]

Do you have any Perl or Python functions or stuff like that?

There is one Perl function, but it is never invoked during this test. I connect to Postgres, issue one select myfunc(), and disconnect.

Postgres 8.1.4

Please upgrade to 8.1.10 and try again. If it still fails we will be much more interested in tracking it down.

Good idea, but alas, no difference. I get the same "double free or corruption!" message. I compiled 8.1.10 from source and installed, then rebuilt all of my code from scratch and reinstalled the shared object. Same message as before.

Here is my guess -- and this is just a guess. My functions use a third-party library which, of necessity, uses malloc/free in the ordinary way. I suspect that there's a bug in the Postgres palloc() code that's walking over memory that regular malloc() allocates. The third-party library (OpenBabel) has been tested pretty thoroughly by me and others and has no memory corruption problems. All mallocs are freed properly. Does that seem like a possibility? I can't figure out how to use ordinary tools like valgrind with a Postgres backend process to track this down.

Thanks,
Craig
Re: [PERFORM] libgcc double-free, backend won't die
Tom Lane wrote:
Craig James [EMAIL PROTECTED] writes:
[snip - original report of the "double free or corruption!" message on disconnect]

Have you tried attaching to one of these processes with gdb to see where it ends up? Have you checked to see if the processes are becoming multi-threaded?

regards, tom lane

# ps -ef | grep postgres
postgres 31362     1  0 06:53 ?      00:00:00 /usr/local/pgsql/bin/postmaster -D /postgres/main
postgres 31364 31362  0 06:53 ?      00:00:00 postgres: writer process
postgres 31365 31362  0 06:53 ?      00:00:00 postgres: stats buffer process
postgres 31366 31365  0 06:53 ?      00:00:00 postgres: stats collector process
postgres 31442 31362  0 06:54 ?      00:00:00 postgres: craig_test craig_test [local] idle
root     31518 31500  0 07:06 pts/6  00:00:00 grep postgres

# gdb -p 31442
GNU gdb Red Hat Linux (6.5-15.fc6rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions.

[snip - a bunch of symbol table stuff]

0x00110402 in __kernel_vsyscall ()
(gdb) bt
#0  0x00110402 in __kernel_vsyscall ()
#1  0x0082fb8e in __lll_mutex_lock_wait () from /lib/libc.so.6
#2  0x007bfce8 in _L_lock_14096 () from /lib/libc.so.6
#3  0x007befa4 in free () from /lib/libc.so.6
#4  0x00744f93 in _dl_map_object_deps () from /lib/ld-linux.so.2
#5  0x0074989d in dl_open_worker () from /lib/ld-linux.so.2
#6  0x00745c36 in _dl_catch_error () from /lib/ld-linux.so.2
#7  0x00749222 in _dl_open () from /lib/ld-linux.so.2
#8  0x00858712 in do_dlopen () from /lib/libc.so.6
#9  0x00745c36 in _dl_catch_error () from /lib/ld-linux.so.2
#10 0x008588c5 in __libc_dlopen_mode () from /lib/libc.so.6
#11 0x00836139 in init () from /lib/libc.so.6
#12 0x008362d3 in backtrace () from /lib/libc.so.6
#13 0x007b3e11 in __libc_message () from /lib/libc.so.6
#14 0x007bba96 in _int_free () from /lib/libc.so.6
#15 0x007befb0 in free () from /lib/libc.so.6
#16 0x001f943a in DeleteByteCode (node=0x890ff4) at chains.cpp:477
#17 0x00780859 in exit () from /lib/libc.so.6
#18 0x081a6064 in proc_exit ()
#19 0x081b5b9d in PostgresMain ()
#20 0x0818e34b in ServerLoop ()
#21 0x0818f1de in PostmasterMain ()
#22 0x08152369 in main ()
(gdb)
Re: [PERFORM] libgcc double-free, backend won't die
Alvaro Herrera wrote:
Craig James wrote:
Alvaro Herrera wrote:
Craig James wrote:
[snip - my guess about palloc, quoted from the previous message]

Not really. palloc uses malloc underneath.

But some Postgres code could be walking off the end of a malloc'ed block, even if palloc() is allocating and deallocating correctly. Which is why I was hoping to use valgrind to see what's going on.

I very much doubt it. Since you've now shown that OpenBabel is multithreaded, then that's a much more likely cause.

Can you elaborate? Are multithreaded libraries not allowed to be linked to Postgres?

Thanks,
Craig
Re: [PERFORM] libgcc double-free, backend won't die
Alvaro Herrera wrote:
...Since you've now shown that OpenBabel is multithreaded, then that's a much more likely cause.

Can you elaborate? Are multithreaded libraries not allowed to be linked to Postgres?

Absolutely not.

Ok, thanks, I'll work on recompiling OpenBabel without thread support. Since I'm not a Postgres developer, perhaps one of the maintainers could update the Postgres manual. In chapter 32.9.6, it says, "To be precise, a shared library needs to be created." This should be amended to say, "To be precise, a non-threaded, shared library needs to be created."

Cheers,
Craig
Re: [PERFORM] libgcc double-free, backend won't die
Tom Lane wrote: Craig James [EMAIL PROTECTED] writes: GNU gdb Red Hat Linux (6.5-15.fc6rh) Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. [snip - a bunch of symbol table stuff] Please show that stuff you snipped --- it might have some relevant information. The stack trace looks a bit like a threading problem... # gdb -p 31442 GNU gdb Red Hat Linux (6.5-15.fc6rh) Copyright (C) 2006 Free Software Foundation, Inc. GDB is free software, covered by the GNU General Public License, and you are welcome to change it and/or distribute copies of it under certain conditions. Type show copying to see the conditions. There is absolutely no warranty for GDB. Type show warranty for details. This GDB was configured as i386-redhat-linux-gnu. Attaching to process 31442 Reading symbols from /usr/local/pgsql/bin/postgres...(no debugging symbols found)...done. Using host libthread_db library /lib/libthread_db.so.1. Reading symbols from /usr/lib/libz.so.1...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libz.so.1 Reading symbols from /usr/lib/libreadline.so.5...(no debugging symbols found)...done. Loaded symbols for /usr/lib/libreadline.so.5 Reading symbols from /lib/libtermcap.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libtermcap.so.2 Reading symbols from /lib/libcrypt.so.1... (no debugging symbols found)...done. Loaded symbols for /lib/libcrypt.so.1 Reading symbols from /lib/libresolv.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libresolv.so.2 Reading symbols from /lib/libnsl.so.1...(no debugging symbols found)...done. Loaded symbols for /lib/libnsl.so.1 Reading symbols from /lib/libdl.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libdl.so.2 Reading symbols from /lib/libm.so.6... (no debugging symbols found)...done. 
Loaded symbols for /lib/libm.so.6 Reading symbols from /lib/libc.so.6...(no debugging symbols found)...done. Loaded symbols for /lib/libc.so.6 Reading symbols from /lib/ld-linux.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/ld-linux.so.2 Reading symbols from /lib/libnss_files.so.2...(no debugging symbols found)...done. Loaded symbols for /lib/libnss_files.so.2 Reading symbols from /usr/local/pgsql/lib/libchmoogle.so...done. Loaded symbols for /usr/local/pgsql/lib/libchmoogle.so Reading symbols from /lib/libgcc_s.so.1...done. Loaded symbols for /lib/libgcc_s.so.1 Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/jaguarformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/jaguarformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/libopenbabel.so.2...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/libopenbabel.so.2 Reading symbols from /usr/lib/libstdc++.so.6...done. Loaded symbols for /usr/lib/libstdc++.so.6 Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fastaformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fastaformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/cansmilesformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/cansmilesformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/APIInterface.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/APIInterface.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmodformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmodformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/molreportformat.so...done. 
Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/molreportformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fhformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/fhformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/chemkinformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/chemkinformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmcifformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/mmcifformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/thermoformat.so...done. Loaded symbols for /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/thermoformat.so Reading symbols from /usr/local/openbabel-inst/openbabel-dev-2-1-x/lib/openbabel/carformat.so...done. Loaded symbols
Re: [PERFORM] libgcc double-free, backend won't die
Tom Lane wrote:
Magnus Hagander [EMAIL PROTECTED] writes:
On Tue, Dec 11, 2007 at 07:50:17AM -0800, Craig James wrote:
Since I'm not a Postgres developer, perhaps one of the maintainers could update the Postgres manual. In chapter 32.9.6, it says, "To be precise, a shared library needs to be created." This should be amended to say, "To be precise, a non-threaded, shared library needs to be created."

Just before someone goes ahead and writes it (which is probably a good idea in general), don't write it just like that - because it's platform dependent.

I can find no such text in our documentation at all, nor any reference to OpenBabel. I think Craig must be looking at someone else's documentation.

http://www.postgresql.org/docs/8.1/static/xfunc-c.html#DFUNC

Craig
Re: [PERFORM] libgcc double-free, backend won't die
Gregory Stark wrote:
Tom Lane [EMAIL PROTECTED] writes:
James Mansion [EMAIL PROTECTED] writes:
Is there any particular reason not to ensure that any low-level threading support in libc is enabled right from the get-go, as a build-time option?

Yes.
1) It's of no value to us

Who is "us"? Some of us would like to use the system for advanced scientific work, and scientific libraries are usually written in C++.

2) On many platforms there is a nonzero performance penalty

I'm surprised you say this, given that you're usually the voice of reason when it comes to rejecting hypothetical statements in favor of tested facts. If building Postgres using thread-safe technology is really a performance burden, that could be easily verified. A nonzero performance penalty: what does that mean, a 0.0001% slowdown? I find it hard to believe that the performance penalty of a thread-safe version would even be measurable. If nobody has the time to do such a test, or other priorities take precedence, that's understandable. But the results aren't in yet.

And the only reason to do that would be to work around one bug in one small range of glibc versions. If you're going to use a multi-threaded library (which isn't very common since it's hard to do safely for all those other reasons) surely using a version of your OS without any thread related bugs is a better idea.

You're jumping ahead. This problem has not been accurately diagnosed yet. It could be that the pthreads issue is completely misleading everyone, and in fact there is a genuine memory corruption going on here. Or not. We don't know yet. I have made zero progress fixing this problem. The "one small range of glibc versions" is a giveaway: I've seen this problem in FC3, 5, and 6 (I went through this series of upgrades all in one week trying to fix this problem). With each version, I recompiled Postgres and OpenBabel from scratch.

I'm going to try FC7 next since it's now the only officially supported version, but I don't believe glibc is the problem. Andrew Dalke, a regular contributor to the OpenBabel forum, suggests another problem: it could be a result of linking the wrong libraries together. The gcc/ld system has a byzantine set of rules and hacks that (if I understand Andrew's posting) select different versions of the same library depending on what it thinks you might need. It's possible that the wrong version of some system library is getting linked in.

Craig
[PERFORM] Multi-threading friendliness (was: libgcc double-free, backend won't die)
Bruce Momjian wrote: James Mansion wrote: I think you have your head in the ground, but it's your prerogative. *You* might not care, but anyone wanting to use thread-aware libraries (and I'm *not* talking about threading in any Postgres code) will certainly value it if they can do so with some stability. I suggest you find out the cause of your problem and then we can do more research. Talking about us changing the Postgres behavior from the report of one user who doesn't even have the full details isn't productive. I think you're confusing James Mansion with me (Craig James). I'm the one with the unresolved problem. James is suggesting, completely independently of whether or not there's a bug in my system, that a thread-friendly option for Postgres would be very useful. Don't confuse thread-friendly with a threaded implementation of Postgres itself. These are two separate questions. Thread-friendly involves compile/link options that don't affect the Postgres source code at all. Craig ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
Re: [PERFORM] Making the most of memory?
Guy Rouillier wrote: Scott Marlowe wrote: I assume you're talking about solid state drives? They have their uses, but for most use cases, having plenty of RAM in your server will be a better way to spend your money. For certain high throughput, relatively small databases (i.e. transactional work) the SSD can be quite useful. Unless somebody has changed some physics recently, I'm not understanding the recent discussions of SSD in the general press. Flash has a limited number of writes before it becomes unreliable. On good-quality consumer-grade flash, that's about 300,000 writes, while on industrial grade it's about 10 times that. That's fine for mp3 players and cameras; even professional photographers probably won't rewrite the same spot on a flash card that many times in a lifetime. But for database applications, 300,000 writes is trivial. 3 million will go a lot longer, but in non-archival applications, I imagine even that mark won't take but a year or two to surpass. One trick they use is to remap the physical Flash RAM to different logical addresses. Typical apps update a small percentage of the data frequently, and the rest of the data rarely or never. By shuffling the physical Flash RAM around, the media lasts a lot longer than a simple analysis might indicate. Craig ---(end of broadcast)--- TIP 7: You can help support the PostgreSQL project by donating at http://www.postgresql.org/about/donate
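The remapping trick described in that last paragraph can be sketched as a toy simulation. This is purely illustrative (no vendor's real wear-leveling algorithm is this simple, and the `WearLevelingFlash` class is hypothetical): repeated writes to one hot logical block get spread across every physical block, so no single block absorbs all the erase cycles.

```python
# Toy model of flash wear leveling: each write to a logical block is remapped
# to whichever physical block has seen the fewest erase cycles, so a "hot"
# logical block's writes are spread across the whole device.
class WearLevelingFlash:
    def __init__(self, num_physical_blocks):
        self.erase_counts = [0] * num_physical_blocks
        self.mapping = {}  # logical block -> current physical block

    def write(self, logical_block):
        # Choose the least-worn physical block for this write.
        target = min(range(len(self.erase_counts)),
                     key=lambda b: self.erase_counts[b])
        self.mapping[logical_block] = target
        self.erase_counts[target] += 1

flash = WearLevelingFlash(100)
for _ in range(10_000):
    flash.write(0)          # hammer a single logical block

# Without remapping, one physical block would absorb all 10,000 erases;
# with it, wear is even: 100 erases per block here.
print(max(flash.erase_counts))   # 100
```

With a 300,000-erase endurance per block, this kind of shuffling multiplies the device's effective lifetime by roughly the ratio of total capacity to hot data.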
Re: [PERFORM] 8x2.5 or 6x3.5 disks
Mike Smith wrote: I’ve seen a few performance posts on using different hardware technologies to gain improvements. Most of those comments are on raid, interface and rotation speed. One area that doesn’t seem to have been mentioned is to run your disks empty. ... On the outside of the disk you get a lot more data per seek than on the inside. Double whammy: you get it faster. Performance can vary more than 100% between the outer and inner tracks of the disk. So running a slower disk twice as big may give you more benefit than running a small capacity 15K disk full. The slower disks are also generally more reliable and mostly much cheaper. ... This is not very green, as you need to buy more disks for the same amount of data, and it's liable to upset your purchasing department, who won’t understand why you don’t want to fill your disks up. So presumably the empty-disk effect could also be achieved by partitioning: say, 25% of the drive for the database, and a 75% empty partition. But in fact, you could use that low-performance 75% for rarely-used or static data, such as the output from pg_dump, that is written during non-peak times. Pretty cool. Craig ---(end of broadcast)--- TIP 5: don't forget to increase your free space map settings
[PERFORM] Dell Perc/6
Does anyone have performance info about the new Dell Perc/6 controllers? I found a long discussion (Dell vs HP) about the Perc/5, but nothing about Perc/6. What's under the covers? Here is the (abbreviated) info from Dell on this machine:

  PowerEdge 1950 III: Quad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 1333MHz FSB
  Additional Processors: Quad Core Intel® Xeon® E5405, 2x6MB Cache, 2.0GHz, 1333MHz FSB
  Memory: 8GB 667MHz (4x2GB), Dual Ranked DIMMs
  Hard Drive Configuration: Integrated SAS/SATA RAID 5, PERC 6/i Integrated

Thanks, Craig ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] How to allocate 8 disks
We're upgrading to a medium-sized server, a Dell PowerEdge 2950, dual-quad CPU's and 8 GB memory. This box can hold at most 8 disks (10K SCSI 2.5" 146 GB drives) and has Dell's Perc 6/i RAID controller. I'm thinking of this:

  6 disks  RAID 1+0  Postgres data
  1 disk   WAL
  1 disk   Linux

I've often seen RAID 1 recommended for the WAL. Is that strictly for reliability, or is there a performance advantage to RAID 1 for the WAL? It seems to me separating the OS and WAL on two disks is better than making a single RAID 1 and sharing it, from a performance point of view. Thanks, Craig ---(end of broadcast)--- TIP 2: Don't 'kill -9' the postmaster
Re: [PERFORM] How to allocate 8 disks
Joshua D. Drake wrote: On Sat, 01 Mar 2008 10:06:54 -0800 Craig James [EMAIL PROTECTED] wrote: We're upgrading to a medium-sized server, a Dell PowerEdge 2950, dual-quad CPU's and 8 GB memory. This box can hold at most 8 disks (10K SCSI 2.5" 146 GB drives) and has Dell's Perc 6/i RAID controller. I'm thinking of this: 6 disks RAID 1+0 Postgres data, 1 disk WAL, 1 disk Linux. I've often seen RAID 1 recommended for the WAL. Is that strictly for reliability, or is there a performance advantage to RAID 1 for the WAL? It seems to me separating the OS and WAL on two disks is better than making a single RAID 1 and sharing it, from a performance point of view. This scares me... You lose WAL you are a goner. Combine your OS and WAL into a RAID 1. Right, I do understand that, but reliability is not a top priority in this system. The database will be replicated, and can be reproduced from the raw data. It's not an accounting system, it finds scientific results. That's not to say I *won't* take your advice, we may in fact combine the OS and WAL on one disk. Reliability is a good thing, but I need to know all of the tradeoffs, so that I can weigh performance, reliability, and cost and make the right choice. So my question still stands: From a strictly performance point of view, would it be better to separate the OS and the WAL onto two disks? Is there any performance advantage to RAID 1? My understanding is that RAID 1 can give 2x seek performance during read, but no advantage during write. For the WAL, it seems to me that RAID 1 has no performance benefits, so separating the WAL and OS seems like a performance advantage. Another option would be:

  4 disks  RAID 1+0  Postgres data
  2 disks  RAID 1    WAL
  1 disk   Linux
  1 disk   spare

This would give us reliability, but I think the performance would be considerably worse, since the primary Postgres data would come from 4 disks instead of six.
I guess we could also consider:

  4 disks  RAID 1+0  Postgres data
  4 disks  RAID 1+0  WAL and Linux

Or even:

  8 disks  RAID 1+0  Everything

This is a dedicated system and does nothing but Apache/Postgres, so the OS should get very little traffic. But if that's the case, I guess you could argue that your suggestion of combining OS and WAL on a 2-disk RAID 1 would be the way to go, since the OS activity wouldn't affect the WAL very much. I suppose the thing to do is get the system, and run bonnie on various configurations. I've never run bonnie before -- can I get some useful results without a huge learning curve? Thanks, Craig ---(end of broadcast)--- TIP 1: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] How to allocate 8 disks
Matthew wrote: On Sat, 1 Mar 2008, Craig James wrote: Right, I do understand that, but reliability is not a top priority in this system. The database will be replicated, and can be reproduced from the raw data. So what you're saying is: 1. Reliability is not important. 2. There's zero write traffic once the database is set up. Well, I actually didn't say either of those things, but I appreciate the feedback. RAID 0 is an interesting suggestion, but given our constraints, it's not an option. Reliability is important, but not as important as, say, a banking system. And as far as zero write traffic, I don't know where that came from. It's a hitlist-based system, where complex search results are saved for the user in tables, and the write traffic can be quite high. If this is true, then RAID-0 is the way to go. I think Greg's options are good. Either:

  2 discs RAID 1: OS
  6 discs RAID 0: database + WAL

which is what we're using here (except with more discs), or:

  8 discs RAID 10: everything

Right now, an 8-disk RAID 10 is looking like the best choice. The Dell Perc 6i has configurations that include a battery-backed cache, so performance should be quite good. However, if reliability *really* isn't an issue, and you can accept reinstalling the system if you lose a disc, then there's a third option:

  8 discs RAID 0: Everything

I imagine the MTBF on a system like this would be less than a year, which is out of the question, even with a backup system that can take over. A failure completely wipes the system, OS and everything, so you're guaranteed that once or twice a year, you have to rebuild your system from the ground up. I'd rather spend that time at the beach! Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your Subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
In the 3 years I've been using Postgres, the problem of count() performance has come up more times than I can recall, and each time the answer is, "It's a sequential scan -- redesign your application." My question is: What do the other databases do that Postgres can't do, and why not? Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? On the one hand, I understand that Postgres has its architecture, and I understand the issue of row visibility, and so forth. On the other hand, my database is just sitting there, nothing going on, no connections except me, and... it takes FIFTY FIVE SECONDS to count 20 million rows, a query that either Oracle or MySQL would answer in a fraction of a second. It's hard for me to believe there isn't a better way. This is a real problem. Countless people (including me) have spent significant effort rewriting applications because of this performance flaw in Postgres. Over and over, the response is, "You don't really need to do that ... change your application." Well, sure, it's always possible to change the application, but that misses the point. To most of us users, count() seems like it should be a trivial operation. On other relational database systems, it is a trivial operation. This is really a significant flaw on an otherwise excellent relational database system. My rant for today... Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://mail.postgresql.org/mj/mj_wwwusr?domain=postgresql.org&extra=pgsql-performance
Re: [PERFORM] count * performance issue
Tom Lane wrote: Craig James [EMAIL PROTECTED] writes: Count() on Oracle and MySQL is almost instantaneous, even for very large tables. So why can't Postgres do what they do? AFAIK the above claim is false for Oracle. They have the same transactional issues we do. My experience doesn't match this claim. When I ported my application from Oracle to Postgres, this was the single biggest performance problem. count() in Oracle was always very fast. We're not talking about a 20% or 50% difference, we're talking about a small fraction of a second (Oracle) versus a minute (Postgres) -- something like two or three orders of magnitude. It may be that Oracle has a way to detect when there's no transaction and use a faster method. If so, this was a clever optimization -- in my experience, that represents the vast majority of the times you want to use count(). It's not very useful to count the rows of a table that many apps are actively modifying since the result may change the moment your transaction completes. Most of the time when you use count(), it's because you're the only one modifying the table, so the count will be meaningful. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
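A workaround that comes up regularly in these count() threads is to maintain the count yourself in a one-row side table updated by triggers, trading a little write overhead for an O(1) lookup. The sketch below uses SQLite so it is self-contained; the same pattern works in Postgres with plpgsql trigger functions, and the table and trigger names are illustrative.

```python
import sqlite3

# Keep a running row count in a side table, maintained by triggers,
# so "how many rows?" becomes a single-row lookup instead of a scan.
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE big_table (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE row_count (n INTEGER NOT NULL);
    INSERT INTO row_count VALUES (0);

    CREATE TRIGGER big_table_ins AFTER INSERT ON big_table
    BEGIN UPDATE row_count SET n = n + 1; END;

    CREATE TRIGGER big_table_del AFTER DELETE ON big_table
    BEGIN UPDATE row_count SET n = n - 1; END;
""")

db.executemany("INSERT INTO big_table (payload) VALUES (?)", [("x",)] * 1000)
db.execute("DELETE FROM big_table WHERE id <= 100")

# O(1) count lookup instead of a sequential scan:
n, = db.execute("SELECT n FROM row_count").fetchone()
print(n)   # 900
```

The cost is contention on the counter row under concurrent writers, which is why this is a workaround rather than something the database does for free.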
[PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
I just received a new server and thought benchmarks would be interesting. I think this looks pretty good, but maybe there are some suggestions about the configuration file. This is a web app, a mix of read/write, where writes tend to be "insert into ... (select ...)" where the resulting insert is on the order of 100 to 10K rows of two integers. An external process also uses a LOT of CPU power along with each query. Thanks, Craig

Configuration:
  Dell 2950
  8 CPU (Intel 2GHz Xeon)
  8 GB memory
  Dell Perc 6i with battery-backed cache
  RAID 10 of 8x 146GB SAS 10K 2.5" disks
Everything (OS, WAL and databases) is on the one RAID array.

Diffs from original configuration:
  max_connections = 1000
  shared_buffers = 400MB
  work_mem = 256MB
  max_fsm_pages = 100
  max_fsm_relations = 5000
  wal_buffers = 256kB
  effective_cache_size = 4GB

Bonnie output (slightly reformatted) --
Delete files in random order...done.
Version 1.03   ------Sequential Output------ --Sequential Input- --Random-
               -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Size           K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
16G            64205  99 234252 38 112924 26 65275  98 293852 24 940.3   1
               ------Sequential Create------ --------Random Create--------
               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files           /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
16             12203  95 +++++ +++ 19469  94 12297  95 +++++ +++ 15578  82
www.xxx.com,16G,64205,99,234252,38,112924,26,65275,98,293852,24,940.3,1,16,12203,95,+,+++,19469,94,12297,95,+,+++,15578,82

--
$ pgbench -c 10 -t 1 -v test -U test
starting vacuum...end.
starting vacuum accounts...end.
transaction type: TPC-B (sort of)
scaling factor: 1
number of clients: 10
number of transactions per client: 1
number of transactions actually processed: 10/10
tps = 2786.377933 (including connections establishing)
tps = 2787.888209 (excluding connections establishing)

-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
Joshua D. Drake wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On Wed, 12 Mar 2008 21:55:18 -0700 Craig James [EMAIL PROTECTED] wrote: Diffs from original configuration: max_connections = 1000 shared_buffers = 400MB work_mem = 256MB max_fsm_pages = 100 max_fsm_relations = 5000 wal_buffers = 256kB effective_cache_size = 4GB I didn't see which OS but I assume linux. I didn't see postgresql so I assume 8.3. Right on both counts. wal_sync_method = open_sync checkpoint_segments = 30 shared_buffers = 2000MB asyncrhonous_commit = off (sp?) Try again. Nice improvement! About 25% increase in TPS: $ pgbench -c 10 -t 1 -v test -U test starting vacuum...end. starting vacuum accounts...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 10 number of transactions per client: 1 number of transactions actually processed: 10/10 tps = 3423.636423 (including connections establishing) tps = 3425.957521 (excluding connections establishing) For reference, here are the results before your suggested changes: $ pgbench -c 10 -t 1 -v test -U test starting vacuum...end. starting vacuum accounts...end. transaction type: TPC-B (sort of) scaling factor: 1 number of clients: 10 number of transactions per client: 1 number of transactions actually processed: 10/10 tps = 2786.377933 (including connections establishing) tps = 2787.888209 (excluding connections establishing) Thanks! Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What is the best way to storage music files in Postgresql
Rich wrote: I am going to embark on building a music library using apache, postgresql and php. What is the best way to store the music files? Which file type should I use? In Postgres, it's all just binary data. It's entirely up to you which particular format you use. mp2, mp3, mp4, wmv, avi, whatever, it's all the same to Postgres. A better question is: Should you store the binary data in Postgres itself, or keep it in files and only store the filenames? The Postgres archives have several discussions of this topic, and the answer is, it depends. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
Dave Cramer wrote: On 16-Mar-08, at 2:19 AM, Justin wrote: I decided to reformat the raid 10 into ext2 to see if there was any real big difference in performance as some people have noted here is the test results please note the WAL files are still on the raid 0 set which is still in ext3 file system format. these test where run with the fsync as before. I made sure every thing was the same as with the first test. This is opposite to the way I run things. I use ext2 on the WAL and ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly write. Just out of curiosity: Last time I did research, the word seemed to be that xfs was better than ext2 or ext3. Is that not true? Why use ext2/3 at all if xfs is faster for Postgres? Criag -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
Craig James wrote: Dave Cramer wrote: On 16-Mar-08, at 2:19 AM, Justin wrote: I decided to reformat the raid 10 into ext2 to see if there was any real big difference in performance as some people have noted here is the test results please note the WAL files are still on the raid 0 set which is still in ext3 file system format. these test where run with the fsync as before. I made sure every thing was the same as with the first test. This is opposite to the way I run things. I use ext2 on the WAL and ext3 on the data. I'd also suggest RAID 10 on the WAL it is mostly write. Just out of curiosity: Last time I did research, the word seemed to be that xfs was better than ext2 or ext3. Is that not true? Why use ext2/3 at all if xfs is faster for Postgres? Criag And let's see if I can write my own name ... Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Benchmark: Dell/Perc 6, 8 disk RAID 10
Justin wrote: 2000 tps ??? do you have fsync turned off ? Dave No, it's turned on. Unless I'm seriously confused, something is wrong with these numbers. That's the sort of performance you expect from a good-sized six-disk RAID 10 array. With a single 7200 rpm SATA disk and XFS, I get 640 tps. There's no way you could get 2000 tps from a single disk. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
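A quick sanity check supports the skepticism above: with fsync honored and no write cache, each serial commit must wait for the WAL platter to rotate back under the head, so a single drive's commit rate is bounded by its rotation rate. (Sustained figures above that ceiling usually mean a write cache somewhere is absorbing the flushes.) A back-of-the-envelope sketch:

```python
# Ceiling on serial fsync'd commits for a single spinning disk with no
# write cache: at best one commit per platter rotation.
rpm = 7200
rotations_per_sec = rpm / 60.0
print(rotations_per_sec)   # 120.0 -> roughly 120 serial commits/sec ceiling
```

By this estimate, even 640 tps on a lone 7200 rpm SATA disk implies some write caching in the path, and 2000 tps is far beyond what honest per-commit flushes could deliver.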
Re: [PERFORM] Anybody using the Dell Powervault MD3000 array?
Gavin M. Roy wrote: On Wed, Apr 16, 2008 at 4:39 PM, Joshua D. Drake [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: On Wed, 16 Apr 2008 13:37:32 -0700 Jeffrey Baker [EMAIL PROTECTED] mailto:[EMAIL PROTECTED] wrote: I can second this. The MSA 70 is a great unit for the money. Thank you both. The MSA 70 looks like an ordinary disk shelf. What controllers do you use? Or, do you just go with a software RAID? P800, from HP. In a Dell box I use a Perc 6/E with a SAS to Mini SAS cable. There was a fairly long recent thread discussing the Dell Perc 6 controller starting here: http://archives.postgresql.org/pgsql-performance/2008-03/msg00264.php and one relevant follow-up regarding the MD1000 box: http://archives.postgresql.org/pgsql-performance/2008-03/msg00280.php (Unfortunately, the Postgres web archive does a terrible job formatting plain-old-text messages, it doesn't seem to know that it should wrap paragraphs, so some of these are pretty hard to read as web pages.) Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] two memory-consuming postgres processes
On Fri, May 2, 2008 at 2:26 PM, Alexy Khrabrov [EMAIL PROTECTED] wrote: I naively thought that if I have a 100,000,000 row table, of the form (integer,integer,smallint,date), and add a real column to it, it will scroll through the memory reasonably fast. In Postgres, an update is the same as a delete/insert. That means that changing the data in one column rewrites ALL of the columns for that row, and you end up with a table that's 50% dead space, which you then have to vacuum. Sometimes if you have a volatile column that goes with several static columns, you're far better off to create a second table for the volatile data, duplicating the primary key in both tables. In your case, it would mean the difference between 10^8 inserts of (int, float), very fast, compared to what you're doing now, which is 10^8 inserts and 10^8 deletes of (int, int, smallint, date, float), followed by a big vacuum/analyze (also slow). The down side of this design is that later on, it requires a join to fetch all the data for each key. You do have a primary key on your data, right? Or some sort of index? Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
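The split described above can be sketched end-to-end. SQLite is used here so the example is self-contained, and the table and view names are illustrative; the point is that an UPDATE now rewrites only a narrow (key, value) row, while a view gives the application its old one-table picture of the schema.

```python
import sqlite3

# Move the frequently-updated column into its own narrow table keyed by the
# same primary key; a view joins it back so queries still see one "table".
db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE static_data (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER, d TEXT);
    CREATE TABLE volatile_data (id INTEGER PRIMARY KEY, score REAL);

    CREATE VIEW combined AS
        SELECT s.id, s.a, s.b, s.d, v.score
        FROM static_data s JOIN volatile_data v USING (id);
""")
db.executemany("INSERT INTO static_data VALUES (?, ?, ?, ?)",
               [(i, i * 2, i * 3, "row") for i in range(5)])
db.executemany("INSERT INTO volatile_data VALUES (?, 0.0)",
               [(i,) for i in range(5)])

# Updating the score now touches only the narrow volatile table.
db.execute("UPDATE volatile_data SET score = 9.5 WHERE id = 3")
row = db.execute("SELECT score FROM combined WHERE id = 3").fetchone()
print(row[0])   # 9.5
```

The trade-off is exactly the one named in the message: every full-row fetch now pays for a join.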
Re: [PERFORM] Backup causing poor performance - suggestions
Campbell, Lance wrote: We currently backup all of our database tables per schema using pg_dump every half hour. We have been noticing that the database performance has been very poor during the backup process. How can I improve the performance? It sounds like the goal is to have frequent, near-real-time backups of your databases for recovery purposes. Maybe instead of looking at pg_dump's performance, a better solution would be a replication system such as Slony, or a warm backup using Skype Tools. Backing up the database every half hour puts a large load on the system during the dump, and means you are re-dumping the same data, 48 times per day. If you use a replication solution, the backup process is continuous (spread out through the day), and you're not re-dumping static data; the only data that moves around is the new data. I've used Slony with mixed success; depending on the complexity and size of your database, it can be quite effective. I've heard very good reports about Skype Tools, which has both a Slony-like replicator (not as configurable as Slony, but easier to set up and use), plus an entirely separate set of scripts that simplifies warm standby using WAL logging. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] What constitutes a complex query
Justin wrote: This falls under the stupid question and i'm just curious what other people think what makes a query complex? There are two kinds: 1. Hard for Postgres to get the answer. 2. Hard for a person to comprehend. Which do you mean? Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID 10 Benchmark with different I/O schedulers
Greg Smith wrote: On Mon, 5 May 2008, Craig James wrote: pgbench -i -s 20 -U test That's way too low to expect you'll see a difference in I/O schedulers. A scale of 20 is giving you a 320MB database, you can fit the whole thing in RAM and almost all of it on your controller cache. What's there to schedule? You're just moving between buffers that are generally large enough to hold most of what they need. Test repeated with:
  autovacuum enabled
  database destroyed and recreated between runs
  pgbench -i -s 600 ...
  pgbench -c 10 -t 5 -n ...

  I/O Sched     AVG  Test1 Test2
  ------------  ---  ----- -----
  cfq           705  695   715
  noop          758  769   747
  deadline      741  705   775
  anticipatory  494  477   511

I only did two runs of each, which took about 24 minutes. Like the first round of tests, the noise in the measurements (about 10%) exceeds the difference between scheduler-algorithm performance, except that anticipatory seems to be measurably slower. So it still looks like cfq, noop and deadline are more or less equivalent when used with a battery-backed RAID. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] RAID 10 Benchmark with different I/O schedulers
Greg Smith wrote: On Tue, 6 May 2008, Craig James wrote: I only did two runs of each, which took about 24 minutes. Like the first round of tests, the noise in the measurements (about 10%) exceeds the difference between scheduler-algorithm performance, except that anticipatory seems to be measurably slower. Those are much better results. Any test that says anticipatory is anything other than useless for database system use with a good controller I presume is broken, so that's how I know you're in the right ballpark now but weren't before. In order to actually get some useful data out of the noise that is pgbench, you need a lot more measurements of longer runs. As perspective, the last time I did something in this area, in order to get enough data to get a clear picture I ran tests for 12 hours. I'm hoping to repeat that soon with some more common hardware that gives useful results I can give out. This data is good enough for what I'm doing. There were reports from non-RAID users that the I/O scheduling could make as much as a 4x difference in performance (which makes sense for non-RAID), but these tests show me that three of the four I/O schedulers are within 10% of each other. Since this matches my intuition of how battery-backed RAID will work, I'm satisfied. If our servers get overloaded to the point where 10% matters, then I need a much more dramatic solution, like faster machines or more machines. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Problem with 11 M records table
idc danny wrote: Hi everybody, I'm fairly new to PostgreSQL and I have a problem with a query: SELECT * FROM LockerEvents LIMIT 1 OFFSET 1099 The table LockerEvents has 11 million records in it and this query takes about 60 seconds to complete. The OFFSET clause is almost always inefficient for anything but very small tables or small offsets. In order for a relational database (not just Postgres) to figure out which row is the 1100th row, it has to actually retrieve the first 1099 rows and discard them. There is no magical way to go directly to the 11-millionth row. Even on a trivial query such as yours with no WHERE clause, the only way to determine which row is the 11-millionth is to scan all the rows before it. There are better (faster) ways to achieve this, but it depends on why you are doing this query. That is, do you just want this one block of data, or are you scanning the whole database in 10,000-row blocks? Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
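If the goal is walking the table in fixed-size blocks, one of the faster alternatives alluded to above is keyset pagination: remember the last key seen and let the index jump straight to the start of the next page, instead of paying for an ever-growing OFFSET. A minimal sketch, using SQLite for self-containment and illustrative table names:

```python
import sqlite3

# Keyset pagination: "WHERE id > last_seen ORDER BY id LIMIT n" uses the
# primary-key index to find each page's start directly, so page N costs the
# same as page 1 -- no rows are fetched and discarded.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, note TEXT)")
db.executemany("INSERT INTO events VALUES (?, 'e')",
               [(i,) for i in range(1, 101)])

def fetch_page(last_seen_id, page_size=10):
    return db.execute(
        "SELECT id FROM events WHERE id > ? ORDER BY id LIMIT ?",
        (last_seen_id, page_size)).fetchall()

page1 = fetch_page(0)
page2 = fetch_page(page1[-1][0])   # resume from the last id seen
print(page1[0][0], page2[0][0])    # 1 11
```

The catch, which motivates the row-numbering discussion that follows, is that plain keyset pagination cannot jump straight to an arbitrary page number, only to the next or previous page.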
Re: [PERFORM] Problem with 11 M records table
idc danny wrote: Hi James, Thank you for your response. What I want to achieve is to give to the application user 10k rows where the records are one after another in the table, and the application has a paginating GUI (First page, Previous page, Next page, Last page - all links, plus a Jump-to-page combobox) where this particular query gets to run if the user clicks on the Last page link. The application receives the first 10k rows in under a second when the user clicks on the First page link, and receives the last 10k rows in about 60 seconds when he clicks on the Last page link. You need a sequence that automatically assigns an ascending my_rownum to each row as it is added to the table, and an index on that my_rownum column. Then you select your page by (for example) select * from my_table where my_rownum >= 100 and my_rownum < 110; That will do what you want, with instant performance that's linear over your whole table. If your table will have deletions, then you have to update the row numbering a lot, which will cause you terrible performance problems due to the nature of the UPDATE operation in Postgres. If this is the case, then you should keep a separate table just for numbering the rows, which is joined to your main table when you want to retrieve a page of data. When you delete data (which should be batched, since this will be expensive), then you truncate your rownum table, reset the sequence that generates your row numbers, then regenerate your row numbers with something like insert into my_rownum_table (select id, nextval('my_rownum_seq') from my_big_table). To retrieve a page, just do select ... from my_table join my_rownum_table on (...), which will be really fast since you'll have indexes on both tables. Note that this method requires that you have a primary key, or at least a unique column, on your main table, so that you have something to join with your row-number table.
Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
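The renumbering scheme above can be sketched end-to-end. For brevity this sketch regenerates the numbers with a ROW_NUMBER() window function rather than a sequence and nextval(), and uses SQLite instead of Postgres; all names are illustrative, but the shape (separate row-number table, index on the number, join to fetch a page) is the one described in the message.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE main_table (id INTEGER PRIMARY KEY, payload TEXT);
    CREATE TABLE my_rownum_table (id INTEGER PRIMARY KEY, my_rownum INTEGER);
    CREATE INDEX rownum_idx ON my_rownum_table (my_rownum);
""")
# Primary keys with gaps, as in a table that has seen deletions.
db.executemany("INSERT INTO main_table VALUES (?, 'x')",
               [(i * 7,) for i in range(1, 1001)])

def renumber():
    # After a batch of deletes: wipe and regenerate the dense numbering.
    db.execute("DELETE FROM my_rownum_table")
    db.execute("""
        INSERT INTO my_rownum_table (id, my_rownum)
        SELECT id, ROW_NUMBER() OVER (ORDER BY id) FROM main_table
    """)

renumber()

# Fetch "page" rows 101-110 via an index range scan plus a keyed join.
rows = db.execute("""
    SELECT m.id FROM main_table m
    JOIN my_rownum_table r ON m.id = r.id
    WHERE r.my_rownum BETWEEN 101 AND 110
    ORDER BY r.my_rownum
""").fetchall()
print(len(rows), rows[0][0])
```

Page retrieval stays fast no matter which page is requested; the cost is the periodic bulk renumbering after batched deletes.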
Re: [PERFORM] which ext3 fs type should I use for postgresql
Matthew Wakeling wrote: Probably of more use are some of the other settings: -m reserved-blocks-percentage - this reserves a portion of the filesystem that only root can write to. If root has no need for it, you can kill this by setting it to zero. The default is for 5% of the disc to be wasted. This is not a good idea. The 5% is NOT reserved for root's use, but rather is to prevent severe file fragmentation. As the disk gets full, the remaining empty spaces tend to be small spaces scattered all over the disk, meaning that even for modest-sized files, the kernel can't allocate contiguous disk blocks. If you reduce this restriction to 0%, you are virtually guaranteed poor performance when you fill up your disk, since those files that are allocated last will be massively fragmented. Worse, the fragmented files that you create remain fragmented even if you clean up to get back below the 95% mark. If Postgres happened to insert a lot of data on a 99% full file system, those blocks could be spread all over the place, and they'd stay that way forever, even after you cleared some space. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Typecast bug?
This seems like a bug to me, but it shows up as a performance problem. Since the column being queried is an integer, the second query (see below) can't possibly match, yet Postgres uses a typecast, forcing a full table scan for a value that can't possibly be in the table. The application could intercept these bogus queries, but that requires building schema-specific and Postgres-specific knowledge into the application (i.e. what is the maximum legal integer for this column?). Craig

explain analyze select version_id, parent_id from version where version_id = 9;

                                  QUERY PLAN
 ---------------------------------------------------------------------------
 Index Scan using version_pkey on version  (cost=0.00..9.89 rows=1 width=8) (actual time=0.054..0.054 rows=0 loops=1)
   Index Cond: (version_id = 9)
 Total runtime: 0.130 ms
(3 rows)

emol_warehouse_1= explain analyze select version_id, parent_id from version where version_id = 999;

                                  QUERY PLAN
 ---------------------------------------------------------------------------
 Seq Scan on version  (cost=0.00..253431.77 rows=48393 width=8) (actual time=3135.530..3135.530 rows=0 loops=1)
   Filter: ((version_id)::numeric = 999::numeric)
 Total runtime: 3135.557 ms
(3 rows)

\d version
     Table "emol_warehouse_1.version"
   Column   |  Type   | Modifiers
------------+---------+-----------
 version_id | integer | not null
 parent_id  | integer | not null
 ... more columns
Indexes:
    "version_pkey" PRIMARY KEY, btree (version_id)
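Until the planner handles this itself, one application-side guard is to range-check the value before it reaches the query. A sketch: the bounds are the documented PostgreSQL int4 range; the function name and usage are made up for illustration:

```python
# Guard a user-supplied id against the int4 range before querying an
# integer column, so an out-of-range literal can never trigger the
# numeric-typecast sequential scan described above.
INT4_MIN, INT4_MAX = -2_147_483_648, 2_147_483_647

def parse_int4(text: str):
    """Return the value as an int if it fits in int4, else None."""
    try:
        value = int(text)
    except ValueError:
        return None
    return value if INT4_MIN <= value <= INT4_MAX else None

# An out-of-range value is rejected up front; the query layer can
# answer "no rows" immediately without touching the database.
assert parse_int4("999") == 999
assert parse_int4("9" * 30) is None   # too many 9's for int4
assert parse_int4("abc") is None
```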
Re: [PERFORM] Typecast bug?
Tom Lane wrote: Craig James [EMAIL PROTECTED] writes: This seems like a bug to me, but it shows up as a performance problem. emol_warehouse_1= explain analyze select version_id, parent_id from version where version_id = 999; If you actually *need* so many 9's here as to force it out of the range of bigint, then why is your id column not declared numeric? This seems to me to be about on par with complaining that intcol = 4.2e1 won't be indexed. We have a numeric data type hierarchy, learn to work with it ... Your suggestion of "learn to work with it" doesn't fly. A good design separates the database schema details from the application to the greatest extent possible. What you're suggesting is that every application that queries against a Postgres database should know the exact range of every numeric data type of every indexed column in the schema, simply because Postgres can't recognize an out-of-range numeric value. In this case, the optimizer could have instantly returned zero results with no further work, since the query was out of range for that column. This seems like a pretty simple optimization to me, and it seems like a helpful suggestion to make to this forum. BTW, this query came from throwing lots of junk at a web app in an effort to uncover exactly this sort of problem. It's not a real query, but then, hackers don't use real queries. The app checks that its input is a well-formed integer expression, but then assumes Postgres can deal with it from there. Craig
[PERFORM] Mailing list hacked by spammer?
I've never gotten a single spam from the Postgres mailing list ... until today. A Chinese company selling consumer products is using this list. I have my filters set to automatically trust this list because it has been so reliable until now. It would be really, really unfortunate if this list fell to the spammers. Craig
Re: [PERFORM] Mailing list hacked by spammer?
Glyn Astill wrote: Most likely just a forged header or something, hardly hacked though is it. Yes, hack is the correct term. The bad guys have hacked into the major email systems, including gmail, which was the origin of this spam: http://www.theregister.co.uk/2008/02/25/gmail_captcha_crack/ I think you need to do some training: http://www2.b3ta.com/bigquiz/hacker-or-spacker/ Sending a link to a web site that plays loud rap music is not a friendly way to make your point. Craig
Re: [PERFORM] Perl/DBI vs Native
Valentin Bogdanov wrote: I have run quite a few tests comparing how long a query takes to execute from Perl/DBI as compared to psql/libpq. No matter how many times I run the test the results were always the same. I run a SELECT all on a fairly big table and enabled the log_min_duration_statement option. With psql postgres consistently logs half a second while the exact same query executed with Perl/DBI takes again consistently 2 seconds. The problem may be that your two tests are not equivalent. When Perl executes a statement, it copies the *entire* result set back to the client before it returns the first row. The following program might appear to just be fetching the first row: $sth = $dbh->prepare("select item from mytable"); $sth->execute(); $item = $sth->fetchrow_array(); But in fact, before Perl returns from the $sth->execute() statement, it has already run the query and copied all of the rows into a hidden, client-side cache. Each $sth->fetchrow_array() merely copies the data from the hidden cache into your local variable. By contrast, psql executes the query, and starts returning the data a page at a time. So it may appear to be much faster. This also means that Perl has trouble with very large tables. If the mytable in the above example is very large, say a hundred billion rows, you simply can't execute this statement in Perl. It will try to copy 100 billion rows into memory before returning the first answer. The reason for Perl's behind-the-scenes caching is that it allows multiple connections to a single database, and multiple statements on each database handle. By executing each statement completely, it gives the appearance that multiple concurrent queries are supported. The downside is that it can be a huge memory hog. Craig
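The buffering difference is easy to see in miniature. This sketch uses Python's sqlite3 rather than Perl/DBI, so it illustrates the eager-vs-lazy distinction rather than DBI itself; with Postgres, the lazy side would normally be a server-side cursor. Table and column names are made up:

```python
# Contrast eager fetching (the whole result set materialized client-side,
# as described for DBI above) with lazy, row-at-a-time iteration.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (item INTEGER)")
conn.executemany("INSERT INTO mytable VALUES (?)",
                 [(i,) for i in range(100_000)])

# Eager: every row is copied into a Python list before we look at one.
cur = conn.execute("SELECT item FROM mytable ORDER BY item")
all_rows = cur.fetchall()            # memory grows with the table
first_eager = all_rows[0][0]

# Lazy: the cursor yields rows as they are produced; only one row is
# held at a time, so the first answer arrives without buffering the rest.
cur = conn.execute("SELECT item FROM mytable ORDER BY item")
first_lazy = next(iter(cur))[0]

assert first_eager == first_lazy == 0
```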
Re: [PERFORM] Using PK value as a String
Valentin Bogdanov wrote: --- On Mon, 11/8/08, Gregory Stark [EMAIL PROTECTED] wrote: From: Gregory Stark [EMAIL PROTECTED] Subject: Re: [PERFORM] Using PK value as a String To: Jay [EMAIL PROTECTED] Cc: pgsql-performance@postgresql.org Date: Monday, 11 August, 2008, 10:30 AM Jay [EMAIL PROTECTED] writes: I have a table named table_Users: CREATE TABLE table_Users ( UserID character(40) NOT NULL default '', Username varchar(256) NOT NULL default '', Email varchar(256) NOT NULL default '' etc... ); ... But the real question here is what's the better design. If you use Username you'll be cursing if you ever want to provide a facility to allow people to change their usernames. You may not want such a facility now but one day... I don't understand Gregory's suggestion about the design. I thought using natural primary keys as opposed to surrogate ones is a better design strategy, even when it comes to performance considerations and even more so if there are complex relationships within the database. No, exactly the opposite. Data about users (such as name, email address, etc.) are rarely a good choice as a foreign key, and shouldn't be considered keys in most circumstances. As Gregory points out, you're spreading the user's name across the database, effectively denormalizing it. Instead, you should have a user record, with an arbitrary key, an integer or OID, that you use as the foreign key for all other tables. That way, when the username changes, only one table will be affected. And it's much more efficient to use an integer as the key than a long string. Craig
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
The OOM killer is a terrible idea for any serious database server. I wrote a detailed technical paper on this almost 15 years ago when Silicon Graphics had this same feature, and Oracle and other critical server processes couldn't be made reliable. The problem with overallocating memory as Linux does by default is that EVERY application, no matter how well designed and written, becomes unreliable: It can be killed because of some OTHER process. You can be as clever as you like, and do all the QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL be unreliable if you run it on a Linux system that allows overcommitted memory. IMHO, all Postgres servers should run with memory-overcommit disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2. Craig
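For reference, the knob mentioned above can be set like this on a stock Linux box (standard paths; run as root):

```shell
# Disable memory overcommit immediately (strict accounting mode)
echo 2 > /proc/sys/vm/overcommit_memory

# Equivalent, via sysctl
sysctl -w vm.overcommit_memory=2

# Make it persistent across reboots
echo 'vm.overcommit_memory = 2' >> /etc/sysctl.conf
```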
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
[EMAIL PROTECTED] wrote: On Wed, 27 Aug 2008, Craig James wrote: The OOM killer is a terrible idea for any serious database server. I wrote a detailed technical paper on this almost 15 years ago when Silicon Graphics had this same feature, and Oracle and other critical server processes couldn't be made reliable. The problem with overallocating memory as Linux does by default is that EVERY application, no matter how well designed and written, becomes unreliable: It can be killed because of some OTHER process. You can be as clever as you like, and do all the QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL be unreliable if you run it on a Linux system that allows overcommitted memory. IMHO, all Postgres servers should run with memory-overcommit disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2. it depends on how much stuff you allow others to run on the box. if you have no control of that then yes, the box is unreliable (but it's not just because of the OOM killer, it's because those other users can eat up all the other box resources as well: CPU, network bandwidth, disk bandwidth, etc). even with overcommit disabled, the only way you can be sure that a program will not fail is to make sure that it never needs to allocate memory. with overcommit off you could have one program that eats up 100% of your ram without failing (handling the error on memory allocation such that it doesn't crash), but which will cause _every_ other program on the system to fail, including any scripts (because every command executed will require forking, and without overcommit that will require allocating the total memory that your shell has allocated, just so that it can run a trivial command like ps or kill that you are trying to use to fix the problem). if you have a box with unpredictable memory use, disabling overcommit will not make it reliable.
it may make it less unreliable (the fact that the linux OOM killer will pick one of the worst possible processes to kill is a problem), but less unreliable is not the same as reliable. The problem with any argument in favor of memory overcommit and OOM is that there is a MUCH better, and simpler, solution. Buy a really big disk, say a terabyte, and allocate the whole thing as swap space. Then do a decent job of configuring your kernel so that any reasonable process can allocate huge chunks of memory that it will never use, but can't use the whole terabyte. Using real swap space instead of overallocated memory is a much better solution. - It's cheap. - There is no performance hit at all if you buy enough real memory. - If runaway processes start actually using memory, the system slows down, but server processes like Postgres *aren't killed*. - When a runaway process starts everybody swapping, you can just find it and kill it. Once it's dead, everything else goes back to normal. It's hard to imagine a situation where any program or collection of programs would actually try to allocate more than a terabyte of memory and exceed the swap space on a single terabyte disk. The cost is almost nothing, a few hundred dollars. So turn off overcommit, and buy an extra disk if you actually need a lot of virtual memory. Craig
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
Matthew Wakeling wrote: On Thu, 28 Aug 2008, Steve Atkins wrote: Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Or configure adequate swap space? Oh yes, that's very important. However, that gives the machine the opportunity to thrash. No, that's where the whole argument for allowing overcommitted memory falls flat. The entire argument for allowing overcommitted memory hinges on the fact that processes *won't use the memory*. If they use it, then overcommitting causes problems everywhere, such as a Postmaster getting arbitrarily killed. If a process *doesn't* use the memory, then there's no problem with thrashing, right? So it never makes sense to enable overcommitted memory when Postgres, or any server, is running. Allocating a big, fat terabyte swap disk is ALWAYS better than allowing overcommitted memory. If your usage is such that overcommitted memory would never be used, then the swap disk will never be used either. If your processes do use the memory, then your performance goes into the toilet, and you know it's time to buy more memory or a second server, but in the mean time your server processes at least keep running while you kill the rogue processes. Craig
Re: [PERFORM] Best hardware/cost tradoff?
----- Original message ----- From: [EMAIL PROTECTED] * I think we will go for hardware-based RAID 1 with a good battery-backed-up controller. I have read that software RAID performs surprisingly well, but for a production site where hotplug replacement of dead disks is required, is software RAID still worth it? ... I haven't had any issues with software raid (mdadm) and hot-swaps. It keeps working in degraded mode and as soon as you replace the defective disk it can reconstruct the array on the fly. Performance will suffer while at it but the service keeps up. The battery backup makes a very strong point for a hw controller. Still, I have heard good things on combining a HW controller with JBODs, leaving the RAID affair to mdadm. In your scenario though, with *lots* of random reads, if I had to choose between a HW controller with 2 disks or software RAID with 4 or 6 disks, I would go for the disks. There are motherboards with 6 SATA ports. For the money you will save on the controller you can afford 6 disks in a RAID 10 setup. This is good advice. Hot-swapping seems cool, but how often will you actually use it? Maybe once every year? With software RAID, replacing a disk means shutdown, swap the hardware, and reboot, which is usually less than ten minutes, and you're back in business. If that's the only thing that happens, you'll have 99.97% uptime on your server. If you're on a limited budget, a software RAID 1+0 will be very cost effective and give good performance for lots of random reads. Hardware RAID with a battery-backed cache helps with writes and hot swapping. If your random-read performance needs outweigh these two factors, consider software RAID. Craig
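For concreteness, the mdadm workflow being described -- create the array, survive a failure in degraded mode, rebuild on the fly -- looks roughly like this (device names are placeholders; run as root):

```shell
# Create a 4-disk software RAID 10 array (device names are examples)
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1

# Replace a failed disk: mark it failed, remove it, add the new one.
# The array keeps serving (degraded) and reconstructs in the background.
mdadm /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
mdadm /dev/md0 --add /dev/sdb1

# Watch the rebuild progress
cat /proc/mdstat
```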
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
James Mansion wrote: I can't see how an OS can lie to processes about memory being allocated to them and not be ridiculed as a toy, but there you go. I don't think Linux is the only perpetrator - doesn't AIX do this too? This is a leftover from the days of massive physical modeling (chemistry, physics, astronomy, ...) programs written in FORTRAN. Since FORTRAN didn't have pointers, scientists would allocate massive three-dimensional arrays, and their code might only access a tiny fraction of the memory. The operating-system vendors, particularly SGI, added features to the various flavors of UNIX, including the ability to overcommit memory, to support these FORTRAN programs, which at the time were some of the most important applications driving computer science and computer architectures of workstation-class computers. When these workstation-class computers evolved enough to rival mainframes, companies started shifting apps like Oracle onto the cheaper workstation-class computers. Unfortunately, the legacy of the days of these FORTRAN programs is still with us, and every few years we have to go through this discussion again. Disable overcommitted memory. There is NO REASON to use it on any modern server-class computer, and MANY REASONS WHY IT IS A BAD IDEA. Craig
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
Florian Weimer wrote: * Craig James: So it never makes sense to enable overcommitted memory when Postgres, or any server, is running. There are some run-time environments which allocate huge chunks of memory on startup, without marking them as not yet in use. SBCL is in this category, and also the Hotspot VM (at least to some extent). I stand by my assertion: It never makes sense. Do these applications allocate a terabyte of memory? I doubt it. Buy a terabyte swap disk and disable overcommitted memory. Craig
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
Florian Weimer wrote: * Craig James: There are some run-time environments which allocate huge chunks of memory on startup, without marking them as not yet in use. SBCL is in this category, and also the Hotspot VM (at least to some extent). I stand by my assertion: It never makes sense. Do these applications allocate a terabyte of memory? I doubt it. SBCL sizes its allocated memory region based on the total amount of RAM and swap space. In this case, buying larger disks does not help. 8-P SBCL, as in Steel Bank Common Lisp? Why would you run that on a server machine alongside Postgres? If I had to use SBCL and Postgres, I'd put SBCL on a separate machine all its own, so that it couldn't corrupt Postgres or other servers that had to be reliable. Are you saying that if I bought a terabyte of swap disk, SBCL would allocate a terabyte of space? Craig
[PERFORM] Sort causes system to freeze
Maybe this is an obviously dumb thing to do, but it looked reasonable to me. The problem is, the seemingly simple sort below causes a fairly powerful computer to completely freeze for 5-10 minutes. During the sort, you can't log in, you can't use any shell sessions you already have open, the Apache server barely works, and even if you do nice -20 top before you start the sort, the top(1) command comes to a halt while the sort is proceeding! As nearly as I can tell, the sort operation is causing a swap storm of some sort -- nothing else in my many years of UNIX/Linux experience can cause a nice -20 process to freeze. The sort operation never finishes -- it's always killed by the system. Once it dies, everything returns to normal. This is 8.3.0. (Yes, I'll upgrade soon.) Is this a known bug, or do I have to rewrite this query somehow? Maybe add indexes to all four columns being sorted? Thanks! Craig

=> explain select * from plus order by supplier_id, compound_id, units, price;

                              QUERY PLAN
 ----------------------------------------------------------------------
 Sort  (cost=5517200.48..5587870.73 rows=28268100 width=65)
   Sort Key: supplier_id, compound_id, units, price
   ->  Seq Scan on plus  (cost=0.00..859211.00 rows=28268100 width=65)

=> \d plus
      Table "emol_warehouse_1.plus"
    Column     |     Type      | Modifiers
---------------+---------------+-----------
 supplier_id   | integer       |
 supplier_name | text          |
 compound_id   | text          |
 amount        | text          |
 units         | text          |
 price         | numeric(12,2) |
 currency      | text          |
 description   | text          |
 sku           | text          |
Indexes:
    "i_plus_compound_id" btree (supplier_id, compound_id)
    "i_plus_supplier_id" btree (supplier_id)

max_connections = 1000
shared_buffers = 2000MB
work_mem = 256MB
max_fsm_pages = 100
max_fsm_relations = 5000
synchronous_commit = off
#wal_sync_method = fdatasync
wal_buffers = 256kB
checkpoint_segments = 30
effective_cache_size = 4GB

Machine: Dell, 8x64-bit CPUs, 8GB ram, Perc6i battery-backed RAID controller, 8 disks as RAID10
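A note on the settings shown, offered as an observation rather than the thread's conclusion: in PostgreSQL, work_mem is a per-sort (and per-hash) limit for each backend, not a global cap, so work_mem = 256MB combined with max_connections = 1000 permits memory demand far beyond the machine's 8GB under concurrent load, which is consistent with a swap storm. A conservative sketch, with illustrative values that are not from the original post:

```
# postgresql.conf sketch -- illustrative values only.
# work_mem applies per sort/hash operation per backend, so its worst case
# is roughly work_mem x concurrent sorts x active connections.
max_connections = 100      # 1000 backends is far more than 8GB RAM supports
work_mem = 16MB            # raise it only in the session doing the big sort:
                           #   SET work_mem = '256MB';
```

The per-session SET lets one big nightly sort use generous memory without licensing every connection to do the same.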
Re: [PERFORM] Need help with 8.4 Performance Testing
justin wrote: Tom Lane wrote: Hmm ... I wonder whether this means that the current work on parallelizing I/O (the posix_fadvise patch in particular) is a dead end. Because what that is basically going to do is expend more CPU to improve I/O efficiency. If you believe this thesis then that's not the road we want to go down. regards, tom lane What does the CPU/Memory/Bus performance road map look like? Is the IO performance for storage devices, whatever they may be, going to be on par with the above to cause this problem? Flash memory will become just a fourth layer in the memory caching system (on-board CPU, high-speed secondary cache, main memory, and persistent memory). The idea of external storage will probably disappear altogether -- computers will just have memory, and won't forget anything when you turn them off. Since most computers are 64 bits these days, all data and programs will just hang out in memory at all times, and be directly addressable by the CPU. The distinction between disk and memory arose from the fact that disks were large, slow devices relative to core memory and had to be connected by long wires, hence the need for I/O subsystems. As flash memory becomes mainstream, I expect this distinction to disappear. Craig
Re: [PERFORM] postgresql 8.3 tps rate
David Rees wrote: On Thu, Jan 22, 2009 at 1:27 PM, Ibrahim Harrani ibrahim.harr...@gmail.com wrote:

Version 1.93d       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine  Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
myserver 300M   391  97  9619   1  8537   2   673  99 +++++ +++  1196  16
Latency       211ms     388ms     325ms   27652us     722us     6720ms

Version 1.93d       ------Sequential Create------ --------Random Create--------
myserver     -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
files         /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
16            9004  25 +++++ +++ +++++ +++  8246  20 +++++ +++ +++++ +++
Latency       592ms     208us     102us     673ms     179us     100us

You should be testing bonnie with a file size that is at least double the amount of memory in your machine - in this case, 4GB files, not 300MB files. When I compare my bonnie++ result with the one at http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm, it seems that there is something wrong with the disks!? Yes, your machine appears to be very slow. You should be able to write in the order of 30-50MB/s+ and read in the order of 40-80MB/s+. Random IO should be in the 200 tps range for a 7200rpm SATA RAID1. Have you tried the really basic speed test?

time (dd if=/dev/zero of=bigfile bs=8192 count=1000000; sync)
time dd if=bigfile of=/dev/null bs=8192

Divide 8.2GB by the times reported. On a single 10K SATA drive, I get about 55MB/sec write and 61 MB/sec read. If you can't get similar numbers, then something is wrong. Craig
Re: [PERFORM] postgresql 8.3 tps rate
Ibrahim Harrani wrote: Hi Craig, Here is the result. It seems that disk write is terrible!

[root@myserver /usr]# time (dd if=/dev/zero of=bigfile bs=8192 count=1000000; sync)
1000000+0 records in
1000000+0 records out
8192000000 bytes transferred in 945.343806 secs (8665630 bytes/sec)

real    15m46.206s
user    0m0.368s
sys     0m15.560s

So it's nothing to do with Postgres. I'm no expert at solving this sort of problem, but I'd start by looking for: - a rogue process that's using disk bandwidth (use vmstat when the system is idle) - system logs, maybe there are a zillion error messages - if you have a second disk, try its performance - if you don't have a second disk, buy one, install it, and try it - get another SATA controller and try that Or do the reverse: Put the disk in a different computer (one that you've tested beforehand and verified is fast) and see if the problem follows the disk. Same for the SATA card. It could be your SATA controller, the disk, some strange hdparm setting ... who knows? I ran into this once a LONG time ago with a kernel that didn't recognize the disk or driver or something, and disabled the DMA (direct-memory access) feature, which meant the CPU had to handle every single byte coming from the disk, which of course meant SLOW, plus you couldn't even type while the disk was busy. A simple manual call to hdparm(1) to force DMA on fixed it. Weird stuff like that can be very hard to find. I also saw very low write speed once on a RAID device with a battery-backed cache, when the battery went dead. The RAID controller went into its conservative mode, which for some reason was much slower than the disk's raw performance. Craig
Re: [PERFORM] I have a fusion IO drive available for testing
Dave Cramer wrote: So I tried writing directly to the device, gets around 250MB/s, reads at around 500MB/s The client is using redhat so xfs is not an option. I'm using Red Hat and XFS, and have been for years. Why is XFS not an option with Red Hat? Craig
Re: [PERFORM] partition question for new server setup
Whit Armstrong wrote: I have the opportunity to set up a new postgres server for our production database. I've read several times in various postgres lists about the importance of separating logs from the actual database data to avoid disk contention. Can someone suggest a typical partitioning scheme for a postgres server? My initial thought was to create /var/lib/postgresql as a partition on a separate set of disks. However, I can see that the xlog files will be stored here as well: http://www.postgresql.org/docs/8.3/interactive/storage-file-layout.html Should the xlog files be stored on a separate partition to improve performance? Any suggestions would be very helpful. Or if there is a document that lays out some best practices for server setup, that would be great. The database usage will be read heavy (financial data) with batch writes occurring overnight and occasionally during the day. server information: Dell PowerEdge 2970, 8 core Opteron 2384 6 1TB hard drives with a PERC 6i 64GB of ram We're running a similar configuration: PowerEdge 8 core, PERC 6i, but we have 8 of the 2.5 10K 384GB disks. When I asked the same question on this forum, I was advised to just put all 8 disks into a single RAID 10, and forget about separating things. The performance of a battery-backed PERC 6i (you did get a battery-backed cache, right?) with 8 disks is quite good. In order to separate the logs, OS and data, I'd have to split off at least two of the 8 disks, leaving only six for the RAID 10 array. But then my xlogs would be on a single disk, which might not be safe. A more robust approach would be to split off four of the disks, put the OS on a RAID 1, the xlog on a RAID 1, and the database data on a 4-disk RAID 10. Now I've separated the data, but my primary partition has lost half its disks. So, I took the advice, and just made one giant 8-disk RAID 10, and I'm very happy with it. It has everything: Postgres, OS and logs.
But since the RAID array is 8 disks instead of 4, the net performance seems quite good. But ... your mileage may vary. My box has just one thing running on it: Postgres. There is almost no other disk activity to interfere with the file-system caching. If your server is going to have a bunch of other activity that generates a lot of non-Postgres disk activity, then this advice might not apply. Craig
Re: [PERFORM] partition question for new server setup
Kenneth Marshall wrote: Additionally are there any clear choices w/ regard to filesystem types? Our choices would be xfs, ext3, or ext4. Well, there's a lot of people who use xfs and ext3. XFS is generally rated higher than ext3 both for performance and reliability. However, we run CentOS 5 in production, and XFS isn't one of the blessed file systems it comes with, so we're running ext3. It's worked quite well for us. The other optimizations are using data=writeback when mounting the ext3 filesystem for PostgreSQL and using the elevator=deadline for the disk driver. I do not know how you specify that for Ubuntu. After reading various articles, I thought that noop was the right choice when you're using a battery-backed RAID controller. The RAID controller is going to cache all data and reschedule the writes anyway, so the kernel scheduler is irrelevant at best, and can slow things down. On Ubuntu, it's echo noop > /sys/block/hdx/queue/scheduler where hdx is replaced by the appropriate device. Craig
Re: [PERFORM] superlative missuse
David Wilson wrote: On Tue, May 12, 2009 at 5:53 PM, Angel Alvarez cl...@uah.es wrote: we suffer a 'more optimal' superlative misuse there is not so 'more optimal' thing but a simple 'better' thing. im not native english speaker but i think it still applies. Well this a superlative list so all of you deserve a better optimal use. As a native English speaker: You are technically correct. However, more optimal has a well-understood meaning as closer to optimal, and as such is appropriate and generally acceptable despite being technically incorrect. I disagree -- it's a glaring error. More optimized or better optimized are perfectly good, and correct, phrases. Why not use them? Every time I read more optimal, I am embarrassed for the person who is showing his/her ignorance of the basics of English grammar. If I wrote, It's more best, would you find that acceptable? This is a postgres mailing list, not an english grammar mailing list... Since you replied on the list, it's only appropriate to get at least one rebuttal. Craig
Re: [PERFORM] raid10 hard disk choice
Matthew Wakeling wrote: On Thu, 21 May 2009, Linos wrote: i have to buy a new server and in the budget i have (small) i have to select one of this two options: -4 sas 146gb 15k rpm raid10. -8 sas 146gb 10k rpm raid10. It depends what you are doing. I think in most situations, the second option is better, but there may be a few situations where the reverse is true. Basically, the first option will only be faster if you are doing lots of seeking (small requests) in a single thread. As soon as you go multi-threaded or are looking at sequential scans, you're better off with more discs. Since you have to share the disks with a file server, which might be heavily used, the 8-disk array will probably be better even if you're doing lots of seeking in a single thread. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Hosted servers with good DB disk performance?
Greg Smith wrote: What I'd love to have is a way to rent a fairly serious piece of dedicated hardware, ideally with multiple (at least 4) hard drives in a RAID configuration and a battery-backed write cache. The cache is negotiable. Linux would be preferred, FreeBSD or Solaris would also work; not Windows though (see good DB performance).

We tried this with poor results. Most of the co-location and server-farm places are set up with generic systems that are optimized for small-to-medium-sized web sites. They use MySQL and are surprised to hear there's an alternative open-source DB. They claim to be able to create custom configurations, but it's a lie. The problem is that they run on thin profit margins, and their techs are mostly ignorant; they just follow scripts. If something goes wrong, or they make an error, you can't get anything through their thick heads. And you can't go down there and fix it yourself.

For example, we told them EXACTLY how to set up our system, but they decided that automatic monthly RPM OS updates couldn't hurt. So on the first of the month, we came in in the morning to find that Linux had been updated to libraries that were incompatible with our own software, the system had automatically rebooted, and our web site was dead. And many similar incidents.

We finally bought some nice Dell servers and found a co-location site that provides us all the infrastructure (reliable power, internet, cooling, security...), and we're in charge of the computers. We've never looked back.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query plan issues - volatile tables
Brian Herlihy wrote: We have a problem with some of our query plans. One of our tables is quite volatile, but postgres always uses the statistics snapshot from the last time it was analyzed for query planning. Is there a way to tell postgres that it should not trust the statistics for this table? Basically we want it to assume that there may be 0, 1 or 100,000 entries coming out from a query on that table at any time, and that it should not make any assumptions.

I had a similar problem, and just changed my application to do an ANALYZE either just before the query, or just after a major update to the table. ANALYZE is very fast, almost always orders of magnitude faster than the time lost to a poor query plan.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
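A sketch of that application-side pattern (table and column names here are hypothetical, echoing the temp-table example earlier in the thread):

```sql
-- Bulk-populate the volatile table...
INSERT INTO my_tmp_table (some_id, some_value)
SELECT some_id, some_value FROM here_and_there;

-- ...then ANALYZE before the query whose plan depends on fresh
-- statistics. ANALYZE only samples the table, so it is cheap
-- compared to the cost of a bad plan.
ANALYZE my_tmp_table;

SELECT t.some_id, a.info
FROM my_tmp_table t
JOIN another_table a ON a.some_id = t.some_id;
```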
Re: [PERFORM] Scalability in postgres
Greg Smith wrote: No amount of theoretical discussion advances that any until you're at least staring at a very specific locking problem you've already characterized extensively via profiling. And even then, profiling trumps theory every time. In theory, there is no difference between theory and practice. In practice, there is a great deal of difference. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How would you store read/unread topic status?
Mathieu Nebra wrote: Greg Stark a écrit : All the other comments are accurate, though it does seem like something the database ought to be able to handle. The other thing which hasn't been mentioned is that you have a lot of indexes. Updates require maintaining all those indexes. Are all of these indexes really necessary? Do you have routine queries which look up users based on their flags? Or are all your OLTP transactions for specific userids, in which case you probably just need the index on userid.

We are using these indexes, but I can't be sure if we _really_ need them or not. I can go into detail. We have: UserID - TopicID - LastReadAnswerID - WrittenStatus - IsFavorite. So basically, we toggle the boolean flag WrittenStatus when the user has written in that topic. The same goes for IsFavorite.

Do those last two columns hold much data? Another thing to consider is to split this into two tables:

  UserID - TopicID - LastReadAnswerID
  UserID - TopicID - WrittenStatus - IsFavorite

As others have pointed out, an UPDATE in Postgres is a select/delete/insert, and if you're updating just the LastReadAnswerID all the time, you're wasting time deleting and re-inserting a lot of data that never changes (assuming they're not trivially small columns). This might also solve the problem of too many indexes -- the table that's updated frequently would only have an index on (UserID, TopicID), so the update only affects one index. Then, to minimize the impact on your app, create a view that looks like the original table for read-only apps.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
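A sketch of that split (table and column names are made up; assume the original single table was called topic_status):

```sql
-- Hot table: only the column that changes on every page view
CREATE TABLE topic_read_status (
    user_id             integer NOT NULL,
    topic_id            integer NOT NULL,
    last_read_answer_id integer,
    PRIMARY KEY (user_id, topic_id)   -- the only index the UPDATE touches
);

-- Cold table: flags that change rarely
CREATE TABLE topic_flags (
    user_id        integer NOT NULL,
    topic_id       integer NOT NULL,
    written_status boolean NOT NULL DEFAULT false,
    is_favorite    boolean NOT NULL DEFAULT false,
    PRIMARY KEY (user_id, topic_id)
);

-- View preserving the old one-table schema for read-only code
CREATE VIEW topic_status AS
SELECT r.user_id, r.topic_id, r.last_read_answer_id,
       f.written_status, f.is_favorite
FROM topic_read_status r
JOIN topic_flags f USING (user_id, topic_id);
```

Updates to last_read_answer_id then rewrite only the small hot rows and maintain only one index, while readers keep querying topic_status unchanged.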
[PERFORM] Sorting by an arbitrary criterion
Suppose I have a large table with a small-cardinality CATEGORY column (say, categories 1..5). I need to sort by an arbitrary (i.e. user-specified) mapping of CATEGORY, something like this:

  1 => 'z'
  2 => 'a'
  3 => 'b'
  4 => 'w'
  5 => 'h'

So when I get done, the sort order should be 2,3,5,4,1. I could create a temporary table with the category-to-key mapping, but is there any way to do this in a single SQL statement?

Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
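One way to do it in a single statement (a sketch; the table name is hypothetical) is to join against an inline VALUES list carrying the user-specified ordering -- for the mapping above, alphabetical order of the labels gives sort keys 2,3,5,4,1:

```sql
SELECT t.*
FROM my_table t
JOIN (VALUES (2, 1), (3, 2), (5, 3), (4, 4), (1, 5))
     AS ord(category, sort_key)
  ON t.category = ord.category
ORDER BY ord.sort_key;

-- Or, without a join, a CASE expression in the ORDER BY:
SELECT *
FROM my_table
ORDER BY CASE category
           WHEN 2 THEN 1 WHEN 3 THEN 2 WHEN 5 THEN 3
           WHEN 4 THEN 4 WHEN 1 THEN 5
         END;
```

The VALUES form is easier to generate programmatically when the mapping comes from user input; the CASE form avoids the join entirely.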
[PERFORM] Used computers?
Apologies for a slightly off-topic question ... a friend is overseeing the demise of a company and has several computers that they need to get rid of. She's an attorney and knows little about them except that they're IBM and cost $50K originally. Where does one go to sell equipment like this, and/or get a rough idea of its worth? Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] Per-database warm standby?
8.4 has vastly improved the warm-standby features, but it looks to me like this is still an installation-wide backup, not a per-database backup. That is, if you have (say) a couple hundred databases, and you only want warm-backup on one of them, you can't do it (except using other solutions like Slony). Is that right? Thanks, Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Number of tables
Fabio La Farcioli wrote: i am developing a web app for thousands of users (1,000-2,000). Each user has 2 tables of work... so I would finally have 2,000 (users) x 2 tables = 4,000 tables! Does Postgres support such a large number of tables? Will I have performance problems?

We have run databases with over 100,000 tables with no problems.

However, we found that it's not a good idea to have a table-per-user design. As you get more users, it becomes hard to maintain the database. Most of the time there are only a few users active.

So, we create a single large archive table, identical to the per-user table except that it also has a user-id column. When a user hasn't logged in for a few hours, a cron process copies their tables into the large archive table, and returns their personal tables to a pool of available tables.

When the user logs back in, a hidden part of the login process gets a table from the pool of available tables, assigns it to this user, and copies the user's data from the archive into this personal table. They are now ready to work. This whole process takes just a fraction of a second for most users.

We keep a pool of about 200 tables, which will automatically expand (create more tables) if needed, but we've never had more than 200 users active at one time.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
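A sketch of the archive/pool cycle described above (all names here are hypothetical; a real system would also track table-to-user assignments in a small catalog table):

```sql
-- Cron job: archive a dormant user's data and recycle the table
INSERT INTO archive_table (user_id, col1, col2)
SELECT 42, col1, col2 FROM work_table_0042;
TRUNCATE work_table_0042;                       -- back to the pool

-- Login: claim a pooled table and restore the user's data
INSERT INTO work_table_0042 (col1, col2)
SELECT col1, col2 FROM archive_table WHERE user_id = 42;
DELETE FROM archive_table WHERE user_id = 42;
```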
Re: [PERFORM] Number of tables
Greg Stark wrote: What you want is a multi-column primary key where userid is part of the key. You don't want to have a separate table for each user unless each user has their own unique set of columns.

Not always true.

When the user logs back in, a hidden part of the login process gets a table from the pool of available tables, assigns it to this user, and copies the user's data from the archive into this personal table. They are now ready to work. This whole process takes just a fraction of a second for most users.

And what does all this accomplish?

The primary difference is between

  delete from big_table where userid = xx

versus

  truncate user_table

There are also significant differences in performance for large inserts, because a single-user table almost never needs indexes at all, whereas a big table for everyone has to have at least one user-id column that's indexed.

In our application, the per-user tables are hitlists -- scratch lists that are populated something like this:

  create table hitlist_xxx (
    row_id    integer,
    sortorder integer default nextval('hitlist_seq_xxx')
  );

  truncate table hitlist_xxx;
  select setval('hitlist_seq_xxx', 1, false);
  insert into hitlist_xxx (row_id)
    (select some_id from ... where ... order by ...);

Once the hitlist is populated, the user can page through it quickly with no further searching, e.g. using a web app. We tested the performance using a single large table in Postgres, and it was not nearly what we needed. These hitlists tend to be transitory, and the typical operation is to discard the entire list and create a new one. Sometimes the user will sort the entire list based on some criterion, which also requires a copy/delete/re-insert using a new order-by. With both Oracle and Postgres, truncate is MUCH faster than delete, and the added index needed for a single large table only makes it worse. 
With Postgres, the repeated large delete/insert makes for tables that need a lot of vacuuming and index bloat, further hurting performance. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Using Gprof with Postgresql
Pierre Frédéric Caillaud wrote: I just compiled it with gcc and it produces the gmon.out file for every process; by the way, I am running the script below in order to produce readable .out files: gprof .../pgsql/bin/postgres gmon.out > createtable2.out Is postgres the right executable? regards reydan

Off topic, but have you tried oprofile? It's excellent...

I find valgrind to be an excellent profiling tool. It has the advantage that it runs on an unmodified executable (using a virtual machine). You can compile postgres the regular way, start the system up, and then create a short shell script called postgres that you put in place of the original executable, which invokes valgrind on the original executable. Then when postgres starts up your backend, you have just one valgrind process running, rather than the whole Postgres system.

Valgrind does 100% tracing of the program rather than statistical sampling, and since it runs in a pure virtual machine, it can detect almost all memory corruption and leaks. The big disadvantage of valgrind is that it slows the process WAY down, like by a factor of 5-10 on CPU. For a pure CPU process, it doesn't screw up your stats, but if a process is mixed CPU and I/O, the CPU will appear to dominate.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High CPU load on Postgres Server during Peak times!!!!
Dave Dutcher wrote: You need a COMMIT for every BEGIN. If you just run a SELECT statement without first beginning a transaction, then you should not end up with a connection that is Idle in Transaction. If you are beginning a transaction, doing a select, and then not committing, then yes that is a bug.

The BEGIN can be hidden, though. For example, if the application is written in Perl,

  $dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0});

will automatically start a transaction the first time you do anything. Under the covers, the Perl DBI issues the BEGIN for you, and you have to do an explicit

  $dbh->commit();

to commit it.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] High CPU load on Postgres Server during Peak times!!!!
Gerhard Wiesinger wrote: Hello Craig, Are you sure this is correct? The test program (see below) with autocommit=0 counts up when an insert is done in another session and there is no commit done. I think with each new select a new implicit transaction is done when no explicit BEGIN has been established.

Sorry, I should have been more specific. A transaction starts when you do something that will alter data in the database, such as insert, update, alter table, create sequence, and so forth. The Perl DBI won't start a transaction for a select. But my basic point is still valid: some languages like Perl can implicitly start a transaction, so if programmers aren't familiar with this behavior, they can accidentally create long-running transactions.

Craig

Can one confirm this behavior? Thnx. Ciao, Gerhard

  # Disable autocommit!
  my $dbh = DBI->connect($con, $dbuser, $dbpass, {RaiseError => 1, AutoCommit => 0})
      || die "Unable to access Database '$dbname' on host '$dbhost' as user '$dbuser'. Error returned was: " . $DBI::errstr;
  my $sth = $dbh->prepare('SELECT COUNT(*) FROM employee;');
  for (;;) {
      $sth->execute();
      my ($count) = $sth->fetchrow();
      print "count=$count\n";
      $sth->finish();
      # $dbh->commit;
      sleep(3);
  }
  $dbh->disconnect;

-- http://www.wiesinger.com/

On Thu, 24 Sep 2009, Craig James wrote: Dave Dutcher wrote: You need a COMMIT for every BEGIN. If you just run a SELECT statement without first beginning a transaction, then you should not end up with a connection that is Idle in Transaction. If you are beginning a transaction, doing a select, and then not committing, then yes that is a bug. The BEGIN can be hidden, though. For example, if the application is written in Perl, $dbh = DBI->connect($dsn, $user, $pass, {AutoCommit => 0}); will automatically start a transaction the first time you do anything. Under the covers, the Perl DBI issues the BEGIN for you, and you have to do an explicit $dbh->commit(); to commit it. 
Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Bad performance of SELECT ... where id IN (...)
Xia Qingran wrote: Hi, I have a big performance problem in my SQL select query: select * from event where user_id in (500,499,498,497,496,495,494,493,492,491,490,489,488,487,486,485,484,483,482,481,480,479,478,477,476,475,474,473,472,471,470,469,468,467,466,465,464,463,462,461,460,459,458,457,456,455,454,453,452,451,450,449,448,447,446,445,444,443,442,441,440,439,438,437,436,435,434,433,432,431,430,429,428,427,426,425,424,423,422,421,420,419,418,417,416,415,414,413,412,411,410,409,408,407,406,405,404,403,402,401, 400,399,398,397,396,395,394,393,392,391,390,389,388,387,386,385,384,383,382,381,380,379,378,377,376,375,374,373,372,371,370,369,368,367,366,365,364,363,362,361,360,359,358,357,356,355,354,353,352,351,350,349,348,347,346,345,344,343,342,341,340,339,338,337,336,335,334,333,332,331,330,329,328,327,326,325,324,323,322,321,320,319,318,317,316,315,314,313,312,311,310,309,308,307,306,305,304,303,302,301, 300,299,298,297,296,295,294,293,292,291,290,289,288,287,286,285,284,283,282,281,280,279,278,277,276,275,274,273,272,271,270,269,268,267,266,265,264,263,262,261,260,259,258,257,256,255,254,253,252,251,250,249,248,247,246,245,244,243,242,241,240,239,238,237,236,235,234,233,232,231,230,229,228,227,226,225,224,223,222,221,220,219,218,217,216,215,214,213,212,211,210,209,208,207,206,205,204,203,202,201, 200,199,198,197,196,195,194,193,192,191,190,189,188,187,186,185,184,183,182,181,180,179,178,177,176,175,174,173,172,171,170,169,168,167,166,165,164,163,162,161,160,159,158,157,156,155,154,153,152,151,150,149,148,147,146,145,144,143,142,141,140,139,138,137,136,135,134,133,132,131,130,129,128,127,126,125,124,123,122,121,120,119,118,117,116,115,114,113,112,111,110,109,108,107,106,105,104,103,102,101, 
100,99,98,97,96,95,94,93,92,91,90,89,88,87,86,85,84,83,82,81,80,79,78,77,76,75,74,73,72,71,70,69,68,67,66,65,64,63,62,61,60,59,58,57,56,55,54,53,52,51,50,49,48,47,46,45,44,43,42,41,40,39,38,37,36,35,34,33,32,31,30,29,28,27,26,25,24,23,22,21,20,19,18,17,16,15,14,13,12,11,10,9,8,7,6,5,4,3,2,1,0); The above SELECT always spends 1200ms.

If your user_id is always in a narrow range like this, or even in any range that is a small fraction of the total, then add a range condition, like this:

  select * from event
  where user_id <= 500 and user_id >= 0
    and user_id in (...)

I did this exact same thing in my application and it worked well.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best suiting OS
Scott Marlowe wrote: Personally, I use Fedora, and my servers have been quite stable. One of our main web servers running Fedora: It's not that there can't be stable releases of FC, it's that stability is not the focus of that project. So, if you get lucky, great! I can't imagine running a production DB on FC, with its short supported life span and focus on development and not stability.

I use Fedora, and it was a mistake. I am looking for a better solution. Fedora has been very stable (uptime of 430 days on one server), BUT...

Realistically, the lifetime of a release is as low as SIX MONTHS. We bought servers just as a FC release was coming out, and thought we'd be safe by going with the older, tested release. But six months after that, the next FC release came out, and the version we'd installed fell off the support list.

It takes almost no time with Fedora to run into big problems. Maybe there's a security release of ssh; you try to compile it, but it needs the latest gcc, which isn't available on the unsupported version of FC you installed less than a year ago. Or maybe you need a new version of PHP to pass audit with your credit-card processor, but again, your FC release isn't supported, so you have to uninstall the FC PHP, get the source, and compile PHP from scratch ... on and on it goes.

Fedora is a very nice project, but it's not suitable for production database servers.

This discussion has been very helpful indeed, and we appreciate everyone's contributions. I'm leaning towards a stable Debian release for our next upgrade, but there are several other well-reasoned suggestions here.

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] updating a row in a table with only one row
Merlin Moncure wrote: On Tue, Oct 6, 2009 at 10:59 AM, Michal Vitecek f...@mageo.cz wrote: Merlin Moncure wrote: On Mon, Oct 5, 2009 at 5:17 AM, Michal Vitecek f...@mageo.cz wrote: Could the problem be the HW RAID card? There's a ServeRAID 8k with 256MB with write-back enabled. Could it be that its internal cache becomes full and all disk I/O operations are delayed until it writes all changes to hard drives?

that's possible...the red flag is going to be iowait. if your server can't keep up with the sync demands for example, you will eventually outrun the write cache and you can start to see slow queries. With your server, though, it would take hundreds of (write) transactions per second, minimum, to do that.

The problem is that the server is not loaded in any way. The iowait is 0.62%, there's only 72 sectors written/s, but the maximum await that I saw was 28ms (!). Any attempts to reduce the time (I/O schedulers, disabling bgwriter, increasing number of checkpoints, decreasing shared buffers, disabling read cache on the card etc.) didn't help. After some 3-5m there occurs a COMMIT which takes 100-1x longer than usual. Setting fsync to off temporarily improved the COMMIT times considerably, but I fear to have this option off all the time. Is anybody else using the same RAID card? I suspect the problem lies somewhere between the aacraid module and the card. The aacraid module ignores setting of the 'cache' parameter to 3 -- this should completely disable the SYNCHRONIZE_CACHE command.

I think you're right. One thing you can do is leave fsync on but disable synchronous_commit. This is a compromise between fsync on/off (data remains consistent following a crash, but you may lose some transactions). We need to know what iowait is at the precise moment you get the long commit time. Run top with a short update interval (like 0.25 seconds), and watch. top(1) has a batch mode (-b) that's useful for sending results to a file. 
Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Query plan for NOT IN
Kevin Grittner wrote: Which leaves the issue open -- a flexible way to flag the *reason* (or *reasons*) for the absence of a value could be a nice enhancement, if someone could invent a good implementation. Of course, one could always add a column to indicate the reason for a NULL; and perhaps that would be as good as any scheme to attach reason flags to NULL. You'd just have to make sure the reason column was NULL-capable for those rows where there *was* a value, which would make the reason "not applicable".

I'd argue that this is just a special case of a broader problem of metadata: data about the data. For example, I could have a temperature, 40 degrees, and an error bound, +/- 0.25 degrees. Nobody would think twice about making these separate columns. I don't see how this is any different from a person's middle initial of NULL, plus a separate column indicating "not known" versus "doesn't have one" if that distinction is important.

There are many examples like this, where a simple value in one column isn't sufficient, so another column contains metadata that qualifies or clarifies the information. NULL is just one such case.

But this should probably be on an SQL discussion board, not PG performance...

Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
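A sketch of the extra-column idea (a hypothetical schema, just to make the point concrete):

```sql
CREATE TABLE person (
    id             serial PRIMARY KEY,
    middle_initial char(1),   -- NULL when no value is stored
    -- Metadata qualifying the NULL; itself NULL ("not applicable")
    -- on rows where middle_initial actually holds a value.
    mi_reason      text CHECK (mi_reason IN ('not known', 'has none'))
);
```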