Re: [PERFORM] pg_dump and pg_restore
On Mon, May 17, 2010 at 12:04 AM, Jayadevan M <jayadevan.maym...@ibsplc.com> wrote:
> Hello all,
> I was testing how much time a pg_dump backup would take to get restored. Initially, I tried it with psql (on a backup taken with pg_dumpall). It took me about one hour. I felt that I should target a recovery time of 15 minutes to half an hour. So I went through the blogs/documentation etc. and switched to pg_dump and pg_restore. I tested only the database with the maximum volume of data (about 1.5 GB). With
>     pg_restore -U postgres -v -d PROFICIENT --clean -Fc proficient.dmp
> it took about 45 minutes. I tried it with
>     pg_restore -U postgres -j8 -v -d PROFICIENT --clean -Fc proficient.dmp
> Not much improvement there either. Have I missed something, or will 1.5 GB of data on a machine with the following configuration take about 45 minutes? There is nothing else running on the machine consuming memory or CPU. Out of 300-odd tables, about 10 tables have millions of records; the rest all have a few thousand records at most. Here are the specs (a PC-class machine):
> PostgreSQL 8.4.3 on i686-pc-linux-gnu
> CentOS release 5.2
> Intel(R) Pentium(R) D CPU 2.80GHz
> 2 GB RAM
> Storage is local disk.
> PostgreSQL parameters (what I felt are relevant):
> max_connections = 100
> shared_buffers = 64MB
> work_mem = 16MB
> maintenance_work_mem = 16MB
> synchronous_commit = on

Do the big tables have lots of indexes? If so, you should raise maintenance_work_mem.

Peter

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
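The maintenance_work_mem advice above might translate into an invocation like the following (a sketch only; the dump name and database are from the post, the 512MB figure is an illustrative assumption, and this needs a live server to run):

```shell
# Raise maintenance_work_mem for the restore session only, via PGOPTIONS,
# so index rebuilds get more memory without editing postgresql.conf.
# On a 2-core Pentium D, -j 2 is a more realistic parallelism level than -j 8,
# since parallel restore is limited by available CPU cores and disk bandwidth.
PGOPTIONS="-c maintenance_work_mem=512MB" \
    pg_restore -U postgres -j 2 -v -d PROFICIENT --clean -Fc proficient.dmp
```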
Re: [PERFORM] postgresql is slow with larger table even it is in RAM
On Tue, Mar 25, 2008 at 3:35 AM, sathiya psql <[EMAIL PROTECTED]> wrote:
> Dear Friends,
> I have a table with 32 lakh records in it. Table size is nearly 700 MB, and my machine has 1 GB + 256 MB RAM. I created a tablespace in RAM, and then created this table in that tablespace. So now everything is in RAM. If I do a count(*) on this table it returns 327600 in 3 seconds. Why is it taking 3 seconds? I am sure that no disk I/O is happening (using vmstat I confirmed that no disk I/O is happening, and swap is also not used). Any idea on this? I searched a lot in newsgroups but can't find anything relevant (because everywhere they are talking about disk access speed, and here I don't need to worry about disk access). If required I will give more information on this.

Two things:

- Are you VACUUM'ing regularly? It could be that you have a lot of dead rows and the table is spread out over a lot of pages of mostly dead space. That would cause *very* slow seq scans.

- What is your shared_buffers set to? If it's really low, postgres could be constantly swapping from ram-disk to memory. Not much would be cached, and performance would suffer.

FWIW, I did a select count(*) on a table with just over 30 rows, and it only took 0.28 sec.

Peter
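The dead-row theory above can be checked quickly (the table name is hypothetical, and the statistics columns assume PostgreSQL 8.3 or later; this needs a live server):

```sql
-- A large n_dead_tup relative to n_live_tup suggests bloat that
-- VACUUM (or, for severe cases, VACUUM FULL / CLUSTER) would address.
SELECT relname, n_live_tup, n_dead_tup
  FROM pg_stat_user_tables
 WHERE relname = 'mytable';

-- VACUUM VERBOSE also reports removable dead row versions and page counts:
VACUUM VERBOSE mytable;
```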
Re: [PERFORM] Anyone using a SAN?
> Dell acquired EqualLogic last November/December. I noticed your Dell meeting was a Dell/EMC meeting. Have you talked to them or anyone else about EqualLogic?

Now that you mention it, I do recall a bit about EqualLogic in the Dell pitch. It didn't really stand out in my mind, and a lot of the technical details were similar enough to the EMC details that they just melded together in my mind.

> When I was looking at iSCSI solutions, the EqualLogic was really slick. Of course, I needed high-end performance, which of course came at a steep price, and the project got canned. Oh well. Still, the EL solution claimed near-linear scalability when additional capacity/shelves were added. And they have a lot of really nice technologies for managing the system.

If you think EqualLogic is slick, check out 3par. They've got a lot of very cool features and concepts. Unfortunately, this comes at a higher price. To each his own, I guess.

Our meetings didn't focus a lot on scalability of capacity, as we just didn't think to ask. I think the basic pitch was "it scales well", without any real hard data.

Peter
Re: [PERFORM] What is the best way to storage music files in Postgresql
> I am going to embark on building a music library using Apache, PostgreSQL and PHP. What is the best way to store the music files?

> Your options are either to use a BLOB within the database or to store paths to normal files in the file system in the database. I suspect using normal files will make backup and management a great deal easier than using in-database BLOBs, so personally I'd do it that way.

I discussed something like this with some co-workers recently, and here's what I had to say. Not all of these points apply to the original message, but they are things to consider when marrying a database to a file storage system.

Storing the files in the database as BLOBs:

Pros:
- The files can always be seen by the database system as long as it's up (there's no dependence on an external file system).
- There is one set of locking mechanisms, meaning that file operations can be atomic with database operations.
- There is one set of permissions to deal with.

Cons:
- There is almost no way to access the files outside of the database. If the database goes down, you are screwed.
- If you don't make good use of tablespaces and put BLOBs on a separate disk system, the disk could thrash going between data and BLOBs, affecting performance.
- There are stricter limits for PostgreSQL BLOBs (a 1 GB size limit, I've read).

Storing files externally, storing pathnames in the database:

Pros:
- You can access and manage files from outside the database, possibly using different interfaces.
- There's a lot less to store directly in the database.
- You can use existing file-system permissions, mechanisms, and limits.

Cons:
- You are dealing with two storage systems and two different locking systems which are unlikely to play nice with each other. Transactions are not guaranteed to be atomic (e.g. a database rollback will not roll back a file-system operation, and a commit will not guarantee that data in a file will stay).
- The file system has to be seen by the database system and any remote clients that wish to use your application, meaning that a networked FS is likely to be used (depending on how many clients you have and how you like to separate services), with all the fun that comes from administering one of those. Note that this one in particular really only applies to enterprise-level installations, not smaller installations like the original poster's.
- If you don't put the files on a separate disk system or networked FS, you can get poor performance from the disk thrashing between the database and the files.

There are a couple of main points:

1. The favorite answer in computing, "it depends", applies here. What you decide depends on your storage system, your service and installation policies, and how important fully atomic transactions are to you.
2. If you want optimal performance out of either of these basic models, you should make proper use of separate disk systems. I have no idea which one is faster (it depends, I'm sure), nor do I have much of an idea of how to benchmark this properly.

Peter
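The two models above can be sketched as minimal schemas (table and column names are illustrative, not from the thread):

```sql
-- (a) In-database storage: the file contents live in a bytea column,
--     so backups and transactions cover the data itself.
CREATE TABLE songs_blob (
    id    serial PRIMARY KEY,
    title text  NOT NULL,
    data  bytea NOT NULL          -- the file contents, in the row
);

-- (b) External storage: only a pathname is stored; the file lives in
--     the (possibly networked) file system and is managed separately.
CREATE TABLE songs_path (
    id    serial PRIMARY KEY,
    title text NOT NULL,
    path  text NOT NULL UNIQUE    -- e.g. '/srv/music/album/track.ogg'
);
```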
Re: [PERFORM] What is the best way to storage music files in Postgresql
> It seems to me that as such a database gets larger, it will become much harder to manage with the 2 systems. I am talking mostly about music, so each song should not get too large.

I was just talking about points to consider in general. Getting to your specific situation...

As far as BLOBs vs. file pointers: test it out, and use what you're most comfortable using. I would not set up a networked file system for the sole purpose of managing and storing files a database will point to. If you already have access to a networked file system, consider that as an option, but don't add more work for yourself if you don't have to. Many applications I work on use the database to store pathnames while the files themselves are stored in a networked file system. It's honestly not a huge pain to manage this if it's already available, but as I mentioned before, there are caveats.

Also, in my experience, the amount of management you do in a database doesn't directly depend on the amount of data you put in. In other words, your database shouldn't become much more difficult to manage over time if all you are doing is adding more rows to tables.

> I have read a lot on this list and on other resources, and there seems to be a leaning toward 1+0 RAIDs for storage. It seems to be the most flexible when it comes to speed, redundancy and recovery time. I do want my database to be fully atomic. I think that is important as this database grows. Are my assumptions wrong?

As far as RAID levels go, RAID 10 is usually optimal for databases, so your assumptions are correct. The extra cost for disks, I believe, is paid off by the advantages you mentioned, at least for typical database-related workloads. RAID 0 doesn't allow for any disaster recovery, RAID 1 is OK as long as you can handle having only 2 disks available, and RAID 5 and RAID 6 are just huge pains and terribly slow for writes. Note that you should go for a battery-backed cache if you use hardware RAID.

Hope this helps.

Peter
Re: [PERFORM] Anyone using a SAN?
Hi all,

I had a few meetings with SAN vendors, and I thought I'd give you some follow-up on points of potential interest.

- Dell/EMC: The representative was like the Dell dude grown up. The sales pitch mentioned price point about twenty times (to the point where it was annoying), and the pitch ultimately boiled down to "Dude, you're getting a SAN." My apologies in advance for bringing back repressed memories of the Dell dude. As far as technical stuff goes, it's about what you'd expect from a low-end SAN. The cost was in the $2-3 per GB range if you went with the cheap option... not terrible, but not great either, especially since you'd have to buy lots of GB. Performance numbers weren't bad, but they weren't great either.

- 3par: The sales pitch was more focused on technical aspects and only mentioned price point twice... which is a win in my book, at least compared to Dell. Their real place to shine was the technical side. Whereas Dell just wanted to sell you a storage system that you put on a network, 3par wanted to sell you a storage system specifically designed for a network, and to change the very way you think about storage. They had a bunch of cool management concepts, and very advanced failover, power-outage, and backup techniques and tools. Performance wasn't shabby either; for instance, a RAID 5 set could get about 90% of the IOPS and transfer rate that a RAID 10 set could. How exactly this compares to DAS, they didn't say. The main stumbling block with 3par is price. While they didn't give any specific numbers, best estimates put a SAN in the $5-7 per GB range. The extra features just might be worth it, though.

- LeftHand: This is going to be an upcoming meeting, so I don't have as good of an opinion. Looking at their website, they seem closer to the Dell end in terms of price and functionality. I'll keep you in touch as I have more info. They seem good for entry-level SANs, though.

Luckily, almost everything here works with Linux (at least the major distros), including the management tools, in case people were worried about that.

One of the key points to consider going forward is that the competition between iSCSI and Fibre Channel will likely bring prices down in the future. While SANs are certainly more expensive than their DAS counterparts, the gap appears to be closing.

However, to paraphrase a discussion between a few of my co-workers: you can buy toilet paper or kitty litter in huge quantities because you know you'll eventually use it, and it doesn't change in performance or basic functionality. Storage is just something that you don't always want to buy a lot of in one go. It will get bigger, and cheaper, and probably faster, in a relatively short amount of time.

The other thing is that you can't really get a small SAN. The minimum is usually in the multiple-TB range (and usually 10 TB). I'd love to be able to put together a proof of concept and a test using 3par's technology and commodity 80 GB slow disks, but I really can't. You're stuck with going all-in right away, and enough people have had problems being married to specific techs or vendors that it's really hard to shake that uneasiness.

Thanks for reading; hopefully you found it slightly informative.

Peter
[PERFORM] disabling an index without deleting it?
This might be a weird question... is there any way to disable a particular index without dropping it?

There are a few queries I run where I'd like to test out the effects of having (and not having) different indexes on particular query plans and performance. I'd really prefer not to have to drop and ultimately recreate a particular index, as some of the data sets are quite large. So, is there any way to do this, or at least mimic this sort of behavior?

Peter

---(end of broadcast)---
TIP 4: Have you searched our list archives?

               http://archives.postgresql.org
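One commonly suggested workaround for the question above (not from this thread, and the index and query names are hypothetical) is to drop the index inside a transaction and roll back: the plans inside the transaction are built without the index, but the rollback restores it. Note that the table stays exclusively locked until the ROLLBACK, so this only makes sense on a test system:

```sql
BEGIN;
DROP INDEX my_test_index;          -- hypothetical index name
EXPLAIN ANALYZE
    SELECT * FROM my_table WHERE some_col = 42;
ROLLBACK;                          -- the index is back, nothing was lost
```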
Re: [PERFORM] Anyone using a SAN?
> That's true about SANs in general. You don't buy a SAN because it'll cost less than just buying the disks and a controller. You buy a SAN because it'll make managing it easier. The break-even point has more to do with how many servers you're able to put on the SAN and how often you need to do tricky backup and upgrade procedures than it does with the hardware.

One big reason we're really looking into a SAN option is that we have a lot of unused disk space. A typical disk usage scheme for us is 6 GB for a clean Linux install and 20 GB for a Windows install. Our disks are typically 80 GB, and even after decent amounts of usage we're not even approaching half that. We install a lot of software in AFS, our networked file system, and users' home directories and project directories are in AFS as well. Local disk space is relegated to the OS and vendor software, servers that need it, and seldom-used scratch space.

There might very well be a break-even point for us in terms of cost. One of the other things I was interested in was the hidden costs of a SAN. For instance, we'd probably have to invest in more UPS capacity to protect our data. Are there any other similar points that people don't initially consider regarding a SAN?

Again, thanks for all your help.

Peter
[PERFORM] Anyone using a SAN?
Hi all,

We're considering setting up a SAN where I work. Is there anyone using a SAN, for Postgres or other purposes? If so, I have a few questions for you.

- Are there any vendors to avoid or ones that are particularly good?
- What performance or reliability implications exist when using SANs?
- Are there any killer features with SANs compared to local storage?

Any other comments are certainly welcome.

Peter
Re: [PERFORM] Anyone using a SAN?
Thanks for all your input; it is very helpful. A SAN for our Postgres deployment is probably sufficient in terms of performance, because we just don't have that much data. I'm a little concerned about the needs of user and research databases, but if a project needs a big, fast database, it might be wise to have them shell out for DAS.

My co-workers and I are meeting with a vendor in two weeks (3par, specifically), and I think I have a better idea of what I should be looking at. I'll keep you all up to date on the situation. Keep the ideas coming, as I still would like to know of any other important factors.

Thanks again.

Peter
Re: [PERFORM] Join Query Perfomance Issue
> I have serious performance problems with the following type of queries: It doesn't look too bad to me, but I'm not that deep into SQL query optimization. However, this type of query is used in a function to access a normalized, partitioned database, so better performance in these queries would speed up the whole database system big time. Any suggestions here would be great. I already tested some things, using INNER JOIN and rearranging the order of the tables, but saw only minor changes in the runtime; the version above seemed to give us the best performance.

Can you send the table definitions of the tables involved in the query, including index information? Maybe if we look hard enough we can find something.

Peter
Re: [PERFORM] TB-sized databases
Thanks all. This is just what I needed.

On Nov 26, 2007 1:16 PM, Stephen Cook <[EMAIL PROTECTED]> wrote:
> I think either would work; both PostgreSQL and MS SQL Server have success stories out there running VLDBs. It really depends on what you know and what you have. If you have a lot of experience with Postgres running on Linux, and not much with SQL Server on Windows, of course the former would be a better choice for you. You stand a much better chance working with tools you know.
>
> Pablo Alcaraz wrote:
>> I had a client that tried to use MS SQL Server to run a 500 GB+ database. The database simply collapsed. They switched to Teradata and it is running well. This database now has 1.5 TB+. Currently I have clients using PostgreSQL with huge databases and they are happy. In one client's database the biggest table has 237 GB+ (only 1 table!) and PostgreSQL runs the database without problems using partitioning, triggers and rules (using PostgreSQL 8.2.5).
>>
>> Pablo
>>
>> Peter Koczan wrote:
>>> Hi all, I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize them (mostly hardware and config parameters, maybe a little advocacy). I can't speak on that since I don't have any DBs approaching that size. The other part of this puzzle is that he's torn between MS SQL Server (running on Windows and unsupported by us) and PostgreSQL (running on Linux... which we would fully support). If any of you have ideas of how well PostgreSQL compares to SQL Server, especially in TB-sized databases, that would be much appreciated. We're running PG 8.2.5, by the way.
>>>
>>> Peter
[PERFORM] TB-sized databases
Hi all,

I have a user who is looking to store 500+ GB of data in a database (and when all the indexes and metadata are factored in, it's going to be more like 3-4 TB). He is wondering how well PostgreSQL scales with TB-sized databases and what can be done to help optimize them (mostly hardware and config parameters, maybe a little advocacy). I can't speak on that since I don't have any DBs approaching that size.

The other part of this puzzle is that he's torn between MS SQL Server (running on Windows and unsupported by us) and PostgreSQL (running on Linux... which we would fully support). If any of you have ideas of how well PostgreSQL compares to SQL Server, especially in TB-sized databases, that would be much appreciated.

We're running PG 8.2.5, by the way.

Peter
Re: [PERFORM] Memory Settings....
I recently tweaked some configs for performance, so I'll let you in on what I changed.

For memory usage, you'll want to look at shared_buffers, work_mem, and maintenance_work_mem. Postgres defaults to very low values of these, and to get good performance and avoid a lot of disk paging, you'll want to raise them (you will need to restart the server, and possibly tweak some OS memory config to allow lots of shared_buffers; I had to raise SHMMAX on Linux, but I don't know the Windows analogue). The basic rule of thumb for shared_buffers is 25%-50% of main memory: enough to use main memory, but leaving some to allow work_mem to do its thing and to let any other programs run smoothly. Tweak this as necessary.

The other big thing is the free space map, which tracks free space and helps to prevent index bloat. A VACUUM VERBOSE in a database will tell you what these values should be set to. Go here for full details: http://www.postgresql.org/docs/8.2/static/runtime-config.html, especially http://www.postgresql.org/docs/8.2/static/runtime-config-resource.html

Peter

On 10/22/07, Lee Keel <[EMAIL PROTECTED]> wrote:
> I have a client server that is dedicated to being a Postgres 8.2.4 database server for many websites. This server will contain approximately 15 databases, each containing between 40-100 tables. Each database will have approximately 7 web applications pulling data from it, but there will probably be no more than 50 simultaneous requests. The majority of the tables will be very small, around 1K in total size. However, most of the queries will be going to the other 10-15 tables in each database that will contain PostGIS shapes. These tables will range in size from 50 to 730K rows, and each row will range in size from 2K to 3MB. The data will be truncated and reinserted as part of a nightly process, but other than that there won't be many writes during the day.
>
> I am trying to tune this server to its maximum capacity. I would appreciate any advice on any of the settings that I should look at. I have not changed any of the settings before because I have never really needed to. And even now, I have not experienced any bad performance; I am simply trying to tune the track before the train gets here.
>
> Server specification:
> Windows 2003 Enterprise R2
> Dual Quad-Core 2.33GHz
> 8GB RAM
> 263 GB HD (I am not 100% on drive speed, but I think it is 15K)
>
> Thanks in advance,
> Lee Keel
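The rules of thumb above might translate, for an 8 GB server like the one described, into something like the following postgresql.conf fragment (ballpark values only, an assumption on my part, and every workload differs; note that shared_buffers changes require a server restart):

```
# postgresql.conf sketch for a dedicated 8 GB database server
shared_buffers = 2GB            # ~25% of RAM; restart required (and on
                                # Linux, possibly a higher SHMMAX)
work_mem = 32MB                 # per sort/hash operation, per backend,
                                # so keep the connection count in mind
maintenance_work_mem = 256MB    # speeds up VACUUM and index builds
```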
Re: [PERFORM] sequence query performance issues
*light bulb* Ahhh, that's it. So I guess the solution is either to cast the column or wait for 8.3 (which isn't a problem since the port won't be done until 8.3 is released anyway).

Just a quick bit of follow-up: this query works and is equivalent to what I was trying to do (minus the randomization and limiting):

= select a.uid from generate_series(1000, 32000) as a(uid)
  where a.uid::smallint not in (select uid from people where uid is not null);

It turns out that this and using coalesce are a wash in terms of performance, usually coming within 10 ms of each other no matter what limit and ordering constraints you put on the queries.

Peter

= explain analyze select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid, 0) from people);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Function Scan on generate_series a  (cost=718.41..733.41 rows=500 width=4) (actual time=68.742..186.340 rows=26808 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on people  (cost=0.00..702.68 rows=6294 width=2) (actual time=0.025..28.368 rows=6294 loops=1)
 Total runtime: 286.311 ms
(5 rows)

= explain analyze select a.uid from generate_series(1000, 32767) as a(uid) where a.uid::smallint not in (select uid from people where uid is not null);
                                  QUERY PLAN
------------------------------------------------------------------------------
 Function Scan on generate_series a  (cost=699.34..716.84 rows=500 width=4) (actual time=58.508..177.683 rows=26808 loops=1)
   Filter: (NOT (hashed subplan))
   SubPlan
     ->  Seq Scan on people  (cost=0.00..686.94 rows=4958 width=2) (actual time=0.017..23.123 rows=4971 loops=1)
           Filter: (uid IS NOT NULL)
 Total runtime: 277.699 ms
(6 rows)
Re: [PERFORM] sequence query performance issues
> > Hmm - why is it doing that?
>
> I'm betting that the OP's people.uid column is not an integer. Existing PG releases can't use hashed subplans for cross-data-type comparisons (8.3 will be a bit smarter).

*light bulb* Ahhh, that's it. So I guess the solution is either to cast the column or wait for 8.3 (which isn't a problem since the port won't be done until 8.3 is released anyway).

Thanks again.

Peter
[PERFORM] sequence query performance issues
Hello,

I have a weird performance issue with a query I'm testing. Basically, I'm trying to port a function that generates user uids, and since Postgres offers a sequence generator function, I figured I'd take advantage of that. Basically, I generate our uid range, filter out those which are in use, and randomly pick however many I need. However, when I run it, it takes forever (10 minutes and I got nothing, so I cancelled the query) and CPU usage on the server is maxed out.

Here's my query (I'll post the EXPLAIN output later so as not to obscure my question):

= select a.uid from generate_series(1000, 32767) as a(uid)
  where a.uid not in (select uid from people)
  order by random() limit 1;

I thought that nulls were a problem, so I tried:

= select a.uid from generate_series(1000, 32767) as a(uid)
  where a.uid not in (select coalesce(uid,0) from people)
  order by random() limit 1;

And that finished in less than a second. I then tried:

= select a.uid from generate_series(1000, 32767) as a(uid)
  where a.uid not in (select coalesce(uid,0) from people where uid is not null)
  order by random() limit 1;

And we're back to taking forever. So I have 2 questions:

- Is there a better query for this purpose? Mine works when coalesced, but it seems a little brute-force, and the random() sorting, while kinda nice, is slow.
- Is this in any way expected? I know that nulls sometimes cause problems, but why is it taking forever even when trying to filter those out?

Thanks.

Peter

The gory details:

- There is a btree index on people(uid), and there are ~6300 rows, of which ~1300 have null uids.
- EXPLAIN output (I couldn't get EXPLAIN ANALYZE output from the first two queries since they took too long):

= explain select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select uid from people) order by random() limit 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=40025.57..40025.60 rows=10 width=4)
   ->  Sort  (cost=40025.57..40026.82 rows=500 width=4)
         Sort Key: random()
         ->  Function Scan on generate_series a  (cost=693.16..40003.16 rows=500 width=4)
               Filter: (NOT (subplan))
               SubPlan
                 ->  Materialize  (cost=693.16..756.03 rows=6287 width=2)
                       ->  Seq Scan on people  (cost=0.00..686.87 rows=6287 width=2)
(8 rows)

= explain select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select uid from people where uid is not null) order by random() limit 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=31486.71..31486.73 rows=10 width=4)
   ->  Sort  (cost=31486.71..31487.96 rows=500 width=4)
         Sort Key: random()
         ->  Function Scan on generate_series a  (cost=691.79..31464.29 rows=500 width=4)
               Filter: (NOT (subplan))
               SubPlan
                 ->  Materialize  (cost=691.79..741.00 rows=4921 width=2)
                       ->  Seq Scan on people  (cost=0.00..686.87 rows=4921 width=2)
                             Filter: (uid IS NOT NULL)
(9 rows)

= explain select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid, 0) from people) order by random() limit 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=756.97..756.99 rows=10 width=4)
   ->  Sort  (cost=756.97..758.22 rows=500 width=4)
         Sort Key: random()
         ->  Function Scan on generate_series a  (cost=718.30..734.55 rows=500 width=4)
               Filter: (NOT (hashed subplan))
               SubPlan
                 ->  Seq Scan on people  (cost=0.00..702.59 rows=6287 width=2)
(7 rows)

= explain analyze select a.uid from generate_series(1000, 32767) as a(uid) where a.uid not in (select coalesce(uid, 0) from people) order by random() limit 1;
                                  QUERY PLAN
------------------------------------------------------------------------------
 Limit  (cost=756.97..756.99 rows=10 width=4) (actual time=370.444..370.554 rows=10 loops=1)
   ->  Sort  (cost=756.97..758.22 rows=500 width=4) (actual time=370.434..370.472 rows=10 loops=1)
         Sort Key: random()
         ->  Function Scan on generate_series a  (cost=718.30..734.55 rows=500 width=4) (actual time=70.018..199.540 rows=26808 loops=1)
               Filter: (NOT (hashed subplan))
               SubPlan
                 ->  Seq Scan on people  (cost=0.00..702.59 rows=6287 width=2) (actual time=0.023..29.167 rows=6294 loops=1)
 Total runtime: 372.224 ms
(8 rows)
Re: [PERFORM] Tablespaces and NFS
> Anyway... One detail I don't understand --- why do you claim that "You can't take advantage of the shared file system because you can't share tablespaces among clusters or servers"???

I say that because you can't set up two servers to point to the same tablespace (i.e. you can't have server A and server B both point to the tablespace in /mnt/nfs/postgres/), which basically defeats one of the main purposes of using a shared file system: seeing, using, and editing files from anywhere.

This is ill-advised and probably won't work, for 2 reasons:

- Postgres tablespaces require empty directories for initialization. If you create a tablespace on server A, it puts files in the previously empty directory. If you then try to create a tablespace on server B pointing to the same location, it won't work since the directory is no longer empty. You can get around this, in theory, but you'd either have to directly mess with system tables or fool Postgres into thinking that each server independently created that tablespace (to which anyone will say, NO).

- If you do manage to fool Postgres into having two servers pointing at the same tablespace, the servers really, REALLY won't play nice with these shared resources, since they have no knowledge of each other (I mean, even two clusters on the same server don't play nice with memory). Basically, if they compete for the same file, either I/O will be EXTREMELY slow because of file-locking mechanisms in the file system, or you open things up to race conditions and data corruption. In other words: BAD.

I know this doesn't fully apply to you, but I thought I should explain my points better since you asked so nicely :-)

> This seems to be the killer point --- mainly because the network connection is 100Mbps (around 10 MB/sec --- less than 1/4 of the performance we'd expect from an internal hard drive). If at least it was a Gigabit connection, I might still be tempted to retry the experiment.
> I was thinking that *maybe* the latencies and contention due to head movements (on the order of milliseconds) would take precedence and thus a network-distributed cluster of hard drives would end up winning.

If you get decently fast disks, or put some slower disks in RAID 10, you'll easily get 100 MB/sec (and that's a conservative estimate). Even with a Gbit network, you'll get, in theory, 128 MB/sec, and that's assuming that the NFS'd disks aren't a bottleneck.

> We're clear that that would be the *optimal* solution --- problem is, there's a lot of client-side software that we would have to change; I'm first looking for a transparent solution in which I could distribute the load at a hardware level, seeing the DB server as a single entity --- the ideal solution, of course, being the use of tablespaces with 4 or 6 *internal* hard disks (but that's not an option with our current web hoster).

I sadly don't know enough networking to tell the client software "no really, I'm over here." However, one of the things I'm fond of is using a module to store connection strings, and dynamically loading said module on the client side. For instance, with Perl I use:

    use DBI;
    use DBD::Pg;
    use My::DBs;

    my $dbh = DBI->connect($My::DBs::mydb);

Assuming that the module and its entries are kept up to date, it will just work. That way, there's only 1 module to change instead of n client apps, and I can have a new server with a new name up without changing any client code.

> Anyway, I'll keep working on alternative solutions --- I think I have enough evidence to close this NFS door.

That's probably for the best.
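The empty-directory point above can be seen directly (the tablespace name and path are illustrative; this needs a live server and an existing directory owned by the postgres user):

```sql
-- On server A, this succeeds against an empty directory:
CREATE TABLESPACE shared_ts LOCATION '/mnt/nfs/postgres';
-- Attempting the same statement on server B then fails, because
-- server A has already populated the directory, and tablespace
-- initialization requires it to be empty.
```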
Re: [PERFORM] Tablespaces and NFS
On 9/19/07, Carlos Moreno <[EMAIL PROTECTED]> wrote:
> Hi, has anyone tried a setup combining tablespaces with NFS-mounted partitions? I'm considering the idea as a performance booster --- our problem is that we are renting our dedicated server from a hoster that does not offer much flexibility in terms of custom hardware configuration; so the *ideal* alternative, to load the machine with 4 or 6 hard drives and use tablespaces, is off the table (no pun intended).
>
> We could, however, set up a few additional servers where we could configure NFS shares, mount them on the main PostgreSQL server, and configure tablespaces to load-balance the access to disk. Would you estimate that this will indeed boost performance? (Our system does lots of writing to the DB --- in all forms: inserts, updates, and deletes.)
>
> As a corollary question: what about the WALs and tablespaces? Are the WALs distributed when we set up a tablespace and create tables in it? (That is, are the WALs corresponding to the tables in a tablespace stored in the directory corresponding to the tablespace? Or is it only the data, and the WAL keeps being the one and only?)
>
> Thanks,
> Carlos

About 5 months ago, I did an experiment serving tablespaces out of AFS, another shared file system. You can read my full post at http://archives.postgresql.org/pgsql-admin/2007-04/msg00188.php

On the whole, you're not going to see a performance improvement running tablespaces on NFS (unless the disk system on the NFS server is a lot faster), since you have to go through the network as well as NFS, both of which add overhead. Also, locking mechanisms on shared file systems usually don't play nice with databases. You're better off using something else to load-balance or replicate data.

Peter

P.S. Why not just set up those servers you're planning on using as NFS shares as your Postgres server(s)?