Re: [PERFORM] PostgreSQL and Ultrasparc T1
Jignesh, Juan says the following below: "I figured the number of cores on the T1000/2000 processors would be utilized by the forked copies of the postgresql server. From the comments I have seen so far it does not look like this is the case." I think this needs to be refuted. Doesn't Solaris switch processes as well as threads (LWPs, whatever) equally well amongst cores? I realize a process context switch is more expensive than a thread switch, but Solaris will utilize all cores as processes or threads become ready to run, correct? BTW, it's great to see folks with your email address on the list. I feel it points to a brighter future for all involved. Thanks, Rick

Jignesh K. Shah wrote to Juan Casero and pgsql-performance@postgresql.org on 12/19/2005 11:19 PM:

I guess it depends on what you term as your metric for measurement. If it is just one query's execution time, the UltraSPARC T1 may not be the best. But if you have more than 8 complex queries running simultaneously, the UltraSPARC T1 can do comparatively well, provided the application can scale along with it. The best way to approach this is to figure out your peak workload, find an accurate way to measure the true metric, and then design a benchmark for it and run it on both servers. Regards, Jignesh

Juan Casero wrote:

Ok. That is what I wanted to know. Right now this database is a PostgreSQL 7.4.8 system. I am using it in a sort of DSS role. I have weekly summaries of the sales for our division going back three years. I have a PHP-based webapp that I wrote to give the managers access to this data. The webapp lets them make selections for reports and then submits a parameterized query to the database for execution. The returned data rows are formatted and displayed in their web browser.
My largest sales table is about 13 million rows; along with all its indexes it takes up about 20 gigabytes. I need to scale this application up to nearly 100 gigabytes to handle daily sales summaries. Once we start looking at daily sales figures our database size could grow ten to twenty times. I use postgresql because it gives me the kind of enterprise database features I need to program the complex logic for the queries. I also need the transaction isolation facilities it provides, so I can optimize the queries in plpgsql without worrying about multiple users' temp tables colliding with each other. Additionally, I hope to rewrite the front end application in JSP, so maybe I could use the multithreaded features of Java to exploit a multicore, multi-CPU system.

There are almost no writes to the database tables. The bulk of the application is just executing parameterized queries and returning huge amounts of data. I know bizgres is supposed to be better at this, but I want to stay away from anything that is beta. I cannot afford for this thing to go wrong.

My reasoning for looking at the T1000/2000 was simply the large number of cores. I know postgresql uses a super server that forks copies of itself to handle incoming requests on port 5432. But I figured the number of cores on the T1000/2000 processors would be utilized by the forked copies of the postgresql server. From the comments I have seen so far it does not look like this is the case. We had originally sized up a dual-processor, dual-core AMD Opteron system from HP for this, but I thought I could get more bang for the buck on a T1000/2000. It now seems I may have been wrong. I am stronger in Linux than Solaris, so I am not upset; I am just trying to find the best hardware for the anticipated needs of this application.
Thanks, Juan

On Monday 19 December 2005 01:25, Scott Marlowe wrote:
From: [EMAIL PROTECTED] on behalf of Juan Casero
QUOTE: Hi - Can anyone tell me how well PostgreSQL 8.x performs on the new Sun Ultrasparc T1 processor and architecture on
Re: [PERFORM] Cheap RAM disk?
"you'd be much better served by putting a big NVRAM cache in front of a fast disk array"

I agree with the point below, but I think price was the issue of the original discussion. That said, it seems that a single high-speed spindle would give this a run for its money in both price and performance, and for the same reasons Mike points out. Maybe a SCSI 160 or 320 at 15k, or maybe even something slower. Rick

[EMAIL PROTECTED] wrote on 07/26/2005 01:33:43 PM:

On Tue, Jul 26, 2005 at 11:23:23AM -0700, Luke Lonergan wrote: "Yup - interesting and very niche product - it seems like its only obvious application is for the Postgresql WAL problem :-)"

On the contrary, it's not obvious that it is an ideal fit for a WAL. A RAM disk like this is optimized for highly random access applications, while the WAL is a single sequential writer. If you're in the kind of market that needs a really high performance WAL, you'd be much better served by putting a big NVRAM cache in front of a fast disk array than by buying a toy like this. Mike Stone

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster
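Mike's point about access patterns can be made concrete with back-of-the-envelope numbers. A rough service-time model for one 8 KB write (the seek time, rotational latency, and transfer rate below are illustrative figures for a 15k RPM class drive, not measurements):

```python
# Illustrative 15k RPM drive parameters, not any particular product's spec.
seek_ms = 3.5          # average seek
rotation_ms = 2.0      # half a rotation at 15,000 RPM (60000/15000/2)
transfer_mb_s = 70.0   # sustained sequential transfer rate
block_kb = 8.0

transfer_ms = block_kb / 1024.0 / transfer_mb_s * 1000.0

# Random write: pay seek + rotational latency + transfer on every block.
random_ms = seek_ms + rotation_ms + transfer_ms
# Sequential write (the WAL pattern): the head stays put, so the
# amortized cost is essentially just the transfer time.
sequential_ms = transfer_ms

print(f"random:     {random_ms:.3f} ms/block")
print(f"sequential: {sequential_ms:.3f} ms/block")
print(f"ratio:      {random_ms / sequential_ms:.0f}x")
```

On numbers like these, a battery-backed cache in front of ordinary spindles absorbs a sequential WAL stream easily, which is Mike's point: the RAM disk's random-access advantage buys nothing for a single sequential writer.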
Re: [PERFORM] Partitioning / Clustering
"exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5GHz Pentium 4"

Buy cheaper-ass Dells with an AMD 64 3000+. Beats the crap out of the 2.5 GHz Pentium, especially for PostgreSQL. See the "Whence the Opterons" thread for more. Rick

[EMAIL PROTECTED] wrote on 05/10/2005 10:02:50 AM:

I think that perhaps he was trying to avoid having to buy Big Iron at all. With all the Opteron v. Xeon talk around here, and talk of $30,000 machines, perhaps it would be worth exploring the option of buying 10 cheapass machines for $300 each. At the moment, that $300 buys you, from Dell, a 2.5GHz Pentium 4 with 256MB of RAM, a 40GB hard drive, and gigabit ethernet. The aggregate CPU and bandwidth is pretty stupendous, but not as easy to harness as a single machine. For those of us looking at batch and data warehousing applications, it would be really handy to be able to partition databases, tables, and processing load across banks of cheap hardware. Yes, clustering solutions can distribute the data, and can even do it on a per-table basis in some cases. This still leaves it up to the application's logic to handle reunification of the data.

Ideas:
1. Create a table/storage type that consists of a select statement on another machine. While I don't think the current executor is capable of working on multiple nodes of an execution tree at the same time, it would be great if it could offload a select of tuples from a remote table to an entirely different server and merge the resulting data into the current execution. I believe MySQL has this, and Oracle may implement it in another way.
2. There is no #2 at this time, but I'm sure one can be hypothesized.

...Google and other companies have definitely proved that one can harness huge clusters of cheap hardware. It can't be _that_ hard, can it?
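Idea #1 above — a "table" that is really a select on another machine — can at least be sketched at the application level today: fan the select out and merge the tuple streams. A toy scatter-gather, with in-memory lists standing in for the remote nodes (the node contents, the predicate, and the function names are all invented for illustration):

```python
# Each "node" stands in for a remote server holding one horizontal
# partition of a single logical table.
NODES = [
    [{"region": "east", "sales": 120}, {"region": "east", "sales": 80}],
    [{"region": "west", "sales": 200}],
]

def remote_select(node, predicate):
    # In real life this would be a query shipped to another machine;
    # here it is just a local filter over that node's rows.
    return [row for row in node if predicate(row)]

def scatter_gather(predicate):
    # Fan the select out to every node and merge the resulting tuples
    # into one stream, which is the "reunification" step the message
    # says is currently left to application logic.
    merged = []
    for node in NODES:
        merged.extend(remote_select(node, predicate))
    return merged

rows = scatter_gather(lambda r: r["sales"] >= 100)
print(rows)
```

A real implementation would also have to push the fan-out below the executor so joins and aggregates could run remotely, which is exactly the part the message notes PostgreSQL's executor cannot do.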
:)

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of John A Meinel
Sent: Tuesday, May 10, 2005 7:41 AM
To: Alex Stapleton
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Partitioning / Clustering

Alex Stapleton wrote: "What is the status of Postgres support for any sort of multi-machine scaling? What are you meant to do once you've upgraded your box and tuned the conf files as much as you can, but your query load is just too high for a single machine? Upgrading stock Dell boxes (I know we could be using better machines, but I am trying to tackle the real issue) is not a hugely price-efficient way of getting extra performance, nor particularly scalable in the long term."

Switch from Dell Xeon boxes and go to Opterons. :) Seriously, Dell is far away from Big Iron. I don't know what performance you are looking for, but you can easily get into inserting 10M rows/day with quality hardware. But actually, is it your SELECT load that is too high, or your INSERT load, or something in between? Because Slony is around if it is a SELECT problem: http://gborg.postgresql.org/project/slony1/projdisplay.php

Basically, Slony is a Master/Slave replication system. So if you have INSERTs going into the Master, you can have as many replicated slaves as you like to handle your SELECT load. Slony is an asynchronous replicator, so there is a time delay from the INSERT until it shows up on a slave, but that delay can be pretty small. This would require some application-level support, since an INSERT goes to a different place than a SELECT. But there has been some discussion about pg_pool being able to spread the query load, having it be aware of the difference between a SELECT and an INSERT, and having it route the query to the correct host. The biggest problem is that functions could cause a SELECT func() to actually insert a row, which pg_pool wouldn't know about.
There are two possible solutions: a) don't do that when you are using this system, or b) add some sort of comment hint so that pg_pool can understand that the SELECT is actually an INSERT and needs to be done on the master.

"So, when is PG meant to be getting a decent partitioning system? MySQL is getting one (eventually) which is apparently meant to be similar to Oracle's, according to the docs. Clusgres does not appear to be widely (or at all) used, and info on it seems pretty thin on the ground, so I am not too keen on going with that. Is the real solution to multi-machine partitioning (as in, not like MySQL's MERGE tables) on PostgreSQL actually doing it in our application API? This seems like a less than perfect solution once we want to add redundancy and things into the mix."

There is also PGCluster, http://pgfoundry.org/projects/pgcluster/ , which is trying to be more of a synchronous multi-master system. I haven't heard of Clusgres, so I'm guessing it is an older attempt which has been overtaken by PGCluster. Just realize that clusters don't
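The routing idea John describes, including its function-call pitfall and the proposed fix (b), fits in a few lines. A sketch of just the statement classifier: the host names are invented, and the `/*MASTER*/` hint syntax is exactly the kind of comment hint proposed above, not anything pg_pool actually implements:

```python
MASTER = "master.example.com"                 # invented host names
SLAVES = ["slave1.example.com", "slave2.example.com"]

def route(sql, slave_index=0):
    """Pick the host a statement should be shipped to."""
    text = sql.lstrip()
    # Proposal (b): an explicit hint forces the master, covering the
    # SELECT func() case where a "read" actually writes a row.
    if text.startswith("/*MASTER*/"):
        return MASTER
    if text.upper().startswith("SELECT"):
        # Plain reads can be balanced across the replicated slaves.
        return SLAVES[slave_index % len(SLAVES)]
    # INSERT/UPDATE/DELETE and anything unrecognized goes to the master.
    return MASTER

print(route("SELECT * FROM sales"))
print(route("INSERT INTO sales VALUES (1)"))
print(route("/*MASTER*/ SELECT log_hit()"))
```

Without the hint, the last statement would be misrouted to a slave, which is the biggest-problem case the message calls out.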
[PERFORM] Disk Edge Partitioning
I saw an interesting thought in another thread about placing database data in a partition that uses cylinders at the outer edge of the disk. I want to try this. Are the lower-numbered cylinders closer to the edge of a SCSI disk, or is it the other way around? What about ATA? Cheers, Rick
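Whichever way the numbering runs, the reason edge partitioning helps is zoned bit recording plus constant rotation speed: outer tracks hold more bits but pass under the head in the same rotation time. A rough model (the radii and linear bit density below are illustrative numbers, not any particular drive's geometry):

```python
import math

rpm = 10_000
inner_radius_mm = 15.0    # illustrative platter geometry
outer_radius_mm = 45.0
bits_per_mm = 20_000      # roughly constant linear density along a track

def track_mb_per_s(radius_mm):
    # Bits on the track = circumference * linear density; the whole
    # track passes the head once per rotation.
    bits_per_track = 2 * math.pi * radius_mm * bits_per_mm
    rotations_per_s = rpm / 60.0
    return bits_per_track * rotations_per_s / 8 / 1_000_000

inner = track_mb_per_s(inner_radius_mm)
outer = track_mb_per_s(outer_radius_mm)
print(f"inner track: {inner:.0f} MB/s, outer track: {outer:.0f} MB/s")
# The sequential-rate ratio is just the ratio of radii (3x here),
# which is why the outer-edge partition trick is worth trying.
```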
Re: [PERFORM] How to improve db performance with $7K?
[EMAIL PROTECTED] wrote on 04/19/2005 11:10:22 AM: "What is 'multiple initiators' used for in the real world?"

I asked this same question and got an answer off-list: somebody said their SAN hardware used multiple initiators. I would try to check the archives for you, but this thread is becoming more of a rope.

Multiple initiators means multiple sources on the bus issuing I/O instructions to the drives. In theory you can have two computers on the same SCSI bus issuing I/O requests to the same drive, or to anything else on the bus, but I've never seen this implemented. Others have noted this feature as being a big deal, so somebody is benefiting from it. Rick
Re: [PERFORM] Intel SRCS16 SATA raid?
Dave wrote: "An interesting test would be to stick several drives in a cabinet and graph how performance is affected at the different price points/technologies/number of drives."

From the discussion on the $7k server thread, it seems the RAID controller would be an important data point also. And RAID level. And application load/kind. Hmmm. I just talked myself out of it. Seems like I'd end up with something akin to those database benchmarks we all love to hate. Rick

[EMAIL PROTECTED] wrote on 04/15/2005 08:40:13 AM:

-----Original Message-----
From: Alex Turner [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 6:15 PM
To: Dave Held
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Intel SRCS16 SATA raid?

"Looking at the numbers, the Raptor with TCQ enabled was close to or beat the Atlas III 10k drive on most benchmarks. And I would be willing to bet that the Atlas 10k is not using the same generation of technology as the Raptors. Naturally a 15k drive is going to be faster in many areas, but it is also much more expensive. It was only 44% better on the server tests than the Raptor with TCQ, but it costs nearly 300% more ($538 at cdw.com vs. $180 at newegg.com)."

State that in terms of cars. Would you be willing to pay 300% more for a car that is 44% faster than your competitor's? Of course you would, because we all recognize that the cost of speed/performance does not scale linearly. Naturally, you buy the best speed that you can afford, but when it comes to hard drives, the only major feature whose price tends to scale anywhere close to linearly is capacity. Note also that the 15k drive was the only drive that kept up with the Raptor on raw transfer speed, which is going to matter for WAL. So get a Raptor for your WAL partition. ;) [...]
"The Raptor drives can be had for as little as $180/ea, which is quite a good price point considering they can keep up with their SCSI 10k RPM counterparts on almost all tests with NCQ enabled. (Note that 3ware controllers _don't_ support NCQ, although they claim their HBA-based queueing is 95% as good as NCQ on the drive.)"

Just keep in mind the points made by the Seagate article. You're buying much more than just performance for that $500+. You're also buying vibrational tolerance, high MTBF, better internal environmental controls, and a pretty significant margin on seek time, which is probably your most important feature for disks storing tables. An interesting test would be to stick several drives in a cabinet and graph how performance is affected at the different price points/technologies/number of drives.

__
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129
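Dave's car analogy is really a statement about cost per unit of performance, which the quoted numbers let you compute directly (prices from the message; "performance" here is only the 44% server-benchmark margin, nothing more):

```python
raptor_price = 180.0      # $ at newegg.com, per the message
atlas_price = 538.0       # $ at cdw.com
perf_ratio = 1.44         # Atlas 15k vs Raptor w/TCQ on the server tests

price_ratio = atlas_price / raptor_price
dollars_per_perf_raptor = raptor_price / 1.0
dollars_per_perf_atlas = atlas_price / perf_ratio

print(f"price ratio: {price_ratio:.2f}x for {perf_ratio:.2f}x performance")
print(f"$ per unit of performance: raptor {dollars_per_perf_raptor:.0f}, "
      f"atlas {dollars_per_perf_atlas:.0f}")
```

Roughly 3x the price for 1.44x the throughput, i.e. about twice the cost per unit of performance, which is exactly the "does not scale linearly" point; whether the 15k drive's other qualities (MTBF, seek margin) justify the premium is the rest of the thread.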
Re: [PERFORM] Intel SRCS16 SATA raid?
This is a different thread than the $7k server thread. Greg Stark started it and wrote: "I'm also wondering about whether I'm better off with one of these SATA raid controllers or just going with SCSI drives." Rick

[EMAIL PROTECTED] wrote on 04/15/2005 10:01:56 AM:

The original thread was "how much can I get for $7k". You can't fit a 15k RPM SCSI solution into $7K ;) Some of us are on a budget! 10k RPM SATA drives give acceptable performance at a good price; that's really the point here. I have never really argued that SATA is going to match SCSI performance on multidrive arrays for I/Os per second. But it's all about the benjamins, baby. If I told my boss we need $25k for a database machine, he'd tell me that was impossible and that I have $5k to do it. If I tell him $7k, he will swallow that. We don't _need_ the amazing performance of a 15k RPM drive config. Our biggest hit is reads, so we can buy 3x SATA machines and load balance. It's all about the application, and buying what is appropriate. I don't buy a Corvette if all I need is a Malibu. Alex Turner netEconomist

On 4/15/05, Dave Held wrote:

-----Original Message-----
From: Alex Turner [mailto:[EMAIL PROTECTED]
Sent: Thursday, April 14, 2005 6:15 PM
To: Dave Held
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Intel SRCS16 SATA raid?

[...]
Re: [PERFORM] Intel SRCS16 SATA raid?
Greg, I posted this link under a different thread (the $7k server thread). It is a very good read on why SCSI is better for servers than ATA. I didn't note bias, though it is from a drive manufacturer. YMMV. There is an interesting, though dated, appendix on different manufacturers' drive characteristics. http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf Enjoy, Rick

[EMAIL PROTECTED] wrote on 04/14/2005 09:54:45 AM:

Our vendor is trying to sell us on an Intel SRCS16 SATA raid controller instead of the 3ware one. Poking around, it seems this does come with Linux drivers and there is a battery backup option, so it doesn't seem to be completely insane. Anyone have any experience with these controllers? I'm also wondering whether I'm better off with one of these SATA raid controllers or just going with SCSI drives. -- greg
Re: [PERFORM] Intel SRCS16 SATA raid?
Nice research, Alex. Your data strongly support the information in the paper. Your SCSI drives blew away the others in all of the server benchmarks. They're only marginally better in desktop use. I do find it somewhat amazing that a 15K SCSI 320 drive isn't going to help me play Unreal Tournament much faster. That's okay. I suck at it anyway. My kid has never lost to me. She enjoys seeing daddy as a bloody smear and bouncing body parts anyway. It promotes togetherness.

Here's a quote from the paper: "[SCSI] interfaces support multiple initiators or hosts. The drive must keep track of separate sets of information for each host to which it is attached, e.g., maintaining the processor pointer sets for multiple initiators and tagged commands. The capability of SCSI/FC to efficiently process commands and tasks in parallel has also resulted in a higher overhead kernel structure for the firmware."

Has anyone ever seen a system with multiple hosts or initiators on a SCSI bus? Seems like it would be a very cool thing in an SMP architecture, but I've not seen an example implemented. Rick

Alex Turner wrote on 04/14/2005 12:13:41 PM:

I have put together a little head-to-head performance comparison of a 15k SCSI, 10k SCSI, 10K SATA w/TCQ, 10K SATA wo/TCQ, and 7.2K SATA drive at Storage Review: http://www.storagereview.com/php/benchmark/compare_rtg_2001.php?typeID=10&testbedID=3&osID=4&raidconfigID=1&numDrives=1&devID_0=232&devID_1=40&devID_2=259&devID_3=267&devID_4=261&devID_5=248&devCnt=6 It does illustrate some of the weaknesses of SATA drives, but all in all the Raptor drives put on a good show.

On 4/14/05, Alex Turner wrote: I have read a large chunk of this, and I would highly recommend it to anyone who has been participating in the drive discussions. It is most informative!! Alex Turner netEconomist

On 4/14/05, [EMAIL PROTECTED] wrote: Greg, I posted this link under a different thread (the $7k server thread).
[...]
Re: [PERFORM] How to improve db performance with $7K?
Another simple question: why is SCSI more expensive? After the eleventy-millionth controller is made, it seems like SCSI and SATA are both just a controller board and a spinning disk. Is somebody still making money by licensing SCSI technology? Rick

[EMAIL PROTECTED] wrote on 04/06/2005 11:58:33 PM:

You asked for it! ;-) If you want cheap, get SATA. If you want fast under *load* conditions, get SCSI. Everything else at this time is marketing hype, either intentional or learned. Ignoring dollars, expect to see SCSI beat SATA by 40%.

* * * What I tell you three times is true * * *

Also, compare the warranty you get with any SATA drive with any SCSI drive. Yes, you'll still have some change left over to buy more SATA drives when they fail, but... it fundamentally comes down to the actual implementation and not what is printed on the cardboard box. Disk systems are bound by the rules of queueing theory. You can hit the sales rep over the head with your queueing theory book.

Ultra320 SCSI is king of the hill for high-concurrency databases. If you're only streaming or serving files, save some money and get a bunch of SATA drives. But if you're reading/writing all over the disk, the simple first-come-first-serve SATA heuristic will hose your performance under load conditions. Next year, they will *try* to bring out some SATA cards that improve on first-come-first-serve, but they ain't here now. There are a lot of rigged performance tests out there... Maybe by the time they fix the queueing problems, Serial Attached SCSI (a/k/a SAS) will be out. Looks like Ultra320 is the end of the line for parallel SCSI, as Ultra640 SCSI (a/k/a SPI-5) is dead in the water.

Ultra320 SCSI. Ultra320 SCSI. Ultra320 SCSI. Serial Attached SCSI. Serial Attached SCSI. Serial Attached SCSI.

For future trends, see: http://www.incits.org/archive/2003/in031163/in031163.htm

douglas

p.s. For extra credit, try comparing SATA and SCSI drives when they're 90% full.
On Apr 6, 2005, at 8:32 PM, Alex Turner wrote:

I guess I'm setting myself up here, and I'm really not being ignorant, but can someone explain exactly how SCSI is supposed to be better than SATA? Both systems use drives with platters. Each drive can physically only read one thing at a time. SATA gives each drive its own channel, but you have to share in SCSI. A SATA controller typically can do 3Gb/sec (384MB/sec) per drive, but SCSI can only do 320MB/sec across the entire array. What am I missing here? Alex Turner netEconomist
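douglas's queueing-theory point — first-come-first-serve vs the reordering a tagged-queueing drive can do — is easy to see in a toy model: total head travel for a batch of random requests, serviced in arrival order vs a greedy nearest-first order (cylinder count, request count, and seed are arbitrary illustrative choices):

```python
import random

random.seed(42)
cylinders = 10_000
requests = [random.randrange(cylinders) for _ in range(64)]

def travel(order, start=0):
    # Total head movement, in cylinders, to service requests in the
    # given order; a proxy for cumulative seek time.
    pos, total = start, 0
    for r in order:
        total += abs(r - pos)
        pos = r
    return total

def nearest_first(reqs, start=0):
    # Greedy stand-in for what a command-queueing drive can do:
    # always service the closest outstanding request next.
    pending, pos, order = list(reqs), start, []
    while pending:
        nxt = min(pending, key=lambda r: abs(r - pos))
        pending.remove(nxt)
        order.append(nxt)
        pos = nxt
    return order

fifo = travel(requests)
reordered = travel(nearest_first(requests))
print(f"FIFO head travel:      {fifo} cylinders")
print(f"reordered head travel: {reordered} cylinders")
```

Under load the FIFO drive pays something like a full average seek on every request while the reordering drive sweeps; that gap is the "hosed under load" behavior, and a real elevator algorithm (which also avoids starving far-away requests) refines this greedy sketch.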
Re: [PERFORM] How to improve db performance with $7K?
Yep, that's it, as well as increased quality control. I found this from Seagate: http://www.seagate.com/content/docs/pdf/whitepaper/D2c_More_than_Interface_ATA_vs_SCSI_042003.pdf

With this quote (note that ES stands for Enterprise System and PS stands for Personal System): "There is significantly more silicon on ES products. The following comparison comes from a study done in 2000: the ES ASIC gate count is more than 2x a PS drive, the embedded SRAM space for program code is 2x, the permanent flash memory for program code is 2x, and data SRAM and cache SRAM space is more than 10x. The complexity of the SCSI/FC interface compared to the IDE/ATA interface shows up here, due in part to the more complex system architectures in which ES drives find themselves. ES interfaces support multiple initiators or hosts. The drive must keep track of separate sets of information for each host to which it is attached, e.g., maintaining the processor pointer sets for multiple initiators and tagged commands. The capability of SCSI/FC to efficiently process commands and tasks in parallel has also resulted in a higher overhead kernel structure for the firmware. All of these complexities and an overall richer command set result in the need for a more expensive PCB to carry the electronics."

Rick

Alex Turner wrote on 04/07/2005 10:46:31 AM:

Based on the reading I'm doing, and somebody please correct me if I'm wrong, it seems that SCSI drives contain an on-disk controller that has to process the tagged queue; SATA-I doesn't have this. This additional controller is basically an onboard computer that figures out the best order in which to process commands. I believe you are also paying for the increased tolerances that generate better speed.
If you compare an 80GB 7200RPM IDE drive to a WD Raptor 76GB 10k RPM, to a Seagate 10k.6, to a Seagate Cheetah 15k drive, each one represents a step up in parts and technology, thereby generating a cost increase (at least that's what the manufacturers tell us). I know if you've ever held a 15k drive in your hand, you can notice a considerable weight difference between it and a 7200RPM IDE drive. Alex Turner netEconomist

On Apr 7, 2005 11:37 AM, [EMAIL PROTECTED] wrote: [...]
Re: [PERFORM] Reading recommendations
Steve Wampler wrote on 03/30/2005 03:58:12 PM:

[EMAIL PROTECTED] wrote:
Mohan, Ross wrote: "VOIP over BitTorrent?"

Now *that* I want to see. Ought to be at least as interesting as the TCP/IP-over-carrier-pigeon experiment, and more challenging to boot!

It was very challenging. I worked on the credit window sizing and retransmission timer estimation algorithms. We took into account weather patterns, the size and age of the bird, feeding times, and the average number of times a bird circles before determining magnetic north. Interestingly, packet size had little effect in the final algorithms. I would love to share them with all of you, but they're classified.

Ah, but VOIPOBT requires many people all saying the same thing at the same time. The synchronization alone (since you need to distribute these people adequately to avoid overloading a trunk line...) is probably sufficiently hard to make it interesting. Then there are the problems of different accents, dialects, and languages ;)

Interestingly, we had a follow-on contract to investigate routing optimization using flooding techniques. Oddly, it was commissioned by a consortium of local car washes. Work stopped when the park service sued us for the cost of cleaning all the statuary, and the company went out of business. We were serving cornish game hens at our frequent dinner parties for months.

-- Steve Wampler -- [EMAIL PROTECTED]
"The gods that smiled on your birth are now laughing out loud."
Re: [PERFORM] Reading recommendations
[EMAIL PROTECTED] wrote on 03/31/2005 10:48:09 AM:

Stefan Weiss wrote: On 2005-03-31 15:19, [EMAIL PROTECTED] wrote: Now *that* I want to see. Ought to be at least as interesting as the TCP/IP over carrier pigeon experiment - and more challenging to boot! .. Interestingly, we had a follow-on contract to investigate routing optimization using flooding techniques. Oddly, it was commissioned by a consortium of local car washes. Work stopped when the park service sued us for the cost of cleaning all the statuary, and the company went out of business. We were serving cornish game hens at our frequent dinner parties for months.

This method might have been safer (and it works great with Apaches): http://eagle.auc.ca/~dreid/

Aha - VOIPOBD as well as VOIPOBT! What more can one want? VOIPOCP, I suppose... Start collecting recipes for small game birds now. We ran out pretty quickly. Finally came up with Pigeon Helper and sold it to homeless shelters in New York. Sales were slow until we added a wine sauce.

-- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud.
Re: [PERFORM] Reading recommendations
[EMAIL PROTECTED] wrote on 03/30/2005 10:58:21 AM: Allow telecommute from across the pond and I might be interested :-)

Please post phone bills to this list.

-- Michael Fuhr http://www.fuhr.org/~mfuhr/
Re: [PERFORM] Reading recommendations
[EMAIL PROTECTED] wrote on 03/30/2005 11:52:13 AM:

Mohan, Ross wrote: VOIP over BitTorrent? Now *that* I want to see. Ought to be at least as interesting as the TCP/IP over carrier pigeon experiment - and more challenging to boot!

It was very challenging. I worked on the credit window sizing and retransmission timer estimation algorithms. We took into account weather patterns, size and age of the bird, feeding times, and the average number of times a bird circles before determining magnetic north. Interestingly, packet size had little effect in the final algorithms. I would love to share them with all of you, but they're classified.

-- Steve Wampler -- [EMAIL PROTECTED] The gods that smiled on your birth are now laughing out loud.
Re: [PERFORM] Questions about 2 databases.
this seems like a dead waste of effort :-(. The work to put the data into the main database isn't lessened at all; you've just added extra work to manage the buffer database.

True from the viewpoint of the server, but not from the throughput in the client session (client viewpoint). The client will have a blazingly fast session with the buffer database. I'm assuming the buffer database table size is zero or very small. Constraints will be a problem if there are PKs or FKs that need to be satisfied on the server but are not adequately testable in the buffer. Might not be a problem if the full table fits on the RAM disk, but you still have to worry about two clients inserting the same PK.

Rick

Tom Lane [EMAIL PROTECTED] wrote on 03/11/2005 03:33 PM:

jelle [EMAIL PROTECTED] writes: 1) on a single 7.4.6 postgres instance does each database have its own WAL file or is that shared? Is it the same on 8.0.x?

Shared.

2) what's the high-performance way of moving 200 rows between similar tables on different databases? Does it matter if the databases are on the same or separate postgres instances?

COPY would be my recommendation. For a no-programming-effort solution you could just pipe the output of pg_dump --data-only -t mytable into psql. Not sure if it's worth developing a custom application to replace that.

My web app does lots of inserts that aren't read until a session is complete. The plan is to put the heavy insert session onto a ramdisk-based pg-db and transfer the relevant data to the master pg-db upon session completion. Currently running 7.4.6.

Unless you have a large proportion of sessions that are abandoned and hence never need be transferred to the main database at all, this seems like a dead waste of effort :-(. The work to put the data into the main database isn't lessened at all; you've just added extra work to manage the buffer database.

regards, tom lane
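Tom Lane's no-programming-effort suggestion, spelled out as a command line. This is only a sketch: the database names buffer_db and main_db are placeholders (the thread doesn't name the databases), the table name mytable comes from his message, and it assumes both databases are reachable with the caller's credentials.

```shell
# Dump only the data rows of "mytable" from the buffer database and replay
# the resulting COPY stream into the main database.  "buffer_db" and
# "main_db" are hypothetical names; add -h/-p/-U options as your setup needs.
pg_dump --data-only -t mytable buffer_db | psql main_db
```

Because pg_dump emits COPY by default, this moves the rows in one bulk operation rather than row-at-a-time INSERTs, which is the point of Tom's recommendation.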
Re: [PERFORM] PostgreSQL clustering VS MySQL clustering
I think maybe a SAN in conjunction with tablespaces might be the answer. Still need one honking server.

Rick

Stephen Frost [EMAIL PROTECTED] wrote on 01/20/2005 10:08 AM:

* Christopher Kings-Lynne ([EMAIL PROTECTED]) wrote: PostgreSQL has replication, but not partitioning (which is what you want).

It doesn't have multi-server partitioning.. It's got partitioning within a single server (doesn't it? I thought it did, I know it was discussed w/ the guy from Cox Communications and I thought he was using it :).

So, your only option is Oracle or another very expensive commercial database.

Or partition the data at the application layer.

Stephen
Re: [PERFORM] PostgreSQL vs. Oracle vs. Microsoft
Jim wrote: you'd be hard-pressed to find too many real-world examples where you could do something with a PostgreSQL procedural language that you couldn't do with PL/SQL.

Rick mumbled: You can't get it for nothing! %)

Jim C. Nasby [EMAIL PROTECTED] wrote on 01/10/2005 06:29 PM:

On Mon, Jan 10, 2005 at 12:46:01PM -0500, Alex Turner wrote: You sir are correct! You can't use perl in MS-SQL or Oracle ;).

On the other hand, PL/SQL is incredibly powerful, especially combined with all the tools/utilities that come with Oracle. I think you'd be hard-pressed to find too many real-world examples where you could do something with a PostgreSQL procedural language that you couldn't do with PL/SQL.

-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828

Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
Re: [PERFORM] Low Performance for big hospital server ..
In my younger days I denormalized a database for performance reasons and have paid dearly for it with increased maintenance costs. Adding enhanced capabilities and new functionality will quickly render denormalization worse than useless.

--Rick

Frank Wiles [EMAIL PROTECTED] wrote on 01/06/2005 12:12 PM:

On Thu, 6 Jan 2005 09:06:55 -0800 Josh Berkus josh@agliodbs.com wrote: I can't tell you how many times I've seen this sort of thing. And the developers always tell me Well, we denormalized for performance reasons ...

Now that's rich. I don't think I've ever seen a database perform worse after it was normalized. In fact, I can't even think of a situation where it could!

- Frank Wiles [EMAIL PROTECTED] http://www.wiles.org -
Re: [PERFORM] Query Performance and IOWait
Andrew,

It seems that you could combine the subquery's WHERE clause with the main query's to produce a simpler query, i.e. one without a subquery.

Rick

Andrew Janian [EMAIL PROTECTED] wrote on 11/18/2004 08:42 AM:

Hello All,

I have a setup with a Dell Poweredge 2650 with Red Hat and Postgres 7.4.5 with a database with about 27GB of data. The table in question has about 35 million rows. I am running the following query:

SELECT *
FROM mb_fix_message
WHERE msg_client_order_id IN (
    SELECT msg_client_order_id
    FROM mb_fix_message
    WHERE msg_log_time >= '2004-06-01'
      AND msg_log_time < '2004-06-01 13:30:00.000'
      AND msg_message_type IN ('D','G')
      AND mb_ord_type = '1'
)
AND msg_log_time > '2004-06-01'
AND msg_log_time < '2004-06-01 23:59:59.999'
AND msg_message_type = '8'
AND (mb_raw_text LIKE '%39=1%' OR mb_raw_text LIKE '%39=2%');

with the following plan:

QUERY PLAN
Nested Loop IN Join (cost=0.00..34047.29 rows=1 width=526)
  -> Index Scan using mfi_log_time on mb_fix_message (cost=0.00..22231.31 rows=2539 width=526)
       Index Cond: ((msg_log_time > '2004-06-01 00:00:00'::timestamp without time zone) AND (msg_log_time < '2004-06-01 23:59:59.999'::timestamp without time zone))
       Filter: (((msg_message_type)::text = '8'::text) AND (((mb_raw_text)::text ~~ '%39=1%'::text) OR ((mb_raw_text)::text ~~ '%39=2%'::text)))
  -> Index Scan using mfi_client_ordid on mb_fix_message (cost=0.00..445.56 rows=1 width=18)
       Index Cond: (("outer".msg_client_order_id)::text = (mb_fix_message.msg_client_order_id)::text)
       Filter: ((msg_log_time >= '2004-06-01 00:00:00'::timestamp without time zone) AND (msg_log_time < '2004-06-01 13:30:00'::timestamp without time zone) AND (((msg_message_type)::text = 'D'::text) OR ((msg_message_type)::text = 'G'::text)) AND ((mb_ord_type)::text = '1'::text))

While running, this query produces 100% iowait usage on its processor and takes an ungodly amount of time (about an hour).
The postgres settings are as follows:

shared_buffers = 32768   # min 16, at least max_connections*2, 8KB each
sort_mem = 262144        # min 64, size in KB

And the /etc/sysctl.conf has:

kernel.shmall = 274235392
kernel.shmmax = 274235392

The system has 4GB of RAM. I am pretty sure of these settings, but only from my reading of the docs and others' recommendations online.

Thanks,

Andrew Janian
OMS Development
Scottrade Financial Services
(314) 965-1555 x 1513
Cell: (314) 369-2083
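The two halves of Andrew's query filter the same table on different time windows and message types, so they can't literally be merged into one WHERE clause as Rick suggests; a different rewrite that was a common tip for 7.4-era planners is turning the IN subquery into a correlated EXISTS. The sketch below is only an illustration, not a tested fix: column and table names are taken from Andrew's query, and the result should be verified against real data before trusting it.

```sql
-- Sketch: replace the uncorrelated IN (...) with a correlated EXISTS.
-- On PostgreSQL 7.4 this sometimes produces a better plan than the
-- Nested Loop IN Join shown above.  Assumes the same semantics as the
-- original query; verify on real data.
SELECT m.*
FROM mb_fix_message m
WHERE m.msg_log_time > '2004-06-01'
  AND m.msg_log_time < '2004-06-01 23:59:59.999'
  AND m.msg_message_type = '8'
  AND (m.mb_raw_text LIKE '%39=1%' OR m.mb_raw_text LIKE '%39=2%')
  AND EXISTS (
      SELECT 1
      FROM mb_fix_message s
      WHERE s.msg_client_order_id = m.msg_client_order_id
        AND s.msg_log_time >= '2004-06-01'
        AND s.msg_log_time <  '2004-06-01 13:30:00.000'
        AND s.msg_message_type IN ('D','G')
        AND s.mb_ord_type = '1'
  );
```

Whether this helps depends on how selective the time-window index is for each side; running both forms under EXPLAIN ANALYZE would settle it.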
[PERFORM] Does PostgreSQL run with Oracle?
My basic question to the community is: is PostgreSQL approximately as fast as Oracle? I don't want benchmarks; they're BS. I want a gut feel from this community, because I know many of you are in mixed shops that run both products, or have had experience with both. I fully intend to tune, vacuum, analyze, size buffers, etc. I've read what people have written on the topic, and from that my gut feel is that using PostgreSQL will not adversely affect performance of my application versus Oracle. I know it won't adversely affect my pocketbook. I also know that requests for help will be quick, clear, and multifaceted. I'm currently running single-processor UltraSPARC workstations, and intend to use Intel Arch laptops and Linux. The application is a big turnkey workstation app. I know the hardware switch alone will enhance performance, and may do so to the point where even a slower database will still be adequate. Whadyall think?

Thanks,
Rick