Re: [PERFORM] Tsearch2 Initial Search Speed
Alan Hodgson wrote:
> It's because everything is cached, in particular the relevant rows from the
> email table (accessing which took 22 of the original 27 seconds). The plan
> looks good for what it's doing. I don't see that query getting much faster
> unless you could add a lot more cache RAM; 30K random IOs off disk is going
> to take a fair bit of time regardless of what you do.

Thanks Alan, I guessed that the caching was the difference, but I do not
understand why there is a heap scan on the email table. The query seems to
use the email_fts_index correctly, which only takes 6 seconds, so why does it
then need to scan the email table? Sorry if I sound a bit stupid - I am not
very experienced with reading EXPLAIN ANALYZE output.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Tsearch2 Initial Search Speed
I think I may have answered my own question partially - the problem may be
how I structure the query. I always structured my tsearch queries as follows,
following my initial read of the tsearch2 instructions:

select email_id from email, to_tsquery('default', 'howard') as q where q@@fts;

However, if I construct them in the following way, as stipulated in the 8.3
documentation:

select email_id from email where fts@@to_tsquery('default', 'howard');

then the results are better, due to the fact that the email table is not
necessarily scanned, as can be seen from the two EXPLAIN outputs.

Original statement:

 Nested Loop  (cost=4.40..65.08 rows=16 width=8)
   ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
   ->  Bitmap Heap Scan on email  (cost=4.40..64.87 rows=16 width=489)
         Filter: (email.fts @@ q.q)
         ->  Bitmap Index Scan on email_fts_index  (cost=0.00..4.40 rows=16 width=0)
               Index Cond: (email.fts @@ q.q)

Second statement:

 Bitmap Heap Scan on email  (cost=4.40..64.91 rows=16 width=8)
   Filter: (fts @@ '''howard'''::tsquery)
   ->  Bitmap Index Scan on email_fts_index  (cost=0.00..4.40 rows=16 width=0)
         Index Cond: (fts @@ '''howard'''::tsquery)

This misses out the random access of the email table, turning my 27 second
query into 6 seconds. I guess the construction of the first statement
effectively stops the query optimisation from working.
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tue, 17 Jun 2008, Howard Cole wrote:
>> Alan Hodgson wrote: It's because everything is cached, in particular the
>> relevant rows from the email table (accessing which took 22 of the
>> original 27 seconds).
>
> Thanks Alan, I guessed that the caching was the difference, but I do not
> understand why there is a heap scan on the email table? The query seems to
> use the email_fts_index correctly, which only takes 6 seconds, why does it
> then need to scan the email table?

It's not a sequential scan - that really would take a fair time. It's a
bitmap heap scan - that is, it has built a bitmap of the rows needed by using
the index, and now it needs to fetch all those rows from the email table.
There's 14938 of them, and they're likely scattered all over the table, so
you'll probably have to do 14938 seeks on the disc. At 5ms a pop, that would
be 70 seconds, so count yourself lucky it only takes 22 seconds instead!

If you aren't actually interested in having all 14938 rows splurged at you,
try using the LIMIT keyword at the end of the query. That would make it run a
bit faster, and would make sense if you only want to display the first twenty
on a web page or something.

Matthew

--
For every complex problem, there is a solution that is simple, neat, and
wrong. -- H. L. Mencken
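[The back-of-envelope seek arithmetic above can be checked with a quick
script; the 14938 rows and ~5 ms per random seek are the figures assumed in
the message, not measured values.]

```python
# Rough worst-case estimate for fetching scattered heap rows off disk.
# Assumed figures from the discussion: 14938 row fetches, ~5 ms per seek.
rows_fetched = 14938
seek_time_s = 0.005  # typical random access time for a 7200rpm disk

worst_case_s = rows_fetched * seek_time_s
print(round(worst_case_s))  # roughly 75 seconds if nothing is cached
```

The observed 22 seconds is well under this bound, which is consistent with
part of the table already sitting in the OS cache.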
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tue, 17 Jun 2008, Howard Cole wrote:
> I think I may have answered my own question partially, the problem may be
> how I structure the query.
>
> Original statement:
>
>  Nested Loop  (cost=4.40..65.08 rows=16 width=8)
>    ->  Function Scan on q  (cost=0.00..0.01 rows=1 width=32)
>    ->  Bitmap Heap Scan on email  (cost=4.40..64.87 rows=16 width=489)
>          Filter: (email.fts @@ q.q)
>          ->  Bitmap Index Scan on email_fts_index  (cost=0.00..4.40 rows=16 width=0)
>                Index Cond: (email.fts @@ q.q)
>
> Second statement:
>
>  Bitmap Heap Scan on email  (cost=4.40..64.91 rows=16 width=8)
>    Filter: (fts @@ '''howard'''::tsquery)
>    ->  Bitmap Index Scan on email_fts_index  (cost=0.00..4.40 rows=16 width=0)
>          Index Cond: (fts @@ '''howard'''::tsquery)

As far as I can see, that shouldn't make any difference. Both queries still
do the bitmap heap scan, and have almost exactly the same cost.

Matthew

--
Lord grant me patience, and I want it NOW!
Re: [PERFORM] Tsearch2 Initial Search Speed
> As far as I can see, that shouldn't make any difference. Both queries
> still do the bitmap heap scan, and have almost exactly the same cost.
>
> Matthew

You may have a point there Matthew - they both appear to do a scan on the
email table (why?). But for whatever reason, I swear the second method is
significantly faster! If I run the new style query first, then the original
style (to_tsquery as q), the original style still takes longer, even with the
new style cached!

Incidentally, how can I clear the cache in between queries?
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tue, 17 Jun 2008, Howard Cole wrote:
> They both appear to do a scan on the email table (Why?).

The indexes don't contain copies of the row data. They only contain pointers
to the rows in the table. So once the index has been consulted, Postgres
still needs to look at the table to fetch the actual rows. Of course, it only
needs to bother looking where the index points, and that is the benefit of an
index.

Matthew

--
I've run DOOM more in the last few days than I have the last few months. I
just love debugging ;-)  -- Linus Torvalds
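[A toy model of the point Matthew is making - this is an illustration of
"index stores pointers, not data", not a sketch of Postgres internals; all
names and data here are made up.]

```python
# Toy model: an index maps a search term to row ids only.
# Fetching any actual column still requires visiting the table ("heap").
table = {  # row_id -> full row data
    1: {"email_id": 101, "body": "howard report"},
    2: {"email_id": 102, "body": "meeting notes"},
    3: {"email_id": 103, "body": "howard invoice"},
}
index = {"howard": [1, 3], "meeting": [2]}  # term -> row ids, no row data

def search(term):
    pointers = index.get(term, [])       # step 1: cheap index probe
    return [table[p]["email_id"] for p in pointers]  # step 2: heap fetches

print(search("howard"))
```

Even though the query only asks for email_id, step 2 cannot be skipped,
because the index alone doesn't hold that value.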
Re: [PERFORM] Tsearch2 Initial Search Speed
Matthew Wakeling wrote:
> The indexes don't contain copies of the row data. They only contain
> pointers to the rows in the table. So once the index has been consulted,
> Postgres still needs to look at the table to fetch the actual rows. Of
> course, it only needs to bother looking where the index points, and that
> is the benefit of an index.

Thanks for your patience with me here Matthew, but what I don't understand is
why it needs to do a scan on email. If I do a query that uses another index,
then it uses the index only and does not scan the email table. The scan on
the fts index takes 6 seconds, which presumably returns email_ids (the
email_id being the primary key) - what does it then need from the email table
that takes 22 seconds? e.g.

triohq=> explain select email_id from email where email_directory_id=1;
                                       QUERY PLAN
-----------------------------------------------------------------------------------------
 Index Scan using email_email_directory_id_idx on email  (cost=0.00..129.01 rows=35 width=8)
   Index Cond: (email_directory_id = 1)
(2 rows)
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tue, 17 Jun 2008, Howard Cole wrote:
> If I do a query that uses another index, then it uses the index only and
> does not scan the email table.

Not true. It only looks a little bit like that from the explain output.
However, if you look closely:

 Index Scan using email_email_directory_id_idx on email  (cost=0.00..129.01 rows=35 width=8)
   Index Cond: (email_directory_id = 1)
(2 rows)

It's a scan *using* the index, but *on* the table email. This index scan is
having to read the email table too.

> The scan on the fts index takes 6 seconds, which presumably returns
> email_ids (the email_id being the primary key) - what does it then need
> from the email table that takes 22 seconds?

Actually, the index returns page numbers in the table on disc which may
contain one or more rows that are relevant. Postgres has to fetch the whole
row to find out the email_id and any other information, including whether the
row is visible in your current transaction (concurrency control complicates
it all). Just having a page number isn't much use to you!

Matthew

--
First law of computing: Anything can go wro
sig: Segmentation fault. core dumped.
Re: [PERFORM] Migration Articles.. ???
On Tue, 17 Jun 2008, sathiya psql wrote:
> I have a database in postgres X.Y which has around 90 tables, and lot of
> data in it. In the next version of that product, i had some more tables,
> so how to migrate that. There may be 150 tables; in that 90 tables, 70 may
> be the same, 20 got deleted, and 80 may be new. I want the 70 tables to
> have same data as it is.

Please do not cross-post. This question has nothing to do with performance.
(Cross-posting answer so everyone else doesn't answer the same.)

You'll want to dump the source database selectively, and then reload the dump
into a new database. RTFM on pg_dump, especially the -t and -T options.

Matthew

--
All of this sounds mildly turgid and messy and confusing... but what the
heck. That's what programming's all about, really
-- Computer Science Lecturer
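[A sketch of the pg_dump approach Matthew suggests; the database and table
names below are hypothetical - substitute your own, and repeat -t/-T once per
table.]

```shell
# Dump only the tables you want to carry over (-t selects):
pg_dump -t customers -t orders olddb > keep_tables.sql

# Or dump everything except the tables you dropped (-T excludes):
pg_dump -T obsolete_log -T temp_stats olddb > migrated.sql

# Reload the dump into the freshly created new database:
psql newdb < migrated.sql
```

The -T form is usually less typing when most of the schema is unchanged, as
in the 70-of-90 case described above.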
Re: [PERFORM] Tsearch2 Initial Search Speed
> Actually, the index returns page numbers in the table on disc which may
> contain one or more rows that are relevant. Postgres has to fetch the
> whole row to find out the email_id and any other information, including
> whether the row is visible in your current transaction (concurrency
> control complicates it all). Just having a page number isn't much use to
> you!
>
> Matthew

I learn something new every day. Thanks Matthew.
[PERFORM] Which hardware ?
Hi,

I need to install an 8.3 database and was wondering which hardware would be
sufficient to have good performance (less than 30s for the slowest select).

Database size: 25 GB/year, 5 years of history. One main table containing 40
million lines per year. Batch inserts of 10 lines. Very very few deletes, few
updates. 30 other tables, 4 levels of hierarchy, containing from 10 lines up
to 2 lines. 5 of them have foreign keys on the main table. I will use table
partitioning on the year column.

Statements will mainly do sums on the main table, grouped by whatever column
of the database (3-5 joined tables, or join on join), with some criteria that
may vary, lots of joined varchar in ('a','b',...,'z'). It's almost impossible
to predict what users will do via the web application that queries this
database: almost all select, join, group by, where... possibilities are
available. Up to 4 simultaneous users.

I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2, and a dual
(RAID1) SATA2 750 GB HD, perhaps with another HD for indexes. Do you think it
will be enough? Is another RAID for better performance a minimum requirement?
Will a secondary HD for indexes help? Which OS would you use? (knowing that
there will be a JDK 1.6 installed too)

With 5 million lines, the same application runs quite fast on Windows 2000 on
a single P4 2.8 GHz (very few statements last more than 10s, mostly when
concurrent statements are made). Each statement consumes 100% of the CPU.

Thanks for advice.
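[For reference, a minimal sketch of year-based partitioning as it was done in
the 8.3 era, via inheritance and CHECK constraints; the table and column
names are hypothetical.]

```sql
-- Parent table plus one child per year (8.3-style partitioning):
CREATE TABLE sales (
    sale_year integer NOT NULL,
    amount    numeric
);

CREATE TABLE sales_2007 (CHECK (sale_year = 2007)) INHERITS (sales);
CREATE TABLE sales_2008 (CHECK (sale_year = 2008)) INHERITS (sales);

-- With constraint exclusion enabled, a query filtered on sale_year
-- only scans the matching child table:
SET constraint_exclusion = on;
SELECT sum(amount) FROM sales WHERE sale_year = 2008;
```

A trigger or rule on the parent is also needed to route inserts to the right
child; that part is omitted here for brevity.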
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 7:38 AM, Lionel [EMAIL PROTECTED] wrote:
> I need to install an 8.3 database and was wondering which hardware would
> be sufficient to have good performance (less than 30s for the slowest
> select). I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2,
> and a dual (RAID1) SATA2 750 GB HD, perhaps with another HD for indexes.
> Do you think it will be enough? Is another RAID for better performance a
> minimum requirement? Will a secondary HD for indexes help?

More drives, all in the same RAID-10 setup. For reporting like this, writing
speed often isn't that critical, so you are often better off with software
RAID-10 than using a mediocre hardware RAID controller (most Adaptecs, low
end LSI, etc...). You'd be surprised what going from a 2 disk RAID-1 to a 4
disk RAID-10 can do in these circumstances. Going up to 6, 8, 10 or more
disks really makes a difference.

> Which OS would you use? (knowing that there will be a JDK 1.6 installed
> too)

I'd use RHEL5 because it's what I'm familiar with. Any stable flavor of Linux
or FreeBSD 7 are good performance choices if you know how to drive them.

> With 5 million lines, the same application runs quite fast on Windows 2000
> on a single P4 2.8 GHz (very few statements last more than 10s, mostly
> when concurrent statements are made). Each statement consumes 100% of the
> CPU.

That statement about concurrent statements REALLY sells me on the idea of a
many disk RAID-10 here. I'd take that over quad cores for what you're doing
any day. Not that I'd turn down quad cores here either. :)
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 03:38:59PM +0200, Lionel wrote:
> I need to install an 8.3 database and was wondering which hardware would
> be sufficient to have good performance (less than 30s for the slowest
> select). Statements will mainly do sums on the main table, grouped by
> whatever column of the database (3-5 joined tables, or join on join), with
> some criteria that may vary, lots of joined varchar in ('a','b',...,'z').
> It's almost impossible to predict what users will do via the web
> application that queries this database: almost all select, join, group by,
> where... possibilities are available.

I'm not sure that I have any specific recommendation to make in the face of
such sweeping requirements. But I'd say you need to make I/O cheap, which
means piles of memory and extremely fast disk subsystems.

Also, there's another important question (which never gets asked in these
discussions), which is, "How much is the performance worth to you?" If the
last 10% of users get something longer than 30s, but less than 40s, and they
will pay no more to get the extra 10s response time, then it's worth nothing
to you, and you shouldn't fix it.

> Up to 4 simultaneous users.

You won't need lots of processor, then.

> I'm planning to host it on a quad Xeon 2.66GHz with 8 GB of DDR2, and a
> dual (RAID1) SATA2 750 GB HD, perhaps with another HD for indexes.

How big's the database? If you can have enough memory to hold the whole
thing, including all indexes, in memory, that's what you want. Apart from
that, dual SATA2 is probably underpowered. But. . .

> Which OS would you use? (knowing that there will be a JDK 1.6 installed
> too)

. . .I think this is the real mistake. Get a separate database box. It's
approximately impossible to tune a box correctly for both your application
and your database, in my experience.

A

--
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/
Re: [PERFORM] Which hardware ?
On Tue, 17 Jun 2008, Lionel wrote:
> I need to install an 8.3 database and was wondering which hardware would
> be sufficient to have good performance (less than 30s for the slowest
> select). It's almost impossible to predict what users will do via the web
> application that queries this database: almost all select, join, group by,
> where... possibilities are available.

Well, Scott has given you some good pointers on how to make a fast system.
However, your original question (is this fast enough) is impossible to
answer, especially if the users are allowed to run absolutely anything they
want. I bet you I could craft a query that takes more than 30 seconds
regardless of how fast you make your system.

Having said that, I'll add the suggestion that you should put as much RAM in
the box as possible. It can only help. As others have said, if you only have
four users, then CPU power isn't going to be such a problem, and given that,
I'd disagree with Andrew and say as long as you have plenty of RAM, Java can
play well with a database on the same box. Depends what it is doing, of
course.

Matthew

--
To be or not to be -- Shakespeare
To do is to be -- Nietzsche
To be is to do -- Sartre
Do be do be do -- Sinatra
Re: [PERFORM] Which hardware ?
Andrew Sullivan wrote:
> You won't need lots of processor, then.

Can't find less than quad core for this price range...

> How big's the database?

With 20 million rows, the main table is 3.5 GB on Win XP, with 8 GB of
indexes. I estimate the whole database at around 30 GB/year.

> If you can have enough memory to hold the whole thing, including all
> indexes, in memory, that's what you want. Apart from that, dual SATA2 is
> probably underpowered. But. . .

RAID is twice as expensive (600 euros/month for a 5x750 GB SATA2 with 12 GB
of RAM and an unnecessary 2x quad core). I didn't find any RAID 10 dedicated
server that is not too expensive. If this setup is twice as fast, I can
afford it. But if it is 30 sec vs 40 sec... I'm not sure my customer will
pay.

>> Which OS would you use? (knowing that there will be a JDK 1.6 installed
>> too)
>
> . . .I think this is the real mistake. Get a separate database box. It's
> approximately impossible to tune a box correctly for both your application
> and your database, in my experience.

My Tomcat webapp is well coded and consumes nearly nothing. On such powerful
hardware, I prefer to run both on the same server. I could eventually run it
on a different server, much less powerful, but it's not on the same network;
I guess this would be an issue.
Re: [PERFORM] Which hardware ?
On Tue, 17 Jun 2008, Andrew Sullivan wrote:
>> Which OS would you use? (knowing that there will be a JDK 1.6 installed
>> too)
>
> . . .I think this is the real mistake. Get a separate database box. It's
> approximately impossible to tune a box correctly for both your application
> and your database, in my experience.

I can't remember the last time I deployed a PG box that didn't have a Java
app or three on it, too. You've got even odds that putting it on a separate
system will even be an improvement. Yes, if the Java app is a pig and the
machine doesn't have enough resources, separating it out to another system
will help. But there are plenty of these buggers that will end up so much
slower from the additional network latency that it's a net loss (depends on
how the app groups its requests for rows). If you know enough about Java to
watch things like how much memory the JVMs are taking up, I wouldn't hesitate
to put them all on the same machine.

Considering that Lionel's system seems pretty overpowered for what he's
doing--runs plenty fast on a much slower system, enough RAM to hold a large
portion of the primary tables and database, all batch updates that don't
really need a good RAID setup--I'd say "looks good here" and recommend he
just follow the plan he outlined. Just watch the system with top for a bit
under load to make sure the Java processes are staying under control.

As for OS, a RHEL5 or clone like CentOS should work fine here; which is more
appropriate depends on your support requirements. I would recommend against
using FreeBSD as it's not the friendliest Java platform, and the additional
complexity of Solaris seems like overkill for your app. Basically, even
though it's available for more of them, I only consider deploying a Java app
on one of the mainstream platforms listed at
http://www.java.com/en/download/manual.jsp right now because those are the
mature releases.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 04:49:17PM +0200, Lionel wrote:
> My Tomcat webapp is well coded and consumes nearly nothing.

If I were ever inclined to say, "Nonsense," about code I've never seen, this
is probably the occasion on which I'd do it. A running JVM is necessarily
going to use some memory, and that is memory use that you won't be able to
factor out properly when developing models of your database system
performance.

> I could eventually run it on a different server, much less powerful, but
> it's not on the same network; I guess this would be an issue.

The power of the system is hard to know about in the context (with only 8 GB
of memory, I don't consider this a powerful box at all, note). But why
wouldn't it be on the same network? You're using the network stack anyway,
note: JVMs can't go over domain sockets.

A

--
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 9:32 AM, Greg Smith [EMAIL PROTECTED] wrote:
> Considering that Lionel's system seems pretty overpowered for what he's
> doing--runs plenty fast on a much slower system, enough RAM to hold a
> large portion of the primary tables and database, all batch updates that
> don't really need a good RAID setup--I'd say "looks good here" and
> recommend he just follow the plan he outlined.

In the original post he mentioned that he had 5 years of data at about 25 GB
/ year. With 125 GB of data, it's likely that if most queries are on recent
data it'll be in RAM, but anything that hits older data will NOT have that
luxury. Which is why I recommended RAID-10. It doesn't have to be on a $1200
card with 44 disks or something, but even 4 disks in a SW RAID-10 will be
noticeably faster (about 2x) than a simple RAID-1 at hitting that old data.

We had a reporting server with about 80 GB of data on a machine with 4 GB of
RAM at the last place I worked, and it could take a few extra seconds to hit
the old data, but the SW RAID-10 on it made it much faster at reporting than
it would have been with a single disk.
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tuesday 17 June 2008, Howard Cole [EMAIL PROTECTED] wrote:
> This misses out the random access of the email table, turning my 27 second
> query into 6 seconds.

It took less time because it retrieved a lot less data - it still has to look
at the table.

--
Alan
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tuesday 17 June 2008, Howard Cole [EMAIL PROTECTED] wrote:
> Incidentally, how can I clear the cache in between queries?

Stop PostgreSQL, unmount the filesystem it's on, remount it, restart
PostgreSQL. Works under Linux. If it's on a filesystem you can't unmount hot,
you'll need to reboot.

--
Alan
Re: [PERFORM] Tsearch2 Initial Search Speed
On Tue, 17 Jun 2008, Alan Hodgson wrote:
>> Incidentally, how can I clear the cache in between queries?
>
> Stop PostgreSQL, unmount the filesystem it's on, remount it, restart
> PostgreSQL. Works under Linux. If it's on a filesystem you can't unmount
> hot, you'll need to reboot.

Not true - on recent Linux kernels, you can drop the OS cache by running

echo 1 > /proc/sys/vm/drop_caches

as root. You'll still need to restart Postgres to drop its cache too.

Matthew

--
Richards' Laws of Data Security:
1. Don't buy a computer.
2. If you must buy a computer, don't turn it on.
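[The full cold-cache sequence implied above, spelled out as a sketch; these
commands need root, and the data directory path is a placeholder.]

```shell
# Flush dirty pages so the drop is complete, then clear the page cache:
sync
echo 1 > /proc/sys/vm/drop_caches   # 1 = page cache; 3 also drops dentries/inodes

# Restart Postgres so shared_buffers is emptied as well:
pg_ctl restart -D /path/to/data
```

After this, the next run of a query should show genuine cold-cache timings
rather than the cached 6-second figures discussed earlier in the thread.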
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 9:42 AM, Andrew Sullivan [EMAIL PROTECTED] wrote:
> On Tue, Jun 17, 2008 at 04:49:17PM +0200, Lionel wrote:
>> My Tomcat webapp is well coded and consumes nearly nothing.
>
> If I were ever inclined to say, "Nonsense," about code I've never seen,
> this is probably the occasion on which I'd do it. A running JVM is
> necessarily going to use some memory, and that is memory use that you
> won't be able to factor out properly when developing models of your
> database system performance.

But if that amount of memory is 256 megs and it only ever acts as a control
panel or data access point, it's probably not a huge issue. If it's 2 gigs
it's another issue. It's all about scale. The real performance hog for me on
all-in-one boxes has been perl / fastcgi setups.

> The power of the system is hard to know about in the context (with only
> 8 GB of memory, I don't consider this a powerful box at all, note).

I always think of main memory in terms of how high a cache hit rate it can
get me. If 8 GB gets you a 50% hit rate, and 16 GB gets you a 95% hit rate,
then 16 GB is the way to go. But if 8 GB gets you to 75% and 32 GB gets you
to 79% because of your usage patterns (the world isn't always bell curve
shaped) then 8 GB is plenty and it's time to work on faster disk subsystems
if you need more performance.
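[The hit-rate argument above in rough numbers: average access cost is a
weighted blend of RAM and disk latency. The ~100 ns RAM and ~5 ms disk
figures are assumed order-of-magnitude values, not measurements.]

```python
# Effective average access time for a given cache hit rate.
RAM_S = 100e-9   # assumed RAM access time, ~100 ns
DISK_S = 5e-3    # assumed random disk seek, ~5 ms

def effective_access_s(hit_rate):
    # Weighted average of cache hits (RAM) and misses (disk seeks).
    return hit_rate * RAM_S + (1.0 - hit_rate) * DISK_S

# Going from 50% to 95% hits cuts the average cost roughly 10x...
print(effective_access_s(0.50) / effective_access_s(0.95))
# ...while 75% -> 79% barely moves it:
print(effective_access_s(0.75) / effective_access_s(0.79))
```

Because disk dominates the blend so heavily, extra RAM only pays off when it
pushes the hit rate close to 100%; otherwise the money is better spent on the
disk subsystem, as the message says.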
Re: [PERFORM] Which hardware ?
On Tue, 17 Jun 2008, Andrew Sullivan wrote:
> A running JVM is necessarily going to use some memory, and that is memory
> use that you won't be able to factor out properly when developing models
> of your database system performance.

Now you've wandered into pure FUD. Tuning maximum memory usage on a Java app
so you can model it is straightforward (albeit a little confusing at first),
and in most cases you can just sample it periodically to get a good enough
estimate for database tuning purposes. JVMs let you adjust maximum memory use
with -Xmx, and if anything the bigger problem I run into is that using too
much memory hits that limit and crashes Java long before it becomes a hazard
to the database. This is a system with 8GB of RAM here; having some Tomcat
instances co-existing with the database when there's that much room to work
is not that hard.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Which hardware ?
On Tue, 17 Jun 2008, Scott Marlowe wrote:
> We had a reporting server with about 80 GB of data on a machine with 4 GB
> of RAM at the last place I worked, and it could take a few extra seconds
> to hit the old data, but the SW RAID-10 on it made it much faster at
> reporting than it would have been with a single disk.

I agree with your statement above, that query time could likely be dropped a
few seconds with a better disk setup. I just question whether that's
necessary given the performance target here.

Right now the app is running on an underpowered Windows box and is returning
results in around 10s, on a sample data set that sounds like 1/8 of a year's
worth of data (1/40 of the total). It is seemingly CPU bound, with not enough
processor to handle concurrent queries being the source of the worst-case
behavior. The target is keeping that under 30s on more powerful hardware,
with at least 6X as much processor power and a more efficient OS, while using
yearly partitions to keep the amount of data to juggle at once under control.
That seems reasonable to me, and while better disks would be nice I don't see
any evidence they're really needed here. This application sounds like a batch
processing/reporting one where plus or minus a few seconds doesn't have a lot
of business value.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 10:56 AM, Greg Smith [EMAIL PROTECTED] wrote: On Tue, 17 Jun 2008, Scott Marlowe wrote: We had a reporting server with about 80G of data on a machine with 4G ram last place I worked, and it could take it a few extra seconds to hit the old data, but the SW RAID-10 on it made it much faster at reporting than it would have been with a single disk. I agree with your statement above, that query time could likely be dropped a few seconds with a better disk setup. I just question whether that's necessary given the performance target here. Right now the app is running on an underpowered Windows box and is returning results in around 10s, on a sample data set that sounds like 1/8 of a year worth of data (1/40 of the total). It is seemingly CPU bound with not enough processor to handle concurrent queries being the source of the worst-case behavior. The target is keeping that 30s on more powerful hardware, with at least 6X as much processor power and a more efficient OS, while using yearly partitions to keep the amount of data to juggle at once under control. That seems reasonable to me, and while better disks would be nice I don't see any evidence they're really needed here. This application sounds a batch processing/reporting one where plus or minus a few seconds doesn't have a lot of business value. I think you're making a big assumption that this is CPU bound. And it may be that when all the queries are operating on current data that it is. But as soon as a few ugly queries fire that need to read tens of gigs of data off the drives, then you'll start to switch to I/O bound and the system will slow a lot. We had a single drive box doing work on an 80G set that was just fine with the most recent bits. Until I ran a report that ran across the last year instead of the last two days, and took 2 hours to run. All the queries that had run really quickly on all the recent data suddenly were going from 1 or 2 seconds to 2 or 3 minutes. 
And I'd have to kill my reporting query. Moved it to the same exact hardware but with a 4 disc RAID-10, and the little queries stayed 1-2 seconds while the reporting queries were cut down by factors of about 4 to 10. RAID-1 will be somewhere between them, I'd imagine. RAID-10 has an amazing ability to handle parallel accesses without falling over performance-wise. You're absolutely right though, we really need to know the value of fast performance here. If you're monitoring industrial systems you need fast enough response to spot problems before they escalate to disasters. If you're running aggregations of numbers used for filling out quarterly reports, not so much. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Which hardware ?
On Tue, Jun 17, 2008 at 11:59 AM, Lionel [EMAIL PROTECTED] wrote: Scott Marlowe wrote: You're absolutely right though, we really need to know the value of fast performance here. The main problem is that my customers are used to having their reporting after a few seconds. They want to have 10 times more data but still have the same speed, which is, I think, quite impossible. If you're running aggregations of numbers used for filling out quarterly reports, not so much. The application is used to analyse product sales behaviour, display charts, perform comparisons, study progression... 10-40 seconds seems to be quite good performance. More than 1 minute will be too slow (meaning they won't pay for that). I did some tests with a 20 million line database on a single-disk dual core 2GB Win XP system (default postgresql config); most of the time is spent in I/O: 50-100 secs for statements that scan 6 million lines, which will happen. Almost no CPU activity. So here is the next question: 4 disk RAID-10 (did not find a French web host yet) or 5 disk RAID-5 (found at 600 euros/month)? I don't want to have any RAID issues... I have not had any problem with my basic RAID-1 for many years, and don't want that to change. Do you have root access on your servers? Then just ask for 5 disks with one holding the OS / apps and you'll do the rest. Software RAID is probably a good fit for cheap right now. If you can set it up yourself, you might be best off with 2 disk RAID-1. 5 750G disks in a RAID-1 yields 750G of storage (duh) but allows for five different readers to operate without the heads having to seek. Large amounts of data can be read at a medium speed from a RAID-1 like this. But most RAID implementations don't aggregate bandwidth for RAID-1. They do for RAID-0. So, having a huge RAID-0 array allows for reading a large chunk of data really fast from all disks at once. RAID-1+0 gives you the ability to tune this in either direction.
But the standard config of a 4 disk setup (striping two mirrors, each made from two disks) is a good compromise to start with. The average read speed of the array is doubled, and the ability to have two reads not conflict helps too. RAID-5 is a compromise to provide the most storage while having mediocre performance or, when degraded, horrifically poor performance. Hard drives are cheap, hosting not as much. Also, always look at optimizing their queries. A lot of analysis is done by brute force queries that, rewritten intelligently, suddenly run in minutes not hours, or seconds not minutes.
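As a concrete illustration of the software-RAID suggestion above, a 4-disk Linux md RAID-10 might be assembled roughly like this. The device names and mount point are assumptions for the sketch, not details from the thread; treat it as an outline, not a recipe.

```shell
# Assumed devices /dev/sdb1../dev/sde1; adjust to the actual hardware.
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
mkfs -t ext3 /dev/md0
mount /dev/md0 /var/lib/pgsql   # put the PostgreSQL data directory on the array
```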
Re: [PERFORM] Which hardware ?
On Tue, 17 Jun 2008, Lionel wrote: I did some tests with a 20 million line database on a single disk dual core 2GB Win XP system (default postgresql config), most of the time is spent in I/O: 50-100 secs for statements that scan 6 million lines, which will happen. Almost no CPU activity. I hope you're aware that the default config is awful, and there are all sorts of possible causes for heavy I/O churn that might improve if you set up the postgresql.conf file to use the server's resources more aggressively (the default is set up for machines with a very small amount of RAM). There are lots of links to articles that cover the various areas you might improve at http://wiki.postgresql.org/wiki/Performance_Optimization -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
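For reference, a postgresql.conf tuned along the lines Greg suggests might start from values like these. These numbers are illustrative starting points for a dedicated 2GB machine running 8.3, not recommendations made in the thread:

```
# Illustrative 8.3 settings for a dedicated 2GB server; tune and measure.
shared_buffers = 512MB         # default (~32MB) is far too small for this workload
effective_cache_size = 1536MB  # tell the planner how much the OS can cache
work_mem = 32MB                # per sort/hash operation; watch concurrent queries
checkpoint_segments = 16       # spread out checkpoint I/O for batch loads
```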
Re: [PERFORM] Which hardware ?
From: [EMAIL PROTECTED] Subject: [PERFORM] Which hardware ? Date: Tue, 17 Jun 2008 15:38:59 +0200 To: pgsql-performance@postgresql.org Hi, I need to install an 8.3 database and was wondering which hardware would be sufficient to have good performance (less than 30s for the slowest select). Database size: 25 GB/year, 5 years of history. One main table containing 40 million lines per year. Batch inserts of 10 lines. Very very few deletes, few updates. 30 other tables, 4 levels of hierarchy, containing from 10 lines up to 2 lines. 5 of them have foreign keys on the main table. I will use table partitioning on the year column. Statements will mainly do sums on the main table, grouped by whatever column of the database (3-5 joined tables, or join on join), with some criteria that may vary, lots of joined varchar in ('a','b',...,'z'). It's almost impossible to predict what users will do via the web application that queries this database: almost all select, join, group by, where... possibilities are available. Up to 4 simultaneous users. I'm planning to host it on a quad Xeon 2.66GHz with 8GB of DDR2, and a dual (RAID-1) SATA2 750GB HD. Perhaps with another HD for indexes. Do you think it will be enough? Is another RAID for better performance a minimum requirement? Will a secondary HD for indexes help? Which OS would you use? (knowing that there will be a JDK 1.6 installed too) With 5 million lines, the same application runs quite fast on Windows 2000 on a single P4 2.8 GHz (very few statements last more than 10s, mostly when concurrent statements are made). Each statement consumes 100% of the CPU. Thanks for advice. I think hardware isn't going to solve your problem, especially the CPU. You only have four users, and postgres can only use 1 core per query.
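The yearly partitioning the original poster plans would, under 8.3, use table inheritance with CHECK constraints. A minimal sketch, with assumed table and column names rather than the poster's actual schema:

```sql
-- Sketch only: "sales" and its columns are assumed names, not from the thread.
CREATE TABLE sales (
    sale_date  date    NOT NULL,
    product_id integer,
    amount     numeric
);

CREATE TABLE sales_2008 (
    CHECK (sale_date >= DATE '2008-01-01' AND sale_date < DATE '2009-01-01')
) INHERITS (sales);

-- With constraint_exclusion = on in postgresql.conf, a query like
--   SELECT sum(amount) FROM sales WHERE sale_date >= DATE '2008-03-01';
-- lets the planner skip partitions whose CHECK constraints rule them out.
```

Inserts must be routed to the right child table, either by the application or by a trigger/rule on the parent.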
If you have sequential scans that span this table and say it has 60-80 million rows, it could take longer than 30 seconds, even if you have a lot of RAM. Just imagine what postgres is doing... if its target search is going to end in searching 40 million rows and it has to aggregate on two or three columns, it's going to be slow. No amount of hardware is going to fix this. Sure you can gain some speed by having entire tables in RAM. No magic bullet here. Disk is definitely not a magic bullet. Even if you have a bunch of fast disks, it's still much slower than RAM at performing reads. So if you're read-heavy, then adding more disk isn't going to just solve all your problems. RAM is nice. The more pages you can keep in RAM, the less reading from the disk. Even with that all said and done... aggregating lots of rows takes time. I suggest you come up with a system for preaggregating your data if possible. Identify all of your target dimensions. If you're lucky, you only have a few key dimensions, which can reduce the size of the table by a lot and reduce queries to 1-2 seconds. There are a number of ways to tackle this, but postgres is a nice db to do this with, since writers do not block readers. I think you should focus on getting this system to work well with minimal hardware first. Then you can upgrade. Over the next few years the db is only going to get larger. You have 4 users now... but who's to say what it will evolve into.
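The preaggregation idea above can be sketched as a summary table keyed on the main reporting dimensions. All names here are hypothetical, assuming the same kind of sales fact table the thread describes:

```sql
-- Hypothetical summary table: pre-sum the fact table by key dimensions
-- so reports scan thousands of rows instead of tens of millions.
CREATE TABLE sales_summary AS
SELECT date_trunc('month', sale_date) AS month,
       product_id,
       sum(amount) AS total_amount,
       count(*)    AS n_lines
FROM   sales
GROUP  BY 1, 2;
```

Reports then aggregate the much smaller summary table, and the summary is rebuilt, or incrementally appended to, after each batch load; since PostgreSQL writers do not block readers, the refresh can run while reports continue.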