Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
Greg,

> 1) Collect up data about their system (memory, disk layout), find out a
> bit about their apps/workload, and generate a config file based on that.

We could start with this. Where I bogged down is that collecting system information about several different operating systems ... and in some cases generating scripts for boosting things like shmmax ... is actually quite a large problem from a slog perspective; there is no standard way even within Linux to describe CPUs, for example. Collecting available disk space information is even worse. So I'd like some help on this portion.

I actually have algorithms which are good enough to start with for most of the important GUCs worked out, and others could be set through an interactive script ("Does your application require large batch loads involving thousands or millions of updates in the same transaction?", "How large (in GB) do you expect your database to be?").

> 2) Connect to the database and look around. Study the tables and some of
> their stats, make some estimates based on what you find, and produce a
> new config file.

I'm not sure that much more can be done for (2) than for (1). Tables on disk don't tell us much.

> 3) Monitor the database while it's doing its thing. See which parts go
> well and which go badly by viewing database statistics like pg_statio.
> From that, figure out where the bottlenecks are likely to be and push
> more resources toward them. What I've been working on lately is exposing
> more readouts of performance-related database internals to make this
> more practical.

We really should collaborate on that.

When first exposed to this problem, most people assume that (1) is good enough: ask some questions, look at the machine, and magically a reasonable starting configuration can be produced. It's already been pointed out that anyone with enough knowledge to do all that could probably spit out a reasonable guess for the config file without help. But even that is more than most people do today.

Further, if you don't start with a reasonable configuration, it's difficult or impossible to analyze where your settings are out of whack; behavior introduced by some way-too-low settings will mask any other tuning that needs to be done. It's also hard or impossible to devise tuning algorithms that work for both gross tuning (increase shared_buffers by 100x) and fine tuning (decrease bgwriter_delay to 45ms). So whether or not we do (3), we need to do (1) first.

--
Josh Berkus
PostgreSQL @ Sun
San Francisco
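[Editorial illustration: Josh's actual algorithms aren't shown in this thread, but a first-pass GUC calculator of the kind he describes might encode rules of thumb like the following. The percentages and thresholds are common community guidance of that era, not his numbers, and the whole script is a hypothetical sketch.]

    # Hypothetical sketch of a first-pass GUC calculator. The heuristics
    # (25% of RAM for shared_buffers, etc.) are widely repeated rules of
    # thumb, not the actual algorithms described in the thread.
    def suggest_gucs(ram_mb, expected_db_gb, batch_loads):
        gucs = {}
        # shared_buffers: ~25% of RAM is a frequently cited starting point
        gucs['shared_buffers'] = '%dMB' % (ram_mb // 4)
        # effective_cache_size: planner hint, often ~50-75% of RAM
        gucs['effective_cache_size'] = '%dMB' % (ram_mb * 3 // 4)
        # maintenance_work_mem: bigger values help bulk loads and VACUUM
        gucs['maintenance_work_mem'] = '%dMB' % min(ram_mb // 16, 1024)
        # more WAL segments help batch loads and big databases
        gucs['checkpoint_segments'] = 32 if (batch_loads or expected_db_gb >= 100) else 8
        return gucs

    if __name__ == '__main__':
        # e.g. a 4GB box expecting a 50GB database with batch loads
        for name, value in suggest_gucs(4096, 50, True).items():
            print('%s = %s' % (name, value))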
Re: [PERFORM] sytem log audit/reporting and psql
Hi!

Fei Liu schrieb: [...] Now here are my questions:

These are a lot of questions, and some of them are not related to pgsql-performance or even PostgreSQL. I'll try to answer some of them, because I'm currently experimenting with partitioned tables, too.

> 2) Is partitioning a good approach to speed up log query/view? The user
> comments on partitioning in the pgsql manual seem to indicate partitioning
> may be slower than a non-partitioned table under certain circumstances.

You can look at table partitioning from several points of view. My experience with table partitioning, as far as performance goes, is:

*) It is a benefit for large tables only, and the definition of "large" depends on your data. I did some testing for an application we are developing, and there it showed that table partitioning does not seem to make sense for tables with fewer than 10 million rows (perhaps even more).

*) The performance benefit depends on your queries. Some queries get a big improvement, but some queries might even run significantly slower.

*) Depending on the way you set up your system, inserts can be much slower with partitioned tables (e.g. if you are using triggers to automatically create your partitions on demand).

> 3) How to avoid repetitive log entry scanning, since my cron job script
> runs daily but logrotate runs weekly? This means my script will be
> parsing duplicate entries every time.

This has nothing to do with postgres, but I wrote something similar years ago. Here's what I did and what you could do: remember the last line of your logfile in some external file (or even the database). Then on the next run you can read the logfile again line by line and skip all lines until you have found the line you saved on the last run.

* If you find the line that way, just start parsing the logfile beginning at the next line.
* If you can not find your line and you reach EOF, start parsing again at the beginning of the logfile.
* If this is your first run and you don't have a line stored yet, start parsing at the beginning of the logfile.

When you are finished, you have to remember the last line from the logfile again at some place. (A small sketch of this approach appears at the end of this message.)

> 6) What are the best approaches to analyze postgresql query performance
> and how to improve postgresql query performance?

Here are some general recommendations for performance testing from my experience:

*) Test with real data. For table partitioning this means you have to create really large datasets to make your tests useful. You should write a small program to generate your test data if possible, or use some other means to create your test database. You also need time: creating a test database and importing 100 million rows of test data will take several hours or even days!

*) Test with the real queries from your application! Testing with just a few easy standard queries will almost certainly not be sufficient to get the right numbers for the performance you will see in your application later on. Look at the thread "Query performance problems with partitioned tables" I started on pgsql-performance just yesterday to see what I mean!

*) Use EXPLAIN ANALYZE and look at the cost and actual time numbers this gives you. It also shows you the query plan PostgreSQL used when executing your query. You sometimes might be surprised what is going on behind the scenes... (a minimal example follows below)
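[Editorial illustration of the EXPLAIN ANALYZE point: the connection string and the "logs" table in the query are invented placeholders, not from the original mail. Note that EXPLAIN ANALYZE really executes the query.]

    # Running EXPLAIN ANALYZE from a script via psycopg2 and printing
    # the plan. EXPLAIN ANALYZE executes the query, so run it against
    # test data. Table and date range are made-up placeholders.
    import psycopg2

    conn = psycopg2.connect("dbname=test")
    cur = conn.cursor()
    cur.execute("EXPLAIN ANALYZE SELECT count(*) FROM logs "
                "WHERE logdate >= '2007-04-01' AND logdate < '2007-05-01'")
    for (line,) in cur.fetchall():
        # Each row is one line of the plan; the plan nodes with the
        # largest "actual time" values are where the time is going.
        print(line)
    cur.close()
    conn.close()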
A few more general caveats:

*) If you are just using a stopwatch to time your queries, be aware of other factors which might significantly influence your test: caching, other jobs running on the machine in parallel, cosmic rays, ...

*) Before running your tests you should always try to get to some well-defined starting point (this might even mean rebooting your server before each test run), and you should always repeat each test several times and then calculate a mean value (and the standard deviation, to see how good your results are...).

*) Document your test setup and procedure as well as your results (otherwise two days later you won't remember which test obtained which result).

HTH

- andreas

--
Andreas Haumer         | mailto:[EMAIL PROTECTED]
*x Software + Systeme  | http://www.xss.co.at/
Karmarschgasse 51/2/20 | Tel: +43-1-6060114-0
A-1100 Vienna, Austria | Fax: +43-1-6060114-71
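[Editorial illustration: the "remember the last line" checkpoint trick from question 3 above, sketched in Python. The file paths and the parse_line() hook are invented placeholders, not part of the original mail.]

    # Sketch of the logfile checkpoint approach: save the last parsed
    # line, skip up to it on the next run, and fall back to a full
    # reparse if logrotate has replaced the file.
    import os

    LOGFILE = '/var/log/messages'            # hypothetical log to parse
    STATEFILE = '/var/lib/logparse/last_line'

    def parse_line(line):
        print('parsing: %s' % line.rstrip())  # real work would go here

    def run():
        last = None
        if os.path.exists(STATEFILE):
            with open(STATEFILE) as f:
                last = f.readline()

        with open(LOGFILE) as log:
            lines = log.readlines()

        start = 0
        if last is not None:
            try:
                # Skip everything up to and including the remembered
                # line; if it is gone (logrotate ran), reparse from the top.
                start = lines.index(last) + 1
            except ValueError:
                start = 0

        for line in lines[start:]:
            parse_line(line)

        if lines:
            with open(STATEFILE, 'w') as f:
                f.write(lines[-1])  # remember the last line for next run

    if __name__ == '__main__':
        run()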
Re: [PERFORM] index structure for 114-dimension vector
Let me just thank the list, especially for the references. (I found similar papers myself with Google: and to think I have a university library alumni card and barely need it any more!) I'll write again on the sorts of results I get.

Andrew Lazarus, Ph.D.
Re: [PERFORM] index structure for 114-dimension vector
On 5/1/07, Andrew Lazarus [EMAIL PROTECTED] wrote:
> Let me just thank the list, especially for the references. (I found
> similar papers myself with Google: and to think I have a university
> library alumni card and barely need it any more!) I'll write again on
> the sorts of results I get.

Looking forward to hearing about them. I have worked with such dataset problems, but never attempted to apply them to a relational database such as PostgreSQL. If you want speed, nothing beats in-memory vectors on a modern CPU architecture.

Alexander.
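[Editorial illustration of "in-memory vectors": for a 114-dimensional dataset that fits in RAM, even a brute-force nearest-neighbor scan is often fast enough. NumPy, the dataset size, and squared Euclidean distance are all assumptions for illustration; the thread does not specify them.]

    # Brute-force nearest-neighbor search over in-memory vectors.
    import numpy as np

    rng = np.random.default_rng(0)
    data = rng.random((100_000, 114))   # 100k vectors, 114 dimensions
    query = rng.random(114)

    # Squared Euclidean distance from the query to every vector at once
    dists = ((data - query) ** 2).sum(axis=1)

    k = 10
    nearest = np.argsort(dists)[:k]     # indices of the 10 closest vectors
    print(nearest, dists[nearest])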
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
> large problem from a slog perspective; there is no standard way even
> within Linux to describe CPUs, for example. Collecting available disk
> space information is even worse. So I'd like some help on this portion.

Quite likely, naivete follows... But aren't things like /proc/cpuinfo, /proc/meminfo, and /proc/partitions / /proc/diskstats standard, at the very least across Linux distros? I'm not familiar with BSD or other Unix flavours, but I would expect these (or their equivalents) to exist in those, no? Am I just being naive?

Carlos
--
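[Editorial illustration: reading those /proc interfaces is easy, which is Carlos's point; the catch, as the next reply notes, is that they are Linux-only, line-oriented text whose exact keys vary with kernel version.]

    # Sketch: pulling basic CPU and memory facts from Linux /proc files.
    # Linux-specific; formats can differ between kernel versions.
    def cpu_count():
        with open('/proc/cpuinfo') as f:
            return sum(1 for line in f if line.startswith('processor'))

    def mem_total_kb():
        with open('/proc/meminfo') as f:
            for line in f:
                if line.startswith('MemTotal:'):
                    return int(line.split()[1])  # value is in kB

    print('CPUs: %d, RAM: %d kB' % (cpu_count(), mem_total_kb()))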
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Tue, 1 May 2007, Carlos Moreno wrote:
>> large problem from a slog perspective; there is no standard way even
>> within Linux to describe CPUs, for example. Collecting available disk
>> space information is even worse. So I'd like some help on this portion.
>
> Quite likely, naivete follows... But aren't things like /proc/cpuinfo,
> /proc/meminfo, and /proc/partitions / /proc/diskstats standard, at the
> very least across Linux distros? I'm not familiar with BSD or other Unix
> flavours, but I would expect these (or their equivalents) to exist in
> those, no? Am I just being naive?

Unfortunately, yes. Across different Linux distros they are fairly standard (though different kernel versions will change them), but different kernels need drastically different tools to get the info out of them.

David Lang
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Mon, 30 Apr 2007, Kevin Hunter wrote:
> I recognize that PostgreSQL and MySQL try to address different
> problem-areas, but is this one reason why a lot of people with whom I
> talk prefer MySQL? Because PostgreSQL is so slooow out of the box?

It doesn't help, but there are many other differences that are as big or bigger. Here are a few samples off the top of my head:

1) Performance issues due to MVCC (MySQL fans love to point out how fast they can do "select count(*) from x")

2) Not knowing you have to run VACUUM ANALYZE, and therefore never seeing good query performance

3) Unfair comparisons of PostgreSQL with its robust WAL against MySQL+MyISAM on write-heavy workloads

These are real issues, which of course stack on top of things like outdated opinions from older PG releases whose performance issues have been resolved in the last few years.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Tue, 1 May 2007, Josh Berkus wrote:
> there is no standard way even within Linux to describe CPUs, for example.
> Collecting available disk space information is even worse. So I'd like
> some help on this portion.

I'm not fooled--secretly you and your co-workers laugh at how easy this is on Solaris and are perfectly happy with how difficult it is on Linux, right? I joke because I've been re-solving some variant of this problem every few years for a decade now and it just won't go away. Last time I checked, the right answer was to find someone else who's already done it, packaged it into a library, and appears committed to keeping it up to date; then just pull a new rev of that when you need it. For example, for the CPU/memory part, top solves this problem and is always kept current, so on open-source platforms there's the potential to re-use that code. Now that I know that's one thing you're (understandably) fighting with, I'll dig up my references on that (again).

> It's also hard/impossible to devise tuning algorithms that work for both
> gross tuning (increase shared_buffers by 100x) and fine tuning (decrease
> bgwriter_delay to 45ms).

I would advocate focusing on iterative improvements to an existing configuration rather than even bothering with generating a one-off config, for exactly this reason. It *is* hard/impossible to get it right in a single shot, because of how many parameters interact and the way bottlenecks clear, so why not assume from the start that you're going to do it several times--then you've only got one piece of software to write.

The idea I have in my head is a tool that gathers system info, connects to the database, and then spits out recommendations in order of expected effectiveness--with the specific caveat that changing too many things at one time isn't recommended, and with some notion of parameter dependencies. The first time you run it, you'd be told that shared_buffers is wildly low, effective_cache_size isn't even in the right ballpark, and your work_mem looks small relative to the size of your tables; fix those before you bother doing anything else, because any data collected with those at very wrong values is bogus. On take two, those parameters pass their sanity tests, but since you're actually running at a reasonable speed now, the fact that your tables are no longer being vacuumed frequently enough might bubble to the top. It would take a few passes to nail down everything, but as long as one run of it leaves you in a similar position to where the single-shot tool would, it removes that tool as something separate that needs to be built.

To argue against myself for a second: it may very well be the case that writing the simpler tool is the only way to get a useful prototype for building the more complicated one; it's very easy to get bogged down in feature creep on a grand design otherwise.

--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
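[Editorial illustration of the first-pass sanity check Greg describes: connect, read the current settings, and flag the wildly wrong ones before any finer tuning. psycopg2 and the thresholds are assumptions; a real tool would need proper parameter-dependency logic.]

    # Sketch of a "take one" sanity-check pass over pg_settings.
    # Thresholds are invented placeholders, not project guidance.
    import psycopg2

    conn = psycopg2.connect("dbname=postgres")
    cur = conn.cursor()

    def setting(name):
        cur.execute("SELECT setting, unit FROM pg_settings WHERE name = %s",
                    (name,))
        return cur.fetchone()

    recommendations = []

    value, unit = setting('shared_buffers')
    # The stock default is far too small for any serious server; flag it
    # first, since other measurements are bogus until it is sane.
    if unit == '8kB' and int(value) * 8 < 128 * 1024:    # under ~128MB
        recommendations.append(('shared_buffers',
                                'wildly low; raise before tuning further'))

    value, unit = setting('effective_cache_size')
    if unit == '8kB' and int(value) * 8 < 512 * 1024:    # under ~512MB
        recommendations.append(('effective_cache_size',
                                'probably not in the right ballpark'))

    # Report in (hard-coded) order of expected effectiveness
    for name, advice in recommendations:
        print('%s: %s' % (name, advice))

    cur.close()
    conn.close()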
Re: [PERFORM] Feature Request --- was: PostgreSQL Performance Tuning
On Tue, 1 May 2007, Greg Smith wrote:
> On Tue, 1 May 2007, Josh Berkus wrote:
>> there is no standard way even within Linux to describe CPUs, for
>> example. Collecting available disk space information is even worse. So
>> I'd like some help on this portion.

What type of description of the CPUs are you looking for?

>> It's also hard/impossible to devise tuning algorithms that work for
>> both gross tuning (increase shared_buffers by 100x) and fine tuning
>> (decrease bgwriter_delay to 45ms).
>
> I would advocate focusing on iterative improvements to an existing
> configuration rather than even bothering with generating a one-off
> config, for exactly this reason. It *is* hard/impossible to get it right
> in a single shot, because of how many parameters interact and the way
> bottlenecks clear, so why not assume from the start that you're going to
> do it several times--then you've only got one piece of software to write.

Nobody is asking for things to be right the first time.

> The idea I have in my head is a tool that gathers system info, connects
> to the database, and then spits out recommendations in order of expected
> effectiveness--with the specific caveat that changing too many things at
> one time isn't recommended, and with some notion of parameter
> dependencies. The first time you run it, you'd be told that
> shared_buffers is wildly low, effective_cache_size isn't even in the
> right ballpark, and your work_mem looks small relative to the size of
> your tables; fix those before you bother doing anything else, because
> any data collected with those at very wrong values is bogus.

Why not have a much simpler script that gets these values up into the right ballpark first? After that, the process and analysis you are suggesting would be useful. The problem is that the defaults are _so_ far off that no sane incremental program is going to be able to converge on the right answer rapidly.

David Lang

> On take two, those parameters pass their sanity tests, but since you're
> actually running at a reasonable speed now, the fact that your tables
> are no longer being vacuumed frequently enough might bubble to the top.
> It would take a few passes to nail down everything, but as long as one
> run of it leaves you in a similar position to where the single-shot tool
> would, it removes that tool as something separate that needs to be built.
>
> To argue against myself for a second: it may very well be the case that
> writing the simpler tool is the only way to get a useful prototype for
> building the more complicated one; it's very easy to get bogged down in
> feature creep on a grand design otherwise.