Re: [PERFORM] tsearch2/GIST performance factors?
Oleg wrote:
> Did you consider *decreasing* SIGLENINT? Size of index will diminish and performance could be increased. I use SIGLENINT=15 in my current project.

The default value for SIGLENINT actually didn't work at all; it was only by increasing it that I got any performance at all. An examination of the GiST indexes showed that most of the first-level and many of the second-level bitmaps were saturated.

> tsearch2's index is a lossy index, read http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_internals so search results should be rechecked!

Yes, thanks. We do indeed recheck the actual results. The tests I'm running are just on the raw index performance - how long it takes to "select ... where dockeys @@ to_tsquery(...)".

> We have our TODO http://www.sai.msu.su/~megera/oddmuse/index.cgi/todo and hope to find sponsorships for the fts project for the 8.2 release. Unfortunately, I didn't find spare time to package tsearchd for you; it should certainly help you.

At this point we may not have time to try tsearchd, and unfortunately we're not in a position to sponsor anything yet.

My original question is still bothering me: is it normal for a keyword that occurs in more than about 2% of the documents to cause such inconsistent performance? Is there any single thing I might look at that would help improve performance (like, do I need more memory? More shared memory? Different config parameters?)

Thanks,
Craig

---(end of broadcast)---
TIP 6: explain analyze is your friend
Re: [PERFORM] Sequential scan on FK join
Martin Nickel wrote:
> When I turn off seqscan it does use the index - and it runs 20 to 30% longer. Based on that, the planner is correctly choosing a sequential scan - but that's just hard for me to comprehend. I'm joining on an int4 key, 2048 per index page - I guess that's a lot of reads - then the data-page reads. Still, the 8-minute query time seems excessive.

You'll be getting (many) fewer than 2048 index entries per page. There's a page header and various pointers involved too, and index pages aren't going to be full.

So - it needs to search the table on dates, fetch the ids and then assemble them for the hash join. Of course, if you have too many to join then all this will spill to disk, slowing you further.

Now, you'd rather get down below 8 minutes. There are a number of options:
1. Make sure your disk I/O is being pushed to its limit.
2. Look into increasing the sort memory for this one query - SET work_mem = ... (see the runtime configuration section of the manual).
3. Actually - are you happy that your general configuration is OK?
4. Perhaps use a cursor - I'm guessing you want to process these mailings in some way and only want them one at a time in any case.
5. Try the query one day at a time and see if the balance tips the other way - you'll be dealing with substantially less data per query, which might match your system better. Of course, this may not be practical for your application.
6. If your lead table is updated only rarely, you could try a CLUSTER on the table by mailing_id - that should speed the scan. Read the manual for the CLUSTER command first, though.

--
Richard Huxton
Archonet Ltd
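Options 2 and 6 above might look like the following in a session (the index name is illustrative; on 8.0/8.1 work_mem is an integer number of kilobytes):

```sql
-- Option 2: raise sort/hash memory for this session only.
-- 65536 KB (64 MB) is just an example value; tune it to your RAM.
SET work_mem = 65536;

-- Option 6: physically reorder the table along the join key.
-- CLUSTER takes an exclusive lock for its duration, so read the
-- manual page first and schedule it for a quiet period.
CLUSTER lead_mailing_id_idx ON lead;
```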
Re: [PERFORM] Help tuning postgres
reindex should be faster, since you're not dumping/reloading the table contents on top of rebuilding the index; you're just rebuilding the index.

Robert Treat
emdeon Practice Services
Alachua, Florida

On Wed, 2005-10-12 at 13:32, Steve Poe wrote:
> Would it not be faster to do a dump/reload of the table than reindex, or is it about the same?

On Wed, 2005-10-12 at 13:21 -0400, Tom Lane wrote:
> Emil Briggs [EMAIL PROTECTED] writes:
> > Not yet, the db is in production use and I have to plan for a down-time for that... or is it not impacting the activity on the table?
> It will cause some performance hit while you are doing it. It'll also lock out writes on the table until the index is rebuilt, so he does need to schedule downtime.
>
> regards, tom lane
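For reference, the operation under discussion is simply (table name illustrative):

```sql
-- Rebuilds every index on the table in one pass. It holds a lock that
-- blocks writes to the table until it finishes, hence the downtime.
REINDEX TABLE bigtable;
```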
Re: [PERFORM] Help tuning postgres
In light of what you've explained below about nonremovable row versions reported by VACUUM, I wonder if I should worry about the following type of report:

INFO: vacuuming "public.some_table"
INFO: "some_table": removed 29598 row versions in 452 pages
DETAIL: CPU 0.01s/0.04u sec elapsed 18.77 sec.
INFO: "some_table": found 29598 removable, 39684 nonremovable row versions in 851 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.02s/0.07u sec elapsed 23.16 sec.
VACUUM

Does that mean that the 39684 nonremovable row versions are actually the active live rows in the table (as it reports 0 dead)? I'm sure I don't have any long-running transaction, at least according to pg_stat_activity (backed by the Linux ps too). Or should I run a VACUUM FULL? This table is one which has frequently updated rows.

TIA,
Csaba.

On Thu, 2005-10-13 at 14:40, Andrew Sullivan wrote:
> On Thu, Oct 13, 2005 at 10:15:03AM +0200, Csaba Nagy wrote:
> > OK, this sounds interesting, but I don't understand: why would an update chase down a lot of dead tuples? Should I read up on some docs, cause I obviously don't know enough about how updates work on postgres...
>
> Right. Here's the issue: MVCC does not replace rows when you update. Instead, it marks the old row as expired, and sets the new values. The old row is still there, and it's available for other transactions who need to see it. As the docs say (see http://www.postgresql.org/docs/8.0/interactive/transaction-iso.html), "In effect, a SELECT query sees a snapshot of the database as of the instant that that query begins to run." And that can be true because the original data is still there, although marked as expired for subsequent transactions. UPDATE works the same way as SELECT in terms of searching for rows (so does any command that searches for data).
Now, when you select data, you actually have to traverse all the existing versions of the tuple in order to get the one that's live for you. This is normally not a problem: VACUUM goes around and cleans out old, expired data that is not live for _anyone_. It does this by looking for the oldest transaction that is open. (As far as I understand it, this is actually the oldest transaction in the entire back end; but I've never understood why that should be the case, and I'm too incompetent/dumb to understand the code, so I may be wrong on this point.) If you have very long-running transactions, then, you can end up with a lot of versions of dead tuples in the table, and so reading the few records you want can turn out actually to be a very expensive operation, even though it ought to be cheap.

You can see this by using the VERBOSE option to VACUUM:

test=# VACUUM VERBOSE eval1;
INFO: vacuuming "public.eval1"
INFO: "eval1": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: vacuuming "pg_toast.pg_toast_18831"
INFO: index "pg_toast_18831_index" now contains 0 row versions in 1 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO: "pg_toast_18831": found 0 removable, 0 nonremovable row versions in 0 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
VACUUM

Note those "removable" and "nonremovable" row versions. It's the nonremovable ones that can hurt. WARNING: doing VACUUM on a big table on a disk that's already pegged is going to cause you performance pain, because it scans the whole table.
In some cases, though, you have no choice: if the winds are already out of your sails, and you're effectively stopped, anything that might get you moving again is an improvement.

> And how would the analyze help in finding this out? I thought it would only show me additionally the actual timings, not more detail in what was done...

Yes, it shows the actual timings, and the actual number of rows. But if the estimates that the planner makes are wildly different from the actual results, then you know your statistics are wrong, and that the planner is going about things the wrong way. ANALYZE is a big help. There's also a VERBOSE option to it, but it's usually less useful in production situations.

A
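The estimate-vs-actual comparison described above looks roughly like this in practice (table name, predicate and numbers are made up for illustration):

```sql
EXPLAIN ANALYZE SELECT * FROM some_table WHERE modified > '2005-10-01';
--  Seq Scan on some_table  (cost=0.00..851.00 rows=120 width=48)
--                          (actual time=0.04..412.30 rows=39684 loops=1)
--  Filter: (modified > '2005-10-01'::date)
-- The planner expected ~120 rows but the scan returned 39684: the
-- statistics are stale, so run ANALYZE on the table and retest.
```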
Re: [PERFORM] Help tuning postgres
First of all, thanks all for the input.

I probably can't afford even the REINDEX till Christmas, when we have about 2 weeks of company holiday... but I guess I'll have to do something until Christmas. The system should at least look like it's working all the time. I can have downtime, but only for short periods, preferably less than 1 minute. The tables we're talking about have ~10 million rows the smaller ones and ~150 million rows the bigger ones, and I guess REINDEX will take quite some time.

I wonder if I could devise a scheme like:
- create a temp table exactly like the production table, including indexes and foreign keys;
- create triggers on the production table which log all inserts, deletes and updates to a log table;
- activate these triggers;
- copy all data from the production table to the temp table (this will take the bulk of the time needed for the whole operation);
- replay the log on the temp table repeatedly if necessary, until the temp table is sufficiently close to the original;
- rename the original table to something else, and then rename the temp table to the original name, all this in a transaction - this would ideally be the only visible delay for the user, and if the system is not busy, it should be quick, I guess;
- replay the log one more time.

All this should happen at a point in time when there's little traffic to the database. Replaying could be as simple as a few delete triggers on the log table, which replay the deleted record on the production table, the replay then consisting of a delete operation on the log table. This is so that new log entries can be replayed later without replaying again what was already replayed. The big tables I should do this procedure on have a low probability of conflicting operations (like insert and immediate delete of the same row, multiple inserts of the same row, multiple conflicting updates of the same row, etc.), which is why I think replaying the log will work fine...
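The logging and swap steps of a scheme like this could be sketched as follows (all table, column and function names are hypothetical; a real version would need to log the full row data, not just a key, so that INSERTs and UPDATEs can be replayed):

```sql
-- Change log: one row per modification to production table "prod".
CREATE TABLE prod_log (op text, id integer, logged_at timestamptz DEFAULT now());

CREATE OR REPLACE FUNCTION log_prod_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO prod_log (op, id) VALUES (TG_OP, OLD.id);
        RETURN OLD;
    ELSE
        INSERT INTO prod_log (op, id) VALUES (TG_OP, NEW.id);
        RETURN NEW;
    END IF;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER prod_changes AFTER INSERT OR UPDATE OR DELETE ON prod
    FOR EACH ROW EXECUTE PROCEDURE log_prod_change();

-- ... bulk-copy prod into prod_tmp, replay prod_log until caught up ...

-- The only user-visible pause: swap the tables in one transaction,
-- then replay the log one final time against the new table.
BEGIN;
ALTER TABLE prod RENAME TO prod_old;
ALTER TABLE prod_tmp RENAME TO prod;
COMMIT;
```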
Of course this whole setup will be a lot more work than just a REINDEX... I wonder if somebody has tried anything like this, and if it has a chance of working?

Thanks,
Csaba.

On Tue, 2005-10-18 at 17:18, Robert Treat wrote:
> reindex should be faster, since you're not dumping/reloading the table contents on top of rebuilding the index, you're just rebuilding the index.
Re: [PERFORM] Help tuning postgres
On Tue, Oct 18, 2005 at 05:21:37PM +0200, Csaba Nagy wrote:
> INFO: vacuuming "public.some_table"
> INFO: "some_table": removed 29598 row versions in 452 pages
> DETAIL: CPU 0.01s/0.04u sec elapsed 18.77 sec.
> INFO: "some_table": found 29598 removable, 39684 nonremovable row versions in 851 pages
> DETAIL: 0 dead row versions cannot be removed yet.
>
> Does that mean that the 39684 nonremovable row versions are actually the active live rows in the table (as it reports 0 dead)? I'm sure I don't have any long-running transaction, at least according to pg_stat_activity (backed by the Linux ps too). Or should I run a VACUUM FULL? This table is one which has frequently updated rows.

No, you should be OK there. What that should tell you is that you have about 40,000 rows in the table. But notice that your vacuum process just removed 29598 row versions - about 75% as many as remain live. Moreover, your 39684 rows are taking 851 pages. On a standard installation that's usually 8 KB/page, so that's about 6,808 KB of physical storage space you're using. Is that consistent with the size of your data? If it's very large compared to the data you have stored in there, you may want to ask if you're leaking space from the free space map (because of that table turnover, which seems pretty severe).

A

--
Andrew Sullivan | [EMAIL PROTECTED]
The whole tendency of modern prose is away from concreteness. --George Orwell
[PERFORM] Inefficient escape codes.
Hello there,

This is my first post to the list. I have a deep low-level background in computer programming, but I am a total newbie to SQL databases. I am using Postgres because of its commercial license.

My problem is with storing large values. I have a database that stores large amounts of data (each row consisting of up to 5 MB). After carefully reading the Postgres 8.0 manual (the version I'm using), I was told that the best option was to create a bytea field. Large objects are out of the question here since we have lots of tables.

As I understand it, the client needs to put the data into the server using a text-based command. This makes the 5 MB of data grow up to 5x, making it 25 MB in the worst case (example: 0x01 becomes \\001).

My questions are:
1) Is there any way for me to send the binary field directly, without needing escape codes?
2) Will this mean that the client actually wastes my network bandwidth converting binary data to text? Or does the client transparently manage this?

Thanks for any light on the subject,
Rodrigo
Re: [PERFORM] Inefficient escape codes.
On Tue, Oct 18, 2005 at 06:07:12PM +0000, Rodrigo Madera wrote:
> 1) Is there any way for me to send the binary field directly without needing escape codes?

In 7.4 and later the client/server protocol supports binary data transfer. If you're programming with libpq you can use PQexecParams() to send and/or retrieve values in binary instead of text.

http://www.postgresql.org/docs/8.0/interactive/libpq-exec.html#LIBPQ-EXEC-MAIN

APIs built on top of libpq or that implement the protocol themselves might provide hooks to this capability; check your documentation. What language and API are you using?

See also COPY BINARY:

http://www.postgresql.org/docs/8.0/interactive/sql-copy.html

> 2) Will this mean that the client actually wastes my network bandwidth converting binary data to text? Or does the client transparently manage this?

Binary transfer sends data in binary, not by automatically converting to and from text.

--
Michael Fuhr
Re: [PERFORM] Inefficient escape codes.
[Please copy the mailing list on replies so others can participate in and learn from the discussion.]

On Tue, Oct 18, 2005 at 07:09:08PM +0000, Rodrigo Madera wrote:
> > What language and API are you using?
>
> I'm using libpqxx. A nice STL-style library for C++ (I am 101% C++).

I've only dabbled with libpqxx; I don't know if or how you can make it send data in binary instead of text. See the documentation or ask in a mailing list like libpqxx-general or pgsql-interfaces.

> > Binary transfer sends data in binary, not by automatically converting to and from text.
>
> Uh, I'm sorry I didn't get that... If I send: insert into foo values('\\001\\002'), will libpq send the bytes 0x01, 0x02 or the literal characters "\\001\\002"?

If you do it that way, libpq will send the string as text with escape sequences; you can use a sniffer like tcpdump or ethereal to see this for yourself. To send the data in binary you'd call PQexecParams() with a query like "INSERT INTO foo VALUES ($1)". The $1 is a placeholder; the other arguments to PQexecParams() provide the data itself, the data type and length, and specify whether the data is in text format or binary. See the libpq documentation for details.

--
Michael Fuhr
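A minimal libpq sketch of the PQexecParams() call described above, sending a bytea parameter in binary (the connection string and table name are placeholders, error handling is trimmed, and it obviously needs a running server to do anything):

```c
#include <stdio.h>
#include <libpq-fe.h>

int main(void)
{
    PGconn *conn = PQconnectdb("dbname=test");   /* placeholder conninfo */
    if (PQstatus(conn) != CONNECTION_OK) {
        fprintf(stderr, "%s", PQerrorMessage(conn));
        return 1;
    }

    unsigned char blob[3] = {0x01, 0x02, 0xff};
    const char *values[1]  = {(const char *) blob};
    int         lengths[1] = {sizeof blob};
    int         formats[1] = {1};   /* 1 = binary, 0 = text */

    /* $1 is the placeholder; the bytes go over the wire as-is,
       with no escaping and no 5x expansion. */
    PGresult *res = PQexecParams(conn,
        "INSERT INTO foo VALUES ($1::bytea)",
        1, NULL, values, lengths, formats, 0);
    if (PQresultStatus(res) != PGRES_COMMAND_OK)
        fprintf(stderr, "%s", PQerrorMessage(conn));

    PQclear(res);
    PQfinish(conn);
    return 0;
}
```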