Re: [PERFORM] Queryplan within FTS/GIN index -search.
Tom Lane t...@sss.pgh.pa.us wrote:
> Kevin Grittner kevin.gritt...@wicourts.gov writes:
>> Tom Lane t...@sss.pgh.pa.us wrote:
>>> Any sane text search application is going to try to filter out
>>> common words as stopwords; it's only the failure to do that that's
>>> making this run slow.
>> I'd rather have the index used for the selective test, and apply the
>> remaining tests to the rows retrieved from the heap.
> Uh, that was exactly my point. Indexing common words is a waste.

Perhaps I'm missing something. My point was that there are words which are too common to be useful for index searches, yet uncommon enough to usefully limit the results. These words could typically benefit from tsearch2-style parsing and dictionaries; so declaring them as stop words would be bad from a functional perspective, yet searching an index for them would be bad from a performance perspective.

One solution would be for the users to rigorously identify all of these words, include them on one stop word list but not another, include *two* tsvector columns in the table (with and without the iffy words), index only the one with the larger stop word list, and generate two tsquery values to search the two different columns. Best of both worlds. Sort of. The staff time to create and maintain such a list would obviously be costly, and writing the queries would be error-prone.

Second best would be to somehow recognize the iffy words and exclude them from the index and the index search phase, but apply the check when the row is retrieved from the heap. I really have a hard time seeing how the conditional exclusion from the index could be accomplished, though.

Next best would be to let them fall into the index, but exclude top-level ANDed values from the index search, applying them only to the recheck when the row is read from the heap. That seems, at least conceptually, like it could be done.
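The two-column idea above could be sketched roughly as follows. This is only an illustration of the shape of the schema, not anything from the thread: the table, column, and text search configuration names ('english_full' keeping the iffy words, 'english_strict' stopping them) are all hypothetical, and the two configurations would have to be created separately.

```sql
-- Hypothetical sketch of the "two tsvector columns" approach.
ALTER TABLE docs ADD COLUMN tsv_full   tsvector;  -- iffy words kept
ALTER TABLE docs ADD COLUMN tsv_strict tsvector;  -- iffy words stopped

UPDATE docs SET
    tsv_full   = to_tsvector('english_full',   body),
    tsv_strict = to_tsvector('english_strict', body);

-- Index only the column built with the larger stop word list:
CREATE INDEX docs_tsv_strict_gin ON docs USING gin (tsv_strict);

-- Search: the indexed column narrows candidate rows cheaply; the
-- unindexed column applies the full test to the fetched rows.
SELECT id FROM docs
 WHERE tsv_strict @@ to_tsquery('english_strict', 'rare & terms')
   AND tsv_full   @@ to_tsquery('english_full',   'rare & terms & iffyword');
```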
-Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Problem with database performance, Debian 4gb ram ?
Grant Masan grant.mas...@gmail.com wrote:
> max_connections = 80
> shared_buffers = 512MB
> temp_buffers = 8MB
> work_mem = 20MB
> maintenance_work_mem = 384MB
> wal_buffers = 8MB
> checkpoint_segments = 128MB
> effective_cache_size = 2304MB
> checkpoint_timeout = 1h

Pending further information, these seem sane to me.

> cpu_tuple_cost = 0.0030
> cpu_index_tuple_cost = 0.0010
> cpu_operator_cost = 0.0005

Why did you make these adjustments? I usually have to change the ratio between page and cpu costs in the other direction. Unless you have carefully measured performance with and without these changes and found a clear win with these, I would recommend going back to the defaults for these three and tuning from there.

> fsync = off

Only use this if you can afford to lose all data in the database. (There are some situations where this is actually OK, but they are unusual.)

As others have stated, though, we'd need more information to really give much useful advice. An EXPLAIN ANALYZE of a query which isn't performing to expectations would be helpful, especially if you include the table definitions (with indexes) of all tables involved in the query. Output from vmstat or iostat with a fairly small interval (I usually use 1) while the query is running would be useful, too. Knowing the exact version of PostgreSQL (like from SELECT version();) would be useful, as well as knowing more about your disk array and controller(s).

-Kevin
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Kevin Grittner kevin.gritt...@wicourts.gov writes:
> Perhaps I'm missing something. My point was that there are words
> which are too common to be useful for index searches, yet uncommon
> enough to usefully limit the results. These words could typically
> benefit from tsearch2 style parsing and dictionaries; so declaring
> them as stop words would be bad from a functional perspective, yet
> searching an index for them would be bad from a performance
> perspective.

Right, but the original complaint in this thread was that a GIN index is slow about searching for very common terms. The answer to that clearly is to not index common terms, rather than worry about making the case a bit faster.

It may well be that Jesper's identified a place where the GIN code could be improved --- it seems like having the top-level search logic be more aware of the AND/OR structure of queries would be useful. But the particular example shown here doesn't make a very good case for that, because it's hard to tell how much of a penalty would be taken in more realistic examples.

regards, tom lane
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Tom Lane t...@sss.pgh.pa.us wrote:
> The answer to that clearly is to not index common terms

My understanding is that we don't currently get statistics on how common the terms in a tsvector column are until we ANALYZE the *index* created from it. Seems like sort of a Catch-22. Also, if we exclude words which are in the tsvector from the index on the tsvector, we need to know what words were excluded so we know not to search on them, as well as forcing the recheck of the full tsquery (unless this always happens already?).

> It may well be that Jesper's identified a place where the GIN code
> could be improved

My naive assumption has been that it would be possible to get an improvement without touching the index logic, by changing this part of the query plan:

  Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 TERM2 TERM3 TERM4 TERM5'::text))

to something like this:

  Index Cond: (ftsbody_body_fts @@ to_tsquery('TERM1'::text))

and count on this doing the rest:

  Recheck Cond: (ftsbody_body_fts @@ to_tsquery('TERM1 TERM2 TERM3 TERM4 TERM5'::text))

I'm wondering if anyone has ever confirmed that probing for the more frequent term through the index is *ever* a win, versus using the index for the most common of the top level AND conditions and doing the rest on recheck. That seems like a dangerous assumption from which to start.

> But the particular example shown here doesn't make a very good case
> for that, because it's hard to tell how much of a penalty would be
> taken in more realistic examples.

Fair enough. We're in the early stages of moving to tsearch2 and I haven't run across this yet in practice. If I do, I'll follow up.

-Kevin
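Until the planner does something like that itself, the transformation described above can be approximated by hand, splitting the query the same way Jesper did in his benchmarks. A sketch against the thread's ftsbody test table; note the planner is not *forced* to probe the index for only the first clause, so this is illustrative rather than guaranteed:

```sql
-- Instead of letting GIN probe the index for every term:
--   WHERE ftsbody_body_fts @@ to_tsquery('TERM1 & TERM2 & TERM3')
-- split the condition so the selective term drives the index scan
-- and the common terms are checked against the fetched rows:
SELECT id
  FROM ftsbody
 WHERE ftsbody_body_fts @@ to_tsquery('TERM1')            -- selective
   AND ftsbody_body_fts @@ to_tsquery('TERM2 & TERM3');   -- common
```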
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> I'm wondering if anyone has ever confirmed that probing for the more
> frequent term through the index is *ever* a win, versus using the
> index for the most common of the top level AND conditions and doing
> the rest on recheck.

s/most/least/

-Kevin
Re: [PERFORM] Problem with database performance, Debian 4gb ram ?
On Tue, Nov 3, 2009 at 7:13 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
> Grant Masan grant.mas...@gmail.com wrote:
>> cpu_tuple_cost = 0.0030
>> cpu_index_tuple_cost = 0.0010
>> cpu_operator_cost = 0.0005
> Why did you make these adjustments? I usually have to change the
> ratio between page and cpu costs toward the other direction.

Is that because the database is mostly cached in memory? If I take the documented descriptions of the cost parameters at face value, I find that cpu_tuple_cost should be even lower yet.

Cheers,
Jeff
[PERFORM] Free memory usage Sol10, 8.2.9
All,

I'm trying to understand the free memory usage and why it falls below 17G sometimes and what could be causing it. Any pointers would be appreciated.

r...@prod1 # prtconf
System Configuration: Sun Microsystems sun4u
Memory size: 32768 Megabytes

[postg...@prod1 ~]$ vmstat 5 10
 kthr      memory            page            disk          faults      cpu
 r b w   swap    free    re  mf  pi  po  fr de sr 1m 1m 1m m1  in   sy  cs us sy id
 0 0 0 51713480 21130304 58  185 325 104 104  0  0 23  3  7  1 488  604 573  1  2 97
 0 0 0 51048768 18523456  6   10   0 192 192  0  0  4  0  3  0 527  753 807  2  1 97
 0 0 0 51713480 21130304 58  185 325 104 104  0  0  1 23  3  7 488  604 573  1  2 97
 0 0 0 51067112 18538472  0    1   0 171 171  0  0  4  8  0  4 522  573 740  2  1 97
 0 0 0 51072744 18542992  0    0   0 187 187  0  0  0 22  0  7 532  657 780  2  1 97
 0 0 0 51069944 18540736 146 1729  3 174 174  0  0  0  9  0  3 526 3227 944  4  5 91
 0 0 0 51065728 18537360 32   33   0 192 192  0  0  0 20  0  3 522 1147 927  3  2 95
 0 0 0 51065728 18537336  0    0   0 190 190  0  0  0 26  0  3 517  628 789  2  1 97
 0 0 0 51065728 18537336  0    0   0 168 168  0  0  0 25  0 11 517  668 810  2  2 96
 0 0 0 51062960 18535152  0  165   2 190 190  0  0 14 29  0  4 552  732 808  2  1 97

prstat -am
 NPROC USERNAME  SWAP   RSS MEMORY      TIME  CPU
    21 postgres 8312M 8300M    25% 112:24:15 2.1%
    53 root      347M  236M   0.7% 130:52:02 0.1%
     7 daemon    708M  714M   2.2%  21:53:05 0.0%
     4 mot      5552K   15M   0.0%   0:00:00 0.0%
     1 smmsp    1384K 5480K   0.0%   0:00:59 0.0%

[postg...@prod1]$ ps -eaf | grep postgres | wc -l
24

max_connections = 600
shared_buffers = 8000MB
temp_buffers = 8MB
work_mem = 2MB
maintenance_work_mem = 256MB
max_fsm_pages = 2048000
max_fsm_relations = 2000
effective_cache_size = 4000MB

Thanks, Stalin
Re: [PERFORM] Queryplan within FTS/GIN index -search.
Tom Lane wrote:
> It may well be that Jesper's identified a place where the GIN code
> could be improved --- it seems like having the top-level search logic
> be more aware of the AND/OR structure of queries would be useful. But
> the particular example shown here doesn't make a very good case for
> that, because it's hard to tell how much of a penalty would be taken
> in more realistic examples.

With a term sitting in:
  80% of the docs the penalty is: x23
  60% of the docs the penalty is: x17
  40% of the docs the penalty is: x13

for doing

  vectorcol @@ to_tsquery('term commonterm')

compared to

  vectorcol @@ to_tsquery('term') and vectorcol @@ to_tsquery('commonterm');

where term is non-existing (or rare). (These are query execution timings on a fully memory-resident dataset; doing the test with drop_caches and a restart of pg to simulate a dead disk, the numbers are a bit higher.)

http://article.gmane.org/gmane.comp.db.postgresql.performance/22496/match=

Would you ever quantify a term sitting in 60-80% of the docs as a stop-word candidate? I don't know if x13 in execution performance is worth hunting, or if there are lower-hanging fruits sitting in the fts-search-system.

This is essentially the penalty the user will get for adding a term to their search that rarely restricts the results.

In terms of the usual set theory that databases work in, a search for a stop-word translates into the full set. This is just not the case here, where it throws a warning and returns the empty set. This warning can be caught by application code to produce the correct result for the users, but just slightly more complex queries don't do this:

ftstest=# select id from ftstest where body_fts @@ to_tsquery('random | the') limit 10;
 id
----
(0 rows)

Here I would have expected the same error. I basically have to hook the complete stop-word dictionary into an FTS-preparser to give the user the expected results, or have I missed a feature somewhere?

My reason for not pushing commonterms into the stopword list is that they actually perform excellently in PG.
Same body as usual, but commonterm99 is sitting in 99% of the documents.

ftstest=# set enable_seqscan=off;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm99');
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on ftstest  (cost=1051476.74..1107666.07 rows=197887 width=4) (actual time=51.036..121.348 rows=197951 loops=1)
   Recheck Cond: (body_fts @@ to_tsquery('commonterm99'::text))
   ->  Bitmap Index Scan on ftstest_gin_idx  (cost=0.00..1051427.26 rows=197887 width=0) (actual time=49.602..49.602 rows=197951 loops=1)
         Index Cond: (body_fts @@ to_tsquery('commonterm99'::text))
 Total runtime: 147.350 ms
(5 rows)

ftstest=# set enable_seqscan=on;
SET
ftstest=# explain analyze select id from ftstest where body_fts @@ to_tsquery('commonterm99');
                                                    QUERY PLAN
------------------------------------------------------------------------------------------------------------------
 Seq Scan on ftstest  (cost=0.00..56744.00 rows=197887 width=4) (actual time=0.086..7134.384 rows=197951 loops=1)
   Filter: (body_fts @@ to_tsquery('commonterm99'::text))
 Total runtime: 7194.182 ms
(3 rows)

So in order to get the result with a speedup of more than x50, I simply cannot add these terms to the stop-words, because then the first query would resolve to an error and getting results would then be up to the second query. My bet is that doing a seq_scan will never be beneficial for this type of query.

As far as I can see, the only consequence of simply not removing stop-words at all is a (fairly small) increase in index size. It seems to me that stop-words were invented when it was hard to get more than 2GB of memory into a computer, to get the index size reduced to something that could better fit into memory. But nowadays it seems like the downsides are hard to see?

--
Jesper
Re: [PERFORM] Optimizer + bind variables
David Kerr wrote:
> Does/is it possible for the PG optimizer to come up with different
> plans when you're using bind variables vs when you send static
> values?

Yes, if the bind variable form causes your DB access driver to use a server-side prepared statement. Pg can't use its statistics to improve its query planning if it doesn't have a value for a parameter when it's building the query plan.

Whether a server-side prepared statement is used or not depends on how you're connecting to the database - ie your DB access driver and version. If you're using JDBC, I *think* the JDBC driver does parameter placement client-side unless you're using a JDBC prepared statement and the JDBC prepared statement is re-used several times, at which point it sets up a server-side prepared statement. AFAIK otherwise it uses client-side (or Pg protocol level) parameter placement.

> if it's possible for the plan to be different how can i generate an
> xplan for the bind version?

xplan = explain? If so:

Use PREPARE to prepare a statement with the params, then use:

  EXPLAIN EXECUTE prepared_statement_name(params);

eg:

x=> PREPARE blah AS SELECT * FROM generate_series(1,100);
PREPARE
x=> EXPLAIN EXECUTE blah;
                              QUERY PLAN
------------------------------------------------------------------------
 Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=4)
(1 row)

--
Craig Ringer
[PERFORM] Optimizer + bind variables
Does/is it possible for the PG optimizer to come up with different plans when you're using bind variables vs when you send static values?

like if my query was

select * from users (add a bunch of complex joins) where username = 'dave'

vs

select * from users (add a bunch of complex joins) where username = '?'

In oracle they are frequently different.

if it's possible for the plan to be different how can i generate an xplan for the bind version?

Thanks!

Dave
Re: [PERFORM] Free memory usage Sol10, 8.2.9
On 11/03/2009 07:16 PM, Subbiah Stalin-XCGF84 wrote:
> All, I'm trying to understand the free memory usage and why it falls
> below 17G sometimes and what could be causing it. Any pointers would
> be appreciated.
>
> r...@prod1 # prtconf
> System Configuration: Sun Microsystems sun4u
> Memory size: 32768 Megabytes
>
> [postg...@prod1 ~]$ vmstat 5 10
>  kthr      memory            page            disk          faults      cpu
>  r b w   swap    free    re  mf  pi  po  fr de sr 1m 1m 1m m1  in  sy  cs us sy id
>  0 0 0 51713480 21130304 58  185 325 104 104  0  0 23  3  7  1 488 604 573  1  2 97
>  0 0 0 51048768 18523456  6   10   0 192 192  0  0  4  0  3  0 527 753 807  2  1 97

Memory used by the OS for caching files is no longer free. Free memory is wasted memory.

-J
Re: [PERFORM] maintaining a reference to a fetched row
Brian Karlak wrote:
> The setup is relatively simple: there is a central queue table in
> postgres. Worker daemons do a bounded, ordered, limited SELECT to
> grab a row, which they lock by setting a value in the queue.status
> column.

You can probably do an UPDATE ... RETURNING to turn that into one operation - but that won't work with a cursor :-(

> My question is this: is there some way that I can keep a cursor /
> pointer / reference / whatever to the row I fetched originally, so
> that I don't have to search for it again when I'm ready to write
> results?

You could use a cursor, but it won't work if you're locking rows by testing a 'status' flag, because that requires the worker to commit the transaction (so others can see the status flag) before starting work. A cursor only exists within a transaction.

BEGIN;
DECLARE curs CURSOR FOR SELECT * FROM queue ORDER BY queue_id LIMIT 1;
FETCH NEXT FROM curs;
--
-- Set the status - but nobody else can see the change yet because we
-- haven't committed! We'll have a Pg row lock on the record due to the
-- UPDATE, preventing other UPDATEs but not other SELECTs.
--
-- We can't start work until the transaction commits, but committing
-- will close the cursor.
--
UPDATE queue SET status = 1 WHERE CURRENT OF curs;

I don't have a good answer for you there. Perhaps using Pg's locking to do your queueing, rather than updating a status flag, might let you use a cursor? Have a look at the list archives - there's been a fair bit of discussion of queuing mechanisms.

--
Craig Ringer
[PERFORM] High Frequency Inserts to Postgres Database vs Writing to a File
Hi:

I have an application wherein a process needs to read data from a stream and store the records for further analysis and reporting. The data in the stream is in the form of variable length records with clearly defined fields - so it can be stored in a database or in a file. The only caveat is that the rate of records coming in the stream could be several 1000 records a second.

The design choice I am faced with currently is whether to use a postgres database or a flat file for this purpose. My application already maintains a postgres (8.3.4) database for other reasons - so it seemed like the straightforward thing to do. However I am concerned about the performance overhead of writing several 1000 records a second to the database. The same database is being used simultaneously for other activities as well and I do not want those to be adversely affected by this operation (especially the query times). The advantage of running complex queries to mine the data in various different ways is very appealing but the performance concerns are making me wonder if just using a flat file to store the data would be a better approach.

Anybody have any experience in high frequency writes to a postgres database?

- Jay
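Not from the thread itself, but the standard mitigation for this kind of load is batching: accumulate incoming records and write many per statement and per transaction rather than one commit per record. A sketch of what that might look like; the table name, columns, and file path are all hypothetical:

```sql
-- Hypothetical table for the stream records:
CREATE TABLE stream_records (
    received   timestamptz NOT NULL,
    field_a    text        NOT NULL,
    field_b    integer     NOT NULL
);

-- One row per INSERT at several 1000 records/s means several 1000
-- commits/s. Batching many rows per statement and per transaction
-- amortizes that cost:
BEGIN;
INSERT INTO stream_records (received, field_a, field_b) VALUES
    (now(), 'alpha', 1),
    (now(), 'beta',  2),
    (now(), 'gamma', 3);
COMMIT;

-- Fastest bulk path: buffer records to a flat file first, then load
-- it with COPY - which also means no data is lost if the database
-- falls behind, as suggested later in the thread:
COPY stream_records (received, field_a, field_b)
    FROM '/var/spool/app/batch-0001.csv' WITH CSV;
```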
Re: [PERFORM] Optimizer + bind variables
On Wed, Nov 04, 2009 at 07:43:16AM +0800, Craig Ringer wrote:
> David Kerr wrote:
>> Does/is it possible for the PG optimizer to come up with different
>> plans when you're using bind variables vs when you send static
>> values?
>
> Yes, if the bind variable form causes your DB access driver to use a
> server-side prepared statement. Pg can't use its statistics to improve
> its query planning if it doesn't have a value for a parameter when it's
> building the query plan.

hmm, that's a little unclear to me. let's assume that the application is using prepare:

Assuming the database hasn't changed, would:

PREPARE bla1 as SELECT * from users where username = '$1';
explain execute bla1

give the same output as

explain select * from users where username = 'dave';

?

> Whether a server-side prepared statement is used or not depends on how
> you're connecting to the database - ie your DB access driver and
> version. If you're using JDBC, I *think* the JDBC driver does parameter
> placement client-side unless you're using a JDBC prepared statement and
> the JDBC prepared statement is re-used several times, at which point it
> sets up a server-side prepared statement. AFAIK otherwise it uses
> client-side (or Pg protocol level) parameter placement.

that's interesting, i'll need to find out which mine are using, probably a mix of both.

>> if it's possible for the plan to be different how can i generate an
>> xplan for the bind version?
>
> xplan = explain? If so:

yeah, sorry.

> Use PREPARE to prepare a statement with the params, then use:
>
>   EXPLAIN EXECUTE prepared_statement_name(params);
>
> eg:
>
> x=> PREPARE blah AS SELECT * FROM generate_series(1,100);
> PREPARE
> x=> EXPLAIN EXECUTE blah;
>                               QUERY PLAN
> ------------------------------------------------------------------------
>  Function Scan on generate_series  (cost=0.00..12.50 rows=1000 width=4)
> (1 row)

great thanks!

Dave
Re: [PERFORM] maintaining a reference to a fetched row
On Nov 3, 2009, at 4:03 PM, Craig Ringer wrote:
> I don't have a good answer for you there. Perhaps using Pg's locking
> to do your queueing, rather than updating a status flag, might let
> you use a cursor? Have a look at the list archives - there's been a
> fair bit of discussion of queuing mechanisms.

This is an interesting idea. I'll see what I can find in the archives. It will likely take a bit of refactoring, but such is life ...

Thanks!
Brian
[PERFORM] maintaining a reference to a fetched row
Hello All --

I have a simple queuing application written on top of postgres which I'm trying to squeeze some more performance out of.

The setup is relatively simple: there is a central queue table in postgres. Worker daemons do a bounded, ordered, limited SELECT to grab a row, which they lock by setting a value in the queue.status column. When the task is complete, results are written back to the row. The system is designed to allow multiple concurrent daemons to access a queue. At any one time, we expect 1-5M active items on the queue.

Now this design is never going to win any performance awards against a true queuing system like Active/Rabbit/Zero MQ, but it's tolerably fast for our applications. Fetch/mark times are about 1ms, independent of the number of items on the queue. This is acceptable considering that our tasks take ~50ms to run. However, the writing of results back to the row takes ~5ms, which is slower than I'd like. It seems that this is because I need to do an index scan on the queue table to find the row I just fetched.

My question is this: is there some way that I can keep a cursor / pointer / reference / whatever to the row I fetched originally, so that I don't have to search for it again when I'm ready to write results?

Thanks in advance for any pointers you can provide.

Brian
Re: [PERFORM] Optimizer + bind variables
On Wed, Nov 04, 2009 at 11:02:22AM +1100, Chris wrote:
> David Kerr wrote:
>> On Wed, Nov 04, 2009 at 07:43:16AM +0800, Craig Ringer wrote:
>>> David Kerr wrote:
>
> No.
>
> This is explained in the notes here:
>
> http://www.postgresql.org/docs/current/static/sql-prepare.html

sigh and i've read that before too. On the upside, then it behaves like I would expect it to, which is good.

Thanks

Dave
Re: [PERFORM] High Frequency Inserts to Postgres Database vs Writing to a File
On Tue, Nov 3, 2009 at 8:12 PM, Jay Manni jma...@fireeye.com wrote:
> Hi: I have an application wherein a process needs to read data from a
> stream and store the records for further analysis and reporting. The
> data in the stream is in the form of variable length records with
> clearly defined fields – so it can be stored in a database or in a
> file. The only caveat is that the rate of records coming in the
> stream could be several 1000 records a second. The design choice I am
> faced with currently is whether to use a postgres database or a flat
> file for this purpose. My application already maintains a

A common approach is to store them in flat files, then insert the flat files at a later time so that if the db falls behind no data is lost.
Re: [PERFORM] High Frequency Inserts to Postgres Database vs Writing to a File
> could be several 1000 records a second.

So, are there periods when there are no/few records coming in? Do the records/data/files really need to be persisted?

The following statement makes me think you should go the flat file route:

> The advantage of running complex queries to mine the data in various
> different ways is very appealing

Please don't be offended, but that sounds a little like feature creep. I've found that it's best to keep it simple and not do a bunch of work now for what might be requested in the future.

I know it's not exactly what you were looking for... Just food for thought. Best of luck!

David
Re: [PERFORM] Queryplan within FTS/GIN index -search.
I wrote:
> Tom Lane t...@sss.pgh.pa.us wrote:
>> But the particular example shown here doesn't make a very good case
>> for that, because it's hard to tell how much of a penalty would be
>> taken in more realistic examples.
> Fair enough. We're in the early stages of moving to tsearch2 and I
> haven't run across this yet in practice. If I do, I'll follow up.

We have a staging database which allowed some limited testing quickly. While it's real production data, we haven't been gathering this type of data long, so it's got relatively few rows; therefore, it wasn't feasible to try any tests which would be disk-bound, so I primed the cache for all of these, and they are all totally served from cache.

For various reasons which I'll omit unless asked, we do our text searches through functions which take a selection string, turn it into a tsquery with a little extra massaging on our part, run the query with a minimum ranking to return, and return a set of records ordered by the ranking in descending sequence. Under these conditions there is a slight performance gain in adding an additional test which matches 1356 out of 1691 rows. Not surprisingly for a fully cached query set, timings were very consistent from run to run.

While undoubtedly a little unusual in approach, this is production software run against real-world data. I confirmed that it is using the GIN index on the tsvector for these runs.

By the way, the tsearch2 features have been received very well so far. One of the first reactions from most users is surprise at how fast it is. :-)

Anyway, our production results don't confirm the issue shown with the artificial test data.
scca= select count(*) from DocThumbnail where text is not null;
 count
-------
  1691
(1 row)

Time: 0.619 ms

scca= select count(*) from (select DocThumbnail_text_rank('guardian ad litem', 0.1)) x;
 count
-------
    41
(1 row)

Time: 19.394 ms

scca= select count(*) from (select DocThumbnail_text_rank('guardian ad litem attorney', 0.1)) x;
 count
-------
     4
(1 row)

Time: 16.434 ms

scca= select count(*) from (select DocThumbnail_text_rank('attorney', 0.1)) x;
 count
-------
  1356
(1 row)

Time: 415.056 ms

scca= select count(*) from (select DocThumbnail_text_rank('guardian ad litem party', 0.1)) x;
 count
-------
     2
(1 row)

Time: 16.290 ms

scca= select count(*) from (select DocThumbnail_text_rank('party', 0.1)) x;
 count
-------
   935
(1 row)

Time: 386.941 ms

-Kevin
Re: [PERFORM] maintaining a reference to a fetched row
Brian Karlak zen...@metaweb.com writes:
> My question is this: is there some way that I can keep a cursor /
> pointer / reference / whatever to the row I fetched originally, so
> that I don't have to search for it again when I'm ready to write
> results?

If you don't expect any updates to the row meanwhile, ctid might serve.

regards, tom lane
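A sketch of what the ctid suggestion could look like in practice; the table and column names are hypothetical. One caveat worth making explicit: every UPDATE creates a new row version with a new ctid, so the worker should capture the ctid *returned by* its own status-setting UPDATE (RETURNING is available in 8.2+), and the scheme only holds if nothing else updates the row in the meantime:

```sql
-- Claim a queued row and remember the physical location of the new
-- row version:
UPDATE queue
   SET status = 1
 WHERE id = (SELECT id FROM queue
              WHERE status = 0
              ORDER BY id
              LIMIT 1)
RETURNING ctid;          -- suppose this returns '(12,34)'

-- Later, write results back without searching the index again:
UPDATE queue
   SET status = 2, result = 'done'
 WHERE ctid = '(12,34)';
```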
Re: [PERFORM] Problem with database performance, Debian 4gb ram ?
Jeff Janes jeff.ja...@gmail.com wrote:
> On Tue, Nov 3, 2009 at 7:13 AM, Kevin Grittner
> kevin.gritt...@wicourts.gov wrote:
>> Grant Masan grant.mas...@gmail.com wrote:
>>> cpu_tuple_cost = 0.0030
>>> cpu_index_tuple_cost = 0.0010
>>> cpu_operator_cost = 0.0005
>> Why did you make these adjustments? I usually have to change the
>> ratio between page and cpu costs toward the other direction.
>
> Is that because the database is mostly cached in memory? If I take
> the documented descriptions of the cost parameters at face value, I
> find that cpu_tuple_cost should be even lower yet.

Right, the optimizer doesn't model caching effects very well, so I find that in practice I have to fudge these from their putative meanings to allow for typical caching. Even with only a small fraction of the database cached, the heavily accessed indexes tend to be fairly well cached, so overall performance improves markedly by dropping random_page_cost to about 2, even in our lowest-percentage-cached databases. I've occasionally tried using the defaults for that GUC, which has always resulted in user complaints about unacceptable performance of important queries.

While I tend to reduce random_page_cost and seq_page_cost to tweak things, raising the cpu_*_cost settings would accomplish the same thing, so reducing them as shown above would tend to push things into sequential scans where indexed access might be faster.

-Kevin
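As a concrete illustration of the direction described above (a sketch, not a recommendation - the right values depend on measuring your own workload), the corresponding postgresql.conf change would look like:

```
# Default page costs assume relatively little caching:
#   seq_page_cost = 1.0, random_page_cost = 4.0
# The practice described above: make random page fetches cheaper
# relative to CPU costs, rather than making the cpu_*_cost settings
# cheaper relative to page fetches.
random_page_cost = 2.0
# cpu_tuple_cost, cpu_index_tuple_cost, and cpu_operator_cost are left
# at their defaults (0.01, 0.005, 0.0025).
```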