[DOCS] Tsearch docs question
The Tsearch docs, under the GiST and GIN section, say:

"Lossiness [of GiST] causes serious performance degradation since random access of heap records is slow and limits the usefulness of GiST indexes."

The docs do go into some detail, but I think it causes some confusion, also.

Let me digress to state how I understand the relationship between GIN, GiST, and RECHECK:

The benefit of avoiding RECHECK is to avoid the need to re-evaluate the predicate after finding the entry in the index. This can be valuable in tsearch, because the functions are much more expensive than (for example) integer equality.

We (currently) have to visit the heap anyway, to see the visibility information. So avoiding a RECHECK clause doesn't do anything to prevent random heap I/O (although, a less-lossy index will have fewer false positives, by definition).

GIN (as used with tsearch) is lossy for more sophisticated tsqueries (those involving labels) and non-lossy for simpler tsqueries. There's only one tsquery type, so PostgreSQL has no way of differentiating between these two cases.

GiST (as used with tsearch) is lossy for large tsvectors or tsqueries containing labels; and non-lossy for small tsvectors matched against a tsquery that contains no labels. PostgreSQL can't differentiate between these two cases.

So, for GiST they always RECHECK (so you're always sure to get the right result), and for GIN the default operator does not RECHECK (for performance), but if you suspect that you might be using labels in your tsqueries you need to use a special RECHECKing operator, "@@@", to be accurate.

Is the above accurate?

Back to the docs:

I think the docs could clear this issue up somewhat. The current wording suggests that GIN performs better because it avoids a trip to the heap, when in reality it seems the benefit is avoiding the need to re-evaluate the expensive tsearch functions (which might need to access TOASTed data).

There's also a related issue: I think a RECHECK would be less costly if you have the tsvectors materialized in the table (using triggers) and index that. Maybe that could be a tip for using GiST indexes.

Regards,
Jeff Davis

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings
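The distinction drawn above, between visiting the heap and re-evaluating the predicate, can be illustrated with a toy model. The sketch below is not PostgreSQL's GiST implementation: the 8-bit signature, the hash, and every name in it (`word_bit`, `signature`, `index_candidates`, `search`, `HEAP`, `INDEX`) are assumptions made up for illustration. It shows only that a lossy index returns a superset of the true matches and that the recheck step removes the false ones.

```python
from typing import Dict, List, Set

# Hypothetical toy parameter: kept tiny so that collisions are easy to see.
SIGNATURE_BITS = 8


def word_bit(word: str) -> int:
    """Map a word to one bit position using a deterministic toy hash."""
    return sum(ord(ch) for ch in word) % SIGNATURE_BITS


def signature(words: Set[str]) -> int:
    """Fold a set of words into one bitmask; distinct words may share a bit."""
    sig = 0
    for word in words:
        sig |= 1 << word_bit(word)
    return sig


def index_candidates(index: Dict[int, int], query_word: str) -> List[int]:
    """Return row ids whose signature has the query word's bit set.

    This is the lossy step: the result may contain false matches.
    """
    bit = 1 << word_bit(query_word)
    return [row_id for row_id, sig in index.items() if sig & bit]


def search(heap: Dict[int, Set[str]], index: Dict[int, int],
           query_word: str) -> List[int]:
    """Index scan followed by a recheck against the stored row."""
    candidates = index_candidates(index, query_word)
    # Recheck: re-evaluate the real predicate on each candidate row.
    return [row_id for row_id in candidates if query_word in heap[row_id]]


# Hypothetical table contents: row id -> set of words in the document.
HEAP: Dict[int, Set[str]] = {
    1: {"cat", "sat"},
    2: {"act", "now"},  # "act" is an anagram of "cat", so it hashes to the same bit
    3: {"dog"},
}
INDEX: Dict[int, int] = {row_id: signature(words) for row_id, words in HEAP.items()}
```

Here `index_candidates(INDEX, "cat")` includes row 2 as a false match, and `search(HEAP, INDEX, "cat")` drops it after the recheck. In this model every candidate row is fetched whether or not a recheck happens, which mirrors the point that skipping the recheck saves the predicate evaluation rather than the heap visit.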
Re: [DOCS] Pattern for use of the alias "Postgres"
[ BCC to docs.]

FYI, since no one liked my second-in-paragraph usage of "Postgres" in the FAQ and developer's FAQ, I have removed it.

---------------------------------------------------------------------------

bruce wrote:
> [ BCC to docs because this might affect documentation too.]
>
> You probably remember the discussion about promoting the use of the
> alias "Postgres" in addition to the official name "PostgreSQL". I have
> changed the FAQ so that in paragraphs with multiple references to
> "PostgreSQL" we also use the alias "Postgres".
>
> I have talked about a similar change to our documentation and perhaps
> the web site, but I am _not_ ready to discuss those.
>
> What I want to ask about is an idea a few people have mentioned. They
> don't like that we change usage in the same paragraph. The suggestion
> is to mention that "Postgres" is an alias to "PostgreSQL" at the top of
> the document and just use "Postgres" in the remainder of the document.
>
> This seems like a lot more use of the alias than I thought we wanted as a
> group, but because several of the people suggesting this also didn't
> want the alias at all, I figure I should ask and we can discuss it.
>
> So, for the FAQ, which currently uses the second-entry-per-paragraph
> logic, should it be changed to the logic suggested above where every
> mention but the first is "Postgres"? (This will of course affect the
> documentation changes when we are ready to discuss those.)

--
Bruce Momjian <[EMAIL PROTECTED]>  http://momjian.us
EnterpriseDB  http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
Re: [DOCS] Tsearch docs question
Jeff Davis <[EMAIL PROTECTED]> writes:
> The Tsearch docs, under the GiST and GIN section, say:
> "Lossiness [of GiST] causes serious performance degradation since random
> access of heap records is slow and limits the usefulness of GiST
> indexes."

> The docs do go into some detail, but I think it causes some confusion,
> also.

Are you looking at CVS HEAD, or what was there in beta1? I rewrote that stuff a few days ago:
http://developer.postgresql.org/pgdocs/postgres/textsearch-indexes.html

> There's also a related issue: I think a RECHECK would be less costly if
> you have the tsvectors materialized in the table (using triggers) and
> index that. Maybe that could be a tip for using GiST indexes.

Yeah, I mentioned that somewhere in the chapter, I think.

regards, tom lane
[DOCS] First mention option for Postgres
Hello,

I am sure we are all long bored with this thread (on both sides), so I thought I would offer a compromise.

When dealing with long documents it is quite common to see something like this:

  PostgreSQL hereafter referred to as Postgres

I propose that we could have the FAQ and the Docs do the same thing. The very first mention in these documents will say something to the effect of:

  PostgreSQL, hereafter referred to as Postgres, is the kick butt database of the planet and the dolphin that can't swim doesn't even compare.

My proposal is not suggesting in *any* way that we change the following:

  The official project name, which is PostgreSQL
  The domain name of the project, which is postgresql.org

Nor am I in any way suggesting that we "promote" the name Postgres. We should instead continue our focus on promoting our project, which is PostgreSQL and incorporates something much larger than just the database, which is nicknamed Postgres.

Bruce would like this to happen for 8.3 Final. If he is willing to do the work required for the change, I see it as a reasonable compromise based on the various opinions of the community members.

Sincerely,
Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/
Re: [DOCS] Tsearch docs question
On Fri, 2007-10-26 at 15:26 -0400, Tom Lane wrote:
> Are you looking at CVS HEAD, or what was there in beta1? I rewrote
> that stuff a few days ago:
> http://developer.postgresql.org/pgdocs/postgres/textsearch-indexes.html
>

Excellent, thanks, that's a big improvement to those docs all around. I should have checked the latest before posting; almost everything I mentioned was already addressed.

There's still one very minor thing:

"A GiST index is lossy, meaning it is necessary to check the actual table row to eliminate false matches."

could be changed to something like:

"A GiST index is lossy, meaning that the index may produce false matches, and it is necessary to check the actual table row before eliminating these false matches."

And perhaps change:

"Lossiness causes performance degradation since random access to table records is slow; ..."

to something like:

"Lossiness causes performance degradation due to unnecessary random accesses to table records; ..."

The only reason I say this is because, on my first reading, I read that to mean that lossless indexes don't require trips to the heap at all (which isn't true, yet).

Regards,
Jeff Davis
Re: [DOCS] Tsearch docs question
Jeff Davis <[EMAIL PROTECTED]> writes:
> There's still one very minor thing:

Updated, thanks for the suggestions.

regards, tom lane
