[DOCS] Tsearch docs question

2007-10-26 Thread Jeff Davis
The Tsearch docs, under the GiST and GIN section, say:

"Lossiness [of GiST] causes serious performance degradation since random
access of heap records is slow and limits the usefulness of GiST
indexes."

The docs do go into some detail, but I think it causes some confusion,
also.

Let me digress to state how I understand the relationship between GIN,
GiST, and RECHECK:

The benefit of avoiding RECHECK is to avoid the need to re-evaluate the
predicate after finding the entry in the index. This can be valuable in
tsearch, because the functions are much more expensive than (for
example) integer equality. We (currently) have to visit the heap anyway,
to see the visibility information. So avoiding a RECHECK clause doesn't
do anything to prevent random heap I/O (although, a less-lossy index
will have fewer false positives, by definition).

GIN (as used with tsearch) is lossy for more sophisticated tsqueries
(those involving weight labels) and non-lossy for simpler tsqueries.
There's only one tsquery type, so PostgreSQL has no way of
differentiating between these two cases.

GiST (as used with tsearch) is lossy for large tsvectors or tsqueries
containing labels; and non-lossy for small tsvectors matched against a
tsquery that contains no labels. PostgreSQL can't differentiate between
these two cases.
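
As a rough illustration of what "lossy index plus RECHECK" means, here is
a minimal Python sketch. It is not PostgreSQL's GiST implementation: the
signature width, the hash function, and the names signature, index_scan,
and recheck are all assumptions made up for this example. Each row is
summarized by a small bit signature, so the index scan can return false
matches but never misses a true one, and the recheck step re-evaluates
the predicate against the real row.

```python
import zlib

SIG_BITS = 16  # hypothetical width, kept tiny so hash collisions are plausible


def signature(lexemes):
    """OR together one hashed bit per lexeme (a lossy summary of the row)."""
    sig = 0
    for lex in lexemes:
        sig |= 1 << (zlib.crc32(lex.encode("utf-8")) % SIG_BITS)
    return sig


def index_scan(index, query_lexemes):
    """Return row ids whose signature *may* contain every query lexeme."""
    qsig = signature(query_lexemes)
    return [rid for rid, sig in index.items() if (sig & qsig) == qsig]


def recheck(heap, candidates, query_lexemes):
    """Re-evaluate the predicate against the actual rows to drop false matches."""
    wanted = set(query_lexemes)
    return [rid for rid in candidates if wanted <= heap[rid]]


# Illustrative data only.
heap = {1: {"fat", "cat"}, 2: {"fat", "rat"}, 3: {"dog"}}
index = {rid: signature(lexemes) for rid, lexemes in heap.items()}

candidates = index_scan(index, ["fat", "cat"])   # may include false matches
matches = recheck(heap, candidates, ["fat", "cat"])
```

In this sketch both steps read the row from heap, which mirrors the point
above: the recheck adds predicate evaluation, not the heap visit itself.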

So, for GiST they always RECHECK (so you're always sure to get the right
result), and for GIN the default operator does not RECHECK (for
performance), but if you suspect that you might be using labels in your
tsqueries you need to use a special RECHECKing operator, "@@@", to be
accurate.

Is the above accurate?

Back to the docs: I think the docs could clear this issue up somewhat.
The current wording suggests that GIN performs better because it avoids
a trip to the heap, when in reality it seems the benefit is avoiding the
need to re-evaluate the expensive tsearch functions (which might need to
access TOASTed data).

There's also a related issue: I think a RECHECK would be less costly if
you have the tsvectors materialized in the table (using triggers) and
index that. Maybe that could be a tip for using GiST indexes.
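
A small Python sketch of why a materialized tsvector could make the
recheck cheaper. This is only a model under stated assumptions:
CountingParser stands in for an expensive text-to-lexeme conversion, and
recheck_expression / recheck_materialized are hypothetical names, not
PostgreSQL functions. With an index on an expression, the recheck has to
re-parse the document text for every candidate row; with a stored column,
the lexemes were computed once when the row was written.

```python
class CountingParser:
    """Stand-in for an expensive text-to-lexeme parse; counts its calls."""

    def __init__(self):
        self.calls = 0

    def __call__(self, text):
        self.calls += 1
        return set(text.lower().split())


def recheck_expression(rows, candidates, wanted, parser):
    # Expression index: lexemes are not stored, so re-parse each candidate.
    return [rid for rid in candidates if wanted <= parser(rows[rid]["body"])]


def recheck_materialized(rows, candidates, wanted):
    # Stored column: reuse the lexemes computed at write time.
    return [rid for rid in candidates if wanted <= rows[rid]["lexemes"]]


parser = CountingParser()
bodies = {1: "Fat cat", 2: "fat rat"}
# The "trigger" step: parse once per row when it is written.
rows = {rid: {"body": body, "lexemes": parser(body)} for rid, body in bodies.items()}
```

Calling recheck_materialized leaves parser.calls unchanged, while
recheck_expression adds one parse per candidate row.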

Regards,
Jeff Davis


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [DOCS] Pattern for use of the alias "Postgres"

2007-10-26 Thread Bruce Momjian
[ BCC to docs.]

FYI, since no one liked my second-in-paragraph usage of "Postgres" in
the FAQ and developer's FAQ, I have removed it.

---

bruce wrote:
> [ BCC to docs because this might affect documentation too.]
> 
> You probably remember the discussion about promoting the use of the
> alias "Postgres" in addition to the official name "PostgreSQL".  I have
> changed the FAQ so that in paragraphs with multiple references to
> "PostgreSQL" we also use the alias "Postgres".
> 
> I have talked about a similar change to our documentation and perhaps
> the web site, but I am _not_ ready to discuss those.
> 
> What I want to ask about is an idea a few people have mentioned.  They
> don't like that we change usage in the same paragraph.  The suggestion
> is to mention that "Postgres" is an alias to "PostgreSQL" at the top of
> the document and just use "Postgres" in the remainder of the document.
> 
> This seems like a lot more use of the alias than I thought we wanted as a
> group, but because several of the people suggesting this also didn't
> want the alias at all, I figure I should ask and we can discuss it.
> 
> So, for the FAQ, which currently uses the second-entry-per-paragraph
> logic, should it be changed to the logic suggested above where every
> mention but the first is "Postgres"?  (This will of course affect the
> documentation changes when we are ready to discuss those.)

-- 
  Bruce Momjian  <[EMAIL PROTECTED]>        http://momjian.us
  EnterpriseDB http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +



Re: [DOCS] Tsearch docs question

2007-10-26 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes:
> The Tsearch docs, under the GiST and GIN section, say:
> "Lossiness [of GiST] causes serious performance degradation since random
> access of heap records is slow and limits the usefulness of GiST
> indexes."

> The docs do go into some detail, but I think it causes some confusion,
> also.

Are you looking at CVS HEAD, or what was there in beta1?  I rewrote
that stuff a few days ago:
http://developer.postgresql.org/pgdocs/postgres/textsearch-indexes.html

> There's also a related issue: I think a RECHECK would be less costly if
> you have the tsvectors materialized in the table (using triggers) and
> index that. Maybe that could be a tip for using GiST indexes.

Yeah, I mentioned that somewhere in the chapter, I think.

regards, tom lane



[DOCS] First mention option for Postgres

2007-10-26 Thread Joshua D. Drake
Hello,

I am sure we are all long bored with this thread (on both sides) so I
thought I would offer a compromise. When dealing with long documents it
is quite common to see something like this:

PostgreSQL hereafter referred to as Postgres

I propose that we could have the FAQ and the Docs do the same thing.
The very first mention in these documents will say something to the
effect of:

PostgreSQL, hereafter referred to as Postgres, is the kick butt database
of the planet and the dolphin that can't swim doesn't even compare.

My proposal is not suggesting in *any* way that we change
the following:

The official project name, which is PostgreSQL
The domain name of the project, which is postgresql.org

Nor am I in any way suggesting that we "promote" the name Postgres. We
should instead continue our focus on promoting our project, which is
PostgreSQL and incorporates something much larger than just the
database, which is nicknamed Postgres.

Bruce would like this to happen for 8.3 Final. If he is willing to do
the work required for the change, I see it as a reasonable compromise
based on the various opinions of the community members.

Sincerely,

Joshua D. Drake


-- 

  === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564   24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997  http://www.commandprompt.com/
UNIQUE NOT NULL
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/





Re: [DOCS] Tsearch docs question

2007-10-26 Thread Jeff Davis
On Fri, 2007-10-26 at 15:26 -0400, Tom Lane wrote:
> Are you looking at CVS HEAD, or what was there in beta1?  I rewrote
> that stuff a few days ago:
> http://developer.postgresql.org/pgdocs/postgres/textsearch-indexes.html
> 

Excellent, thanks, that's a big improvement to those docs all around. I
should have checked the latest before posting; almost everything I
mentioned was already addressed.

There's still one very minor thing:

"A GiST index is lossy, meaning it is necessary to check the actual
table row to eliminate false matches."

could be changed to something like:

"A GiST index is lossy, meaning that the index may produce false
matches, and it is necessary to check the actual table row before
eliminating these false matches."

And perhaps change:

"Lossiness causes performance degradation since random access to table
records is slow; ..."

to something like:

"Lossiness causes performance degradation due to unnecessary random
accesses to table records; ..."

The only reason I say this is because, on my first reading, I read that
to mean that lossless indexes don't require trips to the heap at all
(which isn't true, yet).

Regards,
Jeff Davis




Re: [DOCS] Tsearch docs question

2007-10-26 Thread Tom Lane
Jeff Davis <[EMAIL PROTECTED]> writes:
> There's still one very minor thing:

Updated, thanks for the suggestions.

regards, tom lane
