On Wed, 20 Jun 2007, Bruce Momjian wrote:

Oleg Bartunov wrote:
Comments on Bruce Momjian's editorial work.


it is useful to have a predefined list of lexemes.

Bruce, there should be a list of lexeme types here!

Agreed.  Is the list of lexemes parser-specific?

Yes, it is the parser which defines the types of lexemes.

OK, how will users get a list of supported lexemes?  Do we need a list
per supported parser?

It's documented; see "Parser functions" for token_type():

postgres=# select * from token_type('default');
 tokid |    alias     |            description
-------+--------------+-----------------------------------
     1 | lword        | Latin word
     2 | nlword       | Non-latin word
     3 | word         | Word
     4 | email        | Email
     5 | url          | URL
     6 | host         | Host
     7 | sfloat       | Scientific notation
     8 | version      | VERSION
     9 | part_hword   | Part of hyphenated word
    10 | nlpart_hword | Non-latin part of hyphenated word
    11 | lpart_hword  | Latin part of hyphenated word
    12 | blank        | Space symbols
    13 | tag          | HTML Tag
    14 | protocol     | Protocol head
    15 | hword        | Hyphenated word
    16 | lhword       | Latin hyphenated word
    17 | nlhword      | Non-latin hyphenated word
    18 | uri          | URI
    19 | file         | File or path name
    20 | float        | Decimal notation
    21 | int          | Signed integer
    22 | uint         | Unsigned integer
    23 | entity       | HTML Entity

The integer option controls several behaviors, which are selected using bit-wise
fields combined with <literal>|</literal> (for example, <literal>2|4</literal>):
<!-- why so complex? -->

To avoid having two arguments.

But I don't see why you would want to set two of those values --- they
seem mutually exclusive, e.g.

        1 divides the rank by the 1 + logarithm of the document length
        2 divides the rank by the length itself

I assume you do either one, not both.

But what about the other variants?

OK, here is the full list:

        0 (the default) ignores document length
        1 divides the rank by the 1 + logarithm of the document length
        2 divides the rank by the length itself
        4 divides the rank by the mean harmonic distance between extents
        8 divides the rank by the number of unique words in document
        16 divides the rank by 1 + logarithm of the number of unique words in

so which ones would be both enabled?

None! This is a list of possible values of the rank normalization flag, which can be ORed together.

=# select rank_cd('1:1,2,3 4:5 6:7', '1&4',1);
=# select rank_cd('1:1,2,3 4:5 6:7', '1&4',1|16);
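
To make the OR-ing concrete, here is a minimal Python sketch of how a combined flag value such as 1|16 breaks down into individual bits. The names NORMALIZATION_FLAGS and decode_flags, and the short labels, are made up for illustration; only the bit values and their meanings come from the list above. This is not how rank_cd is implemented, just the flag arithmetic.

```python
# Hypothetical labels for the normalization bits listed in this thread.
NORMALIZATION_FLAGS = {
    1: "log_length",         # 1 + logarithm of the document length
    2: "length",             # the document length itself
    4: "extent_distance",    # mean harmonic distance between extents
    8: "unique_words",       # number of unique words
    16: "log_unique_words",  # 1 + logarithm of the number of unique words
}


def decode_flags(value):
    """Return the labels of all bits set in value, lowest bit first."""
    return [name for bit, name in sorted(NORMALIZATION_FLAGS.items())
            if value & bit]


combined = 1 | 16
print(combined)                # 17
print(decode_flags(combined))  # ['log_length', 'log_unique_words']
print(decode_flags(0))         # [] (the default: ignore document length)
```

So a single integer argument can carry any combination of the options, which is why a second argument is not needed.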

What I missed is the definition of extent.

From http://www.sai.msu.su/~megera/wiki/NewExtentsBasedRanking
An extent is a shortest, non-nested sequence of words that satisfies a query.
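
As a rough illustration of that definition, the sketch below finds extents for a pure AND query over a list of (position, word) pairs. It is only a sketch under stated assumptions: find_extents is a hypothetical helper, it handles AND queries only, and it is not the code used by rank_cd. The sample data mirrors the '1:1,2,3 4:5 6:7' vector and the '1&4' query from the example above.

```python
def find_extents(positions, query_terms):
    """Return the shortest, non-nested (start, end) spans that contain
    every query term at least once. positions is an iterable of
    (position, word) pairs; only AND semantics are handled."""
    needed = set(query_terms)
    last_seen = {}   # most recent position of each query term
    extents = []
    for pos, word in sorted(positions):
        if word not in needed:
            continue
        last_seen[word] = pos
        if len(last_seen) < len(needed):
            continue  # some query term has not appeared yet
        start = min(last_seen.values())
        # A span with the same start but a later end would contain the
        # earlier one, so keep only the first span for each start.
        if not extents or extents[-1][0] != start:
            extents.append((start, pos))
    return extents


doc = [(1, "1"), (2, "1"), (3, "1"), (5, "4"), (7, "6")]
print(find_extents(doc, ["1", "4"]))  # [(3, 5)]
```

Here the only extent runs from position 3 to position 5, because the earlier occurrences of '1' at positions 1 and 2 would only give longer spans that contain it.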

I don't understand how that relates to this.

Because of "4 divides the rank by the mean harmonic distance between extents":
it reflects how densely the extents that satisfy the query are packed in the document.

its <replaceable>id</replaceable> or <replaceable>ts_name</replaceable>; <!-- n
if none is specified then the current configuration is used.

I don't understand this question

Same issue as above --- why allow a number here when the name works just
fine.  We don't allow tables to be specified by number, so why

<!-- why?  -->
Note that the cascade dropping of the <function>headline</function> function
cause dropping of the <literal>parser</literal> used in fulltext configuration

Hmm, it should probably be reversed: a cascaded drop of the parser causes
the headline function to be dropped.


In example below, <literal>fulltext_idx</literal> is
a GIN index:<!-- why isn't this automatic -->

It's explained above. The problem is that the current index API has no way
to say whether a search was lossy or exact, so to preserve the performance of
the GIN index we had to introduce the @@@ operator, which is the same as @@, but

Well, then we have to fix the API.  Telling users to use a different
operator based on what index is defined is just bad style.

This was raised by Heikki and we discussed it a bit in Ottawa, but it's
unclear if it's doable for 8.3.  The @@@ operator is rarely used, so we could
say it will be improved in future versions.

Uh, I am wondering if we just have to force heap access in all cases
until it is fixed.

No, no! We'd lose the performance of the GIN index, which isn't lossy and doesn't
need heap access. I don't see what's wrong with saying that some feature
isn't supported by the text search operator with a GIN index.

We need to decide if we need oids as a user-visible argument. I don't see
any value, though Teodor probably thinks otherwise.

This is a good time to clean up the API because there are going to be
user-visible changes anyway.

I agree. Let's keep this in mind until we get the more serious tasks done.

Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83
