These are good ideas for improving the TextIndex. I already encouraged Erik
to put it all together into a Fishbowl proposal.

----- Original Message -----
From: "Dieter Maurer" <[EMAIL PROTECTED]>
To: "Rik Hoekstra" <[EMAIL PROTECTED]>
Cc: "Chris McDonough" <[EMAIL PROTECTED]>; "Erik Enge"
Sent: Monday, June 18, 2001 4:59 PM
Subject: Re: PossitionIndex (was: Re: [Zope-dev] ZCatalog phrase

> Rik Hoekstra writes:
>  > This raises the question of how dependent the splitter is on the
>  > particularities of the document source - I do not really see how
>  > different splitters could be used for one single document. This is
>  > perhaps less obvious than it appears, as you may want to use
>  > different splitters for documents in different languages. Taken as a
>  > whole, I would say choosing a splitter would be a decision that has
>  > to be taken at indexing time anyway. But perhaps it's just my
>  > imagination that is lacking.
> There are lots of things you may want to change based on
> experience with your index:
>   *  change the set of token boundary characters
>      they define where words are broken apart.
>   *  change the set of removed characters
>      they are removed from the words, usually for
>      normalization.
>      In German, e.g., you can write both "Auto-Lackierer"
>      and "Autolackierer". You want to normalize
>      these different spellings.
>   *  change the set of "composing" characters
>      German is very rich in composite terms.
>      You may want to index under each component term.
>      For this, you need the rules on how the composition
>      is built.
>      For text, it is usually '-'. But if you have
>      computer sources, '_' or ':' may be relevant, too.
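The three settings listed above can be sketched as a small configurable splitter. This is only an illustration in plain Python, not the actual Zope splitter: the names `make_splitter`, `boundary_chars`, `removed_chars` and `composing_chars` are hypothetical, and the lowercasing and the "whitespace always separates words" behaviour are assumptions of the sketch.

```python
def make_splitter(boundary_chars="", removed_chars="", composing_chars=""):
    """Build a split(text) function from three character sets.

    Hypothetical sketch, not the Zope splitter API.
    Whitespace always separates words; terms are lowercased (assumption).
    """
    # Boundary and composing characters are mapped to spaces so that
    # str.split() can do the actual breaking.
    boundary_table = str.maketrans({c: " " for c in boundary_chars})
    composing_table = str.maketrans({c: " " for c in composing_chars})
    # Removed characters are deleted outright for normalization.
    removed_table = str.maketrans("", "", removed_chars)

    def split(text):
        terms = []
        for token in text.translate(boundary_table).split():
            token = token.lower()
            # Index a composite term under each of its components.
            parts = token.translate(composing_table).split()
            if len(parts) > 1:
                terms.extend(parts)
            # Index the whole word with the removed characters stripped.
            whole = token.translate(removed_table)
            if whole:
                terms.append(whole)
        return terms

    return split


# German text: '-' is both a composing and a removed character.
german_split = make_splitter(boundary_chars=".,;!?",
                             removed_chars="-",
                             composing_chars="-")

# Computer sources: '_' and ':' compose identifiers.
source_split = make_splitter(boundary_chars="().,",
                             composing_chars="_:")
```

With this configuration both "Auto-Lackierer" and "Autolackierer" produce the term "autolackierer", and the hyphenated spelling is additionally indexed under "auto" and "lackierer".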
> Of course, the search must follow the same splitting rules
> as the indexing did. Changing the rules (the splitter
> or its configuration) after indexing will make the index
> inconsistent.
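The consistency point can be shown with a toy inverted index. This is a minimal sketch under stated assumptions, not how ZCatalog or TextIndex is implemented: `split_words`, `build_index` and `search` are hypothetical helpers, and the only configurable rule is a set of removed characters.

```python
def split_words(text, removed_chars=""):
    """Split on whitespace, delete removed_chars, lowercase (toy rules)."""
    table = str.maketrans("", "", removed_chars)
    words = (w.translate(table).lower() for w in text.split())
    return [w for w in words if w]


def build_index(docs, removed_chars=""):
    """Map each term to the set of document ids that contain it."""
    index = {}
    for doc_id, text in docs.items():
        for term in split_words(text, removed_chars):
            index.setdefault(term, set()).add(doc_id)
    return index


def search(index, query, removed_chars=""):
    """Return ids of documents containing every term of the query."""
    terms = split_words(query, removed_chars)
    if not terms:
        return set()
    return set.intersection(*(index.get(t, set()) for t in terms))


docs = {
    1: "Auto-Lackierer gesucht",
    2: "Autolackierer in Berlin",
}

# Index built with '-' removed, so both spellings share one term.
index = build_index(docs, removed_chars="-")

# Same rules at search time: both documents are found.
consistent = search(index, "Auto-Lackierer", removed_chars="-")

# Rules changed after indexing: the query term "auto-lackierer"
# never made it into the index, so nothing is found.
inconsistent = search(index, "Auto-Lackierer", removed_chars="")
```

In the second search the documents are still in the index, but they are unreachable until everything is reindexed with the new rules.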
> Dieter
> _______________________________________________
> Zope-Dev maillist  -  [EMAIL PROTECTED]
> **  No cross posts or HTML encoding!  **
> (Related lists -
> )
