On 6/25/07, Tom Lane <[EMAIL PROTECTED]> wrote:
Well, it's not hard at all to find chunks of English text that have
embedded bits of French, Spanish, or what-have-you, but that's not an
argument for trying to intermix the stemmers.  I doubt that such simple
bits of program could tell the language difference well enough to
determine which stemming rules to apply.

While I imagine that is probably true of many, if not most, my project
in particular would greatly benefit from the ability to mix stemmers.
I work with complex bibliographic data, which has language information
embedded within records.  This is not limited to the record level
either.  Individual fields within each bibliographic record can be in
different langauges.

Especially in countries where making software multi-lingual (such as
Canada (en_CA/fr_CA)) is a requirement for use in public institutions,
the ability to choose a stemmer and stop-word list at will for any
particular record will actually provide the exact behavior needed.
The obvious generalization from Canada would be to support any mix of
languages supported by tsearch2.

I can certainly understand the benefit of making the default
configuration a simple locale to language map, but there are
definitely uses for searching using different stemmers/stop-lists even
within the same corpus/index.  So, as a datapoint for the discussion,
I would ask that the option of multiple languages per DB locale not be
removed if it can be at all avoided.

Thanks for listening (and for all the great work on getting tsearch
into core! :) ...

Mike Rylander

---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at


Reply via email to