Hi Ludmila,
[...]
>> I'm not sure, what would happen if you add the missing '(' and ')' to
>> CFG_BIBINDEX_CHARS_PUNCTUATION and reindex your site? Could you just
>> use expressions like "StatID DE-HGF 0100" and "StatID DE-HGF 0110",
>> removing all problematic characters? (Yes, I know your site is large,
>> maybe you have a smaller installation to test this).
>
> These 2 config variables are useful for breaking phrases into words
> (thus having an impact on the indexing of words - or better said, what
> is considered a 'word'), but they won't help too much if you need to
> do exact phrase search (which uses the phrase index). The
> CFG_BIBINDEX_CHARS_PUNCTUATION is used to split the phrase into blocks
> (which also get indexed as words) and then the blocks are further
> split into words using CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS.
> All this work is done for computing the words, but phrases are left as
> they are (without any processing). We can check in ipython what would
> be the difference if '(' and ')' get added.
Thanks for the detailed explanation; I could hardly have arrived at this
conclusion on my own. I once tried to add a few local variants of
quotation marks which, as you know, are highly culture-specific (see, for
example, http://en.wikipedia.org/wiki/Quotation_mark,_non-English_usage),
but I wasn't able to get good behaviour out of it, although I don't
remember the details now; it was quite a while ago. Where should I add
'«', '»', '„' and '“'? Just to CFG_BIBINDEX_CHARS_ALPHANUMERIC_SEPARATORS,
right?
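
To make the two-stage splitting described above concrete, here is a
minimal sketch that can be pasted into ipython. It is not Invenio's actual
code: the function names (sketch_words, char_class) are hypothetical, and
the two character sets are stand-ins rather than values copied from a real
invenio.conf. It only mimics the idea that punctuation delimits blocks,
blocks are kept as words, and blocks are then split again on the
alphanumeric separators.

```python
import re
import string

# Stand-in character sets (assumptions), NOT copied from a real invenio.conf.
PUNCTUATION = '.,:;?!"'
SEPARATORS = string.punctuation


def char_class(chars):
    """Build a regex character class matching any one character in chars."""
    return '[' + re.escape(chars) + ']'


def sketch_words(phrase, punctuation=PUNCTUATION, separators=SEPARATORS):
    """Hypothetical two-stage split: phrase -> blocks -> words."""
    # Stage 1: punctuation (and whitespace) delimits the blocks.
    blocks = re.sub(char_class(punctuation), ' ', phrase).split()
    words = set(blocks)  # the blocks themselves count as words too
    # Stage 2: each block is split further on the alphanumeric separators.
    for block in blocks:
        pieces = re.split(char_class(separators), block)
        words.update(piece for piece in pieces if piece)
    return words


# Compare the word sets with and without the extra characters.
print(sorted(sketch_words('StatID:(DE-HGF)0100')))
print(sorted(sketch_words('StatID:(DE-HGF)0100',
                          punctuation=PUNCTUATION + '()')))
print(sorted(sketch_words('«Hola»')))
print(sorted(sketch_words('«Hola»', separators=SEPARATORS + '«»„“')))
```

In this sketch, adding the quotation marks to the separators makes the
bare word appear alongside the quoted block, while the phrase itself is
never touched, which matches the point that the phrase index is left
unprocessed.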
Thanks again,
Ferran