On Fri, 06 Dec 2013, [email protected] wrote:
> But it looks like I *should* customize it:
> English stemming makes probably no sense for Russian or Kyrgyz
> content.

Yes, definitely, you should customise it WRT your tag selection and the deemed importance of the information in the various fields, and then WRT the language used.

> But there’s probably no algorithm for these languages anyway.

Russian is supported by the stemmer.  Kyrgyz is not.

> And I don’t understand why one should define the stemming language per
> field - I guess we’re not the first library with content in different
> languages.

We used to store different languages in different fields.  E.g. the CERN Bulletin is bilingual English/French, and its articles look like this:

http://cds.cern.ch/record/1633174/export/hm

MARC-wise, we should ideally make use of fields such as 242 (title translation) and read the language information from the subfield there:

http://www.loc.gov/marc/bibliographic/bd242.html

While this is already possible, and we use this technique in many modules, BibRank does not understand it yet.

> These are two of the records where the indexer crashes:

Thanks.  Many of the fields are not recognised, e.g. your records use 653, whereas the default wrd.cfg expects 6531/6532.  Please try to (i) amend wrd.cfg; (ii) hard-delete your phantom records; (iii) rebalance the ranking weights again, and see whether things improve.

Best regards
--
Tibor Simko

