On 2013-12-23 at 16:05, Henning Hraban Ramm <[email protected]> wrote:
> Sorry, rejoiced too soon - it’s still (or again) there. And still only a part of our documents is indexed, and I must improve this situation „ASAP“.

(Nobody seems to have answered Theodoros Theodoropoulos, who has the same problem.)

wrd.cfg currently looks like this (modulo most comments):

[rank_method]
function = word_similarity

[word_similarity]
stemming = None
table = rnkWORD01F
stopword = False
relevance_number_output_prologue = (
relevance_number_output_epilogue = )
tag1 = 653__a, 2, ru
tag2 = 245__%, 10, ru
#tag3 = 520__%, 2, ru
#tag4 = 852__%, 2, en
#tag5 = 100__%, 3, none
#tag6 = 700__%, 2, none
#tag7 = 490__%, 5, ru
#tag8 = 260__%, 1, ru

[find_similar]
max_word_occurence = 0.05
min_word_occurence = 0.00
min_word_length = 3
min_nr_words_docs = 3
max_nr_words_upper = 20
max_nr_words_lower = 10
default_min_relevance = 75

Are there other places I need to configure?

>> If you can have N languages, where N can grow arbitrarily, then indeed
>> you'd use another technique like:
>>
>> 242 $a War and Peace $y eng
>> 242 $a Guerre et Paix $y fre
>> 242 $a מלחמה ושלום $y heb
>> 245 $a Война и мир $y rus
>>
>> The beauty of the document model is that it supports both use
>> cases. (Uploading, searching, submitting, editing etc are mostly fine,
>> modulo some corners you've hit, such as language-dependent ranking
>> weights.)
>
> Yes, looks convincing; so I’d need to annotate every field with the language
> of the record.

Ok, I’m ready to update all records and spread the value of 041__a to *__y - but would that help?
Is it possible to configure the indexer to use this value? Like

tag1 = 245__y[=rus], 10, ru
tag1 = 245__y[=eng], 10, en
tag1 = 245__y[=kir], 10, none?
tag1 = 245__y[=kaz], 10, none?
tag1 = 245__y[=tgk], 10, none?
etc.

I would need to add new languages as they appear. At the moment we have just 6, as far as I can see.

Does it „hurt“ that subfield y is not standardized?
And does it understand our three-letter ISO codes and not only two-letter codes?
(Would be no big problem to change that.)

Oh my, we can’t be the first institution dealing with documents in multiple languages??

Greetlings, Hraban
---
http://www.fiee.net
https://www.cacert.org (I'm an assurer)

