On 2013-12-23 at 16:05, Henning Hraban Ramm <[email protected]> wrote:

> Sorry, rejoiced too soon - it’s still (or again) there.

And still only part of our documents is indexed, and I must fix this
situation "ASAP".
(Nobody seems to have answered Theodoros Theodoropoulos, who has the same
problem.)

wrd.cfg currently looks like this (minus most comments):

[rank_method]
function = word_similarity

[word_similarity]
stemming = None
table = rnkWORD01F
stopword = False
relevance_number_output_prologue = (
relevance_number_output_epilogue = )
tag1 = 653__a, 2, ru
tag2 = 245__%, 10, ru
#tag3 = 520__%, 2, ru
#tag4 = 852__%, 2, en
#tag5 = 100__%, 3, none
#tag6 = 700__%, 2, none
#tag7 = 490__%, 5, ru
#tag8 = 260__%, 1, ru

[find_similar]
max_word_occurence = 0.05
min_word_occurence = 0.00
min_word_length = 3
min_nr_words_docs = 3
max_nr_words_upper = 20
max_nr_words_lower = 10
default_min_relevance = 75


Are there other places I need to configure?

>> If you can have N languages, where N can arbitrarily raise, then indeed
>> you'd use another technique like:
>> 
>>  242 $a War and Peace $y eng
>>  242 $a Guerre et Paix $y fre
>>  242 $a מלחמה ושלום $y heb
>>  245 $a Война и мир $y rus
>> 
>> The beauty of the document model is that it is supporting both use
>> cases.  (Uploading, searching, submitting, editing etc are mostly fine,
>> modulo some corners you've hit, such as language-dependent ranking
>> weights.)
> 
> Yes, looks convincing; so I’d need to annotate every field with the language 
> of the record.

OK, I’m ready to update all records and copy the value of 041__a to *__y,
but would that help? Is it possible to configure the indexer to use this value?
Something like:
tag1 = 245__y[=rus], 10, ru
tag1 = 245__y[=eng], 10, en
tag1 = 245__y[=kir], 10, none?
tag1 = 245__y[=kaz], 10, none?
tag1 = 245__y[=tgk], 10, none?
etc.
I would need to add new languages as they appear. At the moment we have just 6,
as far as I can see.
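
For the record update itself, here is a rough sketch of what copying 041__a into $y could look like. It assumes the records are exported as MARCXML in the usual MARC21 "slim" namespace and would be re-uploaded afterwards; the function name spread_language and the default list of target tags are made up for illustration. It says nothing about whether the indexer can actually use $y, which is the open question above.

```python
import xml.etree.ElementTree as ET

# Assumption: records are MARCXML in the MARC21 "slim" namespace.
MARC_NS = "http://www.loc.gov/MARC21/slim"
NS = {"m": MARC_NS}


def spread_language(record, target_tags=("245",)):
    """Copy the first non-empty 041 $a into $y of each target datafield.

    Datafields that already carry a $y are left alone.
    Returns the number of datafields that were changed.
    (Hypothetical helper, not part of Invenio.)
    """
    lang = None
    for sf in record.findall("m:datafield[@tag='041']/m:subfield[@code='a']", NS):
        if sf.text and sf.text.strip():
            lang = sf.text.strip()
            break
    if lang is None:
        return 0  # no language recorded, nothing to spread

    changed = 0
    for df in record.findall("m:datafield", NS):
        if df.get("tag") not in target_tags:
            continue
        if df.find("m:subfield[@code='y']", NS) is not None:
            continue  # keep an existing language annotation
        sub = ET.SubElement(df, "{%s}subfield" % MARC_NS, {"code": "y"})
        sub.text = lang
        changed += 1
    return changed
```

Running it a second time on the same record changes nothing, so it should be safe to repeat as new records come in.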

Does it "hurt" that subfield y is not standardized?
And does the indexer understand our three-letter ISO codes, or only two-letter
codes? (It would be no big problem to change that.)
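
To show why changing the codes would be cheap: a minimal sketch of the conversion, covering only the five languages named above. The table and the function name to_two_letter are my own illustration; the pairs are the standard ISO 639-2 to ISO 639-1 equivalents, and unknown codes come back as None so they can be reviewed by hand.

```python
# Three-letter (ISO 639-2) to two-letter (ISO 639-1) codes,
# limited to the languages mentioned in this mail.
ISO639_2_TO_1 = {
    "rus": "ru",  # Russian
    "eng": "en",  # English
    "kir": "ky",  # Kyrgyz
    "kaz": "kk",  # Kazakh
    "tgk": "tg",  # Tajik
}


def to_two_letter(code):
    """Return the two-letter code for a language code, or None if unknown.

    Codes that are already two letters long are passed through unchanged.
    (Hypothetical helper for illustration only.)
    """
    code = code.strip().lower()
    if len(code) == 2:
        return code
    return ISO639_2_TO_1.get(code)
```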


Oh my, we can’t be the first institution dealing with documents in multiple 
languages??

Greetlings, Hraban
---
http://www.fiee.net
https://www.cacert.org (I'm an assurer)






