Hi,

I am struggling to compute the collection frequency of a term (PyLucene 4.10.1).
So far, I can get the collection count of each term with:

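# imports assumed by the snippet (PyLucene 4.x package layout)
import lucene
from java.io import File
from org.apache.lucene.index import IndexReader, Term, TermsEnum
from org.apache.lucene.store import SimpleFSDirectory
from org.apache.lucene.util import BytesRefIterator

lucene.initVM()

reader = IndexReader.open(SimpleFSDirectory(File(LUCENE_INDEX)))
# term vector of one document for the "contents" field
termVector = reader.getTermVector(docID, "contents")
termsEnumvar = termVector.iterator(None)
termsref = BytesRefIterator.cast_(termsEnumvar)
cf_dict = {}
try:
    while (termsref.next()):
        termval = TermsEnum.cast_(termsref)
        fg = termval.term().utf8ToString()
        # collection count: occurrences of the term across the whole index
        cf = reader.totalTermFreq(Term("contents", termval.term()))
        cf_dict[fg] = cf
except StopIteration, e:
    print ''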

I would like to have the "frequency" in cf_dict instead of the count. For this,
I need to divide each count by the total number of term occurrences (tokens,
not distinct terms) in the index.
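
To make the goal concrete, here is a minimal sketch of the computation I am after, in plain Python without any Lucene calls. The function name collection_frequencies, the counts, and the value of total_tokens are all made up for illustration; total_tokens is exactly the number I do not know how to obtain from the index.

```python
def collection_frequencies(cf_counts, total_tokens):
    """Turn raw collection counts into relative frequencies.

    cf_counts: dict mapping term -> collection count (as in cf_dict)
    total_tokens: total number of term occurrences in the index
                  (hypothetical here; this is the missing value)
    """
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    # float() forces true division on Python 2 as well as Python 3
    return dict((term, count / float(total_tokens))
                for term, count in cf_counts.items())


# Made-up numbers, only to show the intended result shape
cf_dict = {"lucene": 30, "index": 50, "term": 20}
total_tokens = 1000
freq_dict = collection_frequencies(cf_dict, total_tokens)
```

With these made-up values, each entry of freq_dict is the share of all tokens in the index accounted for by that term.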

Does anyone know how to get this?

Thank you for your help,

Floran


