Hi, I am struggling to compute the collection frequency of a term (PyLucene 4.10.1). So far, I can get the collection count of each term with:
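
    # Assumes lucene.initVM() has been called and that LUCENE_INDEX and docID are defined.
    from java.io import File
    from org.apache.lucene.index import IndexReader, Term, TermsEnum
    from org.apache.lucene.store import SimpleFSDirectory
    from org.apache.lucene.util import BytesRefIterator

    reader = IndexReader.open(SimpleFSDirectory(File(LUCENE_INDEX)))
    termVector = reader.getTermVector(docID, "contents")
    termsEnumvar = termVector.iterator(None)
    termsref = BytesRefIterator.cast_(termsEnumvar)
    cf_dict = {}
    try:
        while (termsref.next()):
            termval = TermsEnum.cast_(termsref)
            fg = termval.term().utf8ToString()
            # collection count: occurrences of the term across the whole index
            cf = reader.totalTermFreq(Term("contents", termval.term()))
            cf_dict[fg] = cf
    except StopIteration, e:
        print ''

I would like to have the "frequency" in cf_dict instead of the count. For this, I need to divide each count by the total number of term occurrences in the index (counting repeats, not just distinct terms). Does anyone know how to get this?

Thank you for your help,
Floran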
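
P.S. To make clear what I am after, here is a minimal sketch of the normalization step in plain Python, with no Lucene involved. The function name `to_relative_frequencies`, the `total_terms` parameter, and the example numbers are all made up for illustration; `total_terms` stands for exactly the value I do not know how to get from the index.

```python
from __future__ import division  # true division on Python 2 as well


def to_relative_frequencies(cf_dict, total_terms):
    """Divide each collection count by the total number of term occurrences."""
    if total_terms <= 0:
        raise ValueError("total_terms must be positive")
    return dict((term, count / total_terms) for term, count in cf_dict.items())


# Made-up counts, only to illustrate the arithmetic.
example_counts = {"lucene": 30, "index": 50, "term": 20}
example_total = 200  # hypothetical number of term occurrences in the field
example_freqs = to_relative_frequencies(example_counts, example_total)
```

With the real `cf_dict` from the loop above and the right total, the result would map each term to its share of all term occurrences in the "contents" field.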