Hi,

In my programme, I am trying to select the most relevant document based on bigrams.
The system gives me the following output:

{contents: /1, assist librarian/1, assist manjula/2, assist sabaragamuwa/1,
fine manjula/1, librari manjula/1, librarian sabaragamuwa/1, main librari/2,
manjula assist/4, manjula fine/1, manjula name/1, name manjula/1,
sabaragamuwa univers/3, univers main/2, univers sabaragamuwa/1}

The system identifies the frequencies of the bigrams correctly, but it reports the tf-idf score of every bigram as 0. The same programme gives correct tf-idf values for unigrams.

The following is the code snippet that I wrote to determine the tf-idf of the bigrams.

********************************
for (int q1 = 1; q1 < NB + 1; q1++) { // NB - Number of Bigrams
    IndexReader indexReader = IndexReader.open(directory);
    IndexSearcher indexSearcher = new IndexSearcher(indexReader);
    Analyzer analyzer = new WhitespaceAnalyzer();
    QueryParser queryParser = new QueryParser(FIELD_CONTENTS, analyzer);
    Query query = queryParser.parse(terms[pos[freqs.length - q1]]);
    Hits hits = indexSearcher.search(query);
    Iterator<Hit> it = hits.iterator();
    TopDocs results = indexSearcher.search(query, 10);
    ScoreDoc[] hits1 = results.scoreDocs;
    for (ScoreDoc hit : hits1) {
        Document doc = indexSearcher.doc(hit.doc);
        tfidf[q1 - 1] = hit.score;
    }
}
***************************

Here, "hit.score" should give the tf-idf value of each bigram. Why is it 0? Could someone please explain how to resolve this problem?

Thanks,
Manjula.
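
For comparison, here is a minimal stand-alone sketch of a plain tf-idf computation for a single bigram, independent of Lucene. The class name BigramTfIdf, the method tfIdf, the document frequency of 2 and the corpus size of 10 are assumptions made up for illustration; only the frequency 4 of "manjula assist" is taken from the output above. The square-root and logarithm weighting mirrors the tf and idf factors of Lucene's classic similarity. Lucene's real score also includes normalisation factors, so "hit.score" is not expected to equal this value exactly.

```java
public class BigramTfIdf {

    /**
     * Plain tf-idf weight for one term (here: one bigram) in one document.
     *
     * termFreq: how often the bigram occurs in the document
     * docFreq:  number of documents that contain the bigram
     * numDocs:  total number of documents in the collection
     */
    static double tfIdf(int termFreq, int docFreq, int numDocs) {
        // Square-root damping of the raw frequency.
        double tf = Math.sqrt(termFreq);
        // Smoothed inverse document frequency; the +1 avoids division by zero.
        double idf = 1.0 + Math.log((double) numDocs / (docFreq + 1));
        return tf * idf;
    }

    public static void main(String[] args) {
        // "manjula assist" occurs 4 times in the output above.
        // docFreq = 2 and numDocs = 10 are hypothetical values.
        System.out.println(tfIdf(4, 2, 10));
    }
}
```

With this weighting, a bigram that occurs at least once in a document has a score greater than 0. A score of exactly 0 therefore suggests that the query did not match the bigram as a single indexed term.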