Constructing an IDF table without indexing documents

2011-06-24 Thread Xiyang Chen
Hi, I'm developing a search application with two types of documents:

1. Documents that need to be indexed and queried against
2. Documents that will never show up in search results, but their content needs to contribute to the global term frequency table

In other words, the application
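
The idea of letting non-searchable documents contribute to global term statistics can be illustrated outside Lucene. The following is only a minimal sketch, not the poster's code or Lucene's scoring: the function name `build_idf`, whitespace tokenization, lowercasing, and the plain `log(N / df)` formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def build_idf(searchable_docs, background_docs):
    """Build an IDF table from both document types (hypothetical helper).

    searchable_docs: texts that would be indexed and queried.
    background_docs: texts that only contribute to term statistics.
    """
    all_docs = list(searchable_docs) + list(background_docs)
    doc_freq = Counter()
    for text in all_docs:
        # Assumption: lowercase whitespace tokens; each term counts once per document.
        doc_freq.update(set(text.lower().split()))
    total = len(all_docs)
    # Assumption: plain log(N / df), which is not Lucene's exact formula.
    return {term: math.log(total / df) for term, df in doc_freq.items()}


idf = build_idf(["quick brown fox"], ["lazy brown dog", "brown bear"])
```

In this sketch only the first list would ever be returned by a search, yet "brown" gets an IDF of zero because it also appears in every background document.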

Index a manually constructed term vector as a Lucene document

2011-07-10 Thread Xiyang Chen
Hi, I have some term vectors, each with a number of (term, score) pairs. These vectors were derived from text documents which are no longer obtainable. I want to be able to add these vectors as individual documents to a Lucene index. I understand this can be accomplished by writing my own analyzer c

Tokenize a dictionary of phrases

2011-08-21 Thread Xiyang Chen
Hi, I have a dictionary of multi-word phrases and I'd like to analyze documents such that anything that appears in the dictionary will be treated as one single token. For example, if the dictionary contains "brown fox", then the sentence "The quick brown fox jumps over the lazy dog." will be tok
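
One way to picture the requested behavior is a greedy longest-match pass over the word sequence. This is a minimal sketch under stated assumptions, not a Lucene analyzer and not necessarily what the thread settled on: the function name `tokenize_with_phrases` is hypothetical, words are taken to be runs of word characters, and matching is case-insensitive.

```python
import re


def tokenize_with_phrases(text, phrases):
    """Merge dictionary phrases into single tokens (hypothetical helper)."""
    # Assumption: words are runs of word characters, compared in lowercase.
    words = re.findall(r"\w+", text.lower())
    # Store each phrase as a tuple of words so it can be compared to a slice.
    phrase_set = {tuple(p.lower().split()) for p in phrases}
    max_len = max((len(p) for p in phrase_set), default=1)

    tokens = []
    i = 0
    while i < len(words):
        # Try the longest multi-word candidate first, down to two words.
        for n in range(min(max_len, len(words) - i), 1, -1):
            if tuple(words[i:i + n]) in phrase_set:
                tokens.append(" ".join(words[i:i + n]))
                i += n
                break
        else:
            # No phrase starts here, so emit the single word.
            tokens.append(words[i])
            i += 1
    return tokens


tokens = tokenize_with_phrases(
    "The quick brown fox jumps over the lazy dog.", ["brown fox"]
)
# tokens == ['the', 'quick', 'brown fox', 'jumps', 'over', 'the', 'lazy', 'dog']
```

Trying longer candidates first means that when one dictionary phrase is a prefix of another, the longer phrase wins.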