Hi Paul,
You could add a rule to the StandardTokenizer JFlex grammar to handle this
case, bypassing its other rules.
Another option is to create a char filter that substitutes PUNCT-EXCLAMATION
for exclamation points, PUNCT-PERIOD for periods, etc., but only when the
entire input consists excl
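A minimal sketch of that substitution idea in plain Java, without the Lucene CharFilter plumbing. The class name, the method name, and the choice to join placeholders with spaces are hypothetical; only PUNCT-EXCLAMATION and PUNCT-PERIOD come from the message, and the "punctuation-only input" condition is an assumption about where the truncated sentence was heading.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class PunctuationSubstituter {
    // Placeholder tokens; extend the map for other punctuation as needed.
    private static final Map<Character, String> NAMES = new LinkedHashMap<>();
    static {
        NAMES.put('!', "PUNCT-EXCLAMATION");
        NAMES.put('.', "PUNCT-PERIOD");
    }

    // Returns the input unchanged unless every character is a mapped
    // punctuation mark; in that case each mark becomes its placeholder.
    static String substitute(String input) {
        if (input.isEmpty()) {
            return input;
        }
        for (int i = 0; i < input.length(); i++) {
            if (!NAMES.containsKey(input.charAt(i))) {
                return input; // mixed input: leave it for the tokenizer
            }
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            if (i > 0) {
                out.append(' '); // assumption: one token per mark
            }
            out.append(NAMES.get(input.charAt(i)));
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(substitute("!."));
        System.out.println(substitute("Help!"));
    }
}
```

In a real analyzer this logic would sit in a char filter ahead of the tokenizer, so ordinary text still goes through the normal rules.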
Hi,
I would like to read each term and its frequency or score from an index. How
can I do it in Java?
Thanks!
Hi Mead,
You may want to check out the permuterm index idea.
http://www-nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html
Basically, you write a custom filter that takes a term and generates all
rotations of it (the "permutations" of a permuterm index). On the query side,
you convert your query so it's always a p
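A rough sketch of the permuterm idea in plain Java, following the linked IR-book chapter and not any Lucene filter API. The class name, the method names, and the '$' end marker are illustrative assumptions. Each term is indexed under every rotation of term + "$", and a single-wildcard query is rotated so the wildcard ends up last, turning it into a prefix lookup.

```java
import java.util.ArrayList;
import java.util.List;

class Permuterm {
    static final char END = '$'; // marks the end of the original term

    // All rotations of term + '$'; these are what would be indexed.
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Rotates a query with one '*' so the wildcard is trailing:
    // X*Y becomes Y$X*. A query without '*' is just term + '$'.
    static String rotateQuery(String query) {
        int star = query.indexOf('*');
        if (star < 0) {
            return query + END;
        }
        return query.substring(star + 1) + END + query.substring(0, star) + "*";
    }

    public static void main(String[] args) {
        System.out.println(rotations("hello"));
        System.out.println(rotateQuery("h*o"));
    }
}
```

The trade-off is index size: a term of length n produces n + 1 rotations.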
Hi Paul,
Since you have modified the StandardAnalyzer (I presume you mean
StandardFilter), why not do a check on the term.text() and if it's all
punctuation, skip the analysis for that term? Something like this in
your StandardFilter:
public final boolean incrementToken() throws IOException {
Ch
Hi,
I am having a weird experience. I made a few changes to the source code
(Lucene 3.3). I created a basic application to test it. First, I added the
Lucene 3.3 project to the basic project as "required projects on the build path"
to be able to debug. When everything was ok, I removed it from required
We have a modified version of the Lucene StandardAnalyzer. We use it for
tokenizing music metadata such as artist names and song titles, so
typically only a few words. When tokenizing, it usually strips out
punctuation, which is correct; however, if the input text consists of
only punctuation
You'll have to call .commit() from the IndexWriter to make the changes
externally visible.
Then call IndexReader.reopen() to get a reader that sees the committed
changes; the reopen will be efficient (it only opens segments that are new
relative to the old reader).
It's still best to use a near-real-time reader when possibl
The Hits class was deprecated at some point and has been removed from
recent releases.
The 2.9.3 javadoc at
http://lucene.apache.org/java/2_9_3/api/core/org/apache/lucene/search/Hits.html
shows a little code sample
TopDocs topDocs = searcher.search(query, numHits);
ScoreDoc[] hits = topDocs.sc