RE: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-17 Thread Steven A Rowe
Hi Paul, You could add a rule to the StandardTokenizer JFlex grammar to handle this case, bypassing its other rules. Another option is to create a char filter that substitutes PUNCT-EXCLAMATION for exclamation points, PUNCT-PERIOD for periods, etc., but only when the entire input consists excl
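
A minimal sketch of the substitution idea in plain Java, independent of Lucene's char filter classes. It assumes the truncated condition means "the entire input consists exclusively of punctuation". The placeholder names PUNCT-EXCLAMATION and PUNCT-PERIOD come from the message; the class name, the extra '?' mapping, and the space separator are hypothetical choices made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

final class PunctuationSubstitution {
    // '!' and '.' placeholders are taken from the message; '?' is a hypothetical extra entry.
    private static final Map<Character, String> PLACEHOLDERS = new LinkedHashMap<>();
    static {
        PLACEHOLDERS.put('!', "PUNCT-EXCLAMATION");
        PLACEHOLDERS.put('.', "PUNCT-PERIOD");
        PLACEHOLDERS.put('?', "PUNCT-QUESTION");
    }

    // True only if the input has at least one character and every
    // non-whitespace character has a placeholder (assumed reading of the condition).
    static boolean isAllMappedPunctuation(String input) {
        boolean sawPunctuation = false;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (!PLACEHOLDERS.containsKey(c)) {
                return false;
            }
            sawPunctuation = true;
        }
        return sawPunctuation;
    }

    // Replaces each punctuation character with its placeholder, separated by spaces.
    // Mixed input such as "Wham!" is returned unchanged so normal analysis applies.
    static String substitute(String input) {
        if (!isAllMappedPunctuation(input)) {
            return input;
        }
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                continue;
            }
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(PLACEHOLDERS.get(c));
        }
        return out.toString();
    }
}
```

In a real analyzer this logic would sit in front of the tokenizer, so that a punctuation-only title produces placeholder tokens instead of an empty token stream.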

How can I read records from Lucene

2011-10-17 Thread dyzc
Hi, I would like to read the term and its frequency or score out of indices. How can I do it using Java? Thanks!

Re: Is there any "Query" in Lucene can search the term, which is similar as "SQL-LIKE"?

2011-10-17 Thread Sujit Pal
Hi Mead, You may want to check out the permuterm index idea. http://www-nlp.stanford.edu/IR-book/html/htmledition/permuterm-indexes-1.html Basically you write a custom filter that takes a term and generates all word permutations off it. On the query side, you convert your query so it's always a p
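
A rough sketch of the permuterm idea from the linked IR-book chapter, in plain Java rather than as a Lucene filter. The "permutations" are rotations of the term with an end marker `$` appended, and a single-wildcard query `X*Y` is rewritten to the prefix `Y$X`. The class and method names are hypothetical, and a real index would look the prefix up in a sorted term dictionary instead of scanning rotations.

```java
import java.util.ArrayList;
import java.util.List;

final class Permuterm {
    static final char END = '$';

    // Index side: all rotations of term + '$', e.g. "abc" -> abc$, bc$a, c$ab, $abc.
    static List<String> rotations(String term) {
        String marked = term + END;
        List<String> out = new ArrayList<>();
        for (int i = 0; i < marked.length(); i++) {
            out.add(marked.substring(i) + marked.substring(0, i));
        }
        return out;
    }

    // Query side: rewrite "X*Y" to the prefix "Y$X"; a query without '*' becomes "term$".
    // Only a single '*' is handled in this sketch.
    static String toPrefixQuery(String wildcardQuery) {
        int star = wildcardQuery.indexOf('*');
        if (star < 0) {
            return wildcardQuery + END;
        }
        if (wildcardQuery.indexOf('*', star + 1) >= 0) {
            throw new IllegalArgumentException("only one '*' is supported in this sketch");
        }
        String before = wildcardQuery.substring(0, star);
        String after = wildcardQuery.substring(star + 1);
        return after + END + before;
    }

    // A term matches when one of its rotations starts with the rewritten query.
    static boolean matches(String term, String wildcardQuery) {
        String prefix = toPrefixQuery(wildcardQuery);
        for (String rotation : rotations(term)) {
            if (rotation.startsWith(prefix)) {
                return true;
            }
        }
        return false;
    }
}
```

For example, `Permuterm.matches("hello", "hel*o")` rewrites the query to `o$hel`, which is a prefix of the rotation `o$hell`. The cost of this design is index size: each term is stored once per rotation.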

Re: How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-17 Thread Sujit Pal
Hi Paul, Since you have modified the StandardAnalyzer (I presume you mean StandardFilter), why not do a check on the term.text() and, if it's all punctuation, skip the analysis for that term? Something like this in your StandardFilter: public final boolean incrementToken() throws IOException { Ch

this IndexReader is closed only with jar

2011-10-17 Thread Zeynep P.
Hi, I am having a weird experience. I made a few changes to the source code (Lucene 3.3). I created a basic application to test it. First, I added the Lucene 3.3 project to the basic project as "required projects on the build path" to be able to debug. When everything was ok, I removed it from required

How do you see if a tokenstream has tokens without consuming the tokens ?

2011-10-17 Thread Paul Taylor
We have a modified version of the Lucene StandardAnalyzer that we use for tokenizing music metadata such as artist names & song titles, so typically only a few words. When tokenizing, it usually strips out punctuation, which is correct; however, if the input text consists of only punctuation

Re: IndexReader#reopen() on externally changed index

2011-10-17 Thread Michael McCandless
You'll have to call .commit() from the IndexWriter to make the changes externally visible. Then call IndexReader.reopen to get a reader seeing the committed changes; the reopen will be efficient (it only opens "new" segments vs the old reader). It's still best to use a near-real-time reader when possibl

Re: Picking single results out of a list of results

2011-10-17 Thread Ian Lea
The Hits class was deprecated at some point and has been removed from recent releases. The 2.9.3 javadoc at http://lucene.apache.org/java/2_9_3/api/core/org/apache/lucene/search/Hits.html shows a little code sample TopDocs topDocs = searcher.search(query, numHits); ScoreDoc[] hits = topDocs.sc