Analyzer Incorrect?
Hi all,

Sorry for the flood of questions this week; clients finally started using the search engine I wrote, which uses Lucene. When I first started developing with Lucene, the Analyzers it came with did some odd things, so I decided to implement my own, but it is not working the way I expect it to. First and foremost, I would like to have case-insensitive searches, and I do not want to tokenize the fields. No field will ever have a space in it, so there is no need to tokenize. I came up with this Analyzer, but case still seems to be an issue:

public TokenStream tokenStream(String field, final Reader reader) {
    // do not tokenize any field
    TokenStream t = new CharTokenizer(reader) {
        protected boolean isTokenChar(char c) {
            return true;
        }
    };
    // case-insensitive search
    t = new LowerCaseFilter(t);
    return t;
}

Is there anything I am doing wrong in the Analyzer I have written?

Thanks,

Rob

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Analyzer Incorrect?
On Friday 04 April 2003 05:24, Rob Outar wrote:
> Hi all,
> Sorry for the flood of questions this week, clients finally started
> using the search engine I wrote which uses Lucene. When I first started

Yup... that's the root of all evil. :-)

(I'm in a similar situation, going through user acceptance testing as we speak... and getting ready to do a second version that'll have more advanced metadata-based search using Lucene.)

> developing with Lucene the Analyzers it came with did some odd things so
> I decided to implement my own but it is not working the way I expect it
> to. First and foremost I would like to have case insensitive searches
> and I do not want to tokenize the fields. No field will ever have a space

If you don't need to tokenize a field, you don't need an analyzer either. However, to get case-insensitive search, you should lower-case field contents before adding them to the document. QueryParser will do the lower-casing for search terms automatically (if you are using it), so matching should work fine then.

-+ Tatu +-
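Tatu's advice can be illustrated without Lucene at all. The sketch below is plain Java with invented class and method names (it is not the Lucene API): a toy exact-match index where lower-casing the value at add time and the term at query time is what makes lookups case-insensitive.

```java
import java.util.HashMap;
import java.util.Map;

// Toy exact-match "index" illustrating the advice above: normalize case
// on both the indexing side and the query side. All names here are made
// up for illustration; this is not the Lucene API.
public class CaseInsensitiveIndex {
    private final Map<String, String> index = new HashMap<String, String>();

    // Index time: lower-case the field value before storing it.
    public void add(String field, String value) {
        index.put(field + ":" + value.toLowerCase(), value);
    }

    // Query time: lower-case the term, as QueryParser does automatically.
    public boolean matches(String field, String term) {
        return index.containsKey(field + ":" + term.toLowerCase());
    }

    public static void main(String[] args) {
        CaseInsensitiveIndex idx = new CaseInsensitiveIndex();
        idx.add("filename", "Report.XML");
        System.out.println(idx.matches("filename", "REPORT.xml")); // prints "true"
        System.out.println(idx.matches("filename", "report.xml")); // prints "true"
    }
}
```

Because both sides pass through the same normalization, any case variant of the stored value matches.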
RE: Analyzer Incorrect?
Yeah, it has been a bad week. I don't think QueryParser is lowercasing my terms; maybe it is something I am doing wrong:

public synchronized String[] queryIndex(String query)
        throws ParseException, IOException {
    checkForIndexChange();
    QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
    this.query = p.parse(query);
    Hits hits = this.searcher.search(this.query);
    return buildReturnArray(hits);
}

When I create the QueryParser I do not want it to have a default field, since clients can query on whatever field they want. I use my Analyzer, which I do not think is lowercasing the fields, because I have tested querying with all lowercase (got results) and with mixed case (no results), so I think my code or my analyzer is hosed.

Thanks,

Rob

-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 04, 2003 9:09 AM
To: Lucene Users List
Subject: Re: Analyzer Incorrect?

[quoted text snipped]
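Rob's symptom (all-lowercase queries match, mixed-case queries don't) is consistent with untokenized keyword fields being written to the index verbatim, while the analyzer lower-cases only the query terms. The following is a plain-Java sketch of that mismatch under that assumption; the class and method names are invented and this is not the Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Models the symptom described above: keyword (untokenized) field values
// go into the index verbatim, bypassing the analyzer, while the analyzer
// lower-cases the query term. Stored mixed-case values then never match.
// Names are invented for illustration; this is not the Lucene API.
public class KeywordFieldMismatch {
    private final Map<String, String> index = new HashMap<String, String>();

    // Index time: an untokenized field bypasses the analyzer entirely,
    // so the value is stored with its original case.
    public void addKeyword(String field, String value) {
        index.put(field + ":" + value, value);
    }

    // Query time: the analyzer lower-cases the term.
    public boolean search(String field, String term) {
        return index.containsKey(field + ":" + term.toLowerCase());
    }

    public static void main(String[] args) {
        KeywordFieldMismatch idx = new KeywordFieldMismatch();
        idx.addKeyword("name", "readme.txt");  // stored lower case
        idx.addKeyword("name", "Report.XML");  // stored mixed case
        System.out.println(idx.search("name", "readme.txt")); // prints "true"
        System.out.println(idx.search("name", "Report.XML")); // prints "false"
        System.out.println(idx.search("name", "report.xml")); // prints "false"
    }
}
```

The lowercased query term can only ever match values that happened to be stored in lower case, which is exactly the all-lowercase-works, mixed-case-fails behavior Rob reports. The fix matches Tatu's earlier advice: lower-case the value before adding it.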
Re: about increment update
Thank you Ian, it works.

Kerr.

----- Original Message -----
From: Ian Lea [EMAIL PROTECTED]
To: kerr [EMAIL PROTECTED]
Cc: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, April 04, 2003 4:55 AM
Subject: Re: about increment update

Try this:

1. Open reader.
2. removeModifiedFiles(reader)
3. reader.close()
4. Open writer.
5. updateIndexDocs()
6. writer.close()

i.e. don't have both reader and writer open at the same time.

btw, I suspect you might be removing index entries for files that have been modified, but adding all files. Another "index keeps growing" problem! Could be wrong.

--
Ian.

[EMAIL PROTECTED] (kerr) wrote:

> Thank you Otis,
> Yes, the reader should be closed, but that isn't the reason for this
> exception; the errors happen before deleting the file.
> Kerr.
>
> close()
> Closes files associated with this index. Also saves any new deletions to
> disk. No other methods should be called after this has been called.
>
> ----- Original Message -----
> From: Otis Gospodnetic [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Thursday, April 03, 2003 12:14 PM
> Subject: Re: about increment update
>
> Maybe this is missing?
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()
>
> Otis
>
> --- kerr [EMAIL PROTECTED] wrote:
>
> Hello everyone,
> Here I try to incrementally update the index, following the idea of
> deleting each modified file first and re-adding it. Here is the source.
> When I execute it, the index directory gets a write.lock file at the
> line reader.delete(i), and I catch a java.io.IOException with the
> message "Index locked for write". After that, when I execute the line
> IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
> I catch a java.io.IOException with the message "Index locked for write".
> If I delete the write.lock file, the error happens again. Can anyone
> help? Thanks.
> Kerr.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import java.io.File;
import java.util.Date;

public class UpdateIndexFiles {

    public static void main(String[] args) {
        try {
            Date start = new Date();
            Directory directory = FSDirectory.getDirectory("index", false);
            IndexReader reader = IndexReader.open(directory);
            System.out.println(reader.isLocked(directory));
            //reader.unlock(directory);
            IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
            String base = "";
            if (args.length == 0) {
                base = "D:\\Tomcat\\webapps\\ROOT\\test";
            } else {
                base = args[0];
            }
            removeModifiedFiles(reader);
            updateIndexDocs(reader, writer, new File(base));
            writer.optimize();
            writer.close();
            Date end = new Date();
            System.out.print(end.getTime() - start.getTime());
            System.out.println(" total milliseconds");
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass()
                + "\n with message: " + e.getMessage());
            e.printStackTrace();
        }
    }

    public static void removeModifiedFiles(IndexReader reader) throws Exception {
        Document adoc;
        String path;
        File aFile;
        for (int i = 0; i < reader.numDocs(); i++) {
            adoc = reader.document(i);
            path = adoc.get("path");
            aFile = new File(path);
            if (reader.lastModified(path) < aFile.lastModified()) {
                System.out.println(reader.isLocked(path));
                reader.delete(i);
            }
        }
    }

    public static void updateIndexDocs(IndexReader reader, IndexWriter writer, File file)
            throws Exception {
        if (file.isDirectory()) {
            String[] files = file.list();
            for (int i = 0; i < files.length; i++)
                updateIndexDocs(reader, writer, new File(file, files[i]));
        } else {
            if (!reader.indexExists(file)) {
                System.out.println("adding " + file);
                writer.addDocument(FileDocument.Document(file));
            } else {}
        }
    }
}

--
Searchable
personal storage and archiving from http://www.digimem.net/
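Ian's sequencing advice comes down to the write lock: deleting through an IndexReader takes it, and an IndexWriter cannot open while it is held. The snippet below is a plain-Java toy model of that mutual exclusion, not the Lucene API; the class name and behavior are invented to mirror the exception Kerr saw.

```java
import java.io.IOException;

// Toy model of the write.lock behavior discussed in this thread: whoever
// deletes or writes must hold a single lock, and trying to acquire it
// while it is already held fails the way Kerr's IndexWriter constructor
// did. This is an illustration only, not the Lucene API.
public class WriteLockDemo {
    private boolean locked = false;

    public void acquire() throws IOException {
        if (locked) {
            throw new IOException("Index locked for write");
        }
        locked = true;
    }

    public void release() {
        locked = false;
    }

    public static void main(String[] args) throws IOException {
        WriteLockDemo lock = new WriteLockDemo();
        lock.acquire();              // reader.delete(i) takes the lock
        try {
            lock.acquire();          // opening a writer while the reader is open
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage()); // prints "caught: Index locked for write"
        }
        lock.release();              // reader.close() releases the lock
        lock.acquire();              // now the writer can open cleanly
        System.out.println("writer opened"); // prints "writer opened"
    }
}
```

Closing the reader before opening the writer, as in Ian's steps 1-6, is exactly this release-then-acquire ordering.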