Analyzer Incorrect?
Hi all,

Sorry for the flood of questions this week; clients finally started using the search engine I wrote, which uses Lucene. When I first started developing with Lucene, the Analyzers it came with did some odd things, so I decided to implement my own, but it is not working the way I expect it to. First and foremost, I would like to have case-insensitive searches, and I do not want to tokenize the fields. No field will ever have a space in it, so there is no need to tokenize. I came up with this Analyzer, but case still seems to be an issue:

public TokenStream tokenStream(String field, final Reader reader) {
    // do not tokenize any field
    TokenStream t = new CharTokenizer(reader) {
        protected boolean isTokenChar(char c) {
            return true;
        }
    };
    // case-insensitive search
    t = new LowerCaseFilter(t);
    return t;
}

Is there anything I am doing wrong in the Analyzer I have written?

Thanks,

Rob

-
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Analyzer Incorrect?
On Friday 04 April 2003 05:24, Rob Outar wrote:
> Hi all,
> Sorry for the flood of questions this week, clients finally started
> using the search engine I wrote which uses Lucene. When I first started

Yup... that's the root of all evil. :-)

(I'm in a similar situation, going through user acceptance testing as we speak... and getting ready to do a second version that'll have more advanced metadata-based search using Lucene.)

> developing with Lucene the Analyzers it came with did some odd things so
> I decided to implement my own but it is not working the way I expect it
> to. First and foremost I would like to have case insensitive searches
> and I do not want to tokenize the fields. No field will ever have a space

If you don't need to tokenize a field, you don't need an analyzer either. However, to get case-insensitive search, you should lower-case field contents before adding them to the document. QueryParser will do the lower-casing for search terms automatically (if you are using it), so matching should work fine then.

-+ Tatu +-
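Tatu's advice can be illustrated without Lucene at all. The sketch below is plain Java with invented class and method names (it is not the Lucene API): a toy exact-match index where lower-casing the value at add time and the term at query time is what makes lookups case-insensitive.

```java
import java.util.HashMap;
import java.util.Map;

// Toy exact-match "index" illustrating the advice above: normalize case
// on both the indexing side and the query side. All names here are made
// up for illustration; this is not the Lucene API.
public class CaseInsensitiveIndex {
    private final Map<String, String> index = new HashMap<String, String>();

    // Index time: lower-case the field value before storing it.
    public void add(String field, String value) {
        index.put(field + ":" + value.toLowerCase(), value);
    }

    // Query time: lower-case the term, as QueryParser does automatically.
    public boolean matches(String field, String term) {
        return index.containsKey(field + ":" + term.toLowerCase());
    }

    public static void main(String[] args) {
        CaseInsensitiveIndex idx = new CaseInsensitiveIndex();
        idx.add("filename", "Report.XML");
        System.out.println(idx.matches("filename", "REPORT.xml")); // prints "true"
        System.out.println(idx.matches("filename", "report.xml")); // prints "true"
    }
}
```

Because both sides pass through the same normalization, any case variant of the stored value matches.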
RE: Analyzer Incorrect?
Yeah, it has been a bad week. I don't think QueryParser is lowercasing my terms; maybe it is something I am doing wrong:

public synchronized String[] queryIndex(String query)
        throws ParseException, IOException {
    checkForIndexChange();
    QueryParser p = new QueryParser("", new RepositoryIndexAnalyzer());
    this.query = p.parse(query);
    Hits hits = this.searcher.search(this.query);
    return buildReturnArray(hits);
}

When I create the QueryParser I do not want it to have a default field, since clients can query on whatever field they want. I use my Analyzer, which I do not think is lowercasing the fields, because I have tested querying with all lowercase (got results) and with mixed case (no results), so I think my code or my analyzer is hosed.

Thanks,

Rob

-----Original Message-----
From: Tatu Saloranta [mailto:[EMAIL PROTECTED]]
Sent: Friday, April 04, 2003 9:09 AM
To: Lucene Users List
Subject: Re: Analyzer Incorrect?

[quoted text snipped]
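Rob's symptom (all-lowercase queries match, mixed-case queries don't) is consistent with untokenized keyword fields being written to the index verbatim, while the analyzer lower-cases only the query terms. The following is a plain-Java sketch of that mismatch under that assumption; the class and method names are invented and this is not the Lucene API.

```java
import java.util.HashMap;
import java.util.Map;

// Models the symptom described above: keyword (untokenized) field values
// go into the index verbatim, bypassing the analyzer, while the analyzer
// lower-cases the query term. Stored mixed-case values then never match.
// Names are invented for illustration; this is not the Lucene API.
public class KeywordFieldMismatch {
    private final Map<String, String> index = new HashMap<String, String>();

    // Index time: an untokenized field bypasses the analyzer entirely,
    // so the value is stored with its original case.
    public void addKeyword(String field, String value) {
        index.put(field + ":" + value, value);
    }

    // Query time: the analyzer lower-cases the term.
    public boolean search(String field, String term) {
        return index.containsKey(field + ":" + term.toLowerCase());
    }

    public static void main(String[] args) {
        KeywordFieldMismatch idx = new KeywordFieldMismatch();
        idx.addKeyword("name", "readme.txt");  // stored lower case
        idx.addKeyword("name", "Report.XML");  // stored mixed case
        System.out.println(idx.search("name", "readme.txt")); // prints "true"
        System.out.println(idx.search("name", "Report.XML")); // prints "false"
        System.out.println(idx.search("name", "report.xml")); // prints "false"
    }
}
```

The lowercased query term can only ever match values that happened to be stored in lower case, which is exactly the all-lowercase-works, mixed-case-fails behavior Rob reports. The fix matches Tatu's earlier advice: lower-case the value before adding it.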
Re: about increment update
Thank you Ian, it works.

Kerr.

----- Original Message -----
From: Ian Lea [EMAIL PROTECTED]
To: kerr [EMAIL PROTECTED]
Cc: Lucene Users List [EMAIL PROTECTED]
Sent: Friday, April 04, 2003 4:55 AM
Subject: Re: about increment update

Try this:

1. Open reader.
2. removeModifiedFiles(reader)
3. reader.close()
4. Open writer.
5. updateIndexDocs()
6. writer.close()

i.e. don't have both reader and writer open at the same time.

btw, I suspect you might be removing index entries for files that have been modified, but adding all files. Another "index keeps growing" problem! Could be wrong.

--
Ian.

[EMAIL PROTECTED] (kerr) wrote:

> Thank you Otis,
> Yes, the reader should be closed, but that isn't the reason for this
> exception; the errors happen before deleting the file.
> Kerr.
>
> close()
> Closes files associated with this index. Also saves any new deletions to
> disk. No other methods should be called after this has been called.
>
> ----- Original Message -----
> From: Otis Gospodnetic [EMAIL PROTECTED]
> To: Lucene Users List [EMAIL PROTECTED]
> Sent: Thursday, April 03, 2003 12:14 PM
> Subject: Re: about increment update
>
> Maybe this is missing?
> http://jakarta.apache.org/lucene/docs/api/org/apache/lucene/index/IndexReader.html#close()
>
> Otis
>
> --- kerr [EMAIL PROTECTED] wrote:
>
> Hello everyone,
> Here I try to incrementally update the index, following the idea of
> deleting each modified file first and re-adding it. Here is the source.
> When I execute it, the index directory gets a write.lock file at the
> line reader.delete(i), and I catch a java.io.IOException with the
> message "Index locked for write". After that, when I execute the line
> IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
> I catch a java.io.IOException with the message "Index locked for write".
> If I delete the write.lock file, the error happens again. Can anyone
> help? Thanks.
> Kerr.
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.document.Document;
import org.apache.lucene.document.Field;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.Term;
import java.io.File;
import java.util.Date;

public class UpdateIndexFiles {

    public static void main(String[] args) {
        try {
            Date start = new Date();
            Directory directory = FSDirectory.getDirectory("index", false);
            IndexReader reader = IndexReader.open(directory);
            System.out.println(reader.isLocked(directory));
            //reader.unlock(directory);
            IndexWriter writer = new IndexWriter("index", new StandardAnalyzer(), false);
            String base = "";
            if (args.length == 0) {
                base = "D:\\Tomcat\\webapps\\ROOT\\test";
            } else {
                base = args[0];
            }
            removeModifiedFiles(reader);
            updateIndexDocs(reader, writer, new File(base));
            writer.optimize();
            writer.close();
            Date end = new Date();
            System.out.print(end.getTime() - start.getTime());
            System.out.println(" total milliseconds");
        } catch (Exception e) {
            System.out.println(" caught a " + e.getClass()
                + "\n with message: " + e.getMessage());
            e.printStackTrace();
        }
    }

    public static void removeModifiedFiles(IndexReader reader) throws Exception {
        Document adoc;
        String path;
        File aFile;
        for (int i = 0; i < reader.numDocs(); i++) {
            adoc = reader.document(i);
            path = adoc.get("path");
            aFile = new File(path);
            if (reader.lastModified(path) < aFile.lastModified()) {
                System.out.println(reader.isLocked(path));
                reader.delete(i);
            }
        }
    }

    public static void updateIndexDocs(IndexReader reader, IndexWriter writer, File file)
            throws Exception {
        if (file.isDirectory()) {
            String[] files = file.list();
            for (int i = 0; i < files.length; i++)
                updateIndexDocs(reader, writer, new File(file, files[i]));
        } else {
            if (!reader.indexExists(file)) {
                System.out.println("adding " + file);
                writer.addDocument(FileDocument.Document(file));
            } else {}
        }
    }
}

--
Searchable
personal storage and archiving from http://www.digimem.net/
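Ian's sequencing advice comes down to the write lock: deleting through an IndexReader takes it, and an IndexWriter cannot open while it is held. The snippet below is a plain-Java toy model of that mutual exclusion, not the Lucene API; the class name and behavior are invented to mirror the exception Kerr saw.

```java
import java.io.IOException;

// Toy model of the write.lock behavior discussed in this thread: whoever
// deletes or writes must hold a single lock, and trying to acquire it
// while it is already held fails the way Kerr's IndexWriter constructor
// did. This is an illustration only, not the Lucene API.
public class WriteLockDemo {
    private boolean locked = false;

    public void acquire() throws IOException {
        if (locked) {
            throw new IOException("Index locked for write");
        }
        locked = true;
    }

    public void release() {
        locked = false;
    }

    public static void main(String[] args) throws IOException {
        WriteLockDemo lock = new WriteLockDemo();
        lock.acquire();              // reader.delete(i) takes the lock
        try {
            lock.acquire();          // opening a writer while the reader is open
        } catch (IOException e) {
            System.out.println("caught: " + e.getMessage()); // prints "caught: Index locked for write"
        }
        lock.release();              // reader.close() releases the lock
        lock.acquire();              // now the writer can open cleanly
        System.out.println("writer opened"); // prints "writer opened"
    }
}
```

Closing the reader before opening the writer, as in Ian's steps 1-6, is exactly this release-then-acquire ordering.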