Indexing sit (stuff it) files

2005-03-02 Thread Luke Shannon
Hello; I've almost completed my zip file indexer. I used the following to get an InputStream for each file in the archive: ZipFile zip = new ZipFile(new File(fileLocation)); ZipEntry zipEntry; Enumeration files = zip.entries(); while (files.hasMoreElements
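
For readers following along, the entry loop described in this message can be sketched as below. This is a minimal illustration using only `java.util.zip`, written against a current JDK rather than the one in use in 2005. The class name `ZipEntryReader` and the method `readEntries` are hypothetical and do not come from the thread, and the real indexer presumably hands each stream to a document parser instead of reading it as text.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Enumeration;
import java.util.List;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

// Hypothetical helper for illustration only; not code from the thread.
class ZipEntryReader {

    // Opens the archive, walks its entries, and returns "name: text"
    // for every non-directory entry (contents decoded as UTF-8).
    static List<String> readEntries(File archive) throws IOException {
        List<String> results = new ArrayList<>();
        try (ZipFile zip = new ZipFile(archive)) {
            Enumeration<? extends ZipEntry> files = zip.entries();
            while (files.hasMoreElements()) {
                ZipEntry entry = files.nextElement();
                if (entry.isDirectory()) {
                    continue; // directories carry no content to index
                }
                // One InputStream per entry, closed before moving on.
                try (InputStream in = zip.getInputStream(entry)) {
                    String text = new String(in.readAllBytes(), StandardCharsets.UTF_8);
                    results.add(entry.getName() + ": " + text);
                }
            }
        }
        return results;
    }
}
```

Closing the `ZipFile` also closes any entry streams still open, so the try-with-resources on the archive is the important one.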

Re: Zip Files

2005-03-07 Thread Luke Shannon
ched.getPath() + " " + e); return null; } } - Original Message - From: "Luke Shannon" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Tuesday, March 01, 2005 10:34 AM Subject: Zip Files >

Score Question

2005-03-09 Thread Luke Shannon
Hi; Has the scoring changed recently? I just upgraded all the jars in our application (Lucene included). I'm getting scores like this from documents in hits: 6.9699495E-4 The XSL that creates the user interface converts the score to an int and then displays it. This is currently resulting in a zero
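
The zero described here is what integer conversion does to any value below 1. The sketch below shows the truncation and two possible display alternatives. It is an illustration only: the class `ScoreDisplay` and its methods are hypothetical, and the thread does its conversion in XSL rather than Java.

```java
import java.util.Locale;

// Hypothetical display helper; names are assumptions, not from the thread.
class ScoreDisplay {

    // Casting a float below 1 to int discards the fraction entirely.
    static int truncate(float score) {
        return (int) score;
    }

    // Keeps four decimal places instead of truncating to an int.
    static String format(float score) {
        return String.format(Locale.ROOT, "%.4f", score);
    }

    // Scales by 10,000 before rounding so small scores stay distinguishable.
    static int scaled(float score) {
        return Math.round(score * 10000f);
    }
}
```

With the score quoted above, `truncate(6.9699495E-4f)` gives `0`, `format(6.9699495E-4f)` gives `"0.0007"`, and `scaled(6.9699495E-4f)` gives `7`. This only changes how the number is shown; it does not explain why the score is so small.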

Re: Score Question

2005-03-10 Thread Luke Shannon
A couple of times. Luke - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: Sent: Wednesday, March 09, 2005 8:03 PM Subject: Re: Score Question > Did you reindex after upgrading? > > Erik > > On Mar 9, 2005, at 5:55 PM, Luke Shannon wr

Re: Score Question

2005-03-10 Thread Luke Shannon
I think I've found my problem. In the example I'm having the problem with I do a multiple field query. I think I need to play with my boosting factors. This is the section of the book that I think will lead to a resolution to my problem: In addition to the explicit factors in this equation, othe

Boost/Scoring Question

2005-03-10 Thread Luke Shannon
Hello; This may be a trivial question, but it has me stuck. I'm getting some really small scores: 8.799379E-4 I need to figure out why they are so small. I think it is a problem which can be resolved using boosting. I'm not sure how to boost given the system I have. The fields I query against

Re: Best way to purposely corrupt an index?

2005-04-19 Thread Luke Shannon
The only time I have seen corrupted indexes is when the Java process is killed during the indexing process. If you shut down Tomcat (or whatever you are running for Java) during the indexing process you will end up with a corrupted index. - Original Message - From: "Andy Roberts" <[EMAIL

Re: getting document metadata

2005-05-03 Thread Luke Shannon
Hi Pablo; Can you give a little more detail? I don't understand what you mean when you say "indexing the path when adding the document to the index". If you get a Lucene document using LucenePDFDocument class (http://www.pdfbox.org/javadoc/index.html), the document returned will contain a field

Re: indexing synonyms / reducing the index size

2005-05-05 Thread Luke Shannon
Hi Pablo; I handle synonyms in the Query rather than the Index. Whenever I build a query I check to see if there is a synonym for each word, or a replacement for the entire string the user is searching on. If there is (either or both cases) I include all the synonyms/replacement strings applicable
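
The per-word half of this approach (expanding at query time rather than at index time) can be sketched as follows. This is an assumption-laden illustration, not the poster's code: the class `SynonymQueryExpander`, the method `expand`, and the synonym map are hypothetical, and the whole-string replacement case mentioned in the message is left out. The output uses the parenthesised `OR` grouping that Lucene's query parser syntax accepts.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical query-time synonym expansion; not code from the thread.
class SynonymQueryExpander {

    // Rewrites each word that has synonyms as "(word OR syn1 OR syn2)".
    // Words without synonyms are passed through unchanged.
    static String expand(String userQuery, Map<String, List<String>> synonyms) {
        StringBuilder query = new StringBuilder();
        for (String word : userQuery.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // empty input yields an empty query
            }
            if (query.length() > 0) {
                query.append(' ');
            }
            List<String> alternatives = synonyms.get(word.toLowerCase(Locale.ROOT));
            if (alternatives == null || alternatives.isEmpty()) {
                query.append(word);
            } else {
                query.append('(').append(word);
                for (String alternative : alternatives) {
                    query.append(" OR ").append(alternative);
                }
                query.append(')');
            }
        }
        return query.toString();
    }
}
```

For example, with `car` mapped to `automobile` and `vehicle`, `expand("fast car", synonyms)` returns `fast (car OR automobile OR vehicle)`. Multi-word synonyms would need quoting as phrases, which this sketch does not handle.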

Re: Max Field Length

2005-05-06 Thread Luke Shannon
Hi; I think by default only 10,000 terms will be indexed for a field. You can change this using the maxFieldLength setting of IndexWriter. Luke - Original Message - From: "Ernesto De Santis" <[EMAIL PROTECTED]> To: "Lucene Users List" Sent: Friday, May 06, 2005 5:42 PM Subject: Max Fi