Lucene 1.3 final to 1.4final problem

2004-07-08 Thread Karthik N S
Hey Dev Guys, apologies, I have a quick problem... The number of hits on a set of documents indexed using 1.3-final is not the same on the 1.4-final version. [The only modification made to the source is that I have upgraded my CustomAnalyzer based on the StopAnalyzer available in 1.4.] Does doing

Lucene 1.3 final to 1.4final problem

2004-07-08 Thread Karthik N S
Hey Dev Guys, apologies. Can somebody explain to me why, for the input word TA, StopAnalyzer.java returns [ta] instead of [ta]? TA == [ta] instead of [ta]; $125.96 === [125.95] instead of [$125.95]. Is there something I have been missing? with
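For reference, my understanding is that the stock StopAnalyzer chains LowerCaseTokenizer (letters only, lowercased) with a StopFilter. If that holds, TA will always come out as ta, and characters that are not letters ($, digits, the decimal point) only separate tokens and are then discarded. The plain-Java sketch below emulates that pipeline without Lucene so the behavior can be checked in isolation; the class name is hypothetical and STOP_WORDS is a small illustrative subset, not Lucene's actual list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class StopAnalyzerSketch {
    // Illustrative subset of an English stop list; NOT the exact list shipped with Lucene.
    static final Set<String> STOP_WORDS =
        new HashSet<>(Arrays.asList("a", "an", "and", "the", "of", "to", "in"));

    // Emulates a letter-only, lowercasing tokenizer followed by a stop-word filter.
    static List<String> analyze(String text) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (Character.isLetter(c)) {
                current.append(Character.toLowerCase(c));
            } else {
                // Any non-letter ends the current token and is itself dropped.
                flush(current, tokens);
            }
        }
        flush(current, tokens);
        return tokens;
    }

    // Emits the pending token unless it is empty or a stop word.
    private static void flush(StringBuilder current, List<String> tokens) {
        if (current.length() > 0) {
            String token = current.toString();
            if (!STOP_WORDS.contains(token)) {
                tokens.add(token);
            }
            current.setLength(0);
        }
    }
}
```

Under this emulation analyze("TA") gives [ta] and analyze("$125.96") gives an empty list, so an analyzer that keeps the digits is presumably using a different tokenizer.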

Re: boolean operators and score

2004-07-08 Thread Niraj Alok
If I do it by sorting the input before sending it to Lucene, it could become unmanageable and could also produce unexpected results for the user. E.g., if I type: winston churchill and world war and germany, I could split the string by "and" and get the sorted string as (churchill winston)
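The preprocessing discussed here (sorting the words the user typed so that equivalent queries reach Lucene in the same order) can be sketched in plain Java. This assumes "and" is the only operator, that matching is case-insensitive, and that the result is later handed to a query parser; the AND-and-parentheses output format and the class name are assumptions of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

final class QueryOrderNormalizer {
    // Rewrites "a b and c and d e" into a canonical form so that neither the order
    // of the clauses nor the order of words inside a clause affects the result.
    static String normalize(String input) {
        // Split on the word "and" (any case) surrounded by whitespace.
        String[] rawClauses = input.trim().split("(?i)\\s+and\\s+");
        List<String> clauses = new ArrayList<>();
        for (String raw : rawClauses) {
            String trimmed = raw.trim().toLowerCase(Locale.ROOT);
            if (trimmed.isEmpty()) {
                continue;
            }
            List<String> words = new ArrayList<>(Arrays.asList(trimmed.split("\\s+")));
            Collections.sort(words);            // canonical word order inside a clause
            clauses.add("(" + String.join(" ", words) + ")");
        }
        Collections.sort(clauses);              // canonical clause order
        return String.join(" AND ", clauses);
    }
}
```

With this sketch, "winston churchill and world war and germany" becomes (churchill winston) AND (germany) AND (war world), and both "Winston and churchill" and "churchill and Winston" become (churchill) AND (winston). Note that it only fixes the order; it does not address why Lucene scores the two orders differently.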

Re: boolean operators and score

2004-07-08 Thread Brisbart Franck
Niraj Alok wrote: Hi Guys, Finally I have sorted the problem of hits score thanks to the great help of Franck. I have hit another problem with the boolean operators now. When I search for Winston and churchill, I get a set of perfectly acceptable results. But when I change the order, churchill and

Re: indexing help

2004-07-08 Thread Grant Ingersoll
Hi John, The source code is available from CVS, make it non-final and do what you need to do. Of course, you may have a hard time finding help later if you aren't using something everyone else is and your solution doesn't work... :-) If I understand correctly what you are trying to do, you

Re: boolean operators and score

2004-07-08 Thread Don Vaillancourt
What could actually be done is perhaps to sort the search results by document id. Of course, your relevancy will be all shot, but at least you would have control over the sorting order. At 09:05 AM 07/07/2004, you wrote: Hi Guys, Finally I have sorted the problem of hits score thanks to the great

RE: Problem with match on a non tokenized field.

2004-07-08 Thread Polina Litvak
Thanks a lot for your help. I have one more question: How would you handle a query consisting of two fields combined with a Boolean operator, where one field is only indexed and stored (a Keyword) and the other is tokenized, indexed and stored? Is it possible to have parts of the same query

Re: indexing help

2004-07-08 Thread John Wang
Hi Grant: Thanks for the options. How likely is it that the Lucene file formats will change? Are there really no more options? :(... Thanks -John On Thu, 08 Jul 2004 08:50:44 -0400, Grant Ingersoll [EMAIL PROTECTED] wrote: Hi John, The source code is available from CVS, make it non-final

Re: indexing help

2004-07-08 Thread John Wang
Hi Grant: I have something that would extract only the important words from a document along with their importance; furthermore, these important words may not be physically in the document, they could be synonyms of some of the words in the document. So the output of a process for a document is

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Peter M Cipollone
You might try merging the existing index into a new index located on a ram disk. Once it is done, you can move the directory from ram disk back to your hard disk. I think this will work as long as the old index did not finish merging. You might do a strings command on the segments file to make

problem running lucene 1.4 demo on a solaris machine (permission denied)

2004-07-08 Thread MATL (Mats Lindberg)
Hello, I have downloaded Lucene 1.4 to a Windows machine, and it all works fine. When I try to move it to a Solaris machine, I get the following error: /opt/tomcat/common/lib/lucene-1.4-final.jar: cannot execute. If I then try to change the permissions (777) on the above file, I get

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote: So is it possible to fix this index now? Can I just delete the most recent segment that was created? I can find this by ls -alt. Sorry, I forgot to answer your question: this should work fine. I don't think you should even have to delete that segment. Also, to elaborate

Re: problem running lucene 1.4 demo on a solaris machine (permission denied)

2004-07-08 Thread Doug Cutting
MATL (Mats Lindberg) wrote: When I copied the Lucene jar file to the Solaris machine from the Windows machine, I used an FTP program. FTP probably mangled the file. You need to use FTP's binary mode. Doug
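If a text-mode transfer is the suspect, one way to check the copied jar before redeploying is to confirm that it still opens as a zip archive and that every entry decompresses. The sketch below uses only java.util.zip; the class and method names are hypothetical, and "usually fails" in the comment is a hedge, since not every byte-level change is guaranteed to be caught this way.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Path;
import java.util.Enumeration;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;

final class JarIntegrityCheck {
    // Returns true if the file opens as a zip archive and every entry can be read
    // to the end. A jar mangled by an ASCII-mode transfer usually fails here.
    static boolean isReadableJar(Path jar) {
        try (ZipFile zip = new ZipFile(jar.toFile())) {
            byte[] buffer = new byte[8192];
            Enumeration<? extends ZipEntry> entries = zip.entries();
            while (entries.hasMoreElements()) {
                ZipEntry entry = entries.nextElement();
                try (InputStream in = zip.getInputStream(entry)) {
                    while (in.read(buffer) != -1) {
                        // Discard the bytes; only whether decompression succeeds matters.
                    }
                }
            }
            return true;
        } catch (IOException e) {
            // ZipException (a subclass of IOException) signals a damaged archive.
            return false;
        }
    }
}
```

Running this against the copy on the Solaris machine and against the original on Windows should show whether the transfer damaged the file.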

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Peter M Cipollone wrote: You might try merging the existing index into a new index located on a ram disk. Once it is done, you can move the directory from ram disk back to your hard disk. I think this will work as long as the old index did not finish merging. You might do a strings command on

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Kevin A. Burton wrote: Also... what can I do to speed up this optimize? Ideally it wouldn't take 6 hours. Was this the index with the mergeFactor of 5000? If so, that's why it's so slow: you've delayed all of the work until the end. Indexing on a ramfs will make things

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Kevin A. Burton wrote: So is it possible to fix this index now? Can I just delete the most recent segment that was created? I can find this by ls -alt. Sorry, I forgot to answer your question: this should work fine. I don't think you should even have to delete that segment.

Re: Understanding TooManyClauses-Exception and Query-RAM-size

2004-07-08 Thread Kevin A. Burton
[EMAIL PROTECTED] wrote: Hi, a couple of weeks ago we migrated from Lucene 1.2 to 1.4rc3. Everything went smoothly, but we are experiencing some problems with that new constant limit maxClauseCount=1024, which leads to exceptions of type org.apache.lucene.search.BooleanQuery$TooManyClauses
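As I understand it, prefix and wildcard queries are rewritten into a BooleanQuery with one clause per matching index term, so a short prefix over a large vocabulary can exceed the 1024-clause limit. The plain-Java emulation below reproduces that arithmetic without any Lucene classes; all names in it are hypothetical. The limit itself is configurable in Lucene, but check the BooleanQuery javadoc for your exact version before relying on a particular setter.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;

final class PrefixExpansionSketch {
    // Mirrors the default limit mentioned in the thread (maxClauseCount=1024).
    static final int DEFAULT_MAX_CLAUSE_COUNT = 1024;

    // Hypothetical stand-in for BooleanQuery$TooManyClauses.
    static final class TooManyClausesException extends RuntimeException {
        TooManyClausesException(int limit) {
            super("expansion exceeds maxClauseCount=" + limit);
        }
    }

    // Expands a prefix against a sorted term dictionary, producing one "clause"
    // per matching term, and fails once the clause count would exceed the limit.
    static List<String> expandPrefix(SortedSet<String> terms, String prefix, int maxClauseCount) {
        List<String> clauses = new ArrayList<>();
        for (String term : terms.tailSet(prefix)) {
            if (!term.startsWith(prefix)) {
                break; // terms are sorted, so no later term can match
            }
            if (clauses.size() == maxClauseCount) {
                throw new TooManyClausesException(maxClauseCount);
            }
            clauses.add(term);
        }
        return clauses;
    }
}
```

With 1,100 terms starting with "a", expanding the prefix "a" throws, while a narrower prefix that matches only 100 of them succeeds. Raising the limit trades this exception for larger queries in memory, which is the RAM-size concern in the subject line.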

Re: indexing help

2004-07-08 Thread John Wang
Thanks Doug. I will do just that. Just for my education, can you maybe elaborate on the "implement an IndexReader that delivers a synthetic index" approach? Thanks in advance -John On Thu, 08 Jul 2004 10:01:59 -0700, Doug Cutting [EMAIL PROTECTED] wrote: John Wang wrote: The

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote: No... I changed the mergeFactor back to 10 as you suggested. Then I am confused about why it should take so long. Did you by chance set the IndexWriter.infoStream to something, so that it logs merges? If so, it would be interesting to see that output, especially the last

Re: Lucene shouldn't use java.io.tmpdir

2004-07-08 Thread Kevin A. Burton
Otis Gospodnetic wrote: Hey Kevin, Not sure if you're aware of it, but you can specify the lock dir, so in your example, both JVMs could use the exact same lock dir, as long as you invoke the VMs with the same params. Most people won't do this or won't even understand WHY they need to do this
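The point here is that two JVMs only see each other's index locks if they resolve the same lock directory. A sketch of that fallback logic, assuming a lock directory that defaults to java.io.tmpdir unless a system property overrides it; the property name is passed in as a parameter, and the "example.lockdir" name used below is hypothetical, not the one Lucene reads.

```java
import java.io.File;

final class LockDirResolver {
    // Returns the directory named by the given system property if it is set,
    // otherwise falls back to java.io.tmpdir.
    static File resolveLockDir(String lockDirProperty) {
        String explicit = System.getProperty(lockDirProperty);
        if (explicit != null && !explicit.isEmpty()) {
            return new File(explicit);
        }
        return new File(System.getProperty("java.io.tmpdir"));
    }
}
```

For example, with -Dexample.lockdir=/var/lock/myapp on both command lines, both JVMs resolve /var/lock/myapp. Without the override, a container that changes java.io.tmpdir (as Tomcat is said to do later in this thread) and a standalone indexer would fall back to different directories.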

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Kevin A. Burton wrote: No... I changed the mergeFactor back to 10 as you suggested. Then I am confused about why it should take so long. Did you by chance set the IndexWriter.infoStream to something, so that it logs merges? If so, it would be interesting to see that output,

Re: Lucene shouldn't use java.io.tmpdir

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote: This is why I think it makes more sense to use our own java.io.tmpdir to be on the safe side. I think the bug is that Tomcat changes java.io.tmpdir. I thought that the point of the system property java.io.tmpdir was to have a portable name for /tmp on unix,

Re: indexing help

2004-07-08 Thread Doug Cutting
John Wang wrote: Just for my education, can you maybe elaborate on the "implement an IndexReader that delivers a synthetic index" approach? IndexReader is an abstract class. It has few data fields, and few non-static methods that are not implemented in terms of abstract methods. So, in

Re: Lucene shouldn't use java.io.tmpdir

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Kevin A. Burton wrote: This is why I think it makes more sense to use our own java.io.tmpdir to be on the safe side. I think the bug is that Tomcat changes java.io.tmpdir. I thought that the point of the system property java.io.tmpdir was to have a portable name for /tmp on

Where's the search(Query query, Sort sort) method of Searcher

2004-07-08 Thread Bill Tschumy
I'm trying to do a search and sort the results using a Sort object. The 1.4-final API says that Searcher has the following method: Hits search(Query query, Sort sort). However, when I try to use it in the code below: IndexSearcher is = new IndexSearcher(fsDir); Query query =
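The excerpt stops before the error message, so the cause is not known here. One common reason a documented method appears to be missing is an older jar earlier on the classpath. A small reflection probe, using only java.lang.reflect, can show which jar a class was loaded from and whether it exposes a given public method; ClasspathProbe and its methods are hypothetical names.

```java
import java.lang.reflect.Method;
import java.security.CodeSource;

final class ClasspathProbe {
    // Reports where a class was loaded from, or null if no code source is recorded
    // (as is typical for JDK classes loaded by the bootstrap loader).
    static String loadedFrom(String className) throws ClassNotFoundException {
        Class<?> cls = Class.forName(className);
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : String.valueOf(source.getLocation());
    }

    // True if the class declares or inherits a public method with this name and
    // these parameter types (given as fully qualified class names).
    static boolean hasPublicMethod(String className, String methodName, String... paramTypeNames)
            throws ClassNotFoundException {
        Class<?> cls = Class.forName(className);
        Class<?>[] params = new Class<?>[paramTypeNames.length];
        for (int i = 0; i < params.length; i++) {
            params[i] = Class.forName(paramTypeNames[i]);
        }
        try {
            Method m = cls.getMethod(methodName, params);
            return m != null;
        } catch (NoSuchMethodException e) {
            return false;
        }
    }
}
```

Run from the same classpath as the failing code, hasPublicMethod("org.apache.lucene.search.IndexSearcher", "search", "org.apache.lucene.search.Query", "org.apache.lucene.search.Sort") should be true for a 1.4-final jar, since getMethod also finds methods inherited from Searcher; loadedFrom on the same class name shows which jar actually won.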

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Doug Cutting
Kevin A. Burton wrote: During an optimize I assume Lucene starts writing to a new segment and leaves all others in place until everything is done and THEN deletes them? That's correct. The only settings I use are: targetIndex.mergeFactor=10; targetIndex.minMergeDocs=1000; the resulting index has

Re: Way to repair an index broking during 1/2 optimize?

2004-07-08 Thread Kevin A. Burton
Doug Cutting wrote: Something sounds very wrong for there to be that many files. The maximum number of files should be around: (7 + numIndexedFields) * (mergeFactor-1) * (log_base_mergeFactor(numDocs/minMergeDocs)). With 14M documents, log_10(14M/1000) is 4, which gives, for you: (7 +
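To plug numbers into the estimate above, a small helper that evaluates it directly. numIndexedFields is cut off in the excerpt, so the value 3 used below is purely hypothetical; the class name is also illustrative.

```java
final class SegmentFileEstimate {
    // Number of merge levels: log base mergeFactor of (numDocs / minMergeDocs).
    static double mergeLevels(int mergeFactor, long numDocs, int minMergeDocs) {
        return Math.log((double) numDocs / minMergeDocs) / Math.log(mergeFactor);
    }

    // Rough maximum file count from the thread:
    // (7 + numIndexedFields) * (mergeFactor - 1) * log_mergeFactor(numDocs / minMergeDocs)
    static double maxFiles(int numIndexedFields, int mergeFactor, long numDocs, int minMergeDocs) {
        return (7 + numIndexedFields) * (mergeFactor - 1)
                * mergeLevels(mergeFactor, numDocs, minMergeDocs);
    }
}
```

With mergeFactor=10, minMergeDocs=1000 and 14M documents, mergeLevels returns about 4.15 (rounded to 4 above), and with a hypothetical 3 indexed fields maxFiles comes to roughly 373, i.e. on the order of a few hundred files.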

Browse by Letter within a Category

2004-07-08 Thread O'Hare, Thomas
I would like to implement the following functionality: - Search a specific field (category) and limit the search where the title field begins with a given letter, and return the results sorted in alphabetical order by title. Both the category and title fields are tokenized, indexed and stored in
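Stripped of Lucene, the requested behavior is a filter (category match, title starting with a letter) plus an alphabetical sort. The plain-Java sketch below spells that out over an in-memory list; the Doc class and method names are hypothetical. In Lucene terms, presumably a prefix query on a tokenized title field would match titles where any word starts with the letter, not just the first word, and sorting generally needs an untokenized copy of the title, so a separate untokenized title field is likely needed for both.

```java
import java.util.ArrayList;
import java.util.List;

final class BrowseByLetter {
    // Hypothetical record standing in for a stored document.
    static final class Doc {
        final String category;
        final String title;

        Doc(String category, String title) {
            this.category = category;
            this.title = title;
        }
    }

    // Keeps documents in the given category whose title starts with the letter
    // (case-insensitive), then sorts the titles alphabetically, ignoring case.
    static List<String> titlesStartingWith(List<Doc> docs, String category, char letter) {
        char wanted = Character.toLowerCase(letter);
        List<String> titles = new ArrayList<>();
        for (Doc d : docs) {
            if (d.category.equalsIgnoreCase(category)
                    && !d.title.isEmpty()
                    && Character.toLowerCase(d.title.charAt(0)) == wanted) {
                titles.add(d.title);
            }
        }
        titles.sort(String.CASE_INSENSITIVE_ORDER);
        return titles;
    }
}
```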

Re: boolean operators and score

2004-07-08 Thread Niraj Alok
Hi Don, After months of struggling with Lucene and finally achieving the complex relevancy desired, the client would kill me if I now lost all that relevancy. I am trying to do it the way Franck suggested, by sorting the words the user has entered, but otherwise, isn't this a bug of