Lucene Book

2004-09-07 Thread ebrahim . faisal
Hi, I am new to Lucene. Can anyone tell me where I can download a free Lucene book? Thanks. Regards, E. Faisal

Re: Lucene Book

2004-09-07 Thread Erik Hatcher
On Sep 7, 2004, at 3:00 AM, [EMAIL PROTECTED] wrote: I am new to Lucene. Can anyone guide me from where i can download free Lucene book. Free?! http://www.manning.com/hatcher2 is the book Otis and I have spent the last year laboring on. It has been a long hard effort that is about to come to

Re: Lucene Book

2004-09-07 Thread Terry Steichen
Jeez, Erik! Where's your sense of public spirit ;-) Terry PS: Glad to hear you're (finally!) nearing publication. - Original Message - From: Erik Hatcher To: Lucene Users List Sent: Tuesday, September 07, 2004 6:43 AM Subject: Re: Lucene Book On Sep 7, 2004, at 3:00

Re: Lucene Book

2004-09-07 Thread Otis Gospodnetic
Hello Ebrahim, Like Erik said, the book about Lucene is coming soon. Although it won't be free, Erik, I, and a few other people have already shared some of our knowledge in several articles about Lucene. There is a page on the Lucene Wiki that has links to all known Lucene articles. I

Use of + and - in queries

2004-09-07 Thread Bill Tschumy
I don't understand the difference in using + and - in queries compared to using AND and NOT. Even the Query Syntax document seems a bit confused. In the section on the NOT operator it says: To search for documents that contain jakarta apache but not jakarta lucene use the query: jakarta

RE: Spam:too many open files

2004-09-07 Thread wallen
I sent an email to this list a few weeks ago about how to fix a corrupt index. I edited the segments file with a hex editor, removing the entry for the missing file and decrementing the file count stored near the beginning of the segments file.

RE: Spam:too many open files

2004-09-07 Thread wallen
A note to developers: the code checked into Lucene CVS around Aug 15th, post 1.4.1, was causing frequent index corruptions. Since reverting to version 1.4, I am no longer getting the corruptions. I was unable to trace the problem to anything specific, but was using the newer code to take advantage

Re: telling one version of the index from another?

2004-09-07 Thread Doug Cutting
Bill Janssen wrote: Hi. Hey, Bill. It's been a long time! I've got a Lucene application that's been in use for about two years. Some users are using Lucene 1.2, some 1.3, and some are moving to 1.4. The indices seem to behave differently under each version. I'd like to add code to my application

Re: Possible to remove duplicate documents in sort API?

2004-09-07 Thread Doug Cutting
Kevin A. Burton wrote: My problem is that I have two machines... one for searching, one for indexing. The searcher has an existing index. The indexer found an UPDATED document and then adds it to a new index and pushes that new index over to the searcher. The searcher then reloads and when
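Doug's reply is cut off here, so the following is not necessarily what he proposed. One generic workaround for the situation Kevin describes is to collapse hits that share a stored unique key, keeping the copy that came from the newer index. The sketch assumes each document stores such a key and that every hit can be tagged with a generation number for the index it came from; `Hit`, `generation` and `dedupe` are hypothetical names, not Lucene API.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class DedupHits {
    // Hypothetical hit: a stored unique key, its score, and a generation number
    // identifying which index (older or newer) the hit came from.
    static final class Hit {
        final String id;
        final float score;
        final int generation;

        Hit(String id, float score, int generation) {
            this.id = id;
            this.score = score;
            this.generation = generation;
        }
    }

    /** Keeps one hit per id (the one from the newest generation), best score first. */
    static List<Hit> dedupe(List<Hit> hits) {
        Map<String, Hit> newest = new LinkedHashMap<>();
        for (Hit h : hits) {
            Hit prev = newest.get(h.id);
            if (prev == null || h.generation > prev.generation) {
                newest.put(h.id, h);   // a newer copy replaces the stale one
            }
        }
        List<Hit> result = new ArrayList<>(newest.values());
        result.sort((a, b) -> Float.compare(b.score, a.score));
        return result;
    }

    static List<Hit> sampleHits() {
        List<Hit> hits = new ArrayList<>();
        hits.add(new Hit("doc-42", 0.9f, 1));   // stale copy in the old index
        hits.add(new Hit("doc-42", 0.7f, 2));   // updated copy in the new index
        hits.add(new Hit("doc-7", 0.8f, 2));
        return hits;
    }

    public static void main(String[] args) {
        for (Hit h : dedupe(sampleHits())) {
            System.out.println(h.id + " gen=" + h.generation + " score=" + h.score);
        }
    }
}
```

Collapsing after the search means hit counts and paging still see the duplicates, so removing the stale copy before the updated index is published may be the cleaner fix where the indexing setup allows it.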

Re: Why doesn't Document use a HashSet instead of a LinkedList (DocumentFieldList)

2004-09-07 Thread Doug Cutting
Kevin A. Burton wrote: It looks like Document.java uses its own implementation of a LinkedList. Why not use a HashMap to enable O(1) lookup? Right now field lookup is O(N), which is certainly no fun. Was this benchmarked? Perhaps there's the assumption that since documents often have few
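The trade-off Kevin raises can be sketched outside Lucene: a linear scan over a field list versus a name-to-values map built once per document. `Field`, `getLinear`, `indexByName` and `getHashed` are hypothetical names for illustration only; this is not Lucene's `Document` or `DocumentFieldList` code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FieldLookup {
    // Hypothetical minimal field: a name/value pair (not Lucene's Field class).
    static final class Field {
        final String name;
        final String value;

        Field(String name, String value) {
            this.name = name;
            this.value = value;
        }
    }

    // Linear scan, as when walking a linked list: O(n) in the number of fields.
    static String getLinear(List<Field> fields, String name) {
        for (Field f : fields) {
            if (f.name.equals(name)) return f.value;
        }
        return null;
    }

    // Hash index built once per document (O(n)), then O(1) expected per lookup.
    // Values are kept in a list because a document may hold several fields
    // with the same name.
    static Map<String, List<String>> indexByName(List<Field> fields) {
        Map<String, List<String>> byName = new LinkedHashMap<>();
        for (Field f : fields) {
            byName.computeIfAbsent(f.name, k -> new ArrayList<>()).add(f.value);
        }
        return byName;
    }

    // Returns the first value added under this name, or null if absent.
    static String getHashed(Map<String, List<String>> byName, String name) {
        List<String> values = byName.get(name);
        return values == null ? null : values.get(0);
    }

    static List<Field> sampleFields() {
        List<Field> fields = new ArrayList<>();
        fields.add(new Field("title", "Lucene Book"));
        fields.add(new Field("author", "Hatcher"));
        fields.add(new Field("author", "Gospodnetic"));
        return fields;
    }

    public static void main(String[] args) {
        List<Field> fields = sampleFields();
        Map<String, List<String>> byName = indexByName(fields);
        System.out.println(getLinear(fields, "author"));
        System.out.println(getHashed(byName, "author"));
        System.out.println(byName.get("author"));
    }
}
```

Because building the map costs O(n) per document, it only pays off when a document has many fields or is queried many times; with the handful of fields a typical document carries, a plain list scan may be just as fast, which is the point a benchmark would settle.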

Re: Spam:too many open files

2004-09-07 Thread Daniel Naber
On Tuesday 07 September 2004 17:41, [EMAIL PROTECTED] wrote: A note to developers: the code checked into Lucene CVS around Aug 15th, post 1.4.1, was causing frequent index corruptions. Since reverting to version 1.4, I am no longer getting the corruptions. Here are some changes from around that

getting most common terms for a smaller set of documents

2004-09-07 Thread wallen
Dear Lucene Users: What is the best way to get the most common terms for a subset of the total documents in your index? I know how to get the most common terms for a field for the entire index, but what is the most efficient way to do this for a subset of documents? Here is the code I am using
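The message is cut off before the code, so here is one possible approach, sketched under stated assumptions: given the terms of each document (for instance from term vectors via `IndexReader.getTermFreqVector`, assuming the field was indexed with term vectors) and a `BitSet` marking the subset, count in how many subset documents each term appears and keep the top n. `termsByDoc` and `topTerms` are hypothetical names; counting once per document mirrors a per-subset document frequency rather than raw occurrence counts.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

public class SubsetTermCounts {
    /** Returns the n terms that occur in the most documents of the subset.
     *  Each term is counted at most once per document. */
    static List<Map.Entry<String, Integer>> topTerms(Map<Integer, List<String>> termsByDoc,
                                                     BitSet subset, int n) {
        Map<String, Integer> docFreq = new HashMap<>();
        for (int doc = subset.nextSetBit(0); doc >= 0; doc = subset.nextSetBit(doc + 1)) {
            List<String> terms = termsByDoc.get(doc);
            if (terms == null) continue;                // no terms stored for this doc
            for (String t : new HashSet<>(terms)) {     // de-duplicate within the doc
                docFreq.merge(t, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(docFreq.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());   // most frequent first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey()); // alphabetical tie-break
        });
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    static List<Map.Entry<String, Integer>> example() {
        Map<Integer, List<String>> termsByDoc = new HashMap<>();
        termsByDoc.put(0, Arrays.asList("lucene", "index", "lucene"));
        termsByDoc.put(1, Arrays.asList("lucene", "search"));
        termsByDoc.put(2, Arrays.asList("tomcat", "jsp"));   // outside the subset
        BitSet subset = new BitSet();
        subset.set(0);
        subset.set(1);
        return topTerms(termsByDoc, subset, 2);
    }

    public static void main(String[] args) {
        System.out.println(example());   // [lucene=2, index=1]
    }
}
```

Walking only the set bits keeps the cost proportional to the subset size rather than to the whole index, which is the main difference from enumerating every term in the index.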

Re: telling one version of the index from another?

2004-09-07 Thread Bill Janssen
Thanks, Doug, much as I'd figured from looking at the code. Here's a follow-up question: Is there any programmatic way to tell which version of the Lucene code a program is using? A version number or string would be great (perhaps an idea for the next release), but a list of classes in one
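Both of Bill's ideas can be tried with plain Java reflection, sketched here as an assumption rather than a Lucene feature: read the `Implementation-Version` that a jar's manifest may record for a class's package, and fall back to checking whether particular classes are present. Whether the Lucene 1.2 to 1.4 jars actually set that manifest entry is unverified; if they don't, `versionOf` returns "unknown". The class names to probe with `hasClass` would have to come from the release notes and are left to the caller.

```java
public class JarVersion {
    /** Implementation-Version from the manifest of the jar that loaded cls,
     *  or "unknown" if none is recorded. */
    static String versionOf(Class<?> cls) {
        Package pkg = cls.getPackage();
        String v = (pkg == null) ? null : pkg.getImplementationVersion();
        return (v == null) ? "unknown" : v;
    }

    /** True if the named class can be loaded (without initializing it). */
    static boolean hasClass(String className) {
        try {
            Class.forName(className, false, JarVersion.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // Hypothetical usage: pass a class from the Lucene jar on the classpath,
        // e.g. org.apache.lucene.index.IndexReader.
        String name = args.length > 0 ? args[0] : "java.lang.String";
        System.out.println(name + " -> " + versionOf(Class.forName(name)));
    }
}
```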

lucene locks index, tomcat has to stop and restart

2004-09-07 Thread hui liu
Hi all, I have run into the following problem with the Lucene demo: each time I create a Lucene index, I have to stop Tomcat first and restart it after the index is created. The reason is that the index is locked when the IndexReader.open(index) method is used in the JSP file. So, I tried to modify the JSP code

Moving from a single server to a cluster

2004-09-07 Thread Ben Sinclair
My application currently uses Lucene with an index living on the filesystem, and it works fine. I'm moving to a clustered environment soon and need to figure out how to keep my indexes together. Since the index is on the filesystem, each machine in the cluster will end up with a different index.

lucene index parser problem

2004-09-07 Thread hui liu
Hi, I have the following problem when creating a Lucene index for many HTML files: it reports "aborted, expectedtagnametagend" for those HTML files that contain JavaScript. It seems the parser cannot handle the tags \. Does anyone have a solution? Thank you very much! Ivy.

lucene locks index, tomcat has to stop and restart

2004-09-07 Thread hui liu
Hi, I have run into the following problem with the Lucene demo: each time I create a Lucene index, I have to stop Tomcat first and restart it after the index is created. The reason is that the index is locked when the IndexReader.open(index) method is used in the JSP file. So, I tried to modify the JSP code by

Re: lucene index parser problem

2004-09-07 Thread Patrick Burleson
Why oh why did you send this to the tomcat lists? Don't cross post! Especially when the question doesn't even apply to one of the lists. Patrick On Tue, 7 Sep 2004 16:35:35 -0400, hui liu [EMAIL PROTECTED] wrote: Hi, I have such a problem when creating lucene index for many html files:

Re: lucene locks index, tomcat has to stop and restart

2004-09-07 Thread Patrick Burleson
This isn't a Tomcat-specific problem; it sounds like a problem with how the reader is being used. Somewhere in the JSP, an IndexReader variable was probably assigned with a line something like: IndexReader ir = IndexReader.open(somepath); To close the reader, and thus solve the problem,

Re: lucene locks index, tomcat has to stop and restart

2004-09-07 Thread Patrick Burleson
Ah, I see your problem. From the Lucene Javadocs on IndexSearcher.close(): Note that the underlying IndexReader is not closed, if IndexSearcher was constructed with IndexSearcher(IndexReader r). If the IndexReader was supplied implicitly by specifying a directory, then the IndexReader gets
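Patrick's two replies come down to ownership: whoever opens a reader must close it, and per the quoted Javadoc, `IndexSearcher.close()` only closes a reader the searcher opened itself. The sketch below models that rule with hypothetical stand-in classes (`FakeIndexReader`, `FakeIndexSearcher`) so it runs without the Lucene jar; the comments mark where `IndexReader.open(path)` and `new IndexSearcher(reader)` would sit in real code.

```java
import java.io.Closeable;

public class ReaderOwnership {
    // Hypothetical stand-in for IndexReader: "open" models the held index lock.
    static final class FakeIndexReader implements Closeable {
        boolean open = true;

        @Override public void close() { open = false; }
    }

    // Hypothetical stand-in for IndexSearcher: close() closes the reader
    // only if the searcher opened that reader itself.
    static final class FakeIndexSearcher implements Closeable {
        final FakeIndexReader reader;
        private final boolean ownsReader;

        FakeIndexSearcher(String path) {             // path is ignored in this stand-in
            this.reader = new FakeIndexReader();
            this.ownsReader = true;
        }

        FakeIndexSearcher(FakeIndexReader reader) {  // caller keeps ownership
            this.reader = reader;
            this.ownsReader = false;
        }

        @Override public void close() {
            if (ownsReader) reader.close();
        }
    }

    /** Opens a reader explicitly, searches, and releases both in finally blocks. */
    static FakeIndexReader searchOnce() {
        FakeIndexReader reader = new FakeIndexReader();   // cf. IndexReader.open(path)
        try {
            FakeIndexSearcher searcher = new FakeIndexSearcher(reader); // cf. new IndexSearcher(reader)
            try {
                // ... run the query and read the hits here ...
            } finally {
                searcher.close();   // leaves the caller-supplied reader open
            }
        } finally {
            reader.close();         // so the caller must close it explicitly
        }
        return reader;
    }

    /** Whether the reader is still open after closing only the searcher. */
    static boolean readerOpenAfterSearcherClose(boolean supplyReader) {
        FakeIndexSearcher s = supplyReader
                ? new FakeIndexSearcher(new FakeIndexReader())
                : new FakeIndexSearcher("index");
        s.close();
        return s.reader.open;
    }

    public static void main(String[] args) {
        System.out.println("reader open after searchOnce: " + searchOnce().open);
    }
}
```

Closing in `finally` blocks ensures the reader is released even if the query throws, which is what keeps a long-running container from holding the index indefinitely.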

Re: Use of + and - in queries

2004-09-07 Thread Otis Gospodnetic
Hi Bill, No difference. Lucene's query syntax recognizes both 'NOT' and '-' and uses them the same way: to exclude certain documents from search results. Otis --- Bill Tschumy [EMAIL PROTECTED] wrote: I don't understand the difference in using + and - in queries compared to
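Otis's point that `-` and `NOT` behave the same can be seen in a simplified model of how parsed clauses combine: every required (`+`) clause must match, no prohibited (`-` or `NOT`) clause may match, and if there is no required clause, at least one optional clause must match. This is a toy model for illustration, assuming the parser's default OR operator; the `Occur` enum and `matches` method are hypothetical, single terms stand in for the phrases in the Query Syntax example, and scoring is ignored.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class BooleanClauseModel {
    // REQUIRED ~ "+term", PROHIBITED ~ "-term" or "NOT term", OPTIONAL ~ bare "term".
    enum Occur { REQUIRED, OPTIONAL, PROHIBITED }

    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    static Clause clause(String term, Occur occur) { return new Clause(term, occur); }

    // A document is modelled simply as the set of terms it contains.
    static Set<String> doc(String... terms) { return new HashSet<>(Arrays.asList(terms)); }

    /** True if the document satisfies the clauses under the rules above. */
    static boolean matches(Set<String> doc, List<Clause> clauses) {
        boolean hasRequired = false;
        boolean optionalHit = false;
        for (Clause c : clauses) {
            boolean hit = doc.contains(c.term);
            switch (c.occur) {
                case PROHIBITED:
                    if (hit) return false;        // any prohibited term excludes the doc
                    break;
                case REQUIRED:
                    hasRequired = true;
                    if (!hit) return false;       // every required term must be present
                    break;
                default:
                    if (hit) optionalHit = true;  // optional terms only widen the match
            }
        }
        // A query made only of prohibited clauses therefore matches nothing.
        return hasRequired || optionalHit;
    }

    public static void main(String[] args) {
        List<Clause> plusMinus = Arrays.asList(          // "+jakarta -lucene"
                clause("jakarta", Occur.REQUIRED), clause("lucene", Occur.PROHIBITED));
        List<Clause> orNot = Arrays.asList(              // "jakarta NOT lucene"
                clause("jakarta", Occur.OPTIONAL), clause("lucene", Occur.PROHIBITED));
        System.out.println(matches(doc("jakarta", "apache"), plusMinus));  // true
        System.out.println(matches(doc("jakarta", "lucene"), plusMinus));  // false
        System.out.println(matches(doc("jakarta", "apache"), orNot));      // true
    }
}
```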

Re: Moving from a single server to a cluster

2004-09-07 Thread Otis Gospodnetic
I've used scp and rsync successfully in the past. Lucene now includes a remote searcher (RMI stuff), so you may want to consider a single index, too. Otis --- Ben Sinclair [EMAIL PROTECTED] wrote: My application currently uses Lucene with an index living on the filesystem, and it works fine.

RE: too many open files

2004-09-07 Thread Will Allen
I suspect it has to do with this change: --- jakarta-lucene/src/java/org/apache/lucene/index/SegmentMerger.java 2004/08/08 13:03:59 1.12 +++ jakarta-lucene/src/java/org/apache/lucene/index/SegmentMerger.java 2004/08/11 17:37:52 1.13 I wouldn't know where to start to reproduce the

Re: Spam:too many open files

2004-09-07 Thread Dmitry Serebrennikov
Hi Wallen, Actually, the files Daniel listed were modified on 8/11 and then again on 8/15. Between 8/11 and 8/15, I believe there could have been any number of problems, including corrupt indexes and poor multithreaded performance. However, I think after 8/15, the files should be in

MultiFieldQueryParser seems broken... Fix attached.

2004-09-07 Thread Bill Janssen
Hi! I'm using Lucene for an application which has lots of fields/document, in which the users can specify in their config files what fields they wish to be included by default in a search. I'd been happily using MultiFieldQueryParser to do the searches, but the darn users started demanding more

RE: Spam:too many open files

2004-09-07 Thread Will Allen
I will deploy and test through the end of the week and report back Friday if the problem persists. Thank you! -Original Message- From: Dmitry Serebrennikov [mailto:[EMAIL PROTECTED] Sent: Tuesday, September 07, 2004 8:40 PM To: Lucene Users List Subject: Re: Spam:too many open files

Use of explain() vs search()

2004-09-07 Thread Minh Kama Yie
Hi all, I was wondering if anyone could tell me what the expected behaviour is for calling an explain() without calling a search() first on a particular query. Would it effectively do a search and then I can examine the Explanation in order to check whether it matches? I'm currently looking at

Re: Use of explain() vs search()

2004-09-07 Thread Minh Kama Yie
Hi all, Sorry I should clarify my last point. The search() would return no hits, but the explain() using the apparently invalid docId returns a value greater than 0. For what it's worth it's performing a PhraseQuery. Thanks in advance, Minh Minh Kama Yie wrote: Hi all, I was wondering if anyone

pdf in Chinese

2004-09-07 Thread [EMAIL PROTECTED]
Hi all, I use PDFBox to parse PDF files into Lucene documents. When I parse Chinese PDF files, PDFBox does not always succeed. Does anyone have any advice?