RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi guys, apologies. I am NOT using the sorting code hits = multiSearcher.search(query, new Sort(new SortField(filename, SortField.STRING))); I am using multiSearcher.search(query) in the core files setup and am still getting the error. More advice required. Karthik

RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread iouli . golovatyi
The exception "too many files open" means: - the searcher object is not closed after query execution - too few file handles are available Regards J. Karthik N S
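The first cause above can be sketched with a stdlib-only Java stand-in (StubSearcher is a hypothetical placeholder for Lucene's IndexSearcher, which likewise holds OS file handles until it is closed; the point is the close-in-finally pattern, not the Lucene API itself):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.concurrent.atomic.AtomicInteger;

// Stand-in for an IndexSearcher: counts how many "file handles" stay open.
class StubSearcher implements Closeable {
    static final AtomicInteger openHandles = new AtomicInteger();
    StubSearcher() { openHandles.incrementAndGet(); }
    String search(String query) { return "hits for " + query; }
    @Override public void close() { openHandles.decrementAndGet(); }
}

class SearcherLeakDemo {
    static String searchAndClose(String query) throws IOException {
        StubSearcher searcher = new StubSearcher();
        try {
            return searcher.search(query);   // use the searcher
        } finally {
            searcher.close();                // always release handles, even on error
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(searchAndClose("foo"));
        System.out.println("open handles: " + StubSearcher.openHandles.get());
    }
}
```

Without the finally block, every query that throws would leak a handle until the process hits the OS limit and fails with "too many open files".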

Re: Locking issue

2004-11-10 Thread Erik Hatcher
On Nov 10, 2004, at 2:17 AM, [EMAIL PROTECTED] wrote: Otis or Erik, do you know if a Reader continuously opening should cause the Writer to fail with a "Lock obtain timed out" error? No need to address individuals here. With the information provided, I have no idea what the issue may be. There

RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi guys, apologies. That's why somebody on the forum asked me to switch to: 40 merged indexes [1000 subindexes each] + MultiSearcher / ParallelSearcher + search on the content field only. The problem of too many files open was solved since now there were only 40

Re: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Erik Hatcher
On Nov 10, 2004, at 1:55 AM, Karthik N S wrote: Hi Guys Apologies.. No need to apologize for asking questions. History, 1st type: 4 subindexes + MultiSearcher + Search on Content Field You've got 40,000 indexes aggregated under a MultiSearcher and you're wondering why you're

RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Rupinder Singh Mazara
Hi all, I had a similar problem with JDK 1.4.1; Doug had sent me a patch, which I am attaching. Following is the mail from Doug: It sounds like the ThreadLocal in TermInfosReader is not getting correctly garbage collected when the TermInfosReader is collected. Researching a bit, this was a bug in

stopword AND validword throws exception

2004-11-10 Thread Sanyi
Hi! I've left out custom stopwords from my index using StopAnalyzer(customstopwords). Now, when I try to search the index the same way (StopAnalyzer(customstopwords)), it seems to act strange: This query works as expected: validword AND stopword (throws out the stopword part and searches
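For illustration, here is a stdlib-only sketch (not Lucene's actual parser code) of the filtering step under discussion: dropping stop words from a clause list has to work even when the very first clause is the stop word, which is the shape of the failing query reported in this thread:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Drops stop words from a list of query terms, regardless of position.
class StopWordDemo {
    static List<String> stripStopWords(List<String> terms, Set<String> stopWords) {
        List<String> kept = new ArrayList<String>();
        for (String t : terms) {
            if (!stopWords.contains(t)) {
                kept.add(t);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        Set<String> stop = new HashSet<String>(Arrays.asList("stopword"));
        // "validword AND stopword" -> keeps [validword]  (worked in 1.4.1)
        System.out.println(stripStopWords(Arrays.asList("validword", "stopword"), stop));
        // "stopword AND validword" -> keeps [validword]  (the order that crashed)
        System.out.println(stripStopWords(Arrays.asList("stopword", "validword"), stop));
    }
}
```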

Re: Searching in keyword field ?

2004-11-10 Thread Thierry Ferrero
Thanks Justin, it works fine - Original Message - From: Justin Swanhart [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Tuesday, November 09, 2004 7:41 PM Subject: Re: Searching in keyword field ? You can add the category keyword multiple times to a document.

Re: stopword AND validword throws exception

2004-11-10 Thread Daniel Naber
On Wednesday 10 November 2004 10:46, Sanyi wrote: This query seems to crash: stopword AND validword (java.lang.ArrayIndexOutOfBoundsException: -1) I think this has been fixed in the development version (which will become Lucene 1.9). Regards Daniel -- http://www.danielnaber.de

Re: stopword AND validword throws exception

2004-11-10 Thread Morus Walter
Sanyi writes: This query works as expected: validword AND stopword (throws out the stopword part and searches for validword) This query seems to crash: stopword AND validword (java.lang.ArrayIndexOutOfBoundsException: -1) Maybe it can't handle the case where it has to remove the very

Re: stopword AND validword throws exception

2004-11-10 Thread Sanyi
Thanks for your replies, guys. Now I am trying to locate the latest patch for this problem, and the last thread I've read about it is: http://issues.apache.org/bugzilla/show_bug.cgi?id=25820 It ends with an open question from Morus: If you want me to change the patch, let me know. That

RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi guys, apologies.. Yes Erik, the day I switched from Lucene 1.3.1 to Lucene 1.4.1. We are using the compound file format via writer.setUseCompoundFile(true); Some more advice please. Thx in advance -Original Message- From: Erik Hatcher [mailto:[EMAIL PROTECTED]

RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Karthik N S
Hi Rupinder Singh Mazara, apologies. Can you paste the code into the mail instead of an attachment... [because I am not able to get the attachment on the company's mail] Thx in advance Karthik -Original Message- From: Rupinder Singh Mazara [mailto:[EMAIL PROTECTED]

RE: Lucene1.4.1 + OutOf Memory

2004-11-10 Thread Rupinder Singh Mazara
Karthik, I think the core problem in your case is the use of compound files; it would be best to switch it off, or alternatively issue an optimize as soon as the indexing is over. I am copying the file contents between file tags; the patch is to be applied to TermInfosReader.java. This was done

Re: stopword AND validword throws exception

2004-11-10 Thread Morus Walter
Sanyi writes: Thanx for your replies guys. Now, I was trying to locate the latest patch for this problem group, and the last thread I've read about this is: http://issues.apache.org/bugzilla/show_bug.cgi?id=25820 It ends with an open question from Morus: If you want me to change the

Re: stopword AND validword throws exception

2004-11-10 Thread Sanyi
But the fix seems to be included in 1.4.2, see http://cvs.apache.org/viewcvs.cgi/*checkout*/jakarta-lucene/CHANGES.txt?rev=1.96.2.4 item 5. Thank you! I'm just downloading 1.4.2. I hope it'll work ;) Sanyi

Re: Filters for Openoffice File Indexing available (Java)

2004-11-10 Thread Daniel Naber
On Monday 08 November 2004 11:30, Joachim Arrasz wrote: So now we are looking for search and index filters for Lucene that are able to integrate our OpenOffice files into the search results too. I don't know of any existing solutions, but it's not so difficult to write one: Extract the ZIP file
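Daniel's suggestion needs nothing beyond Java's built-in ZIP support. A minimal sketch, using a tiny stand-in archive built in memory (real OpenOffice files contain more entries, and the XML shown is simplified):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// An OpenOffice file is a ZIP archive; the body text lives in content.xml.
class OooExtractDemo {
    // Builds a stand-in .sxw archive in memory for demonstration.
    static byte[] sampleZip() throws Exception {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        ZipOutputStream zout = new ZipOutputStream(bos);
        zout.putNextEntry(new ZipEntry("content.xml"));
        zout.write("<text:p>hello</text:p>".getBytes("UTF-8"));
        zout.closeEntry();
        zout.close();
        return bos.toByteArray();
    }

    // Returns the named entry's contents as a UTF-8 string, or null if absent.
    static String readEntry(byte[] zipBytes, String name) throws Exception {
        ZipInputStream zin = new ZipInputStream(new ByteArrayInputStream(zipBytes));
        for (ZipEntry e = zin.getNextEntry(); e != null; e = zin.getNextEntry()) {
            if (e.getName().equals(name)) {
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                int n;
                while ((n = zin.read(buf)) != -1) out.write(buf, 0, n);
                return out.toString("UTF-8");
            }
        }
        return null;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(readEntry(sampleZip(), "content.xml"));
    }
}
```

The extracted content.xml (and meta.xml) would then be fed to an XML parser and the text indexed as Lucene fields.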

Re: Filters for Openoffice File Indexing available (Java)

2004-11-10 Thread Joachim Arrasz
Hi Daniel, I don't know of any existing solutions, but it's not so difficult to write one: Extract the ZIP file using Java's built-in ZIP classes and parse content.xml and meta.xml. I'm not sure if whitespace issues might become tricky, e.g. two paragraphs could be in the file as <p>one</p><p>two</p>,

Re: Indexing within an XML document

2004-11-10 Thread Otis Gospodnetic
Redirecting to lucene-user, which is more appropriate. I'm not sure what exactly the question is here, but: Parse your XML document and for each p element you encounter create a new Document instance, and then populate its fields with some data, like the URI data you mentioned. If you parse with
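A sketch of the parsing half of Otis's suggestion, using only the JDK's DOM parser (the element name p and the sample XML are illustrative; real documents may use different tags). Each returned string would then populate a field of its own Lucene Document:

```java
import java.io.ByteArrayInputStream;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NodeList;

// Collects the text of every <p> element; one Lucene Document per entry.
class PerParagraphDemo {
    static List<String> paragraphTexts(String xml) throws Exception {
        org.w3c.dom.Document dom = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes("UTF-8")));
        NodeList ps = dom.getElementsByTagName("p");
        List<String> texts = new ArrayList<String>();
        for (int i = 0; i < ps.getLength(); i++) {
            texts.add(ps.item(i).getTextContent());
        }
        return texts;
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>first paragraph</p><p>second paragraph</p></doc>";
        System.out.println(paragraphTexts(xml));
    }
}
```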

Re: Filters for Openoffice File Indexing available (Java)

2004-11-10 Thread Daniel Naber
On Wednesday 10 November 2004 15:18, Joachim Arrasz wrote: Why should I parse meta.xml? I thought content.xml should be enough. It contains the file's title, keywords, author, etc. (those are not in content.xml). Regards Daniel -- http://www.danielnaber.de

Indexing MS Files

2004-11-10 Thread Luke Shannon
I need to index Word, Excel and Power Point files. Is this the place to start? http://jakarta.apache.org/poi/ Is there something better? Thanks, Luke

Re: Indexing MS Files

2004-11-10 Thread Otis Gospodnetic
That's one place to start. The other one would be textmining.org, at least for Word files. I used both POI and Textmining API in Lucene in Action, and the latter was much simpler to use. You can also find some comments about both libs in lucene-user archives. People tend to like Textmining API

Re: Indexing MS Files

2004-11-10 Thread Luke Shannon
Thanks Otis. I am looking forward to this book. Any idea when it may be released? - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, November 10, 2004 11:54 AM Subject: Re: Indexing MS Files That's one place to start.

Merging multiple indexes

2004-11-10 Thread Ravi
What's the simplest way to merge 2 or more indexes into one large index? Thanks in advance, Ravi.

Re: Locking issue

2004-11-10 Thread yahootintin . 1247688
No need to address individuals here. Sorry about that. I just respect the knowledge that you and Otis have about Lucene so that's why I was asking you specifically. With the information provided, I have no idea what the issue may be. Running the small sample file that is attached to the

Re: Indexing MS Files

2004-11-10 Thread Otis Gospodnetic
As Manning Publications said, you should be able to get it for your grandma this Christmas. Otis --- Luke Shannon [EMAIL PROTECTED] wrote: Thanks Otis. I am looking forward to this book. Any idea when it may be released? - Original Message - From: Otis Gospodnetic [EMAIL

Re: Merging multiple indexes

2004-11-10 Thread Otis Gospodnetic
Use IndexWriter's addIndexes(Directory[]) call. Otis --- Ravi [EMAIL PROTECTED] wrote: What's the simplest way to merge 2 or more indexes into one large index? Thanks in advance, Ravi.
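A sketch of that call against the Lucene 1.4-era API (the index paths are hypothetical, and the Lucene jar must be on the classpath):

```java
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.index.IndexWriter;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;

class MergeIndexes {
    public static void main(String[] args) throws Exception {
        // Open the existing indexes read-only (create=false).
        Directory[] sources = {
            FSDirectory.getDirectory("index1", false),
            FSDirectory.getDirectory("index2", false)
        };
        // Create the target index (create=true) and merge everything in.
        IndexWriter writer = new IndexWriter("merged", new StandardAnalyzer(), true);
        try {
            writer.addIndexes(sources); // merges and optimizes into "merged"
        } finally {
            writer.close();
        }
    }
}
```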

Re: Indexing MS Files

2004-11-10 Thread Thierry Ferrero
I used the OpenOffice API to convert all Word and Excel versions. For me it's the solution for complex Word and Excel documents. http://api.openoffice.org/ Good luck! // UNO API import com.sun.star.bridge.XUnoUrlResolver; import com.sun.star.uno.XComponentContext; import com.sun.star.uno.UnoRuntime;

Re: Indexing MS Files

2004-11-10 Thread Luke Shannon
Thanks. Grandmas around the world will certainly be surprised this Christmas. - Original Message - From: Otis Gospodnetic [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, November 10, 2004 12:18 PM Subject: Re: Indexing MS Files As Manning publications said,

Academic Question About Indexing

2004-11-10 Thread Luke Shannon
I am working on debugging an existing Lucene implementation. Before I started, I built a demo to understand Lucene. In my demo I indexed the entire content hierarchy all at once, then optimized this index and used it for queries. It was time consuming but very simple. The code I am currently

Re: Indexing MS Files

2004-11-10 Thread Luke Shannon
This looks great. Thank you Thierry! - Original Message - From: Thierry Ferrero [EMAIL PROTECTED] To: Lucene Users List [EMAIL PROTECTED] Sent: Wednesday, November 10, 2004 12:23 PM Subject: Re: Indexing MS Files I used OpenOffice API to convert all Word and Excel version. For me

Re: Academic Question About Indexing

2004-11-10 Thread Otis Gospodnetic
Uh, I hate to market it, but it's in the book. But you don't have to wait for it, as there already is a Lucene demo that does what you described. I am not sure if the demo always recreates the index or whether it deletes and re-adds only the new and modified files, but if it's the former,

Re: Academic Question About Indexing

2004-11-10 Thread Luke Shannon
Don't worry, regardless of what I learn in this forum I am telling my company to get me a copy of that bad boy when it comes out (which as far as I am concerned can't be soon enough). I will pay for grandma's myself. I think I have reviewed the code you are referring to and have something similar

Re: Locking issue

2004-11-10 Thread yahootintin . 1247688
Hi, With the information provided, I have no idea what the issue may be. Is there some information that I should post that will help determine why Lucene is giving me this error? Thanks. --- Lucene Users List [EMAIL PROTECTED] wrote: On Nov 10, 2004, at 2:17 AM, [EMAIL

Re: Locking issue

2004-11-10 Thread Erik Hatcher
On Nov 10, 2004, at 5:48 PM, [EMAIL PROTECTED] wrote: Hi, With the information provided, I have no idea what the issue may be. Is there some information that I should post that will help determine why Lucene is giving me this error? You mentioned posting code - though I don't recall getting an

RE: Academic Question About Indexing

2004-11-10 Thread Will Allen
I have an application that I run monthly that indexes 40 million documents into 6 indexes, then uses a multisearcher. The advantage for me is that I can have multiple writers indexing 1/6 of that total data reducing the time it takes to index by about 5X. -Original Message- From: Luke

Re: Locking issue

2004-11-10 Thread yahootintin-lucene
Whoops! Looks like my attachment didn't make it through. I'm re-attaching my simple test app. Thanks. --- Erik Hatcher [EMAIL PROTECTED] wrote: On Nov 10, 2004, at 5:48 PM, [EMAIL PROTECTED] wrote: Hi, With the information provided, I have no idea what the issue may be. Is

Re: Locking issue

2004-11-10 Thread yahootintin . 1247688
I added it to Bugzilla like you suggested: http://issues.apache.org/bugzilla/show_bug.cgi?id=32171 Let me know if you see any way to get around this issue. --- Lucene Users List [EMAIL PROTECTED] wrote: Whoops! Looks like my attachment didn't make it through. I'm re-attaching my simple

Re: Locking issue

2004-11-10 Thread Erik Hatcher
I just ran the code you provided. On my puny PowerBook (Mac OS X 10.3.5) it dies in much less than 5 minutes. I do not know what the issue is, but certainly the actions the program is taking are atypical. Opening and closing an IndexWriter repeatedly is certainly expensive on large indexes.

Re: Locking issue

2004-11-10 Thread Erik Hatcher
I just added a Thread.sleep(1000) in the writer thread and it has run for quite some time, and is still running as I send this. Erik On Nov 10, 2004, at 8:02 PM, [EMAIL PROTECTED] wrote: I added it to Bugzilla like you suggested: http://issues.apache.org/bugzilla/show_bug.cgi?id=32171

Query#rewrite Question

2004-11-10 Thread Satoshi Hasegawa
Hello, Our program accepts input in the form of Lucene query syntax from the user, but we wish to perform additional tasks such as thesaurus expansion. So I want to manipulate the Query object that results from parsing. My question is, is the result of the Query#rewrite

Re: Using Lucene to store document

2004-11-10 Thread Nhan Nguyen Dang
Hi Otis, Please let me know what the HEAD version of Lucene is. Actually, I'm considering the advantages of storing documents using Lucene stored fields for my search engine. I've tested with thousands of documents and see that retrieving a document (in this case an XML file) with Lucene is a little bit

Search scalability

2004-11-10 Thread Ravi
We have one large index for a document repository of 800,000 documents. The size of the index is 800MB. When we do searches against the index, it takes 300-500ms for a single search. We wanted to test the scalability and tried 100 parallel searches against the index with the same query and the

Re: Search scalability

2004-11-10 Thread Otis Gospodnetic
Hello, 100 parallel searches going against a single index on a single disk means a lot of disk seeks all happening at once. One simple way of working around this is to load your FSDirectory into RAMDirectory. This should be faster (could you report your observations/comparisons?). You can also
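Otis's RAMDirectory suggestion might look like this against the 1.4-era API (the index path is hypothetical, and the constructor that copies an existing Directory is assumed to be available in the version in use). Note that the whole index is copied into the heap, so an 800MB index needs roughly that much RAM:

```java
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.RAMDirectory;

class RamSearch {
    public static void main(String[] args) throws Exception {
        // Copy the on-disk index into memory once at startup; subsequent
        // searches then avoid disk seeks entirely.
        RAMDirectory ramDir =
            new RAMDirectory(FSDirectory.getDirectory("/path/to/index", false));
        IndexSearcher searcher = new IndexSearcher(ramDir);
        // ... share this one searcher across all query threads ...
        searcher.close();
    }
}
```

Sharing a single IndexSearcher across threads (rather than opening one per query) also reduces the open-file and memory pressure discussed elsewhere in this digest.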

Re: Using Lucene to store document

2004-11-10 Thread Otis Gospodnetic
Hello, HEAD version means that you should check out Lucene straight out of CVS. How to work with CVS is another story, probably described somewhere on jakarta.apache.org site. Otis --- Nhan Nguyen Dang [EMAIL PROTECTED] wrote: Hi Otis, Please let me know what HEAD version of Lucene is?

Re: Locking issue

2004-11-10 Thread yahootintin-lucene
Yes, I tried that too and it worked. The issue is that our Operations folks plan to install this on a pretty busy box and I was hoping that Lucene wouldn't cause issues if it only had a small slice of the CPU. Guess I'll tell them to buy a bigger box! Unless you have any other ideas. I'm

Re: Search scalability

2004-11-10 Thread yahootintin-lucene
Does it take 800MB of RAM to load that index into a RAMDirectory? Or are only some of the files loaded into RAM? --- Otis Gospodnetic [EMAIL PROTECTED] wrote: Hello, 100 parallel searches going against a single index on a single disk means a lot of disk seeks all happening at once. One

Bug in the BooleanQuery optimizer? ..TooManyClauses

2004-11-10 Thread Sanyi
Hi! First of all, I've read about BooleanQuery$TooManyClauses, so I know that it has a 1024-clause limit by default, which is good enough for me, but I still think it works strangely. Example: I have an index with about 20 million documents. Let's say that there are about 3000 variants in the
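For reference, the 1024 limit Sanyi mentions is a static setting on BooleanQuery: wildcard and prefix queries rewrite into one clause per matching term, so a broad pattern can exceed the limit even when few documents match. It can be raised, trading memory for reach (1.4-era API; needs the Lucene jar):

```java
import org.apache.lucene.search.BooleanQuery;

class ClauseLimitDemo {
    public static void main(String[] args) {
        // Default limit that triggers BooleanQuery$TooManyClauses.
        System.out.println(BooleanQuery.getMaxClauseCount()); // 1024 by default
        // Raise the cap if wildcard expansions legitimately need more clauses.
        BooleanQuery.setMaxClauseCount(4096);
    }
}
```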