Re: status of LARM project

2004-04-28 Thread Otis Gospodnetic
Kelvin is correct. A few years ago there were no quality open-source crawlers available. There are now a number of very good ones: Archive.org's crawler is available, and there are Larbin, Nutch, etc. LARM works; it's just not maintained any more. Otis --- Kelvin Tan [EMAIL PROTECTED] wrote:

RE: ArrayIndexOutOfBoundsException

2004-04-28 Thread Phil brunet
Hi. I had this problem when I transferred a Lucene index by FTP in ASCII mode. Using binary mode, I have never had such a problem. Philippe From: James Dunn [EMAIL PROTECTED] Reply-To: Lucene Users List [EMAIL PROTECTED] To: [EMAIL PROTECTED] Subject: ArrayIndexOutOfBoundsException Date: Mon, 26 Apr
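Why ASCII mode can break an index: in ASCII (text) mode, FTP may rewrite line endings, for example expanding every LF byte (0x0A) to CR LF on the way to a Windows host. A Lucene index is binary, so each inserted byte shifts everything after it, and the file offsets stored inside the index no longer point where they should. The sketch below simulates only that LF to CR LF direction on an arbitrary byte array; `AsciiModeDemo` and `toCrLf` are hypothetical names, not part of Lucene or any FTP library, and real FTP implementations may differ in detail.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

final class AsciiModeDemo {
    // Simulates a Unix-to-Windows ASCII-mode transfer: every LF (0x0A)
    // byte is written as CR LF (0x0D 0x0A). All other bytes pass through.
    static byte[] toCrLf(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(input.length);
        for (byte b : input) {
            if (b == 0x0A) {
                out.write(0x0D); // the extra byte that corrupts binary data
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Arbitrary "binary" content in which 0x0A happens to appear twice.
        byte[] original = {0x01, 0x0A, 0x7F, 0x00, 0x0A, 0x42};
        byte[] transferred = toCrLf(original);
        System.out.println("original length:    " + original.length);
        System.out.println("transferred length: " + transferred.length);
        // Every byte after the first 0x0A has moved, so stored offsets past it are wrong.
        System.out.println("identical: " + Arrays.equals(original, transferred));
    }
}
```

Binary mode copies the bytes unchanged, which is why it avoids the problem.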

Re: Segments file get deleted?!

2004-04-28 Thread Surya Kiran
Hi, thanks for the reply. I got that error in my previous build; now I don't see it at all. I also wasn't able to keep the log. I will definitely come back if I see it again. Anyway, below is my machine config: Windows XP Personal Ed., 512MB, P4. My app server is Resin 2.1.12. I will definitely come

Count for a keyword occurance in a file

2004-04-28 Thread hemal bhatt
Hi, how can I get a count from the score given by Hits.score()? That is, I want to know how many times a keyword occurs in a file. Any help on this would be appreciated. Regards, Hemal Bhatt
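Note that Hits.score() returns a relevance score, not a raw count. In Lucene 1.x the per-document occurrence count of an indexed term can be read from the index with IndexReader.termDocs(Term), whose TermDocs.freq() gives the frequency in the current document. The stdlib-only sketch below computes the same kind of count directly from raw text instead; `TermCount` is a hypothetical name, and its tokenizer (lower-case, split on anything that is not a letter or digit) is an assumption, so it will not match an analyzer that stems words or drops stop words.

```java
import java.util.Locale;

final class TermCount {
    // Counts how often `keyword` occurs as a whole token in `text`.
    // Assumed tokenizing rule: lower-case, then split on runs of
    // characters that are neither letters nor decimal digits.
    static int countOccurrences(String text, String keyword) {
        String target = keyword.toLowerCase(Locale.ROOT);
        int count = 0;
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{Nd}]+")) {
            // split() can yield a leading empty string; never count it
            if (!token.isEmpty() && token.equals(target)) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String doc = "Lucene indexes text. Search with Lucene; lucene is fast.";
        System.out.println(countOccurrences(doc, "Lucene")); // prints 3
    }
}
```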

[Lucene] XML Indexing

2004-04-28 Thread Samuel Tang
XMLIndexingDemo seems unable to index traditional Chinese characters. I can only search for English text, not Chinese. In fact, my XML document contains both Chinese and English text. How can I fix this problem? Is it necessary for me to convert the Chinese characters in BIG5 to UTF-8
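Java strings are Unicode internally, so what matters is that the XML bytes are decoded with the correct charset before the text reaches the analyzer. If the XML declaration names Big5 and the parser supports that encoding, the parser does this for you. Where the bytes must be converted explicitly, a minimal sketch with the standard java.nio.charset API follows; `Big5ToUtf8` is a hypothetical name, it assumes the JVM ships the Big5 charset (most full JDK installations include it), and whether the Chinese text then becomes searchable also depends on the analyzer, which this sketch does not address.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

final class Big5ToUtf8 {
    // Decodes Big5 bytes into a Java (UTF-16) String, then re-encodes as UTF-8.
    static byte[] convert(byte[] big5Bytes) {
        Charset big5 = Charset.forName("Big5"); // throws if this JVM lacks Big5
        String text = new String(big5Bytes, big5);
        return text.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is the two-character word for "Chinese" (zhongwen).
        String sample = "\u4e2d\u6587 and English";
        byte[] big5 = sample.getBytes(Charset.forName("Big5"));
        byte[] utf8 = convert(big5);
        System.out.println(big5.length + " Big5 bytes -> " + utf8.length + " UTF-8 bytes");
    }
}
```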

Combining text search + relational search

2004-04-28 Thread Mike_Belasco
I need to somehow allow users to do a text search and query relational database attributes at the same time. The attributes are basically metadata about the documents that the text search will be performed on. I have the text of the documents indexed in Lucene. Does anyone have any advice or

RE: Re-associate a token with its source

2004-04-28 Thread Olaia Vázquez Sánchez
Thank you, but I think I didn't explain my problem clearly enough. I have four positions (top, bottom, right and left) for each of the words of the document, so I would have to store in the index the content of the page with the positions in between the words.
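One possible approach, offered as an assumption rather than anything proposed in the thread: index the words as ordinary text and keep the four coordinates in a side structure keyed by each word's token position, since Lucene can report the positions of a matching term at search time (IndexReader.termPositions(Term)). The sketch below models only that side structure; `PageLayout` and `WordBox` are hypothetical names, and it assumes the analyzer emits exactly one token per word with a position increment of 1 (no stop words removed, no words split).

```java
import java.util.ArrayList;
import java.util.List;

final class PageLayout {
    // Bounding box of one word on the page (units are whatever the source uses).
    static final class WordBox {
        final String word;
        final int top, bottom, left, right;

        WordBox(String word, int top, int bottom, int left, int right) {
            this.word = word;
            this.top = top;
            this.bottom = bottom;
            this.left = left;
            this.right = right;
        }
    }

    // Element i holds the box of the i-th word, i.e. token position i
    // under the one-token-per-word assumption.
    private final List<WordBox> boxes = new ArrayList<>();

    // Appends a word with its coordinates and returns its token position.
    int add(String word, int top, int bottom, int left, int right) {
        boxes.add(new WordBox(word, top, bottom, left, right));
        return boxes.size() - 1;
    }

    // Looks up the coordinates for a token position reported by a search.
    WordBox boxAt(int position) {
        return boxes.get(position);
    }

    // Plain text to hand to the indexer: the words, separated by spaces.
    String text() {
        StringBuilder sb = new StringBuilder();
        for (WordBox b : boxes) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(b.word);
        }
        return sb.toString();
    }
}
```

The index then holds only the words, and the coordinates stay out of the indexed text.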

Re: Combining text search + relational search

2004-04-28 Thread Stephane James Vaucher
I'm a bit confused about why you want this. As far as I know, relational DB searches return exact matches without a measure of relevancy. To measure relevancy, you need a search engine. For your results to be coherent, you would have to put everything in the Lucene index. As for memory

Re: Read past EOF and negative bufferLength problem (1.4 rc2)

2004-04-28 Thread Joe Berkovitz
Daniel, everything works fine with the latest CVS version of Lucene. It looks like the bug I hit was the one you referenced in your email, which is now fixed. Thanks for your help. ...joe Daniel Naber wrote: On Tuesday, 27 April 2004 21:00, Joe Berkovitz wrote:

Re: Combining text search + relational search

2004-04-28 Thread Mike_Belasco
Basically, I want to limit the results of the text search to the rows returned by a relational search on other attribute data related to the document. The text of the document is just like any other attribute; it just needs to be queried differently. Does that make sense? Thanks, Mike

Re: Combining text search + relational search

2004-04-28 Thread Otis Gospodnetic
Create a Lucene index from the data in the DB, and make sure to include the PKs in one of the fields (use Field.Keyword). Then query your RDBMS and get back the ResultSet. Then get the PK from each row of the ResultSet and use it to construct a Lucene BooleanQuery, which should include your original query string AND
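A minimal sketch of that last step, assuming the keys were indexed with Field.Keyword in a field named `pk` (a hypothetical name). For brevity it builds a QueryParser-syntax string rather than BooleanQuery objects; because QueryParser passes the key values through its analyzer, this only works when the analyzer leaves them unchanged (plain numeric keys, say). Otherwise, add one TermQuery per key to a BooleanQuery directly. With a large ResultSet, either route can hit BooleanQuery's maximum clause count (1024 by default). `CombinedQuery.build` is a hypothetical helper.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

final class CombinedQuery {
    // Builds "+(userQuery) +(pk:k1 pk:k2 ...)": a document must match the
    // user's text query AND carry at least one of the keys from the database.
    static String build(String userQuery, String pkField, List<String> keys) {
        if (keys.isEmpty()) {
            throw new IllegalArgumentException("no keys: nothing can match");
        }
        StringJoiner keyClauses = new StringJoiner(" ", "+(", ")");
        for (String key : keys) {
            keyClauses.add(pkField + ":" + key); // optional clauses: any one key suffices
        }
        return "+(" + userQuery + ") " + keyClauses;
    }

    public static void main(String[] args) {
        // Keys as they might be read from the ResultSet's PK column.
        List<String> keys = Arrays.asList("17", "42");
        System.out.println(build("lucene AND index", "pk", keys));
        // prints +(lucene AND index) +(pk:17 pk:42)
    }
}
```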

RE: ArrayIndexOutOfBoundsException

2004-04-28 Thread James Dunn
Philippe, thanks for the reply. I didn't FTP my index anywhere, but your response does make it seem that my index is in fact corrupted somehow. Does anyone know of a tool that can verify the validity of a Lucene index, and/or possibly repair it? If not, does anyone have any idea how difficult it

'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread Kevin A. Burton
I've noticed a really strange problem on one of our boxes. It's happened twice already. We have indexes where, when Lucene starts, it says 'Lock obtain timed out'... however, NO locks exist for the directory. There are no other processes present and no locks in the index dir or /tmp. Is

RE: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread ANarayan
It is possible that a previous operation on the index left the lock open. Leaving the IndexWriter or IndexReader open without closing it (in a finally block) could cause this. Anand -----Original Message----- From: Kevin A. Burton [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 28, 2004 2:57 PM
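The pattern Anand describes, as a stdlib-only sketch: a hypothetical `LockedResource` stands in for an IndexWriter or IndexReader and creates a lock file when opened, and closing it in a finally block removes the file even when the work in between throws. The class names and the lock-file behaviour are illustrative assumptions, not Lucene's implementation; in real code the finally block would call writer.close() or reader.close().

```java
import java.io.File;
import java.io.IOException;

final class LockDemo {
    // Hypothetical stand-in for an IndexWriter: holds a lock file while open.
    static final class LockedResource {
        private final File lockFile;

        LockedResource(File lockFile) throws IOException {
            // createNewFile() is atomic: it fails if the file already exists.
            if (!lockFile.createNewFile()) {
                throw new IOException("lock already held: " + lockFile);
            }
            this.lockFile = lockFile;
        }

        void close() {
            lockFile.delete(); // release the lock
        }
    }

    // Opens the resource, runs work that may fail, and always releases the lock.
    static void useSafely(File lockFile, boolean fail) throws IOException {
        LockedResource res = new LockedResource(lockFile);
        try {
            if (fail) {
                throw new IllegalStateException("simulated indexing error");
            }
            // ... normal indexing work would go here ...
        } finally {
            res.close(); // runs on success and on exception alike
        }
    }
}
```

Without the finally block, the simulated error would leave the lock file behind, and the next open would fail just as in the 'Lock obtain timed out' case.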

Re: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread James Dunn
Which version of Lucene are you using? In 1.2, I believe the lock file was located in the index directory itself. In 1.3, it's in your system's tmp folder. Perhaps it's a permission problem on either one of those folders. Maybe your process doesn't have write access to the correct folder and
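To see which "system tmp folder" a given JVM actually uses, print the java.io.tmpdir property and look for lock files there; under a servlet container this property may point somewhere other than /tmp. The stdlib-only sketch below lists files whose names end in ".lock"; that suffix filter is an assumption about how the lock files are named, so check the real names on your system. `LockFinder` is a hypothetical name.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

final class LockFinder {
    // Returns the regular files in `dir` whose names end with ".lock" (assumed naming).
    static List<File> findLocks(File dir) {
        List<File> locks = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return locks; // not a directory, or not readable
        }
        for (File f : entries) {
            if (f.isFile() && f.getName().endsWith(".lock")) {
                locks.add(f);
            }
        }
        return locks;
    }

    public static void main(String[] args) {
        // The directory this particular JVM uses for temporary files.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp.getAbsolutePath());
        for (File lock : findLocks(tmp)) {
            System.out.println("lock: " + lock.getName());
        }
    }
}
```

Run it inside the same JVM (or with the same startup options) as the application; a standalone run can report a different directory.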

Re: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread Kevin A. Burton
[EMAIL PROTECTED] wrote: It is possible that a previous operation on the index left the lock open. Leaving the IndexWriter or Reader open without closing them (in a finally block) could cause this. Actually, this is exactly the problem... I ran some single index tests and a single process

lucene applicability and performance

2004-04-28 Thread Greg Conway
Hello. Apologies if this has come up before; I'm new to the list and didn't see anything in the archives that exactly matched my situation. I am considering using Lucene to index and search a large collection of small documents in a specialized domain, probably only a few thousand unique

Re: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread Kevin A. Burton
Kevin A. Burton wrote: Actually, this is exactly the problem... I ran some single index tests and a single process seems to read from it. The problem is that we were running under Tomcat with different webapps for testing and didn't run into this problem before. We had an 11G index that just took

Re: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread Kevin A. Burton
James Dunn wrote: Which version of lucene are you using? In 1.2, I believe the lock file was located in the index directory itself. In 1.3, it's in your system's tmp folder. Yes... 1.3 and I have a script that removes the locks from both dirs... This is only one process so it's just fine

RE: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread Gus Kormeier
Not sure if our installation is the same or not, but we are also using Tomcat. I had a similar problem last week; it occurred after Tomcat went through a hard restart and some software errors had the website hammered. I found the lock file in /usr/local/tomcat/temp/ using locate. According to

Bug in Sandbox - Berkeley DB

2004-04-28 Thread Andy Goodell
IndexReader.delete(int docid) doesn't work with the Berkeley DB implementation of org.apache.lucene.store.Directory. This error message appears when closing an IndexReader which has a deletion: PANIC: Invalid argument. I get this stack trace: java.io.IOException: DB_RUNRECOVERY: Fatal error, run

Re: 'Lock obtain timed out' even though NO locks exist...

2004-04-28 Thread Kevin A. Burton
Gus Kormeier wrote: Not sure if our installation is the same or not, but we are also using Tomcat. I had a similar problem last week; it occurred after Tomcat went through a hard restart and some software errors had the website hammered. I found the lock file in /usr/local/tomcat/temp/ using

Re: lucene applicability and performance

2004-04-28 Thread Ype Kingma
Greg, On Wednesday 28 April 2004 21:44, Greg Conway wrote: Hello. Apologies if this has come up before, I'm new to the list and didn't see anything in the archives that exactly matched my situation. It has, but each situation is different. Try this:

Re: lucene applicability and performance

2004-04-28 Thread Ype Kingma
Greg, Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search. (See the javadoc on the website) I meant ParallelMultiSearcher. Good night, Ype
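ParallelMultiSearcher searches each of its sub-searchers in its own thread and merges the results into one ranked list. The stdlib-only sketch below models just that fan-out and merge idea: each hypothetical `Shard` returns scored hits, all shards are queried concurrently through an ExecutorService, and the n best hits are kept by score. `ParallelSearchSketch`, `Shard` and `Hit` are illustrative names, and the sketch ignores how Lucene makes scores from different indexes comparable.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class ParallelSearchSketch {
    // One scored hit from one shard.
    static final class Hit {
        final String id;
        final float score;

        Hit(String id, float score) {
            this.id = id;
            this.score = score;
        }
    }

    // Hypothetical shard: anything that can return scored hits for a query.
    interface Shard {
        List<Hit> search(String query);
    }

    // Queries all shards concurrently and returns the n best hits overall.
    static List<Hit> search(List<Shard> shards, String query, int n) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, shards.size()));
        try {
            List<Future<List<Hit>>> pending = new ArrayList<>();
            for (Shard shard : shards) {
                pending.add(pool.submit(() -> shard.search(query))); // one task per shard
            }
            List<Hit> merged = new ArrayList<>();
            for (Future<List<Hit>> f : pending) {
                merged.addAll(f.get()); // blocks until that shard is done
            }
            // Highest score first, then keep at most n hits.
            merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
            return new ArrayList<>(merged.subList(0, Math.min(n, merged.size())));
        } finally {
            pool.shutdown();
        }
    }
}
```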

Created LockObtainTimedOut wiki page

2004-04-28 Thread Kevin A. Burton
I just created a LockObtainTimedOut wiki entry... feel free to add. I just entered the Tomcat issue with java.io.tmpdir as well. http://wiki.apache.org/jakarta-lucene/LockObtainTimedOut Peace!