Kelvin is entirely correct.
A few years ago there were no quality open source crawlers available.
There are now a number of very good ones. Archive.org's crawler is
available, there is Larbin, Nutch, etc.
LARM works, it's just not maintained any more.
Otis
--- Kelvin Tan [EMAIL PROTECTED] wrote:
Hi.
I had this problem when I transferred a Lucene index by FTP in ASCII mode.
Using binary mode, I never had such a problem.
Philippe
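For illustration, a minimal sketch of why an ASCII-mode transfer can damage a binary file such as an index segment. It assumes the transfer rewrites every LF byte (0x0A) as CR LF, which is one common translation; real FTP clients and servers vary. The class and method names are invented for this example.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

public class AsciiModeDemo {
    // Simulates one common ASCII-mode translation: each LF (0x0A) becomes CR LF.
    // This is an assumption for illustration; actual behavior depends on the
    // FTP client and server involved.
    static byte[] asciiModeTransfer(byte[] original) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (byte b : original) {
            if (b == 0x0A) {
                out.write(0x0D); // inserted CR shifts every later byte offset
            }
            out.write(b);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // Arbitrary binary data that happens to contain the byte 0x0A.
        byte[] binary = {0x00, 0x0A, 0x7F, 0x0A, 0x01};
        byte[] received = asciiModeTransfer(binary);
        System.out.println("sent bytes:     " + binary.length);
        System.out.println("received bytes: " + received.length);
        System.out.println("identical:      " + Arrays.equals(binary, received));
    }
}
```

Any file format that stores offsets or lengths would be thrown off by the extra bytes, which fits with the problem going away in binary mode.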
From: James Dunn [EMAIL PROTECTED]
Reply-To: Lucene Users List [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: ArrayIndexOutOfBoundsException
Date: Mon, 26 Apr
Hi, thanks for the reply. I got that error in my previous build. Now I don't see
it at all.
Also, I wasn't able to retain the log. I will definitely come back if I see
it again.
Anyway below is my machine config:
Windows XP Personal Ed., 512MB, P4.
My app server is Resin 2.1.12
I will definitely come
Hi,
How can I get a count from the score given by Hits.score()?
I.e., I want to know how many times a keyword occurs in a file.
Any help on this would be appreciated.
regards
Hemal Bhatt
XMLIndexingDemo seems not able to index traditional Chinese characters. I can only
search for English text and not Chinese. In fact, my XML document contains both
Chinese and English text. How can I fix this problem? Is it necessary for me to
convert the Chinese characters in BIG5 to UTF-8
I need to somehow allow users to do a text search and query relational
database attributes at the same time. The attributes are basically metadata
about the documents that the text search will be performed on. I have the
text of the documents indexed in Lucene. Does anyone have any advice or
Thank you, but I think I didn't explain my problem clearly enough.
I have four positions (top, bottom, right and left) for each one of the
words of the document so I would have to store in the index the content of
the page with the positions in the middle.
I'm a bit confused why you want this.
As far as I know, relational DB searches will return exact
matches without a measure of relevancy. To measure relevancy, you need a
search engine. For your results to be coherent, you would have to put
everything in the Lucene index.
As for memory
Daniel,
Everything works fine with the latest CVS version of Lucene. It looks
like the bug I hit was the one that you referenced in your email which
is now fixed.
Thanks for your help.
. .. . ...joe
Daniel Naber wrote:
On Tuesday, 27 April 2004 21:00, Joe Berkovitz wrote:
Basically I want to limit the results of the text search by the rows that
are returned in a relational search of other attribute data related to the
document. The text of the document is just like any other attribute; it just
needs to be queried differently. Does that make sense?
Thanks
Mike
Create a Lucene index from data in DB, and make sure to include PKs in
one of the fields (use Field.Keyword).
Then query your RDBMS and get back the ResultSet.
Then get the PK from each ResultSet and use it to construct a Lucene
BooleanQuery, which should include your original query string AND
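The message is cut off here. As a rough sketch of the idea so far, assuming the rest of the query is a clause restricting matches to the PKs returned by the database, the combined query could be expressed in Lucene's query-parser syntax as a plain string. The field name "pk" and the helper below are hypothetical, and the keys are assumed to be simple values that need no escaping.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

public class PkFilterQuery {
    // Builds "(userQuery) AND (pkField:k1 OR pkField:k2 ...)".
    // Assumes keys are simple tokens (e.g. numeric IDs) with no characters
    // that are special in the query syntax.
    static String restrictToKeys(String userQuery, String pkField, List<String> keys) {
        if (keys.isEmpty()) {
            // No database rows matched, so there is nothing to restrict to.
            throw new IllegalArgumentException("no primary keys to match");
        }
        StringJoiner pkClause = new StringJoiner(" OR ", "(", ")");
        for (String key : keys) {
            pkClause.add(pkField + ":" + key);
        }
        return "(" + userQuery + ") AND " + pkClause;
    }

    public static void main(String[] args) {
        // Keys as they might come back from the ResultSet (made-up values).
        List<String> keys = Arrays.asList("17", "42");
        System.out.println(restrictToKeys("lucene index", "pk", keys));
    }
}
```

The same structure can be built programmatically as a BooleanQuery instead of a string; the string form is used here only to keep the sketch free of library dependencies.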
Philippe, thanks for the reply. I didn't FTP my index
anywhere, but your response does make it seem that my
index is in fact corrupted somehow.
Does anyone know of a tool that can verify the
validity of a Lucene index, and/or possibly repair it?
If not, anyone have any idea how difficult it
I've noticed this really strange problem on one of our boxes. It's
happened twice already.
We have indexes where, when Lucene starts, it says 'Lock obtain timed out'
... however NO locks exist for the directory.
There are no other processes present and no locks in the index dir or /tmp.
Is
It is possible that a previous operation on the index left the lock open.
Leaving the IndexWriter or Reader open without closing them ( in a finally
block ) could cause this.
Anand
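A minimal sketch of the close-in-finally pattern described above. LockingWriter is a hypothetical stand-in that holds a lock file while open; it is not Lucene's IndexWriter, and only illustrates how a missed close leaves the lock behind.

```java
import java.io.File;
import java.io.IOException;

public class FinallyCloseDemo {
    // Hypothetical stand-in for an index writer: creates a lock file when
    // opened and deletes it when closed.
    static class LockingWriter {
        private final File lock;

        LockingWriter(File lock) throws IOException {
            if (!lock.createNewFile()) {
                throw new IOException("Lock obtain timed out: " + lock);
            }
            this.lock = lock;
        }

        void addDocument(String text) throws IOException {
            if (text == null) {
                throw new IOException("cannot index null text");
            }
        }

        void close() {
            lock.delete();
        }
    }

    // The finally block releases the lock even when addDocument throws.
    static void index(File lock, String text) throws IOException {
        LockingWriter writer = new LockingWriter(lock);
        try {
            writer.addDocument(text);
        } finally {
            writer.close();
        }
    }

    public static void main(String[] args) throws IOException {
        File lock = File.createTempFile("demo-write", ".lock");
        lock.delete(); // keep only the unique path; the writer creates the file
        try {
            index(lock, null); // fails inside addDocument
        } catch (IOException e) {
            System.out.println("indexing failed: " + e.getMessage());
        }
        System.out.println("lock still present: " + lock.exists());
    }
}
```

Without the finally block, the failed call would leave the lock file in place and the next writer would fail to obtain it.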
-----Original Message-----
From: Kevin A. Burton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, April 28, 2004 2:57 PM
Which version of lucene are you using? In 1.2, I
believe the lock file was located in the index
directory itself. In 1.3, it's in your system's tmp
folder.
Perhaps it's a permission problem on either one of
those folders. Maybe your process doesn't have write
access to the correct folder and
[EMAIL PROTECTED] wrote:
It is possible that a previous operation on the index left the lock open.
Leaving the IndexWriter or Reader open without closing them ( in a finally
block ) could cause this.
Actually this is exactly the problem... I ran some single index tests
and a single process
Hello. Apologies if this has come up before, I'm new to the list and
didn't see anything in the archives that exactly matched my situation.
I am considering using Lucene to index and search a large collection of
small documents in a specialized domain -- probably only a few
thousand unique
Kevin A. Burton wrote:
Actually this is exactly the problem... I ran some single index tests
and a single process seems to read from it.
The problem is that we were running under Tomcat with diff webapps for
testing and didn't run into this problem before. We had an 11G index
that just took
James Dunn wrote:
Which version of lucene are you using? In 1.2, I
believe the lock file was located in the index
directory itself. In 1.3, it's in your system's tmp
folder.
Yes... 1.3 and I have a script that removes the locks from both dirs...
This is only one process so it's just fine
Not sure if our installation is the same or not, but we are also using
Tomcat.
I had a similar problem last week; it occurred after Tomcat went through a
hard restart and some software errors had the website hammered.
I found the lock file in /usr/local/tomcat/temp/ using locate.
According to
IndexReader.delete(int docid) doesn't work with the Berkeley DB
implementation of org.apache.lucene.store.Directory
This error message appears when closing an IndexReader which has a deletion:
PANIC: Invalid argument
I get this stack trace:
java.io.IOException: DB_RUNRECOVERY: Fatal error, run
Gus Kormeier wrote:
Not sure if our installation is the same or not, but we are also using
Tomcat.
I had a similar problem last week; it occurred after Tomcat went through a
hard restart and some software errors had the website hammered.
I found the lock file in /usr/local/tomcat/temp/ using
Greg,
On Wednesday 28 April 2004 21:44, Greg Conway wrote:
Hello. Apologies if this has come up before, I'm new to the list and
didn't see anything in the archives that exactly matched my situation.
It has, but each situation is different. Try this:
Greg,
Yes, see RemoteSearchable and MultiSearcher in org.apache.lucene.search.
(See the javadoc on the website)
I meant ParallelMultiSearcher.
Good night,
Ype
I just created a LockObtainTimedOut wiki entry... feel free to add. I
just entered the Tomcat issue with java.io.tmpdir as well.
http://wiki.apache.org/jakarta-lucene/LockObtainTimedOut
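As a small aid for tracking down stale locks, here is a sketch that lists candidate lock files in the directory named by java.io.tmpdir, which under Tomcat may be Tomcat's own temp directory rather than /tmp. The "lucene-" prefix and ".lock" suffix are assumptions about the lock file naming; adjust them to match what is actually found on disk.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

public class FindLockFiles {
    // Returns regular files in dir whose names look like Lucene lock files.
    // The name pattern is an assumption for illustration.
    static List<File> findLocks(File dir) {
        List<File> found = new ArrayList<>();
        File[] entries = dir.listFiles();
        if (entries == null) {
            return found; // not a directory, or not readable
        }
        for (File f : entries) {
            String name = f.getName();
            if (f.isFile() && name.startsWith("lucene-") && name.endsWith(".lock")) {
                found.add(f);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // The JVM's temp directory; this is where to look first.
        File tmp = new File(System.getProperty("java.io.tmpdir"));
        System.out.println("java.io.tmpdir = " + tmp);
        for (File lock : findLocks(tmp)) {
            System.out.println("possible stale lock: " + lock);
        }
    }
}
```

Running this inside the same JVM as the application (for example from a test page in the webapp) shows the directory that process really uses, which can differ from the shell's /tmp.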
Peace!
--
Please reply using PGP.
http://peerfear.org/pubkey.asc
NewsMonster -