Ardor Wei writes:
What might be the problem, and how can I solve it?
Any suggestions or ideas will be appreciated.
The only problem with locking I saw so far is that you have
to make sure that the temp dir is the same for all applications.
Lucene 1.3 stores its lock in the directory that is defined
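A minimal sketch of the point above: every application touching the index must agree on the lock directory, which is easiest to guarantee by setting the relevant system property to the same path in each JVM before any index access. The property name `org.apache.lucene.lockDir` is an assumption here; check `FSDirectory` in your Lucene version for the exact name.

```java
// Hypothetical illustration: pin the lock directory so all JVMs that
// open the same index look for lock files in the same place. The
// property name is an assumption; verify it against your Lucene release.
public class LockDirConfig {
    public static void main(String[] args) {
        System.setProperty("org.apache.lucene.lockDir", "/var/lucene/locks");
        System.out.println(System.getProperty("org.apache.lucene.lockDir"));
    }
}
```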
Hi Doug,
thank you for the answer so far.
I actually wanted to add a large amount of text from an existing document to
find a closely related one. Can you suggest a good way of doing this? A
direct match will not occur anyway. How can I make a Vector Space Model
(VSM)-like query (each
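One rough way to approximate the "find similar documents" idea discussed above is to tokenize the source document, drop stopwords, and OR the remaining terms into one broad query. This is a plain-Java sketch of that trick; the stopword list and whitespace tokenization are simplified assumptions, not Lucene's own analyzer.

```java
import java.util.*;

// Sketch: turn a block of document text into a broad OR-query string,
// a crude stand-in for a vector-space "more like this" search.
public class SimilarityQuery {
    // Toy stopword list for illustration only.
    static final Set<String> STOPWORDS =
        new HashSet<>(Arrays.asList("the", "a", "of", "and", "to"));

    public static String build(String text) {
        Set<String> terms = new LinkedHashSet<>();  // dedupe, keep order
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty() && !STOPWORDS.contains(tok)) {
                terms.add(tok);
            }
        }
        return String.join(" OR ", terms);  // parser ORs the terms together
    }

    public static void main(String[] args) {
        System.out.println(build("The ranking of the documents"));
        // → ranking OR documents
    }
}
```

Feeding the resulting string to the query parser scores every indexed document against all the source terms at once, so a direct match is not required.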
I'm looking at a lot of the code in Lucene... I assume Vector is used
for legacy reasons. In an upcoming version I think it might make sense
to migrate to using a LinkedList... since Vector has to do an array copy
when it's exhausted.
It's also synchronized, which kind of sucks...
I'm seeing
I agree that synchronization in Vector is a waste of time if it isn't
required,
but I'm not sure if LinkedList is a better (faster) choice than ArrayList. I
think only a profiler could tell.
Francesco
Kevin A. Burton [EMAIL PROTECTED] wrote:
I'm looking at a lot of the code in Lucene... I
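As Francesco says, only a profiler run on Lucene itself could settle the Vector-versus-ArrayList question, but the shape of such a measurement is easy to sketch. This is an unscientific toy timing, not a benchmark of Lucene.

```java
import java.util.*;

// Toy timing sketch: append n integers to a synchronized Vector and to
// an unsynchronized ArrayList and report the elapsed nanoseconds.
// JIT warm-up and GC noise make single runs unreliable; a real
// profiler on real Lucene workloads is what the thread calls for.
public class ListTiming {
    static long time(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        return System.nanoTime() - start;
    }

    public static void main(String[] args) {
        int n = 1_000_000;
        long vec = time(new Vector<>(), n);
        long arr = time(new ArrayList<>(), n);
        System.out.println("Vector: " + vec + " ns, ArrayList: " + arr + " ns");
    }
}
```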
Karl Koch wrote:
Hi Doug,
thank you for the answer so far.
I actually wanted to add a large amount of text from an existing document to
find a close related one. Can you suggest another good way of doing this. A
direct match will not occur anyway. How can I make a most Vector Space Model
(VSM)
Hello!
Does anyone have an idea how to boost terms in HTML documents that are
surrounded by HTML tags such as B, H1, etc.?
Can it be done with the existing API, or is a reimplementation
of TokenStream with custom Token types needed?
Though it seems to me that even such
It definitely cannot be done with custom token types. You're probably
aiming for field-specific boosting, so you will need to parse the HTML
into separate fields and use a multi-field search approach.
I'm sure there are other tricks that could be used for boosting, like
inserting the words
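A sketch of the field-splitting approach suggested above: pull the text inside emphasis tags into a separate field at index time, then search both fields with a higher boost on the emphasized one. The field names and the boost of 4 are arbitrary choices for illustration, and a regex is no substitute for a real HTML parser.

```java
import java.util.regex.*;

// Illustrative only: extract text inside <b> and <h1> tags into what
// would become a separate "emphasized" field, and build a two-field
// boosted query string in standard query-parser syntax.
public class HtmlFieldBoost {
    public static String emphasized(String html) {
        Matcher m = Pattern.compile("<(b|h1)>(.*?)</\\1>",
                                    Pattern.CASE_INSENSITIVE).matcher(html);
        StringBuilder sb = new StringBuilder();
        while (m.find()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(m.group(2));  // the tagged text itself
        }
        return sb.toString();
    }

    public static String query(String term) {
        // "emphasized", "body", and the boost of 4 are hypothetical.
        return "emphasized:" + term + "^4 OR body:" + term;
    }

    public static void main(String[] args) {
        System.out.println(emphasized("<h1>Lucene</h1> plain <b>boost</b>"));
        System.out.println(query("lucene"));
    }
}
```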
Thanks for the answer.
Yes, I'm after field-specific boosting, but I'm also looking at creating
short descriptions of the documents found, based on the query (as done in
most search engines). I've thought about those solutions, but it seemed to me
that it is not straightforward and will cause trouble
On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote:
1) Is there a way to set the query boost factor depending not on the
presence of a term, but on the presence of two specific terms? For
example, I may want to boost the relevance of a document that contains
both iraq and clerics, but not
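One way to express the pairwise boost Terry asks about with the standard query parser syntax is an extra optional clause that only matches when both terms are present and carries the boost. This string-building sketch assumes that syntax; the boost value of 5 is arbitrary.

```java
// Sketch: alongside the base query, add an optional clause requiring
// BOTH terms (+t1 +t2) with a boost, so documents containing the pair
// score higher while single-term matches still qualify.
public class PairBoost {
    public static String boostPair(String base, String t1, String t2, int boost) {
        return base + " OR (+" + t1 + " +" + t2 + ")^" + boost;
    }

    public static void main(String[] args) {
        System.out.println(boostPair("iraq clerics", "iraq", "clerics", 5));
        // → iraq clerics OR (+iraq +clerics)^5
    }
}
```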
Erik,
Thanks for your response. My specific comments (TS==) are inserted below.
I should make clear that I'm using
fairly complex, embedded queries - not ones that the user is expected to
enter.
Regards,
Terry
- Original Message -
From: Erik Hatcher [EMAIL PROTECTED]
To: Lucene Users
On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote:
But doesn't the query itself take this into account? If there are
multiple matching terms then the overlap (coord) factor kicks in.
TS==Except that I'd like to be able to choose to do this on a
query-by-query basis. In other words,
it's
Hi,
I'd like to help working on improving Lucene. How can I help?
On Wednesday, January 21, 2004 at 16:38, Doug Cutting wrote:
Francesco Bellomi wrote:
I agree that synchronization in Vector is a waste of time if it isn't
required,
It would be interesting to see if such synchronization
Erik Hatcher writes:
TS==I've not been able to get negative boosting to work at all. Maybe
there's a problem with my syntax.
If, for example, I do a search with green beret^10, it works just
fine.
But green beret^-2 gives me a
ParseException showing a lexical error.
Have you
Hello Morus,
--- Morus Walter [EMAIL PROTECTED] wrote:
Hi,
I'm currently trying to get rid of query parser problems with
stopwords
(depending on the query, there are ArrayIndexOutOfBoundsExceptions,
e.g. for stop AND nonstop where stop is a stopword and nonstop is not).
While this isn't
Hello Doug,
that sounds interesting to me. I refer to a paper written by NIST about
Relevance Feedback which ran tests with 20-200 words. This is why I
thought it might be good to be able to use all non-stopwords of a document for that
and see what happens. Do you know good papers
Karl:
http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]msgId=114748
Status: several people have mentioned they wanted to work on it, but
nobody has contributed any patches. The code you see at the above URL
is not compatible with Lucene 1.3, but could be brought up to date.
Otis
--- Karl
There are just about as many ways of doing it as there are papers that talk about
automatic relevance feedback. Many require domain-specific reference documents that
are full of facts and therefore good sources of related words. Some people use
WordNet. Some of these techniques can add 400-500
Karl Koch wrote:
Do you know good papers about strategies for how
to select keywords effectively, beyond the scope of stopword lists and stemming?
Using term frequencies of the document is not really possible, since Lucene
does not provide access to a document vector, does it?
Lucene does let you
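Even without a document vector from the index, an approximation of Karl's idea is to re-tokenize the stored document text, count term frequencies, and keep the top-k terms as feedback terms. This plain-Java sketch omits stopword removal and stemming for brevity.

```java
import java.util.*;

// Sketch: approximate a document's term vector by re-tokenizing its
// stored text, counting frequencies, and keeping the k most frequent
// terms as candidate relevance-feedback terms.
public class TopTerms {
    public static List<String> topK(String text, int k) {
        Map<String, Integer> freq = new HashMap<>();
        for (String tok : text.toLowerCase().split("\\W+")) {
            if (!tok.isEmpty()) {
                freq.merge(tok, 1, Integer::sum);  // count each term
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(freq.entrySet());
        entries.sort((a, b) -> b.getValue() - a.getValue());  // most frequent first
        List<String> out = new ArrayList<>();
        for (int i = 0; i < Math.min(k, entries.size()); i++) {
            out.add(entries.get(i).getKey());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(topK("index index search search search query", 2));
        // → [search, index]
    }
}
```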
Morus,
Unfortunately, using positive boost factors less than 1 causes the parser to
barf the same as do negative boost factors.
Regards,
Terry
- Original Message -
From: Morus Walter [EMAIL PROTECTED]
To: Lucene Users List [EMAIL PROTECTED]
Sent: Wednesday, January 21, 2004 10:54 AM
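Since the parser reportedly rejects both negative boosts (green beret^-2) and, per Terry, fractional ones, one workaround is to invert the intent: instead of demoting one clause, promote every other clause by the reciprocal factor. This string-building sketch illustrates the rescaling; the terms and factor are made up.

```java
// Workaround sketch: relative boosts are what matter, so demoting one
// clause by 1/f is equivalent to boosting all the other clauses by f,
// which the parser does accept.
public class BoostRescale {
    public static String demote(String[] others, String demoted, int factor) {
        StringBuilder sb = new StringBuilder();
        for (String t : others) {
            sb.append(t).append('^').append(factor).append(' ');
        }
        return sb.append(demoted).toString();
    }

    public static void main(String[] args) {
        // Instead of: green beret "red cap"^0.5
        System.out.println(demote(new String[]{"green", "beret"}, "\"red cap\"", 2));
        // → green^2 beret^2 "red cap"
    }
}
```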
On Jan 21, 2004, at 4:21 PM, Terry Steichen wrote:
PS: Is this in the docs? If not, maybe it should be mentioned.
Depends on what you consider the docs. I looked at QueryParser.jj to
see what it parses.
Also, on http://jakarta.apache.org/lucene/docs/queryparsersyntax.html
it has an example of
On Wednesday 21 January 2004 08:38, Doug Cutting wrote:
Francesco Bellomi wrote:
I agree that synchronization in Vector is a waste of time if it isn't
required,
It would be interesting to see if such synchronization actually impairs
overall performance significantly. This would be fairly
I'm getting the following stack trace from lucene-1.3-final running on
JDK 1.4.2_03-b02 on linux
java.io.FileNotFoundException: /home/matt/blah/idx/_123n.tis (Too many open files)
at java.io.RandomAccessFile.open(Native Method)
at
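The "Too many open files" in the trace above usually means the process file-descriptor limit is lower than the number of index files Lucene holds open. A quick check and the usual mitigation, sketched for a typical Linux shell:

```shell
# Show the current per-process soft limit on open file descriptors.
ulimit -n

# Raising it for this shell (needs hard-limit headroom; persistent
# limits are usually set in /etc/security/limits.conf):
# ulimit -n 4096
```

Reducing the number of segment files (e.g. by optimizing the index, or by the compound file format if your Lucene version supports it) attacks the same problem from the other side.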