1.3-final: now giving me java.io.FileNotFoundException (Too many open files)

2004-01-21 Thread Matt Quail
I'm getting the following stack trace from lucene-1.3-final running on JDK 1.4.2_03-b02 on Linux: java.io.FileNotFoundException: /home/matt/blah/idx/_123n.tis (Too many open files) at java.io.RandomAccessFile.open(Native Method) at java.io.RandomAccessFile.<init>(RandomAccessFile.java:20

Re: Vector -> LinkedList for performance reasons...

2004-01-21 Thread Tatu Saloranta
On Wednesday 21 January 2004 08:38, Doug Cutting wrote: > Francesco Bellomi wrote: > > I agree that synchronization in Vector is a waste of time if it isn't > > required, > > It would be interesting to see if such synchronization actually impairs > overall performance significantly. This would be

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 4:21 PM, Terry Steichen wrote: PS: Is this in the docs? If not, maybe it should be mentioned. Depends on what you consider "the docs". I looked at QueryParser.jj to see what it parses. Also, on it has an example

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
Right you are - the leading zero was the key. Thanks. Regards, Terry PS: Is this in the docs? If not, maybe it should be mentioned. - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, January 21, 2004 2:04 PM Subj

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 1:07 PM, Terry Steichen wrote: Unfortunately, using positive boost factors less than 1 causes the parser to barf the same as do negative boost factors. Are you sure about that? Works for me. QueryParser just isn't set up to deal with a minus sign, but "term^0.5" should work
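The behaviour reported in this thread (a boost such as "term^0.5" parses, while a minus sign or a missing leading zero triggers a lexical error) can be illustrated with a small stand-alone check. This is only a sketch: `BoostSyntaxSketch` and `boostOf` are hypothetical names, the regular expression is an assumption inferred from the messages here rather than a copy of the QueryParser grammar, and no Lucene classes are used.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Lucene: it mimics the boost syntax
// described in this thread (digits, optionally followed by ".digits").
class BoostSyntaxSketch {
    // Assumption: a boost is "^" followed by one or more digits and an
    // optional fractional part. No sign and no bare leading "." are allowed.
    private static final Pattern BOOST =
            Pattern.compile("\\^(\\d+(?:\\.\\d+)?)$");

    // Returns the boost value if the clause ends in a well-formed boost,
    // or an empty OptionalDouble otherwise.
    static OptionalDouble boostOf(String clause) {
        Matcher m = BOOST.matcher(clause);
        return m.find()
                ? OptionalDouble.of(Double.parseDouble(m.group(1)))
                : OptionalDouble.empty();
    }

    public static void main(String[] args) {
        String[] samples = {
            "term^0.5", "term^.5", "\"green beret\"^10", "\"green beret\"^-2"
        };
        for (String s : samples) {
            System.out.println(s + " -> " + boostOf(s));
        }
    }
}
```

Under these assumptions, "term^0.5" and "green beret"^10 yield a boost, while "term^.5" and "green beret"^-2 are rejected, which matches what the posters describe. Whether the real parser behaves identically in every case would need to be checked against QueryParser.jj.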

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
Morus, Unfortunately, using positive boost factors less than 1 causes the parser to barf the same as do negative boost factors. Regards, Terry - Original Message - From: "Morus Walter" <[EMAIL PROTECTED]> To: "Lucene Users List" <[EMAIL PROTECTED]> Sent: Wednesday, January 21, 2004 10:5

Re: setMaxClauseCount ??

2004-01-21 Thread Doug Cutting
Karl Koch wrote: Do you know good papers about strategies of how to select keywords effectively beyond the scope of stopword lists and stemming? Using term frequencies of the document is not really possible since Lucene is not providing access to a document vector, is it? Lucene does let you acce

RE: setMaxClauseCount ??

2004-01-21 Thread Chong, Herb
there are just about as many ways of doing it as there are papers that talk about automatic relevance feedback. many require domain-specific reference documents that are full of facts and therefore good sources of related words. some people use Wordnet. some of these techniques can add 400-500 t

Re: setMaxClauseCount ??

2004-01-21 Thread Otis Gospodnetic
Karl: http://nagoya.apache.org/eyebrowse/[EMAIL PROTECTED]&msgId=114748 Status: several people have mentioned they wanted to work on it, but nobody has contributed any patches. The code you see at the above URL is not compatible with Lucene 1.3, but could be brought up to date. Otis --- Karl Ko

Re: setMaxClauseCount ??

2004-01-21 Thread Karl Koch
Hello Doug, that sounds interesting to me. I refer to a paper written by NIST about Relevance Feedback which was doing test with 20 - 200 words. This is why I thought it might be good to be able to use all non stopwords of a document for that and see what is happening. Do you know good papers abou

Re: QueryParser and stopwords

2004-01-21 Thread Otis Gospodnetic
Hello Morus, --- Morus Walter <[EMAIL PROTECTED]> wrote: > Hi, > > I'm currently trying to get rid of query parser problems with > stopwords > (depending on the query, there are ArrayIndexOutOfBoundsExceptions, > e.g. for stop AND nonstop where stop is a stopword and nonstop not). > > While this

Re: Query Term Questions

2004-01-21 Thread Morus Walter
Erik Hatcher writes: > > > > TS==>I've not been able to get negative boosting to work at all. Maybe > > there's a problem with my syntax. > > If, for example, I do a search with "green beret"^10, it works just > > fine. > > But "green beret"^-2 gives me a > > ParseException showing a lexical erro

Re: Vector -> LinkedList for performance reasons...

2004-01-21 Thread Nicolas Toper
Hi, I'd like to help improve Lucene. How can I help? On Wednesday 21 January 2004 16:38, Doug Cutting wrote: > Francesco Bellomi wrote: > > I agree that synchronization in Vector is a waste of time if it isn't > > required, > > It would be interesting to see if such synchronization a

Re: Query Term Questions

2004-01-21 Thread Doug Cutting
Terry Steichen wrote: 1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "clerics", but not boost the relevance of documents t

Re: Vector -> LinkedList for performance reasons...

2004-01-21 Thread Doug Cutting
Francesco Bellomi wrote: I agree that synchronization in Vector is a waste of time if it isn't required, It would be interesting to see if such synchronization actually impairs overall performance significantly. This would be fairly simple to test. but I'm not sure if LinkedList is a better (fas
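The "fairly simple to test" idea from this thread could look roughly like the sketch below. It is not a rigorous benchmark and not code from any of the posters: `ListAppendTiming`, `timeAppendAndScan`, the workload size and the number of rounds are all assumptions made for illustration, and it targets a modern JDK (generics, `System.nanoTime`) rather than the 1.4 JDK in use at the time. It measures only appending and iterating, which may not reflect how Lucene actually uses its lists.

```java
import java.util.ArrayList;
import java.util.LinkedList;
import java.util.List;
import java.util.Vector;

// Rough timing sketch, not a rigorous benchmark: JIT warm-up and GC can
// easily dominate the results, so treat the numbers as indicative only.
class ListAppendTiming {
    // Appends n integers to the given list, reads them all back by
    // iterating, and returns the elapsed wall-clock time in nanoseconds.
    static long timeAppendAndScan(List<Integer> list, int n) {
        long start = System.nanoTime();
        for (int i = 0; i < n; i++) {
            list.add(i);
        }
        long sum = 0;
        for (int value : list) {
            sum += value;
        }
        long elapsed = System.nanoTime() - start;
        // Use the sum so the scan loop cannot be optimized away.
        if (sum != (long) n * (n - 1) / 2) {
            throw new IllegalStateException("unexpected sum: " + sum);
        }
        return elapsed;
    }

    public static void main(String[] args) {
        int n = 1_000_000; // assumed workload size
        int rounds = 5;    // early rounds mostly serve as warm-up
        for (int r = 0; r < rounds; r++) {
            long vector = timeAppendAndScan(new Vector<>(), n);
            long arrayList = timeAppendAndScan(new ArrayList<>(), n);
            long linkedList = timeAppendAndScan(new LinkedList<>(), n);
            System.out.printf(
                    "round %d: Vector=%d us, ArrayList=%d us, LinkedList=%d us%n",
                    r, vector / 1000, arrayList / 1000, linkedList / 1000);
        }
    }
}
```

Only the later rounds are worth comparing, and as Francesco Bellomi notes elsewhere in the thread, a profiler run against real indexing and search workloads would be more convincing than a loop like this.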

Re: setMaxClauseCount ??

2004-01-21 Thread Doug Cutting
Andrzej Bialecki wrote: Karl Koch wrote: I actually wanted to add a large amount of text from an existing document to find a closely related one. Can you suggest another good way of doing this? You should try to reduce the dimensionality by reducing the number of unique features. In this case, you

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 10:01 AM, Terry Steichen wrote: But doesn't the query itself take this into account? If there are multiple matching terms then the overlap (coord) factor kicks in. TS==>Except that I'd like to be able to choose to do this on a query-by-query basis. In other words, it's desirab

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
Erik, Thanks for your response. My specific comments (TS==>) are inserted below. I should make clear that I'm using fairly complex, embedded queries - not ones that the user is expected to enter. Regards, Terry - Original Message - From: "Erik Hatcher" <[EMAIL PROTECTED]> To: "Lucene U

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 20, 2004, at 10:22 AM, Terry Steichen wrote: 1) Is there a way to set the query boost factor depending not on the presence of a term, but on the presence of two specific terms? For example, I may want to boost the relevance of a document that contains both "iraq" and "clerics", but not b

Re: Query Term Questions

2004-01-21 Thread Erik Hatcher
On Jan 21, 2004, at 7:44 AM, Terry Steichen wrote: By the silence, I gather that the answers to my questions are "no", "no" and "no". Silence should not be interpreted this way. Perhaps folks don't know, or are busy, or any number of possibilities. I'll reply to your original message in a sec.

AW: HTML tagged terms boosting...

2004-01-21 Thread Alexey Maksakov
Thanks for the answer. Yes, I'm after field-specific boosting, but I'm also looking to create short descriptions of the documents found, based on the query (as is done in most search engines). I've thought about those solutions, but it seemed to me that they are not straightforward and will cause troubles

Re: Query Term Questions

2004-01-21 Thread Morus Walter
Terry Steichen writes: > By the silence, I gather that the answers to my questions are "no", "no" and > "no". > > 2) Is it possible to apply (or simulate) a negative query boost factor? For > example, I may have a complex query with lots of terms but want to reduce > the relevance of a matching d

Re: Query Term Questions

2004-01-21 Thread Terry Steichen
By the silence, I gather that the answers to my questions are "no", "no" and "no". Regards, Terry - Original Message - From: "Terry Steichen" <[EMAIL PROTECTED]> To: "Lucene Users Group" <[EMAIL PROTECTED]> Sent: Tuesday, January 20, 2004 10:22 AM Subject: Query Term Questions 1) Is th

Re: HTML tagged terms boosting...

2004-01-21 Thread Erik Hatcher
It definitely cannot be done with custom token types. You're probably aiming for field-specific boosting, so you will need to parse the HTML into separate fields and use a multi-field search approach. I'm sure there are other tricks that could be used for boosting, like inserting the words ins

HTML tagged terms boosting...

2004-01-21 Thread Alexey Maksakov
Hello! Is there a way to boost terms in HTML documents that are surrounded by HTML tags, such as , , etc.? Can it be done with the existing API, or is a reimplementation of TokenStream with custom Token types needed? Though it seems to me that even such re-implementation

Re: setMaxClauseCount ??

2004-01-21 Thread Andrzej Bialecki
Karl Koch wrote: Hi Doug, thank you for the answer so far. I actually wanted to add a large amount of text from an existing document to find a closely related one. Can you suggest another good way of doing this? A direct match will not occur anyway. How can I make a most Vector Space Model (VSM) l

Re: Vector -> LinkedList for performance reasons...

2004-01-21 Thread Francesco Bellomi
I agree that synchronization in Vector is a waste of time if it isn't required, but I'm not sure if LinkedList is a better (faster) choice than ArrayList. I think only a profiler could tell. Francesco Kevin A. Burton <[EMAIL PROTECTED]> wrote: >> I'm looking at a lot of the code in Lucene... I a

Vector -> LinkedList for performance reasons...

2004-01-21 Thread Kevin A. Burton
I'm looking at a lot of the code in Lucene... I assume Vector is used for legacy reasons. In an upcoming version I think it might make sense to migrate to using a LinkedList... since Vector has to do an array copy when it's exhausted. It's also synchronized which kind of sucks... I'm seeing t

Re: setMaxClauseCount ??

2004-01-21 Thread Karl Koch
Hi Doug, thank you for the answer so far. I actually wanted to add a large amount of text from an existing document to find a closely related one. Can you suggest another good way of doing this? A direct match will not occur anyway. How can I make a most Vector Space Model (VSM) like query (each wo

Re: Lucene search result no stable

2004-01-21 Thread Morus Walter
Ardor Wei writes: > > What might be the problem? How to solve it? > Any suggestion or idea will be appreciated. > The only problem with locking I saw so far is that you have to make sure that the temp dir is the same for all applications. Lucene 1.3 stores its lock in the directory that is defin