Re: Strange search results with wildcard - Bug?

2004-09-24 Thread Ulrich Mayring
Daniel Naber wrote: AND always refers to the terms on both sides, +/- only refers to the term on the right. So a AND b -> +a +b is correct. *slap forehead* - you're right. Wasn't there something about operator precedence way back when ;-) Anyway, thanks to my stupidity and the help on this

Re: Strange search results with wildcard - Bug?

2004-09-24 Thread Morus Walter
Ulrich Mayring writes: Daniel Naber wrote: AND always refers to the terms on both sides, +/- only refers to the term on the right. So a AND b -> +a +b is correct. *slap forehead* - you're right. Wasn't there something about operator precedence way back when ;-) Yes. January. And

Iterate through the TermFreqVector

2004-09-24 Thread William Lee
Is there a simple way to iterate through all the documents to get their TermFreqVectors? Do I need to write a custom IndexSearcher for this? Or can I just get an enumeration of the document ID and call IndexReader.getTermFreqVector(int)? Thanks, Will -- William (Will) Lee Email: [EMAIL

Re: problem with get/setBoost of document fields

2004-09-24 Thread Bastian Grimm [Eastbeam GmbH]
thanks doug, that works... but i have to do this setNorm() for each document that has been indexed up to now, right? there are roughly 1 million docs in the index... i don't think it's a good idea to perform a search and do it for every doc (and every field of the doc...). is there any

Using Proximity for Ranking

2004-09-24 Thread Olena Medelyan
Dear Lucene-Users, is there any possibility to use proximity for long queries (10 and more terms) automatically? I need a kind of ranking feature, that would give higher relevance scores to those documents, that contain query terms (or some of query terms) with a lower distance between them. I

RE: Using Proximity for Ranking

2004-09-24 Thread Chong, Herb
not without changing the contents of the index structure to store word locations. Herb... -Original Message- From: Olena Medelyan [mailto:[EMAIL PROTECTED] Sent: Friday, September 24, 2004 9:28 AM To: Lucene Users List Subject: Using Proximity for Ranking Dear Lucene-Users, is there

Re: Using Proximity for Ranking

2004-09-24 Thread Daniel Naber
On Friday 24 September 2004 15:27, Olena Medelyan wrote: I know that I can use the slop operator for phrase search ("red fox"~3), but what I need should work for partial matching as well. You can use the value of Integer.MAX_VALUE instead of 3 in your example, something like: +red +fox +red

not tokenized fields

2004-09-24 Thread Wermus Fernando
Luceners, When a field is not tokenized, should I replace every space with a ?? I'm searching for: my dear. If I test with Luke, it splits the words into 'my' and 'dear', so I can't find the value in my untokenized field. The same happens for my dear In this case I don't know why it

Keyword query confusion

2004-09-24 Thread Fred Toth
Hi all, I'm trying to understand what's going on with the query parser and keyword fields. I've got a large subset of my documents which are publications. So as to be able to query these, I've got this in the indexer: doc.add(Field.Keyword("is_pub", "1")); However, if I run a query: is_pub:1 I

RE: Power Point Processing

2004-09-24 Thread Zhang, Lisheng
Hi, Thanks very much for helps, I will try that. Best regards, Lisheng -Original Message- From: Magnus Johansson [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 11:15 PM To: Lucene Users List Subject: Re: Power Point Processing I've had some success with the code found at

RE: Keyword query confusion

2004-09-24 Thread Aviran
The StandardAnalyzer removes the 1 as it is a stop word. There are two ways you can work around this problem: 1. As you mentioned, create a Query object programmatically. 2. Use WhitespaceAnalyzer instead of StandardAnalyzer. Aviran -Original Message- From: Fred Toth

Questions related to closing the searcher

2004-09-24 Thread Edwin Tang
Thanks for the tip. However, since the index is constantly updated, I won't have to check whether it has changed. I'm just puzzled as to why I'm running out of memory when I'm closing the searcher, setting it to null, running the garbage collector, then getting a new searcher. Ed --- [EMAIL

demo IndexHTML parser breaks unicode?

2004-09-24 Thread Fred Toth
Hi, I was hoping it wouldn't come to this: I've got unicode in my source HTML. In particular, within meta tags, and it's getting broken by the indexer. Note that I'm not trying to query on any of this, just store and retrieve document titles with unicode characters. Has anyone else experienced

Re: demo IndexHTML parser breaks unicode?

2004-09-24 Thread Fred Toth
Sorry, that didn't cure it. Again, anyone want to point me to the quickest replacement HTML parser (that's unicode clean)? Thanks, Fred At 03:17 PM 9/24/2004, you wrote: On Friday 24 September 2004 19:58, Fred Toth wrote: I've got unicode in my source HTML. In particular, within meta tags, and

RE: demo IndexHTML parser breaks unicode?

2004-09-24 Thread wallen
In org.apache.lucene.demo.HTMLDocument you need to change the input stream to use a different encoding. Replace the fis with this: fis = new InputStreamReader(new FileInputStream(f), "UTF-16"); -Original Message- From: Fred Toth [mailto:[EMAIL PROTECTED] Sent: Friday, September 24, 2004

RE: demo IndexHTML parser breaks unicode?

2004-09-24 Thread Fred Toth
Hi, Thanks for the tip, but that didn't work in my case. Presumably with this patch, and the changes in CVS, this makes the parser work with UTF-16. I can't really tell because the index appears now to be completely UTF-16 and I can't search for anything. My input is actually UTF-8 anyway, and if
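
A minimal sketch of why the charset passed to InputStreamReader has to match the bytes in the file, which may be what is going on in this thread. It uses only the JDK, not Lucene or the demo HTML parser; the class name EncodingCheck, the helper readAll and the sample title are made up for illustration. It decodes the same UTF-8 bytes once as UTF-8 and once as UTF-16:

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; not part of Lucene or its demo package.
public class EncodingCheck {

    // Decode a byte array with the named charset, the same way the
    // InputStreamReader(InputStream, String) constructor would for a file.
    static String readAll(byte[] bytes, String charsetName) throws IOException {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(bytes), charsetName)) {
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[256];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumed sample title with non-ASCII characters, stored as UTF-8 bytes.
        String title = "Caf\u00e9 M\u00fcnchen";
        byte[] utf8Bytes = title.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the title survives the round trip.
        System.out.println("UTF-8 round trip intact:  " + readAll(utf8Bytes, "UTF-8").equals(title));

        // Mismatched charset: byte pairs are read as 16-bit code units, so the text is garbled.
        System.out.println("UTF-16 round trip intact: " + readAll(utf8Bytes, "UTF-16").equals(title));
    }
}
```

If the source HTML really is UTF-8, the same constructor with "UTF-8" in place of "UTF-16" would be the variant to try; whether that alone is enough for the demo parser is not confirmed in this thread.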