date:20040924

Re: Strange search results with wildcard - Bug?

2004-09-24 Thread Ulrich Mayring

Daniel Naber wrote: AND always refers to the terms on both sides, +/- only refers to the term on the right. So a AND b -> +a +b is correct. *slap forehead* - you're right. Wasn't there something about operator precedence way back when ;-) Anyway, thanks to my stupidity and the help on this

Re: Strange search results with wildcard - Bug?

2004-09-24 Thread Morus Walter

Ulrich Mayring writes: Daniel Naber wrote: AND always refers to the terms on both sides, +/- only refers to the term on the right. So a AND b -> +a +b is correct. *slap forehead* - you're right. Wasn't there something about operator precedence way back when ;-) Yes. January. And
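The rule quoted in this thread (AND marks the terms on both of its sides as required) can be illustrated with a toy rewrite. This is only a sketch of that one rule, not Lucene's QueryParser: the class name AndRewriteSketch and the method rewrite are made up for illustration, and it assumes a whitespace-separated query containing only plain terms, AND and OR.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of Lucene: applies only the rule quoted above.
class AndRewriteSketch {

    // Rewrites e.g. "a AND b" to "+a +b": each term next to an AND becomes required.
    static String rewrite(String query) {
        String[] tokens = query.trim().split("\\s+");
        List<String> terms = new ArrayList<>();
        List<Boolean> required = new ArrayList<>();
        boolean pendingAnd = false; // true if the previous token was AND

        for (String tok : tokens) {
            if (tok.equals("AND")) {
                // AND refers to the term on its left...
                if (!terms.isEmpty()) {
                    required.set(required.size() - 1, true);
                }
                // ...and to the term on its right.
                pendingAnd = true;
            } else if (tok.equals("OR")) {
                // OR leaves both neighbours optional in this sketch.
                pendingAnd = false;
            } else {
                terms.add(tok);
                required.add(pendingAnd);
                pendingAnd = false;
            }
        }

        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < terms.size(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            if (required.get(i)) {
                sb.append('+');
            }
            sb.append(terms.get(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(rewrite("a AND b"));
        System.out.println(rewrite("a AND b OR c"));
    }
}
```

Under these assumptions, "a AND b OR c" comes out as "+a +b c", which shows why mixing AND with other operators can give surprising results when no precedence is applied.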

Iterate through the TermFreqVector

2004-09-24 Thread William Lee

Is there a simple way to iterate through all the documents to get their TermFreqVectors? Do I need to write a custom IndexSearcher for this? Or can I just get an enumeration of the document ID and call IndexReader.getTermFreqVector(int)? Thanks, Will -- William (Will) Lee Email: [EMAIL

Re: problem with get/setBoost of document fields

2004-09-24 Thread Bastian Grimm [Eastbeam GmbH]

thanks doug, that works... but i have to do this setNorm() for each document, which has been indexed up to now, right? there are round about 1 mio. docs in the index... i dont think it's a good idea to perform a search and do it for every doc (and every field of the doc...). is there any

Using Proximity for Ranking

2004-09-24 Thread Olena Medelyan

Dear Lucene-Users, is there any possibility to use proximity for long queries (10 and more terms) automatically? I need a kind of ranking feature, that would give higher relevance scores to those documents, that contain query terms (or some of query terms) with a lower distance between them. I

RE: Using Proximity for Ranking

2004-09-24 Thread Chong, Herb

not without changing the contents of the index structure to store word locations. Herb... -Original Message- From: Olena Medelyan [mailto:[EMAIL PROTECTED] Sent: Friday, September 24, 2004 9:28 AM To: Lucene Users List Subject: Using Proximity for Ranking Dear Lucene-Users, is there

Re: Using Proximity for Ranking

2004-09-24 Thread Daniel Naber

On Friday 24 September 2004 15:27, Olena Medelyan wrote: I know that I can use the slop operator for phrase search ("red fox"~3), but what I need should work for partial matching as well. You can use the value of Integer.MAX_VALUE instead of 3 in your example, something like: +red +fox +red

not tokenized fields

2004-09-24 Thread Wermus Fernando

Luceners, When a field is not tokenized should I replace every space for a ?? I'm looking up for : my dear If I test with luke, it splits the words in 'my' and 'dear'. So I can't find in my not tokenized field. The same happens for my dear In these case I don't know why it

Keyword query confusion

2004-09-24 Thread Fred Toth

Hi all, I'm trying to understand what's going on with the query parser and keyword fields. I've got a large subset of my documents which are publications. So as to be able to query these, I've got this in the indexer: doc.add(Field.Keyword("is_pub", "1")); However, if I run a query: is_pub:1 I

RE: Power Point Processing

2004-09-24 Thread Zhang, Lisheng

Hi, Thanks very much for helps, I will try that. Best regards, Lisheng -Original Message- From: Magnus Johansson [mailto:[EMAIL PROTECTED] Sent: Thursday, September 23, 2004 11:15 PM To: Lucene Users List Subject: Re: Power Point Processing I've had some success with the code found at

RE: Keyword query confusion

2004-09-24 Thread Aviran

The StandardAnalyzer removes the 1 as it is a stop word. There are two ways you can work around this problem. 1 as you mentioned is to create a Query object programmatically. 2 You can use WhiteSpace Analyzer instead of StandardAnalyzer. Aviran -Original Message- From: Fred Toth

Questions related to closing the searcher

2004-09-24 Thread Edwin Tang

Thanks for the tip. However, since the index is constantly updated, I won't have to check whether it has changed. I'm just puzzled as to why I'm running out of memory when I'm closing the searcher, setting it to null, running the garbage collector, then getting a new searcher. Ed --- [EMAIL

demo IndexHTML parser breaks unicode?

2004-09-24 Thread Fred Toth

Hi, I was hoping it wouldn't come to this: I've got unicode in my source HTML. In particular, within meta tags, and it's getting broken by the indexer. Note that I'm not trying to query on any of this, just store and retrieve document titles with unicode characters. Has anyone else experienced

Re: demo IndexHTML parser breaks unicode?

2004-09-24 Thread Fred Toth

Sorry, that didn't cure it. Again, anyone want to point me to the quickest replacement HTML parser (that's unicode clean)? Thanks, Fred At 03:17 PM 9/24/2004, you wrote: On Friday 24 September 2004 19:58, Fred Toth wrote: I've got unicode in my source HTML. In particular, within meta tags, and

RE: demo IndexHTML parser breaks unicode?

2004-09-24 Thread wallen

In org.apache.lucene.demo.HTMLDocument you need to change the input stream to use a different encoding. Replace the fis with this: fis = new InputStreamReader(new FileInputStream(f), "UTF-16"); -Original Message- From: Fred Toth [mailto:[EMAIL PROTECTED] Sent: Friday, September 24, 2004

RE: demo IndexHTML parser breaks unicode?

2004-09-24 Thread Fred Toth

Hi, Thanks for the tip, but that didn't work in my case. Presumably with this patch, and the changes in CVS, this makes the parser work with UTF-16. I can't really tell because the index appears now to be completely UTF-16 and I can't search for anything. My input is actually UTF-8 anyway, and if
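The encoding mismatch discussed in this thread can be reproduced with the Java standard library alone. The following is a minimal sketch, assuming the input bytes are UTF-8 as Fred describes; the class name CharsetDemo and the method decode are made up for illustration and are not part of the Lucene demo code. It shows that the charset name passed to InputStreamReader has to match the actual encoding of the bytes.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not taken from org.apache.lucene.demo.
class CharsetDemo {

    // Reads the whole stream and decodes it with the named charset.
    static String decode(InputStream in, String charsetName) {
        try {
            Reader reader = new InputStreamReader(in, charsetName);
            StringBuilder sb = new StringBuilder();
            char[] buf = new char[1024];
            int n;
            while ((n = reader.read(buf)) != -1) {
                sb.append(buf, 0, n);
            }
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // A title with non-ASCII characters, encoded as UTF-8 bytes.
        String title = "Gr\u00f6\u00dfe";
        byte[] utf8 = title.getBytes(StandardCharsets.UTF_8);

        // Matching charset: the text survives the round trip.
        String ok = decode(new ByteArrayInputStream(utf8), "UTF-8");
        System.out.println(ok.equals(title));

        // Mismatched charset: the same bytes decode to different characters.
        String bad = decode(new ByteArrayInputStream(utf8), "UTF-16");
        System.out.println(bad.equals(title));
    }
}
```

For UTF-8 input, the corresponding change to the reader suggested earlier in the thread would presumably pass "UTF-8" rather than "UTF-16"; whether that alone fixes the demo parser is not confirmed here.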

16 matches
