SpanQuery scoring seems different

2008-04-02 Thread Cedric Ho
Hi all, It seems that SpanNearQuery doesn't consider the boosting of the nested terms: 1.334 = (MATCH) weight(spanNear([content2MBM:morgan^4.0, content2MBM:stanley^4.0], 2, true) in 11976), product of: 2.0 = queryWeight(spanNear([content2MBM:morgan^4.0, content2MBM:stanley^4.0], 2,

Re: SpanQuery scoring seems different

2008-04-02 Thread Cedric Ho
And I just found an old JIRA issue which might explain this behavior: LUCENE-533 http://www.archivum.info/[EMAIL PROTECTED]/2006-03/msg00265.html Cedric On Wed, Apr 2, 2008 at 3:15 PM, Cedric Ho [EMAIL PROTECTED] wrote: Hi all, It seems that SpanNearQuery doesn't consider the boosting of

How to reconstruct field value from index ?

2008-04-02 Thread wuqi
Hi, I want to reconstruct a field value from the index, the same way the Reconstruct and Edit function in the Luke tool does. Any hints are welcome. Thanks in advance. Thanks -qi

Re: stemming in Lucene

2008-04-02 Thread Mathieu Lecarme
Wojtek H wrote: Hi all, Snowball stemmers are part of Lucene, but for only a few languages. We have documents in various languages and so need stemmers for many languages (in particular Polish). One of the ideas is to use ispell dictionaries. There are ispell dicts for many languages and so

Re: Lucene Compression

2008-04-02 Thread Grant Ingersoll
It's generally considered best practice to compress things first in your app and then add them as a binary field. That being said, I don't see why that would blow up on its own. Have you tried compressing it outside of Lucene to see what happens? If you can reproduce it as a test case

Re: Lucene Compression

2008-04-02 Thread eks dev
The example you have sent is too small for the type of compression implemented in Lucene. The problem is that you have to store the decoding symbol table, header, ... *for each* document you compress. The best you can do for this would be to use some compressor with static decoding table (some
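A minimal sketch of the "compress in your app first" suggestion above, using only java.util.zip from the JDK. The class name FieldCompression and its methods are made up for illustration and are not Lucene API; the resulting byte[] is what you would then hand to Lucene as a binary stored field (check the Field Javadoc for your version for the exact constructor). The main method also prints the sizes for a very short input, which shows the per-document overhead mentioned above.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical helper class, not part of Lucene.
class FieldCompression {

    // Compress a byte array with DEFLATE (zlib wrapper) at the highest level.
    static byte[] compress(byte[] input) {
        Deflater deflater = new Deflater(Deflater.BEST_COMPRESSION);
        try {
            deflater.setInput(input);
            deflater.finish();
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!deflater.finished()) {
                int n = deflater.deflate(buffer);
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } finally {
            deflater.end(); // release native resources
        }
    }

    // Reverse of compress(); wraps format errors in an unchecked exception.
    static byte[] decompress(byte[] input) {
        Inflater inflater = new Inflater();
        try {
            inflater.setInput(input);
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[1024];
            while (!inflater.finished()) {
                int n = inflater.inflate(buffer);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalArgumentException("truncated or unsupported input");
                }
                out.write(buffer, 0, n);
            }
            return out.toByteArray();
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not valid DEFLATE data", e);
        } finally {
            inflater.end();
        }
    }

    public static void main(String[] args) {
        byte[] tiny = "abc".getBytes(StandardCharsets.UTF_8);
        byte[] big = String.join("", Collections.nCopies(1000, "morgan stanley "))
                .getBytes(StandardCharsets.UTF_8);
        // Very short values can come out larger than they went in,
        // because every compressed value carries its own header and trailer.
        System.out.println("tiny: " + tiny.length + " -> " + compress(tiny).length);
        System.out.println("big:  " + big.length + " -> " + compress(big).length);
    }
}
```

Running main on your own field values, outside of Lucene, is a quick way to see whether compression pays off at all for documents of that size.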

Re: How to reconstruct field value from index ?

2008-04-02 Thread Karl Wettin
wuqi wrote: Hi, I want to reconstruct a field value from the index, the same way the Reconstruct and Edit function in the Luke tool does. Any hints are welcome. Thanks in advance. Thanks -qi http://issues.apache.org/jira/browse/LUCENE-1016 karl

Re: How to reconstruct field value from index ?

2008-04-02 Thread wuqi
Thank you Karl. I am just interested in how Luke reconstructs a field that is unsorted and has no TermVector. It seems Luke has to iterate over all the terms in the index and check whether each term is contained in the document. - Original Message - From: Karl Wettin [EMAIL

Sorting VS Scoring

2008-04-02 Thread John Xiao
I'm trying to figure out what the best practice is in terms of using sorting or customized scoring. For example, if I want to index some static pages and rank them by how many times a page is viewed. I can get the page view counters and store them in the index document as a field COUNTER. I

Re: How to reconstruct field value from index ?

2008-04-02 Thread Andrzej Bialecki
wuqi wrote: Thank you Karl. I am just interested in how Luke reconstructs a field that is unsorted and has no TermVector. It seems Luke has to iterate over all the terms in the index and check whether each term is contained in the document. Correct, that's exactly how this function works in
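The approach confirmed above (walk every term in the index and check whether it occurs in the target document) can be sketched on a toy in-memory inverted index. This is not Luke's code and uses no Lucene classes: ReconstructSketch, index() and reconstruct() are hypothetical names, the whitespace-and-lowercase "analyzer" is a stand-in, and the use of recorded term positions to restore word order is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Toy model of an inverted index; hypothetical, not Lucene or Luke code.
class ReconstructSketch {

    // Build postings: term -> (docId -> positions of the term in that doc).
    static Map<String, Map<Integer, List<Integer>>> index(String[] docs) {
        Map<String, Map<Integer, List<Integer>>> postings = new TreeMap<>();
        for (int docId = 0; docId < docs.length; docId++) {
            // Stand-in analyzer: lowercase, split on whitespace.
            String[] tokens = docs[docId].toLowerCase().split("\\s+");
            for (int pos = 0; pos < tokens.length; pos++) {
                postings.computeIfAbsent(tokens[pos], t -> new TreeMap<>())
                        .computeIfAbsent(docId, d -> new ArrayList<>())
                        .add(pos);
            }
        }
        return postings;
    }

    // Visit every term; keep those that occur in docId, ordered by position.
    static String reconstruct(Map<String, Map<Integer, List<Integer>>> postings, int docId) {
        TreeMap<Integer, String> byPosition = new TreeMap<>();
        for (Map.Entry<String, Map<Integer, List<Integer>>> entry : postings.entrySet()) {
            List<Integer> positions = entry.getValue().get(docId);
            if (positions == null) {
                continue; // this term does not occur in the document
            }
            for (int pos : positions) {
                byPosition.put(pos, entry.getKey());
            }
        }
        return String.join(" ", byPosition.values());
    }

    public static void main(String[] args) {
        String[] docs = {"Morgan Stanley hires", "the morgan library"};
        Map<String, Map<Integer, List<Integer>>> postings = index(docs);
        System.out.println(reconstruct(postings, 0));
    }
}
```

Two things fall out of the sketch: the cost grows with the number of distinct terms in the whole index rather than with the size of one document, and what comes back is the analyzed tokens (lowercased here), not the original text.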

Re: Sorting VS Scoring

2008-04-02 Thread Erick Erickson
The problem here is that you'll have to keep deleting and adding your documents in order to update the counter field for all of these solutions, and I doubt that's what you really want to do. There is much discussion of updating a document that's already in the index, but I don't think it's there

Adding attribute to index

2008-04-02 Thread Nitasha Walia (niwalia)
Hi, I am a new user of Java Lucene and need to learn how to add a new attribute, such that, given a database of emails, containing sender information, searching for a keyword, results in 1. The sender of the email 2. The email. I am using Lucene-2.3.1, and don't know where to start in the

RE: Sorting VS Scoring

2008-04-02 Thread John Xiao
Updating the index is easy; I can have a background thread do it. Someone mentioned the searchable archive before; where do I find it? Thanks, -Original Message- From: Erick Erickson [mailto:[EMAIL PROTECTED] Sent: Wednesday, April 02, 2008 11:12 AM To: java-user@lucene.apache.org

Re: Adding attribute to index

2008-04-02 Thread Donna L Gresh
This is fast and loose code (from my head; check the syntax). I *highly* recommend you get a copy of the book Lucene in Action; it will really help. To create the index, add a document with two fields; one for the sender and one for the email text. IndexWriter indexWriter = new

Re: Sorting VS Scoring

2008-04-02 Thread Erick Erickson
Try http://www.nabble.com/Lucene---Java-Users-f45.html Updating may not be as easy as you think. You'll be changing your index *every* time a user accesses it. And your changes won't be seen until you close/reopen the searcher. But maybe you've worked it out already Erick On Wed, Apr 2,

Re: Adding attribute to index

2008-04-02 Thread Michael Wechner
Nitasha Walia (niwalia) wrote: Hi, I am a new user of Java Lucene and need to learn how to add a new attribute, such that, given a database of emails, containing sender information, searching for a keyword, results in what kind of database do you use to store your emails? I am asking

Unicode Tokenizer problem with Registered Trademark Search

2008-04-02 Thread Bruce.Nawrocki
I am having a problem when searching for certain Unicode characters, such as the Registered Trademark. That's the Unicode character 00AE. It's also a problem searching for a Japanese Yen symbol (Unicode character 00A5). I'm using the Lucene 2.0.0 jar file, and we used to use Lucene 1.4.2 jar

RE: Problems about using Lucene to generate tag cloud..

2008-04-02 Thread Dominique Béjean
Hmm, it looks like that is not true. Using a do-while loop makes the first terms.term().field() call throw a null pointer exception. -Original Message- From: Daniel Noll [mailto:[EMAIL PROTECTED] Sent: Tuesday, April 1, 2008 23:58 To: java-user@lucene.apache.org Subject: Re: Problems about using

RE: Unicode Tokenizer problem with Registered Trademark Search

2008-04-02 Thread Steven A Rowe
Hi Bruce, On 04/02/2008 at 4:58 PM, [EMAIL PROTECTED] wrote: I am having a problem when searching for certain Unicode characters, such as the Registered Trademark. That's the Unicode character 00AE. It's also a problem searching for a Japanese Yen symbol (Unicode character 00A5). I'm using

Re: Problems about using Lucene to generate tag cloud..

2008-04-02 Thread Daniel Noll
On Thursday 03 April 2008 08:08:09 Dominique Béjean wrote: Hmm, it looks like that is not true. Using a do-while loop makes the first terms.term().field() call throw a null pointer exception. Depends which terms method you use. TermEnum terms = reader.terms();