[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
Add a scoring DistanceQuery that does not need caches and separate filters > -- > > Key: LUCENE-2395 > URL: https://issues.apache.org/jira/browse/LUCENE-2395 > Proj

[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2395: -- Attachment: (was: DistanceQuery.java) > Add a scoring DistanceQuery that does not n

[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2395?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler updated LUCENE-2395: -- Attachment: DistanceQuery.java small updates to Chris' patches. > Add a

[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
classes are missing (coming with Chris' later patches), but it shows how it should work and how it's customizable. > Add a scoring DistanceQuery that does not need caches and separate filters > -- > >

[jira] Updated: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
thought about the broken distance query in contrib. It lacks the following features: - It needs a query/filter for the enclosing bbox (which is constant score) - It needs a separate filter for filtering out hits too far away (inside bbox but outside distance limit) - It has no scoring, so if somebody
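
The shortcomings listed above (a separate bbox query, a separate distance filter, no scoring) suggest computing the distance once per document and using it both to reject and to score. The following is only a sketch under stated assumptions, not the attached DistanceQuery.java: the class name, the haversine distance, and the linear falloff are all hypothetical choices made for illustration.

```java
public final class DistanceScoreSketch {
    // Mean Earth radius in kilometres (spherical approximation, an assumption of this sketch).
    static final double EARTH_RADIUS_KM = 6371.0;

    /** Great-circle distance in km between two points given in degrees (haversine formula). */
    public static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        // Clamp to 1.0 so rounding error can never push asin() out of its domain.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /**
     * Hypothetical scoring rule: 1.0 at the centre, falling linearly to 0.0 at maxKm.
     * Documents beyond maxKm get 0.0, which a caller would treat as "no match".
     */
    public static double score(double centerLat, double centerLon,
                               double docLat, double docLon, double maxKm) {
        double d = haversineKm(centerLat, centerLon, docLat, docLon);
        return d > maxKm ? 0.0 : 1.0 - d / maxKm;
    }

    public static void main(String[] args) {
        // One degree of longitude on the equator: inside a 200 km limit.
        System.out.println(score(0.0, 0.0, 0.0, 1.0, 200.0));
        // Five degrees of longitude on the equator: outside the limit.
        System.out.println(score(0.0, 0.0, 0.0, 5.0, 200.0));
    }
}
```

Because rejection and scoring share one distance computation per document, nothing has to be cached between a filter pass and a scoring pass in this toy model.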

[jira] Commented: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Chris Male (JIRA)
sses the current problems with caching calculated distances and means that Spatial will work with per segment. > Add a scoring DistanceQuery that does not need caches and separate filters > -- > >

[jira] Created: (LUCENE-2395) Add a scoring DistanceQuery that does not need caches and separate filters

2010-04-15 Thread Uwe Schindler (JIRA)
Add a scoring DistanceQuery that does not need caches and separate filters -- Key: LUCENE-2395 URL: https://issues.apache.org/jira/browse/LUCENE-2395 Project: Lucene - Java

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Robert Muir (JIRA)
why have it ordered at all? > Enable flexible scoring > --- > > Key: LUCENE-2392 > URL: https://issues.apache.org/jira/browse/LUCENE-2392 > Project: Lucene - Java > Issue Type: Improvement >

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera (JIRA)
above. I misunderstood that the stats I need are stored per-field per-doc. So that will allow me to compute the docLength as I want. > Enable flexible scoring > --- > > Key: LUCENE-2392 > URL: https://issues.apache.org

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Michael McCandless (JIRA)
the "baby steps" part of the original thread). Ie, the IR world seems to have converged on a smallish set of "stats" that are commonly required, so I'd like to make those initial stats work well, for starters. Commit that (it enables all sorts of state of the art scoring mo

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Michael McCandless (JIRA)
(~171 words per doc on avg). > Enable flexible scoring > --- > > Key: LUCENE-2392 > URL: https://issues.apache.org/jira/browse/LUCENE-2392 > Project: Lucene - Java > Issue Type: Improvement >

Re: [jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera
e doc Length as one perceives it. Why is that problematic? What Mike opened is an issue titled "enable flexible scoring" ... what I'm asking for falls under that hood? Also, maybe we should have that discussion on the issue? Shai On Mon, Apr 12, 2010 at 11:31 AM, Robert Muir wrot

Re: [jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Robert Muir
ow that length is computed. Wherever we write the norms, we'll > call that impl, which by default will do what Lucene does today? > I think though that it's not a field-level setting, but an IW one? > > > Enable flexible scoring > > --- > >

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-12 Thread Shai Erera (JIRA)
uted. Wherever we write the norms, we'll call that impl, which by default will do what Lucene does today? I think though that it's not a field-level setting, but an IW one? > Enable flexible scoring > --- > > Key: LUCENE-2392 >

[jira] Commented: (LUCENE-2392) Enable flexible scoring

2010-04-11 Thread Robert Muir (JIRA)
d the discountOverlaps=false (no longer the default) should be considered deprecated compatibility behavior :) > Enable flexible scoring > --- > > Key: LUCENE-2392 > URL: https://issues.apache.org/jira/browse/LUCENE-2392 >

[jira] Updated: (LUCENE-2392) Enable flexible scoring

2010-04-11 Thread Michael McCandless (JIRA)
ble scoring > --- > > Key: LUCENE-2392 > URL: https://issues.apache.org/jira/browse/LUCENE-2392 > Project: Lucene - Java > Issue Type: Improvement > Components: Search >R

[jira] Created: (LUCENE-2392) Enable flexible scoring

2010-04-11 Thread Michael McCandless (JIRA)
Enable flexible scoring --- Key: LUCENE-2392 URL: https://issues.apache.org/jira/browse/LUCENE-2392 Project: Lucene - Java Issue Type: Improvement Components: Search Reporter: Michael McCandless

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-29 Thread Michael McCandless
> "flexible matching", which is more expansive than "flexible scoring"? >> >> I think so.  Maybe it shouldn't be called a Similarity (which to me >> (though, carrying a heavy curse of knowledge burden...) means >> "scoring")?  Matcher?

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-29 Thread Michael McCandless
he way a field is tokenized is part of its field definition, thus > the Analyzer is part of the field definition, thus the analyzer is part of the > schema and needs to be stored with the index. OK. > Still, we support different Analyzers at search time by way of QueryParser. > QueryParser

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-26 Thread Marvin Humphrey
On Thu, Mar 25, 2010 at 06:24:34AM -0400, Michael McCandless wrote: > > Maybe aggressive automatic data-reduction makes more sense in the context of > > "flexible matching", which is more expansive than "flexible scoring"? > > I think so. Maybe it shouldn

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-25 Thread Marvin Humphrey
ers at search time by way of QueryParser. QueryParser's constructor requires a Schema, but also accepts an optional Analyzer which if supplied will be used instead of the Analyzers from the Schema. > > Maybe aggressive automatic data-reduction makes more sense in the context of >

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-25 Thread Michael McCandless
>> Ie so the chosen Sim can properly recompute all boost bytes (if it uses >> those), for scoring models that "pivot" based on avg's of these stats? > > Yes, we could support that. > > It's not high on my todo-list for core Lucy, though: poor payoff for

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-03-24 Thread Katja Hofmann (JIRA)
back! The change you suggested works; now it compiles without problems (I used lucene 3.0.1) Best regards, Katja > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-22 Thread Marvin Humphrey
osen Sim can properly recompute all boost bytes (if it uses > those), for scoring models that "pivot" based on avg's of these stats? Yes, we could support that. It's not high on my todo-list for core Lucy, though: poor payoff for all the complexity it would introduce,

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-03-22 Thread Yuval Feinstein (JIRA)
dd BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091 > Project: Lucene - Java > Issue Type: New Feature > Components: contrib/* >

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-03-19 Thread Katja Hofmann (JIRA)
[javac] float fieldNorm = this.getSimilarity().decodeNormValue(norms[i][this.docID()]); [javac] > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-20

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-18 Thread Michael McCandless
On Mon, Mar 15, 2010 at 7:49 PM, Marvin Humphrey wrote: > On Mon, Mar 15, 2010 at 05:28:33AM -0500, Michael McCandless wrote: >> I mean specifically one should not have to commit to the precise >> scoring model they will use for a given field, when they index that >> field.

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-15 Thread Marvin Humphrey
On Mon, Mar 15, 2010 at 05:28:33AM -0500, Michael McCandless wrote: > I mean specifically one should not have to commit to the precise > scoring model they will use for a given field, when they index that > field. Yeah, I've never seen committing to a precise scoring model at inde

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-15 Thread Robert Muir
>>> But I don't like baking in search concepts at index time... >> > Many scoring models are possible if you store enough stats in the > index. > in general the missing stats seem to fit in two buckets/categories: 1) length normalization pivot: average length in

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-15 Thread Michael McCandless
pts baked in, and a flat file would > be best. :) > > Seriously... optimizing on-disk data structures to accommodate anticipated > search query patterns and maximize speed and relevance... that's what > indexing's all about, ain't it? You're over-reading into

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-14 Thread Marvin Humphrey
est. :) Seriously... optimizing on-disk data structures to accommodate anticipated search query patterns and maximize speed and relevance... that's what indexing's all about, ain't it? And what class other than Similarity knows enough about the scoring algorithm to perform these d

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-13 Thread Michael McCandless
ty judgments. >> > However, that polymorphism would be handled internally -- it wouldn't be >> > the >> > responsibility of the user to determine whether a codec supported a >> > particular >> > scoring model. >> >> Is that "yes"

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-13 Thread Michael McCandless
On Thu, Mar 11, 2010 at 12:35 PM, Marvin Humphrey wrote: > On Mon, Mar 08, 2010 at 02:10:35PM -0500, Michael McCandless wrote: > >> We ask it to give us a Codec. > > There's a conflict between the segment-wide role of the "Codec" class and its > role as specifier for posting format. > > In some se

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-12 Thread Marvin Humphrey
ternally -- it wouldn't be the > > responsibility of the user to determine whether a codec supported a > > particular > > scoring model. > > Is that "yes" (a user can do MatchOnlySim at search time if the field > were indexed with BM25Sim)? In ess

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-11 Thread Marvin Humphrey
On Mon, Mar 08, 2010 at 02:10:35PM -0500, Michael McCandless wrote: > We ask it to give us a Codec. There's a conflict between the segment-wide role of the "Codec" class and its role as specifier for posting format. In some sense, you could argue that the "codec" reads/writes the entire index se

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-11 Thread Michael McCandless
ng, but then sometimes >> >> use match-only and sometimes full-scoring when querying against that >> >> field? >> > >> > The same way that Lucene knows that sometimes it needs a docs-only-enum and >> > sometimes it needs a docs-and-positions enum.

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-09 Thread Marvin Humphrey
On Tue, Mar 09, 2010 at 01:18:12PM -0500, Michael McCandless wrote: > > >> You said "of course" before but... how in your proposal could one > >> store all stats for a given field during indexing, but then sometimes > >> use match-only and sometimes

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-09 Thread Michael McCandless
, it wasn't enforced. OK. >> You said "of course" before but... how in your proposal could one >> store all stats for a given field during indexing, but then sometimes >> use match-only and sometimes full-scoring when querying against that >> field? > > Th

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-09 Thread Marvin Humphrey
> store all stats for a given field during indexing, but then sometimes > use match-only and sometimes full-scoring when querying against that > field? The same way that Lucene knows that sometimes it needs a docs-only-enum and sometimes it needs a docs-and-positions enum. Sometimes you ne

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-09 Thread Michael McCandless
roximately the same amount of work no matter how you time-shift it. Yes. >> I do agree there's some connection -- if I don't store tf nor >> positions then I can't use a Sim that needs these stats. >> >> > I also like the idea of novice/intermediate users

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Marvin Humphrey
these stats. > > > I also like the idea of novice/intermediate users being able to express the > > intent for how a field gets scored by choosing a Similarity subclass, > > without > > having to worry about the underlying details of posting format. > > Well.. I

RE: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Steven A Rowe
On 03/08/2010 at 2:10 PM, Michael McCandless wrote: > On Mon, Mar 8, 2010 at 2:07 PM, Steven A Rowe wrote: > > On 03/08/2010 at 1:57 PM, Steven A Rowe wrote: > > > On 03/08/2010 at 1:13 PM, Michael McCandless wrote: > > > > On Sun, Mar 7, 2010 at 1:21 PM, Marvin Humphrey > > > > wrote: > > > > >

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Michael McCandless
On Mon, Mar 8, 2010 at 2:07 PM, Steven A Rowe wrote: > On 03/08/2010 at 1:57 PM, Steven A Rowe wrote: >> On 03/08/2010 at 1:13 PM, Michael McCandless wrote: >> > On Sun, Mar 7, 2010 at 1:21 PM, Marvin Humphrey >> > wrote: >> > > On Sat, Mar 06, 2010 at 05:07:18AM -0500, Michael McCandless wrote:

RE: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Steven A Rowe
On 03/08/2010 at 1:57 PM, Steven A Rowe wrote: > On 03/08/2010 at 1:13 PM, Michael McCandless wrote: > > On Sun, Mar 7, 2010 at 1:21 PM, Marvin Humphrey > > wrote: > > > On Sat, Mar 06, 2010 at 05:07:18AM -0500, Michael McCandless wrote: > > > > > What's the flex API for specifying a custom postin

RE: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Steven A Rowe
On 03/08/2010 at 1:13 PM, Michael McCandless wrote: > On Sun, Mar 7, 2010 at 1:21 PM, Marvin Humphrey > wrote: > > On Sat, Mar 06, 2010 at 05:07:18AM -0500, Michael McCandless wrote: > > > > What's the flex API for specifying a custom posting format? > > > > > > You implement a Codecs class, whic

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-08 Thread Michael McCandless
> be the class Lucene uses to get reader/writer for other parts of the >> index. > > Huh? What does the posting format specifier have to do with e.g. stored > fields? > > What you're describing sounds more like the Architecture class in KinoSearch. OK. >> I'

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-07 Thread Marvin Humphrey
> I'm a little confused: if I indexed a field with full postings data, > shouldn't I still be allowed to score with match-only scoring? Of course. > When a movie is encoded to a file, the codec(s) determine all sorts of > interesting details. Then when you watch the movie you

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-03-06 Thread Yuval Feinstein (JIRA)
.0.1, please let me know and I will try to handle the changes. > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091 > Project: Lucene - Java >

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-06 Thread Michael McCandless
guaranteed to have consistently random distribution of field lengths across > nodes. > > Hoss had a good example illustrating why per-node IDF doesn't always work well > in a cluster: search cluster of news content with nodes divided by year, and > the top scoring hit for "iphone"

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-03-06 Thread Vinay Setty (JIRA)
eight are all changed. Does anyone have a modified version of BM25 classes which works with latest version of Lucene? > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-20

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-05 Thread Marvin Humphrey
strating why per-node IDF doesn't always work well in a cluster: search cluster of news content with nodes divided by year, and the top scoring hit for "iphone" is a misspelling from 1997 (because it was an extremely rare term on that search node). Similarly, if you calc field length s
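
The "iphone" example above can be made concrete with a little arithmetic. This is a minimal sketch, assuming the classic Lucene-style IDF formula, 1 + ln(numDocs / (docFreq + 1)); the class name and every document count below are hypothetical and do not come from the thread.

```java
public final class ShardIdfSketch {
    /** Classic Lucene-style IDF: 1 + ln(numDocs / (docFreq + 1)). */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /** Returns {idf on the 1997 node, idf on the 2009 node, idf over both nodes combined}. */
    public static double[] demo() {
        // Hypothetical counts for the term "iphone".
        long docsPerNode = 1_000_000L;
        long df1997 = 1L;       // a lone misspelling on the 1997 node
        long df2009 = 50_000L;  // a common term on the 2009 node

        double idf1997 = idf(df1997, docsPerNode);
        double idf2009 = idf(df2009, docsPerNode);
        double idfGlobal = idf(df1997 + df2009, 2 * docsPerNode);
        return new double[] { idf1997, idf2009, idfGlobal };
    }

    public static void main(String[] args) {
        double[] r = demo();
        System.out.printf("1997 node: %.2f%n", r[0]);
        System.out.printf("2009 node: %.2f%n", r[1]);
        System.out.printf("global:    %.2f%n", r[2]);
    }
}
```

With per-node statistics the lone 1997 hit receives a far larger IDF than any 2009 hit, so it can outrank them after merging; with cluster-wide statistics every node weights the term the same way.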

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-04 Thread Michael McCandless
On Tue, Mar 2, 2010 at 4:12 PM, Marvin Humphrey wrote: > On Tue, Mar 02, 2010 at 05:55:44AM -0500, Michael McCandless wrote: >> The problem is, these scoring models need the avg field length (in >> tokens) across the entire index, to compute the norms. >> >> Ie, you

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-02 Thread Marvin Humphrey
On Tue, Mar 02, 2010 at 05:55:44AM -0500, Michael McCandless wrote: > The problem is, these scoring models need the avg field length (in > tokens) across the entire index, to compute the norms. > > Ie, you can't do that on writing a single segment. I don't see why

Re: Baby steps towards making Lucene's scoring more flexible...

2010-03-02 Thread Michael McCandless
earch time. > Even in Lucene, it seems odd to want to calculate all of those on > the fly each time you open an index. It seems to me that this is a > specialized need of BM25. The problem is, these scoring models need the avg field length (in tokens) across the entire index, to compu
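
To see why the average field length matters, here is a minimal sketch of a BM25 term score. The class name is hypothetical; k1 = 1.2 and b = 0.75 are commonly used defaults rather than values taken from this thread, and the IDF shown is one common non-negative variant, not necessarily the one in the LUCENE-2091 patch.

```java
public final class Bm25Sketch {
    // Commonly used BM25 parameter defaults (assumptions of this sketch).
    static final double K1 = 1.2;
    static final double B = 0.75;

    /** One common non-negative BM25 IDF variant: ln(1 + (N - n + 0.5) / (n + 0.5)). */
    public static double idf(long numDocs, long docFreq) {
        return Math.log(1.0 + (numDocs - docFreq + 0.5) / (docFreq + 0.5));
    }

    /**
     * BM25 score of a single term in a single document.
     * avgDocLen is an index-wide statistic, which is why it cannot be
     * finalized while writing one segment in isolation.
     */
    public static double score(double tf, double docLen, double avgDocLen,
                               long numDocs, long docFreq) {
        double lengthNorm = 1.0 - B + B * (docLen / avgDocLen);
        return idf(numDocs, docFreq) * (tf * (K1 + 1.0)) / (tf + K1 * lengthNorm);
    }

    public static void main(String[] args) {
        // Hypothetical index: 1000 docs, term appears in 10 of them, average length 100 tokens.
        System.out.println(score(2.0, 50.0, 100.0, 1000L, 10L));   // shorter than average
        System.out.println(score(2.0, 200.0, 100.0, 1000L, 10L));  // longer than average
    }
}
```

The same document scores differently as avgDocLen changes, so any norm precomputed from the average goes stale whenever new segments shift that average.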

Re: Baby steps towards making Lucene's scoring more flexible...

2010-02-28 Thread Marvin Humphrey
ty is where we decode norms right now. In my opinion, it should be the Similarity object from which we specify per-field posting formats. See my reply to Robert in the BM25 thread: http://markmail.org/message/77rmrfmpatxd3p2e That way, custom scoring implementations can guarantee that

Baby steps towards making Lucene's scoring more flexible...

2010-02-26 Thread Michael McCandless
In thinking about & discussing with Robert how to allow Lucene to support other scoring models, eg lnu.ltc, BM25, etc I think a relatively contained set of changes can give us a solid step forward. Something like this: * Store additional per-doc stats in the index, eg in a cu

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-02-16 Thread Joaquin Perez-Iglesias (JIRA)
r strictly positive to avoid terms being ignored at all. {quote} > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091 > Project: Lucene - Ja

[jira] Issue Comment Edited: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-02-16 Thread Robert Muir (JIRA)
stopwords list is used. I'm curious what you think about this as it looks like a potential improvement for people not using stopwords (multilingual situation, etc) > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 >

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2010-02-16 Thread Robert Muir (JIRA)
it looks like a potential improvement for people not using stopwords (multilingual situation, etc) > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091 >

[jira] Issue Comment Edited: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Mark Harwood (JIRA)
ncy about probability of variants given the other input terms in the query but that feels like it's straying into spell checker territory and ngrams etc. > Fuzzy query scoring issues > -- > > Key: LUCENE-329 > URL: https://issue

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Mark Harwood (JIRA)
n factors we have to hand - the IDF of the user's supposedly valid input and the similarity measure of each variant compared to the input. We could get fancy about probability of variants given the other input terms in the query but that feels like it's straying into spell checker territory an

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Eks Dev (JIRA)
ese two freqs bring some easy precision points (HF-LF Pairs are much more likely to be typos than two HF-HF...). > Fuzzy query scoring issues > -- > > Key: LUCENE-329 > URL: https://issues.apache.org/jira/browse/LUCENE-329

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Robert Muir (JIRA)
sn't as simple as offering a choice between preserving IDF for all terms or not. Mark, right, my mistake. I will move this patch to LUCENE-124 so there is a simple alternative, you can proceed here with a smarter method... sorry i got confused amongst the different issues :) > Fuzz

[jira] Updated: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-329: --- Attachment: (was: LUCENE-329.patch) > Fuzzy query scoring iss

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Mark Harwood (JIRA)
sn't as simple as offering a choice between preserving IDF for all terms or not. Instead, it is a proposal that we should use the *input* term's IDF for scoring all variants of the same root term (or taking an average of variants where the root term does not exist). This I feel p

[jira] Assigned: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Uwe Schindler (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Uwe Schindler reassigned LUCENE-329: Assignee: (was: Lucene Developers) > Fuzzy query scoring iss

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Robert Muir (JIRA)
. You can still create a 'smarter' method here, it won't get in the way as now FuzzyQuery does not have a hardcoded rewrite method. > Fuzzy query scoring issues > -- > > Key: LUCENE-329 > URL: https://i

[jira] Updated: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Robert Muir (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Robert Muir updated LUCENE-329: --- Attachment: LUCENE-329.patch here is a rough patch > Fuzzy query scoring iss

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Robert Muir (JIRA)
r would lose this feature (available in FuzzyLikeThisQuery) Mark, it wouldn't lose any features. we simply provide another option, just like we do for other MultiTermQuery rewrites for other queries, so users can choose what they want to use. its just an additional choice. > F

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Mark Harwood (JIRA)
r would lose this feature (available in FuzzyLikeThisQuery) > Fuzzy query scoring issues > -- > > Key: LUCENE-329 > URL: https://issues.apache.org/jira/browse/LUCENE-329 > Project: Lucene - Java >

[jira] Commented: (LUCENE-329) Fuzzy query scoring issues

2010-02-15 Thread Robert Muir (JIRA)
nice way. we can make an alternative rewrite method for fuzzy that does just like TopTermsRewrite, except it creates a BooleanQuery of ConstantScore queries instead. this way the score will be equal to the boost. then users could choose which one they want to use. > Fuzzy query scoring
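
The difference between the two rewrite styles discussed above can be illustrated with a toy model. This sketch does not use Lucene's query classes; the class name, boosts, and document frequencies are hypothetical, and it ignores factors such as coord and query normalization that a real Lucene score would include.

```java
public final class FuzzyRewriteSketch {
    /** Classic-style IDF, used only by the IDF-weighted variant of this toy model. */
    public static double idf(long docFreq, long numDocs) {
        return 1.0 + Math.log((double) numDocs / (double) (docFreq + 1));
    }

    /** Toy stand-in for a scoring rewrite: a matched variant contributes boost * idf. */
    public static double idfWeighted(double boost, long docFreq, long numDocs) {
        return boost * idf(docFreq, numDocs);
    }

    /** Toy stand-in for a constant-score rewrite: a matched variant contributes only its boost. */
    public static double constantScore(double boost) {
        return boost;
    }

    public static void main(String[] args) {
        long numDocs = 100_000L;
        // Hypothetical variants of the input term "lucene":
        // the exact term (boost 1.0, frequent) and a typo "lucine" (boost 0.8, in one document).
        System.out.println("idf-weighted, exact: " + idfWeighted(1.0, 10_000L, numDocs));
        System.out.println("idf-weighted, typo:  " + idfWeighted(0.8, 1L, numDocs));
        System.out.println("constant, exact:     " + constantScore(1.0));
        System.out.println("constant, typo:      " + constantScore(0.8));
    }
}
```

In the IDF-weighted model the rare typo outranks the exact match because rarity inflates its weight; in the constant-score model the ranking follows the boost alone, which is the behaviour the comment describes.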

[jira] Created: (LUCENE-2236) Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level

2010-01-25 Thread Paul taylor (JIRA)
Similarity can only be set per index, but I may want to adjust scoring behaviour at a field level - Key: LUCENE-2236 URL: https://issues.apache.org/jira/browse/LUCENE

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-2130: Fix Version/s: Flex Branch > Investigate Rewriting Constant Scoring MultiTermQueries per segm

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-2130: Attachment: LUCENE-2130.patch updated > Investigate Rewriting Constant Scoring MultiTermQuer

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
subreader and grab the actual constantscore weight. It works I think - but it's a little ugly. I've spewed too much confusion in this issue - just going to rewrite the summary. > Investigate Rewriting Constant Scoring MultiTermQueries per segment > ---

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
mode does still use the booleanquery (of course, why else have it) - but it's only going to be with few clauses, so neither is really a benefit.) > Investigate Rewriting Constant Scoring MultiTermQueries per segment > --- > >

[jira] Issue Comment Edited: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
mmary - you wouldn't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial. * edit * Smaller, sparser filter? > Investigate Rewriting Constant Scor

[jira] Commented: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Robert Muir (JIRA)
n, they frequently scan the entire term dictionary only to return a few results. > Investigate Rewriting Constant Scoring MultiTermQueries per segment > --- > > Key: LUCENE-2130 > URL

[jira] Issue Comment Edited: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
6 AM: -- The ugly patch - (which doesn't yet handle the filter supplied case) was (Author: markrmil...@gmail.com): The ugly patch > Investigate Rewriting Constant Scoring MultiTermQueries

[jira] Issue Comment Edited: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
ouldn't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial. > Investigate Rewriting Constant Scoring MultiTermQueries per segment > --- > > Key: LUCE

[jira] Updated: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
[ https://issues.apache.org/jira/browse/LUCENE-2130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mark Miller updated LUCENE-2130: Attachment: LUCENE-2130.patch The ugly patch > Investigate Rewriting Constant Scor

[jira] Commented: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
the advantage when you are enumerating a lot of terms is that you avoid DirectoryReaders MultiTermEnum and its PQ. > Investigate Rewriting Constant Scoring MultiTermQueries per segment > --- > >

[jira] Commented: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
you wouldn't apply a huge boolean query - you'd just have a sparser filter. This might not be that beneficial. > Investigate Rewriting Constant Scoring MultiTermQueries per segment > --- > >

[jira] Created: (LUCENE-2130) Investigate Rewriting Constant Scoring MultiTermQueries per segment

2009-12-07 Thread Mark Miller (JIRA)
Investigate Rewriting Constant Scoring MultiTermQueries per segment --- Key: LUCENE-2130 URL: https://issues.apache.org/jira/browse/LUCENE-2130 Project: Lucene - Java Issue

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Robert Muir (JIRA)
one can try. > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091 > Project: Lucene - Java > Issue Type: New Feature > Compone

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Grant Ingersoll (JIRA)
e, I tried modifying length normalization with SweetSpot etc as others have done in the past. For this corpus I was unable to improve it in this way. Yeah, can't speak for SweetSpot, but there are other approaches too that don't favor shorter docs all the time. > Add BM25

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Robert Muir (JIRA)
also donated an implementation of the Axiomatic Retr. Function. I've never been able to get that scoring function to do anything more than be consistently worse than the default Lucene formula. I tried at least 3 test collections with it... bq. I'm also curious if anyone has compared BM

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Grant Ingersoll (JIRA)
yet, but... Should we take just a small step back and consider what it would take to actually make scoring more pluggable instead of just thinking about how best to integrate BM25? In other words, someone else has also donated an implementation of the Axiomatic Retr. Function. Much like BM25

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Michael McCandless (JIRA)
ad make a dedicated posting list, which would be properly merged, but we'd then have to re-walk to compute the stats for the newly merged segment. > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issu

[jira] Issue Comment Edited: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Joaquin Perez-Iglesias (JIRA)
ery type), as far as frequency and docFreq of the phrase/terms are available. At this point it is not supported in the patch, but I don't see any reason why it couldn't be implemented, moreover that I don't really know is how to do it :-). > Add BM25 Scoring to Lucene

[jira] Issue Comment Edited: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Joaquin Perez-Iglesias (JIRA)
ery type), as far as frequency and docFreq of the phrase/terms are available. At this point it is not supported in the patch, but I don't see any reason why it couldn't be implemented, moreover that I don't really know how to do it :-). > Add BM25 Scoring to Lucene >

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Joaquin Perez-Iglesias (JIRA)
requency and docFreq of the phrase/terms are available. At this point it is not supported in the patch, but I don't see any reason why it couldn't be implemented, moreover that I don't really know how to do it :-). > Add BM25 Scoring to Lucene > --

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Simon Willnauer (JIRA)
b:x2 TermWeigth will calculate the IDF for Term(a, x1) and Term(b, x2), am I missing something? > Add BM25 Scoring to Lucene > -- > > Key: LUCENE-2091 > URL: https://issues.apache.org/jira/browse/LUCENE-2091 >

[jira] Commented: (LUCENE-2091) Add BM25 Scoring to Lucene

2009-12-04 Thread Michael McCandless (JIRA)
document level IDF) Is there anything else? bq. Only simple boolean queries based on terms are supported (with operators or, and, not). For instance it does not support PhraseQuery. This is concerning -- is there no way to score a PhraseQuery in BM25F? > Add
