Re: New implementation of MLT

2013-04-03 Thread Osullivan L .
Greetings,

I have a custom analyzer which converts Library of Congress callnumbers into 
normalized strings.

Thus, values like:

PQ239.Z56
PQ239.H63 2008
PQ239.S62 1982
PQ239.B68 1983
PQ2390.S35 A5
PQ2390.S35 B8 1898
PQ2389 .R65 F3 1854 t.1
PQ239.A7 1969
PQ2.N6 1959
PQ22.A4 D47 1949
PQ238.L57 1985

become:

PQ 0239.00 Z0.56
PQ 0239.00 H0.63 002008
PQ 0239.00 S0.62 001982
PQ 0239.00 B0.68 001983
PQ 2390.00 S0.35 A0.50
PQ 2390.00 S0.35 B0.80 001898
PQ 2389.00 R0.65 F0.30 001854 T.01
PQ 0002.00 N0.60 001959
PQ 0022.00 A0.40 D0.47 001949
PQ 0238.00 L0.57 001985

This allows items to be accurately sorted by callnumber.
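The padding pattern is easiest to see in code. Below is a minimal Python sketch that reproduces the normalized values listed above. Its rules are inferred from those examples only, because the analyzer definition is not included in the message: the class number is zero-padded to four digits plus `.00`, cutter digits are written as `0.NN`, years are padded to six digits, and volume numbers are padded to two. The function name `normalize` is hypothetical, and inputs outside the pattern shown (decimal class numbers, three-digit cutters) are handled by guesswork.

```python
import re

# Hypothetical rules inferred only from the example pairs in this message;
# the real analyzer chain is not shown and may behave differently.
CLASS_RE = re.compile(r"^([A-Z]+)\s*(\d+)\s*(.*)$")
CUTTER_RE = re.compile(r"([A-Z])(\d+)")
VOLUME_RE = re.compile(r"([A-Z]+)\.(\d+)")


def normalize(callno: str) -> str:
    """Turn an LC call number into a string that sorts correctly as plain text."""
    m = CLASS_RE.match(callno.strip().upper())
    if m is None:
        return callno  # not in the expected shape: leave it untouched
    letters, number, rest = m.groups()
    # Class letters, then the class number zero-padded to four digits plus ".00".
    parts = [letters, f"{int(number):04d}.00"]
    # Drop the period that introduces the first cutter, then walk the tokens.
    for tok in rest.lstrip(".").split():
        cutter = CUTTER_RE.fullmatch(tok)
        volume = VOLUME_RE.fullmatch(tok)
        if tok.isdigit():
            parts.append(f"{int(tok):06d}")  # year: 1982 -> 001982
        elif cutter:
            # cutter: Z56 -> Z0.56, A5 -> A0.50 (digits right-padded to two)
            parts.append(f"{cutter.group(1)}0.{cutter.group(2).ljust(2, '0')}")
        elif volume:
            parts.append(f"{volume.group(1)}.{int(volume.group(2)):02d}")  # T.1 -> T.01
        else:
            parts.append(tok)
    return " ".join(parts)
```

For example, `normalize("PQ2389 .R65 F3 1854 t.1")` returns `"PQ 2389.00 R0.65 F0.30 001854 T.01"`. Padding each numeric part to a fixed width is what lets plain string order match shelf order: unpadded, `PQ239` sorts after `PQ2389` as text.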

I would also like to perform range searches on the normalised callnumber. 
However, whereas callnumber-normalized=[DS+TO+FE] correctly lists items with 
callnumbers between DS and FE, starting with DT* and finishing with FD*, 
callnumber-normalized=[DS763+TO+FE] should start at DS763 but instead also 
starts at DT* and finishes with FD*.

Can anyone explain why this might be the case?

Looking at 
http://wiki.apache.org/solr/MultitermQueryAnalysis#Current_components_that_implement_MultiTermAwareComponent,
would I have to add one of the MultiTermAware Factories to make this work?
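The symptoms are consistent with the range bounds being compared against the index without passing through the custom normalising filter (for example, if that filter is not MultiTermAware). This Python sketch assumes that is the case and uses hypothetical index terms to show the effect: a space (0x20) sorts before `7` (0x37), so every `DS ...` term falls below the raw bound `DS763`, and a bare `FE` sorts before every `FE ...` term, which gives the DT* to FD* window described above. Python compares strings by code point, which for ASCII text agrees with Lucene's term order.

```python
# Hypothetical normalised terms as they might appear in the index, in sorted order.
INDEXED = [
    "DS 0763.00 A0.10",
    "DT 0001.00 B0.20",
    "FD 0100.00 C0.30",
    "FE 0001.00 D0.40",
]


def range_match(terms, lower, upper):
    """Inclusive [lower TO upper] match using plain code-point string order."""
    return [t for t in terms if lower <= t <= upper]


# Raw, unanalysed bounds: " " (0x20) < "7" (0x37), so "DS 0763.00 ..." < "DS763",
# and "FE" is a prefix of "FE 0001.00 ...", so it sorts before that term.
print(range_match(INDEXED, "DS763", "FE"))
# -> ['DT 0001.00 B0.20', 'FD 0100.00 C0.30']

# Lower bound written in the normalised form: the DS 763 item is now included.
print(range_match(INDEXED, "DS 0763.00", "FE"))
# -> ['DS 0763.00 A0.10', 'DT 0001.00 B0.20', 'FD 0100.00 C0.30']
```

Under this assumption, any route that puts the bounds into normalised form before they are compared would avoid the mismatch.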

Thanks,

Luke


--
Luke O'Sullivan
Systems Developer
Web Team
Swansea University, Singleton Park, Swansea SA2 8PP, UK
l.osulli...@swansea.ac.uk
01792 602772
@l_os_cymru


Re: New implementation of MLT

2013-03-31 Thread Erick Erickson
Gagan:

Absolutely open up a JIRA and attach a patch!

Erick

On Sun, Mar 31, 2013 at 1:18 AM, Gagandeep singh  wrote:
> Hi folks
>
> We started using the default implementation of MLT
> (org.apache.solr.handler.MoreLikeThisHandler) recently and found that there
> are a couple of things it lacks:
>
> Searching for terms in the same field as the original document:
>
> The current implementation picks the field in which to search for an
> interesting term based on docFreq. However, this can give bad results: if,
> say, the original product is from brand:"RED Valentino", we end up
> searching for red in the color field.
>
> Phrase boosts:
>
> If the product name is "business cards", then it makes sense to give a
> phrase boost to products that are also business cards.
>
> Support for bq, bf, fq, and multiplicative boosts:
>
> You might want to filter out out_of_stock products, or give a
> multiplicative boost to a product based on its price similarity or launch
> date.
>
> Support for explainOther:
>
> We had a use case for each of these, and I ended up writing my own
> MLTQueryParser, which builds the MLT query for a given document. It also has
> a new concept called childDocs. You can think of some documents as products,
> and a collection of products can be thought of as a category page. You could
> search for similar documents based on the products a category page has.
>
> I was wondering if you guys would be interested in an alternate
> implementation of MLT that supports all the knobs that Solr search does.
> Maybe I could post a patch file?
>
> Thanks
> Gagan
>
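To make the first three points concrete, here is a minimal Python sketch of the kind of request such a parser might build: each term stays in the field it came from, a multi-word field value gets a phrase boost, and an ordinary `fq` filter is attached. It is not Gagan's MLTQueryParser or Solr's MoreLikeThisHandler. Term selection by docFreq or tf-idf is skipped (every token is used), query-syntax escaping is not handled, and the function name `build_mlt_params`, the `out_of_stock` field, and the boost value are hypothetical.

```python
from typing import Dict


def build_mlt_params(doc: Dict[str, str], phrase_boosts: Dict[str, float],
                     filter_query: str) -> Dict[str, str]:
    """Hypothetical sketch: Solr request params for 'documents like this one'."""
    clauses = []
    for field, value in doc.items():
        # Each term is searched only in the field it came from, so "red" from
        # brand:"RED Valentino" never becomes a query against a color field.
        # (No interesting-term selection here: every token is used.)
        for term in value.lower().split():
            clauses.append(f"{field}:{term}")
    for field, boost in phrase_boosts.items():
        value = doc.get(field, "")
        if len(value.split()) > 1:
            # Boost documents containing the whole multi-word value as a phrase.
            clauses.append(f'{field}:"{value.lower()}"^{boost}')
    # fq restricts the result set without affecting scores.
    return {"q": " ".join(clauses), "fq": filter_query}


params = build_mlt_params(
    {"brand": "RED Valentino", "name": "business cards"},
    phrase_boosts={"name": 2.0},
    filter_query="-out_of_stock:true",  # hypothetical field name
)
print(params["q"])
```

Keeping the field alongside each term is the design choice that addresses the brand/color mix-up; bq, bf, or a multiplicative boost would be further entries in the same parameter map.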

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org