Re: Similarity Implementation

2016-07-07 Thread Đạt Cao Mạnh
Hi Siraj, I think https://lucene.apache.org/core/6_1_0/core/index.html?org/apache/lucene/search/ConstantScoreQuery.html should be good enough. On Fri, Jul 8, 2016 at 12:27 AM Siraj Haider wrote: > We are in the process of upgrading from 2.x to 6.x. In 2.x we implemented > our own similarity whe

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-19 Thread danield
Update: I have implemented my own subclasses of QueryParser, BooleanQuery, BooleanScorer and Similarity to deal with this. I have been successful in getting the exact behaviour I want... when calling the .explain() method. However, the scores for some documents often differ when calling IndexSearc

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-15 Thread Jack Krupansky
File a Jira for this particular doc fix since it is significant and not just mere wordsmithing. Better yet, submit a patch since that's Javadoc, although the exact form of the doc fix might be debatable, so a general description of the problem should be sufficient, unless you feel motivated. -- Ja

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-15 Thread danield
Oh thanks Mike, it did say somewhere. I guess it wouldn't hurt to make that explanation more prominent, as I clearly missed it. Never mind, I am working on my own solution for this, through subclassing QueryParser, BooleanQuery, BooleanScorer, Similarity and a bunch of other classes. Cheers, Dani

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-15 Thread Michael Sokolov
On 1/15/15 11:23 AM, danield wrote: Hi Mike, Thank you for your reply. Yes, I had thought of this, but it is not a solution to my problem, and this is because the Term Frequency and therefore the results will still be wrong, as prepending or appending a string to the term will still make it a di

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-15 Thread danield
Hi Mike, Thank you for your reply. Yes, I had thought of this, but it is not a solution to my problem, and this is because the Term Frequency and therefore the results will still be wrong, as prepending or appending a string to the term will still make it a different term. Similarly, I could use

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-14 Thread Michael Sokolov
In practice, normalization by field length proves to be more useful than normalization by the sum of the lengths of all fields (document length), which I think is what you seem to be after. Think of a book chapter document with two fields: title and full text. It makes little sense to weight
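As a rough illustration of the per-field normalization discussed above, the sketch below uses the 1/sqrt(numTerms) length norm of Lucene's classic TF-IDF similarity (DefaultSimilarity in the releases current at the time of this thread), leaving out index-time boosts and the lossy single-byte encoding of norms. The class name and the term counts are hypothetical, and returning 0 for an empty field is an arbitrary choice made for this sketch.

```java
final class LengthNormSketch {
    /**
     * Length norm of a field with numTerms terms: 1 / sqrt(numTerms).
     * Boosts and norm encoding are deliberately left out (assumption of this sketch).
     */
    static double lengthNorm(int numTerms) {
        if (numTerms <= 0) {
            return 0.0; // arbitrary choice for empty fields in this sketch
        }
        return 1.0 / Math.sqrt(numTerms);
    }
}
```

With these assumptions a 4-term title gets a norm of 0.5 and a 10,000-term full-text field gets 0.01, so a match in the short title counts for much more. Normalizing by the total document length would give both fields the same factor.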

Re: Similarity formula documentation is misleading + how to make field-agnostic queries?

2015-01-13 Thread danield
Corrections: document2={field1:"term1", field2:"term1"} Coord(query1,document2)= 1/1 = 1 (Doesn't affect the problem/observation) -- View this message in context: http://lucene.472066.n3.nabble.com/Similarity-formula-documentation-is-misleading-how-to-make-field-agnostic-queries-tp4179307p4

Re: Similarity coefficient for more exact matching

2012-05-10 Thread Ian Lea
Similarity.setDefault(new MySimilarity()) is certainly better than the 2 calls I recommended. Thanks. I find it hard to see why one might not want to do this in normal usage but have a vague recollection of someone once outlining some obscure scenarios where different similarities at index and se

RE: Similarity coefficient for more exact matching

2012-05-04 Thread Paul Hill
> [use] IndexWriterConfig.setSimilarity() and > IndexSearcher.setSimilarity(), unless you are clever or like being confused. > > SweetSpotSimilarity might also be worth a look. > > -- > Ian. Being even less clever, I just make sure I set: Similarity.setDefault(new MySimilarity()) when crawl

Re: Similarity coefficient for more exact matching

2012-04-27 Thread Ian Lea
You can override org.apache.lucene.search.Similarity/DefaultSimilarity to tweak quite a lot of stuff. computeNorm() may be the method you are interested in. Called at indexing time so be sure to use the same implementation at index and query time, using IndexWriterConfig.setSimilarity() and Index

Re: Similarity based on regexp

2010-04-08 Thread Michael McCandless
You can use RegexQuery (from contrib/regex) for this? (In 3.1 there's a higher performance, very similar, RegexpQuery, too). Mike On Thu, Apr 8, 2010 at 10:10 AM, Hans-Henning Gabriel wrote: > Hello everybody, > > this is what I would like to do: > I have an index with documents containing a fi

Re: similarity function

2009-11-08 Thread Chris Hostetter
: "how do i set the score of each document result to be the score of that : of the field that best matches the search terms"? you'll want something like this pseudo code... DisjunctionMaxQuery dq = new DMQ foreach fieldname in list_of_fields { BooleanQuery bq = new BQ foreach word in l
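The scoring rule behind the DisjunctionMaxQuery suggestion above can be sketched without any Lucene dependency: the document score is the best per-field score, plus an optional tie-breaker fraction of the remaining field scores. The class name, method name and input scores below are hypothetical; in Lucene the per-field scores would come from the per-field BooleanQuery clauses.

```java
final class DisMaxSketch {
    /**
     * Combines per-field scores the way a disjunction-max query does:
     * max(scores) + tieBreaker * (sum of the other scores).
     * Scores are assumed to be non-negative, as Lucene scores are.
     */
    static double combine(double[] fieldScores, double tieBreaker) {
        double max = 0.0;
        double sum = 0.0;
        for (double score : fieldScores) {
            sum += score;
            if (score > max) {
                max = score;
            }
        }
        return max + tieBreaker * (sum - max);
    }
}
```

With a tie-breaker of 0 the result is exactly the score of the best-matching field, which is what the question asks for. A small non-zero tie-breaker lets documents that match in several fields rank slightly higher.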

Re: similarity function

2009-10-28 Thread Joel Halbert
I suppose this could be summarised as: "how do i set the score of each document result to be the score of that of the field that best matches the search terms"? -Original Message- From: Joel Halbert Reply-To: java-user@lucene.apache.org To: Lucene Users Subject: similarity function Da

Re: Similarity

2009-06-23 Thread Shashi Kant
y and constructing vector space. > > - RB > > > - Original Message > From: Shashi Kant > To: java-user@lucene.apache.org > Sent: Tuesday, June 23, 2009 3:20:16 PM > Subject: Re: Similarity > > I suspect what you are looking for is "Latent Semantics

Re: Similarity

2009-06-23 Thread Cool The Breezer
used for analyzing terms semantically and constructing vector space. - RB - Original Message From: Shashi Kant To: java-user@lucene.apache.org Sent: Tuesday, June 23, 2009 3:20:16 PM Subject: Re: Similarity I suspect what you are looking for is "Latent Semantics"

Re: Similarity

2009-06-23 Thread Shashi Kant
I suspect what you are looking for is "Latent Semantics" - it can algorithmically infer that "iPod~iPhone" or "Apple~Steve Jobs". Google for "Latent Semantic Indexing" or "Latent Semantic Analysis" - you can apply some of those approaches using the TermVectors in Lucene index. Ontologies such as Wo

Re: Similarity and Lucene

2009-03-20 Thread Amin Mohammed-Coleman
Although (I could be wrong) I'm wondering if lengthNorm is the correct one I should be overriding. I'm interested in the number of times a term occurs in a document (the more occurrences, the higher the score), which I believe is coord. It may well be that I am barking up the wrong tree. Cheers

Re: similarity function

2009-03-05 Thread patrick o'leary
Sounds like your most difficult part will be the question parser using POS. This is kind of old school but use something like the AliceBot AIML library http://en.wikipedia.org/wiki/AIML Where the subjective terms can be extracted from the questions, and indexed separately. Or as Grant and others

Re: similarity function

2009-03-05 Thread Grant Ingersoll
Hi Seid, Do you have a reference for the article? I've done some QA in my day, but don't recall reading that one. At any rate, I do think it is possible to do what you are after. See below. On Mar 5, 2009, at 9:49 AM, Seid Mohammed wrote: For my work, I have read an article stating th

Re: similarity function

2009-03-05 Thread Vasudevan Comandur
Hi, Since you are trying to answer factoid questions to start with, it is better to use OpenNLP components for NER (named entity recognition) on the documents and use those tags as part of your indexing process. Regards Vasu On Thu, Mar 5, 2009 at 8:19 PM, Seid Mohamm

Re: Similarity percentage between two Strings

2008-09-09 Thread Thiago Moreira
For those interested in my solution, I took this article as the basis for implementing the requirements: http://www.catalysoft.com/articles/StrikeAMatch.html Thanks. - Original Message - From: [EMAIL PROTECTED] Sent: Thu, September 4, 2008 1:20 Subject:Re: Similarity percentage betwee
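For readers who do not want to follow the link, here is a minimal sketch of the letter-pair ("Strike a Match") measure that the article describes: each string is broken into adjacent letter pairs per word, and the similarity is twice the number of shared pairs divided by the total number of pairs. The class and method names are hypothetical, and this is not the poster's actual implementation, which is not shown in the thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

final class LetterPairSimilarity {
    /** Adjacent letter pairs of every whitespace-separated word, upper-cased. */
    static List<String> wordLetterPairs(String text) {
        List<String> pairs = new ArrayList<>();
        for (String word : text.toUpperCase(Locale.ROOT).trim().split("\\s+")) {
            for (int i = 0; i + 2 <= word.length(); i++) {
                pairs.add(word.substring(i, i + 2));
            }
        }
        return pairs;
    }

    /** 2 * |shared pairs| / (|pairs1| + |pairs2|); 0.0 if neither string has pairs. */
    static double compare(String s1, String s2) {
        List<String> pairs1 = wordLetterPairs(s1);
        List<String> pairs2 = wordLetterPairs(s2);
        int total = pairs1.size() + pairs2.size();
        if (total == 0) {
            return 0.0;
        }
        int shared = 0;
        for (String pair : pairs1) {
            // Removing the matched pair keeps repeated pairs from being counted twice.
            if (pairs2.remove(pair)) {
                shared++;
            }
        }
        return 2.0 * shared / total;
    }
}
```

For example, "France" and "French" share the pairs FR and NC out of five pairs each, so `compare` returns 2 * 2 / 10 = 0.4.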

Re: Similarity percentage between two Strings

2008-09-04 Thread Karl Wettin
I would create 1-5 ngram sized shingles and measure the distance using the Tanimoto coefficient. That would probably work out just fine. You might want to give larger shingles more weight. There are shingle filters in lucene/java/contrib/analyzers and there is a Tanimoto dist
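A minimal sketch of the shingle idea above, in plain Java rather than with the contrib shingle filters: it builds character n-grams of sizes 1 to 5 and compares the two sets with the Tanimoto (Jaccard) coefficient, |A ∩ B| / |A ∪ B|. Treating shingles as character n-grams is an assumption of this sketch (word n-grams would work the same way), the class and method names are hypothetical, and the optional weighting by shingle size is left out.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

final class ShingleTanimoto {
    /** All character n-grams of text with minN <= n <= maxN. */
    static Set<String> shingles(String text, int minN, int maxN) {
        Set<String> out = new HashSet<>();
        for (int n = minN; n <= maxN; n++) {
            for (int i = 0; i + n <= text.length(); i++) {
                out.add(text.substring(i, i + n));
            }
        }
        return out;
    }

    /** Tanimoto coefficient of two sets; two empty sets count as identical. */
    static double tanimoto(Set<String> a, Set<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        Set<String> common = new HashSet<>(a);
        common.retainAll(b);
        int union = a.size() + b.size() - common.size();
        return (double) common.size() / union;
    }

    /** Case-insensitive similarity in [0, 1] based on 1- to 5-character shingles. */
    static double similarity(String s1, String s2) {
        return tanimoto(
                shingles(s1.toLowerCase(Locale.ROOT), 1, 5),
                shingles(s2.toLowerCase(Locale.ROOT), 1, 5));
    }
}
```

The distance mentioned in the message is then simply `1.0 - similarity(s1, s2)`.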

Re: Similarity percentage between two Strings

2008-09-04 Thread Ian Lea
Googling for "java string similarity" throws up some stuff you might find useful. -- Ian. On Wed, Sep 3, 2008 at 11:58 PM, Thiago Moreira <[EMAIL PROTECTED]> wrote: > > Well, the similar definition that I'm looking for is the number 2, maybe > the number 3, but to start the number 2 is enou

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
More details may change my opinion (not quite sure how others feel yet), but with the way you've described it so far, it seems like all you need is a basic string matcher: For every message: - if message.subject is found in the pool, then this message is "similar to" the message in the poo

Re: Similarity percentage between two Strings

2008-09-03 Thread Thiago Moreira

Re: Similarity percentage between two Strings

2008-09-03 Thread N. Hira
I don't know how much of this is a Lucene problem, but -- as I'm sure you will inevitably hear from others on the list -- it depends on what your definition of "similar" is. By similar, do you mean: 1. Identical, except for variations in case (upper/lower) 2. Allow 1., but also allow prefix

Re: Similarity algorithm

2007-06-26 Thread Grant Ingersoll
Lucene In Action is a great book, but you can also have a look at http://lucene.apache.org/java/docs/scoring.html for more info on scoring and how to change the similarity and other details of scoring. Also, search the archives for things you are interested in, there is a lot of information

RE: Similarity algorithm

2007-06-26 Thread Damien McCarthy
The PDF of Lucene in Action can be purchased from www.manning.com I'd suggest reading and understanding Lucene in Action before you attempt anything else :) -Original Message- From: Mahdi Rahimi [mailto:[EMAIL PROTECTED] Sent: 26 June 2007 16:38 To: java-user@lucene.apache.org Subject: Si

RE: Similarity for Span and Boolean query

2007-01-08 Thread J.Zhu
Subject: Re: Similarity for Span and Boolean query : The equation for similarity is given on this web page: : http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html : : I would like to know what are the equations for similarity if the query : is a span or boolean query

Re: Similarity for Span and Boolean query

2007-01-08 Thread Chris Hostetter
: The equation for similarity is given on this web page: : http://lucene.apache.org/java/docs/api/org/apache/lucene/search/Similarity.html : : I would like to know what are the equations for similarity if the query : is a span or boolean query. That equation does cover BooleanQueries -- the "c

Re: Similarity

2005-12-19 Thread Erik Hatcher
On Dec 19, 2005, at 1:23 PM, Klaus wrote: I) What is exactly written to the index? Is the index just an inverted list? Is there term weight scoring stored? http://lucene.apache.org/java/docs/fileformats.html 1) Get all the documents from the index via the inverted list. Yo

Re: Similarity scores for all docs

2005-12-07 Thread Grant Ingersoll
You can use the HitCollector mechanism to fill your array, but what you are doing is essentially what the Hits object already does, plus it provides caching. Eugene Ezekiel wrote: Yes, but what I wanna be able to do is something like, fill an array of say size 100 such that: array[0] = similar

Re: Similarity scores for all docs

2005-12-07 Thread Eugene Ezekiel
Yes, but what I wanna be able to do is something like, fill an array of say size 100 such that: array[0] = similarity value of query and doc(0) array[1] = similarity value of query and doc(1) Any idea how to fill this array? Thanks. -- Regards, Eugene Koji Sekiguchi wrote: You can get sco

RE: Similarity scores for all docs

2005-12-07 Thread Koji Sekiguchi
You can get scores by calling Hits.score(). So you should search first to get a Hits object. regards, Koji > -Original Message- > From: Eugene Ezekiel [mailto:[EMAIL PROTECTED] > Sent: Wednesday, December 07, 2005 6:03 PM > To: java-user@lucene.apache.org > Subject: Similarity scores fo