More like this only returns id and score issue

2020-04-30 Thread derrick cui
Hi,
I want to return more fields in the moreLikeThis response. How can I do that?
Currently the main doc returns all fields, but the moreLikeThis result only has id
and score. Please help.
Thanks



Re: Need more info on MLT (More Like This) feature

2019-09-26 Thread Alessandro Benedetti
In addition to all the valuable information already shared, I am curious to
understand why you think the results are unreliable.
Most of the time it is the parameters that cause some of the terms of the
original document/corpus to be ignored (something as simple as the min/max
document frequency to consider, or the min term frequency in the source doc).

I have been working a lot on the MLT in the past years and presenting the
work done (and internals) at various conferences/meetups.

I'll share some slides and some Jira issues that may help you:

https://www.youtube.com/watch?v=jkaj89XwHHw=540s
https://www.slideshare.net/SeaseLtd/how-the-lucene-more-like-this-works

https://issues.apache.org/jira/browse/LUCENE-8326
https://issues.apache.org/jira/browse/LUCENE-7802
https://issues.apache.org/jira/browse/LUCENE-7498

Generally speaking I favour the MLT query parser, it builds the MLT query
and gives you the chance to see it using the debug query.
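To make the MLT query parser suggestion concrete, here is a minimal sketch that only builds the request URL. The base URL, the collection name "jobs" and the helper name mlt_parser_url are assumptions for illustration; the field and document ID are borrowed from the question in this thread. It assumes the value after the closing brace is the document's unique key, and it turns on debugQuery so the generated MLT query shows up in the debug section of the response.

```python
from urllib.parse import urlencode


def mlt_parser_url(base_url, collection, doc_id, field, mintf=1, mindf=1):
    """Build a /select URL that runs the MLT query parser with debug output.

    base_url, collection and this helper itself are illustrative assumptions,
    not part of Solr. doc_id is assumed to be the unique key of the source doc.
    """
    # Local params for the {!mlt} parser: similarity field plus frequency cutoffs.
    q = "{!mlt qf=%s mintf=%d mindf=%d}%s" % (field, mintf, mindf, doc_id)
    params = {"q": q, "debugQuery": "true", "wt": "json"}
    return "%s/%s/select?%s" % (base_url, collection, urlencode(params))


if __name__ == "__main__":
    print(mlt_parser_url("http://localhost:8983/solr", "jobs",
                         "1414462-25600-5258", "jobdescription"))
```

Comparing the parsed query in the debug output against the source document is usually the quickest way to see which terms survived the mintf/mindf cutoffs.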



--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io
--
Sent from: https://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Need more info on MLT (More Like This) feature

2019-09-14 Thread Chee Yee Lim
By default, MLT uses the top 25 terms from the target document to do
similarity searches. A quick look at the source code (
https://github.com/apache/lucene-solr/blob/master/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThis.java
) and Lucene documentation (
https://lucene.apache.org/core/8_1_0/queries/org/apache/lucene/queries/mlt/MoreLikeThis.html
) suggests that MLT's similarity score is defined as a simple TF x IDF for
the top 25 terms. (Others who know more about MLT, please correct me if I
am wrong.)
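As a rough illustration of the "TF x IDF for the top 25 terms" idea, here is a simplified sketch, not Lucene's actual code. It assumes a classic IDF of 1 + ln((numDocs + 1) / (docFreq + 1)), and the function name, arguments and cutoff defaults are chosen for illustration only.

```python
import math
from collections import Counter


def top_mlt_terms(doc_tokens, doc_freq, num_docs,
                  max_query_terms=25, min_tf=2, min_df=5):
    """Pick the highest-scoring terms of one document by TF x IDF.

    doc_tokens: analyzed tokens of the source document.
    doc_freq:   mapping term -> number of documents containing it.
    All names here are hypothetical; this is a sketch of the idea only.
    """
    term_freq = Counter(doc_tokens)
    scored = []
    for term, tf in term_freq.items():
        df = doc_freq.get(term, 0)
        # Terms that are too rare in the doc or in the corpus are ignored.
        if tf < min_tf or df < min_df:
            continue
        idf = 1.0 + math.log((num_docs + 1) / (df + 1))
        scored.append((term, tf * idf))
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:max_query_terms]


if __name__ == "__main__":
    tokens = ["solr", "solr", "java", "java", "java", "the", "the"]
    freqs = {"solr": 10, "java": 50, "the": 1000}
    for term, score in top_mlt_terms(tokens, freqs, num_docs=1000):
        print(term, round(score, 2))
```

Very common words get an IDF close to 1 and fall to the bottom of the list, which is why the frequency cutoffs have such a visible effect on which documents come back.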

An easy way to improve your results is to tune the mindf, maxdf, minwl and
maxwl parameters for knnSearch (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch
).

Best wishes,
Chee Yee

On Sat, 14 Sep 2019 at 04:09, Dave  wrote:

> As a side note, if you use shingles with the mlt handler I believe you
> will get better scores/relevant results. So “to be free” becomes indexes as
> “to_be” “to_be_free” and “be_free” but also as each word. It makes the
> index significantly larger but creates better “unique terms” in my opinion
> and improved the results for me at least.
>
> On Sep 13, 2019, at 2:51 PM, Srisatya Pyla  wrote:
>
> Thank you very much for quick response. This is very much helpful to us.
> While analyzing the results for some jobs, it is returning high score for
> a document which is not much relevant to the base document.
> Is there any way we can improve the results and scoring?
> How it exactly give the score for matching document based on a matching
> field?  This is helpful to know why it is giving highest matching score for
> the specific documents.
>
>
> Regards,
> --
> *SST  Narasimha Rao Pyla*
>
> *IBM Talent Management SolutionsMobile :*+91 9849315546
> *E-mail :**srisp...@in.ibm.com* 
> [image: IBM]
>
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
>
>
>
>
>
> From:Chee Yee Lim 
> To:Srisatya Pyla 
> Cc:solr-user@lucene.apache.org, Rajeev Kasarabada1 <
> kasar...@in.ibm.com>, Archana Gavini1 
> Date:13/09/2019 04:32 PM
> Subject:[EXTERNAL] Re: Need more info on MLT (More Like This)
> feature
> --
>
>
>
> To use knnSearch, you need to submit a POST request to the Stream request
> handler.
>
> Using your example query, you will need to rewrite them from this :
>
> *http://[SOLR*
> URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258
>
> to this (using curl as an example to send POST request) :
>
> curl --data-urlencode 'expr=knnSearch([collection_name],
> id="1414462-25600-5258",
> qf="jobdescription",
> k=100,
> fl="jobtitle,jobdescription,score",
> sort="score desc",
> fq="siteid:5258",
> mintf=1,
> mindf=1)' http://[SOLRURL]/stream
>
> Note that this assume your document ID is sjkey.
>
> More detailed documentation on how Stream handler works can be seen here,
> *https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html*
> <https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html>.
>
> Best wishes,
> Chee Yee
>
> On Fri, 13 Sep 2019 at 17:57, Srisatya Pyla <*srisp...@in.ibm.com*
> > wrote:
> Hi Chee Yee Lim,
>
>
> Thank you for your quick response.
> We do not find much documentation on knnsearch on how to do use that.
> Could you please guide us with more info on how this can be used?
>
> Can we use this the way we use Solr by querying with Solr URL like
> http://[SOLR URL]/mlt ?  OR any other way?
> And also please provide with any more detailed documentation if you have
> any.
>
>
> Regards,
> --
> *SST  Narasimha Rao Pyla*
>
> *IBM Talent Management SolutionsMobile :*+91 9849315546
> *E-mail :**srisp...@in.ibm.com* 
> [image: IBM]
>
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
>
>
>
>
>
>
>
>
>
> - Original message -
> From: Chee Yee Lim <*cheeyee@gmail.com* >
> To: *solr-user@lucene.apache.org* 
> Cc: Archana Gavini1 <*agavi...@in.ibm.com* >, Rajeev
> Kasarabada1 <*kasar...@in.ibm.com* >
> Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
> Date: Thu, Sep 12, 2019 6:43 PM
>
> I've been working with MLT handler (Solr 8.1.1) by calling it the same way
> you did, *http://[SOLR*URL]/mlt. But the response is very unreliable with
> 90% of the same queries resulting in Java null pointer exception, and only
> 10% returning expected response. I do not know what is the cause of this.
>
> I overcame this problem by using knnSearch via Stream hand

Re: Need more info on MLT (More Like This) feature

2019-09-13 Thread Dave
As a side note, if you use shingles with the MLT handler I believe you will get
better scores/relevant results. So “to be free” is indexed as “to_be”,
“to_be_free” and “be_free”, but also as each individual word. It makes the index
significantly larger but creates better “unique terms” in my opinion, and it
improved the results for me at least.
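To show what the shingling above produces, here is a small standalone sketch. In Solr this would normally be done at index time by a shingle filter in the field's analysis chain; the function below is only an illustration with made-up names, and it assumes whitespace tokenization and "_" as the token separator.

```python
def shingles(text, min_size=2, max_size=3, sep="_"):
    """Return the single words of text plus all word n-grams (shingles).

    Hypothetical helper for illustration; it mimics the output described
    above, not Solr's analysis code.
    """
    words = text.split()
    tokens = list(words)  # keep each word on its own as well
    for size in range(min_size, max_size + 1):
        for start in range(len(words) - size + 1):
            tokens.append(sep.join(words[start:start + size]))
    return tokens


if __name__ == "__main__":
    print(shingles("to be free"))
```

For "to be free" this yields the three single words plus to_be, be_free and to_be_free, so three input words become six indexed terms, which is where the index growth comes from.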

> On Sep 13, 2019, at 2:51 PM, Srisatya Pyla  wrote:
> 
> Thank you very much for quick response. This is very much helpful to us.
> While analyzing the results for some jobs, it is returning high score for a 
> document which is not much relevant to the base document. 
> Is there any way we can improve the results and scoring?  
> How it exactly give the score for matching document based on a matching 
> field?  This is helpful to know why it is giving highest matching score for 
> the specific documents.
> 
> 
> Regards,
> SST  Narasimha Rao Pyla
> IBM Talent Management Solutions
> Mobile :+91 9849315546
> E-mail :srisp...@in.ibm.com   
> 
> 
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
> 
> 
> 
> 
> 
> From:Chee Yee Lim 
> To:Srisatya Pyla 
> Cc:solr-user@lucene.apache.org, Rajeev Kasarabada1 
> , Archana Gavini1 
> Date:13/09/2019 04:32 PM
> Subject:[EXTERNAL] Re: Need more info on MLT (More Like This) feature
> 
> 
> 
> To use knnSearch, you need to submit a POST request to the Stream request 
> handler.
> 
> Using your example query, you will need to rewrite them from this :
> 
> http://[SOLRURL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258
> 
> to this (using curl as an example to send POST request) :
> 
> curl --data-urlencode 'expr=knnSearch([collection_name],
> id="1414462-25600-5258",
> qf="jobdescription",
> k=100,
> fl="jobtitle,jobdescription,score",
> sort="score desc",
> fq="siteid:5258",
> mintf=1, 
> mindf=1)' http://[SOLRURL]/stream
> 
> Note that this assume your document ID is sjkey.
> 
> More detailed documentation on how Stream handler works can be seen here, 
> https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html.
> 
> Best wishes,
> Chee Yee
> 
> On Fri, 13 Sep 2019 at 17:57, Srisatya Pyla  wrote:
> Hi Chee Yee Lim,
> 
> 
> Thank you for your quick response.  
> We do not find much documentation on knnsearch on how to do use that.   
> Could you please guide us with more info on how this can be used?
> 
> Can we use this the way we use Solr by querying with Solr URL like   
> http://[SOLR URL]/mlt ?  OR any other way?
> And also please provide with any more detailed documentation if you have any.
> 
> 
> Regards,
> SST  Narasimha Rao Pyla
> IBM Talent Management Solutions
> Mobile :+91 9849315546
> E-mail :srisp...@in.ibm.com   
> 
> 
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
> 
> 
> 
> 
> 
> 
>  
>  
> - Original message -
> From: Chee Yee Lim 
> To: solr-user@lucene.apache.org
> Cc: Archana Gavini1 , Rajeev Kasarabada1 
> 
> Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
> Date: Thu, Sep 12, 2019 6:43 PM
>  
> I've been working with MLT handler (Solr 8.1.1) by calling it the same way 
> you did, http://[SOLRURL]/mlt. But the response is very unreliable with 90% 
> of the same queries resulting in Java null pointer exception, and only 10% 
> returning expected response. I do not know what is the cause of this.
>  
> I overcame this problem by using knnSearch via Stream handler 
> (https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch).
>  It is just a wrapper on MLT, and it works brilliantly. It is worth checking 
> it out if you are running Solr in cloud mode.
>  
> If you pass the fl="score"="score desc" to knnSearch, you will be able 
> to get the results sorted by matching scores.
>  
> Best wishes,
> Chee Yee
>   
> On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla  wrote:
> Hi Solr Seatch Team,
> 
> I am a developer from IBM Kenexa Brassring.  We are using Solr Search engine 
> for searching jobs in our applications.
> We are planning to use MLT feature to get the similar matching documents 
> (jobs) based on one document (job).
> 
> When trying to explore this option, we are using matching field as 
> JobDescription of the job and we are getting some unrelated documents in the 
> MLT results which are not expected.
> 
> The query like below:
> 
> http://[SOLRURL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258
> 
> 

RE: Need more info on MLT (More Like This) feature

2019-09-13 Thread Srisatya Pyla
Thank you very much for the quick response. This is very helpful to us.
While analyzing the results for some jobs, we found that MLT returns a high
score for a document that is not very relevant to the base document.
Is there any way we can improve the results and scoring?
How exactly is the score computed for a matching document based on a matching
field?  Knowing this would help us understand why specific documents get the
highest matching score.


Regards,

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile : +91 9849315546
E-mail : srisp...@in.ibm.com


IBM Visakha Hills
Visakhapatnam, AP 530045
India




From:   Chee Yee Lim 
To: Srisatya Pyla 
Cc: solr-user@lucene.apache.org, Rajeev Kasarabada1 
, Archana Gavini1 
Date:   13/09/2019 04:32 PM
Subject:[EXTERNAL] Re: Need more info on MLT (More Like This) 
feature



To use knnSearch, you need to submit a POST request to the Stream request 
handler.

Using your example query, you will need to rewrite them from this :

http://[SOLR
URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258

to this (using curl as an example to send POST request) :

curl --data-urlencode 'expr=knnSearch([collection_name],
id="1414462-25600-5258",
qf="jobdescription",
k=100,
fl="jobtitle,jobdescription,score",
sort="score desc",
fq="siteid:5258",
mintf=1, 
mindf=1)' http://[SOLRURL]/stream 

Note that this assume your document ID is sjkey.

More detailed documentation on how Stream handler works can be seen here, 
https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html.

Best wishes,
Chee Yee

On Fri, 13 Sep 2019 at 17:57, Srisatya Pyla  wrote:
Hi Chee Yee Lim,


Thank you for your quick response.  
We do not find much documentation on knnsearch on how to do use that.   
Could you please guide us with more info on how this can be used?

Can we use this the way we use Solr by querying with Solr URL like   
http://[SOLR URL]/mlt ?  OR any other way?
And also please provide with any more detailed documentation if you have 
any.


Regards,

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile :+91 9849315546
E-mail :srisp...@in.ibm.com


IBM Visakha Hills
Visakhapatnam, AP 530045
India






 
 
- Original message -
From: Chee Yee Lim 
To: solr-user@lucene.apache.org
Cc: Archana Gavini1 , Rajeev Kasarabada1 <
kasar...@in.ibm.com>
Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
Date: Thu, Sep 12, 2019 6:43 PM
 
I've been working with MLT handler (Solr 8.1.1) by calling it the same way 
you did, http://[SOLRURL]/mlt. But the response is very unreliable with 
90% of the same queries resulting in Java null pointer exception, and only 
10% returning expected response. I do not know what is the cause of this.
 
I overcame this problem by using knnSearch via Stream handler (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch
). It is just a wrapper on MLT, and it works brilliantly. It is worth 
checking it out if you are running Solr in cloud mode.
 
If you pass the fl="score"="score desc" to knnSearch, you will be 
able to get the results sorted by matching scores.
 
Best wishes,
Chee Yee
  
On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla  wrote:
Hi Solr Seatch Team,

I am a developer from IBM Kenexa Brassring.  We are using Solr Search 
engine for searching jobs in our applications.
We are planning to use MLT feature to get the similar matching documents 
(jobs) based on one document (job).

When trying to explore this option, we are using matching field as 
JobDescription of the job and we are getting some unrelated documents in 
the MLT results which are not expected.

The query like below:

http://[SOLR
URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258



We have few questions:
1) Is there any way we can get the matching score for each of the matching 
document we get in the MLT results, so that we can get the sorting done on 
the score to have the highest matching document at the top of the result.

2) Are there any best practices using MLT Handler?


Regards, 

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile :+91 9849315546
E-mail :srisp...@in.ibm.com


IBM Visakha Hills
Visakhapatnam, AP 530045
India


 
 







Re: Need more info on MLT (More Like This) feature

2019-09-13 Thread Chee Yee Lim
To use knnSearch, you need to submit a POST request to the Stream request
handler.

Using your example query, you will need to rewrite it from this:

http://[SOLR URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258

to this (using curl as an example to send the POST request):

curl --data-urlencode 'expr=knnSearch([collection_name],
id="1414462-25600-5258",
qf="jobdescription",
k=100,
fl="jobtitle,jobdescription,score",
sort="score desc",
fq="siteid:5258",
mintf=1,
mindf=1)' http://[SOLRURL]/stream

Note that this assumes your document ID field is sjkey.
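For anyone calling the Stream handler from code rather than curl, here is a minimal sketch of the same request in Python. It only builds the expression and the POST request; the collection name "jobs", the localhost URL and both helper names are assumptions for illustration, while the knnSearch parameters are the ones from the curl example.

```python
from urllib.parse import urlencode
from urllib.request import Request


def knn_expr(collection, doc_id, qf, k, fl, fq=None, mintf=1, mindf=1):
    """Assemble a knnSearch streaming expression as a string (hypothetical helper)."""
    parts = [
        collection,
        'id="%s"' % doc_id,
        'qf="%s"' % qf,
        "k=%d" % k,
        'fl="%s"' % fl,
        'sort="score desc"',
    ]
    if fq:
        parts.append('fq="%s"' % fq)
    parts.append("mintf=%d" % mintf)
    parts.append("mindf=%d" % mindf)
    return "knnSearch(" + ", ".join(parts) + ")"


def build_stream_request(stream_url, expr):
    """Wrap the expression in a form-encoded POST request, without sending it."""
    body = urlencode({"expr": expr}).encode("utf-8")
    return Request(stream_url, data=body, method="POST")


if __name__ == "__main__":
    expr = knn_expr("jobs", "1414462-25600-5258", "jobdescription", 100,
                    "jobtitle,jobdescription,score", fq="siteid:5258")
    request = build_stream_request("http://localhost:8983/solr/jobs/stream", expr)
    print(request.get_method(), request.full_url)
    print(expr)
```

Passing the request to urllib.request.urlopen would send it; that step is left out so the sketch can run without a Solr instance.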

More detailed documentation on how Stream handler works can be seen here,
https://lucene.apache.org/solr/guide/8_1/streaming-expressions.html.

Best wishes,
Chee Yee

On Fri, 13 Sep 2019 at 17:57, Srisatya Pyla  wrote:

> Hi Chee Yee Lim,
>
>
> Thank you for your quick response.
> We do not find much documentation on knnsearch on how to do use that.
> Could you please guide us with more info on how this can be used?
>
> Can we use this the way we use Solr by querying with Solr URL like
> http://[SOLR URL]/mlt ?  OR any other way?
> And also please provide with any more detailed documentation if you have
> any.
>
>
> Regards,
> --
> *SST  Narasimha Rao Pyla*
>
> *IBM Talent Management SolutionsMobile :*+91 9849315546
> *E-mail :**srisp...@in.ibm.com* 
> [image: IBM]
>
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
>
>
>
>
>
>
>
>
> - Original message -
> From: Chee Yee Lim 
> To: solr-user@lucene.apache.org
> Cc: Archana Gavini1 , Rajeev Kasarabada1 <
> kasar...@in.ibm.com>
> Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
> Date: Thu, Sep 12, 2019 6:43 PM
>
> I've been working with MLT handler (Solr 8.1.1) by calling it the same way
> you did, *http://[SOLR*URL]/mlt. But the response is very unreliable with
> 90% of the same queries resulting in Java null pointer exception, and only
> 10% returning expected response. I do not know what is the cause of this.
>
> I overcame this problem by using knnSearch via Stream handler (
> *https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch*
> <https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch>).
> It is just a wrapper on MLT, and it works brilliantly. It is worth checking
> it out if you are running Solr in cloud mode.
>
> If you pass the fl="score"="score desc" to knnSearch, you will be
> able to get the results sorted by matching scores.
>
> Best wishes,
> Chee Yee
>
> On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla <*srisp...@in.ibm.com*
> > wrote:
> Hi Solr Seatch Team,
>
> I am a developer from IBM Kenexa Brassring.  We are using Solr Search
> engine for searching jobs in our applications.
> We are planning to use MLT feature to get the similar matching documents
> (jobs) based on one document (job).
>
> When trying to explore this option, we are using matching field as
> JobDescription of the job and we are getting some unrelated documents in
> the MLT results which are not expected.
>
> The query like below:
>
> *http://[SOLR*
> URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258
>
>
> *We have few questions*:
> 1) Is there any way we can get the matching score for each of the matching
> document we get in the MLT results, so that we can get the sorting done on
> the score to have the highest matching document at the top of the result.
>
> 2) Are there any best practices using MLT Handler?
>
>
> Regards,
> --
> *SST  Narasimha Rao Pyla*
>
> *IBM Talent Management SolutionsMobile :*+91 9849315546
> *E-mail :**srisp...@in.ibm.com* 
>
>
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
>
>
>
>
>
>
>
>


RE: Need more info on MLT (More Like This) feature

2019-09-13 Thread Srisatya Pyla
Hi Chee Yee Lim,


Thank you for your quick response.
We could not find much documentation on how to use knnSearch.
Could you please guide us with more info on how this can be used?

Can we use it the same way we query Solr, with a Solr URL like
http://[SOLR URL]/mlt, or in some other way?
Please also share any more detailed documentation if you have
any.


Regards,

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile : +91 9849315546
E-mail : srisp...@in.ibm.com


IBM Visakha Hills
Visakhapatnam, AP 530045
India





 
 
- Original message -
From: Chee Yee Lim 
To: solr-user@lucene.apache.org
Cc: Archana Gavini1 , Rajeev Kasarabada1 

Subject: [EXTERNAL] Re: Need more info on MLT (More Like This) feature
Date: Thu, Sep 12, 2019 6:43 PM
 
I've been working with MLT handler (Solr 8.1.1) by calling it the same way 
you did, http://[SOLR URL]/mlt. But the response is very unreliable with 
90% of the same queries resulting in Java null pointer exception, and only 
10% returning expected response. I do not know what is the cause of this.
 
I overcame this problem by using knnSearch via Stream handler (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch
). It is just a wrapper on MLT, and it works brilliantly. It is worth 
checking it out if you are running Solr in cloud mode.
 
If you pass the fl="score"="score desc" to knnSearch, you will be 
able to get the results sorted by matching scores.
 
Best wishes,
Chee Yee
 
On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla  wrote:
Hi Solr Seatch Team,

I am a developer from IBM Kenexa Brassring.  We are using Solr Search 
engine for searching jobs in our applications.
We are planning to use MLT feature to get the similar matching documents 
(jobs) based on one document (job).

When trying to explore this option, we are using matching field as 
JobDescription of the job and we are getting some unrelated documents in 
the MLT results which are not expected.

The query like below:

http://[SOLR 
URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258



We have few questions:
1) Is there any way we can get the matching score for each of the matching 
document we get in the MLT results, so that we can get the sorting done on 
the score to have the highest matching document at the top of the result.

2) Are there any best practices using MLT Handler?


Regards, 

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile :+91 9849315546
E-mail :srisp...@in.ibm.com


IBM Visakha Hills
Visakhapatnam, AP 530045
India

 
 
 





Re: Need more info on MLT (More Like This) feature

2019-09-12 Thread Chee Yee Lim
I've been working with MLT handler (Solr 8.1.1) by calling it the same way
you did, http://[SOLR URL]/mlt. But the response is very unreliable with
90% of the same queries resulting in Java null pointer exception, and only
10% returning expected response. I do not know what is the cause of this.

I overcame this problem by using knnSearch via Stream handler (
https://lucene.apache.org/solr/guide/8_1/stream-source-reference.html#knnsearch).
It is just a wrapper on MLT, and it works brilliantly. It is worth checking
it out if you are running Solr in cloud mode.

If you pass fl="score" and sort="score desc" to knnSearch, you will be able
to get the results sorted by matching scores.

Best wishes,
Chee Yee

On Thu, 12 Sep 2019 at 19:44, Srisatya Pyla  wrote:

> Hi Solr Seatch Team,
>
> I am a developer from IBM Kenexa Brassring.  We are using Solr Search
> engine for searching jobs in our applications.
> We are planning to use MLT feature to get the similar matching documents
> (jobs) based on one document (job).
>
> When trying to explore this option, we are using matching field as
> JobDescription of the job and we are getting some unrelated documents in
> the MLT results which are not expected.
>
> The query like below:
>
> http://[SOLR
> URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258
>
>
> *We have few questions*:
> 1) Is there any way we can get the matching score for each of the matching
> document we get in the MLT results, so that we can get the sorting done on
> the score to have the highest matching document at the top of the result.
>
> 2) Are there any best practices using MLT Handler?
>
>
> Regards,
> --
> *SST  Narasimha Rao Pyla*
>
> *IBM Talent Management SolutionsMobile :*+91 9849315546
> *E-mail :**srisp...@in.ibm.com* 
> [image: IBM]
>
> IBM Visakha Hills
> Visakhapatnam, AP 530045
> India
>
>
>


Need more info on MLT (More Like This) feature

2019-09-12 Thread Srisatya Pyla
Hi Solr Search Team,

I am a developer from IBM Kenexa Brassring.  We are using the Solr search
engine for searching jobs in our applications.
We are planning to use the MLT feature to get similar documents
(jobs) based on one document (job).

While exploring this option, we used the JobDescription field of the job as
the matching field, and we are getting some unexpected, unrelated documents
in the MLT results.

The query looks like this:

http://[SOLR URL]/mlt?q=sjkey:1414462-25600-5258=json=true=true=100=jobdescription=1=1=jobtitle,jobdescription=siteid:5258


We have a few questions:
1) Is there any way to get the matching score for each matching document in
the MLT results, so that we can sort on the score and have the best-matching
document at the top of the results?

2) Are there any best practices for using the MLT handler?


Regards,

SST  Narasimha Rao Pyla
IBM Talent Management Solutions
Mobile : +91 9849315546
E-mail : srisp...@in.ibm.com


IBM Visakha Hills
Visakhapatnam, AP 530045
India




Re: more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Thanks David.
This is the page I was looking for.

Roland

David Hastings wrote (on Mon, Aug 12, 2019, 20:52):

> should be fine,
> https://cwiki.apache.org/confluence/display/solr/MoreLikeThisHandler
>
> for more info
>
> On Mon, Aug 12, 2019 at 2:49 PM Szűcs Roland 
> wrote:
>
> > Hi David,
> > Thanks the fast reply. Am I right that I can combine fq with mlt only if
> I
> > use more like this as a query parser?
> >
> > Is there a way to achieve the same with mlt as a request handler?
> > Roland
> >
> > David Hastings wrote (on Mon, Aug 12, 2019, 20:44):
> >
> > > The easiest way will be to pass in a filter query (fq)
> > >
> > > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland <
> > szucs.rol...@bookandwalk.hu>
> > > wrote:
> > >
> > > > Hi All,
> > > >
> > > > Is there any tutorial or example how to use more like this
> > functionality
> > > > when we have some other constraints set by the user through faceting
> > > > parameters like price range, or product category for example?
> > > >
> > > > Cheers,
> > > > Roland
> > > >
> > >
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread David Hastings
should be fine,
https://cwiki.apache.org/confluence/display/solr/MoreLikeThisHandler

for more info

On Mon, Aug 12, 2019 at 2:49 PM Szűcs Roland 
wrote:

> Hi David,
> Thanks the fast reply. Am I right that I can combine fq with mlt only if I
> use more like this as a query parser?
>
> Is there a way to achieve the same with mlt as a request handler?
> Roland
>
> David Hastings wrote (on Mon, Aug 12, 2019, 20:44):
>
> > The easiest way will be to pass in a filter query (fq)
> >
> > On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland <
> szucs.rol...@bookandwalk.hu>
> > wrote:
> >
> > > Hi All,
> > >
> > > Is there any tutorial or example how to use more like this
> functionality
> > > when we have some other constraints set by the user through faceting
> > > parameters like price range, or product category for example?
> > >
> > > Cheers,
> > > Roland
> > >
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Hi David,
Thanks for the fast reply. Am I right that I can combine fq with MLT only if I
use More Like This as a query parser?

Is there a way to achieve the same with mlt as a request handler?
Roland

David Hastings wrote (on Mon, Aug 12, 2019, 20:44):

> The easiest way will be to pass in a filter query (fq)
>
> On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland 
> wrote:
>
> > Hi All,
> >
> > Is there any tutorial or example how to use more like this functionality
> > when we have some other constraints set by the user through faceting
> > parameters like price range, or product category for example?
> >
> > Cheers,
> > Roland
> >
>


Re: more like this query parser with faceting

2019-08-12 Thread David Hastings
The easiest way will be to pass in a filter query (fq)
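As a sketch of the filter query suggestion, the snippet below builds an MLT handler URL with one fq per user-selected constraint. It assumes a MoreLikeThisHandler is registered at /mlt; the base URL, the collection name "products", the field names (description, price, category) and the helper name are all made up for illustration.

```python
from urllib.parse import urlencode


def mlt_handler_url(base_url, collection, doc_query, mlt_fields, filters, rows=10):
    """Build an /mlt request where each facet constraint becomes its own fq.

    Hypothetical helper; assumes the MLT handler is configured at /mlt.
    """
    params = [
        ("q", doc_query),                  # selects the source document
        ("mlt.fl", ",".join(mlt_fields)),  # fields used for similarity
        ("rows", str(rows)),
        ("wt", "json"),
    ]
    # Repeating fq keeps the filters independent of each other.
    params.extend(("fq", f) for f in filters)
    return "%s/%s/mlt?%s" % (base_url, collection, urlencode(params))


if __name__ == "__main__":
    print(mlt_handler_url(
        "http://localhost:8983/solr", "products", "id:12345",
        ["description"],
        ["price:[10 TO 50]", "category:books"],
    ))
```

The filters only narrow the set of similar documents that can be returned; they do not change which document is used as the source.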

On Mon, Aug 12, 2019 at 2:40 PM Szűcs Roland 
wrote:

> Hi All,
>
> Is there any tutorial or example how to use more like this functionality
> when we have some other constraints set by the user through faceting
> parameters like price range, or product category for example?
>
> Cheers,
> Roland
>


more like this query parser with faceting

2019-08-12 Thread Szűcs Roland
Hi All,

Is there any tutorial or example of how to use the More Like This
functionality when we have some other constraints set by the user through
faceting parameters, like price range or product category for example?

Cheers,
Roland


Re: More Like This Query problems

2018-10-18 Thread John Bickerstaff
Found it.

My SOLR does NOT store fields and after some careful checking, it turns out
we do NOT do term vectors either...  So, according to the docs, MLT will
not work.

Thanks for the response David!

On Thu, Oct 18, 2018 at 1:44 PM John Bickerstaff 
wrote:

> Thanks. There are many docs with matching words  I've tried an
> extremely simplified case where a basic query (q=Field1:"foo") returns
> millions of results... however a MLT similar to the one I mention below,
> using a doc Id I know has "foo" in Field1 returns only the same Doc ID as
> submitted in the query.
>
>
> http://XX.XXX.XX.XXX:10001/solr/BPS/select?indent=on=Field1:%22foo%22=json
> (Returns several million as "numFound)
>
>
> http://XX.XXX.XX.XXX:10001/solr/BPS/select?indent=on=Field1=true=id:%2227000:9009:66%22=json
> (returns only the same ID in the More Like This section)
>
> Wouldn't the AND NOT just eliminate my initial doc Id from the list?
> Assuming matches, we would still expect other ids to be returned in any
> case, wouldn't we?  Should that be a Filter Query?
>
> On Thu, Oct 18, 2018, 12:57 PM David Hastings 
> wrote:
>
>> Make sure your query has an “AND NOT id:your doc id”
>> Also be certain there are other documents that will meet your criteria
>> for a test case. Remember it’s unique words in your core/collection
>>
>> On Oct 18, 2018, at 2:43 PM, John Bickerstaff > <mailto:j...@johnbickerstaff.com>> wrote:
>>
>> All,
>>
>>
>> I am having trouble with a “more like this” query in Solr.
>>
>>
>> Here’s what I think should be happening:
>>
>>
>> 1. Query contains Document ID (q=id:"942316176:9009:66
>> <
>> http://10.157.117.55:10001/solr/BPS/select?==true=on=surnames,genders,givennames,birthlocations,deathlocations=true=id:%22942316176:9009:66%22=json
>> >
>> ”)
>>
>> 2. I add the following (on the solr admin page, raw query parameters
>> field)
>>
>>  =true=field1,field2,field3
>>
>> 3. More Like This will take the Document ID, look at the fields (field1,
>> field2, field3) and return a list of documents that have the best match to
>> the contents of those fields in “document Id”
>>
>>
>> What is happening is that I’m getting only one result and it is the same
>> document id as the one I sent in on the query.  What I expected was a list
>> of Doc ID’s for documents that have some kind of match to the submitted
>> Doc
>> ID.
>>
>>
>> Any thoughts or advice would be appreciated.
>>
>>
>> ===
>>
>>
>> Here is an example of the query URL:
>>
>>
>>
>> http://XX.XXX.XXX.XX:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json
>> <
>> http://xx.xxx.xxx.xx:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json
>> >
>>
>>
>> However, when I submit the query, I get only one document ID returned -
>> the
>> same one I submitted in the first place.
>>
>>
>> Here is the important section of the response:
>>
>>
>> {
>>
>>  "*responseHeader*":{
>>
>>"*zkConnected*":true,
>>
>>"*status*":0,
>>
>>"*QTime*":26,
>>
>>"*params*":{
>>
>>  "*q*":"id:\"942316176:9009:66\"",
>>
>>  "*debug*":"true",
>>
>>  "*mlt*":"true",
>>
>>  "*indent*":"on",
>>
>>  "*mlt.fl*”:”field1,field2,field3",
>>
>>  "*wt*":"json",
>>
>>  "*_*":"1539881180264"}},
>>
>>  "*response*":{"*numFound*":1,"*start*":0,"*maxScore*":1.0,"*docs*":[
>>
>>  {
>>
>>"*id*":"942316176:9009:66",
>>
>>"*_version_*":1611920924010872837}]
>>
>>  },
>>
>>  "*moreLikeThis*":[
>>
>>"942316176:9009:66",{"*numFound*":0,"*start*":0,"*docs*":[]
>>
>>}],
>>
>>  "*debug*":{
>>
>


Re: More Like This Query problems

2018-10-18 Thread John Bickerstaff
Thanks. There are many docs with matching words. I've tried an
extremely simplified case where a basic query (q=Field1:"foo") returns
millions of results. However, an MLT query similar to the one I mention below,
using a doc ID I know has "foo" in Field1, returns only the same doc ID as
submitted in the query.

http://XX.XXX.XX.XXX:10001/solr/BPS/select?indent=on&q=Field1:%22foo%22&wt=json
(Returns several million as "numFound")

http://XX.XXX.XX.XXX:10001/solr/BPS/select?indent=on&mlt.fl=Field1&mlt=true&q=id:%2227000:9009:66%22&wt=json
(Returns only the same ID in the More Like This section)

Wouldn't the AND NOT just eliminate my initial doc Id from the list?
Assuming matches, we would still expect other ids to be returned in any
case, wouldn't we?  Should that be a Filter Query?
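
To make the two options in the question concrete, here is a minimal sketch of how the exclusion could be written as a negative filter query instead of an "AND NOT" clause in q. The class and method names are made up for illustration, and whether the More Like This section of the response honors such a filter is not established anywhere in this thread.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr: builds a filter that excludes one document.
final class MltExcludeSource {
    // Returns a negative clause such as -id:"27000:9009:66".
    static String excludeFilter(String idField, String docId) {
        // Escape backslashes and double quotes so the id stays a single quoted phrase.
        String escaped = docId.replace("\\", "\\\\").replace("\"", "\\\"");
        return "-" + idField + ":\"" + escaped + "\"";
    }

    // Returns the same clause as a URL-encoded fq parameter.
    static String asUrlParam(String idField, String docId) {
        return "fq=" + URLEncoder.encode(excludeFilter(idField, docId), StandardCharsets.UTF_8);
    }
}
```

Either form only removes the source document; it cannot add matches, so an empty MLT list would stay empty with or without it.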

On Thu, Oct 18, 2018, 12:57 PM David Hastings  wrote:

> Make sure your query has an “AND NOT id:your doc id”
> Also be certain there are other documents that will meet your criteria for
> a test case. Remember it’s unique words in your core/collection
>
> On Oct 18, 2018, at 2:43 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:
>
> All,
>
>
> I am having trouble with a “more like this” query in Solr.
>
>
> Here’s what I think should be happening:
>
>
> 1. Query contains Document ID (q=id:"942316176:9009:66")
>
> 2. I add the following (on the solr admin page, raw query parameters field)
>
>  &mlt=true&mlt.fl=field1,field2,field3
>
> 3. More Like This will take the Document ID, look at the fields (field1,
> field2, field3) and return a list of documents that have the best match to
> the contents of those fields in “document Id”
>
>
> What is happening is that I’m getting only one result and it is the same
> document id as the one I sent in on the query.  What I expected was a list
> of Doc ID’s for documents that have some kind of match to the submitted Doc
> ID.
>
>
> Any thoughts or advice would be appreciated.
>
>
> ===
>
>
> Here is an example of the query URL:
>
>
>
> http://XX.XXX.XXX.XX:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json
> <
> http://xx.xxx.xxx.xx:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json
> >
>
>
> However, when I submit the query, I get only one document ID returned - the
> same one I submitted in the first place.
>
>
> Here is the important section of the response:
>
>
> {
>
>  "*responseHeader*":{
>
>"*zkConnected*":true,
>
>"*status*":0,
>
>"*QTime*":26,
>
>"*params*":{
>
>  "*q*":"id:\"942316176:9009:66\"",
>
>  "*debug*":"true",
>
>  "*mlt*":"true",
>
>  "*indent*":"on",
>
>  "*mlt.fl*”:”field1,field2,field3",
>
>  "*wt*":"json",
>
>  "*_*":"1539881180264"}},
>
>  "*response*":{"*numFound*":1,"*start*":0,"*maxScore*":1.0,"*docs*":[
>
>  {
>
>"*id*":"942316176:9009:66",
>
>"*_version_*":1611920924010872837}]
>
>  },
>
>  "*moreLikeThis*":[
>
>"942316176:9009:66",{"*numFound*":0,"*start*":0,"*docs*":[]
>
>}],
>
>  "*debug*":{
>


Re: More Like This Query problems

2018-10-18 Thread David Hastings
Make sure your query has an “AND NOT id:your doc id” clause.
Also be certain there are other documents that will meet your criteria for a
test case. Remember, it’s unique words in your core/collection.

On Oct 18, 2018, at 2:43 PM, John Bickerstaff <j...@johnbickerstaff.com> wrote:

All,


I am having trouble with a “more like this” query in Solr.


Here’s what I think should be happening:


1. Query contains Document ID (q=id:"942316176:9009:66")

2. I add the following (on the solr admin page, raw query parameters field)

 &mlt=true&mlt.fl=field1,field2,field3

3. More Like This will take the Document ID, look at the fields (field1,
field2, field3) and return a list of documents that have the best match to
the contents of those fields in “document Id”


What is happening is that I’m getting only one result and it is the same
document id as the one I sent in on the query.  What I expected was a list
of Doc ID’s for documents that have some kind of match to the submitted Doc
ID.


Any thoughts or advice would be appreciated.


===


Here is an example of the query URL:


http://XX.XXX.XXX.XX:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json<http://xx.xxx.xxx.xx:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json>


However, when I submit the query, I get only one document ID returned - the
same one I submitted in the first place.


Here is the important section of the response:


{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":26,
    "params":{
      "q":"id:\"942316176:9009:66\"",
      "debug":"true",
      "mlt":"true",
      "indent":"on",
      "mlt.fl":"field1,field2,field3",
      "wt":"json",
      "_":"1539881180264"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"942316176:9009:66",
        "_version_":1611920924010872837}]
  },
  "moreLikeThis":[
    "942316176:9009:66",{"numFound":0,"start":0,"docs":[]
    }],
  "debug":{


More Like This Query problems

2018-10-18 Thread John Bickerstaff
All,


I am having trouble with a “more like this” query in Solr.


Here’s what I think should be happening:


1. Query contains Document ID (q=id:"942316176:9009:66")

2. I add the following (on the solr admin page, raw query parameters field)

  &mlt=true&mlt.fl=field1,field2,field3

3. More Like This will take the Document ID, look at the fields (field1,
field2, field3) and return a list of documents that have the best match to
the contents of those fields in “document Id”
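
The three steps above can be sketched as a small URL builder. This is only an illustration: the class name is made up, the host in the usage note is a placeholder, and the parameter set (q, mlt, mlt.fl, wt) is taken from the "params" echo in the response further down rather than from the admin page itself.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper, not part of Solr: assembles a /select request with MLT enabled.
final class MltRequestUrl {
    static String build(String collectionUrl, String docId, String... fields) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "id:\"" + docId + "\"");        // step 1: select the source document
        params.put("mlt", "true");                      // step 2: switch the MLT component on
        params.put("mlt.fl", String.join(",", fields)); // step 2: fields to compare on
        params.put("wt", "json");

        StringJoiner query = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            // Values are percent-encoded; parameter names are plain ASCII already.
            query.add(e.getKey() + "=" + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return collectionUrl + "/select?" + query;
    }
}
```

For example, MltRequestUrl.build("http://localhost:8983/solr/BPS", "942316176:9009:66", "field1", "field2", "field3") yields a URL whose query string carries all four parameters with their names intact.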


What is happening is that I’m getting only one result and it is the same
document id as the one I sent in on the query.  What I expected was a list
of Doc ID’s for documents that have some kind of match to the submitted Doc
ID.


Any thoughts or advice would be appreciated.


===


Here is an example of the query URL:


http://XX.XXX.XXX.XX:10001/solr/BPS/select?==true=on=field1,field2,field3=true=id:%22942316176:9009:66%22=json


However, when I submit the query, I get only one document ID returned - the
same one I submitted in the first place.


Here is the important section of the response:


{
  "responseHeader":{
    "zkConnected":true,
    "status":0,
    "QTime":26,
    "params":{
      "q":"id:\"942316176:9009:66\"",
      "debug":"true",
      "mlt":"true",
      "indent":"on",
      "mlt.fl":"field1,field2,field3",
      "wt":"json",
      "_":"1539881180264"}},
  "response":{"numFound":1,"start":0,"maxScore":1.0,"docs":[
      {
        "id":"942316176:9009:66",
        "_version_":1611920924010872837}]
  },
  "moreLikeThis":[
    "942316176:9009:66",{"numFound":0,"start":0,"docs":[]
    }],
  "debug":{


highlighting more-like-this

2018-10-12 Thread Matt Work Coarr
I want to get highlighted results for more like this queries.  More like
this doesn't support highlighting.

So what I did was run a more like this query (I have the source document A
and say I get three similar documents back: A1, A2, and A3).  I then create
a second query where I use the contents of A as the query.

More specifically, I have a subset of my fields being appended to a
multivalued "catchall" field.  I use A's concatenated catchall (with
punctuation removed) as the search:

q=catchall:(*CONCATENATED_A_CATCHALL_TEXT*)

And I limit the results to the three documents A1/A2/A3 via qf:

qf=id*:A1_ID*+id*:A2_ID*+id*:A3_ID*

Now I get highlighted results.  But my main problem is very frequent terms
(for/the/to/in...) are highlighted.  I would have thought these would be
excluded via inverse document frequency (since they show up in just about
every document).

Is there a way to improve the highlighting? (Remove the less important
terms, set some threshold, etc)

Matt
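
A rough sketch of the follow-up request described above, for anyone trying to reproduce it. The class name is made up. The post restricts the results with qf; this sketch uses a filter query (fq) for that restriction instead, which is an assumption on my part. hl and hl.fl are the usual highlighting switches.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Solr: parameters for the second, highlighted query.
final class MltHighlightFollowUp {
    static Map<String, String> params(String sourceText, List<String> similarIds) {
        // Restrict the result set to the documents MLT already returned.
        String idClause = similarIds.stream()
                .map(id -> "\"" + id + "\"")
                .collect(Collectors.joining(" OR ", "id:(", ")"));

        Map<String, String> p = new LinkedHashMap<>();
        p.put("q", "catchall:(" + sourceText + ")"); // contents of A, punctuation removed
        p.put("fq", idClause);                       // assumption: fq rather than the post's qf
        p.put("hl", "true");
        p.put("hl.fl", "catchall");
        return p;
    }
}
```

Note that every word of the source text becomes a query term here, so the highlighter marks every one that matches, frequent words included, unless the analysis chain or the query text drops them.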


highlighting in more like this?

2018-09-18 Thread Matt Work Coarr
Is it possible to get highlighting in more like this queries?  My initial
attempts seem to indicate that it isn't possible (I've only attempted this
via modifying MLT query urls)

(I'm looking for something similar to hl=true&hl.fl=field1,field5,field6 in
a normal search)

Thanks,
Matt


Solr 6 more like this

2016-08-03 Thread sara hajili
Hi, I switched from Solr 5 to Solr 6.
I created my own More Like This handler that uses the Solr More Like This
handler and expands the query by adding some words to it.
My question is about the MLT parameters.
I want to know about mlt.mindf and mlt.mintf. What exactly do they do?
When I don't set mlt.mindf and mlt.mintf and they are left at their default
values, everything is OK and I get an answer from my handler quickly.
But when I set both of them to 1, I get a heap error. When I checked my Solr
with jconsole, I saw that the new query with mlt.mintf=1 and mlt.mindf=1 needs
more than 4G of heap, while when I execute my query with the defaults
mlt.mintf=2 and mlt.mindf=5 I do not get a heap space error and the query
executes with less than 512M of heap.

How can mintf and mindf affect the memory use (heap size) of my Solr system?
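
As a rough illustration of what the two thresholds do: mlt.mintf is the minimum number of times a term must occur in the source document, and mlt.mindf is the minimum number of indexed documents that must contain it, before the term is considered for the MLT query. The sketch below is not the Lucene code; the class, method, and sample numbers are invented, and it only shows how lowering both values to 1 lets more candidate terms through.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Simplified model of MLT term filtering; names and data are hypothetical.
final class MltTermThresholds {
    // termFreq: term -> occurrences in the source document
    // docFreq:  term -> number of indexed documents containing the term
    static List<String> candidateTerms(Map<String, Integer> termFreq,
                                       Map<String, Integer> docFreq,
                                       int minTf, int minDf) {
        List<String> kept = new ArrayList<>();
        // TreeMap gives a stable alphabetical order for the result.
        for (Map.Entry<String, Integer> e : new TreeMap<>(termFreq).entrySet()) {
            int tf = e.getValue();
            int df = docFreq.getOrDefault(e.getKey(), 0);
            if (tf < minTf) continue; // too rare in the source document
            if (df < minDf) continue; // too rare in the index
            kept.add(e.getKey());
        }
        return kept;
    }
}
```

With thresholds of 2 and 5, words that occur once in the document or in only a handful of indexed documents are dropped early. With both set to 1, every distinct term of the source document stays a candidate, so there is more to look up and score per request. Whether that alone explains a jump from 512M to 4G is not something this sketch can show.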


More like this in solr5.4.1

2016-07-19 Thread kostali hassan
I want to introduce MoreLikeThis to get similar documents for each query.
I have indexed rich data (PDF and MS Word). I guess the field to use for
similarity is CONTENT, which is also used for highlighting document content.
In my case, what is the best way to build MLT: with the MoreLikeThisHandler
or with the MoreLikeThisComponent in SearchHandler?


RE: Solr more like this

2016-07-06 Thread Jamal, Sarfaraz
Could you index it, do the 'like this' and  then delete it from the index?

All in one smooth user experience obviously.

(Just throwing it out there).

Sas



-Original Message-
From: Charlie Hull [mailto:char...@flax.co.uk] 
Sent: Wednesday, July 6, 2016 11:02 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr more like this

On 05/07/2016 19:42, sara hajili wrote:
> Hi
> I indexed pdf files yo solr.and now I wanna to know is there any way 
> to uplaod  a pdf file and solr return related pdf in result?
> I mean I don't want to index pdf file (the file that I wanna to get 
> pdf more like this for this pdf).and just upload pdf file and get mlt 
> result.can I do this??
>
If Solr hasn't indexed a PDF file, it can't work out it's 'like this'. 
So I'd say, no, you can't.

Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Re: Solr more like this

2016-07-06 Thread Alessandro Benedetti
So, if you already indexed N PDFs you can use the MLT request handler to
look for documents similar to a specific text.
This means you need to manually extract the content from the PDF (using
Tika, for example) and then pass it to the request handler.

http://wiki.apache.org/solr/MoreLikeThisHandler#Examples

In particular:
http://localhost:8983/solr/mlt?stream.url=http://lucene.apache.org/solr/&mlt.fl=manu,cat&mlt.interestingTerms=list&mlt.mintf=0

Not sure if this is still valid.
I would verify the request handler code, to check if the request parameters
are still supported.
I will do it tomorrow, but in the meantime feel free to verify.
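
In case it helps while checking, here is a minimal sketch of building such a request with the extracted text sent inline as stream.body instead of stream.url. The class name is made up, the /mlt handler path and the field names are the wiki's example values, and, as said above, whether current versions still accept content streams on this handler is unverified.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Solr: MLT handler request carrying the text itself.
final class MltStreamBodyUrl {
    static String build(String handlerUrl, String extractedText, String fieldList) {
        return handlerUrl
                + "?stream.body=" + enc(extractedText) // text extracted beforehand, e.g. with Tika
                + "&mlt.fl=" + enc(fieldList)
                + "&mlt.interestingTerms=list"
                + "&mlt.mintf=0";
    }

    private static String enc(String value) {
        // Form-style encoding: spaces become '+', commas become %2C.
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }
}
```

A whole PDF's worth of text will not fit comfortably in a GET URL, so a real client would more likely send the same parameters in a POST body.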

Cheers

On Wed, Jul 6, 2016 at 4:02 PM, Charlie Hull <char...@flax.co.uk> wrote:

> On 05/07/2016 19:42, sara hajili wrote:
>
>> Hi
>> I indexed pdf files yo solr.and now I wanna to know is there any way to
>> uplaod  a pdf file and solr return related pdf in result?
>> I mean I don't want to index pdf file (the file that I wanna to get pdf
>> more like this for this pdf).and just upload pdf file and get mlt
>> result.can I do this??
>>
>> If Solr hasn't indexed a PDF file, it can't work out it's 'like this'. So
> I'd say, no, you can't.
>
> Cheers
>
> Charlie
>
> --
> Charlie Hull
> Flax - Open Source Enterprise Search
>
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr more like this

2016-07-06 Thread Charlie Hull

On 05/07/2016 19:42, sara hajili wrote:

Hi
I indexed pdf files yo solr.and now I wanna to know is there any way to
uplaod  a pdf file and solr return related pdf in result?
I mean I don't want to index pdf file (the file that I wanna to get pdf
more like this for this pdf).and just upload pdf file and get mlt
result.can I do this??

If Solr hasn't indexed a PDF file, it can't work out what is 'like this'.
So I'd say, no, you can't.


Cheers

Charlie

--
Charlie Hull
Flax - Open Source Enterprise Search

tel/fax: +44 (0)8700 118334
mobile:  +44 (0)7767 825828
web: www.flax.co.uk


Solr more like this

2016-07-05 Thread sara hajili
Hi
I indexed PDF files into Solr, and now I want to know whether there is any way
to upload a PDF file and have Solr return related PDFs in the result.
I mean, I don't want to index the PDF file that I want "more like this"
results for; I just want to upload it and get the MLT result. Can I do this?


Re: More Like This on not new documents

2016-05-24 Thread Vincenzo D'Amore
Thanks Nick,

I don't plan to index the document; the document is a kind of disposable
object, and it is based on the user query.

I have seen that page, but I didn't understand how to pass the document (my
disposable object) via the stream.body parameter.

Googling, I found this: https://issues.apache.org/jira/browse/SOLR-5351

I see Solr committers have been working on this bug recently, but for now
only the first field is handled by MLT.

So it is not clear how to use the stream.body parameter.

Best regards,
Vincenzo

On Fri, May 13, 2016 at 7:03 PM, Nick D  wrote:

> https://wiki.apache.org/solr/MoreLikeThisHandler
>
> Bottom of the page, using context streams. I believe this still works in
> newer versions of Solr. Although I have not tested it on a new version of
> Solr.
>
> But if you plan on indexing the document anyways then just indexing and
> then passing the ID to mlt isn't a bad thing at all.
>
> Nick
>
> On Fri, May 13, 2016 at 2:23 AM, Vincenzo D'Amore 
> wrote:
>
> > Hi all,
> >
> > anybody know if is there a chance to use the mlt component with a new
> > document not existing in the collection?
> >
> > In other words, if I have a new document, should I always first add it to
> > my collection and only then, using the mlt component, have the list of
> > similar documents?
> >
> >
> > Best regards,
> > Vincenzo
> >
> >
> > --
> > Vincenzo D'Amore
> > email: v.dam...@gmail.com
> > skype: free.dev
> > mobile: +39 349 8513251
> >
>



-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: More Like This on not new documents

2016-05-13 Thread Nick D
https://wiki.apache.org/solr/MoreLikeThisHandler

Bottom of the page, using context streams. I believe this still works in
newer versions of Solr. Although I have not tested it on a new version of
Solr.

But if you plan on indexing the document anyways then just indexing and
then passing the ID to mlt isn't a bad thing at all.

Nick

On Fri, May 13, 2016 at 2:23 AM, Vincenzo D'Amore 
wrote:

> Hi all,
>
> anybody know if is there a chance to use the mlt component with a new
> document not existing in the collection?
>
> In other words, if I have a new document, should I always first add it to
> my collection and only then, using the mlt component, have the list of
> similar documents?
>
>
> Best regards,
> Vincenzo
>
>
> --
> Vincenzo D'Amore
> email: v.dam...@gmail.com
> skype: free.dev
> mobile: +39 349 8513251
>


More Like This on not new documents

2016-05-13 Thread Vincenzo D'Amore
Hi all,

does anybody know whether it is possible to use the MLT component with a new
document that does not exist in the collection?

In other words, if I have a new document, must I always add it to
my collection first and only then use the MLT component to get the list of
similar documents?


Best regards,
Vincenzo


-- 
Vincenzo D'Amore
email: v.dam...@gmail.com
skype: free.dev
mobile: +39 349 8513251


Re: [More Like This] Query building

2016-04-12 Thread Scott Stults
Hi Alessandro,

It's not uncommon for Solr patches to remain uncommitted for months, even
years. In fact some never get merged. Don't let that discourage you!


k/r,
Scott

On Fri, Mar 11, 2016 at 11:49 AM, Alessandro Benedetti <
abenede...@apache.org> wrote:

> I start to feel that is not that easy to contribute improvements or small
> fix to Solr ( if they are not super interesting to the mass) .
> I think this one could be a good improvement in the MLT but I would love to
> discuss this with some committer.
> The patch is attached, it is there since months ago...
> Any feedback would be appreciated, I want to contribute, but I need some
> second opinions ...
>
> Cheers
>
> On 11 February 2016 at 13:48, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
> > Hi Guys,
> > is it possible to have any feedback ?
> > Is there any process to speed up bug resolution / discussions ?
> > just want to understand if the patch is not good enough, if I need to
> > improve it or simply no-one took a look ...
> >
> > https://issues.apache.org/jira/browse/LUCENE-6954
> >
> > Cheers
> >
> > On 11 January 2016 at 15:25, Alessandro Benedetti <abenede...@apache.org
> >
> > wrote:
> >
> >> Hi guys,
> >> the patch seems fine to me.
> >> I didn't spend much more time on the code but I checked the tests and
> the
> >> pre-commit checks.
> >> It seems fine to me.
> >> Let me know ,
> >>
> >> Cheers
> >>
> >> On 31 December 2015 at 18:40, Alessandro Benedetti <
> abenede...@apache.org
> >> > wrote:
> >>
> >>> https://issues.apache.org/jira/browse/LUCENE-6954
> >>>
> >>> First draft patch available, I will check better the tests new year !
> >>>
> >>> On 29 December 2015 at 13:43, Alessandro Benedetti <
> >>> abenede...@apache.org> wrote:
> >>>
> >>>> Sure, I will proceed tomorrow with the Jira and the simple patch +
> >>>> tests.
> >>>>
> >>>> In the meantime let's try to collect some additional feedback.
> >>>>
> >>>> Cheers
> >>>>
> >>>> On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net>
> >>>> wrote:
> >>>>
> >>>>> Feel free to create a JIRA and put up a patch if you can.
> >>>>>
> >>>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
> >>>>> abenede...@apache.org
> >>>>> > wrote:
> >>>>>
> >>>>> > Hi guys,
> >>>>> > While I was exploring the way we build the More Like This query, I
> >>>>> > discovered a part I am not convinced of :
> >>>>> >
> >>>>> >
> >>>>> >
> >>>>> > Let's see how we build the query :
> >>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
> >>>>> >
> >>>>> > 1) we extract the terms from the interesting fields, adding them to
> >>>>> a map :
> >>>>> >
> >>>>> > Map<String, Int> termFreqMap = new HashMap<>();
> >>>>> >
> >>>>> > *( we lose the relation field-> term, we don't know anymore where
> >>>>> the term
> >>>>> > was coming ! )*
> >>>>> >
> >>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
> >>>>> >
> >>>>> > 2) we build the queue that will contain the query terms, at this
> >>>>> point we
> >>>>> > connect again there terms to some field, but :
> >>>>> >
> >>>>> > ...
> >>>>> >> // go through all the fields and find the largest document
> frequency
> >>>>> >> String topField = fieldNames[0];
> >>>>> >> int docFreq = 0;
> >>>>> >> for (String fieldName : fieldNames) {
> >>>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
> >>>>> >>   topField = (freq > docFreq) ? fieldName : topField;
> >>>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
> >>>>> >> }
> >>>>> >> ...
> >>>>> >
> >>>>> >
> >>>>> > We identify the topField

Re: [More Like This] Query building

2016-03-11 Thread Alessandro Benedetti
I am starting to feel that it is not that easy to contribute improvements or
small fixes to Solr (if they are not of broad interest).
I think this one could be a good improvement to MLT, but I would love to
discuss it with a committer.
The patch is attached; it has been there for months...
Any feedback would be appreciated. I want to contribute, but I need some
second opinions...

Cheers

On 11 February 2016 at 13:48, Alessandro Benedetti <abenede...@apache.org>
wrote:

> Hi Guys,
> is it possible to have any feedback ?
> Is there any process to speed up bug resolution / discussions ?
> just want to understand if the patch is not good enough, if I need to
> improve it or simply no-one took a look ...
>
> https://issues.apache.org/jira/browse/LUCENE-6954
>
> Cheers
>
> On 11 January 2016 at 15:25, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
>> Hi guys,
>> the patch seems fine to me.
>> I didn't spend much more time on the code but I checked the tests and the
>> pre-commit checks.
>> It seems fine to me.
>> Let me know ,
>>
>> Cheers
>>
>> On 31 December 2015 at 18:40, Alessandro Benedetti <abenede...@apache.org
>> > wrote:
>>
>>> https://issues.apache.org/jira/browse/LUCENE-6954
>>>
>>> First draft patch available, I will check better the tests new year !
>>>
>>> On 29 December 2015 at 13:43, Alessandro Benedetti <
>>> abenede...@apache.org> wrote:
>>>
>>>> Sure, I will proceed tomorrow with the Jira and the simple patch +
>>>> tests.
>>>>
>>>> In the meantime let's try to collect some additional feedback.
>>>>
>>>> Cheers
>>>>
>>>> On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net>
>>>> wrote:
>>>>
>>>>> Feel free to create a JIRA and put up a patch if you can.
>>>>>
>>>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
>>>>> abenede...@apache.org
>>>>> > wrote:
>>>>>
>>>>> > Hi guys,
>>>>> > While I was exploring the way we build the More Like This query, I
>>>>> > discovered a part I am not convinced of :
>>>>> >
>>>>> >
>>>>> >
>>>>> > Let's see how we build the query :
>>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>>>>> >
>>>>> > 1) we extract the terms from the interesting fields, adding them to
>>>>> a map :
>>>>> >
>>>>> > Map<String, Int> termFreqMap = new HashMap<>();
>>>>> >
>>>>> > *( we lose the relation field-> term, we don't know anymore where
>>>>> the term
>>>>> > was coming ! )*
>>>>> >
>>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>>>>> >
>>>>> > 2) we build the queue that will contain the query terms, at this
>>>>> point we
>>>>> > connect again there terms to some field, but :
>>>>> >
>>>>> > ...
>>>>> >> // go through all the fields and find the largest document frequency
>>>>> >> String topField = fieldNames[0];
>>>>> >> int docFreq = 0;
>>>>> >> for (String fieldName : fieldNames) {
>>>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
>>>>> >>   topField = (freq > docFreq) ? fieldName : topField;
>>>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
>>>>> >> }
>>>>> >> ...
>>>>> >
>>>>> >
>>>>> > We identify the topField as the field with the highest document
>>>>> frequency
>>>>> > for the term t .
>>>>> > Then we build the termQuery :
>>>>> >
>>>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>>>>> >
>>>>> > In this way we lose a lot of precision.
>>>>> > Not sure why we do that.
>>>>> > I would prefer to keep the relation between terms and fields.
>>>>> > The MLT query can improve a lot the quality.
>>>>> > If i run the MLT on 2 fields : *description* and *facilities* for
>>>>> example.
>>>>> > It is likely I want to find doc

Re: [More Like This] Query building

2016-02-11 Thread Alessandro Benedetti
Hi guys,
is it possible to get any feedback?
Is there any process to speed up bug resolution / discussions?
I just want to understand whether the patch is not good enough, whether I
need to improve it, or whether simply no one has taken a look...

https://issues.apache.org/jira/browse/LUCENE-6954
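
For anyone who lands on this message first: the behaviour discussed in the quoted thread below is the topField selection, where each term is assigned to whichever field has the highest document frequency for it, regardless of the field it came from in the source document. A small stand-alone sketch of that loop follows. It is not the Lucene class; the map stands in for ir.docFreq(new Term(fieldName, word)), and the sample numbers are invented.

```java
import java.util.Map;

// Stand-alone model of the topField loop quoted in this thread; data is hypothetical.
final class TopFieldSelection {
    // docFreqByField: field -> number of documents containing the word in that field
    static String topField(String[] fieldNames, Map<String, Integer> docFreqByField) {
        String topField = fieldNames[0];
        int docFreq = 0;
        for (String fieldName : fieldNames) {
            int freq = docFreqByField.getOrDefault(fieldName, 0);
            if (freq > docFreq) { // keep the field where the word is most widespread
                topField = fieldName;
                docFreq = freq;
            }
        }
        return topField;
    }
}
```

With fields description and facilities, a word such as "pool" taken from the facilities of the source document would still be queried against description if description has the higher document frequency for it. That is the loss of the field-to-term relation the patch is about.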

Cheers

On 11 January 2016 at 15:25, Alessandro Benedetti <abenede...@apache.org>
wrote:

> Hi guys,
> the patch seems fine to me.
> I didn't spend much more time on the code but I checked the tests and the
> pre-commit checks.
> It seems fine to me.
> Let me know ,
>
> Cheers
>
> On 31 December 2015 at 18:40, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
>> https://issues.apache.org/jira/browse/LUCENE-6954
>>
>> First draft patch available, I will check better the tests new year !
>>
>> On 29 December 2015 at 13:43, Alessandro Benedetti <abenede...@apache.org
>> > wrote:
>>
>>> Sure, I will proceed tomorrow with the Jira and the simple patch + tests.
>>>
>>> In the meantime let's try to collect some additional feedback.
>>>
>>> Cheers
>>>
>>> On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net>
>>> wrote:
>>>
>>>> Feel free to create a JIRA and put up a patch if you can.
>>>>
>>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
>>>> abenede...@apache.org
>>>> > wrote:
>>>>
>>>> > Hi guys,
>>>> > While I was exploring the way we build the More Like This query, I
>>>> > discovered a part I am not convinced of :
>>>> >
>>>> >
>>>> >
>>>> > Let's see how we build the query :
>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>>>> >
>>>> > 1) we extract the terms from the interesting fields, adding them to a
>>>> map :
>>>> >
>>>> > Map<String, Int> termFreqMap = new HashMap<>();
>>>> >
>>>> > *( we lose the relation field-> term, we don't know anymore where the
>>>> term
>>>> > was coming ! )*
>>>> >
>>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>>>> >
>>>> > 2) we build the queue that will contain the query terms, at this
>>>> point we
>>>> > connect again there terms to some field, but :
>>>> >
>>>> > ...
>>>> >> // go through all the fields and find the largest document frequency
>>>> >> String topField = fieldNames[0];
>>>> >> int docFreq = 0;
>>>> >> for (String fieldName : fieldNames) {
>>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
>>>> >>   topField = (freq > docFreq) ? fieldName : topField;
>>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
>>>> >> }
>>>> >> ...
>>>> >
>>>> >
>>>> > We identify the topField as the field with the highest document
>>>> frequency
>>>> > for the term t .
>>>> > Then we build the termQuery :
>>>> >
>>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>>>> >
>>>> > In this way we lose a lot of precision.
>>>> > Not sure why we do that.
>>>> > I would prefer to keep the relation between terms and fields.
>>>> > The MLT query can improve a lot the quality.
>>>> > If i run the MLT on 2 fields : *description* and *facilities* for
>>>> example.
>>>> > It is likely I want to find documents with similar terms in the
>>>> > description and similar terms in the facilities, without mixing up the
>>>> > things and loosing the semantic of the terms.
>>>> >
>>>> > Let me know your opinion,
>>>> >
>>>> > Cheers
>>>> >
>>>> >
>>>> > --
>>>> > --
>>>> >
>>>> > Benedetti Alessandro
>>>> > Visiting card : http://about.me/alessandro_benedetti
>>>> >
>>>> > "Tyger, tyger burning bright
>>>> > In the forests of the night,
>>>> > What immortal hand or eye
>>>> > Could frame thy fearful symmetry?"
>>>> >
>>>> > William Blake - Songs of Experience -1794 England
>>>> >
>>>>
>>>>
>>>>
>>>> --
>>>> Anshum Gupta
>>>>
>>>
>>>
>>>
>>> --
>>> --
>>>
>>> Benedetti Alessandro
>>> Visiting card : http://about.me/alessandro_benedetti
>>>
>>> "Tyger, tyger burning bright
>>> In the forests of the night,
>>> What immortal hand or eye
>>> Could frame thy fearful symmetry?"
>>>
>>> William Blake - Songs of Experience -1794 England
>>>
>>
>>
>>
>> --
>> --
>>
>> Benedetti Alessandro
>> Visiting card : http://about.me/alessandro_benedetti
>>
>> "Tyger, tyger burning bright
>> In the forests of the night,
>> What immortal hand or eye
>> Could frame thy fearful symmetry?"
>>
>> William Blake - Songs of Experience -1794 England
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [More Like This] Query building

2016-01-11 Thread Alessandro Benedetti
Hi guys,
the patch seems fine to me.
I didn't spend much more time on the code, but I checked the tests and the
pre-commit checks.
It seems fine to me.
Let me know,

Cheers

On 31 December 2015 at 18:40, Alessandro Benedetti <abenede...@apache.org>
wrote:

> https://issues.apache.org/jira/browse/LUCENE-6954
>
> First draft patch available, I will check better the tests new year !
>
> On 29 December 2015 at 13:43, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
>> Sure, I will proceed tomorrow with the Jira and the simple patch + tests.
>>
>> In the meantime let's try to collect some additional feedback.
>>
>> Cheers
>>
>> On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net>
>> wrote:
>>
>>> Feel free to create a JIRA and put up a patch if you can.
>>>
>>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
>>> abenede...@apache.org
>>> > wrote:
>>>
>>> > Hi guys,
>>> > While I was exploring the way we build the More Like This query, I
>>> > discovered a part I am not convinced of :
>>> >
>>> >
>>> >
>>> > Let's see how we build the query :
>>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>>> >
>>> > 1) we extract the terms from the interesting fields, adding them to a
>>> map :
>>> >
>>> > Map<String, Int> termFreqMap = new HashMap<>();
>>> >
>>> > *( we lose the relation field-> term, we don't know anymore where the
>>> term
>>> > was coming ! )*
>>> >
>>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>>> >
>>> > 2) we build the queue that will contain the query terms, at this point
>>> we
>>> > connect again there terms to some field, but :
>>> >
>>> > ...
>>> >> // go through all the fields and find the largest document frequency
>>> >> String topField = fieldNames[0];
>>> >> int docFreq = 0;
>>> >> for (String fieldName : fieldNames) {
>>> >>   int freq = ir.docFreq(new Term(fieldName, word));
>>> >>   topField = (freq > docFreq) ? fieldName : topField;
>>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
>>> >> }
>>> >> ...
>>> >
>>> >
>>> > We identify the topField as the field with the highest document
>>> frequency
>>> > for the term t .
>>> > Then we build the termQuery :
>>> >
>>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>>> >
>>> > In this way we lose a lot of precision.
>>> > Not sure why we do that.
>>> > I would prefer to keep the relation between terms and fields.
>>> > The MLT query can improve a lot the quality.
>>> > If i run the MLT on 2 fields : *description* and *facilities* for
>>> example.
>>> > It is likely I want to find documents with similar terms in the
>>> > description and similar terms in the facilities, without mixing up the
>>> > things and loosing the semantic of the terms.
>>> >
>>> > Let me know your opinion,
>>> >
>>> > Cheers
>>> >
>>> >
>>> > --
>>> > --
>>> >
>>> > Benedetti Alessandro
>>> > Visiting card : http://about.me/alessandro_benedetti
>>> >
>>> > "Tyger, tyger burning bright
>>> > In the forests of the night,
>>> > What immortal hand or eye
>>> > Could frame thy fearful symmetry?"
>>> >
>>> > William Blake - Songs of Experience -1794 England
>>> >
>>>
>>>
>>>
>>> --
>>> Anshum Gupta
>>>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [More Like This] Query building

2015-12-31 Thread Alessandro Benedetti
https://issues.apache.org/jira/browse/LUCENE-6954

First draft patch available; I will check the tests more thoroughly in the new year!

On 29 December 2015 at 13:43, Alessandro Benedetti <abenede...@apache.org>
wrote:

> Sure, I will proceed tomorrow with the Jira and the simple patch + tests.
>
> In the meantime let's try to collect some additional feedback.
>
> Cheers
>
> On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net> wrote:
>
>> Feel free to create a JIRA and put up a patch if you can.
>>
>> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
>> abenede...@apache.org
>> > wrote:
>>
>> > Hi guys,
>> > While I was exploring the way we build the More Like This query, I
>> > discovered a part I am not convinced of :
>> >
>> >
>> >
>> > Let's see how we build the query :
>> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>> >
>> > 1) we extract the terms from the interesting fields, adding them to a
>> map :
>> >
>> > Map<String, Int> termFreqMap = new HashMap<>();
>> >
>> > *( we lose the relation field-> term, we don't know anymore where the
>> term
>> > was coming ! )*
>> >
>> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>> >
>> > 2) we build the queue that will contain the query terms, at this point
>> we
>> > connect again there terms to some field, but :
>> >
>> > ...
>> >> // go through all the fields and find the largest document frequency
>> >> String topField = fieldNames[0];
>> >> int docFreq = 0;
>> >> for (String fieldName : fieldNames) {
>> >>   int freq = ir.docFreq(new Term(fieldName, word));
>> >>   topField = (freq > docFreq) ? fieldName : topField;
>> >>   docFreq = (freq > docFreq) ? freq : docFreq;
>> >> }
>> >> ...
>> >
>> >
>> > We identify the topField as the field with the highest document
>> frequency
>> > for the term t .
>> > Then we build the termQuery :
>> >
>> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>> >
>> > In this way we lose a lot of precision.
>> > Not sure why we do that.
>> > I would prefer to keep the relation between terms and fields.
>> > The MLT query can improve a lot the quality.
>> > If i run the MLT on 2 fields : *description* and *facilities* for
>> example.
>> > It is likely I want to find documents with similar terms in the
>> > description and similar terms in the facilities, without mixing up the
>> > things and loosing the semantic of the terms.
>> >
>> > Let me know your opinion,
>> >
>> > Cheers
>> >
>> >
>> > --
>> > --
>> >
>> > Benedetti Alessandro
>> > Visiting card : http://about.me/alessandro_benedetti
>> >
>> > "Tyger, tyger burning bright
>> > In the forests of the night,
>> > What immortal hand or eye
>> > Could frame thy fearful symmetry?"
>> >
>> > William Blake - Songs of Experience -1794 England
>> >
>>
>>
>>
>> --
>> Anshum Gupta
>>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


[More Like This] Query building

2015-12-29 Thread Alessandro Benedetti
Hi guys,
While I was exploring the way we build the More Like This query, I
discovered a part I am not convinced of :



Let's see how we build the query :
org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)

1) we extract the terms from the interesting fields, adding them to a map :

Map<String, Int> termFreqMap = new HashMap<>();

*( we lose the field -> term relation; we no longer know which field the
term came from ! )*

org.apache.lucene.queries.mlt.MoreLikeThis#createQueue

2) we build the queue that will contain the query terms; at this point we
connect these terms to a field again, but :

...
> // go through all the fields and find the largest document frequency
> String topField = fieldNames[0];
> int docFreq = 0;
> for (String fieldName : fieldNames) {
>   int freq = ir.docFreq(new Term(fieldName, word));
>   topField = (freq > docFreq) ? fieldName : topField;
>   docFreq = (freq > docFreq) ? freq : docFreq;
> }
> ...


We identify the topField as the field with the highest document frequency
for the term t .
Then we build the termQuery :

queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));

In this way we lose a lot of precision.
I am not sure why we do that.
I would prefer to keep the relation between terms and fields.
That could improve the quality of the MLT query a lot.
For example, if I run the MLT on 2 fields, *description* and *facilities*,
it is likely that I want to find documents with similar terms in the
description and similar terms in the facilities, without mixing things up and
losing the semantics of the terms.
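
The difference between the two approaches can be sketched roughly as follows.
This is only an illustration, not the Lucene code: plain Java collections stand
in for the index and the analysis chain, and the class name FieldAwareTerms and
both method names are hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: plain collections stand in for Lucene's index and analysis APIs. */
final class FieldAwareTerms {

    /** Flat counting, as in step 1: the field each term came from is discarded. */
    static Map<String, Integer> flatTermFreq(Map<String, List<String>> fieldTokens) {
        Map<String, Integer> freq = new HashMap<>();
        for (List<String> tokens : fieldTokens.values()) {
            for (String token : tokens) {
                freq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    /** Field-aware counting: one frequency map per field, so the relation survives. */
    static Map<String, Map<String, Integer>> perFieldTermFreq(Map<String, List<String>> fieldTokens) {
        Map<String, Map<String, Integer>> freq = new HashMap<>();
        for (Map.Entry<String, List<String>> entry : fieldTokens.entrySet()) {
            Map<String, Integer> fieldFreq = freq.computeIfAbsent(entry.getKey(), f -> new HashMap<>());
            for (String token : entry.getValue()) {
                fieldFreq.merge(token, 1, Integer::sum);
            }
        }
        return freq;
    }

    public static void main(String[] args) {
        Map<String, List<String>> doc = new HashMap<>();
        doc.put("description", Arrays.asList("quiet", "pool", "view"));
        doc.put("facilities", Arrays.asList("pool", "gym"));
        // Flat map: "pool" is counted twice, with no trace of its fields.
        System.out.println(flatTermFreq(doc));
        // Per-field map: "pool" is counted once in each field.
        System.out.println(perFieldTermFreq(doc));
    }
}
```

With the per-field map, a term query can be built for the field the term was
actually seen in, instead of guessing a single topField afterwards.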

Let me know your opinion,

Cheers


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [More Like This] Query building

2015-12-29 Thread Anshum Gupta
Feel free to create a JIRA and put up a patch if you can.

On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <abenede...@apache.org
> wrote:

> Hi guys,
> While I was exploring the way we build the More Like This query, I
> discovered a part I am not convinced of :
>
>
>
> Let's see how we build the query :
> org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>
> 1) we extract the terms from the interesting fields, adding them to a map :
>
> Map<String, Int> termFreqMap = new HashMap<>();
>
> *( we lose the relation field-> term, we don't know anymore where the term
> was coming ! )*
>
> org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>
> 2) we build the queue that will contain the query terms, at this point we
> connect again there terms to some field, but :
>
> ...
>> // go through all the fields and find the largest document frequency
>> String topField = fieldNames[0];
>> int docFreq = 0;
>> for (String fieldName : fieldNames) {
>>   int freq = ir.docFreq(new Term(fieldName, word));
>>   topField = (freq > docFreq) ? fieldName : topField;
>>   docFreq = (freq > docFreq) ? freq : docFreq;
>> }
>> ...
>
>
> We identify the topField as the field with the highest document frequency
> for the term t .
> Then we build the termQuery :
>
> queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>
> In this way we lose a lot of precision.
> Not sure why we do that.
> I would prefer to keep the relation between terms and fields.
> The MLT query can improve a lot the quality.
> If i run the MLT on 2 fields : *description* and *facilities* for example.
> It is likely I want to find documents with similar terms in the
> description and similar terms in the facilities, without mixing up the
> things and loosing the semantic of the terms.
>
> Let me know your opinion,
>
> Cheers
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Anshum Gupta


Re: [More Like This] Query building

2015-12-29 Thread Alessandro Benedetti
Sure, I will proceed tomorrow with the Jira and the simple patch + tests.

In the meantime let's try to collect some additional feedback.

Cheers

On 29 December 2015 at 12:43, Anshum Gupta <ans...@anshumgupta.net> wrote:

> Feel free to create a JIRA and put up a patch if you can.
>
> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > Hi guys,
> > While I was exploring the way we build the More Like This query, I
> > discovered a part I am not convinced of :
> >
> >
> >
> > Let's see how we build the query :
> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
> >
> > 1) we extract the terms from the interesting fields, adding them to a
> map :
> >
> > Map<String, Int> termFreqMap = new HashMap<>();
> >
> > *( we lose the relation field-> term, we don't know anymore where the
> term
> > was coming ! )*
> >
> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
> >
> > 2) we build the queue that will contain the query terms, at this point we
> > connect again there terms to some field, but :
> >
> > ...
> >> // go through all the fields and find the largest document frequency
> >> String topField = fieldNames[0];
> >> int docFreq = 0;
> >> for (String fieldName : fieldNames) {
> >>   int freq = ir.docFreq(new Term(fieldName, word));
> >>   topField = (freq > docFreq) ? fieldName : topField;
> >>   docFreq = (freq > docFreq) ? freq : docFreq;
> >> }
> >> ...
> >
> >
> > We identify the topField as the field with the highest document frequency
> > for the term t .
> > Then we build the termQuery :
> >
> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
> >
> > In this way we lose a lot of precision.
> > Not sure why we do that.
> > I would prefer to keep the relation between terms and fields.
> > The MLT query can improve a lot the quality.
> > If i run the MLT on 2 fields : *description* and *facilities* for
> example.
> > It is likely I want to find documents with similar terms in the
> > description and similar terms in the facilities, without mixing up the
> > things and loosing the semantic of the terms.
> >
> > Let me know your opinion,
> >
> > Cheers
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> Anshum Gupta
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


solr more like this sort / reranking

2015-10-18 Thread sara hajili
Hi all,
I am new to Solr. I use the More Like This query handler to get docs that are
similar to a specific doc.
My issue is that I want to change the sort order of the results. For
example, imagine that the docs have a field called like_count.
I want to get the docs most related to doc id = 4 and then sort them based on
like_count and score (I mean the score that More Like This returns for each
doc, which shows how related that doc is to my doc). So I need to sort on
these 2 fields with a weight for each, for example like_count^8 and score^25.
How can I do this?
I found the new re-ranking feature in Solr:
http://blog.thedigitalgroup.com/vijaym/2015/06/19/query-re-ranking-in-solr/
Is it a good approach?
If yes, how do I use it with More Like This and pysolr?
I set it in the More Like This parameters, but I could not get an answer.
I set it this way:
param={
...

"rq":"{!rerank reRankQuery=$rqq reRankDocs=3 reRankWeight=10}",
"rqq":"poem"

...
}
If not, what is the best approach to get this result from a More Like This
query?
Thanks for any help.


solr more like this sort / reranking

2015-10-17 Thread sara hajili
Hi all,
I am new to Solr. I use the More Like This query handler to get docs that are
similar to a specific doc.
My issue is that I want to change the sort order of the results. For
example, imagine that the docs have a field called like_count.
I want to get the docs most related to doc id = 4 and then sort them based on
like_count and score (I mean the score that More Like This returns for each
doc, which shows how related that doc is to my doc). So I need to sort on
these 2 fields with a weight for each, for example like_count^8 and score^25.
How can I do this?
I found the new re-ranking feature in Solr:
http://blog.thedigitalgroup.com/vijaym/2015/06/19/query-re-ranking-in-solr/
Is it a good approach?
If yes, how do I use it with More Like This and pysolr?
I set it in the More Like This parameters, but I could not get an answer.
I set it this way:
param={
...

"rq":"{!rerank reRankQuery=$rqq reRankDocs=3 reRankWeight=10}",
"rqq":"poem"

...
}
If not, what is the best approach to get this result from a More Like This
query?
Thanks for any help.


Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-28 Thread Upayavira
You could use the MLT query parser, and combine that with other queries,
whether as filters or boosts.

You can't use stream.body with it yet, so you would need to use the handler
if you need that.

Upayavira

On Mon, Sep 28, 2015, at 09:53 AM, Alessandro Benedetti wrote:
> Hi Upaya,
> thanks for the explanation, I actually already did some investigations
> about it ( my first foundation was :
> http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ ) and
> then I took a look to the code.
> 
> Was just wondering what the community was thinking about
> including/providing numerical similarity ( approaches, ideas, possible
> existent solutions).
> Customisation should be the last step, if anything already available.
> 
> Thanks for the support anyway !
> 
> Cheers
> 
> 2015-09-25 12:47 GMT+01:00 Upayavira :
> 
> > Alessandro,
> >
> > I'd suggest you review the code of the MoreLikeThisHandler. It is a
> > little knotty, but it would be worth your while understanding what is
> > going on there.
> >
> > Basically, there are three phases:
> >
> > phase #1: parse the source document into a list of terms (avoided if
> > term vectors enabled and source doc is in index)
> > phase #2: calculate a score for each of these terms and select the n
> > highest scoring ones (default 25)
> > phase #3: build and execute a boolean query using these 25 terms
> >
> > Phase #2 uses a TF/IDF like approach to calculate the scores for those
> > "interesting terms".
> >
> > Once you understand what MLT is doing, you will probably not find it so
> > hard to create your own version which is better suited to your own
> > use-case.
> >
> > Of course, this would probably be better constructed as a QueryParser
> > rather than a request handler, but that's a detail.
> >
> > Upayavira
> >
> > On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> > > Hi guys,
> > > was just investigating a little bit in how to include numeric fields in
> > > the
> > > MLT calculations.
> > >
> > > As we know, we are currently building a smart lucene query based on the
> > > document in input ( the one to search for similar ones) and run this
> > > query
> > > to obtain the similar docs.
> > > Because the MLT is currently built on TF/IDF , it is mainly thought for
> > > textual fields.
> > > What about we want to include a numeric factor  in the similarity
> > > calculus ?
> > >
> > > e.g.
> > > Solr Document ( Hotel)
> > > mlt.fl=description,stars,trip_advisor_rating
> > >
> > > To find the similarity based not only on the description, but also on the
> > > numeric fields ( stars and rating) .
> > >
> > > The first thought I had , is to add a support for boosting functions.
> > > In this way we are more flexible and we can add how many functions we
> > > want.
> > >
> > > For example adding :
> > > bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> > >
> > > Also other kind of functions can be applied.
> > > What do you think ? Do you have any alternative ideas ?
> > >
> > > Cheers
> > > --
> > > --
> > >
> > > Benedetti Alessandro
> > > Visiting card : http://about.me/alessandro_benedetti
> > >
> > > "Tyger, tyger burning bright
> > > In the forests of the night,
> > > What immortal hand or eye
> > > Could frame thy fearful symmetry?"
> > >
> > > William Blake - Songs of Experience -1794 England
> >
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card - http://about.me/alessandro_benedetti
> Blog - http://alexbenedetti.blogspot.co.uk
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-28 Thread Alessandro Benedetti
Hi Upaya,
thanks for the explanation, I had actually already done some investigation
into it ( my starting point was :
http://cephas.net/blog/2008/03/30/how-morelikethis-works-in-lucene/ ) and
then I took a look at the code.

I was just wondering what the community thinks about
including/providing numerical similarity ( approaches, ideas, possible
existing solutions).
Customisation should be the last step, if something is already available.

Thanks for the support anyway !

Cheers

2015-09-25 12:47 GMT+01:00 Upayavira :

> Alessandro,
>
> I'd suggest you review the code of the MoreLikeThisHandler. It is a
> little knotty, but it would be worth your while understanding what is
> going on there.
>
> Basically, there are three phases:
>
> phase #1: parse the source document into a list of terms (avoided if
> term vectors enabled and source doc is in index)
> phase #2: calculate a score for each of these terms and select the n
> highest scoring ones (default 25)
> phase #3: build and execute a boolean query using these 25 terms
>
> Phase #2 uses a TF/IDF like approach to calculate the scores for those
> "interesting terms".
>
> Once you understand what MLT is doing, you will probably not find it so
> hard to create your own version which is better suited to your own
> use-case.
>
> Of course, this would probably be better constructed as a QueryParser
> rather than a request handler, but that's a detail.
>
> Upayavira
>
> On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> > Hi guys,
> > was just investigating a little bit in how to include numeric fields in
> > the
> > MLT calculations.
> >
> > As we know, we are currently building a smart lucene query based on the
> > document in input ( the one to search for similar ones) and run this
> > query
> > to obtain the similar docs.
> > Because the MLT is currently built on TF/IDF , it is mainly thought for
> > textual fields.
> > What about we want to include a numeric factor  in the similarity
> > calculus ?
> >
> > e.g.
> > Solr Document ( Hotel)
> > mlt.fl=description,stars,trip_advisor_rating
> >
> > To find the similarity based not only on the description, but also on the
> > numeric fields ( stars and rating) .
> >
> > The first thought I had , is to add a support for boosting functions.
> > In this way we are more flexible and we can add how many functions we
> > want.
> >
> > For example adding :
> > bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> >
> > Also other kind of functions can be applied.
> > What do you think ? Do you have any alternative ideas ?
> >
> > Cheers
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card - http://about.me/alessandro_benedetti
Blog - http://alexbenedetti.blogspot.co.uk

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: More Like This on numeric fields - BF accepted by MLT handler

2015-09-25 Thread Upayavira
Alessandro,

I'd suggest you review the code of the MoreLikeThisHandler. It is a
little knotty, but it would be worth your while understanding what is
going on there.

Basically, there are three phases:

phase #1: parse the source document into a list of terms (avoided if
term vectors enabled and source doc is in index)
phase #2: calculate a score for each of these terms and select the n
highest scoring ones (default 25)
phase #3: build and execute a boolean query using these 25 terms

Phase #2 uses a TF/IDF like approach to calculate the scores for those
"interesting terms".
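
The three phases above can be sketched as follows. This is a simplified
illustration and not the MoreLikeThisHandler code: the whitespace tokenizer,
the tf * idf formula, and the names MltPhases, termFrequencies,
interestingTerms and booleanQuery are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Sketch only: a toy version of the three MLT phases using plain collections. */
final class MltPhases {

    /** A term paired with its score; used only for ranking. */
    static final class ScoredTerm {
        final String term;
        final double score;

        ScoredTerm(String term, double score) {
            this.term = term;
            this.score = score;
        }
    }

    /** Phase 1: turn the source text into term frequencies (naive whitespace tokenizer). */
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    /** Phase 2: score each term with a TF/IDF-like formula and keep the n best. */
    static List<String> interestingTerms(Map<String, Integer> tf, Map<String, Integer> docFreq,
                                         int numDocs, int n) {
        List<ScoredTerm> scored = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int df = docFreq.getOrDefault(e.getKey(), 0);
            // Illustrative idf; the real similarity may use a different formula.
            double idf = Math.log((double) numDocs / (df + 1)) + 1.0;
            scored.add(new ScoredTerm(e.getKey(), e.getValue() * idf));
        }
        // Highest score first; ties broken alphabetically to keep the order stable.
        scored.sort((a, b) -> {
            int byScore = Double.compare(b.score, a.score);
            return byScore != 0 ? byScore : a.term.compareTo(b.term);
        });
        List<String> top = new ArrayList<>();
        for (ScoredTerm st : scored.subList(0, Math.min(n, scored.size()))) {
            top.add(st.term);
        }
        return top;
    }

    /** Phase 3: render the selected terms as an OR-style query string on one field. */
    static String booleanQuery(String field, List<String> terms) {
        StringBuilder sb = new StringBuilder();
        for (String term : terms) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(field).append(':').append(term);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = termFrequencies("pool Pool view the the the");
        Map<String, Integer> df = new HashMap<>();
        df.put("pool", 10);
        df.put("view", 50);
        df.put("the", 1000);
        List<String> top = interestingTerms(tf, df, 1000, 2);
        System.out.println(booleanQuery("description", top));
    }
}
```

In this toy corpus the very common term "the" gets a low score despite its
high term frequency, so it drops out of the selected terms.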

Once you understand what MLT is doing, you will probably not find it so
hard to create your own version which is better suited to your own
use-case.

Of course, this would probably be better constructed as a QueryParser
rather than a request handler, but that's a detail.

Upayavira

On Fri, Sep 25, 2015, at 11:08 AM, Alessandro Benedetti wrote:
> Hi guys,
> was just investigating a little bit in how to include numeric fields in
> the
> MLT calculations.
> 
> As we know, we are currently building a smart lucene query based on the
> document in input ( the one to search for similar ones) and run this
> query
> to obtain the similar docs.
> Because the MLT is currently built on TF/IDF , it is mainly thought for
> textual fields.
> What about we want to include a numeric factor  in the similarity
> calculus ?
> 
> e.g.
> Solr Document ( Hotel)
> mlt.fl=description,stars,trip_advisor_rating
> 
> To find the similarity based not only on the description, but also on the
> numeric fields ( stars and rating) .
> 
> The first thought I had , is to add a support for boosting functions.
> In this way we are more flexible and we can add how many functions we
> want.
> 
> For example adding :
> bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))
> 
> Also other kind of functions can be applied.
> What do you think ? Do you have any alternative ideas ?
> 
> Cheers
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


More Like This on numeric fields - BF accepted by MLT handler

2015-09-25 Thread Alessandro Benedetti
Hi guys,
I was just investigating how to include numeric fields in the
MLT calculations.

As we know, we currently build a smart Lucene query based on the
input document ( the one to find similar docs for) and run this query
to obtain the similar docs.
Because the MLT is currently built on TF/IDF , it is mainly designed for
textual fields.
What if we want to include a numeric factor in the similarity calculation ?

e.g.
Solr Document ( Hotel)
mlt.fl=description,stars,trip_advisor_rating

To find the similarity based not only on the description, but also on the
numeric fields ( stars and rating) .

The first thought I had is to add support for boosting functions.
In this way we are more flexible and we can add as many functions as we want.

For example adding :
bf=div(1,dist(2,seedDocumentRatingA,seedDocumentRatingB,ratingA,ratingB))

Other kinds of functions could also be applied.
What do you think ? Do you have any alternative ideas ?
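
As a rough illustration of what the example bf expression computes, here is
the same arithmetic in plain Java: the inverse of the Euclidean distance
between the seed document's two ratings and a candidate's two ratings. The
class and method names are hypothetical, and nothing here calls Solr.

```java
/** Sketch only: the arithmetic of div(1, dist(2, ...)) outside of Solr. */
final class NumericBoost {

    /** Euclidean (power 2) distance between two points of equal dimension. */
    static double euclidean(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("points must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double diff = a[i] - b[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    /**
     * Inverse distance: the closer the candidate's ratings are to the seed's,
     * the larger the boost. Identical ratings give a distance of 0, so this
     * returns Double.POSITIVE_INFINITY; a real boost would need a guard.
     */
    static double boost(double[] seedRatings, double[] candidateRatings) {
        return 1.0 / euclidean(seedRatings, candidateRatings);
    }

    public static void main(String[] args) {
        double[] seed = {4.0, 4.5};       // stars, trip_advisor_rating of the seed hotel
        double[] near = {4.0, 4.0};       // candidate with close ratings
        double[] far = {1.0, 0.5};        // candidate with distant ratings
        System.out.println(boost(seed, near));
        System.out.println(boost(seed, far));
    }
}
```

One design point the sketch makes visible: a candidate with exactly the same
ratings as the seed has distance 0, so a variant such as 1 / (1 + distance)
may be safer in practice.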

Cheers
-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


solr get score of each doc in edis max search and more like this search result

2015-09-23 Thread sara hajili
Hi all,
I want to get each doc's score in the search results, and to restrict the
results to docs whose score is above a minimum score that I set.
I need this for normal search with edismax and for More Like This, in pysolr.
I understand that I can set debug = true
and get this from the search result:

print(search_result.debug['explain'])

but this explains too much, and I could not get each doc's score from it.
Any help?
Thanks


Re: solr get score of each doc in edis max search and more like this search result

2015-09-23 Thread Walter Underwood
You can request the “score” field in the “fl” parameter.

Why do you want to cut off at a particular score value?

Solr scores don’t work like that. They are not absolute relevance scores, they 
change with each query. There is no such thing as a 100% match or a 50% match.

Setting a lower score limit will almost certainly not do what you want. Because 
it doesn’t do anything useful.
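
A minimal sketch of the first suggestion, assuming a plain HTTP request to a
select handler: the only point is that "score" is listed in "fl" next to the
stored fields. The query text, the field name "id" and the class name
ScoreFieldRequest are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch only: builds a URL query string that asks Solr to return the score. */
final class ScoreFieldRequest {

    /** Joins the parameters as key=value pairs, URL-encoding each value. */
    static String queryString(Map<String, String> params) {
        StringBuilder sb = new StringBuilder();
        try {
            for (Map.Entry<String, String> e : params.entrySet()) {
                if (sb.length() > 0) {
                    sb.append('&');
                }
                sb.append(e.getKey()).append('=').append(URLEncoder.encode(e.getValue(), "UTF-8"));
            }
        } catch (UnsupportedEncodingException ex) {
            // UTF-8 is always available on the JVM, so this should not happen.
            throw new IllegalStateException(ex);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "hotel pool");      // hypothetical user query
        params.put("defType", "edismax");
        params.put("fl", "id,score");       // "score" is the pseudo-field to request
        System.out.println(queryString(params));
    }
}
```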

I recommend reading this document for more info:

https://wiki.apache.org/lucene-java/ScoresAsPercentages 
<https://wiki.apache.org/lucene-java/ScoresAsPercentages>

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Sep 23, 2015, at 6:53 AM, sara hajili <hajili.s...@gmail.com> wrote:
> 
> hi all
> i wanna to get each doc score in search result + restrict search result to
> some doc that their score are above than score that i need (i mean i set
> minimum score in search and get doc based on upper than that score)
> i need this in normal search with edismax and more like this in pysolr
> i undrestand that i can set debug = true
> and from search resulrt i get
> 
> print(search_result.debug['explain'])
> 
> but this explain more and i couldn't get each doc score.
> any help ?!
> tnx



more like this generated query

2015-04-27 Thread alxsss
Hello,


I am using solr-4.10.4 with mlt. I noticed that mlt constructs query which is 
missing some words. For example, for doc with 
title: Jennnifer Lopez 
keywords: Jennifer, concert, Hollywood


the parsedquery generated by mlt for this doc  is  title:lopez 
keywords:jennifer keywords:concert keywords:hollywood.
It seems to me that there must be  title:jennifer, too


For another doc that has only title, mlt generated query includes  
keywords:famili. This doc has family in title.


Any ideas what is wrong here?


Thanks.
Alex.








more like this and term vectors

2015-02-23 Thread Scott C. Cote
Is there a way to configure the more like this query handler and also receive 
the corresponding term vectors? (tf-idf) ?

I tried by creating a “search component” for the term vectors and adding it to 
the mlt handler, but that did not work.

Here is what I tried:

<searchComponent name="tvComponent"
    class="org.apache.solr.handler.component.TermVectorComponent"/>

<requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
  <lst name="defaults">
    <str name="mlt.fl">filteredText</str>
    <str name="mlt.mintf">1</str>
    <str name="mlt.mindf">1</str>
    <str name="mlt.interestingTerms">list</str>
    <bool name="tv">true</bool>
  </lst>
  <arr name="last-components">
    <str>tvComponent</str>
  </arr>
</requestHandler>

Now I realize that I could turn on the debug parameter, but that does not
contain all of the tf/idf (at least not like the tv component provides).

Thanks,

SCott

Re: more like this and term vectors

2015-02-23 Thread Jack Krupansky
It's never helpful when you merely say that it did not work - detail the
symptom, please.

Post both the query and the response. As well as the field and type
definitions for the fields for which you expected term vectors - no term
vectors are enabled by default.

-- Jack Krupansky

On Mon, Feb 23, 2015 at 2:48 PM, Scott C. Cote <scottcc...@yahoo.com.invalid>
wrote:

> Is there a way to configure the more like this query handler and also
> receive the corresponding term vectors? (tf-idf) ?
>
> I tried by creating a “search component” for the term vectors and adding
> it to the mlt handler, but that did not work.
>
> Here is what I tried:
>
> <searchComponent name="tvComponent"
>     class="org.apache.solr.handler.component.TermVectorComponent"/>
>
> <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
>   <lst name="defaults">
>     <str name="mlt.fl">filteredText</str>
>     <str name="mlt.mintf">1</str>
>     <str name="mlt.mindf">1</str>
>     <str name="mlt.interestingTerms">list</str>
>     <bool name="tv">true</bool>
>   </lst>
>   <arr name="last-components">
>     <str>tvComponent</str>
>   </arr>
> </requestHandler>
>
> Now I realize that I could turn on the debug parameter but that does not
> contain the all of the tf/idf (at least not like the tv component provides)
>
> Thanks,
>
> SCott


RE: More Like This similarity tuning

2015-02-04 Thread Markus Jelsma
Well, maxqt is easy: it is just the maximum number of terms that compose your
query. MinTF is a strange parameter: rare terms have a low DF and usually not
a high TF, so I would keep it at 1. MinDF is more useful; it depends entirely
on the size of your corpus. If you have a lot of user-generated input
(meaning misspelled terms), then you have to set MinDF higher than the
document frequency of the most frequent misspellings, but low enough to still
find rare terms.

It depends on your index.
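
To make the roles of the three parameters concrete, here is a small sketch of
how such thresholds could be applied to candidate terms. It is an
illustration, not the Lucene implementation: the ranking by raw term
frequency is a simplification, and the names MltThresholds and selectTerms are
hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: applies mintf / mindf style filters and a maxqt style cap. */
final class MltThresholds {

    /**
     * Keeps terms whose frequency in the source doc is at least minTf and whose
     * document frequency in the corpus is at least minDf, then returns at most
     * maxQt of them, most frequent in the source doc first.
     */
    static List<String> selectTerms(Map<String, Integer> tf, Map<String, Integer> df,
                                    int minTf, int minDf, int maxQt) {
        List<String> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : tf.entrySet()) {
            int termDf = df.getOrDefault(e.getKey(), 0);
            if (e.getValue() >= minTf && termDf >= minDf) {
                kept.add(e.getKey());
            }
        }
        // Simplified ranking: higher tf first, ties broken alphabetically.
        kept.sort((a, b) -> {
            int byTf = Integer.compare(tf.get(b), tf.get(a));
            return byTf != 0 ? byTf : a.compareTo(b);
        });
        return new ArrayList<>(kept.subList(0, Math.min(maxQt, kept.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> tf = new HashMap<>();
        tf.put("hotel", 3);
        tf.put("pool", 2);
        tf.put("poool", 1);   // a misspelling that appears in very few docs
        tf.put("spa", 1);
        Map<String, Integer> df = new HashMap<>();
        df.put("hotel", 500);
        df.put("pool", 120);
        df.put("poool", 1);
        df.put("spa", 40);
        // minDf = 5 drops the misspelling; maxQt = 2 caps the query size.
        System.out.println(selectTerms(tf, df, 1, 5, 2));
    }
}
```

Raising minDf above the document frequency of the misspelling removes it from
the query, while "spa" (a legitimately rare term) survives the filter and is
only cut by the maxQt cap.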

-Original message-
> From:Ali Nazemian <alinazem...@gmail.com>
> Sent: Wednesday 4th February 2015 11:15
> To: solr-user@lucene.apache.org
> Subject: More Like This similarity tuning
>
> Hi,
> I am looking for a best practice on More Like This parameters. I really
> appreciate if somebody can tell me what is the best value for these
> parameters in MLT query? Or at lease the proper methodology for finding the
> best value for each of these parameters:
> mlt.mintf
> mlt.mindf
> mlt.maxqt
>
> Thank you very much.
> Best regards.
>
> --
> A.Nazemian
>


More Like This similarity tuning

2015-02-04 Thread Ali Nazemian
Hi,
I am looking for best practices for the More Like This parameters. I would
really appreciate it if somebody could tell me the best values for these
parameters in an MLT query, or at least a proper methodology for finding the
best value for each of them:
mlt.mintf
mlt.mindf
mlt.maxqt

Thank you very much.
Best regards.

-- 
A.Nazemian


Re: More Like This similarity tuning

2015-02-04 Thread Ali Nazemian
Dear Markus,
Would you please explain more about maxqt  parameter and the methodology of
choosing best number of terms for this value?
Best regards.

On Wed, Feb 4, 2015 at 2:46 PM, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Well, maxqt is easy, it is just the number of terms that compose your
> query.  MinTF is a strange parameter, rare terms have a low DF and most
> usually not a high TF, so i would keep it at 1. MinDF is more useful, it
> depends entirely on the size of your corpus. If you have a lot of
> user-generated input - meaning, bad spelled terms - then you have to set
> MinDF to a setting higher than the most frequent misspellings but low
> enough to find rare terms.
>
> It depends on your index.
>
> -Original message-
> > From:Ali Nazemian <alinazem...@gmail.com>
> > Sent: Wednesday 4th February 2015 11:15
> > To: solr-user@lucene.apache.org
> > Subject: More Like This similarity tuning
> >
> > Hi,
> > I am looking for a best practice on More Like This parameters. I really
> > appreciate if somebody can tell me what is the best value for these
> > parameters in MLT query? Or at lease the proper methodology for finding
> > the best value for each of these parameters:
> > mlt.mintf
> > mlt.mindf
> > mlt.maxqt
> >
> > Thank you very much.
> > Best regards.
> >
> > --
> > A.Nazemian
> >




-- 
A.Nazemian


Re: Lucene cosine similarity score for more like this query

2015-02-03 Thread Ali Nazemian
Dear Koji,
Thank you very much.
Do you know what the range of the score is in this new formula? What is a
reasonable threshold for considering two documents as similar enough in
this formula?
Regards.

On Tue, Feb 3, 2015 at 1:35 PM, Koji Sekiguchi k...@r.email.ne.jp wrote:

 Lucene uses TFIDFSimilarity class to calculate the similarity.
 It is implemented on the idea of cosine measurement but it modifies the
 cosine formula.
 Please take a look at Lucene Practical Scoring Function in the following
 Javadoc:

 http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/
 search/similarities/TFIDFSimilarity.html

 Koji
 --
 http://soleami.com/blog/comparing-document-classification-functions-of-
 lucene-and-mahout.html


 On 2015/02/03 5:39, Ali Nazemian wrote:

 Dear Erik,
 Thank you for your response. Would you please tell me why this score could
 be higher than 1? While cosine similarity can not be higher than 1.
 On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

  The scoring is the same as Lucene.  To get deeper insight into how a
 score
 is computed, use Solr’s debug=true mode to see the explain details in the
 response.

  Erik

  On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com
 wrote:

 Hi,
 I was wondering what is the range of score is brought by more like this
 query in Solr? I know that the Lucene uses cosine similarity in vector
 space model for calculating similarity between two documents. I also
 know
 that cosine similarity is between -1 and 1 but the fact that I dont
 understand is why the score which is brought by more like this query

 could

 be 12 for example?! Would you please explain what is the calculation
 process is Solr?
 Thank you very much.

 Best regards.

 --
 A.Nazemian










-- 
A.Nazemian


Re: Lucene cosine similarity score for more like this query

2015-02-03 Thread Koji Sekiguchi

Lucene uses TFIDFSimilarity class to calculate the similarity.
It is implemented on the idea of cosine measurement but it modifies the cosine 
formula.
Please take a look at Lucene Practical Scoring Function in the following 
Javadoc:

http://lucene.apache.org/core/4_10_3/core/org/apache/lucene/search/similarities/TFIDFSimilarity.html

Koji
--
http://soleami.com/blog/comparing-document-classification-functions-of-lucene-and-mahout.html
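
Editor's note: to make it concrete why such a score is not bounded by 1, here
is a simplified Python sketch of the per-term sum at the heart of the
practical scoring function. It assumes tf = sqrt(freq) and
idf = 1 + ln(numDocs / (docFreq + 1)), which is my understanding of the
default TF-IDF implementation, and it deliberately leaves out coord,
queryNorm, boosts and length norms. The term statistics are made up.

```python
import math


def tf(freq):
    # Term-frequency factor: square root of the raw frequency.
    return math.sqrt(freq)


def idf(num_docs, doc_freq):
    # Inverse document frequency factor.
    return 1.0 + math.log(num_docs / (doc_freq + 1))


def simplified_score(query_terms, doc_term_freqs, doc_freqs, num_docs):
    """Sum tf * idf^2 over the query terms that occur in the document.

    This is a simplification: coord, queryNorm, boosts and field norms
    are all ignored.
    """
    score = 0.0
    for term in query_terms:
        freq = doc_term_freqs.get(term, 0)
        if freq == 0:
            continue  # term not in the document, contributes nothing
        score += tf(freq) * idf(num_docs, doc_freqs[term]) ** 2
    return score


# Made-up statistics for a 1000-document index.
score = simplified_score(
    ["lucene", "similarity"],
    {"lucene": 4, "similarity": 1},
    {"lucene": 10, "similarity": 50},
    num_docs=1000,
)
print(score)
```

Because every matching term adds a positive contribution, the sum grows with
the number of rare terms that match, so it is not a cosine in [0, 1].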

On 2015/02/03 5:39, Ali Nazemian wrote:

Dear Erik,
Thank you for your response. Would you please tell me why this score could
be higher than 1? While cosine similarity can not be higher than 1.
On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:


The scoring is the same as Lucene.  To get deeper insight into how a score
is computed, use Solr’s debug=true mode to see the explain details in the
response.

 Erik


On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com wrote:

Hi,
I was wondering what is the range of score is brought by more like this
query in Solr? I know that the Lucene uses cosine similarity in vector
space model for calculating similarity between two documents. I also know
that cosine similarity is between -1 and 1 but the fact that I dont
understand is why the score which is brought by more like this query

could

be 12 for example?! Would you please explain what is the calculation
process is Solr?
Thank you very much.

Best regards.

--
A.Nazemian











RE: Lucene cosine similarity score for more like this query

2015-02-02 Thread Markus Jelsma
Hi - MoreLikeThis is not based on cosine similarity. The idea is that rare 
terms - high IDF - are extracted from the source document, and then used to 
build a regular Query(). That query follows the same rules as regular queries, 
the rules of your similarity implementation, which is TFIDF by default. So, as 
suggested, if you enable debugging, you can clearly see why scores can be above 
1, or even much higher if queryNorm is disabled when using BM25 as similarity.

If you really need cosine similarity between documents, you have to enable term 
vectors for the source fields, and use them to calculate the angle. The problem 
is that this does not scale well, you would need to calculate angles with 
virtually all other documents.

M.
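
Editor's note: a minimal Python sketch of the cosine calculation described
above, assuming you have already pulled the term lists (term vectors) of two
documents out of the index by some other means. Raw term counts are used as
vector weights here for simplicity; that weighting is an assumption of the
sketch, not something stated in the message.

```python
import math
from collections import Counter


def cosine_similarity(terms_a, terms_b):
    """Cosine of the angle between two term-count vectors."""
    va, vb = Counter(terms_a), Counter(terms_b)
    # Only terms present in both documents contribute to the dot product.
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is similar to nothing
    return dot / (norm_a * norm_b)


doc_a = ["solr", "more", "like", "this", "query"]
doc_b = ["lucene", "more", "like", "this", "score"]
print(cosine_similarity(doc_a, doc_b))
```

With non-negative counts the result always falls between 0 and 1. The scaling
problem is visible in the signature: one source document has to be compared
against every other document's vector.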
 
-Original message-
 From:Ali Nazemian alinazem...@gmail.com
 Sent: Monday 2nd February 2015 21:39
 To: solr-user@lucene.apache.org
 Subject: Re: Lucene cosine similarity score for more like this query
 
 Dear Erik,
 Thank you for your response. Would you please tell me why this score could
 be higher than 1? While cosine similarity can not be higher than 1.
 On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 
  The scoring is the same as Lucene.  To get deeper insight into how a score
  is computed, use Solr’s debug=true mode to see the explain details in the
  response.
 
  Erik
 
   On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com wrote:
  
   Hi,
   I was wondering what is the range of score is brought by more like this
   query in Solr? I know that the Lucene uses cosine similarity in vector
   space model for calculating similarity between two documents. I also know
   that cosine similarity is between -1 and 1 but the fact that I dont
   understand is why the score which is brought by more like this query
  could
   be 12 for example?! Would you please explain what is the calculation
   process is Solr?
   Thank you very much.
  
   Best regards.
  
   --
   A.Nazemian
 
 
 


Re: Lucene cosine similarity score for more like this query

2015-02-02 Thread Ali Nazemian
Dear Erik,
Thank you for your response. Would you please tell me why this score could
be higher than 1, while cosine similarity cannot be higher than 1?
On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:

 The scoring is the same as Lucene.  To get deeper insight into how a score
 is computed, use Solr’s debug=true mode to see the explain details in the
 response.

 Erik

  On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com wrote:
 
  Hi,
  I was wondering what is the range of score is brought by more like this
  query in Solr? I know that the Lucene uses cosine similarity in vector
  space model for calculating similarity between two documents. I also know
  that cosine similarity is between -1 and 1 but the fact that I dont
  understand is why the score which is brought by more like this query
 could
  be 12 for example?! Would you please explain what is the calculation
  process is Solr?
  Thank you very much.
 
  Best regards.
 
  --
  A.Nazemian




RE: Hit Highlighting and More Like This

2015-02-02 Thread Markus Jelsma
Hi - you can use the MLT query parser in Solr 5.0 or patch 4.10.x
https://issues.apache.org/jira/browse/SOLR-6248


 
-Original message-
 From:Tim Hearn timseman...@gmail.com
 Sent: Saturday 31st January 2015 0:31
 To: solr-user@lucene.apache.org
 Subject: Hit Highlighting and More Like This
 
 Hi all,
 
 I'm fairly new to Solr.  It seems like it should be possible to enable the
 hit highlighting feature and more like this feature at the same time, with
 the key words from the MLT query being the terms highlighted.  Is this
 possible?  I am trying right now to do this, but I am not having any
 snippets returned to me.
 
 Thanks!
 


Re: Lucene cosine similarity score for more like this query

2015-02-02 Thread Dikshant Shahi
Conceptually, your understanding of the VSM and cosine similarity is correct.
In text analysis, the range is 0 to 1 as there is no negative similarity.

The scores for handlers which internally use Lucene's cosine similarity can
also go beyond 1. The reason is that these scores are computed for each field
and go through more computation after that, for example
summation/multiplication of the scores for fields, to come up with the final
score for the document. Correct me if my understanding is wrong.

Thanks,
Dikshant



On Tue, Feb 3, 2015 at 2:53 AM, Markus Jelsma markus.jel...@openindex.io
wrote:

 Hi - MoreLikeThis is not based on cosine similarity. The idea is that rare
 terms - high IDF - are extracted from the source document, and then used to
 build a regular Query(). That query follows the same rules as regular
 queries, the rules of your similarity implementation, which is TFIDF by
 default. So, as suggested, if you enable debugging, you can clearly see why
 scores can be above 1, or even much higher if queryNorm is disabled when
 using BM25 as similarity.

 If you really need cosine similarity between documents, you have to enable
 term vectors for the source fields, and use them to calculate the angle.
 The problem is that this does not scale well, you would need to calculate
 angles with virtually all other documents.

 M.

 -Original message-
  From:Ali Nazemian alinazem...@gmail.com
  Sent: Monday 2nd February 2015 21:39
  To: solr-user@lucene.apache.org
  Subject: Re: Lucene cosine similarity score for more like this query
 
  Dear Erik,
  Thank you for your response. Would you please tell me why this score
 could
  be higher than 1? While cosine similarity can not be higher than 1.
  On Feb 2, 2015 7:32 PM, Erik Hatcher erik.hatc...@gmail.com wrote:
 
   The scoring is the same as Lucene.  To get deeper insight into how a
 score
   is computed, use Solr’s debug=true mode to see the explain details in
 the
   response.
  
   Erik
  
On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com
 wrote:
   
Hi,
I was wondering what is the range of score is brought by more like
 this
query in Solr? I know that the Lucene uses cosine similarity in
 vector
space model for calculating similarity between two documents. I also
 know
that cosine similarity is between -1 and 1 but the fact that I dont
understand is why the score which is brought by more like this query
   could
be 12 for example?! Would you please explain what is the
 calculation
process is Solr?
Thank you very much.
   
Best regards.
   
--
A.Nazemian
  
  
 



Lucene cosine similarity score for more like this query

2015-02-02 Thread Ali Nazemian
Hi,
I was wondering what the range of the score returned by a More Like This
query in Solr is. I know that Lucene uses cosine similarity in the vector
space model to calculate the similarity between two documents. I also know
that cosine similarity is between -1 and 1, but what I don't understand is
why the score returned by a More Like This query could be 12, for example.
Would you please explain how Solr calculates it?
Thank you very much.

Best regards.

-- 
A.Nazemian


Re: Lucene cosine similarity score for more like this query

2015-02-02 Thread Erik Hatcher
The scoring is the same as Lucene.  To get deeper insight into how a score is 
computed, use Solr’s debug=true mode to see the explain details in the response.

Erik

 On Feb 2, 2015, at 10:49 AM, Ali Nazemian alinazem...@gmail.com wrote:
 
 Hi,
 I was wondering what is the range of score is brought by more like this
 query in Solr? I know that the Lucene uses cosine similarity in vector
 space model for calculating similarity between two documents. I also know
 that cosine similarity is between -1 and 1 but the fact that I dont
 understand is why the score which is brought by more like this query could
 be 12 for example?! Would you please explain what is the calculation
 process is Solr?
 Thank you very much.
 
 Best regards.
 
 -- 
 A.Nazemian



Hit Highlighting and More Like This

2015-01-30 Thread Tim Hearn
Hi all,

I'm fairly new to Solr.  It seems like it should be possible to enable the
hit highlighting feature and more like this feature at the same time, with
the key words from the MLT query being the terms highlighted.  Is this
possible?  I am trying right now to do this, but I am not having any
snippets returned to me.

Thanks!


Re: Minimum Term Matching in More Like This Queries

2014-11-08 Thread Anurag Sharma
There is no direct way of retrieving docs based on a minimum term match in
Solr. The MLT params 'mlt.mintf' and 'mlt.match.offset' can be explored to see
if they meet the criteria. Refer to the links below for more details:
http://wiki.apache.org/solr/MoreLikeThisHandler
https://wiki.apache.org/solr/MoreLikeThis

In case you are using the Lucene library directly, the
setPercentTermsToMatch() function of the MoreLikeThisQuery class can be used.
Refer to the code:
https://svn.apache.org/repos/asf/lucene/dev/trunk/lucene/queries/src/java/org/apache/lucene/queries/mlt/MoreLikeThisQuery.java
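
Editor's note: the following Python sketch illustrates what a "percent of
terms to match" setting amounts to. It assumes the fraction is turned into a
minimum-should-match count by truncating num_terms * fraction to an integer,
which is my reading of the linked class and should be checked against the
source. The function names are made up for the example, and the query terms
are assumed to be distinct.

```python
def min_should_match(num_query_terms, percent_terms_to_match):
    """Number of query terms a document must contain (truncated)."""
    return int(num_query_terms * percent_terms_to_match)


def matches(query_terms, doc_terms, percent_terms_to_match):
    """True if the document shares enough terms with the MLT query."""
    required = min_should_match(len(query_terms), percent_terms_to_match)
    shared = len(set(query_terms) & set(doc_terms))
    return shared >= required


query = ["solr", "lucene", "mlt", "score"]
doc = ["solr", "lucene", "highlighting"]
print(matches(query, doc, 0.5))   # needs 2 of 4 terms, the doc shares 2
print(matches(query, doc, 0.75))  # needs 3 of 4 terms, the doc shares 2
```

This is the "at least x terms" behaviour asked about, with x expressed as a
fraction of the generated query's terms rather than as an absolute number.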

On Fri, Nov 7, 2014 at 9:45 PM, Tim Hearn timseman...@gmail.com wrote:

 Hi!

 I'm fairly new to Solr.  Is there a feature which enforces minimum term
 matching for MLT Queries?  More precisely, that is, a document will match
 the MLT query if and only if at least x terms in the query are found in the
 document, with x defined by the user.  I could not find such a feature in
 the documentation, and switching to the edismax query parser and using the
 'mm' parameter does not work for me.

 Thanks!



Minimum Term Matching in More Like This Queries

2014-11-07 Thread Tim Hearn
Hi!

I'm fairly new to Solr.  Is there a feature which enforces minimum term
matching for MLT Queries?  More precisely, that is, a document will match
the MLT query if and only if at least x terms in the query are found in the
document, with x defined by the user.  I could not find such a feature in
the documentation, and switching to the edismax query parser and using the
'mm' parameter does not work for me.

Thanks!


Group and Field Collapsing in SOLR More like this

2013-11-14 Thread balaji
Hi

I have two types of profile, Shadow and DO, and I am trying to use MLT to
bring back related recommendations for a userID.

In the result I get both types, but I want to restrict the resulting
documents by a field (type) that I pass in.

Currently grouping and field collapsing do not seem to work. Is there any
other way to achieve this?


Thanks
Balaji



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Group-and-Field-Collapsing-in-SOLR-More-like-this-tp4101032.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Help on solr more like this functionality

2013-10-26 Thread Koji Sekiguchi

Hi Suren,

(13/10/25 23:36), Suren Raju wrote:

Hi,

We are trying to solve a business problem by performing solr more like this
query. We are able to perform the more like this search. We have a specific
use case that requires different boost on different match fields. Say i do
more like this based on fields title and description of products. I wanna
provide more boost for match field *title *than the description.

Query im trying so far is

mysolrhost:8983/solr/mlt?q=id:UTF8TEST&mlt.fl=title,description&mlt.mindf=1&mlt.mintf=1

Is there any way to provide different boost for title and description?



I don't have much experience on MLT, but index time boosting might help you?

Koji
--
http://soleami.com/blog/automatically-acquiring-synonym-knowledge-from-wikipedia.html
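
Editor's note: besides index-time boosting, the MLT parameters include
mlt.qf, which takes per-field boosts in the same format as the dismax qf
parameter. Whether it gives the desired ranking for this use case is an
assumption that would need testing. The Python sketch below only builds such
a request, using the host, handler path and document id from the question;
the boost values are made up.

```python
from urllib.parse import urlencode


def build_boosted_mlt_url(base_url, doc_id, field_boosts):
    """Build an MLT request that boosts some match fields over others.

    field_boosts maps field name -> boost, e.g. {"title": 2.0}.
    Fields listed in mlt.qf must also appear in mlt.fl.
    """
    params = {
        "q": f"id:{doc_id}",
        "mlt.fl": ",".join(field_boosts),
        "mlt.qf": " ".join(f"{f}^{b}" for f, b in field_boosts.items()),
        "mlt.mindf": 1,
        "mlt.mintf": 1,
    }
    return base_url + "?" + urlencode(params)


# Boost values are illustrative only.
boosted_url = build_boosted_mlt_url(
    "http://mysolrhost:8983/solr/mlt",
    "UTF8TEST",
    {"title": 2.0, "description": 1.0},
)
print(boosted_url)
```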


Help on solr more like this functionality

2013-10-25 Thread Suren Raju
Hi,

We are trying to solve a business problem by performing a Solr More Like This
query. We are able to perform the More Like This search. We have a specific
use case that requires different boosts on different match fields. Say I do
More Like This based on the fields title and description of products. I want
to provide more boost for the match field *title* than for the description.

The query I'm trying so far is

mysolrhost:8983/solr/mlt?q=id:UTF8TEST&mlt.fl=title,description&mlt.mindf=1&mlt.mintf=1

Is there any way to provide different boost for title and description?


Many thanks,

Suren.


Constant score for more like this reference document

2013-06-03 Thread Achim Domma
I call the mlt handler using a query which searches for a certain document 
(?q=id:some_document_id). The reference document is included in the result and 
the score is also returned. I found out, that the score if fixed, independent 
of the document. So for each document id I get the same score. The score varies 
between cores, but is fixed per core.

I'm aware of all the warnings about scores not being absolute values and that 
you cannot compare them. But I wonder, why the value is fixed per core. Is it 
just a random value or is it possible to explain how it's calculated?

I'm just digging into the code to get a better understanding of the inner 
working, but I'm not yet deep enough. Feel free to point me to the relevant 
code snippets!

kind regards,
Achim

Re: Getting explain information of more like this search in a more usable format

2013-05-14 Thread Andre Bois-Crettez

On 05/13/2013 03:12 PM, Achim Domma wrote:

I'm mainly interested in showing the terms which each result document has in 
common with the reference document.

regards,
Achim

It seems a good job for highlighting ?
http://docs.lucidworks.com/display/solr/Highlighting
http://wiki.apache.org/solr/SolrConfigXml#The_Highlighter_plugin_configuration_section

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/


Kelkoo SAS
Simplified joint-stock company (Société par Actions Simplifiée)
Share capital: € 4.168.964,30
Registered office: 8, rue du Sentier 75002 Paris
425 093 069 RCS Paris

This message and its attachments are confidential and intended solely for
their addressees. If you are not the intended recipient of this message,
please destroy it and notify the sender.


Re: Getting explain information of more like this search in a more usable format

2013-05-14 Thread Andre Bois-Crettez

On 05/14/2013 03:44 PM, Andre Bois-Crettez wrote:

On 05/13/2013 03:12 PM, Achim Domma wrote:

I'm mainly interested in showing the terms which each result document has in 
common with the reference document.

regards,
Achim

It seems a good job for highlighting ?
http://docs.lucidworks.com/display/solr/Highlighting
http://wiki.apache.org/solr/SolrConfigXml#The_Highlighter_plugin_configuration_section

André Bois-Crettez


Sorry, forget what I said, I totally overlooked that you were using MLT, and
it does not seem possible to combine it with highlighting.


André

--
André Bois-Crettez

Search technology, Kelkoo
http://www.kelkoo.com/




Getting explain information of more like this search in a more usable format

2013-05-13 Thread Achim Domma
Hi,

I'm executing a more like this search using the MoreLikeThisHandler. I can add 
score to the fields to be returned, but that's all I could find about getting 
information about how/why documents match. I would like to give my users more 
hints why documents are similar, so I would like to display important 
overlapping terms. If I specify debugQuery=true the result contains an explain 
section which is quite detailed, but in a text format I would have to parse. Is 
there a way to get this kind of information in a more usable way which does not 
force me to use a debug-flag? I'm mainly interested in showing the terms which 
each result document has in common with the reference document.

regards,
Achim

Re: Getting explain information of more like this search in a more usable format

2013-05-13 Thread Jack Krupansky
Try debug.explain.structured=true, which will give you an XML response that 
can be traversed.


Don't worry about the fact that these features are labeled "debug" - they 
are there simply to explain what is happening. Is there some particular 
concern you have about them being labeled "debug"? Although, you are not the 
first person to complain! What if Solr simply renamed these features with 
the term "detail" instead of "debug" - would that cure your concern?!
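
Editor's note: once the explain output is structured, the matching terms can
be pulled out by walking the tree. The Python sketch below assumes the
response was requested in JSON form, that each explain node carries
"value", "description" and "details" keys, and that per-term nodes have a
description of the form "weight(field:term in docid) ...". Those details are
assumptions to verify against your own Solr version, and the sample tree is
hand-written, not real output.

```python
import re

# Assumed shape of a per-term explain description.
WEIGHT_PATTERN = re.compile(r"weight\((?P<field>[^:]+):(?P<term>\S+) in \d+\)")


def matched_terms(explain_node):
    """Collect (field, term, value) for every per-term node in the tree."""
    found = []
    match = WEIGHT_PATTERN.search(explain_node.get("description", ""))
    if match:
        found.append(
            (match.group("field"), match.group("term"), explain_node.get("value"))
        )
    # Recurse into child nodes, if any.
    for child in explain_node.get("details") or []:
        found.extend(matched_terms(child))
    return found


# Hand-written sample in the assumed structure.
sample_explain = {
    "match": True,
    "value": 1.5,
    "description": "sum of:",
    "details": [
        {
            "match": True,
            "value": 1.0,
            "description": "weight(title:solr in 7) [DefaultSimilarity], result of:",
            "details": [],
        },
        {
            "match": True,
            "value": 0.5,
            "description": "weight(title:lucene in 7) [DefaultSimilarity], result of:",
            "details": [],
        },
    ],
}

terms = matched_terms(sample_explain)
print(terms)
```

Sorting the collected tuples by value would give a rough list of the terms
that contributed most to each match.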


-- Jack Krupansky

-Original Message- 
From: Achim Domma

Sent: Monday, May 13, 2013 9:12 AM
To: solr-user@lucene.apache.org
Subject: Getting explain information of more like this search in a more 
usable format


Hi,

I'm executing a more like this search using the MoreLikeThisHandler. I can 
add score to the fields to be returned, but that's all I could find about 
getting information about how/why documents match. I would like to give my 
users more hints why documents are similar, so I would like to display 
important overlapping terms. If I specify debugQuery=true the result 
contains an explain section which is quite detailed, but in a text format I 
would have to parse. Is there a way to get this kind of information in a 
more usable way which does not force me to use a debug-flag? I'm mainly 
interested in showing the terms which each result document has in common 
with the reference document.


regards,
Achim



Re: More Like This and Caching

2013-05-10 Thread Giammarco Schisani
Hi David, Jason and Otis,

Thank you for the feedback on the question. It is very much appreciated.

To confirm what caches are being used, I will remove one of the Solr servers
from the cluster, restart it, note the status of the various Solr caches,
issue some MLT queries to it, and compare the status of the cache against
the notes previously taken. I believe this will provide the definitive
answer on this.
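
(A minimal Python sketch of the comparison step I have in mind. It assumes
the cache statistics have been copied by hand into dictionaries keyed by
cache name, with "lookups", "hits" and "inserts" counters; the numbers below
are placeholders, not measurements.)

```python
def cache_delta(before, after, counters=("lookups", "hits", "inserts")):
    """Per-cache difference of the counters between two snapshots."""
    delta = {}
    for cache_name, after_stats in after.items():
        before_stats = before.get(cache_name, {})
        delta[cache_name] = {
            c: after_stats.get(c, 0) - before_stats.get(c, 0) for c in counters
        }
    return delta


def caches_touched(delta):
    """Names of the caches whose counters changed at all."""
    return sorted(
        name for name, stats in delta.items() if any(v != 0 for v in stats.values())
    )


# Placeholder snapshots taken before and after issuing the MLT queries.
before = {
    "documentCache": {"lookups": 0, "hits": 0, "inserts": 0},
    "queryResultCache": {"lookups": 0, "hits": 0, "inserts": 0},
    "filterCache": {"lookups": 0, "hits": 0, "inserts": 0},
}
after = {
    "documentCache": {"lookups": 12, "hits": 6, "inserts": 6},
    "queryResultCache": {"lookups": 2, "hits": 1, "inserts": 1},
    "filterCache": {"lookups": 0, "hits": 0, "inserts": 0},
}

print(caches_touched(cache_delta(before, after)))
```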

I will reply to this thread with my findings.

Kind regards,
Giammarco

On Fri, May 10, 2013 at 1:14 AM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 This is correct,  doc cache for previously read docs regardless of which
 query read them and query cache for repeat query. Plus OS cache for actual
 index files.

 Otis
 Solr & ElasticSearch Support
 http://sematext.com/
 On May 9, 2013 2:32 PM, Jason Hellman jhell...@innoventsolutions.com
 wrote:

  Purely from empirical observation, both the DocumentCache and
  QueryResultCache are being populated and reused in reloads of a simple
 MLT
  search.  You can see in the cache inserts how much extra-curricular
  activity is happening to populate the MLT data by how many inserts and
  lookups occur on the first load.
 
  (lifted right out of the MLT wiki
 http://wiki.apache.org/solr/MoreLikeThis)
 
 
 
 http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score
 
  There is no activity in the filterCache, fieldCache, or fieldValueCache -
  and that makes plenty of sense.
 
  On May 9, 2013, at 11:12 AM, David Parks davidpark...@yahoo.com wrote:
 
   I'm not the expert here, but perhaps what you're noticing is actually
 the
   OS's disk cache. The actual solr index isn't cached by solr, but as you
  read
   the blocks off disk the OS disk cache probably did cache those blocks
 for
   you. On the 2nd run the index blocks were read out of memory.
  
   There was a very extensive discussion on this list not long back
 titled:
   Re: SolrCloud loadbalancing, replication, and failover look that
  thread up
   and you'll get a lot of in-depth on the topic.
  
   David
  
  
   -Original Message-
   From: Giammarco Schisani [mailto:giamma...@schisani.com]
   Sent: Thursday, May 09, 2013 2:59 PM
   To: solr-user@lucene.apache.org
   Subject: More Like This and Caching
  
   Hi all,
  
   Could anybody explain which Solr cache (e.g. queryResultCache,
   documentCache, fieldCache, etc.) can be used by the More Like This
  handler?
  
   One of my colleagues had previously suggested that the More Like This
   handler does not take advantage of any of the Solr caches.
  
   However, if I issue two identical MLT requests to the same Solr
 instance,
   the second request will execute much faster than the first request (for
   example, the first request will execute in 200ms and the second request
  will
   execute in 20ms). This makes me believe that at least one of the Solr
  caches
   is being used by the More Like This handler.
  
   I think the documentCache is the cache that is most likely being
 used,
  but
   would you be able to confirm?
  
   As information, I am currently using Solr version 3.6.1.
  
   Kind regards,
   Giammarco Schisani
  
 
 



More Like This and Caching

2013-05-09 Thread Giammarco Schisani
Hi all,

Could anybody explain which Solr cache (e.g. queryResultCache,
documentCache, fieldCache, etc.) can be used by the More Like This handler?

One of my colleagues had previously suggested that the More Like This
handler does not take advantage of any of the Solr caches.

However, if I issue two identical MLT requests to the same Solr instance,
the second request will execute much faster than the first request (for
example, the first request will execute in 200ms and the second request
will execute in 20ms). This makes me believe that at least one of the Solr
caches is being used by the More Like This handler.

I think the documentCache is the cache that is most likely being used,
but would you be able to confirm?

As information, I am currently using Solr version 3.6.1.

Kind regards,
Giammarco Schisani


RE: More Like This and Caching

2013-05-09 Thread David Parks
I'm not the expert here, but perhaps what you're noticing is actually the
OS's disk cache. The actual solr index isn't cached by solr, but as you read
the blocks off disk the OS disk cache probably did cache those blocks for
you. On the 2nd run the index blocks were read out of memory.

There was a very extensive discussion on this list not long back titled
"Re: SolrCloud loadbalancing, replication, and failover"; look that thread up
and you'll get a lot of in-depth information on the topic.

David


-Original Message-
From: Giammarco Schisani [mailto:giamma...@schisani.com] 
Sent: Thursday, May 09, 2013 2:59 PM
To: solr-user@lucene.apache.org
Subject: More Like This and Caching

Hi all,

Could anybody explain which Solr cache (e.g. queryResultCache,
documentCache, fieldCache, etc.) can be used by the More Like This handler?

One of my colleagues had previously suggested that the More Like This
handler does not take advantage of any of the Solr caches.

However, if I issue two identical MLT requests to the same Solr instance,
the second request will execute much faster than the first request (for
example, the first request will execute in 200ms and the second request will
execute in 20ms). This makes me believe that at least one of the Solr caches
is being used by the More Like This handler.

I think the documentCache is the cache that is most likely being used, but
would you be able to confirm?

As information, I am currently using Solr version 3.6.1.

Kind regards,
Giammarco Schisani



Re: More Like This and Caching

2013-05-09 Thread Jason Hellman
Purely from empirical observation, both the DocumentCache and QueryResultCache 
are being populated and reused in reloads of a simple MLT search.  You can see 
in the cache inserts how much extra-curricular activity is happening to 
populate the MLT data by how many inserts and lookups occur on the first load. 

(lifted right out of the MLT wiki http://wiki.apache.org/solr/MoreLikeThis )

http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

There is no activity in the filterCache, fieldCache, or fieldValueCache - and 
that makes plenty of sense.

On May 9, 2013, at 11:12 AM, David Parks davidpark...@yahoo.com wrote:

 I'm not the expert here, but perhaps what you're noticing is actually the
 OS's disk cache. The actual solr index isn't cached by solr, but as you read
 the blocks off disk the OS disk cache probably did cache those blocks for
 you. On the 2nd run the index blocks were read out of memory.
 
 There was a very extensive discussion on this list not long back titled:
 Re: SolrCloud loadbalancing, replication, and failover look that thread up
 and you'll get a lot of in-depth on the topic.
 
 David
 
 
 -Original Message-
 From: Giammarco Schisani [mailto:giamma...@schisani.com] 
 Sent: Thursday, May 09, 2013 2:59 PM
 To: solr-user@lucene.apache.org
 Subject: More Like This and Caching
 
 Hi all,
 
 Could anybody explain which Solr cache (e.g. queryResultCache,
 documentCache, fieldCache, etc.) can be used by the More Like This handler?
 
 One of my colleagues had previously suggested that the More Like This
 handler does not take advantage of any of the Solr caches.
 
 However, if I issue two identical MLT requests to the same Solr instance,
 the second request will execute much faster than the first request (for
 example, the first request will execute in 200ms and the second request will
 execute in 20ms). This makes me believe that at least one of the Solr caches
 is being used by the More Like This handler.
 
 I think the documentCache is the cache that is most likely being used, but
 would you be able to confirm?
 
 As information, I am currently using Solr version 3.6.1.
 
 Kind regards,
 Giammarco Schisani
 



Re: More Like This and Caching

2013-05-09 Thread Otis Gospodnetic
This is correct,  doc cache for previously read docs regardless of which
query read them and query cache for repeat query. Plus OS cache for actual
index files.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On May 9, 2013 2:32 PM, Jason Hellman jhell...@innoventsolutions.com
wrote:

 Purely from empirical observation, both the DocumentCache and
 QueryResultCache are being populated and reused in reloads of a simple MLT
 search.  You can see in the cache inserts how much extra-curricular
 activity is happening to populate the MLT data by how many inserts and
 lookups occur on the first load.

 (lifted right out of the MLT wiki http://wiki.apache.org/solr/MoreLikeThis)


 http://localhost:8983/solr/select?q=apache&mlt=true&mlt.fl=manu,cat&mlt.mindf=1&mlt.mintf=1&fl=id,score

 There is no activity in the filterCache, fieldCache, or fieldValueCache -
 and that makes plenty of sense.

 On May 9, 2013, at 11:12 AM, David Parks davidpark...@yahoo.com wrote:

  I'm not the expert here, but perhaps what you're noticing is actually the
  OS's disk cache. The actual solr index isn't cached by solr, but as you
 read
  the blocks off disk the OS disk cache probably did cache those blocks for
  you. On the 2nd run the index blocks were read out of memory.
 
  There was a very extensive discussion on this list not long back titled:
  Re: SolrCloud loadbalancing, replication, and failover look that
 thread up
  and you'll get a lot of in-depth on the topic.
 
  David
 
 
  -Original Message-
  From: Giammarco Schisani [mailto:giamma...@schisani.com]
  Sent: Thursday, May 09, 2013 2:59 PM
  To: solr-user@lucene.apache.org
  Subject: More Like This and Caching
 
  Hi all,
 
  Could anybody explain which Solr cache (e.g. queryResultCache,
  documentCache, fieldCache, etc.) can be used by the More Like This
 handler?
 
  One of my colleagues had previously suggested that the More Like This
  handler does not take advantage of any of the Solr caches.
 
  However, if I issue two identical MLT requests to the same Solr instance,
  the second request will execute much faster than the first request (for
  example, the first request will execute in 200ms and the second request
 will
  execute in 20ms). This makes me believe that at least one of the Solr
 caches
  is being used by the More Like This handler.
 
  I think the documentCache is the cache that is most likely being used,
 but
  would you be able to confirm?
 
  As information, I am currently using Solr version 3.6.1.
 
  Kind regards,
  Giammarco Schisani
 




Returning similarity values for more like this search

2013-04-19 Thread Achim Domma
Hi,

I'm executing a search including a search for similar documents 
(mlt=true&mlt.fl=) which works fine so far. I would like to get the 
similarity value for each document. I expected this to be quite common and 
simple, but I could not find a hint how to do it. Any hint how to do it would 
be very appreciated.

kind regards,
Achim

Re: Returning similarity values for more like this search

2013-04-19 Thread Koji Sekiguchi

(13/04/19 23:24), Achim Domma wrote:

Hi,

I'm executing a search including a search for similar documents 
(mlt=true&mlt.fl=) which works fine so far. I would like to get the 
similarity value for each document. I expected this to be quite common and simple, 
but I could not find a hint how to do it. Any hint how to do it would be very 
appreciated.

kind regards,
Achim



Using debugQuery=true, you can find explanations in the debug section of the 
response.

See:
https://issues.apache.org/jira/browse/SOLR-860
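
For illustration, a minimal sketch of how such a request could be put together. The host, handler path, document ID, and field names are assumptions made up for the example; only the mlt, mlt.fl, fl, and debugQuery parameters come from this thread. Whether a score shows up on each similar document depends on the Solr version, so the debug section remains the place to look.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def mlt_debug_url(base_url, doc_query, mlt_fields):
    """Build a select URL that enables MLT and asks for score explanations."""
    params = [
        ("q", doc_query),                  # query selecting the source document
        ("mlt", "true"),                   # switch on the MoreLikeThis component
        ("mlt.fl", ",".join(mlt_fields)),  # fields used to judge similarity
        ("fl", "*,score"),                 # return stored fields plus score
        ("debugQuery", "true"),            # adds explanations to the debug section
    ]
    return base_url + "?" + urlencode(params)


# Hypothetical host, ID and field names, for illustration only.
url = mlt_debug_url("http://localhost:8983/solr/select", "id:42", ["title", "body"])
decoded = parse_qs(urlparse(url).query)
```

urlencode takes care of escaping the colon, comma, and asterisk, which is easy to get wrong when the URL is assembled by hand.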

koji
--
http://soleami.com/blog/lucene-4-is-super-convenient-for-developing-nlp-tools.html


Re: More Like This, finding original record

2013-02-12 Thread Daniel Rijkhof
Well, I have found the following line in
MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(..)


  // exclude current document from results
  realMLTQuery.add(
      new TermQuery(new Term(uniqueKeyField.getName(),
          uniqueKeyField.getType().storedToIndexed(
              doc.getFieldable(uniqueKeyField.getName())))),
      BooleanClause.Occur.MUST_NOT);

I'll try to remove the line somehow, and see if my results work for me.

At least it is clear that this line is not surrounded by any if statement
and will always be executed, so 'NO' is the answer to "is there a way to
get the current document in the search results?".


Have Fun
Daniel



On Tue, Feb 12, 2013 at 3:25 PM, Daniel Rijkhof daniel.rijk...@gmail.com wrote:

 I guess it's not possible, but perhaps someone knows how to do this:

 Do a more like this query (through the mlt handler),
 And find the match record within the response records (top match, should
 be first in list).

 This would then make it possible for me to compare scores...

 Anybody around that did this? (Modify source code perhaps?)

 Have Fun
 Daniel



Re: More Like This, finding original record

2013-02-12 Thread Otis Gospodnetic
Hello,

Daniel, are you looking for the original doc you used for MLT in the
response? You could always and easily do this on the client side by looking
at IDs of returned docs.
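
A rough sketch of that client-side lookup, assuming the response documents have already been parsed into a list of dicts (for example from a JSON response) and that the unique key field is called id. Both are assumptions for the example. As noted earlier in the thread, the MLT handler excludes the source document from the similar-documents list, so the lookup may legitimately find nothing.

```python
def split_match(docs, source_id, key="id"):
    """Return (source_doc_or_None, remaining_docs) from a list of result dicts."""
    source = None
    others = []
    for doc in docs:
        if source is None and doc.get(key) == source_id:
            source = doc          # first document whose key matches the source ID
        else:
            others.append(doc)    # everything else keeps its original order
    return source, others


# Hypothetical parsed results, for illustration only.
docs = [
    {"id": "7", "score": 1.9},
    {"id": "3", "score": 0.8},
    {"id": "5", "score": 0.4},
]
source, similar = split_match(docs, "7")
missing, untouched = split_match(docs, "9")
```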

Otis
Solr & ElasticSearch Support
http://sematext.com/






Problems using distributed More Like This

2013-02-11 Thread Shawn Heisey
SOLR-788 added Distributed MLT to Solr 4.1, but I have not been able to 
get it to work.  I don't know if it's user error, which of course is 
very possible.  If it is user error, I'd like to know what I'm doing 
wrong so I can fix it.  I am actually using a recent checkout of Solr 
4.2, not the released 4.1.


I put some extensive information on SOLR-4414, an issue filed by another 
user having a similar problem.  If you look for the last comment from me 
on Feb 7 that has a code block, you'll see Solr's response when I use 
MoreLikeThisComponent.


https://issues.apache.org/jira/browse/SOLR-4414

Only the last seven of the query parameters were included on the URL - 
the rest of them are in solrconfig.xml.  Due to echoParams=all, the only 
part of the request handler definition that you can't see in the 
response is the fact that last-components contains spellcheck.


I redacted the company domain name from the shards and the one document 
matching the query from the result tag, but there are no other changes 
to the response.


If I send an identical query to the shard core that actually contains 
the document rather than the core with the shards parameter, I get MLT 
results.


I have heard recently that Solr 4.x has hardcoded the unique field name 
for SolrCloud sharding as id ... but my uniqueKey field name is tag_id. 
 Could this be my problem?  It would be a monumental development effort 
to change that field name in our application.  I am not using SolrCloud 
for this index.


Thanks,
Shawn


Re: Problems using distributed More Like This

2013-02-11 Thread Mark Miller
Eventually, I'll get around to trying some more real world testing. Up till 
now, no dev seems to have a real interest in this. I have 0 need for it 
currently, so it's fairly low on my itch scale, but it's on my list anyhow.

- Mark




Re: More Like this without a document?

2012-11-19 Thread Chris Hostetter

: If I want to use MoreLikeThis algorithm I need to add this documents in the
: index? The MoreLikeThis will work with soft commits? Is there a solution to
: do a MoreLikeThis without adding the document in the index?

you can feed the MoreLikeThisHandler a ContentStream (ie: POST data, or 
file upload, or stream.body request param) of text instead of sending it 
a query and it will use that raw text to find more like this

http://wiki.apache.org/solr/MoreLikeThisHandler
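
As a hedged sketch of the stream.body variant: the handler URL, field name, and sample text below are made up for the example, and the mlt.mintf / mlt.mindf values are arbitrary. Depending on the Solr version, stream.body may need to be enabled in the configuration, and for long texts a POST body is the more practical route.

```python
from urllib.parse import urlencode, urlparse, parse_qs


def mlt_stream_url(handler_url, text, fields, rows=5):
    """Build an MLT handler URL that passes raw text instead of a query."""
    params = [
        ("stream.body", text),              # raw text to find similar docs for
        ("mlt.fl", ",".join(fields)),       # fields to compare against
        ("mlt.mintf", "1"),                 # arbitrary thresholds for the example
        ("mlt.mindf", "1"),
        ("mlt.interestingTerms", "list"),   # ask for the terms MLT picked
        ("rows", str(rows)),
    ]
    return handler_url + "?" + urlencode(params)


# Hypothetical handler URL, text and field name, for illustration only.
stream_url = mlt_stream_url(
    "http://localhost:8983/solr/mlt",
    "a space crew finds an alien ship",
    ["description"],
)
stream_params = parse_qs(urlparse(stream_url).query)
```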

-Hoss


More Like this without a document?

2012-11-05 Thread Raimon Bosch
Hi,

I'm designing a K-nearest neighbors classifier for Solr. So I am taking
information from IMDB and creating a set of documents with the description
of each movie and the categories selected for each document.

To validate whether the classification is correct I'm using cross-validation.
So I do not include in the index the documents that I want to guess.

If I want to use the MoreLikeThis algorithm, do I need to add these documents
to the index? Will MoreLikeThis work with soft commits? Is there a solution
to do a MoreLikeThis without adding the document to the index?

Thanks,
Raimon Bosch.


Re: More Like this without a document?

2012-11-05 Thread Walter Underwood
I wrote something like this for Ultraseek. After the document was parsed and 
analyzed, I took the top terms (by tf.idf) and did a search, then added fields 
with the categories.

You might be able to use the document analysis request handler for this. 
Analyze it, then choose terms, do the search, modify the doc, then submit it 
for indexing. It would get parsed twice, but that might not be a big deal.
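
To make the "choose terms" step concrete, here is a small sketch that ranks a document's terms by tf.idf against a toy corpus and joins the best ones into a query. It is not the Ultraseek implementation; the smoothing in the idf formula, the tokenized input, and the sample data are all assumptions for the example.

```python
import math
from collections import Counter


def top_terms(doc_tokens, corpus, k=3):
    """Rank the terms of doc_tokens by tf * idf against a small corpus."""
    n_docs = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))            # document frequency per term
    tf = Counter(doc_tokens)              # term frequency in the new document
    scores = {
        term: count * math.log((n_docs + 1) / (df[term] + 1))  # smoothed idf
        for term, count in tf.items()
    }
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]


# Toy, already-tokenized data for illustration only.
corpus = [
    ["space", "alien", "ship"],
    ["love", "story", "paris"],
    ["space", "war", "ship"],
]
new_doc = ["alien", "alien", "alien", "space", "ship", "invasion"]
best = top_terms(new_doc, corpus, k=2)
query = " OR ".join(best)
```

The resulting query string can then be sent as an ordinary search, and the categories of the top hits used to label the new document.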

Warning, this could put a big load on Solr. My implementation really pounded 
Ultraseek. The queries are long and they don't really match what is in the 
caches.

wunder







same results for select and More Like This(MLT) search

2012-10-03 Thread aniljayanti
Hi,

My application is working fine with normal search (*/select?q=*) using Solr.

Normal query URL:
solr/select?q=title:lovely

Now I want to implement More Like This (MLT) in my application. I configured
MLT in Solr as follows.

solrconfig.xml
--
  <requestHandler name="/mlt" class="solr.MoreLikeThisHandler">
    <lst name="defaults">
      <str name="mlt.fl">title</str>
      <str name="mlt.mintf">1</str>
      <str name="mlt.mindf">2</str>
      <str name="mlt.boost">true</str>
    </lst>
  </requestHandler>

URL:

solr/mlt?q=title:lovely&mlt.fl=title

I am getting the same results and count with MLT as with the normal select
query. Can you please guide me if I made any mistakes in the MLT configuration?

Thanks in Advance,

AnilJayanti






SOLR 4 BETA - More Like This

2012-08-28 Thread Nick Koton
I am having difficulty getting MLT to work with a cloud configuration in
SOLR 4 beta.  I have reproduced it with the example schema and data from
the distribution:
http://hostname:8983/solr/example/select?q=id:VDBDB1A16&mlt=true&mlt.fl=text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename

When I direct this at a single instance SOLR server I get the first response
below while the second response shows the cloud system's response.  The
moreLikeThis section is missing.  However, if I make an error in the query
syntax (like missing mlt.fl) I see the error from both.

Is anyone else seeing similar behavior?

Nick Koton


STANDALONE RESPONSE

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="mlt">true</str>
      <str name="q">id:VDBDB1A16</str>
      <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
    </lst>
  </lst>
  <result name="response" start="0" numFound="1">
    <doc>
      <str name="id">VDBDB1A16</str>
      <str name="name">A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM</str>
      <str name="manu">A-DATA Technology Inc.</str>
      <str name="manu_id_s">corsair</str>
      <arr name="cat">
        <str>electronics</str>
        <str>memory</str>
      </arr>
      <arr name="features">
        <str>CAS latency 3, 2.7v</str>
      </arr>
      <int name="popularity">0</int>
      <bool name="inStock">true</bool>
      <str name="store">45.18414,-93.88141</str>
      <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
      <str name="payloads">electronics|0.9 memory|0.1</str>
      <long name="_version_">1411477698415427584</long>
    </doc>
  </result>
  <lst name="moreLikeThis">
    <result name="VDBDB1A16" start="0" numFound="4">
      <doc>
        <str name="id">VS1GB400C3</str>
        <str name="name">CORSAIR ValueSelect 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - Retail</str>
        <str name="manu">Corsair Microsystems Inc.</str>
        <str name="manu_id_s">corsair</str>
        <arr name="cat">
          <str>electronics</str>
          <str>memory</str>
        </arr>
        <float name="price">74.99</float>
        <str name="price_c">74.99,USD</str>
        <int name="popularity">7</int>
        <bool name="inStock">true</bool>
        <str name="store">37.7752,-100.0232</str>
        <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
        <str name="payloads">electronics|4.0 memory|2.0</str>
        <long name="_version_">1411477698411233280</long>
      </doc>
      <doc>
        <str name="id">TWINX2048-3200PRO</str>
        <str name="name">CORSAIR XMS 2GB (2 x 1GB) 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) Dual Channel Kit System Memory - Retail</str>
        <str name="manu">Corsair Microsystems Inc.</str>
        <str name="manu_id_s">corsair</str>
        <arr name="cat">
          <str>electronics</str>
          <str>memory</str>
        </arr>
        <arr name="features">
          <str>CAS latency 2, 2-3-3-6 timing, 2.75v, unbuffered, heat-spreader</str>
        </arr>
        <float name="price">185.0</float>
        <str name="price_c">185,USD</str>
        <int name="popularity">5</int>
        <bool name="inStock">true</bool>
        <str name="store">37.7752,-122.4232</str>
        <date name="manufacturedate_dt">2006-02-13T15:26:37Z</date>
        <str name="payloads">electronics|6.0 memory|3.0</str>
        <long name="_version_">1411477698403893248</long>
      </doc>
      <doc>
        <str name="id">0579B002</str>
        <str name="name">Canon PIXMA MP500 All-In-One Photo Printer</str>
        <str name="manu">Canon Inc.</str>
        <str name="manu_id_s">canon</str>
        <arr name="cat">
          <str>electronics</str>
          <str>multifunction printer</str>
          <str>printer</str>
          <str>scanner</str>
          <str>copier</str>
        </arr>
        <arr name="features">
          <str>Multifunction ink-jet color photo printer</str>
          <str>Flatbed scanner, optical scan resolution of 1,200 x 2,400 dpi</str>
          <str>2.5 color LCD preview screen</str>
          <str>Duplex Copying</str>
          <str>Printing speed up to 29ppm black, 19ppm color</str>
          <str>Hi-Speed USB</str>
          <str>memory card: CompactFlash, Micro Drive, SmartMedia, Memory Stick, Memory Stick Pro, SD Card, and MultiMediaCard</str>
        </arr>
        <float name="weight">352.0</float>
        <float name="price">179.99</float>
        <str name="price_c">179.99,USD</str>
        <int name="popularity">6</int>
        <bool name="inStock">true</bool>
        <str name="store">45.19214,-93.89941</str>
        <long name="_version_">1411477698464710656</long>
      </doc>
      <doc>
        <str name="id">EN7800GTX/2DHTV/256M</str>
        <str name="name">ASUS Extreme N7800GTX/2DHTV (256 MB)</str>
        <str name="manu">ASUS Computer Inc.</str>
        <str name="manu_id_s">asus</str>
        <arr name="cat">
          <str>electronics</str>
          <str>graphics card</str>
        </arr>
        <arr name="features">
          <str>NVIDIA GeForce 7800 GTX GPU/VPU clocked at 486MHz</str>
          <str>256MB GDDR3 Memory clocked at 1.35GHz</str>
          <str>PCI Express x16</str>
          <str>Dual DVI connectors, HDTV out, video input</str>
          <str>OpenGL 2.0, DirectX 9.0</str>
        </arr>
        <float name="weight">16.0</float>
        <float name="price">479.95</float>
        <str name="price_c">479.95,USD</str>
        <int name="popularity">7</int>
        <str name="store">40.7143,-74.006</str>
        <bool name="inStock">false</bool>
        <date name="manufacturedate_dt">2006-02-13T00:00:00Z</date>
        <long name="_version_">1411477698517139456</long>
      </doc>
    </result>
  </lst>
</response>

CLOUD RESPONSE

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">30</int>
    <lst name="params">
      <str name="mlt">true</str>
      <str name="q">id:VDBDB1A16</str>
      <str name="mlt.fl">text,features,name,sku,id,manu,cat,title,description,keywords,author,resourcename</str>
    </lst>
  </lst>
  <result name="response" maxScore="1.9162908" start="0" numFound="1">
    <doc>
      <str name="id">VDBDB1A16</str>
      <str name="name">A-DATA V-Series 1GB 184-Pin DDR SDRAM Unbuffered DDR 400 (PC 3200) System Memory - OEM</str>
      <str name="manu">A-DATA Technology Inc.</str>
      <str name="manu_id_s">corsair</str>
      <arr name="cat">
        <str>electronics</str>
        <str>memory</str>
      </arr>
      <arr name="features">
        <str>CAS 

Re: SOLR 4 BETA - More Like This

2012-08-28 Thread Shawn Heisey


As of right now, Solr does not support MLT with distributed search. An 
issue was filed in May of 2009 to fix this.  There are a number of 
patches on the issue, but they don't apply to the current codebase.  I 
don't know the Solr code well enough to understand what the patch does, 
or I would have already offered up a new patch.


https://issues.apache.org/jira/browse/SOLR-788

Thanks,
Shawn



Exception in Solr server on more like this

2011-12-22 Thread Scott Smith
I've been trying to get "More like this" running under Solr 3.5.  I get the 
exception below. The HTTP request is also highlighted below.

I've looked at the FieldType code and I don't understand what's going on there. 
So, while I know what a null pointer exception means, it isn't telling me what 
I did or didn't do.

FYI - the Body field has termVectors set to true, which I thought was 
sufficient for MLT.

What I'm trying to do is submit the phrase "country now is the time country" to 
MLT to determine the "interesting" words (which I want returned) and then 
return the most relevant documents.
Any help on what might be wrong would be appreciated.

Scott

6975 [main] INFO com.mainstreamdata.MediasIndexer.mediasBrowser.SearchFactory - SearchFactory:SearchFactory: Search Factory initialized
SolrQuery:: (country now is the time country)
Filter:: (Language:en)
15274 [main] ERROR com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch - SolrSearch:getDocTier: Unable to do search:
org.apache.solr.client.solrj.SolrServerException: Error executing query
    at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
    at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:266)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getTier(SolrSearch.java:309)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getFirstTier(SolrSearch.java:93)
    at com.mainstreamdata.MediasIndexer.mediasBrowser.SolrSearch.getNextOlderTier(SolrSearch.java:175)
    at com.mainstreamdata.MediasIndexer.SolrMgrTest.testMoreLikeThis(SolrMgrTest.java:209)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at junit.framework.TestCase.runTest(TestCase.java:164)
    at junit.framework.TestCase.runBare(TestCase.java:130)
    at junit.framework.TestResult$1.protect(TestResult.java:106)
    at junit.framework.TestResult.runProtected(TestResult.java:124)
    at junit.framework.TestResult.run(TestResult.java:109)
    at junit.framework.TestCase.run(TestCase.java:120)
    at junit.framework.TestSuite.runTest(TestSuite.java:230)
    at junit.framework.TestSuite.run(TestSuite.java:225)
    at org.eclipse.jdt.internal.junit.runner.junit3.JUnit3TestReference.run(JUnit3TestReference.java:130)
    at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
    at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: org.apache.solr.common.SolrException: null  java.lang.NullPointerException
    at org.apache.solr.schema.FieldType.storedToIndexed(FieldType.java:374)
    at org.apache.solr.handler.MoreLikeThisHandler$MoreLikeThisHelper.getMoreLikeThis(MoreLikeThisHandler.java:320)
    at org.apache.solr.handler.component.MoreLikeThisComponent.getMoreLikeThese(MoreLikeThisComponent.java:82)
    at org.apache.solr.handler.component.MoreLikeThisComponent.process(MoreLikeThisComponent.java:57)
    at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:194)
    at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:208)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1372)
    at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:356)
    at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:252)
    at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
    at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
    at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
    at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
    at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
    at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
    at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230

RE: Exception in Solr server on more like this

2011-12-22 Thread Scott Smith
This turned out to be SOLR-2986.


more like this

2011-10-05 Thread Fred Zimmerman
Hi,

for my application, I would like to be able to create web queries
(wget/curl) that get more like this for either a single arbitrarily
specified URL or for the first x terms in a search query.  I want to return
the results to myself as a csv file using wt=csv. How can I accomplish the
MLT piece of it?

Fred Z.



Re: More Like This on number fields

2011-09-01 Thread Chris Hostetter

: For example, a document with the field numberOfParticipant at 10, i would
: like to have some similar documents with numberOfParticipant between 5 and
: 15.
: 
: Does this option exist ?

No ... MLT works purely on the basis of terms, so if you tried to have 
MLT use a numeric field it would just find you docs that had the exact 
same value.
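
Outside of MLT, a numeric neighborhood like the one asked about can be expressed with ordinary Solr range syntax. The helper below is only a sketch of building such a clause; the function name and the idea of deriving the bounds from a value and a tolerance are assumptions for the example, and this is not an MLT feature.

```python
def range_filter(field, value, delta):
    """Build a Solr range clause covering value - delta to value + delta, inclusive."""
    return "{}:[{} TO {}]".format(field, value - delta, value + delta)


# Field name taken from the question above; 10 +/- 5 gives the 5..15 window.
fq = range_filter("numberOfParticipant", 10, 5)
```

The resulting string could be sent as a regular query or as a filter alongside other criteria.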

-Hoss


Excluding results from more like this

2011-03-09 Thread Brian Lamb
Hi all,

I'm using MoreLikeThis to find similar results but I'd like to exclude
records by the id number. For example, I use the following URL:

http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score

How would I exclude record 4 from the MoreLikeThis results?

I tried,

http://localhost:8983/solr/search/?q=id:(2 3 5)&mlt=true&mlt.fl=description,id&fl=*,score&mlt.q=!4

But that still returned record 4 in the MoreLikeThis results.


Re: Excluding results from more like this

2011-03-09 Thread Otis Gospodnetic
Brian,

...?q=id:(2  3 5) -4


Otis
---
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/



 


Re: Excluding results from more like this

2011-03-09 Thread Brian Lamb
That doesn't seem to do it. Record 4 is still showing up in the MoreLikeThis
results.

 



Re: Excluding results from more like this

2011-03-09 Thread Jonathan Rochkind
Yeah, that just restricts what items are in your main result set (and 
adding -4 has no real effect).


The more like this set is constructed based on your main result set, for 
each document in it.


As far as I can see from here: http://wiki.apache.org/solr/MoreLikeThis

..there seems to be no built-in way to customize the 'more like this' 
results in the way you want, excluding certain document id's.  I don't 
entirely understand what mlt.boost  does, but I don't think it does 
anything useful for this case.


So, if that's so,  you are out of luck, unless you want to write Java 
code. In which case you could try customizing or adding that feature to 
the MoreLikeThis search component, and either suggest your new code back 
as a patch, or just use your own customized version of MoreLikeThis.
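
Short of changing Solr itself, the unwanted IDs can also be dropped on the client after the response comes back. The sketch below assumes the moreLikeThis section has already been parsed into a plain dict that maps each source document ID to a list of similar documents. That is a simplification of the real response layout, and the names are made up for the example.

```python
def drop_ids(mlt_section, excluded_ids, key="id"):
    """Remove documents whose key is in excluded_ids from every MLT list."""
    excluded = {str(i) for i in excluded_ids}   # compare IDs as strings
    return {
        source_id: [d for d in docs if str(d.get(key)) not in excluded]
        for source_id, docs in mlt_section.items()
    }


# Hypothetical, simplified MLT section for source documents 2, 3 and 5.
mlt_section = {
    "2": [{"id": "4"}, {"id": "9"}],
    "3": [{"id": "4"}],
    "5": [{"id": "8"}],
}
filtered = drop_ids(mlt_section, [4])
```

Filtering this way can leave fewer similar documents than requested, so asking Solr for a few extra per source document may be worthwhile.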





more like this

2011-02-11 Thread lee carroll
Hi, an MLT query with a q parameter which returns multiple matches, such as

q=id:45 id:34 id:54&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name

seems to return the results of three separate MLT queries, i.e.

q=id:45&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name
+
q=id:34&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name
+
q=id:54&mlt.fl=filed1&mlt.mindf=1&mlt.mintf=1&mlt=true&fl=id,name

rather than a combined similarity of all three.

Is this because field1 is not storing term vectors?

How best to achieve a combined similarity MLT?


More like this and terms positions

2010-10-04 Thread Xavier Schepler

Hi,

Does the "more like this" search use term position information in the 
score formula?


Re: More like this and terms positions

2010-10-04 Thread Xavier Schepler

On 04/10/2010 16:40, Robert Muir wrote:
> On Mon, Oct 4, 2010 at 10:16 AM, Xavier Schepler
> xavier.schep...@sciences-po.fr wrote:
>
>> Hi,
>>
>> does the more like this search uses terms positions information in the
>> score formula ?
>
> no, it would be nice if it did use them though (based upon query terms),
> seems like it would yield improvements.
>
> http://sifaka.cs.uiuc.edu/~ylv2/pub/sigir10-prm.pdf

Maybe in a future Solr version?


SOLR-788 - distributed More Like This

2010-08-12 Thread Shawn Heisey
 I tried some time ago to use SOLR-788.  Ultimately I was able to get 
both patch versions to apply (separately), but neither worked.  The 
suggestion I received when I commented on the issue was to download the 
specific release mentioned in the patch and then update, but the patch 
was created before the merge with Lucene, so I have no idea how to go 
about that.


Without a much better understanding of Solr internals and a bunch more 
time to learn Java, there's no way that I can work on it myself.  Is 
there anyone who has the time and inclination to get distributed MLT 
working with branch_3x?  A further goal would be to have it actually 
committed before release.


Thanks,
Shawn



More like this - setting a minimum number of terms used to build queries

2010-03-29 Thread Xavier Schepler

Hey,

Is there a way to make the "more like this" feature build its queries 
from a minimum number of interesting terms?

It looks like this component fires queries with only one term in them.
I get a lot of results that aren't similar at all to the parsed 
document fields.


My parameters:
mlt.fl=question&mlt.mintf=1&mlt.mindf=&mlt.minwl=4

The question field contains between 15 and 50 terms.

Xavier S.

