How to Use PageRank like Document Boosting at Solr?
I use Nutch to index my documents. My Solr instance has a Nutch-aware schema that includes this field: <field name="boost" type="float" stored="true" indexed="false"/> The boost field holds the OPIC score of each document (similar to Google's PageRank). How can I use it to boost my queries on the Solr side? I followed the wiki and tried q={!boost b=boost}text:supervillians but Solr responds with: can not use FieldCache on a field which is neither indexed nor has doc values: boost There should be a convenient solution for this. Instead of adding something to the search query, could I boost the documents in a different way while indexing? What do you suggest?
Re: How to Use PageRank like Document Boosting at Solr?
Seems like your boost field needs to be indexed. On Jun 12, 2013 3:49 AM, Furkan KAMACI furkankam...@gmail.com wrote: I use Nutch to index my documents. My Solr instance has a Nutch-aware schema that includes this field: <field name="boost" type="float" stored="true" indexed="false"/> The boost field holds the OPIC score of each document (similar to Google's PageRank). How can I use it to boost my queries on the Solr side? I followed the wiki and tried q={!boost b=boost}text:supervillians but Solr responds with: can not use FieldCache on a field which is neither indexed nor has doc values: boost There should be a convenient solution for this. Instead of adding something to the search query, could I boost the documents in a different way while indexing? What do you suggest?
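One way to act on this suggestion would be to declare the field as <field name="boost" type="float" stored="true" indexed="true"/> in schema.xml and reindex. The sketch below only shows how the boosted request from the question could then be assembled and URL-encoded; the endpoint URL and the helper name are placeholders, not part of the original setup.

```python
from urllib.parse import urlencode, parse_qs

# Hypothetical Solr endpoint; adjust host, port, and core to your installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/select"


def build_boost_query_url(user_query: str, boost_field: str = "boost") -> str:
    """Return a select URL whose relevancy score is multiplied by boost_field.

    Assumes boost_field is indexed (or has docValues), as suggested above.
    """
    params = {
        # {!boost} wraps the query referenced by v=$qq and multiplies its score by b.
        "q": "{!boost b=%s v=$qq}" % boost_field,
        # The user's actual query is passed separately and dereferenced by $qq.
        "qq": user_query,
    }
    return SOLR_SELECT_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_boost_query_url("text:supervillians"))
```

Passing the user query through a separate qq parameter keeps the local-params prefix and the query text from interfering with each other when both are URL-encoded.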
Re: pagerank??
As far as I know, Solr does not support this. In my case, I retrieve data from Solr by keyword matching and then rank the results by PageRank afterwards. Thanks, Bing On Wed, Apr 4, 2012 at 6:37 AM, Manuel Antonio Novoa Proenza mano...@estudiantes.uci.cu wrote: Hello, I have many documents indexed in Solr. Please let me know of any efficient way or function to calculate the PageRank of the indexed websites. 10th ANNIVERSARY OF THE CREATION OF THE UNIVERSIDAD DE LAS CIENCIAS INFORMATICAS... CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION http://www.uci.cu http://www.facebook.com/universidad.uci http://www.flickr.com/photos/universidad_uci
Re: pagerank??
You might want to look into Nutch and its LinkRank instead of Solr for this. To obtain such information you need a crawler that follows the links, which is not what Solr is meant for. Rav On Wed, Apr 4, 2012 at 8:46 AM, Bing Li lbl...@gmail.com wrote: As far as I know, Solr does not support this. In my case, I retrieve data from Solr by keyword matching and then rank the results by PageRank afterwards. Thanks, Bing
Re: PageRank
Hi Rav, Thank you for your answer. In my case I use Nutch for crawling the web, but I am a true rookie with Nutch. How do I configure Nutch to return that information? And how do I make Solr index it, or is that information already built into the score of the indexed documents? Thank you very much. Regards, Manuel Antonio Novoa Proenza Universidad de las Ciencias Informáticas Email: mano...@estudiantes.uci.cu
Re: PageRank
Hi, Please subscribe to the Nutch mailing list. Scoring is straightforward, and the calculated scores can be written to the CrawlDB or to an external file field for Solr. Cheers On Wed, 04 Apr 2012 10:22:46 -0500 (COT), Manuel Antonio Novoa Proenza mano...@estudiantes.uci.cu wrote: Hi Rav, Thank you for your answer. In my case I use Nutch for crawling the web, but I am a true rookie with Nutch. How do I configure Nutch to return that information? And how do I make Solr index it, or is that information already built into the score of the indexed documents? Thank you very much. Regards, Manuel Antonio Novoa Proenza Universidad de las Ciencias Informáticas Email: mano...@estudiantes.uci.cu -- Markus Jelsma - CTO - Openindex http://www.linkedin.com/in/markus17 050-8536600 / 06-50258350
pagerank??
Hello, I have many documents indexed in Solr. Please let me know of any efficient way or function to calculate the PageRank of the indexed websites.
Re: How to Sort By a PageRank-Like Complicated Strategy?
"As I learned, big data, such as Lucene index, was not suitable to be updated frequently." Some people use ExternalFileField for PageRank-like fields: http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-WorkingwithExternalFiles Lucene also supports parent/child documents; maybe that can be used too: http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html
Re: How to Sort By a PageRank-Like Complicated Strategy?
Dear Shashi, As I have learned, a large data set such as a Lucene index is not suited to frequent updates. Frequent updating must affect performance and consistency when the Lucene index has to be replicated across a large cluster. Such a search engine is expected to work in a write-once, read-many environment, right? That is what HDFS (Hadoop Distributed File System) provides. In my experience, updating a Lucene index is really slow. Why did you say I could update a Lucene index frequently? Thanks so much! Bing On Mon, Jan 23, 2012 at 11:02 PM, Shashi Kant sk...@sloan.mit.edu wrote: You can update the document in the index quite frequently. I don't know what your requirement is; another option would be to boost at query time.
Re: How to Sort By a PageRank-Like Complicated Strategy?
You can update the document in the index quite frequently. I don't know what your requirement is; another option would be to boost at query time. On Sun, Jan 22, 2012 at 5:51 AM, Bing Li lbl...@gmail.com wrote: Dear Shashi, Thanks so much for your reply! However, I think the value of PageRank is not static; it must be updated on the fly. As far as I know, a Lucene index is not suited to very frequent updates. If so, how should I deal with that? Best regards, Bing
Re: How to Sort By a PageRank-Like Complicated Strategy?
Dear Shashi, Thanks so much for your reply! However, I think the value of PageRank is not static; it must be updated on the fly. As far as I know, a Lucene index is not suited to very frequent updates. If so, how should I deal with that? Best regards, Bing On Sun, Jan 22, 2012 at 12:43 PM, Shashi Kant sk...@sloan.mit.edu wrote: Lucene has a mechanism to boost documents up or down using your custom ranking algorithm. So if you come up with something like PageRank, you might do something like doc.SetBoost(myboost) before writing to the index.
How to Sort By a PageRank-Like Complicated Strategy?
Dear all, I am using SolrJ to implement a system that provides search services to users, and I have some questions about searching in Solr. As far as I know, Lucene retrieves data according to the degree of keyword matching on a text field (partial matching). But if I search on a string field (complete matching), how does Lucene sort the retrieved data? If I want to add new ways of sorting, Solr's function queries seem to support that. However, for a complicated ranking strategy such as PageRank, can Solr provide an interface for me to do that? My ranking methods are more complicated than PageRank. For now I have to load all of the matching data from Solr by keyword first and then rank it again in my own way before showing it to users. Is that correct? Thanks so much! Bing
Re: How to Sort By a PageRank-Like Complicated Strategy?
Solr is essentially the retrieval step. You can customize the score formula in Lucene, but it should not be too complicated; ideally it can be factorized. It also depends on the information stored in the index, such as TF, DF, positions, etc. You can then do a second-phase rerank on the top N results you have retrieved. On Jan 21, 2012, at 1:33 PM, Bing Li lbl...@gmail.com wrote: Dear all, I am using SolrJ to implement a system that provides search services to users, and I have some questions about searching in Solr. As far as I know, Lucene retrieves data according to the degree of keyword matching on a text field (partial matching). But if I search on a string field (complete matching), how does Lucene sort the retrieved data? If I want to add new ways of sorting, Solr's function queries seem to support that. However, for a complicated ranking strategy such as PageRank, can Solr provide an interface for me to do that? My ranking methods are more complicated than PageRank. For now I have to load all of the matching data from Solr by keyword first and then rank it again in my own way before showing it to users. Is that correct? Thanks so much! Bing
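The second-phase rerank mentioned in this reply can be sketched outside Solr as follows. This is a minimal illustration, assuming the hits have already been fetched with their Solr scores and that a PageRank-like value per document id is available from some external source; the function name, the default rank, and the choice of multiplying the two scores are illustrative, not something prescribed in this thread.

```python
from typing import Dict, List, Tuple


def rerank_top_n(
    hits: List[Tuple[str, float]],
    rank_by_id: Dict[str, float],
    top_n: int = 100,
    default_rank: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rerank the first top_n (doc_id, solr_score) hits by solr_score * external rank.

    Hits beyond top_n keep their original order and scores.
    """
    head = hits[:top_n]
    tail = hits[top_n:]
    # Combine the retrieval score with the external rank (hypothetical formula).
    rescored = [
        (doc_id, score * rank_by_id.get(doc_id, default_rank))
        for doc_id, score in head
    ]
    # Highest combined score first.
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail


if __name__ == "__main__":
    solr_hits = [("a", 2.0), ("b", 1.5), ("c", 1.0)]
    external_rank = {"a": 0.5, "b": 2.0}
    print(rerank_top_n(solr_hits, external_rank, top_n=2))
```

Only the top N hits are rescored, so the cost of the custom ranking stays bounded no matter how many documents match the keyword query.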
Re: How to Sort By a PageRank-Like Complicated Strategy?
Hi Kai, Thanks so much for your reply! If retrieval is done on a string field rather than a text field, a complete-matching approach is used, according to my understanding. Is that right? If so, how does Lucene rank the retrieved data? Best regards, Bing On Sun, Jan 22, 2012 at 5:56 AM, Kai Lu lukai1...@gmail.com wrote: Solr is essentially the retrieval step. You can customize the score formula in Lucene, but it should not be too complicated; ideally it can be factorized. It also depends on the information stored in the index, such as TF, DF, positions, etc. You can then do a second-phase rerank on the top N results you have retrieved.
Re: How to Sort By a PageRank-Like Complicated Strategy?
Lucene has a mechanism to boost documents up or down using your custom ranking algorithm. So if you come up with something like PageRank, you might do something like doc.SetBoost(myboost) before writing to the index. On Sat, Jan 21, 2012 at 5:07 PM, Bing Li lbl...@gmail.com wrote: Hi Kai, Thanks so much for your reply! If retrieval is done on a string field rather than a text field, a complete-matching approach is used, according to my understanding. Is that right? If so, how does Lucene rank the retrieved data? Best regards, Bing
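For Solr users, the closest equivalent of this index-time boost at the time of this thread was the boost attribute on <doc> in the XML update format. The sketch below builds such an update message with Python's standard library. The field names and values are made up, and index-time boosts were removed in later Solr releases, so treat it as an illustration for older versions only.

```python
import xml.etree.ElementTree as ET


def build_boosted_add(doc_id: str, title: str, boost: float) -> str:
    """Build a Solr XML <add> message whose single document carries a boost."""
    add = ET.Element("add")
    # The boost attribute on <doc> is the index-time document boost.
    doc = ET.SubElement(add, "doc", {"boost": str(boost)})
    # "id" and "title" are hypothetical field names for this example.
    id_field = ET.SubElement(doc, "field", {"name": "id"})
    id_field.text = doc_id
    title_field = ET.SubElement(doc, "field", {"name": "title"})
    title_field.text = title
    return ET.tostring(add, encoding="unicode")


if __name__ == "__main__":
    print(build_boosted_add("doc1", "solr", 2.5))
```

Because the boost is baked in at index time, changing a document's rank means reindexing that document, which is the concern raised elsewhere in this thread.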
Re: PageRank sort
How often are you updating the rank? You might also be able to keep the rank info in a flat file via ExternalFileField and FileFloatSource and do FunctionQuery stuff that way. However, I don't know how that handles refreshing data or whether it would be efficient in your case. On Apr 24, 2009, at 1:52 AM, Marcus Herou wrote: Hi. I've posted before, but here it goes again: I have BlogData data which is more or less 100% static, but one field is not: the PageRank. I would like to sort on that field, and on the Lucene list I got these answers: 1. Use two indexes and a ParallelReader. 2. Use a FieldScoreQuery containing the PageRank field. 3. Use a CustomScoreQuery which combines the FieldScoreQuery with other queries (the actual search). I think I could use this pattern as well: 1. Use two indexes and a ParallelReader. 2. Do a normal search and sort on the PageRank column (perhaps consuming more memory). Does anyone have an idea of how to implement these patterns in Solr? I have never extended Solr but am not afraid of doing so if someone pushes me in the right direction. Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/ -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: PageRank sort
Hi. Comments inline. On Fri, Apr 24, 2009 at 1:00 PM, Grant Ingersoll gsing...@apache.org wrote:
> How often are you updating the rank?
The goal is to optimize the PageRank calculation so that we can have continuous updates (1 blogs at a time 24/7), but more likely we'll end up refreshing the index once a week or so (hopefully each night).
> You might also be able to keep the rank info in a flat file via the ExternalFileField and the FileFloatSource and do FunctionQuery stuff that way. However, I don't know how that handles refreshing data or if it would be efficient in your case.
Great! That seems like something that could work. It depends on how that field gets re-read/indexed, I guess. Or is it used solely at query time? I feel that googling ExternalFileField does not really give me the meat I need to narrow this down. Any pointers and/or pseudo code?
It seems to be a generic issue with Lucene, since it is not really built in a way that lets one plug in an external scoring mechanism (it has a very fast internal one instead), but hopefully I'll sort this one out. Thanks for the reply, really appreciated. Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
Re: PageRank sort
On Fri, Apr 24, 2009 at 1:39 PM, Marcus Herou marcus.he...@tailsweep.com wrote: Great! That seems like something that could work. Depends on how that field gets re-read/indexed I guess. http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html It's a separate *text* file that just contains id/value pairs for a field. Calculate your custom score and save it to that file. Then call commit so the file is re-read and all your scores will be updated (and usable in a function query). So the short answer is, you should be able to do what you want with no Solr customization in Java... it's all built in. -Yonik http://www.lucidimagination.com
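A minimal sketch of producing such a file, assuming a field called blogRank keyed on the document id. By convention the file is named external_<fieldname>, lives in the index data directory, and holds one key=value pair per line. The function name, the temporary file name, and the example scores are illustrative.

```python
import os
import tempfile
from typing import Dict


def write_external_scores(data_dir: str, field_name: str, scores: Dict[str, float]) -> str:
    """Write id=value lines for an external file field and return the file path."""
    path = os.path.join(data_dir, "external_" + field_name)
    # Write to a differently named temp file first so a concurrent reader
    # never sees a half-written score file.
    tmp_path = os.path.join(data_dir, ".tmp_scores_" + field_name)
    with open(tmp_path, "w", encoding="utf-8") as handle:
        # Keys are written in sorted order, which keeps the file easy to diff.
        for doc_id in sorted(scores):
            handle.write("%s=%s\n" % (doc_id, scores[doc_id]))
    os.replace(tmp_path, path)
    return path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as data_dir:
        written = write_external_scores(data_dir, "blogRank", {"2": 2.0, "1": 1.0})
        with open(written, encoding="utf-8") as handle:
            print(handle.read())
```

After the file is replaced, a commit has to be sent to Solr so that the new values are picked up, as described in the message above.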
Re: PageRank sort
And I published the setup here: http://dev.tailsweep.com/solr-external-scoring/en/ /M On Sat, Apr 25, 2009 at 12:01 AM, Marcus Herou marcus.he...@tailsweep.com wrote: Works like a charm! Thank you sir. //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
Re: PageRank sort
Works like a charm! Thank you sir. //Marcus On Fri, Apr 24, 2009 at 11:01 PM, Marcus Herou marcus.he...@tailsweep.com wrote: That is fantastic. I am creating a really small index right now, trying to figure out how to implement the FunctionQuery for this. //Marcus On Fri, Apr 24, 2009 at 10:55 PM, Yonik Seeley yo...@lucidimagination.com wrote: On Fri, Apr 24, 2009 at 1:39 PM, Marcus Herou marcus.he...@tailsweep.com wrote: Great! That seems like something that could work. Depends on how that field gets re-read/indexed I guess. http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html It's a separate *text* file that just contains id/value pairs for a field. Calculate your custom score and save it to that file. Then call commit so the file is re-read and all your scores will be updated (and usable in a function query). So the short answer is, you should be able to do what you want with no Solr customization in Java... it's all built in. -Yonik http://www.lucidimagination.com -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
Re: PageRank sort
Cool!
GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=title:solr&debugQuery=on'
On Sat, Apr 25, 2009 at 12:43 AM, Marcus Herou marcus.he...@tailsweep.com wrote: That seems wise... PageRank * text-based scoring. So you mean that in my stupid case
GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=*:*'
would yield the same results as
GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q=*:* _val_:"log(blogRank)"'
since I have no text data. But suppose I introduce a tokenized text field (title). Example:
<add> <doc><field name="blogId">1</field><field name="title">solr solr solr</field></doc> <doc><field name="blogId">2</field><field name="title">solr</field></doc> </add>
where blogId=1 has a blogRank of 1 and blogId=2 has a blogRank of 2. If I then searched for solr with
GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=title:solr'
I might get blogId=1 as number 1 in the results, even though it has the lower blogRank, because of the higher frequency of the term solr? Did I understand this correctly? //Marcus
On Sat, Apr 25, 2009 at 12:07 AM, Yonik Seeley yo...@lucidimagination.com wrote: You probably want to mix the custom score with the normal relevancy score... to add, use a normal boolean query. To multiply, check out boosted query: http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html For other options, use a more complex function query with the new query() capability (need to use 1.4 trunk for that though). Examples:
q={!boost b=myScore v=$qq}&qq=my normal lucene query
OR, for a dismax relevancy query,
q={!boost b=myScore v=$qq}&qq={!dismax qf=text_all pf=text_all}solr rocks
If the {! type of syntax looks new, check out http://wiki.apache.org/solr/LocalParams powerful stuff!
-Yonik http://www.lucidimagination.com -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
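The question about blogId=1 outranking blogId=2 can be worked through with a little arithmetic, given that the boosted query multiplies the text relevancy score by the boost value. The text scores below are invented numbers chosen for illustration, not actual Lucene output; only the blogRank values (1 and 2) come from the example in the message.

```python
from typing import Dict, List, Tuple


def boosted_order(
    text_scores: Dict[str, float], blog_ranks: Dict[str, float]
) -> List[Tuple[str, float]]:
    """Return (doc_id, text_score * blog_rank) pairs, highest product first."""
    combined = [
        (doc_id, text_scores[doc_id] * blog_ranks[doc_id]) for doc_id in text_scores
    ]
    combined.sort(key=lambda pair: pair[1], reverse=True)
    return combined


if __name__ == "__main__":
    ranks = {"1": 1.0, "2": 2.0}
    # Hypothetical case A: the term frequency advantage of doc 1 is large.
    print(boosted_order({"1": 1.7, "2": 0.8}, ranks))
    # Hypothetical case B: the advantage is smaller, so blogRank decides.
    print(boosted_order({"1": 1.2, "2": 0.8}, ranks))
```

Under these assumed scores, document 1 comes first in case A despite its lower blogRank, while document 2 wins in case B. Whether the first situation occurs in practice depends on how far apart the text scores actually are.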
Re: PageRank sort
<double name="time">7.0</double> </lst> </lst> </lst> </lst> </response>
On Sat, Apr 25, 2009 at 12:49 AM, Marcus Herou marcus.he...@tailsweep.com wrote: Cool!
GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=title:solr&debugQuery=on'
-- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/
PageRank sort
Hi. I've posted before, but here it goes again: I have BlogData data which is more or less 100% static, but one field is not: the PageRank. I would like to sort on that field, and on the Lucene list I got these answers: 1. Use two indexes and a ParallelReader. 2. Use a FieldScoreQuery containing the PageRank field. 3. Use a CustomScoreQuery which combines the FieldScoreQuery with other queries (the actual search). I think I could use this pattern as well: 1. Use two indexes and a ParallelReader. 2. Do a normal search and sort on the PageRank column (perhaps consuming more memory). Does anyone have an idea of how to implement these patterns in Solr? I have never extended Solr but am not afraid of doing so if someone pushes me in the right direction. Kindly //Marcus -- Marcus Herou CTO and co-founder Tailsweep AB +46702561312 marcus.he...@tailsweep.com http://www.tailsweep.com/ http://blogg.tailsweep.com/