How to Use PageRank-like Document Boosting in Solr?

2013-06-12 Thread Furkan KAMACI
I use Nutch to index my documents. I have a Nutch-aware schema in Solr,
and it has a field like this:

<field name="boost" type="float" stored="true" indexed="false"/>

The boost field holds the epic score of my documents (similar to Google's
PageRank). How can I boost my queries on the Solr side? I followed the wiki
and tried this:

q={!boost b=boost}text:supervillians

and it says:

can not use FieldCache on a field which is neither indexed nor has doc
values: boost

There should be a convenient solution for this. Instead of adding
something to the search query, maybe I could boost documents in a different
way at indexing time. What do you suggest?


Re: How to Use PageRank like Document Boosting at Solr?

2013-06-12 Thread Michael Della Bitta
Seems like your boost field needs to be indexed.
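A toy model of why the field must be indexed (a simplification, not Solr's code): a function query such as {!boost b=boost} needs a document-id to value array. In this Solr version that array is built by the FieldCache, which "uninverts" the field's indexed terms, or it comes from doc values, as the error message says. A stored-only field offers neither. The helper `uninvert` is hypothetical.

```python
from typing import Dict, List, Optional


def uninvert(postings: Optional[Dict[str, List[int]]], max_doc: int,
             default: float = 0.0) -> List[float]:
    """Toy FieldCache-style uninversion of a single-valued float field.

    postings maps each indexed term (the field value as text) to the ids of
    the documents that contain it. A stored-only field (indexed=false) has
    no postings, so there is nothing to uninvert.
    """
    if postings is None:
        raise ValueError("field is neither indexed nor has doc values")
    values = [default] * max_doc          # one slot per document id
    for term, doc_ids in postings.items():
        for doc_id in doc_ids:
            values[doc_id] = float(term)  # build the doc id -> value table
    return values


print(uninvert({"0.5": [0, 2], "1.5": [1]}, max_doc=4))  # [0.5, 1.5, 0.5, 0.0]
```

In practice this suggests changing the field to indexed="true" (or enabling doc values, if your Solr version supports them) and reindexing, after which the boost query should be able to read the values.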



Re: pagerank??

2012-04-04 Thread Bing Li
As far as I know, Solr cannot do this.

In my case, I get data from Solr by keyword matching and then rank it by
PageRank afterwards.
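Bing's approach (keyword match in Solr, then order by PageRank in the application) can be sketched as follows. It assumes the hits and a precomputed PageRank table are already in memory; `Hit` and `rerank_by_pagerank` are hypothetical names. Note that this only reorders what was fetched, so all matches (or a large enough page) must be loaded first.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Hit:
    doc_id: str
    relevance: float  # score returned by Solr for the keyword query


def rerank_by_pagerank(hits: List[Hit], pagerank: Dict[str, float]) -> List[Hit]:
    """Order keyword-matched hits by PageRank, highest first.

    Documents missing from the PageRank table get 0.0; ties are broken by
    the original Solr relevance score.
    """
    return sorted(
        hits,
        key=lambda h: (pagerank.get(h.doc_id, 0.0), h.relevance),
        reverse=True,
    )


hits = [Hit("a", 2.0), Hit("b", 1.0), Hit("c", 0.5)]
ranks = {"a": 0.1, "b": 0.7, "c": 0.2}
print([h.doc_id for h in rerank_by_pagerank(hits, ranks)])  # ['b', 'c', 'a']
```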

Thanks,
Bing



Re: pagerank??

2012-04-04 Thread Ravish Bhagdev
You might want to look into Nutch and its LinkRank instead of Solr for
this. To obtain such information you need a crawler that follows the
links, and that is not what Solr is meant for.
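For reference, a minimal power-iteration PageRank over an in-memory link graph is shown below. This is the textbook algorithm, not Nutch's LinkRank implementation; the graph, the damping factor of 0.85 and the iteration count are illustrative assumptions.

```python
from typing import Dict, List


def pagerank(links: Dict[str, List[str]], damping: float = 0.85,
             iterations: int = 50) -> Dict[str, float]:
    """Power-iteration PageRank over an adjacency list.

    links maps each page to the pages it links to. Pages without outlinks
    ("dangling" pages) spread their rank evenly over all pages, so the
    ranks always sum to 1.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for src, targets in links.items():
            if not targets:
                continue
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank


graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
scores = pagerank(graph)
print(max(scores, key=scores.get))  # c
```

The resulting scores are what you would then hand to Solr, for example as an index-time boost or through an external file.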

Rav




Re: PageRank

2012-04-04 Thread Manuel Antonio Novoa Proenza
Hi Rav,
Thank you for your answer.

In my case I use Nutch for crawling the web, but I am a real rookie with
Nutch. How do I configure Nutch to produce that information? And how do I
make Solr index it, or is that information already built into the score of
the indexed documents?

Thank you very much

Regards...
Manuel Antonio Novoa Proenza
Universidad de las Ciencias Informáticas
Email: mano...@estudiantes.uci.cu

10th ANNIVERSARY OF THE FOUNDING OF THE UNIVERSIDAD DE LAS CIENCIAS
INFORMATICAS...
CONNECTED TO THE FUTURE, CONNECTED TO THE REVOLUTION

http://www.uci.cu
http://www.facebook.com/universidad.uci
http://www.flickr.com/photos/universidad_uci



Re: PageRank

2012-04-04 Thread Markus Jelsma

Hi,

Please subscribe to the Nutch mailing list. Scoring is straightforward,
and the calculated scores can be written to the CrawlDB or exported as an
external file field for Solr.
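Markus's second option (exporting scores for an external file field) could look like this sketch. It assumes the scores are already computed and that the schema declares an ExternalFileField (here hypothetically named `pagerank`) keyed on the document id; the helper name and the data directory location are assumptions, so check where your Solr version expects the file.

```python
import os
import tempfile
from typing import Dict


def write_external_file(data_dir: str, field: str, scores: Dict[str, float]) -> str:
    """Write scores as ExternalFileField input: one "key=value" line per doc.

    Keys must match the schema's keyField (typically the unique id, which
    holds the page URL in a Nutch schema). The data goes to a temporary
    file first and is then renamed, so Solr never reads a partial file.
    """
    final_path = os.path.join(data_dir, f"external_{field}")
    fd, tmp_path = tempfile.mkstemp(dir=data_dir, prefix=".tmp-")
    with os.fdopen(fd, "w", encoding="utf-8") as out:
        for key, score in sorted(scores.items()):
            out.write(f"{key}={score}\n")
    os.replace(tmp_path, final_path)  # atomic rename on the same filesystem
    return final_path


with tempfile.TemporaryDirectory() as data_dir:
    path = write_external_file(data_dir, "pagerank",
                               {"http://b.example/": 0.7, "http://a.example/": 0.1})
    with open(path, encoding="utf-8") as f:
        print(f.read(), end="")
```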


Cheers



--
Markus Jelsma - CTO - Openindex
http://www.linkedin.com/in/markus17
050-8536600 / 06-50258350


pagerank??

2012-04-03 Thread Manuel Antonio Novoa Proenza
Hello,

I have many documents in my Solr index.

Is there any way, or an efficient function, to calculate the PageRank of
the indexed websites?



Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-29 Thread Ahmet Arslan
 As I learned, big data, such as Lucene index, was not
 suitable to be
 updated frequently. 

Some people use ExternalFileField for PageRank-like fields.

http://lucidworks.lucidimagination.com/display/solr/Solr+Field+Types#SolrFieldTypes-WorkingwithExternalFiles

Lucene also supports parent/child documents (block joins); maybe that could be used too.

http://blog.mikemccandless.com/2012/01/tochildblockjoinquery-in-lucene.html


Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-28 Thread Bing Li
Dear Shashi,

As I understand it, big data such as a Lucene index is not suitable for
frequent updates. Frequent updates would hurt performance and consistency
when the Lucene index has to be replicated across a large cluster. Such a
search engine is expected to work in a write-once, read-many environment,
right? That is what HDFS (Hadoop Distributed File System) provides. In my
experience, updating a Lucene index is really slow.

Why did you say I could update the Lucene index frequently?

Thanks so much!
Bing




Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-23 Thread Shashi Kant
You can update a document in the index quite frequently. I don't know what
your requirement is; another option would be to boost at query time.





Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-22 Thread Bing Li
Dear Shashi,

Thanks so much for your reply!

However, I think the value of PageRank is not static. It must be updated
on the fly. As far as I know, a Lucene index is not suited to very frequent
updates. If so, how should I deal with that?

Best regards,
Bing





How to Sort By a PageRank-Like Complicated Strategy?

2012-01-21 Thread Bing Li
Dear all,

I am using SolrJ to implement a system that provides search services to
users. I have some questions about Solr searching, as follows.

As I understand it, Lucene retrieves data according to the degree of
keyword matching on a text field (partial matching).

But if I search data by a string field (exact matching), how does Lucene
sort the retrieved data?

If I want to add new ways of sorting, Solr's function queries seem to
support this.

However, for a complicated ranking strategy such as PageRank, does Solr
provide an interface for that?

My ranking methods are more complicated than PageRank. Right now I have to
load all of the matched data from Solr by keyword first and then re-rank it
my way before showing it to users. Is that correct?

Thanks so much!
Bing


Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-21 Thread Kai Lu
Solr is basically the retrieval step. You can customize the scoring formula
in Lucene, but it should not be too complicated; ideally it can be
factorized. It also depends on the information stored in the index, such as
TF, DF, positions, etc. You can do a second-phase re-rank of the top N
results you have retrieved.
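Kai's second-phase re-rank of the top N might look like this sketch. The linear blend and the `weight` parameter are illustrative assumptions, not something Solr or Lucene prescribes, and `rerank_top_n` is a hypothetical helper.

```python
from typing import Callable, List, Tuple


def rerank_top_n(results: List[Tuple[str, float]], n: int,
                 feature: Callable[[str], float], weight: float = 0.5
                 ) -> List[Tuple[str, float]]:
    """Re-score only the first n results; the tail keeps Solr's order.

    results are (doc_id, solr_score) pairs already sorted by Solr. The new
    score blends the Solr score with an expensive per-document feature
    (e.g. a PageRank-like value computed outside Solr). The re-scored head
    always stays ahead of the untouched tail.
    """
    head, tail = results[:n], results[n:]
    rescored = [(doc, (1 - weight) * score + weight * feature(doc))
                for doc, score in head]
    rescored.sort(key=lambda pair: pair[1], reverse=True)
    return rescored + tail


solr_results = [("a", 3.0), ("b", 2.0), ("c", 1.0), ("d", 0.5)]
rank = {"a": 0.0, "b": 4.0, "c": 1.0}
print([doc for doc, _ in rerank_top_n(solr_results, 3, rank.__getitem__)])
# ['b', 'a', 'c', 'd']
```

Only the top N documents need the expensive feature, which keeps the cost bounded regardless of how many documents match.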

Sent from my iPad



Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-21 Thread Bing Li
Hi, Kai,

Thanks so much for your reply!

If the search is done on a string field rather than a text field, exact
matching should be used, if I understand correctly. If so, how does Lucene
rank the retrieved data?

Best regards,
Bing




Re: How to Sort By a PageRank-Like Complicated Strategy?

2012-01-21 Thread Shashi Kant
Lucene has a mechanism to boost documents up or down using your custom
ranking algorithm. So if you come up with something like PageRank, you
might do something like doc.setBoost(myBoost) before writing to the index.
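On the Solr side, the counterpart of Lucene's doc.setBoost() was a boost attribute on <doc> in the XML update format. The sketch below assumes that older format (index-time boosts were removed in later Solr releases); `build_add_xml` is a hypothetical helper. Because such a boost is baked into the index, a changed PageRank means reindexing the document, which is the concern Bing raises in this thread.

```python
import xml.etree.ElementTree as ET
from typing import Dict, List


def build_add_xml(docs: List[Dict[str, str]], boosts: Dict[str, float]) -> str:
    """Build a Solr XML <add> message with a per-document index-time boost.

    Each document's boost is looked up by its "id" field; documents without
    a precomputed score get the neutral boost 1.0.
    """
    add = ET.Element("add")
    for fields in docs:
        doc = ET.SubElement(add, "doc", boost=str(boosts.get(fields["id"], 1.0)))
        for name, value in fields.items():
            field = ET.SubElement(doc, "field", name=name)
            field.text = value
    return ET.tostring(add, encoding="unicode")


print(build_add_xml([{"id": "1", "title": "solr"}], {"1": 2.5}))
# <add><doc boost="2.5"><field name="id">1</field><field name="title">solr</field></doc></add>
```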






Re: PageRank sort

2009-04-24 Thread Grant Ingersoll

How often are you updating the rank?

You might also be able to keep the rank info in a flat file via the  
ExternalFileField and the FileFloatSource and do FunctionQuery stuff  
that way.   However, I don't know how that handles refreshing data or  
if it would be efficient in your case.
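Conceptually, Grant's suggestion works like the toy model below: the external file is parsed into a key to float table, and the function query reads one value per document, falling back to a default for documents absent from the file (ExternalFileField's defVal attribute plays that role). This is a simplified illustration, not FileFloatSource's actual code; the function names are hypothetical, and splitting on the last "=" is an assumption about the key format.

```python
from typing import Dict, Iterable, List


def load_external_values(lines: Iterable[str]) -> Dict[str, float]:
    """Parse "key=value" lines into a key -> float table."""
    table: Dict[str, float] = {}
    for line in lines:
        key, sep, value = line.strip().rpartition("=")
        if sep:  # skip blank or malformed lines
            table[key] = float(value)
    return table


def values_for_docs(doc_keys: List[str], table: Dict[str, float],
                    default: float = 0.0) -> List[float]:
    """One float per document, in index order; unknown keys get the default."""
    return [table.get(key, default) for key in doc_keys]


table = load_external_values(["a=0.5", "c=2", ""])
print(values_for_docs(["a", "b", "c"], table, default=1.0))  # [0.5, 1.0, 2.0]
```

Because the values live outside the index, refreshing the ranks means rewriting the file rather than reindexing documents.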




--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene:

http://www.lucidimagination.com/search



Re: PageRank sort

2009-04-24 Thread Marcus Herou
Hi.

Comments inline.

On Fri, Apr 24, 2009 at 1:00 PM, Grant Ingersoll gsing...@apache.org wrote:

 How often are you updating the rank?


The goal is to optimize the PageRank calculation so we can have
continuous updates (1 blogs at a time, 24/7), but more likely we'll end up
refreshing the index once a week or so (hopefully each night).




 You might also be able to keep the rank info in a flat file via the
 ExternalFileField and the FileFloatSource and do FunctionQuery stuff that
 way.   However, I don't know how that handles refreshing data or if it would
 be efficient in your case.


Great! That seems like something that could work. It depends on how that
field gets re-read/indexed, I guess. Or is it used solely at query time?
Googling ExternalFileField does not really give me the detail I need to
narrow this down. Any pointers and/or pseudo code?





It seems to be a generic issue with Lucene, since it is not really built
so that one can plug in an external scoring mechanism (it has a very fast
internal one instead), but hopefully I'll sort this one out.

Thanks for the reply, really appreciated.

Kindly

//Marcus



-- 
Marcus Herou CTO and co-founder Tailsweep AB
+46702561312
marcus.he...@tailsweep.com
http://www.tailsweep.com/
http://blogg.tailsweep.com/


Re: PageRank sort

2009-04-24 Thread Yonik Seeley
On Fri, Apr 24, 2009 at 1:39 PM, Marcus Herou
marcus.he...@tailsweep.com wrote:
 Great! That seems like something that could work. Depends on how that field
 get's re-read/indexed I guess.

http://lucene.apache.org/solr/api/org/apache/solr/schema/ExternalFileField.html

It's a separate *text* file that just contains id/value pairs for a field.
Calculate your custom score and save it to that file.  Then call
commit so the file is re-read and all your scores will be updated (and
usable in a function query).
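The "then call commit" step can be sketched with the standard library. The core URL below is taken from later in this thread and is otherwise an assumption; the request is only built here, and passing it to urllib.request.urlopen() would send it.

```python
import urllib.request


def build_commit_request(solr_core_url: str) -> urllib.request.Request:
    """Build (but do not send) an XML commit request for a Solr core.

    After the external file has been replaced, a commit opens a new
    searcher, and per Yonik that is when the new scores are read.
    """
    url = solr_core_url.rstrip("/") + "/update"
    return urllib.request.Request(
        url,
        data=b"<commit/>",
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


req = build_commit_request("http://127.0.0.1:8110/solr/test/")
print(req.full_url, req.get_method())  # http://127.0.0.1:8110/solr/test/update POST
```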

So the short answer is, you should be able to do what you want with no
Solr customization in Java... it's all built in.

-Yonik
http://www.lucidimagination.com


Re: PageRank sort

2009-04-24 Thread Marcus Herou
And I published the setup here:
http://dev.tailsweep.com/solr-external-scoring/en/

/M



Re: PageRank sort

2009-04-24 Thread Marcus Herou
Works like a charm!

Thank you sir.

//Marcus



Re: PageRank sort

2009-04-24 Thread Marcus Herou
Cool!

GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=title:solr&debugQuery=on'
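The URLs in this thread lost their & separators and escapes in the archive. A request like Marcus's can be built safely with the standard library; in this sketch the host, core and field names come from the thread, while the helper function is hypothetical.

```python
from urllib.parse import urlencode


def boosted_select_url(base: str, user_query: str, boost_field: str,
                       rows: int = 100, debug: bool = False) -> str:
    """Build a /select URL that multiplies relevance by a field via {!boost}.

    The user query travels in a separate qq parameter and is dereferenced
    with v=$qq, so it never has to be escaped inside the local params.
    urlencode percent-encodes {, !, =, $, } and : in the values.
    """
    params = {
        "indent": "on",
        "start": 0,
        "rows": rows,
        "q": "{!boost b=%s v=$qq}" % boost_field,
        "qq": user_query,
    }
    if debug:
        params["debugQuery"] = "on"
    return base.rstrip("/") + "/select?" + urlencode(params)


print(boosted_select_url("http://127.0.0.1:8110/solr/test", "title:solr",
                         "blogRank", debug=True))
```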

On Sat, Apr 25, 2009 at 12:43 AM, Marcus Herou
marcus.he...@tailsweep.com wrote:

 That seems wise... PageRank * Text-based Scoring.

 So you mean in my stupid case that:
 GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=*:*'
 would yield the same results as:
 GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q=*:* _val_:"log(blogRank)"'

 since I have no text data

 but if I introduce a tokenized textfield (title).
 Example:

 <add>
 <doc><field name="blogId">1</field><field name="title">solr solr solr</field></doc>
 <doc><field name="blogId">2</field><field name="title">solr</field></doc>
 </add>

 where blogId=1 had blogRank of 1
 where blogId=2 had blogRank of 2
 and if I searched for solr
 GET 'http://127.0.0.1:8110/solr/test/select?indent=on&start=0&rows=100&q={!boost b=blogRank v=$qq}&qq=title:solr'

 I might get blogId=1 as the top result even though it has a lower
 blogRank, due to the higher frequency of the term solr?

 Did I understand this correctly ?

 //Marcus



 On Sat, Apr 25, 2009 at 12:07 AM, Yonik Seeley yo...@lucidimagination.com
  wrote:

 You probably want to mix the custom score with the normal relevancy
 score... to add, use a normal boolean query.  To multiply, check out
 boosted query:

 http://lucene.apache.org/solr/api/org/apache/solr/search/BoostQParserPlugin.html

 For other options, use a more complex function query with the new
 query() capability (need to use 1.4 trunk for that though).

 Examples:

 q={!boost b=myScore v=$qq}&qq=my normal lucene query
  OR for a dismax relevancy query,
 q={!boost b=myScore v=$qq}&qq={!dismax qf=text_all pf=text_all}solr rocks

 If the {! type of syntax looks new, check out
 http://wiki.apache.org/solr/LocalParams
 powerful stuff!

 -Yonik
 http://www.lucidimagination.com
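Yonik's add-versus-multiply distinction, and Marcus's question about it, can be checked with made-up numbers. The sketch treats {!boost b=blogRank} as relevance × blogRank and a boolean query with an extra _val_:"log(blogRank)" clause as relevance + log10(blogRank) (Solr's log() is base 10). It ignores Lucene's coord and query-norm factors, and the relevance scores are hypothetical, not what Lucene would actually compute.

```python
import math
from typing import Dict, List


def order(scores: Dict[str, float]) -> List[str]:
    """Document ids sorted by descending score."""
    return sorted(scores, key=scores.get, reverse=True)


def multiplicative(relevance: Dict[str, float], rank: Dict[str, float]) -> Dict[str, float]:
    """Model of {!boost b=rank}: relevance * rank."""
    return {d: relevance[d] * rank[d] for d in relevance}


def additive_log(relevance: Dict[str, float], rank: Dict[str, float]) -> Dict[str, float]:
    """Model of a boolean query plus _val_:"log(rank)": relevance + log10(rank)."""
    return {d: relevance[d] + math.log10(rank[d]) for d in relevance}


blog_rank = {"1": 1.0, "2": 2.0}
match_all = {"1": 1.0, "2": 1.0}    # *:* gives every document the same score
text_match = {"1": 1.5, "2": 1.0}   # hypothetical: doc 1 matches "solr" better

print(order(multiplicative(match_all, blog_rank)), order(additive_log(match_all, blog_rank)))
print(order(multiplicative(text_match, blog_rank)), order(additive_log(text_match, blog_rank)))
```

With a constant relevance (the *:* case) both give the same order, because log is monotonic. Once text relevance varies, the two can disagree, and a document with a lower blogRank can still come first.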






Re: PageRank sort

2009-04-24 Thread Marcus Herou
 <double name="time">7.0</double>
</lst>
  </lst>
 </lst>
</lst>
</response>




PageRank sort

2009-04-23 Thread Marcus Herou
Hi.

I've posted before but here it goes again:

I have BlogData data which is more or less 100% static but one field is not
- the PageRank.
I would like to sort on that field and on the Lucene list I got these
answers.

1. Use two indexes and a ParallelReader
2. Use a FieldScoreQuery containing the PageRank field.
3. Use a CustomScoreQuery which uses the FieldScoreQuery combined with other
Queries (the actual search).

I think I could use this pattern as well:
1. Use two indexes and a ParallelReader
2. Normal search and Sort on the PageRank column (perhaps consuming more
memory)
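The ParallelReader pattern from this list can be pictured with the toy model below: the frequently changing rank lives in a second, small index that is rebuilt on its own, and documents are joined purely by position. The names are hypothetical and this is not Lucene's API; it only illustrates the constraint that both indexes must hold the same documents in the same order.

```python
from typing import Dict, List


def parallel_join(main_docs: List[Dict[str, object]],
                  rank_docs: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Toy ParallelReader: merge the fields of document i from both indexes.

    The join is by position only, not by any key, so both "indexes" must
    contain the same documents in the same order.
    """
    if len(main_docs) != len(rank_docs):
        raise ValueError("parallel indexes must have the same number of documents")
    return [{**main, **rank} for main, rank in zip(main_docs, rank_docs)]


blogs = [{"blogId": "1", "title": "solr"}, {"blogId": "2", "title": "lucene"}]
ranks = [{"pageRank": 0.2}, {"pageRank": 0.9}]  # rebuilt separately, same order
merged = parallel_join(blogs, ranks)
print([d["blogId"] for d in sorted(merged, key=lambda d: d["pageRank"], reverse=True)])
# ['2', '1']
```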

Does anyone have an idea of how to implement these patterns in SOLR?
I have never extended SOLR but am not afraid of doing so if someone pushes
me in the right direction.

Kindly

//Marcus



