Yep, it's tricky to do this sort of thing in Solr. 

One way to do it would be to reindex the main item on some regular basis 
with the keywords/comments flattened into the main record, maybe along with 
a field for number_of_comments so you can boost on that or what have you. 
If you can figure out a way to do that, it would be the easiest and most 
reliable approach, without fighting Solr.  Beware that it's difficult to run 
Solr with very frequent commits though, so you might want to batch the 
updates every half hour or hour or what have you. 
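
For example (just a sketch, with made-up field names), once the comments are 
flattened in and a numeric number_of_comments field exists, a dismax query 
could fold the popularity into the score with a boost function: 

  q=paris
  defType=dismax
  qf=title^2 text comments
  bf=log(sum(number_of_comments,1))

Two urls that match "paris" about equally well on their text would then still 
be separated by how many people have bookmarked/commented on them, which 
sounds like what you're after. 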

Another thing to look at is this patch, which adds a limited kind of 'join' 
to Solr. I'm not sure of its current maturity, or whether it would work for 
your use case, but it may be worth a look: 
https://issues.apache.org/jira/browse/SOLR-2272
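
If the patch ends up behaving like the join QParser described on that issue 
(the syntax may well change, so treat this as a guess), you would index each 
bookmark as its own document and join it back onto the url documents at 
query time, something like: 

  q={!join from=link_guid to=guid}tags:paris

i.e. "find the bookmark docs matching paris, then return the url docs they 
point at".  Whether the scores coming out of such a join can express the 
kind of popularity weighting you want is another question. 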

And, if your alternative is writing your own thing from scratch, another 
option would be writing new components for Solr in Java to do what you want. 
If you can understand the structure and features of the Lucene index 
underlying Solr, and figure out a way to get the functionality you want out 
of Lucene, then that's the first step to figuring out how to write a Solr 
component that exposes it.  
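
The usual hook for that is a custom SearchComponent (or query parser) plugged 
into solrconfig.xml.  Very roughly (class name and comments invented here, 
and all the interesting Lucene work elided), the skeleton looks like: 

  import java.io.IOException;

  import org.apache.solr.handler.component.ResponseBuilder;
  import org.apache.solr.handler.component.SearchComponent;

  public class BookmarkBoostComponent extends SearchComponent {

    @Override
    public void prepare(ResponseBuilder rb) throws IOException {
      // Inspect or rewrite the incoming query here, e.g. wrap it in a
      // custom Lucene query that folds per-url bookmark counts into the
      // score.
    }

    @Override
    public void process(ResponseBuilder rb) throws IOException {
      // Or post-process the results here, e.g. walk the returned docs and
      // attach aggregated bookmark info to the response.
    }

    @Override
    public String getDescription() {
      return "Weights urls by bookmark data (sketch)";
    }

    // Depending on the Solr version, the rest of the SolrInfoMBean
    // boilerplate (getSource(), getSourceId(), getVersion()) may also be
    // required.
  }

You would then register it with a <searchComponent> element in solrconfig.xml 
and add it to the component list of your request handler. 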
________________________________________
From: alex.d...@gmail.com [alex.d...@gmail.com] On Behalf Of Alex Dong 
[a...@trunk.ly]
Sent: Friday, March 04, 2011 12:56 AM
To: Gora Mohanty
Cc: solr-user@lucene.apache.org
Subject: Re: Model foreign key type of search?

Gora, thanks for the quick reply.

Yes, I'm aware of the differences between Solr and a DBMS. We've actually
written a C++ analytical engine that can process a billion tweets with
multi-facet drill-down. We may end up cooking our own in the end, but so far
Solr suits our needs quite well.  The multilingual tokenizers and the Tika
integration are all too addictive.

What you're suggesting is exactly what I'm doing: using dynamic fields and
copyField to get all the information into one field, then running the search
over that.

However, this is not good enough.  Allow me to elaborate using the same
Paris example again.  Say there are two urls: the first has been bookmarked
by 10 people, the second by 100.  If we squeeze everything into one field,
the two may end up with roughly similar scores, but I'd like to rank the one
with more users higher.

Another way to look at it: just as PageRank relies on the number and anchor
text of incoming links, we're trying to use the number of people and their
keywords/comments as a weight for the link.

Alex


On Fri, Mar 4, 2011 at 6:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:

> On Fri, Mar 4, 2011 at 10:24 AM, Alex Dong <a...@trunk.ly> wrote:
> > Hi there,  I need some advice on how to implement this using solr:
> >
> > We have two tables: urls and bookmarks.
> > - Each url has four fields:  {guid, title, text, url}
> > - One url will have one or more bookmarks associated with it. Each
> >   bookmark has these: {link.guid, user, tags, comment}
> >
> > I'd like to return matched urls based on not only the "title, text" from
> > the url schema, but also some kind of aggregated popularity score based
> > on all "bookmarks" for the same url. The popularity score should be based
> > on the number/frequency of bookmarks that match the query.
> [...]
>
> It is best not to think of Solr as an RDBMS, and not to try to graft
> RDBMS practices onto it. Instead, you should flatten your data,
> e.g., in the above, you could have:
> * Four single-valued fields: guid, title, text, url
> * Four multi-valued fields: bookmark_guid, bookmark_user,
>  bookmark_tags, bookmark_comment
> Your index would contain one record per guid of the URL,
> and you would need to populate the multi-valued bookmark
> fields from all bookmark instances associated with that URL.
>
> Then one could either copy the relevant search fields into a full-text
> search field and search only on that, or, e.g., search on bookmark_tags
> and bookmark_comment in addition to searching on title and text.
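>
> For instance (field names/types here are just indicative, borrowed from
> the default schema), the schema.xml entries for that layout could look
> like:
>
>   <field name="guid"  type="string" indexed="true" stored="true"/>
>   <field name="title" type="text"   indexed="true" stored="true"/>
>   <field name="text"  type="text"   indexed="true" stored="true"/>
>   <field name="url"   type="string" indexed="true" stored="true"/>
>
>   <field name="bookmark_guid"    type="string" indexed="true" stored="true" multiValued="true"/>
>   <field name="bookmark_user"    type="string" indexed="true" stored="true" multiValued="true"/>
>   <field name="bookmark_tags"    type="text"   indexed="true" stored="true" multiValued="true"/>
>   <field name="bookmark_comment" type="text"   indexed="true" stored="true" multiValued="true"/>
>
>   <field name="fulltext" type="text" indexed="true" stored="false" multiValued="true"/>
>
>   <copyField source="title"            dest="fulltext"/>
>   <copyField source="text"             dest="fulltext"/>
>   <copyField source="bookmark_tags"    dest="fulltext"/>
>   <copyField source="bookmark_comment" dest="fulltext"/>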
>
> Regards,
> Gora
>
