Yep, it's tricky to do this sort of thing in Solr. One approach would be to reindex the main item on some regular basis with the keywords/comments flattened into the main record, perhaps along with a number_of_comments field so you can boost on that or what have you. If you can figure out a way to do that, it would be the easiest and most reliable option, without fighting Solr. Beware that it's difficult to run a Solr setup with very frequent commits, though; you might want to batch the updates every hour or half hour or so.
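As a rough sketch of that boosting idea (the number_of_comments field name comes from the suggestion above; the edismax parser and its function-query boost are standard Solr features, but the exact weighting is just an assumption you'd tune):

```text
q=paris
&defType=edismax
&qf=title text
&boost=log(sum(number_of_comments,1))
```

The log() dampens the effect so a URL with 1000 bookmarks doesn't completely drown out textual relevance, and sum(...,1) avoids log(0) for items with no comments.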
Another thing to look at is this patch, which supports a limited type of 'join' in Solr. I'm not sure of its current status or maturity, and I'm not sure whether it would work in your use case or not: https://issues.apache.org/jira/browse/SOLR-2272

And if your alternative is writing your own thing from scratch, another option would be writing new components in Java for Solr to do what you want. If you can understand the structure and features of the Lucene index underlying Solr, and figure out a way to get the functionality you want from Lucene, then that's the first step to figuring out how to write a Solr component that exposes it.

________________________________________
From: alex.d...@gmail.com [alex.d...@gmail.com] On Behalf Of Alex Dong [a...@trunk.ly]
Sent: Friday, March 04, 2011 12:56 AM
To: Gora Mohanty
Cc: solr-user@lucene.apache.org
Subject: Re: Model foreign key type of search?

Gora, thanks for the quick reply. Yes, I'm aware of the differences between Solr and a DBMS. We've actually written a C++ analytical engine that can process through a billion tweets with multiple-facet drill-down. We may end up cooking our own in the end, but so far Solr suits our needs quite well; the multilingual tokenizers and Tika integration are all too addictive.

What you're suggesting is exactly what I'm doing: using dynamic fields and copyField to get all the information into one field, then running the search over that. However, this is not good enough. Allow me to elaborate using the same Paris example again. Say there are two URLs, the first bookmarked by 10 people and the second by 100, and the two end up with roughly similar scores once everything is squeezed into one single field. I'd like to rank the one with more users higher.

Another way to look at this: PageRank relies on the number and anchor text of incoming links; we're trying to use the number of people and their keywords/comments as a weight for the link.
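For reference, the join support tracked in SOLR-2272 ended up with a query-parser syntax roughly like the following (field names here are hypothetical, mirroring the urls/bookmarks schema discussed in this thread, with bookmarks indexed as separate documents; whether the patch as attached to the issue behaves exactly this way would need checking):

```text
q={!join from=link_guid to=guid}tags:paris comment:paris
```

This would match bookmark documents on tags/comment and return the parent url documents they point to. Note that scores don't aggregate across the joined children in the way the popularity-ranking use case here wants, which is one reason it may not fit.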
Alex

On Fri, Mar 4, 2011 at 6:29 PM, Gora Mohanty <g...@mimirtech.com> wrote:
> On Fri, Mar 4, 2011 at 10:24 AM, Alex Dong <a...@trunk.ly> wrote:
> > Hi there, I need some advice on how to implement this using Solr:
> >
> > We have two tables: urls and bookmarks.
> > - Each url has four fields: {guid, title, text, url}
> > - One url will have one or more bookmarks associated with it. Each
> > bookmark has these: {link.guid, user, tags, comment}
> >
> > I'd like to return matched urls based on not only the "title, text"
> > from the url schema, but also some kind of aggregated popularity score
> > based on all "bookmarks" for the same url. The popularity score should
> > be based on the number/frequency of bookmarks that match the query.
> [...]
>
> It is best not to think of Solr as an RDBMS, and not to try to graft
> RDBMS practices onto it. Instead, you should flatten your data;
> e.g., in the above, you could have:
> * Four single-valued fields: guid, title, text, url
> * Four multi-valued fields: bookmark_guid, bookmark_user,
>   bookmark_tags, bookmark_comment
> Your index would contain one record per guid of the URL,
> and you would need to populate the multi-valued bookmark
> fields from all bookmark instances associated with that URL.
>
> Then one could either copy the relevant search fields to a full-text
> search field, and search only on that, or, e.g., search on bookmark_tags
> and bookmark_comment in addition to searching on title and text.
>
> Regards,
> Gora
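Gora's flattened layout could be sketched in schema.xml along these lines (field names are taken from his message; the text_general type and the fulltext catch-all field are assumptions borrowed from a stock Solr example schema):

```xml
<!-- one document per URL -->
<field name="guid"  type="string"       indexed="true" stored="true"/>
<field name="title" type="text_general" indexed="true" stored="true"/>
<field name="text"  type="text_general" indexed="true" stored="true"/>
<field name="url"   type="string"       indexed="true" stored="true"/>

<!-- all bookmarks for that URL, flattened into multi-valued fields -->
<field name="bookmark_guid"    type="string"       indexed="true" stored="true" multiValued="true"/>
<field name="bookmark_user"    type="string"       indexed="true" stored="true" multiValued="true"/>
<field name="bookmark_tags"    type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="bookmark_comment" type="text_general" indexed="true" stored="true" multiValued="true"/>

<!-- optional single-field search target -->
<field name="fulltext" type="text_general" indexed="true" stored="false" multiValued="true"/>
<copyField source="title"            dest="fulltext"/>
<copyField source="text"             dest="fulltext"/>
<copyField source="bookmark_tags"    dest="fulltext"/>
<copyField source="bookmark_comment" dest="fulltext"/>
```

Each bookmark added or removed then means reindexing the whole url document, which is what makes the batched-update approach from earlier in the thread attractive.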