I don't have a great answer for you. To me this is a generally unanswered
question: this sort of thing is hard to do in Solr, and I think you understand
the problem correctly.

There ARE some interesting new features in trunk (not 1.4) that may be
relevant, although from my perspective none of them is a magic bullet. There
is a 'join' feature, though, which could be awfully useful with the setup you
suggest of keeping different 'types' of documents all together in the same
index.

https://issues.apache.org/jira/browse/SOLR-2272
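
For illustration only, here is a rough Python sketch of what a join query
against such a mixed index might look like. The {!join from=... to=...} form
is the syntax described in SOLR-2272; the field names (doctype, person_id,
place, start_year, end_year) are hypothetical and would need to match your own
schema.

```python
from urllib.parse import urlencode


def build_join_query(child_query, from_field="person_id", to_field="id"):
    """Build a Solr join query string.

    The child query is matched against occupation documents; the join then
    maps each hit to the person document whose `to_field` equals the hit's
    `from_field`. Both field names are assumptions, not part of any real schema.
    """
    return "{!join from=%s to=%s}%s" % (from_field, to_field, child_query)


# Hypothetical child query: occupations in Australia that span the year 1956.
child_query = (
    "doctype:occupation AND place:Australia "
    "AND start_year:[* TO 1956] AND end_year:[1956 TO *]"
)

params = {"q": build_join_query(child_query), "rows": 10}

# URL-encoded query string, ready to append to a /select request.
print(urlencode(params))
```

Because place and dates are tested against the same occupation document, a
person only matches if a single occupation satisfies all of the criteria.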
________________________________________
From: Scott Yeadon [scott.yea...@anu.edu.au]
Sent: Tuesday, February 08, 2011 4:41 PM
To: solr-user@lucene.apache.org
Subject: relational db mapping for advanced search

Hi,

I was just after some advice on how to map some relational metadata to a
solr index. The web application I'm working on is based around people
and the searching based around properties of these people. Several
properties are more complex - for example, a person's occupations have
place, from/to dates and other descriptive text; texts about a person
have authors, sources and publication dates. Despite the usefulness of
facets and the search-based navigation, an advanced search feature is a
non-negotiable required feature of the application.

An advanced search needs to be able to query a person on any set of
attributes (e.g. gender, birth date, death date, place of birth),
including the more complex search criteria described above
(occupation, texts). Taking occupation as an example, because occupation
has its own metadata and a person could have worked an arbitrary number
of occupations throughout their lifetime, I was wondering how/if this
information can be denormalised into a single person index document to
support such a search. I can't use text concatenation in a multivalued
field as I need to be able to run date-based range queries (e.g.
publication dates, occupation dates). And I'm not sure that resorting to
multiple repeated fields based on the current limits (e.g. occ1,
occ1startdate, occ1enddate, occ1place, occ2, etc) is a good approach
(although that would work).

If there isn't a sensible way to denormalise this, what is the best
approach? For example, should I have an occupation document type, a
person document type and a text/source document type, each containing
the relevant person id, and (in an advanced search context) run a query
against each document type and then use the intersecting set of person
ids as the result used by the application for its display/pagination?
And if so, how do I ensure I
capture all records - for example if there are 100,000 hits on someone
having worked in Australia in 1956, is there any way to ensure all
100,000 are returned in a query (similar to the facet.limit = -1) other
than specifying an arbitrary high number in the "rows" parameter and
hoping a query doesn't hit more than 100,000 and thus exclude those
above the limit from the "intersect" processing?
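
The intersect approach described above could be sketched roughly as follows,
paging through each result set with Solr's "start" and "rows" parameters
rather than guessing one very large "rows" value. This is only a sketch:
fetch_page is a hypothetical stand-in for whatever HTTP call returns the
person ids for one page of hits, and the in-memory fetchers below exist only
to make the example runnable.

```python
def fetch_all_ids(fetch_page, page_size=1000):
    """Collect every person id for one query by paging until a short page.

    fetch_page(start, rows) is a hypothetical callable that returns the list
    of person ids for hits start .. start + rows - 1 of a single query.
    """
    ids = set()
    start = 0
    while True:
        page = fetch_page(start, page_size)
        ids.update(page)
        # A page shorter than page_size means the result set is exhausted.
        if len(page) < page_size:
            return ids
        start += page_size


def make_fake_fetcher(person_ids):
    """Stand-in for a real Solr request: slices an in-memory list of ids."""
    def fetch_page(start, rows):
        return person_ids[start:start + rows]
    return fetch_page


# Hypothetical hits from the occupation query and the text/source query.
occupation_hits = make_fake_fetcher([1, 2, 3, 4, 5, 6, 7])
text_hits = make_fake_fetcher([2, 4, 6, 8])

# Only people matching both queries survive the intersection.
matching = (fetch_all_ids(occupation_hits, page_size=3)
            & fetch_all_ids(text_hits, page_size=3))
print(sorted(matching))  # prints [2, 4, 6]
```

Solr responses also report a total hit count (numFound), which could serve as
the stop condition instead of waiting for a short page.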

Or is there a single query solution?

Any advice/hints welcome.

Scott.
