The date checking can be implemented using range query as a filter query,
such as
&fq=startDate:[* TO NOW] AND endDate:[NOW TO *]
(You can also use an "frange" query.)
Then you will have to flatten the database tables. Your Solr schema would
have a single "merged" record type. You will have to decide whether the
different record types (tables) will have common fields versus static
qualification by adding a prefix or suffix, e.g., "name" vs. "employee_name"
and "employer_name". The latter has the advantage that you do not have to
separately specify a table "type" field since the fields would be empty for
records of other types.
-- Jack Krupansky
-----Original Message-----
From: Stefan Burkard
Sent: Wednesday, August 15, 2012 8:12 AM
To: solr-user@lucene.apache.org
Subject: How to design index for related versioned database records
Hi solr-users
I have a case where I need to build an index from a database.
***Data structure***
The data is spread across multiple tables and in each table the
records are versioned - this means that one "real" record can exist
multiple times in a table, each with different validFrom/validUntil
dates. Therefore it is possible to query the valid version of a record
for a given point in time.
The relations of the data are something like this:
Employee <-> LinkTable (=Employment) <-> Employer <-> LinkTable
(=offered services) <-> Service
That means I have data across 5 relations, each of them with versioned
records.
***Search needs***
Now I need to be able to search for employees and employers based on
the services they offer for a given point in time.
Therefore I have built an index of all employees and employers with
their services as subentity. So I have one index entry for every
version of every employee/employer and each version collects the
offered services for the given timeframe of the employee/employer
version.
Problem: The offered services of an employee/employer can change
during its validity period. That means I do not only need to take the
version timespan of the employee/employer into account but also the
version timespans of services and the link-tables.
***Question***
I think I could continue with my strategy to have an index entry of an
employee/employer with its services for any given point in time. But
there are much more entries than now since every involved
validfrom/validuntil period (if they overlap) produces more entries.
But I am not sure if this is a good strategy, or if it would be better
to try to index the whole datastructure in an other way.
Are there any recommendations how to handle such a case?
Thanks for any help
Stephan