Re: How to design index for related versioned database records

Jack Krupansky Wed, 15 Aug 2012 05:32:53 -0700

The date checking can be implemented using range query as a filter query,such as


&fq=startDate:[* TO NOW] AND endDate:[NOW TO *]


(You can also use an "frange" query.)

Then you will have to flatten the database tables. Your Solr schema wouldhave a single "merged" record type. You will have to decide whether thedifferent record types (tables) will have common fields versus staticqualification by adding a prefix or suffix, e.g., "name" vs. "employee_name"and "employer_name". The latter has the advantage that you do not have toseparately specify a table "type" field since the fields would be empty forrecords of other types.


-- Jack Krupansky

-----Original Message-----From: Stefan Burkard

Sent: Wednesday, August 15, 2012 8:12 AM
To: solr-user@lucene.apache.org
Subject: How to design index for related versioned database records

Hi solr-users

I have a case where I need to build an index from a database.

***Data structure***
The data is spread across multiple tables and in each table the
records are versioned - this means that one "real" record can exist
multiple times in a table, each with different validFrom/validUntil
dates. Therefore it is possible to query the valid version of a record
for a given point in time.

The relations of the data are something like this:
Employee <-> LinkTable (=Employment) <-> Employer <-> LinkTable
(=offered services) <-> Service

That means I have data across 5 relations, each of them with versionedrecords.


***Search needs***
Now I need to be able to search for employees and employers based on
the services they offer for a given point in time.

Therefore I have built an index of all employees and employers with
their services as subentity. So I have one index entry for every
version of every employee/employer and each version collects the
offered services for the given timeframe of the employee/employer
version.

Problem: The offered services of an employee/employer can change
during its validity period. That means I do not only need to take the
version timespan of the employee/employer into account but also the
version timespans of services and the link-tables.

***Question***
I think I could continue with my strategy to have an index entry of an
employee/employer with its services for any given point in time. But
there are much more entries than now since every involved
validfrom/validuntil period (if they overlap) produces more entries.
But I am not sure if this is a good strategy, or if it would be better
to try to index the whole datastructure in an other way.

Are there any recommendations how to handle such a case?

Thanks for any help

Stephan

Re: How to design index for related versioned database records

Reply via email to