Re: How to design index for related versioned database records

Stefan Burkard Tue, 21 Aug 2012 03:06:20 -0700

Hi Jack

Thanks for your answer. Do I understand that correctly that I must
create a "merge-entity" that contains all the different
validFrom/validUntil dates as fields (and of course the other
search-related fields).


This would mean that the number of index entries is equal to the
number of all possible combinations of from/until date-ranges in a
"record-chain" (all records with all their individual versions
connected by foreign keys) since every combination creates a new
record in a query across all tables. That also means that I will have
a lot of entries with the same values in the other search-related
fields - the only difference will be most of the time one of the
from/until-ranges.
Perhaps the query can be optimized so that irrelevant combinations can
be avoided (for example if two date-ranges do not overlap).

Then, when I have built that index I can query it with the reference
date as argument to compare it with every from/until range in the
chain. And so I get only the relevant entries where the reference date
is between all from/until ranges.

Is this correct?

Thanks and regards
Stephan



On Wed, Aug 15, 2012 at 2:32 PM, Jack Krupansky <j...@basetechnology.com> wrote:
> The date checking can be implemented using range query as a filter query,
> such as
>
> &fq=startDate:[* TO NOW] AND endDate:[NOW TO *]
>
> (You can also use an "frange" query.)
>
> Then you will have to flatten the database tables. Your Solr schema would
> have a single "merged" record type. You will have to decide whether the
> different record types (tables) will have common fields versus static
> qualification by adding a prefix or suffix, e.g., "name" vs. "employee_name"
> and "employer_name". The latter has the advantage that you do not have to
> separately specify a table "type" field since the fields would be empty for
> records of other types.
>
> -- Jack Krupansky
>
> -----Original Message----- From: Stefan Burkard
> Sent: Wednesday, August 15, 2012 8:12 AM
> To: solr-user@lucene.apache.org
> Subject: How to design index for related versioned database records
>
>
> Hi solr-users
>
> I have a case where I need to build an index from a database.
>
> ***Data structure***
> The data is spread across multiple tables and in each table the
> records are versioned - this means that one "real" record can exist
> multiple times in a table, each with different validFrom/validUntil
> dates. Therefore it is possible to query the valid version of a record
> for a given point in time.
>
> The relations of the data are something like this:
> Employee <-> LinkTable (=Employment) <-> Employer <-> LinkTable
> (=offered services) <-> Service
>
> That means I have data across 5 relations, each of them with versioned
> records.
>
> ***Search needs***
> Now I need to be able to search for employees and employers based on
> the services they offer for a given point in time.
>
> Therefore I have built an index of all employees and employers with
> their services as subentity. So I have one index entry for every
> version of every employee/employer and each version collects the
> offered services for the given timeframe of the employee/employer
> version.
>
> Problem: The offered services of an employee/employer can change
> during its validity period. That means I do not only need to take the
> version timespan of the employee/employer into account but also the
> version timespans of services and the link-tables.
>
> ***Question***
> I think I could continue with my strategy to have an index entry of an
> employee/employer with its services for any given point in time. But
> there are much more entries than now since every involved
> validfrom/validuntil period (if they overlap) produces more entries.
> But I am not sure if this is a good strategy, or if it would be better
> to try to index the whole datastructure in an other way.
>
> Are there any recommendations how to handle such a case?
>
> Thanks for any help
> Stephan

Re: How to design index for related versioned database records

Reply via email to