Both dynamic fields and multivalued fields are powerful Solr features that
can be used to great effect, but only if used in moderation - a relatively
small number of discrete values (e.g., a few dozen strings). Anything
more complex and you are asking for trouble, creating a pseudo-schema
that will be difficult to maintain or for anybody else to comprehend.
So, the simple answer to your question: Flatten, in the most straightforward
manner. Each instance of a record type should be a discrete Solr
document, and each record should get its own ID to serve as the Solr
document key.
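
As a rough sketch of that flattening, using your contract/patent example: the
Python below turns one source record with two subject entries into two flat
documents. The field names (parent_id, subject_level1, subject_level2) and the
"doc42" ID are made up for illustration; nothing here is required by Solr.

```python
def flatten_subjects(doc_id, subjects):
    """Turn one source record into one flat document per subject entry.

    `subjects` is a list of (level1, level2) pairs. All field names are
    illustrative assumptions, not anything Solr mandates.
    """
    docs = []
    for n, (level1, level2) in enumerate(subjects, start=1):
        docs.append({
            "id": f"{doc_id}-{n}",    # unique key for this flat document
            "parent_id": doc_id,      # points back to the source record
            "subject_level1": level1,
            "subject_level2": level2,
        })
    return docs


docs = flatten_subjects(
    "doc42",
    [("contract", "claims"), ("patent", "counter claims")],
)
for d in docs:
    print(d)
```

Because each level 1/level 2 pair lives in its own document, "claims" stays
attached to "contract" and "counter claims" stays attached to "patent".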
Solr can support multiple document types in the same collection, or you can
store each record type in a separate collection.
The simplest, cleanest structure is to store each record type in a separate
collection and then use multiple Solr queries to emulate SQL join operations
as needed.
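
A minimal sketch of the two-query approach, assuming two collections that I am
calling "subjects" and "documents", a Solr instance at localhost:8983, and the
illustrative field names subject_level1 and parent_id. It only builds the
request URLs for the standard /select handler; actually sending them and
parsing the responses is left out.

```python
from urllib.parse import urlencode

SOLR = "http://localhost:8983/solr"  # assumed base URL


def select_url(collection, **params):
    """Build a /select URL for the given collection and query parameters."""
    return f"{SOLR}/{collection}/select?{urlencode(params)}"


# Step 1: find matching child records in the (hypothetical) "subjects" collection.
child_url = select_url(
    "subjects", q='subject_level1:"contract"', fl="parent_id", wt="json"
)


# Step 2: once parent_id values have been read from the first response,
# fetch the parent records from the (hypothetical) "documents" collection.
def parent_url(parent_ids):
    id_clause = " OR ".join(f'"{pid}"' for pid in sorted(set(parent_ids)))
    return select_url("documents", q=f"id:({id_clause})", wt="json")


print(child_url)
print(parent_url(["doc42", "doc77", "doc42"]))
```

The application does the "join" itself: the IDs returned by the first query
become the filter for the second.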
But if you would prefer to mash multiple record types into the same Solr
collection/schema, you can do that too. Make the schema be the union of the
schemas for each record type - Solr/Lucene has no significant overhead for
fields which do not have values present for a given document.
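
To illustrate the union idea, here is a small sketch with two hypothetical
record types. The field names, the sample title, and the "doc_type"
discriminator field are my own assumptions for the example; each document
simply omits the fields that belong to the other record type.

```python
DOCUMENT_FIELDS = {"id", "title"}
SUBJECT_FIELDS = {"id", "parent_id", "subject_level1", "subject_level2"}

# The combined schema is the union of the per-type field sets,
# plus an assumed "doc_type" field to tell the record types apart.
COMBINED_FIELDS = DOCUMENT_FIELDS | SUBJECT_FIELDS | {"doc_type"}


def make_doc(doc_type, **fields):
    """Build a document, rejecting fields outside the combined schema."""
    unknown = set(fields) - COMBINED_FIELDS
    if unknown:
        raise ValueError(f"fields not in combined schema: {sorted(unknown)}")
    return {"doc_type": doc_type, **fields}


parent = make_doc("document", id="doc42", title="Licensing dispute")
child = make_doc(
    "subject",
    id="doc42-1",
    parent_id="doc42",
    subject_level1="contract",
    subject_level2="claims",
)
print(parent)
print(child)
```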
Each document would have a unique ID field. In addition, each document would
have a parent field for each record type, so you can quickly search for all
children of a given parent. You can have one common parent ID if you assign
unique IDs to all children across all record types, but it can sometimes be
cleaner for the child ID to reset to zero/one for each new parent. It's
merely a question of whether you want to have a single key value or a tuple
of key values to identify a specific child.
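
The two numbering schemes can be sketched as follows. The ID formats and the
"child_no" and "value" field names are hypothetical. Note that Solr's unique
key is a single field, so in the per-parent scheme I am assuming the tuple is
also concatenated into one "id" value.

```python
from itertools import count


def assign_global_ids(children_by_parent):
    """Scheme A: child IDs are unique across all parents (single key)."""
    next_id = count(1)
    return [
        {"id": f"child-{next(next_id)}", "parent_id": parent, "value": value}
        for parent, values in children_by_parent.items()
        for value in values
    ]


def assign_per_parent_ids(children_by_parent):
    """Scheme B: numbering restarts at 1 per parent (parent_id, child_no)."""
    return [
        {"id": f"{parent}-{n}", "parent_id": parent, "child_no": n, "value": value}
        for parent, values in children_by_parent.items()
        for n, value in enumerate(values, start=1)
    ]


data = {"p1": ["a", "b"], "p2": ["c"]}
print([d["id"] for d in assign_global_ids(data)])
print([d["id"] for d in assign_per_parent_ids(data)])
```

Either way, a search on parent_id finds all children of a given parent.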
You can duplicate a subset of the parent fields in each child to simulate
the effect of a simple join in a single clean query. Alternatively, you can
run a separate query to get the parent record details.
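
Here is an in-memory sketch of that denormalization, applied to your facet
question. It does not talk to Solr; the filter-then-count step only mimics
what a filter query (fq) combined with facet.field would do on the flattened
documents. The field names, the "parent_" prefix, and the sample title are
assumptions for illustration.

```python
from collections import Counter


def denormalize(parent, children, copied_fields=("title",)):
    """Copy selected parent fields into every child document."""
    copied = {f"parent_{name}": parent[name] for name in copied_fields}
    return [{**child, "parent_id": parent["id"], **copied} for child in children]


def facet_counts(docs, filter_field, filter_value, facet_field):
    """Count facet_field values among docs matching the filter."""
    return Counter(
        d[facet_field] for d in docs if d.get(filter_field) == filter_value
    )


parent = {"id": "doc42", "title": "Licensing dispute"}
children = [
    {"id": "doc42-1", "subject_level1": "contract", "subject_level2": "claims"},
    {"id": "doc42-2", "subject_level1": "patent", "subject_level2": "counter claims"},
]

docs = denormalize(parent, children)
print(facet_counts(docs, "subject_level1", "contract", "subject_level2"))
```

Filtering on level 1 "contract" counts only "claims" at level 2, and each hit
already carries the parent's title, so no second query is needed.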
-- Jack Krupansky
-----Original Message-----
From: Sperrink
Sent: Saturday, June 29, 2013 5:08 AM
To: solr-user@lucene.apache.org
Subject: Schema design for parent child field
Good day,
I'm seeking some guidance on how best to represent the following data within
a solr schema.
I have a list of subjects which are detailed to n levels.
Each document can contain many of these subject entities.
As I see it if this had been just 1 subject per document, dynamic fields
would have been a good resolution.
Any suggestions on how best to create this structure in a denormalised
fashion while maintaining data integrity?
For example a document could have:
Subject level 1: contract
Subject level 2: claims
Subject level 1: patent
Subject level 2: counter claims
If I were to search for level 1 contract, I would only want the facet count
for level 2 to contain claims and not counter claims.
Any assistance in this would be much appreciated.
--
View this message in context:
http://lucene.472066.n3.nabble.com/Schema-design-for-parent-child-field-tp4074084.html