Both dynamic fields and multivalued fields are powerful Solr features that can be used to great effect, but only if used in moderation - for a relatively small number of discrete values (e.g., a few dozen strings). Anything more complex and you are asking for trouble, creating a pseudo-schema that will be difficult to maintain and for anybody else to comprehend.

So, the simple answer to your question: flatten, in the most straightforward manner. Each instance of a "record type" should be a discrete Solr document, and each "record" should get its own "id" to serve as the Solr document key/ID. Solr can support multiple document types in the same collection, or you can store each record type in a separate collection.
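
As a rough illustration of that flattening, here is a minimal Python sketch. The field names (parent_id, subject_level1, subject_level2) and the "parent-sequence" ID scheme are assumptions made up for this example, not anything Solr prescribes.

```python
def flatten_subjects(parent_id, subject_paths):
    """Turn each (level1, level2) subject pair into its own Solr-style document."""
    docs = []
    for seq, (level1, level2) in enumerate(subject_paths, start=1):
        docs.append({
            "id": f"{parent_id}-{seq}",   # hypothetical ID scheme: parent ID plus sequence
            "parent_id": parent_id,       # hypothetical field linking back to the parent
            "subject_level1": level1,
            "subject_level2": level2,
        })
    return docs


# The example from the original question: one document, two subject entities.
docs = flatten_subjects(
    "doc42",
    [("contract", "claims"), ("patent", "counter claims")],
)
```

Each dictionary in docs would then be indexed as a separate Solr document, so level 1 and level 2 of the same subject always stay together.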

The simplest, cleanest structure is to store each record type in a separate collection and then use multiple Solr queries to emulate SQL join operations as needed.
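
A sketch of what the two-query "join" might look like, shown only as URL construction so it runs without a Solr server. The base URL, the collection names (subjects, documents) and the field names are assumptions for illustration; q, fl, rows and wt are standard Solr select parameters.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumed local Solr instance


def child_query_url(level1):
    """Step 1: find subject records for a level 1 value, returning only parent IDs."""
    params = {
        "q": f'subject_level1:"{level1}"',
        "fl": "parent_id",   # hypothetical field holding the parent document ID
        "rows": 1000,
        "wt": "json",
    }
    return f"{SOLR_BASE}/subjects/select?{urlencode(params)}"


def parent_query_url(parent_ids):
    """Step 2: fetch the parent documents whose IDs came back from step 1."""
    id_clause = " OR ".join(f'"{pid}"' for pid in parent_ids)
    params = {"q": f"id:({id_clause})", "wt": "json"}
    return f"{SOLR_BASE}/documents/select?{urlencode(params)}"


step1 = child_query_url("contract")
step2 = parent_query_url(["doc42", "doc43"])
```

The application would issue the first request, collect the parent_id values from the response, and pass them to the second request.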

But if you would prefer to "mash" multiple record types into the same Solr collection/schema, you can do that too. Make the schema be the union of the schemas for each record type - Solr/Lucene has no significant overhead for fields which do not have values present for a given document.

Each document would have a unique ID field. In addition, each document would have a parent field for each record type, so you can quickly search for all children of a given parent. You can have one common parent ID if you assign unique IDs to all children across all record types, but it can sometimes be cleaner for the child ID to reset to zero/one for each new parent. It's merely a question of whether you want to have a single key value or a tuple of key values to identify a specific child.
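
To make the two keying options concrete, here is a small in-memory sketch. The field names id, parent_id and child_seq are hypothetical; the point is only the difference between a single globally unique key and a (parent, sequence) tuple.

```python
children = [
    {"id": "c1", "parent_id": "p1", "child_seq": 1, "subject_level1": "contract"},
    {"id": "c2", "parent_id": "p1", "child_seq": 2, "subject_level1": "patent"},
    {"id": "c3", "parent_id": "p2", "child_seq": 1, "subject_level1": "contract"},
]

# Option 1: one globally unique child ID identifies a child on its own.
by_id = {c["id"]: c for c in children}

# Option 2: the child sequence restarts at 1 for each parent,
# so a child is identified by the (parent_id, child_seq) tuple.
by_tuple = {(c["parent_id"], c["child_seq"]): c for c in children}


def children_of(parent_id):
    """Find all children of a parent via the parent field."""
    return [c for c in children if c["parent_id"] == parent_id]
```

Either way, the parent field is what makes "all children of a given parent" a simple lookup.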

You can duplicate a subset of the parent fields in each child to simulate the effect of a simple join in a single clean query. Alternatively, you can do a separate query to get the parent record details.
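
The following sketch imitates, purely in memory, what filtering and faceting over such flattened child documents would give for the example in the question. It is not Solr code; parent_title is a hypothetical duplicated parent field, and the other field names are assumptions as well.

```python
from collections import Counter

child_docs = [
    {
        "id": "doc42-1",
        "parent_id": "doc42",
        "parent_title": "Licensing dispute",  # hypothetical field copied from the parent
        "subject_level1": "contract",
        "subject_level2": "claims",
    },
    {
        "id": "doc42-2",
        "parent_id": "doc42",
        "parent_title": "Licensing dispute",
        "subject_level1": "patent",
        "subject_level2": "counter claims",
    },
]


def facet_level2(docs, level1):
    """Count level 2 values among the child docs that match a level 1 value."""
    matches = [d for d in docs if d["subject_level1"] == level1]
    return Counter(d["subject_level2"] for d in matches)


facets = facet_level2(child_docs, "contract")
```

Because each subject is its own record, a search on level 1 "contract" only ever sees the level 2 value stored in the same record, so "counter claims" does not appear in the counts. Note that such counts are per child record rather than per parent document.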

-- Jack Krupansky

-----Original Message-----
From: Sperrink
Sent: Saturday, June 29, 2013 5:08 AM
To: solr-user@lucene.apache.org
Subject: Schema design for parent child field

Good day,
I'm seeking some guidance on how best to represent the following data within
a solr schema.
I have a list of subjects which are detailed to n levels.
Each document can contain many of these subject entities.
As I see it if this had been just 1 subject per document, dynamic fields
would have been a good resolution.
Any suggestions on how best to create this structure in a denormalised
fashion while maintaining the data integrity.
For example a document could have:
Subject level 1: contract
Subject level 2: claims
Subject level 1: patent
Subject level 2: counter claims

If I were to search for level 1 contract, I would only want the facet count
for level 2 to contain claims and not counter claims.

Any assistance in this would be much appreciated.




--
View this message in context: http://lucene.472066.n3.nabble.com/Schema-design-for-parent-child-field-tp4074084.html
Sent from the Solr - User mailing list archive at Nabble.com.
