Re: parent/child rows in solr

Shawn Heisey Fri, 07 Sep 2018 19:00:01 -0700

On 9/7/2018 7:44 PM, John Smith wrote:

Thanks Shawn, for your comments. The reason why I don't want to go flat
file structure, is due to all the wasted/duplicated data. If a department
has 100 employees, then it's very wasteful in terms of disk space to repeat
the header data over and over again, 100 times. In this example there are
only a few doc types, but my real-life data is much larger, and the problem
is a "scaling" problem; with just a little bit of data, no problem in
duplicating header fields, but with massive amounts of data it's a large
problem.

If your goal is data storage, then you are completely correct. All that data duplication is something to avoid for a data storage situation. Normalizing your data so it's relational makes perfect sense, because most database software is designed to efficiently deal with those relationships.

Solr is not designed as a data storage platform, and does not handle those relationships efficiently. Solr's design goals are all about *search*. It often gets touted as filling a NoSQL role ... but it's not something I would personally use as a primary data repository. Search is a space where data duplication is expected and completely normal. This is something that people often have a hard time accepting.
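
To make the flat approach concrete, here is a rough sketch of what denormalizing might look like before indexing. It is only an illustration: the department/employee field names and the flatten() helper are made up for this example, not taken from your schema, and it only builds the documents rather than sending them to Solr.

```python
import json

# Hypothetical normalized source data; all field names are illustrative.
departments = {
    "d1": {"dept_name": "Engineering", "dept_location": "Denver"},
}
employees = [
    {"id": "e1", "dept_id": "d1", "emp_name": "Alice"},
    {"id": "e2", "dept_id": "d1", "emp_name": "Bob"},
]


def flatten(departments, employees):
    """Build one flat document per employee, repeating the department fields."""
    docs = []
    for emp in employees:
        doc = dict(emp)                           # copy the child (employee) fields
        doc.update(departments[emp["dept_id"]])   # duplicate the parent (header) fields
        docs.append(doc)
    return docs


flat_docs = flatten(departments, employees)
print(json.dumps(flat_docs, indent=2))
```

Every employee document carries its own copy of the department fields, so a single query can match and return both without any join at search time. The cost is the repeated header data in the index, which is the tradeoff described above.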


Thanks,
Shawn
