On 9/7/2018 7:44 PM, John Smith wrote:
Thanks, Shawn, for your comments. The reason I don't want to go with a flat
file structure is all the wasted/duplicated data. If a department
has 100 employees, then it's very wasteful in terms of disk space to repeat
the header data over and over again, 100 times. In this example there are
only a few doc types, but my real-life data is much larger, and the problem
is a "scaling" problem: with just a little bit of data, duplicating header
fields is no problem, but with massive amounts of data it's a large
problem.

If your goal is data storage, then you are completely correct.  All that data duplication is something to avoid in a data storage situation.  Normalizing your data so it's relational makes perfect sense, because most database software is designed to deal with those relationships efficiently.

Solr is not designed as a data storage platform, and does not handle those relationships efficiently.  Solr's design goals are all about *search*.  It often gets touted as filling a NoSQL role ... but it's not something I would personally use as a primary data repository.  Search is a space where data duplication is expected and completely normal.  This is something that people often have a hard time accepting.
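To make the trade-off concrete, here is a minimal Python sketch, assuming a hypothetical department record with 100 employees. All field names are made up for illustration, and this is plain Python, not Solr's API. It flattens the data the way a search index typically would: every employee document carries its own copy of the department header fields.

```python
import json

# Hypothetical normalized source data: one department "header" record
# and its employees, linked by dept_id (names are illustrative only).
department = {"dept_id": "D42", "dept_name": "Engineering", "location": "Building 7"}
employees = [
    {"emp_id": f"E{i:03d}", "name": f"Employee {i}", "dept_id": "D42"}
    for i in range(100)
]

def flatten(dept, emps):
    """Denormalize: copy the department header fields into every employee document."""
    docs = []
    for emp in emps:
        doc = dict(dept)   # duplicated header fields
        doc.update(emp)    # employee-specific fields
        docs.append(doc)
    return docs

docs = flatten(department, employees)
print(len(docs))  # 100 documents, each repeating the header

# Rough size of the duplication: header bytes stored beyond a single copy.
extra_header_bytes = len(json.dumps(department)) * (len(docs) - 1)
print(extra_header_bytes)

# Matching on a header field plus an employee field is a single-document
# match; no relationship has to be resolved at query time.
hits = [d for d in docs if d["dept_name"] == "Engineering" and d["name"] == "Employee 5"]
print(hits[0]["emp_id"])
```

The repeated header is exactly the cost described in the quoted message, and it grows linearly with the number of employees. It is also what allows each document to be matched on its own, which is the property a search engine is built around.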

Thanks,
Shawn
