On 9/7/2018 7:44 PM, John Smith wrote:
Thanks, Shawn, for your comments. The reason I don't want to go with a flat file structure is all the wasted/duplicated data. If a department has 100 employees, then it's very wasteful in terms of disk space to repeat the header data over and over again, 100 times. In this example there are only a few doc types, but my real-life data is much larger, and the problem is a "scaling" problem: with just a little bit of data, duplicating header fields is no problem, but with massive amounts of data it's a large problem.
If your goal is data storage, then you are completely correct. All that data duplication is something to avoid for a data storage situation. Normalizing your data so it's relational makes perfect sense, because most database software is designed to efficiently deal with those relationships.
Solr is not designed as a data storage platform, and does not handle those relationships efficiently. Solr's design goals are all about *search*. It often gets touted as filling a NoSQL role ... but it's not something I would personally use as a primary data repository. Search is a space where data duplication is expected and completely normal. This is something that people often have a hard time accepting.
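To make the "duplication is normal" point concrete, here is a minimal sketch of what denormalizing looks like before indexing. The department/employee records and field names (dept_id, dept_name, dept_location) are hypothetical, made up for illustration. The department "header" fields are simply copied into every employee document, so each document can be matched and displayed on its own without any join at query time.

```python
import json

# Hypothetical normalized source data: one department "header" record
# and several employee records that reference it by dept_id.
departments = {
    "D10": {"dept_name": "Engineering", "dept_location": "Building 4"},
}
employees = [
    {"emp_id": "E1", "name": "Alice", "dept_id": "D10"},
    {"emp_id": "E2", "name": "Bob", "dept_id": "D10"},
]


def flatten(departments, employees):
    """Denormalize: copy the department's header fields into every
    employee document, trading disk space for join-free searching."""
    docs = []
    for emp in employees:
        dept = departments[emp["dept_id"]]
        doc = {"id": emp["emp_id"], "name": emp["name"], "dept_id": emp["dept_id"]}
        doc.update(dept)  # header data duplicated once per employee
        docs.append(doc)
    return docs


docs = flatten(departments, employees)
# A JSON array of flat documents, roughly the shape you would send to Solr's update handler.
print(json.dumps(docs, indent=2))
```

With 100 employees the department fields would be stored 100 times; in a search index that is the expected cost of being able to query "all employees in Building 4" as a simple field match.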
Thanks, Shawn