Hi, and apologies in advance for the lengthy question! I'm looking to use Solr to power searching & browsing over a large set of product data stored in a relational db. I'm wandering what the most appropriate schema design strategy to use is. A simplified view of the relational data is:
Shop (~1000 rows) -Id* -Name Product (~300,000 rows) -Id* -Name -Availability ProductFormat (~5 rows) -Id* -Name Component (part of a product that may be sold separately) (~4,000,000 rows) -Id* -Name ProductComponent (~4,000,000 rows) -ProductId* -ComponentId* ShopProduct (~6,000,000 rows) -ShopId* -ProductId* -ProductFormatId* -AvailableDate ShopProductPriceList (~15,000,000 rows) -ShopId* -ProductId* -ProductFormatId* -Applicability (Component/Product)* -Type (Regular/SalePrice)* -Amount * logical primary key This means: -availability of a product differs from shop to shop -the price of a product or component is dependent on the format, and also differs from shop to shop Textual searching is required over product & component names, and filtering is required over Shops, Product Availability, Formats, & Prices. The simplest approach would be to flatten out the data completely (1 solr document per ShopProduct and ShopProductComponent). This would result in ~80million documents, which I'm guessing this would need some form of sharding/distribution An alternate approach would be to construct one document per Product, and *nest* the relational data via dynamic fields (and possibly plugins?) Eg one document per Product; multi-value fields for ProductComponent & Shop; dynamic fields for Availability/Format, using ShopId as part of the field name. This approach would result in far fewer documents (400,000), but more complex queries. It would also require extending Solr/Lucene to search over ProductComponents and filter by price, which I'm not quite clear on as yet... Any guidance on which of the two general approaches (or others) to explore further? Thanks! Vim