Hi, and apologies in advance for the lengthy question!

I'm looking to use Solr to power searching & browsing over a large set of
product data stored in a relational db. I'm wondering what the most
appropriate schema design strategy would be. A simplified view of the
relational data is:

Shop (~1000 rows)
-Id*
-Name

Product (~300,000 rows)
-Id*
-Name
-Availability

ProductFormat (~5 rows)
-Id*
-Name

Component (part of a product that may be sold separately) (~4,000,000 rows)
-Id*
-Name

ProductComponent (~4,000,000 rows)
-ProductId*
-ComponentId*

ShopProduct (~6,000,000 rows)
-ShopId*
-ProductId*
-ProductFormatId*
-AvailableDate

ShopProductPriceList (~15,000,000 rows)
-ShopId*
-ProductId*
-ProductFormatId*
-Applicability (Component/Product)*
-Type (Regular/SalePrice)*
-Amount

* logical primary key

This means:
-availability of a product differs from shop to shop
-the price of a product or component is dependent on the format, and also
differs from shop to shop

Textual searching is required over product & component names, and filtering
is required over Shops, Product Availability, Formats, & Prices.

The simplest approach would be to flatten out the data completely (one Solr
document per ShopProduct and ShopProductComponent). This would result in
~80 million documents, which I'm guessing would need some form of
sharding/distribution.
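
To make that concrete, here's a rough sketch in Python of what the flattened
documents might look like (the field names such as shop_id, doc_type and
regular_price are just ones I've made up for illustration, not anything Solr
requires; the documents would go in via the standard update handler):

import json

# One document per ShopProduct row (ShopId x ProductId x ProductFormatId):
shop_product_doc = {
    "id": "shopproduct-42-1001-3",        # ShopId-ProductId-ProductFormatId
    "doc_type": "shop_product",
    "shop_id": 42,
    "shop_name": "Example Shop",
    "product_id": 1001,
    "product_name": "Example Product",
    "format_id": 3,
    "format_name": "Example Format",
    "availability": "InStock",
    "available_date": "2010-06-01T00:00:00Z",
    "regular_price": 19.99,
    "sale_price": 14.99,
}

# ...and one document per ShopProductComponent combination:
shop_component_doc = {
    "id": "shopcomponent-42-1001-3-555",  # ...plus ComponentId
    "doc_type": "shop_product_component",
    "shop_id": 42,
    "product_id": 1001,
    "format_id": 3,
    "component_id": 555,
    "component_name": "Example Component",
    "regular_price": 2.99,
}

print(json.dumps([shop_product_doc, shop_component_doc], indent=2))

Filtering would then be straightforward (e.g. fq=shop_id:42 plus
fq=regular_price:[0 TO 20]), but every shop/product/format (and component)
combination becomes its own document, hence the ~80 million.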

An alternative approach would be to construct one document per Product, and
*nest* the relational data via dynamic fields (and possibly plugins?)
E.g. one document per Product; multi-valued fields for ProductComponent &
Shop; dynamic fields for Availability/Format, using ShopId as part of the
field name.
This approach would result in far fewer documents (400,000), but more
complex queries. It would also require extending Solr/Lucene to search over
ProductComponents and filter by price, which I'm not quite clear on as
yet...
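
For comparison, a sketch of what a single per-Product document might look
like under this second approach (again, the field names and the
ShopId/FormatId naming convention for the dynamic fields are just
illustrative assumptions on my part):

# One document per Product; per-shop data pushed into dynamic fields whose
# names encode the ShopId (and ProductFormatId for prices). The _s/_f
# suffixes are only meant to suggest typical dynamic-field patterns.
product_doc = {
    "id": "product-1001",
    "product_name": "Example Product",
    "component_name": ["Example Component A", "Example Component B"],  # multi-valued
    "shop_id": [42, 43],                                               # multi-valued
    "availability_42_s": "InStock",   # availability_<ShopId>_s
    "availability_43_s": "PreOrder",
    "price_42_3_f": 19.99,            # price_<ShopId>_<FormatId>_f (regular)
    "sale_price_42_3_f": 14.99,
    "price_43_3_f": 21.99,
}

Filtering by price/availability would then mean building the field name into
the query at search time (e.g. fq=price_42_3_f:[0 TO 20]), which only works
when the shop and format are known when the query is issued; and per-component
prices don't really fit into this single-document shape, which is where I
think the custom Solr/Lucene work would come in.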

Any guidance on which of the two general approaches (or others) to explore
further?

Thanks!
Vim
