Actually, that's the root of my concern. It looks like each product will average ~20,000 associated accessories, still workable, but starting to look painful. Coming back the other way, I would guess each accessory would be associated with 100 products on average.
Given that there would be searchable fields in both the product and accessory data, I assume I would have to either split them into separate indexes and merge the results, or have one document per product/accessory combo so that I don't get a mix of accessories matching the search term. For example, if a product had two accessories, one described as "Blue Swing" and another as "Red Ball", and I searched for "Red Swing", it would rank about the same as a document that actually had a "Red Swing".

So it sounds like you are suggesting the external map, in which case is there a good way to merge the two searches? Basically, one search on product attributes and a second search on the attributes of related accessories?

many thanks,
Jonathan
________________________________________
From: Robert Stewart [bstewart...@gmail.com]
Sent: Thursday, October 20, 2011 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: how to handle large relational data in Solr

If your "documents" are products, then 100,000 documents is a pretty small index for Solr. Do you know approximately how many accessories are related to each product on average? If the number is relatively small (around 100 or less), then it should be OK to create product documents with all the related accessories as fields on the document, something like:

<doc>
  <field name="id">PRODUCT_ID</field>
  <field name="name">PRODUCT_NAME</field>
  <field name="accessory">accessory one</field>
  <field name="accessory">accessory two</field>
  ....
  <field name="accessory">accessory N</field>
</doc>

And then you can search for products by accessory, and show accessory facets over products, etc.

Even if the number of accessories per product is large (1000 or more), you can still do it this way, but it may be better to store small integer accessory IDs instead of larger names, and maybe use some external mapping to resolve names for search and display.

Bob

On Oct 20, 2011, at 11:08 AM, Jonathan Carothers wrote:

> Agreed, this will just be a read only view of the existing database for
> search purposes. Sorry for the confusion.
> ________________________________________
> From: Brandon Ramirez [brandon_rami...@elementk.com]
> Sent: Thursday, October 20, 2011 10:50 AM
> To: solr-user@lucene.apache.org
> Subject: RE: how to handle large relational data in Solr
>
> I would not recommend removing your relational database altogether. You
> should treat that as your system of record. By replacing it, you are forcing
> Solr to store the unmodified value for everything even when not needed. You
> also lose normalization. And if you ever need to add some data to your
> system that isn't search-related, you have no choice but to add it to your
> search index.
>
>
> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
> Software Engineer II | Element K | www.elementk.com
>
>
> -----Original Message-----
> From: Jonathan Carothers [mailto:jonathan.caroth...@amentra.com]
> Sent: Thursday, October 20, 2011 10:12 AM
> To: solr-user@lucene.apache.org
> Subject: how to handle large relational data in Solr
>
> All,
>
> We are attempting to convert a fairly large relational database into Solr
> index(es).
>
> There are ~100,000 products with ~1,000,000 accessories that can be related
> to any number of the products. So if I include the search terms and the
> relationships in the same index, we're looking at a pretty huge index.
>
> If we break it out into three indexes, one for the product search, one for
> the accessories search, and one for their relationship, is there a good way
> to merge the results?
>
> Is there a better way to structure the indexes?
>
> We will have a relational database available if it makes sense to do some
> sort of a hybrid approach.
>
> many thanks,
> Jonathan
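
For the question raised at the top of this thread (one search on product attributes, a second on accessory attributes, joined through an external map), a minimal sketch of the merge step might look like the following. This is not Solr API code: it assumes the two searches have already been run and reduced to plain id-to-score dictionaries. The function name merge_results, the accessory_weight parameter, and all sample IDs and scores are hypothetical. Crediting each product with only its single best-matching accessory is one possible way to avoid the "Blue Swing" plus "Red Ball" mix described above, since each accessory is scored as its own document.

```python
from collections import defaultdict


def merge_results(product_hits, accessory_hits, accessory_to_products,
                  accessory_weight=0.5):
    """Merge two result sets into one ranked list of products.

    product_hits:          {product_id: score} from the product search
    accessory_hits:        {accessory_id: score} from the accessory search
    accessory_to_products: {accessory_id: [product_id, ...]} external map
    accessory_weight:      hypothetical factor for accessory matches
    """
    combined = defaultdict(float)
    for pid, score in product_hits.items():
        combined[pid] += score

    # Credit each product with its single best-matching accessory only,
    # so partial matches on several accessories are not added together.
    best_accessory = {}
    for aid, score in accessory_hits.items():
        for pid in accessory_to_products.get(aid, ()):
            if score > best_accessory.get(pid, 0.0):
                best_accessory[pid] = score

    for pid, score in best_accessory.items():
        combined[pid] += accessory_weight * score

    # Highest score first; ties broken by product id for stable output.
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


# Hypothetical sample data standing in for the two search responses.
product_hits = {"P1": 1.0, "P2": 0.5}
accessory_hits = {"A1": 2.0, "A2": 0.5}
accessory_to_products = {"A1": ["P2", "P3"], "A2": ["P1"]}

ranked = merge_results(product_hits, accessory_hits, accessory_to_products)
for pid, score in ranked:
    print(pid, score)
```

With roughly 100 products per accessory, the fan-out through the map stays modest as long as the accessory search is limited to its top hits; how to weight accessory matches against product matches is a tuning decision the thread leaves open.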