In addition to Otis' suggestion, think about using multivalued fields with an increment gap of, say, 100 (assuming your accessories had less than 100 fields). Then you can do proximity searches with a size < 100 (e.g. "red swing"~90) would not match across your multiple entries....
If this is clear as mud, write back with what you've tried and maybe we can help Best Erick On Thu, Oct 20, 2011 at 7:23 PM, Jonathan Carothers <jonathan.caroth...@amentra.com> wrote: > Actually, that's the root of my concern. It looks like it product will > average ~20,000 associated accessories, still workable, but starting to look > painful. Coming back the other way, I would guess each accessory would be > associated with 100 products on average. > > Given that there would be searchable fields in both the product and accessory > data, I assume I would have to either split them into separate indexes and > merge the results, or have one document per product/accessory combo so that I > don't get a mix of accessories matching the search term. For example, if a > product had two accessories, one with the description of "Blue Swing" and > another with "Red Ball" and I did a search for "Red Swing" it would rank > about the same as a document that actually had a "Red Swing". > > So it sounds like you are suggesting the external map, in which case is there > a good way to merge the two searches? Basically on search on product > attributes and a second search on the attributes of related accessories? > > many thanks, > Jonathan > ________________________________________ > From: Robert Stewart [bstewart...@gmail.com] > Sent: Thursday, October 20, 2011 12:05 PM > To: solr-user@lucene.apache.org > Subject: Re: how to handle large relational data in Solr > > If your "documents" are products, then 100,000 documents is a pretty small > index for solr. Do you know approximately how many accessories are related > to each product on average? If # if relatively small (around 100 or less), > then it should be ok to create product documents with all the related > accessories as fields on the document, something like: > > <doc> > <field name="id">PRODUCT_ID</field> > <field name="name">PRODUCT_NAME</field> > <field name="accessory">accessory one</field> > <field name="accessory">accessory two</field> > .... > <field name="accessory">accessory N</field> > </doc> > > > And then you can search for products by accessory, and show accessory facets > over products, etc. > > Even if # of accessories per product is large (1000 or more), you can still > do it this way, but it may be better to store some small accessory ID as > integers instead of larger names, and maybe use some external mapping to > resolve names for search and display. > > Bob > > > On Oct 20, 2011, at 11:08 AM, Jonathan Carothers wrote: > >> Agreed, this will just be a read only view of the existing database for >> search purposes. Sorry for the confusion. >> ________________________________________ >> From: Brandon Ramirez [brandon_rami...@elementk.com] >> Sent: Thursday, October 20, 2011 10:50 AM >> To: solr-user@lucene.apache.org >> Subject: RE: how to handle large relational data in Solr >> >> I would not recommend removing your relational database altogether. You >> should treat that as your system of record. By replacing it, you are >> forcing Solr to store the unmodified value for everything even when not >> needed. You also lose normalization. And if you ever need to add some >> data to your system that isn't search-related, you have no choice but to add >> it to your search index. >> >> >> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848 >> Software Engineer II | Element K | www.elementk.com >> >> >> -----Original Message----- >> From: Jonathan Carothers [mailto:jonathan.caroth...@amentra.com] >> Sent: Thursday, October 20, 2011 10:12 AM >> To: solr-user@lucene.apache.org >> Subject: how to handle large relational data in Solr >> >> All, >> >> We are attempting to convert a fairly large relational database into Solr >> index(es). >> >> There are ~100,000 products with ~1,000,000 accessories that can be related >> to any number of the products. So if I include the search terms and the >> relationships in the same index, we're looking at a pretty huge index. >> >> If we break it out into three indexes, one for the product search, one for >> the accessories search, and one for their relationship, is there a good way >> to merge the results? >> >> Is there a better way to structure the indexes? >> >> We will have a relational database available if it makes sense to do some >> sort of a hybrid approach. >> >> many thanks, >> Jonathan >> > >