Actually, that's the root of my concern.  It looks like each product will 
average ~20,000 associated accessories, which is still workable but starting 
to look painful.  Coming back the other way, I would guess each accessory 
would be associated with 100 products on average.

Given that there would be searchable fields in both the product and accessory 
data, I assume I would have to either split them into separate indexes and 
merge the results, or have one document per product/accessory combo so that I 
don't get a mix of accessories matching the search term.  For example, if a 
product had two accessories, one with the description of "Blue Swing" and 
another with "Red Ball" and I did a search for "Red Swing" it would rank about 
the same as a document that actually had a "Red Swing".
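
To make the cross-matching concrete, here is a minimal sketch in plain Python. 
It is a toy model only: the whitespace term matching and the function names 
are assumptions for illustration, not how Solr analyzes or scores text.

```python
# Toy illustration only: naive term matching, not Solr's analysis or scoring.

def terms(text):
    """Lowercase a string and split it into a set of whitespace terms."""
    return set(text.lower().split())


def matches_flattened(query, accessory_descriptions):
    """All accessory text pooled on one product document: each query term
    only has to appear somewhere among the product's accessories."""
    pooled = set()
    for description in accessory_descriptions:
        pooled |= terms(description)
    return terms(query) <= pooled


def matches_per_accessory(query, accessory_descriptions):
    """One document per product/accessory combo: all query terms must appear
    within a single accessory description."""
    query_terms = terms(query)
    return any(query_terms <= terms(d) for d in accessory_descriptions)


product_a = ["Blue Swing", "Red Ball"]   # no real "Red Swing" here
product_b = ["Red Swing"]

print(matches_flattened("Red Swing", product_a))      # True (false positive)
print(matches_per_accessory("Red Swing", product_a))  # False
print(matches_per_accessory("Red Swing", product_b))  # True
```

The pooled version matches product_a even though neither of its accessories 
is a red swing, which is the mix I would like to avoid.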

So it sounds like you are suggesting the external map, in which case is there a 
good way to merge the two searches?  Basically one search on product attributes 
and a second search on the attributes of related accessories?
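
One possible way to do that merge on the client side, sketched in Python. 
Everything here is an assumption for illustration: merge_results, the 
(id, score) hit tuples, and the accessory_to_products dict standing in for the 
external map are hypothetical and not Solr APIs.  Adding raw scores from two 
separate queries is also a simplification, since scores from different queries 
are not generally comparable.

```python
from collections import defaultdict


def merge_results(product_hits, accessory_hits, accessory_to_products):
    """Combine two result lists into one product ranking.

    product_hits:          list of (product_id, score) from the product search
    accessory_hits:        list of (accessory_id, score) from the accessory search
    accessory_to_products: external map, accessory_id -> iterable of product_ids
    """
    combined = defaultdict(float)
    for product_id, score in product_hits:
        combined[product_id] += score

    # Credit each product with its single best-matching accessory only, so a
    # product with thousands of weak matches does not outrank a strong one.
    best_accessory = {}
    for accessory_id, score in accessory_hits:
        for product_id in accessory_to_products.get(accessory_id, ()):
            if score > best_accessory.get(product_id, 0.0):
                best_accessory[product_id] = score

    for product_id, score in best_accessory.items():
        combined[product_id] += score

    # Highest combined score first; ties broken by product id for stability.
    return sorted(combined.items(), key=lambda item: (-item[1], item[0]))


product_hits = [("P1", 1.0), ("P2", 0.5)]
accessory_hits = [("A1", 2.0), ("A2", 1.0)]
accessory_to_products = {"A1": ["P2"], "A2": ["P2", "P3"]}

print(merge_results(product_hits, accessory_hits, accessory_to_products))
# [('P2', 2.5), ('P1', 1.0), ('P3', 1.0)]
```

Taking the best accessory per product rather than summing all of them is a 
design choice in this sketch, not something the thread has settled on.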

many thanks,
Jonathan
________________________________________
From: Robert Stewart [bstewart...@gmail.com]
Sent: Thursday, October 20, 2011 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: how to handle large relational data in Solr

If your "documents" are products, then 100,000 documents is a pretty small 
index for solr.  Do you know approximately how many accessories are related to 
each product on average?  If that number is relatively small (around 100 or less), then 
it should be ok to create product documents with all the related accessories as 
fields on the document, something like:

<doc>
        <field name="id">PRODUCT_ID</field>
        <field name="name">PRODUCT_NAME</field>
        <field name="accessory">accessory one</field>
        <field name="accessory">accessory two</field>
        ....
        <field name="accessory">accessory N</field>
</doc>


And then you can search for products by accessory, and show accessory facets 
over products, etc.

Even if the number of accessories per product is large (1000 or more), you can 
still do it this way, but it may be better to store small integer accessory IDs 
instead of longer names, and maybe use some external mapping to resolve names 
for search and display.
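
A rough sketch of that ID-based variant in Python, under stated assumptions: 
the accessory_id field name, the accessory_names dict (standing in for the 
external mapping), and both helper functions are hypothetical, and the name 
lookup is a plain substring match rather than anything Solr does.

```python
import xml.etree.ElementTree as ET


def product_doc(product_id, name, accessory_ids):
    """Build a Solr-style <doc> with one accessory_id field per accessory."""
    doc = ET.Element("doc")
    ET.SubElement(doc, "field", name="id").text = str(product_id)
    ET.SubElement(doc, "field", name="name").text = name
    for accessory_id in accessory_ids:
        # "accessory_id" is an assumed field name, not from the original schema.
        ET.SubElement(doc, "field", name="accessory_id").text = str(accessory_id)
    return ET.tostring(doc, encoding="unicode")


def accessory_query(search_term, accessory_names):
    """Resolve a name search to IDs via the external map, then build a query
    string over the accessory_id field. Returns None if nothing matches."""
    needle = search_term.lower()
    ids = sorted(
        accessory_id
        for accessory_id, accessory_name in accessory_names.items()
        if needle in accessory_name.lower()
    )
    if not ids:
        return None
    return "accessory_id:(" + " OR ".join(str(i) for i in ids) + ")"


# External mapping kept outside the index (for example, in the database).
accessory_names = {101: "Red Swing", 205: "Blue Swing", 300: "Red Ball"}

print(product_doc("PRODUCT_ID", "PRODUCT_NAME", [101, 300]))
print(accessory_query("swing", accessory_names))
# accessory_id:(101 OR 205)
```

With very common search terms the OR list could get long, so the lookup side 
would likely need a cap or its own index.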

Bob


On Oct 20, 2011, at 11:08 AM, Jonathan Carothers wrote:

> Agreed, this will just be a read only view of the existing database for 
> search purposes.  Sorry for the confusion.
> ________________________________________
> From: Brandon Ramirez [brandon_rami...@elementk.com]
> Sent: Thursday, October 20, 2011 10:50 AM
> To: solr-user@lucene.apache.org
> Subject: RE: how to handle large relational data in Solr
>
> I would not recommend removing your relational database altogether.  You 
> should treat that as your system of record.  By replacing it, you are forcing 
> Solr to store the unmodified value for everything even when not needed.  You 
> also lose normalization.   And if you ever need to add some data to your 
> system that isn't search-related, you have no choice but to add it to your 
> search index.
>
>
> Brandon Ramirez | Office: 585.214.5413 | Fax: 585.295.4848
> Software Engineer II | Element K | www.elementk.com
>
>
> -----Original Message-----
> From: Jonathan Carothers [mailto:jonathan.caroth...@amentra.com]
> Sent: Thursday, October 20, 2011 10:12 AM
> To: solr-user@lucene.apache.org
> Subject: how to handle large relational data in Solr
>
> All,
>
> We are attempting to convert a fairly large relational database into Solr 
> index(es).
>
> There are ~100,000 products with ~1,000,000 accessories that can be related 
> to any number of the products.  So if I include the search terms and the 
> relationships in the same index, we're looking at a pretty huge index.
>
> If we break it out into three indexes, one for the product search, one for 
> the accessories search, and one for their relationship, is there a good way 
> to merge the results?
>
> Is there a better way to structure the indexes?
>
> We will have a relational database available if it makes sense to do some 
> sort of a hybrid approach.
>
> many thanks,
> Jonathan
>
