Re: Data Modeling

2009-08-19 Thread Smiley, David W.
This is the sort of Solr fundamentals question my book (chapter 2) will help 
you with.

Think about what your user interface is.  What are users searching for?  That 
is, what exactly comes back from search results?  It's not clear from your 
description what your search scenario is.

~ David Smiley
 Author: http://www.packtpub.com/solr-1-4-enterprise-search-server



On 8/19/09 10:31 AM, Vladimir Landman v...@northernautoparts.com wrote:

Hi,

I am trying to create a schema for Solr.   Here is a relational model of what 
our data might look like:

Inventory
-
Sku
Price
Weight

Attributes
---
AttributeName
AttributeValue

Applications
--
Id (Auto-Incrementing)
Sku
VehicleYear
VehicleMake
VehicleModel
VehicleEngine

There can be multiple Application records per part.  Attributes can also have 
duplicates.  Basically I want to store basic information about our inventory, 
attributes, and applications.  If I didn't have the applications, 
I would simply have:
<field name="id" ... />
<field name="sku" ... />
<field name="price" ... />
<field name="weight" ... />
<!-- Attributes -->
<field name="OilPumpVolume" ... />
<field name="FuelType" ... />

Since one part might have 3 or 4 attributes but 100 applications, I want to 
avoid ending up with 400 records for that one part, but maybe that is just what I will have to do.

I appreciate any help.
--
Vladimir Landman
Northern Auto Parts





RE: Data Modeling

2009-08-19 Thread Smiley, David W.
It's getting clearer, Vladimir.  So fundamentally your users are searching for 
products (apparently auto parts), and the different attributes would become 
navigation filters.  If this is right, then your initial schema (the first 
email) is a start, although it's a little ambiguous to interpret because id 
and sku are overloaded.  Your schema would contain a part id, the part's 
sku, and a field for each attribute you mentioned.  I recommend using Solr's 
dynamic fields to define those so that you don't have to explicitly define in 
the schema every attribute you'll ever think of for every part.  The word 
"application" was totally throwing me, but now I believe you mean a vehicle, 
and an auto part is going to work on multiple vehicles.  In Solr, you're going 
to denormalize this related data by inlining the vehicle information (aka 
application) into each document, which is an auto part. ...
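A minimal sketch of that denormalization, using the example rows from Vladimir's follow-up. Several things here are assumptions, not from the thread: the second Inventory column is taken to be the price, the SKU doubles as the unique id, and the `attr_<name>_s` and `vehicle_*` field names are hypothetical, chosen so the attribute fields could be matched by a dynamic field pattern such as `*_s`. The point is that each part becomes exactly one document no matter how many attributes or applications it has.

```python
# Example rows from Vladimir's follow-up message. The second Inventory
# column is ASSUMED to be the price; the thread does not say.
INVENTORY = [("ABC", 10), ("DEF", 15)]
ATTRIBUTES = [
    ("ABC", "Brand", "ACME Brand"),
    ("ABC", "Water Pump Style", "Short"),
    ("DEF", "Brand", "Engine Builders"),
    ("DEF", "Water Pump Style", "Long"),
]
APPLICATIONS = [
    ("ABC", 1999, "Toyota", "Camry", "3.1L"),
    ("ABC", 2000, "Toyota", "Camry", "3.1L"),
    ("DEF", 1997, "Ford", "Focus", "2.5L"),
    ("DEF", 1998, "Ford", "Focus", "2.5L"),
]


def attr_field(name):
    """Hypothetical naming convention, e.g. 'Water Pump Style' ->
    'attr_water_pump_style_s', meant to match a dynamic field like *_s."""
    return "attr_" + name.lower().replace(" ", "_") + "_s"


def build_docs(inventory, attributes, applications):
    """Denormalize the three tables into one document per part (SKU)."""
    # Using the SKU as the unique id is an assumption for this sketch.
    docs = {sku: {"id": sku, "sku": sku, "price": price} for sku, price in inventory}
    for sku, name, value in attributes:
        # Attributes may repeat, so every attribute field is multi-valued.
        docs[sku].setdefault(attr_field(name), []).append(value)
    for sku, year, make, model, engine in applications:
        doc = docs[sku]
        # Vehicle data is inlined as multi-valued fields. Solr does not keep
        # the i-th year paired with the i-th make when filtering on them.
        for field, value in (("vehicle_year", year), ("vehicle_make", make),
                             ("vehicle_model", model), ("vehicle_engine", engine)):
            doc.setdefault(field, []).append(value)
    return list(docs.values())


for doc in build_docs(INVENTORY, ATTRIBUTES, APPLICATIONS):
    print(doc)
```

With 100 applications a part would simply carry 100 values in each `vehicle_*` field rather than producing 100 (or 400) separate records.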

I think you have a couple of approaches to that.

Firstly, I observe that when I'm shopping for autos or for auto parts, I am 
guided through a user interface to pick my precise vehicle.  THEN I see related 
products.  This is straightforward: you would not use Solr for the vehicle 
selection; put this information in your database and build an easy app to 
navigate to a specific vehicle to get the vehicle identifier.  You *could* use 
Solr for this, but it'd be in a separate index/core, or you would have to use 
multiple document types in your schema (my book has more info on these 
approaches).  Once you have the vehicle identifier, you would look up documents 
in Solr (aka auto parts) that have this vehicle identifier.  It'd be a 
multi-valued untokenized field, and this would be the only vehicle info needed 
in your schema.
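A rough sketch of this first approach, under stated assumptions: the `vehicle_id` field name is hypothetical (it would be a multi-valued string, i.e. untokenized, field in the schema), and the `year|make|model|engine` id scheme is only a stand-in for whatever key the vehicle table in the database actually uses. `q` and `fq` are Solr's standard query and filter-query parameters; the in-memory function just mimics what the filter would match.

```python
from urllib.parse import urlencode

# Vehicle application rows from Vladimir's example: (sku, year, make, model, engine).
APPLICATIONS = [
    ("ABC", 1999, "Toyota", "Camry", "3.1L"),
    ("ABC", 2000, "Toyota", "Camry", "3.1L"),
    ("DEF", 1997, "Ford", "Focus", "2.5L"),
    ("DEF", 1998, "Ford", "Focus", "2.5L"),
]


def vehicle_id(year, make, model, engine):
    """Hypothetical id scheme; really this would be the vehicle's database key."""
    return f"{year}|{make}|{model}|{engine}"


def vehicle_ids_by_sku(rows):
    """Values for each part's multi-valued, untokenized vehicle_id field."""
    ids = {}
    for sku, year, make, model, engine in rows:
        ids.setdefault(sku, []).append(vehicle_id(year, make, model, engine))
    return ids


def parts_query(vid):
    """Query string for Solr's select handler: all parts, filtered to those
    whose vehicle_id field holds this exact value (quoted as one term)."""
    return urlencode({"q": "*:*", "fq": f'vehicle_id:"{vid}"'})


def parts_for_vehicle(ids_by_sku, vid):
    """In-memory stand-in for what that filter query would match."""
    return [sku for sku, vids in ids_by_sku.items() if vid in vids]


ids = vehicle_ids_by_sku(APPLICATIONS)
camry_99 = vehicle_id(1999, "Toyota", "Camry", "3.1L")
print(parts_query(camry_99))
print(parts_for_vehicle(ids, camry_99))  # ['ABC']
```

Because the vehicle is already fully resolved before Solr is involved, a single exact-match filter on one field is all the part index needs to know about vehicles.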

The other approach would be necessary if you want to dynamically filter a list 
of parts by *partial* vehicle choices; for example, picking Porsche and 2001 
would give you parts that will work on a Boxster or a Carrera made in 2001.  
Doing this correctly is tricky for Solr and its non-relational schema because 
there are multiple vehicle attributes and an auto part is associated with 
multiple vehicles.  I'll advise more if you need to do this, but hopefully you 
won't need to.  It's a bit advanced and complicated.
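To illustrate why this is tricky, here is a small sketch with entirely hypothetical parts (XYZ and QRS are invented, and the engine is left out for brevity). With separate multi-valued year and make fields, a part that fits a 1999 Porsche and a 2001 Toyota still matches "2001 + Porsche", because each value only has to appear on *some* vehicle. One common workaround, shown here only as an illustration and not necessarily what David has in mind, is to index one combined token per subset of vehicle fields into a single multi-valued string field and filter on the combined token for the user's partial choice. The cost is 2^n - 1 tokens per vehicle for n fields.

```python
from itertools import combinations

FIELDS = ("year", "make", "model")  # engine omitted to keep the example short

# Hypothetical data: XYZ fits a 1999 Porsche Boxster and a 2001 Toyota Camry
# (but no 2001 Porsche); QRS fits a 2001 Porsche Carrera.
APPLICATIONS = [
    ("XYZ", {"year": 1999, "make": "Porsche", "model": "Boxster"}),
    ("XYZ", {"year": 2001, "make": "Toyota", "model": "Camry"}),
    ("QRS", {"year": 2001, "make": "Porsche", "model": "Carrera"}),
]


def vehicles(rows, sku):
    return [v for s, v in rows if s == sku]


def naive_match(rows, sku, selection):
    """Separate multi-valued fields: each chosen value only has to appear on
    some vehicle of the part, not necessarily on the same vehicle."""
    vs = vehicles(rows, sku)
    return all(any(v[f] == val for v in vs) for f, val in selection.items())


def combo_tokens(vehicle):
    """One token per non-empty subset of FIELDS, always in FIELDS order,
    e.g. 'year=2001|make=Porsche'."""
    return [
        "|".join(f"{f}={vehicle[f]}" for f in subset)
        for r in range(1, len(FIELDS) + 1)
        for subset in combinations(FIELDS, r)
    ]


def selection_token(selection):
    """The single token to filter on for a partial choice, same field order."""
    return "|".join(f"{f}={selection[f]}" for f in FIELDS if f in selection)


def combo_match(rows, sku, selection):
    tokens = {t for v in vehicles(rows, sku) for t in combo_tokens(v)}
    return selection_token(selection) in tokens


choice = {"make": "Porsche", "year": 2001}
for sku in ("XYZ", "QRS"):
    print(sku, naive_match(APPLICATIONS, sku, choice), combo_match(APPLICATIONS, sku, choice))
# XYZ True False   (false positive with separate fields)
# QRS True True
```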

~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server

From: Vladimir Landman [v...@northernautoparts.com]
Sent: Wednesday, August 19, 2009 4:01 PM
To: solr-user@lucene.apache.org
Subject: FW: Data Modeling

I hit reply and sent this to just David, but I think it should go to the whole 
list:

Hi David,

I want to do 2 kinds of things with Solr, maybe 3 in the future.

1. I want to use it on our website so that a customer can filter down products 
by different attributes.  So suppose we have:

Inventory
---
ABC, 10
DEF, 15

Attributes
---
ABC, Brand, ACME Brand
ABC, Water Pump Style, Short
DEF, Brand, Engine Builders
DEF, Water Pump Style, Long


Vehicle Applications
---
ABC, 1999, Toyota, Camry, 3.1L
ABC, 2000, Toyota, Camry, 3.1L
DEF, 1997, Ford, Focus, 2.5L
DEF, 1998, Ford, Focus, 2.5L

I would like to be able to handle two things:

1. Give the person a list of all the unique years.  When they pick one, show 
them all the Makes for that year.  When they pick that, show all the Models.

Alternatively:
1. Give them a list of makes, then models, then engine, etc...
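Both of these cascading pick-lists amount to the same operation: restrict to what matches the choices made so far, then list the distinct values of the next field. In Solr terms that would be an `fq` per earlier pick plus `facet=true&facet.field=<next field>`. The sketch below simulates it in memory over the application rows above (as if they lived in the database or a separate vehicle index), so each year/make/model combination stays intact; counts here are application rows. Field order is a parameter, so year-first and make-first both work.

```python
from collections import Counter

FIELDS = ("year", "make", "model", "engine")

# Vehicle application rows from the example above: (sku, year, make, model, engine).
APPLICATIONS = [
    ("ABC", 1999, "Toyota", "Camry", "3.1L"),
    ("ABC", 2000, "Toyota", "Camry", "3.1L"),
    ("DEF", 1997, "Ford", "Focus", "2.5L"),
    ("DEF", 1998, "Ford", "Focus", "2.5L"),
]


def next_choices(rows, selected, next_field):
    """Distinct values of next_field, with row counts, among rows that agree
    with every choice made so far (like fq on earlier picks plus facet.field
    on the next level)."""
    col = {f: i + 1 for i, f in enumerate(FIELDS)}  # column 0 is the sku
    counts = Counter(
        row[col[next_field]]
        for row in rows
        if all(row[col[f]] == v for f, v in selected.items())
    )
    return sorted(counts.items())


# Year first, then make, then model...
print(next_choices(APPLICATIONS, {}, "year"))
print(next_choices(APPLICATIONS, {"year": 1999}, "make"))
print(next_choices(APPLICATIONS, {"year": 1999, "make": "Toyota"}, "model"))
# ...or make first.
print(next_choices(APPLICATIONS, {}, "make"))
```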

Also, it would be nice if I could give Solr a Part# (Sku) and have it return all 
the attributes for that sku.  I'd also love to be able to drill down by the 
attributes such as Brand, Water Pump Style, etc.
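A minimal in-memory sketch of both requests, assuming one document per part with multi-valued attribute fields (the `brand` and `water_pump_style` field names are hypothetical). Looking up a SKU corresponds to a query like `q=sku:ABC` returning the stored fields; drilling down corresponds to an `fq` per chosen attribute value plus `facet.field` on the attributes still to offer. Facet counts are per document, so each part is counted at most once per value even if an attribute value was duplicated.

```python
from collections import Counter

# Hypothetical part documents, one per SKU; attribute fields are multi-valued.
DOCS = [
    {"sku": "ABC", "brand": ["ACME Brand"], "water_pump_style": ["Short"]},
    {"sku": "DEF", "brand": ["Engine Builders"], "water_pump_style": ["Long"]},
]


def attributes_for(docs, sku):
    """Like querying q=sku:<sku> and reading back the stored fields."""
    for doc in docs:
        if doc["sku"] == sku:
            return {k: v for k, v in doc.items() if k != "sku"}
    return None


def facet_counts(docs, filters, facet_fields):
    """Like one fq=<field>:<value> per filter plus facet.field per facet
    field: for each value, the number of matching parts that have it."""
    matching = [
        d for d in docs if all(value in d.get(f, []) for f, value in filters.items())
    ]
    return {
        f: dict(Counter(v for d in matching for v in set(d.get(f, []))))
        for f in facet_fields
    }


print(attributes_for(DOCS, "ABC"))
print(facet_counts(DOCS, {}, ["brand", "water_pump_style"]))
print(facet_counts(DOCS, {"water_pump_style": "Long"}, ["brand"]))
```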

Please let me know if this email is still not clear...



--
Vladimir Landman
Northern Auto Parts


From: Smiley, David W. [mailto:dsmi...@mitre.org]
Sent: 2009-08-19 10:42 AM
To: solr; Vladimir Landman
Subject: Re: Data Modeling
