Re: schema help

Geoffrey Young Tue, 11 Mar 2008 13:55:52 -0700


Otis Gospodnetic wrote:

Geoff,

I'm not sure if I understood your problem correctly, but it sounds
like you want your search to be restricted to authors, but then you
want to list all of his/her books when displaying results.

that's about right. add that I may also want to search on libraries and show all the books (and authors) stored there.

in real life, it's not books or authors, of course, but the parallels are close enough :) in fact, the library example is a good one for me... or at least a network of public libraries linked together.

The easiest thing to do would be to create an index where each
"row"/Document has the author name, the book title, etc.  For each
author-matching Document you'd pull his/her books out of the result
set.  Yes, this means the author name would be denormalized in
RDBMS-speak.
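
A minimal sketch of the flat, denormalized layout described above, in plain Python rather than against a real Solr index. The field names ("author", "title") and the sample rows are assumptions made up for illustration; they are not from the thread or from any Solr schema.

```python
from collections import defaultdict

# Hypothetical flat "documents": one row per book, with the author name
# repeated (denormalized) on every row. Field names are illustrative only.
docs = [
    {"author": "Jane Austen", "title": "Emma"},
    {"author": "Jane Austen", "title": "Persuasion"},
    {"author": "Mark Twain", "title": "Roughing It"},
]


def search_authors(docs, query):
    """Match on the author field only, then group the matching rows by author."""
    grouped = defaultdict(list)
    for doc in docs:
        if query.lower() in doc["author"].lower():
            grouped[doc["author"]].append(doc["title"])
    return dict(grouped)


results = search_authors(docs, "austen")
for author, titles in results.items():
    print(author, titles)
```

The search only looks at the author field, but every matching row already carries a book title, so the book list falls out of the result set without a second lookup.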

I think I can live with the denormalization - it seems lucene is flat and very different conceptually than a database :)

the trouble I'm having is one of dimension. an author has many, many attributes (name, birthdate, biography in $language, etc). as does each book (title in $language, summary in $language, genre, etc). as does each library (name, address, directions in $language, etc). so an author with N books doesn't seem to scale very well in the flat representations I'm finding in all the lucene/solr docs and examples... at least not in some way I can wrap my head around.

part of what seemed really appealing about lucene in general was that you could stuff all this (unindexed) information into a document and retrieve it all based on some search criteria. but it's seeming very difficult for me to wrap my head around the data I need to represent.

Another option is not to index/store book titles, but
rather have only an author index to search against.  The book data
(mapped to author identities) would then be pulled from an external
source (e.g. RDBMS: select title from books where author_id in
(1,2,3)) at search results display time.
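
A rough sketch of that second option, assuming the search step has already returned a list of author ids. It uses an in-memory SQLite database as a stand-in for the RDBMS; the table layout and sample rows are assumptions, and only the shape of the query comes from the message above.

```python
import sqlite3


def titles_for_authors(conn, author_ids):
    """Fetch book titles for the author ids returned by the search step."""
    if not author_ids:
        return {}
    # One "?" placeholder per id, so the ids are bound rather than interpolated.
    placeholders = ",".join("?" for _ in author_ids)
    rows = conn.execute(
        "SELECT author_id, title FROM books "
        f"WHERE author_id IN ({placeholders}) ORDER BY author_id, title",
        list(author_ids),
    ).fetchall()
    by_author = {}
    for author_id, title in rows:
        by_author.setdefault(author_id, []).append(title)
    return by_author


# Hypothetical stand-in for the external database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE books (author_id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO books VALUES (?, ?)",
    [(1, "Emma"), (1, "Persuasion"), (2, "Roughing It"), (4, "Walden")],
)

books = titles_for_authors(conn, [1, 2, 3])
print(books)
```

Author 3 has no rows in this sample data, so it is simply absent from the result; the display code would need to handle that case.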

eew :) seriously, though, that's what we have now - all rdbms driven. if solr could only conceptually handle the initial lookup there wouldn't be much point.

maybe I'm thinking about this all wrong (as is to be expected :), but I just can't believe that nobody is using solr to represent data a bit more complex than the examples out there.


thanks for the feedback.

--Geoff


Otis

-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Geoffrey Young <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 11, 2008 12:17:32 PM
Subject: schema help

hi :)

I'm trying to work out a schema for our widgets.  more than "just
coming up with something" I'd like something idiomatic in solr terms.
any help is much appreciated.  here's a similar problem space to what
I'm working with...

let's say we're talking books.  books are written by authors and held
in libraries.  a sister company is using lucene+compass and they seem
to have completely different collections (or whatever the technical
term is :)

  authors
  books
  libraries

so that a search for authors hits only the authors dataset.

all of the solr examples I can find don't seem to address this kind
of data disparity.  what is the standard and idiomatic approach for
solr?

for my particular data I'd want to display something like this

  author
    book in library
    book in library

on the same result page, but using a completely flat, single schema doesn't seem to scale very well.


collective wisdom most welcome :)

--Geoff
