Otis Gospodnetic wrote:
Geoff,

I'm not sure if I understood your problem correctly, but it sounds
like you want your search to be restricted to authors, but then you
want to list all of his/her books when displaying results.

that's about right. add that I may also want to search on libraries and show all the books (and authors) stored there.

in real life, it's not books or authors, of course, but the parallels are close enough :) in fact, the library example is a good one for me... or at least a network of public libraries linked together.

The easiest thing to do would be to create an index where each
"row"/Document has the author name, the book title, etc.  For each
author-matching Document you'd pull his/her books out of the result
set.  Yes, this means the author name would be denormalized in
RDBMS-speak.
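
A rough sketch of that denormalized layout, in plain Python rather than Solr itself. The field names (author_name, book_title), the sample data, and the substring "search" are all made up for illustration; the real index would do the matching.

```python
from collections import defaultdict

# Hypothetical sample data; names and ids are illustrative only.
authors = {1: {"name": "Jane Doe"}, 2: {"name": "John Roe"}}
books = [
    {"id": 10, "author_id": 1, "title": "First Book"},
    {"id": 11, "author_id": 1, "title": "Second Book"},
    {"id": 12, "author_id": 2, "title": "Other Book"},
]


def flatten(authors, books):
    """Build one flat document per book, repeating the author name on each."""
    return [
        {
            "id": f"book-{b['id']}",
            "author_id": b["author_id"],
            "author_name": authors[b["author_id"]]["name"],  # denormalized copy
            "book_title": b["title"],
        }
        for b in books
    ]


def search_by_author(docs, term):
    """Stand-in for the search engine: case-insensitive substring match."""
    term = term.lower()
    return [d for d in docs if term in d["author_name"].lower()]


def group_by_author(hits):
    """Pull each matching author's books out of the flat result set."""
    grouped = defaultdict(list)
    for d in hits:
        grouped[d["author_name"]].append(d["book_title"])
    return dict(grouped)


docs = flatten(authors, books)
result = group_by_author(search_by_author(docs, "jane"))
print(result)
```

The grouping step happens on the client side here, after the search returns; the author name is simply repeated on every book document.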

I think I can live with the denormalization - it seems lucene is flat and very different conceptually than a database :)

the trouble I'm having is one of dimension. an author has many, many attributes (name, birthdate, biography in $language, etc). as does each book (title in $language, summary in $language, genre, etc). as does each library (name, address, directions in $language, etc). so an author with N books doesn't seem to scale very well in the flat representations I'm finding in all the lucene/solr docs and examples... at least not in some way I can wrap my head around.

part of what seemed really appealing about lucene in general was that you could stuff all this (unindexed) information into a document and retrieve it all based on some search criteria. but I'm finding it very difficult to wrap my head around how to represent the data I need.

Another option is not to index/store book titles, but
rather have only an author index to search against.  The book data
(mapped to author identities) would then be pulled from an external
source (e.g. RDBMS: select title from books where author_id in
(1,2,3)) at search results display time.
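
A minimal sketch of that second option, assuming the author search has already returned a list of ids. The books table, its columns, and the sample rows are hypothetical, and an in-memory SQLite database stands in for the real RDBMS.

```python
import sqlite3

# In-memory stand-in for the external RDBMS; schema and rows are illustrative.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table books (id integer primary key, author_id integer, title text)"
)
conn.executemany(
    "insert into books (author_id, title) values (?, ?)",
    [
        (1, "First Book"),
        (1, "Second Book"),
        (2, "Other Book"),
        (4, "Unmatched Book"),
    ],
)


def titles_for_authors(conn, author_ids):
    """Fetch book titles for the matched authors, keyed by author id."""
    if not author_ids:
        return {}
    # One "?" placeholder per id, so the values are bound rather than interpolated.
    placeholders = ",".join("?" for _ in author_ids)
    rows = conn.execute(
        "select author_id, title from books "
        f"where author_id in ({placeholders}) order by author_id, id",
        list(author_ids),
    )
    titles = {}
    for author_id, title in rows:
        titles.setdefault(author_id, []).append(title)
    return titles


# Hypothetical ids, as if returned by a search against the author-only index.
matched_author_ids = [1, 2, 3]
titles = titles_for_authors(conn, matched_author_ids)
print(titles)
```

Authors with no rows in the table (id 3 here) simply do not appear in the returned mapping.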

eew :) seriously, though, that's what we have now - all rdbms driven. if solr could only conceptually handle the initial lookup there wouldn't be much point.

maybe I'm thinking about this all wrong (as is to be expected :), but I just can't believe that nobody is using solr to represent data a bit more complex than the examples out there.

thanks for the feedback.

--Geoff


Otis

-- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message ----
From: Geoffrey Young <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, March 11, 2008 12:17:32 PM
Subject: schema help

hi :)

I'm trying to work out a schema for our widgets.  more than "just
coming up with something" I'd like something idiomatic in solr terms.
any help is much appreciated.  here's a similar problem space to what
I'm working with...

let's say we're talking books.  books are written by authors and held
in libraries.  a sister company is using lucene+compass and they seem
to have completely different collections (or whatever the technical
term is :)

  authors
  books
  libraries

so that a search for authors hits only the authors dataset.

all of the solr examples I can find don't seem to address this kind
of data disparity.  what is the standard and idiomatic approach for
solr?

for my particular data I'd want to display something like this

  author
    book in library
    book in library

on the same result page, but using a completely flat, single schema doesn't seem to scale very well.

collective wisdom most welcome :)

--Geoff

