Re: schema help

Rachel McConnell Tue, 11 Mar 2008 14:15:11 -0700

Our Solr use consists of several rather different data types, some of
which have one-to-many relationships with other types.  We don't need
to do any searching of quite the kind you describe, but I have an idea
about it, depending on what you need to do with the book data.  It is
rather hacky, but maybe you can improve it.


If you only need to present a list of books, possibly with links to
fuller data, you could do this:
* store only Authors in solr
* create a field, stored but not indexed (I may be using slightly
wrong terms here) which contains the short text representation of all
their books
* search on authors however you want and make sure you return this
field, and just display it as is

For example, if Jane Doe has written 2 books, How To Garden, and
Fields Of Maine, your special field might contain this:

<a href="link/to/How-To-Garden/">How To Garden</a> published on DATE,
describes how to garden in Jane Doe's inimitable fashion.  She goes
into great depth ....

<a href="link/to/Fields-Of-Maine/">Fields of Maine</a> published on
DATE.  A brief overview of Maine's woods and fields with special
attention to wildflowers....

If your 'authors' 'write' 'books' with great frequency, you'd need to
update a lot...
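
A minimal sketch of how that special field could be built before the
author document is sent to Solr.  The field names (id, name,
bookSummary) and the shape of the book records are made up for
illustration; in schema.xml the bookSummary field would be declared
with stored="true" and indexed="false".

```python
from html import escape


def render_book_summary(books):
    """Build the display-only text for one author's books.

    `books` is a list of dicts with the hypothetical keys
    url, title, published, and blurb.
    """
    parts = []
    for book in books:
        link = '<a href="{}">{}</a>'.format(
            escape(book["url"], quote=True), escape(book["title"])
        )
        parts.append(
            "{} published on {}. {}".format(
                link, escape(book["published"]), escape(book["blurb"])
            )
        )
    # One blank line between books, as in the example above.
    return "\n\n".join(parts)


def build_author_doc(author_id, name, books):
    """Return the author document to index (field names are assumptions)."""
    return {
        "id": author_id,
        "name": name,
        # Stored but not indexed: returned with results, never searched.
        "bookSummary": render_book_summary(books),
    }
```

Whenever an author gains a book, the whole author document has to be
rebuilt and re-indexed, which is the update cost mentioned above.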


Another possibility is to do two searches, with this kind of
structure, which sort of mimics an RDBMS:
* everything in Solr has a field, type (book, author, library, etc).
these can be filtered on a search by search basis
* books have a field, authorId, uniquely referencing the author
* your first search will be restricted to just authors, from which you
will extract the IDs.
* your second search will be restricted to just books, whose authorId
field is exactly one of the IDs from the first search
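
The two-search idea can be sketched roughly as below.  This only builds
the request parameters and does not talk to a server.  q, fq, and fl
are standard Solr request parameters; the type and authorId fields are
the ones described above; the unique key being called id and the
response being parsed from wt=json output are assumptions.

```python
def author_search_params(user_query):
    """First search: authors only, returning just their IDs."""
    return {"q": user_query, "fq": "type:author", "fl": "id"}


def extract_ids(response):
    """Pull the IDs out of a parsed wt=json Solr response (assumed shape)."""
    return [doc["id"] for doc in response["response"]["docs"]]


def book_search_params(author_ids):
    """Second search: books whose authorId is one of the given IDs.

    Returns None when there are no authors, so the caller can skip
    the second request entirely.
    """
    if not author_ids:
        return None
    # Quote each ID so it is matched as an exact term; IDs containing
    # double quotes or backslashes would need escaping first.
    id_clause = " OR ".join('"{}"'.format(author_id) for author_id in author_ids)
    return {
        "q": "*:*",
        "fq": ["type:book", "authorId:({})".format(id_clause)],
    }
```

With many matching authors the OR clause grows long, so the first
search probably wants a modest rows limit.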


As you have noticed, Lucene is not an RDBMS.  Searching through all
the text of all the books is more the use it was designed around; of
course the analogy might not be THAT strong with your need!

Rachel

On 3/11/08, Geoffrey Young <[EMAIL PROTECTED]> wrote:
>
>
>  Otis Gospodnetic wrote:
>  > Geoff,
>  >
>  > I'm not sure if I understood your problem correctly, but it sounds
>  > like you want your search to be restricted to authors, but then you
>  > want to list all of his/her books when displaying results.
>
>
> that's about right.  add that I may also want to search on libraries and
>  show all the books (and authors) stored there.
>
>  in real life, it's not books or authors, of course, but the parallels
>  are close enough :)  in fact, the library example is a good one for
>  me... or at least a network of public libraries linked together.
>
>
>  > The
>  > easiest thing to do would be to create an index where each
>  > "row"/Document has the author name, the book title, etc.  For each
>  > author-matching Document you'd pull his/her books out of the result
>  > set.  Yes, this means the author name would be denormalized in
>  > RDBMS-speak.
>
>
> I think I can live with the denormalization - it seems lucene is flat
>  and very different conceptually than a database :)
>
>  the trouble I'm having is one of dimension.  an author has many, many
>  attributes (name, birthdate, biography in $language, etc).  as does each
>  book (title in $language, summary in $language, genre, etc).  as does
>  each library (name, address, directions in $language, etc).  so an
>  author with N books doesn't seem to scale very well in the flat
>  representations I'm finding in all the lucene/solr docs and examples...
>  at least not in some way I can wrap my head around.
>
>  part of what seemed really appealing about lucene in general was that
>  you could stuff all this (unindexed) information into a document and
>  retrieve it all based on some search criteria.  but it's seeming very
>  difficult for me to wrap my head around the data I need to represent.
>
>
>  > Another option is not to index/store book titles, but
>  > rather have only an author index to search against.  The book data
>  > (mapped to author identities) would then be pulled from an external
>  > source (e.g. RDBMS: select title from books where author_id in
>  > (1,2,3)) at search results display time.
>
>
> eew :)  seriously, though, that's what we have now - all rdbms driven.
>  if solr could only conceptually handle the initial lookup there wouldn't
>  be much point.
>
>  maybe I'm thinking about this all wrong (as is to be expected :), but I
>  just can't believe that nobody is using solr to represent data a bit
>  more complex than the examples out there.
>
>  thanks for the feedback.
>
>  --Geoff
>
>
>  >
>  > Otis
>  >
>  > -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>  >
>  > ----- Original Message ---- From: Geoffrey Young
>  > <[EMAIL PROTECTED]> To: solr-user@lucene.apache.org Sent:
>  > Tuesday, March 11, 2008 12:17:32 PM Subject: schema help
>  >
>  > hi :)
>  >
>  > I'm trying to work out a schema for our widgets.  more than "just
>  > coming up with something" I'd like something idiomatic in solr terms.
>  > any help is much appreciated.  here's a similar problem space to what
>  > I'm working with...
>  >
>  > lets say we're talking books.  books are written by authors and held
>  > in libraries.  a sister company is using lucene+compass and they seem
>  > to have completely different collections (or whatever the technical
>  > term is :)
>  >
>  > authors books libraries
>  >
>  > so that a search for authors hits only the authors dataset.
>  >
>  > all of the solr examples I can find don't seem to address this kind
>  > of data disparity.  what is the standard and idiomatic approach for
>  > solr?
>  >
>  > for my particular data I'd want to display something like this
>  >
>  > author book in library book in library
>  >
>  > on the same result page, but using a completely flat, single schema
>  > doesn't seem to scale very well.
>  >
>  > collective wisdom most welcome :)
>  >
>  > --Geoff
>  >
>  >
>
