the trouble I'm having is one of dimension. an author has many, many
attributes (name, birthdate, biography in $language, etc). as does
each book (title in $language, summary in $language, genre, etc). as
does each library (name, address, directions in $language, etc). so
an author with N books doesn't seem to scale very well in the flat
representations I'm finding in all the lucene/solr docs and
examples... at least not in some way I can wrap my head around.
OG: I'm not sure why the number of attributes worries you. Imagine
it as a wide RDBMS table, if it helps. Indices with dozens of fields
are not uncommon.
it's not necessarily the number of fields, it's the Attribute1 ..
AttributeN-style numbering that worries me. but I think it's all
starting to make sense now... if wanting to pull data in multiple
queries was my holdup.
OG: You certainly can do that. I'm not sure I understand where the
hard part is. You seem to know what attributes each entity has.
Maybe you are confused by how to handle N different types of entities
in a single index?
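(a minimal sketch of the multiple-entity-types-in-one-index idea, not
from this thread: give every document a "type" field and only populate
the fields that apply to that type. All field names and values below
are made up for illustration.)

```python
# Sketch: one index holding authors, books, and libraries side by side,
# distinguished by a "type" field. Documents simply leave unrelated
# fields unset; the schema names here are hypothetical.

author_doc = {
    "id": "author-1",
    "type": "author",
    "name": "Jane Doe",
    "birthdate": "1970-01-01",
}

book_doc = {
    "id": "book-1",
    "type": "book",
    "title": "A Sample Title",
    "genre": "fiction",
    "author_id": "author-1",  # join key back to the author document
}

library_doc = {
    "id": "library-1",
    "type": "library",
    "name": "Main Branch",
    "address": "1 Example St",
}

def search(docs, **criteria):
    """Toy stand-in for a query: match documents on field equality."""
    return [d for d in docs
            if all(d.get(k) == v for k, v in criteria.items())]

index = [author_doc, book_doc, library_doc]
# roughly "q=type:book AND author_id:author-1" in Solr query terms:
print(search(index, type="book", author_id="author-1"))
```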
yes... or, more properly, how to relate them to each other.
I understand that the schema can hold tons of attributes that are unused
in different documents. my question seems to be how to organize my data
such that I can answer the question "how do I get a list of libraries
with $book like $pattern" - where does the de-normalization typically
occur? if a document fully represents "a book by an author in a
library" such that the same book (with all its attributes) is in my
index multiple times (one for each library) how do I drill down to
showing just the directions to a specific library?
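(to make the denormalized layout being described concrete: one
document per book-and-library pair, so book attributes repeat once per
library, and the drill-down is just another filter. A rough sketch
with invented field names, not a real schema.)

```python
# Sketch: fully denormalized "book in a library" documents. Book fields
# are duplicated across libraries; drilling down to one library's
# directions is a second, narrower query.

docs = [
    {"id": "book-1@lib-1", "book_id": "book-1", "title": "A Sample Title",
     "library_id": "lib-1", "library_name": "Main Branch",
     "directions": "take the 2nd exit"},
    {"id": "book-1@lib-2", "book_id": "book-1", "title": "A Sample Title",
     "library_id": "lib-2", "library_name": "East Branch",
     "directions": "next to the park"},
]

# Step 1: "list of libraries with $book like $pattern" -- match on the
# book title, then collect the distinct libraries from the hits.
pattern = "sample"
hits = [d for d in docs if pattern in d["title"].lower()]
libraries = sorted({(d["library_id"], d["library_name"]) for d in hits})

# Step 2: drill down to one specific library, e.g. roughly
# "q=book_id:book-1 AND library_id:lib-2" in Solr query terms.
detail = next(d for d in docs
              if d["book_id"] == "book-1" and d["library_id"] == "lib-2")
print(libraries)
print(detail["directions"])
```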
(I'm assuming a single index is what you currently
have in mind)
using different indices is what my lucene+compass counterparts are
doing. I couldn't find an example of that in the solr docs (unless the
answer is running multiple, distinct instances at the same time)
eew :) seriously, though, that's what we have now - all rdbms
driven. if solr could only conceptually handle the initial lookup
there wouldn't be much point.
OG: Well, there might or might not be, depending on how much data you
have, how flexible and fast your RDBMS-powered (full-text?) search is,
and so on. The Lucene/Solr for full-text search + RDBMS/BDB for
display data is a common combination.
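(a minimal sketch of that combination, under assumed names: the index
carries only searchable text plus a primary key, the RDBMS keeps the
full display record. sqlite3 stands in for the real database here, and
the toy "search_ids" function stands in for the full-text engine.)

```python
# Sketch: full-text search returns ids only; display data comes from
# the RDBMS in a second lookup. Schema and data are invented.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE libraries (id TEXT PRIMARY KEY, name TEXT, "
           "address TEXT, directions TEXT)")
db.execute("INSERT INTO libraries VALUES ('lib-1', 'Main Branch', "
           "'1 Example St', 'take the 2nd exit')")

# Pretend search index: just searchable text keyed by primary key.
search_index = [{"id": "lib-1", "text": "main branch example"}]

def search_ids(query):
    """Toy stand-in for the full-text engine: return matching ids."""
    return [d["id"] for d in search_index if query in d["text"]]

# Step 1: search gives ids; step 2: the RDBMS gives display data.
ids = search_ids("main")
placeholders = ",".join("?" * len(ids))
rows = db.execute(
    "SELECT name, directions FROM libraries WHERE id IN (%s)"
    % placeholders, ids).fetchall()
print(rows)
```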
"the decision has been made to use lucene to replace all rdbms
functionality for search"
*cough*
:)
maybe I'm thinking about this all wrong (as is to be expected :), but
I just can't believe that nobody is using solr to represent data a
bit more complex than the examples out there.
OG: Oh, lots of people are, it's just that examples are simple, so
people new to Solr, Lucene, etc. have an easier time learning.
:)
thanks for your help here.
--Geoff