On May 19, 2006, at 4:53 PM, Chris Hostetter wrote:
: it has is case sensitivity. I could lowercase everything, but then
: the terms the user sees will be in all lowercase and that simply
: won't do for my scholarly audience :)
picky, picky users.
Yeah, if it wasn't for them, I'd have it easy :)
: It seems like what I really need is simply a separate index (or
: rather a partition of the main Solr one) where a Document represents
: an "agent", and do a PrefixQuery or TermEnum and get all unique
: agents.
i've let it roll around in my head for a few days, and i think that's
exactly what i would do if it were me ... in fact, what you describe is
pretty much exactly what i do for product categories, except that i
think i store more metadata about each category than you would store
about your "agents". what you really need is a way to search for agents
by term or term prefix, get a list of matching agents, and then use each
agent as a facet for your "works" .. I do the same thing, except my term
queries are on the unique id for the category, and my "prefix" queries
are for the null prefix (ie: look at all categories) .. then once i have
a category, i have other data that helps me with further facets (ie: for
digital cameras, "resolution" is a good facet).
In fact, I've implemented this locally as a custom SolrCache that
holds a RAMDirectory. I TermEnum the agents in the main index on
warm() and index all the agents into the RAMDirectory. It is working
well.
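
For anyone following along, here is a rough sketch of the side-index idea in plain Java collections rather than Lucene. It only illustrates the shape of the approach: one entry per unique agent, looked up by a case-insensitive prefix while the original-case name is kept for display. AgentIndex, add() and byPrefix() are made-up names for illustration, not Solr or Lucene API, and this is not the actual SolrCache implementation described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.TreeMap;

// Hypothetical stand-in for the per-agent side index; not Solr or Lucene API.
class AgentIndex {
    // Lowercased key -> display form exactly as it appeared in the main index.
    private final TreeMap<String, String> agents = new TreeMap<>();

    // Called once per unique agent term, e.g. while enumerating terms at warm time.
    // If two terms differ only by case, the first one seen is kept.
    void add(String displayName) {
        agents.putIfAbsent(displayName.toLowerCase(Locale.ROOT), displayName);
    }

    // Case-insensitive prefix lookup that returns the original-case names,
    // ordered by their lowercased keys. An empty prefix returns every agent.
    List<String> byPrefix(String prefix) {
        String lo = prefix.toLowerCase(Locale.ROOT);
        // Upper bound for keys that start with lo (assumes no key contains U+FFFF).
        String hi = lo + Character.MAX_VALUE;
        return new ArrayList<>(agents.subMap(lo, true, hi, false).values());
    }
}
```

With agents "Blake, William" and "Ruskin, John" added (sample names only), byPrefix("ru") would return the Ruskin entry in its original case, and byPrefix("") would return both, much like the null-prefix query Chris mentions. Ordering here is purely alphabetical; a real Lucene index in a RAMDirectory would also give scored ordering.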
i could imagine the same extension eventually unfolding for your agents
... i don't know much about literary works, but if we transition it to
art in general, you might have information for one artist about
different "labels" that apply to the art he produced in his life
(sculpture, painting, cubist, impressionist, modern, "blue period",
etc..) and once your user has selected a specific artist, you could use
the list of labels from a stored field of the artist's metadata doc to
decide which facets to offer the user in refining further.
We have metadata out the wazoo for this stuff. We have "genres"
which is a categorization of the type of work like "Painting",
"Poetry", etc. We have agents classified into roles. The same
person could be the author of one work, and a figure in a painting of
another work, and the editor of another. So even within agent the
user interface will display the breakdown of each agent by the
various roles. *whew*
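
To make the per-role breakdown concrete, here is a minimal sketch of the bookkeeping, again in plain Java and under assumptions: RoleBreakdown, record() and rolesFor() are hypothetical names, and the role strings are just examples. It counts, for each agent, how many works they appear in under each role.

```java
import java.util.Collections;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration; not part of Solr or Lucene.
class RoleBreakdown {
    // agent -> (role -> number of works in which the agent has that role)
    private final Map<String, Map<String, Integer>> counts = new TreeMap<>();

    // Record one (agent, role) occurrence, e.g. once per work document.
    void record(String agent, String role) {
        counts.computeIfAbsent(agent, a -> new TreeMap<>())
              .merge(role, 1, Integer::sum);
    }

    // Role counts for one agent; empty if the agent was never recorded.
    Map<String, Integer> rolesFor(String agent) {
        return counts.getOrDefault(agent, Collections.emptyMap());
    }
}
```

The user interface could then show each agent with its role counts side by side, for example author, editor, and figure.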
: Maybe I need to build some sort of term -> agent cache during
: warming that makes this a no brainer?
that's another way to go ... but if you make one doc per agent, then
this is just a subset of the filter cache .... i personally love the
filter cache :)
I opted for the RAMDirectory so I can leverage Lucene scoring for the
ordering of agents, rather than only alphabetical and frequency options.
Erik