On May 19, 2006, at 4:53 PM, Chris Hostetter wrote:
: it has is case sensitivity.  I could lowercase everything, but then
: the terms the user sees will be in all lowercase and that simply
: won't do for my scholarly audience :)

picky, picky users.

Yeah, if it wasn't for them, I'd have it easy :)

: It seems like what I really need is simply a separate index (or
: rather a partition of the main Solr one) where a Document represents
: an "agent", and do a PrefixQuery or TermEnum and get all unique agents.

i've let it roll around in my head for a few days, and i think that's
exactly what i would do if it were me ... in fact, what you describe is
pretty much exactly what i do for product categories, except that i think
i store more metadata about each category than you would store about your
"agents". what you really need is a way to search for agents by term or
term prefix, get a list of matching agents, and then use each agent as a
facet for your "works" .. I do the same thing, except my term queries are
on the unique id for the category, and my "prefix" queries are for the
null prefix (ie: look at all categories) .. then once i have a category, i
have other data that helps me with further facets (ie: for digital
cameras, "resolution" is a good facet).

In fact, I've implemented this locally as a custom SolrCache that holds a RAMDirectory. On warm(), I TermEnum the agents in the main index and index them all into the RAMDirectory. It is working well.
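
A rough sketch of the idea, for anyone following along. This is not the actual cache: it swaps a plain TreeMap in for the RAMDirectory and uses no Solr or Lucene API at all, so there is no scoring. The class AgentPrefixIndex and its methods add() and withPrefix() are hypothetical names. It only shows how agents enumerated at warm time can be looked up by case-insensitive prefix while the original casing is kept for display.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.TreeSet;

/** Hypothetical stand-in for the agent index; not Solr or Lucene API. */
public class AgentPrefixIndex {
    // Lowercased agent name -> original-cased forms seen while warming.
    private final TreeMap<String, TreeSet<String>> byLowercase = new TreeMap<>();

    /** Call once per agent term enumerated from the main index. */
    public void add(String agent) {
        String key = agent.toLowerCase(Locale.ROOT);
        byLowercase.computeIfAbsent(key, k -> new TreeSet<>()).add(agent);
    }

    /** Case-insensitive prefix lookup; results keep their original casing. */
    public List<String> withPrefix(String prefix) {
        String lower = prefix.toLowerCase(Locale.ROOT);
        // Every key in [lower, lower + '\uffff') starts with lower.
        SortedMap<String, TreeSet<String>> range =
                byLowercase.subMap(lower, lower + Character.MAX_VALUE);
        List<String> matches = new ArrayList<>();
        for (TreeSet<String> forms : range.values()) {
            matches.addAll(forms);
        }
        return matches;
    }
}
```

An empty prefix returns every agent, which corresponds to the "null prefix" case mentioned above. Ordering here is alphabetical by lowercased name only.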

i could imagine the same extension eventually unfolding for your agents ...
i don't know much about literary works, but if we transition it to art
in general, you might have information for one artist about different
"labels" that apply to the art he produced in his life (sculpture,
painting, cubist, impressionist, modern, "blue period", etc..) and once
your user has selected a specific artist, you could use the list of labels
from a stored field of the artist's metadata doc to decide which facets to
offer the user in refining further.

We have metadata out the wazoo for this stuff. We have "genres", a categorization of the type of work, like "Painting", "Poetry", etc. We have agents classified into roles. The same person could be the author of one work, a figure in a painting in another, and the editor of a third. So even within a single agent, the user interface will display the breakdown of that agent by the various roles. *whew*
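
To make the per-agent role breakdown concrete, here is a minimal sketch in plain Java. The Credit class, its fields, and the breakdown() method are hypothetical; the real data model is surely richer. It assumes one Credit per (agent, role, work) combination and simply counts credits per role for each agent.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of an agent -> role -> count breakdown. */
public class AgentRoleBreakdown {

    /** One agent appearing in one work under one role (assumed shape). */
    public static final class Credit {
        final String agent;
        final String role;
        final String workId;

        public Credit(String agent, String role, String workId) {
            this.agent = agent;
            this.role = role;
            this.workId = workId;
        }
    }

    /** Counts credits per role for each agent, sorted by agent then role. */
    public static Map<String, Map<String, Integer>> breakdown(List<Credit> credits) {
        Map<String, Map<String, Integer>> result = new TreeMap<>();
        for (Credit c : credits) {
            result.computeIfAbsent(c.agent, k -> new TreeMap<>())
                  .merge(c.role, 1, Integer::sum);
        }
        return result;
    }
}
```

The inner map for an agent is what a UI could render as that agent's role facet, for example author (2), editor (1).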

: Maybe I need to build some sort of term -> agent cache during warming
: that makes this a no brainer?

that's another way to go ... but if you make one doc per agent, then this
is just a subset of the filter cache .... i personally love the filter
cache :)

I opted for the RAMDirectory so I can leverage Lucene scoring for the ordering of agents, rather than only alphabetical and frequency options.

        Erik
