The problem is going to be 'near real time' indexing. Solr 1.4, at least, does not do a very good job of handling very frequent commits. If you want to add to the user's history in the Solr index every time they click the button, and they click the button a lot, that naturally leads to very frequent commits to Solr (every minute, every second, multiple times a second), and you're going to have RAM and performance problems.

I believe there are some things in trunk that handle this better. I don't know the details, but "near real time search" is the phrase people use for it, to google or ask about on this list.

Or, if it's acceptable for your requirements, you could record all the "I've read this" clicks in an external store, and only add them to the Solr index nightly, or even hourly. If you batch them and add them as frequently as you can get away with (every hour, sure; every 10 minutes, pushing it; every minute, no), you can get around that issue. Or, for that matter, you could add them to Solr as they come in but only 'commit' every hour or so, but I don't like that strategy: if Solr crashes or otherwise restarts, you pretty much lose those uncommitted documents. Better to queue them up in an external store.
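The queue-and-batch idea above could be sketched roughly like this. Everything here is illustrative: the class and parameter names are made up, the "external store" is just an in-memory buffer, and `send_batch` stands in for whatever actually POSTs documents to Solr's /update handler and commits.

```python
import time
from collections import deque


class ClickQueue:
    """Buffer 'I've read this' clicks and flush them to Solr in batches.

    `send_batch` is injected so the queueing logic stays independent of
    the Solr client: in practice it would do one add + one commit per
    batch, instead of a commit per click.
    """

    def __init__(self, send_batch, flush_interval=3600):
        self.send_batch = send_batch          # callable taking a list of docs
        self.flush_interval = flush_interval  # seconds between flushes
        self.pending = deque()
        self.last_flush = time.time()

    def record_click(self, user_id, doc_id):
        # A real implementation would persist this to a durable external
        # store first, so nothing is lost if the process dies.
        self.pending.append({"user_id": user_id, "doc_id": doc_id})
        if time.time() - self.last_flush >= self.flush_interval:
            self.flush()

    def flush(self):
        if self.pending:
            batch = list(self.pending)
            self.pending.clear()
            self.send_batch(batch)  # one Solr update per batch
        self.last_flush = time.time()
```

With an hourly `flush_interval`, clicks accumulate in `pending` and Solr sees at most one add+commit per hour, which is the whole point.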

On 1/19/2011 1:52 PM, Markus Jelsma wrote:
Hi,

I've never seen Solr's behaviour with a huge number of values in a multi-valued field, but I think it should work alright. You could then store a list of user IDs along with each book document and use filter queries to include or exclude the book from the result set.
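For what that looks like concretely: if each book document carries a multi-valued field of the user IDs who have read it (the field name `readers` below is an assumption, not anything from your schema), the include/exclude filter is just a fielded fq clause, which you could build with a trivial helper:

```python
def history_filter(user_id, has_read=True):
    """Build a Solr fq clause against a hypothetical multi-valued
    'readers' field that stores, per book, the IDs of users who
    have read it. Negating the clause flips include to exclude."""
    clause = "readers:%s" % user_id
    return clause if has_read else "-" + clause
```

So a request for books user 42 has already read would carry `fq=readers:42`, and `fq=-readers:42` for the unread ones.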

Cheers,

Hi,

I'm looking for ideas on how to make an efficient facet query on a
user's history with respect to the catalog of documents (something
like "Read document already: yes / no"). The catalog is around 100k
titles and there are several thousand users. Of course, each user has
a different history, many having read fewer than 500 titles, but some
heavy users having read perhaps 50k titles.

Performance is not terribly important right now, so all I did was bump up the boolean query clause limit and put together a big string of document IDs that the user has read. The first query is slow, but once it's in the query cache it's fine. I would like to find a better way of doing it, though.
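For context, the big-string approach described above amounts to building a filter like this (a sketch only; it assumes the unique key field is named `id`, and each clause counts against `maxBooleanClauses`):

```python
def read_docs_filter(doc_ids, exclude=False):
    """Build the big OR filter over the document IDs a user has read.
    Every ID becomes one boolean clause, which is why the clause
    limit has to be raised for heavy users with ~50k titles."""
    clause = "id:(" + " OR ".join(str(d) for d in doc_ids) + ")"
    return "-" + clause if exclude else clause
```

A 50k-title history produces a 50k-clause filter, which explains why the first (uncached) query is slow.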

What type of Solr plugin would be best suited to helping in this situation? I could make a function plugin that provides something like hasHadBefore() - true/false, but would that be efficient for faceting and filtering? Another idea is a QParserPlugin that looks for a field like hasHadBefore:userid and somehow substitutes in the list of docs. But I'm not sure how a new parser plugin would interact with the existing parser. Can Solr use a parser plugin to handle only one field, and leave all the other fields to the default parser?

Thanks,
Jon
