On Thu, Jul 10, 2008 at 7:53 AM, aris buinevicius <[EMAIL PROTECTED]> wrote:
> We're trying to implement a large scale domain specific web email
> application, and so far solr performance on the search side is really doing
> well for us.
>
> There are two limitations that I can't seem to get around however, and was
> hoping for some advice.
>
> 1. We would like to do bulk tagging on large query result sets (ie, if you
> have 1M emails, do a search, and then you wish to apply a tag to the result
> set of, say, 250k results). I've tried many approaches, but the closest
> support I could see was the update field functionality in SOLR-139. Is
> there any other way to separate the very dynamic metadata (tags and other
> fields) abstracted away from the static documents themselves? I've
> researched joining against a metadata database, but unfortunately the join
> logic for large results is just too bulky to perform well at scale.
> Also have even looked at postgres tsearch2, but that also breaks down with a
> large number of emails.

Updating a large number of docs in one go is expensive. SOLR-139 is trying
to address that, but the operation is still expensive. If users do not tag
docs too often, it may be OK.

>
> 2. We're assuming we'll have thousands of users with independent data; any
> good way to partition multiple indexes with solr? With Lucene we could
> just save those in independent directories, and cache the index while the
> user session is active. I saw some configurations on tomcat that would
> allow multiple instances, but that's probably not practical for lots of
> concurrent users.

Maintaining a separate index per user is not a good idea. Add an extra
field 'userid' to each document and pass the user id as a filter query
('fq') when searching. The caches in Solr will automatically take care of
the rest.

>
> Thanks for any tips; would love to use Solr (or Lucene), but haven't been
> able to get around issue 1 yet for large numbers of emails in a timely
> response. We've really looked at the gamut here, including solr, lucene,
> postgres (tsearch2), sphinx, xapian, couchdb(!), and more.
>
> ab
>
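
To illustrate the 'userid' filter suggested above, here is a minimal sketch in Python of how such a request might be built. The field name `userid` comes from the suggestion itself. The base URL, the helper name `build_user_query`, and the use of plain numeric user ids are assumptions made for illustration only.

```python
from urllib.parse import urlencode


def build_user_query(base_url, user_id, query, rows=10):
    """Build a Solr select URL restricted to one user's documents.

    Assumes every document was indexed with a 'userid' field and that
    user ids are plain integers (so no query-syntax escaping is needed).
    """
    params = [
        ("q", query),                      # the user's actual search
        ("fq", f"userid:{int(user_id)}"),  # filter query limiting results to this user
        ("rows", rows),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and core path, shown only as an example.
url = build_user_query("http://localhost:8983/solr", 42, "subject:invoice")
print(url)
```

Because the 'fq' clause is sent separately from 'q', Solr can cache the per-user filter on its own and reuse it across that user's different searches, which is what makes a single shared index workable here.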
--
--Noble Paul