Re: squatter running longer than 24 hours

2007-10-25 Thread David Lang
On Mon, 22 Oct 2007, Rob Mueller wrote:

 squatter would really benefit from incremental updates. At the moment a
 single new message in a mailbox containing 20k messages causes it to read
 in all the existing messages in order to regenerate the index.

 We spoke to Ken about this ages back, and even offered to pay for the work
 to make it happen, but it was just around the time CMU hired him, so it
 never actually happened, pity. It would be nice to be able to dedicate a
 couple of weeks to rummage around in there and actually make it happen...

Postgres has full-text search capabilities with acceptable performance on very
large databases. Their code is BSD-licensed, so anything relevant could be
merged into Cyrus. It may be worth someone looking into their logic.

David Lang

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: squatter running longer than 24 hours

2007-10-22 Thread David Carter
On Sun, 21 Oct 2007, Vincent Fox wrote:

 I have seen squatter run more than 24 hours.

 This is on a large mail filesystem.  I've seen it start up a
 second one while the first is still running.  Should I:

 1) Forget about squatter
 2) Remove from cyrus.conf, run from cron every other day
 3) Find some option to cyrus.conf for same effect as #2?

I squat a fraction of mailboxes each night using:

http://www-uxsup.csx.cam.ac.uk/~dpc22/cyrus/patches/2.3.8/squatter.patch

For example:

   squatter -s -m 0 -M 7

would update the squat indexes for 1 in 7 mailboxes, based on modulo 
arithmetic on the mailbox UniqueID.
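
To make the selection rule concrete, here is a minimal sketch of that kind of
modulo partitioning. It is not taken from the patch: the function name is
hypothetical, and it assumes the UniqueID is a hexadecimal string and that
-m is the remainder and -M the modulus, which is only what the example above
suggests.

```python
def selected_for_run(unique_id: str, remainder: int, modulus: int) -> bool:
    """Return True if this mailbox belongs to the slice being squatted.

    Assumption: unique_id is a hex string; the real patch may derive the
    number from the UniqueID differently.
    """
    return int(unique_id, 16) % modulus == remainder


# Hypothetical UniqueIDs, for illustration only.
mailboxes = ["4a1b2c3d5e6f7081", "00000000471c9e10", "7fffffff0badcafe"]

# Cycling the remainder 0..6 over a week visits every mailbox exactly once.
for night in range(7):
    tonight = [m for m in mailboxes if selected_for_run(m, night, 7)]
    print(night, tonight)
```

Run from cron with the remainder tied to the day of the week, each mailbox
would be re-indexed once every seven days.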

squatter would really benefit from incremental updates. At the moment a 
single new message in a mailbox containing 20k messages causes it to read 
in all the existing messages in order to regenerate the index.

Unfortunately, the code is rather impenetrable. I infer that it is 
collecting information about adjacent characters in the message body. 
Presumably a 5-character search term provides 4 required pairings as a
prefilter from the squat engine before message-by-message search kicks in.
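
A minimal sketch of the prefilter idea as I infer it, not of the actual squat
code: all names below are hypothetical, and the real index format may well
use something other than character pairs. Each message is reduced to the set
of adjacent character pairs it contains; a search term is only checked in
full against messages that contain every pair of the term.

```python
def char_pairs(text: str) -> set[str]:
    """All adjacent two-character pairs in text."""
    return {text[i:i + 2] for i in range(len(text) - 1)}


def build_index(messages: dict[int, str]) -> dict[int, set[str]]:
    """Map each message id to the set of pairs found in its body."""
    return {uid: char_pairs(body) for uid, body in messages.items()}


def search(messages: dict[int, str], index: dict[int, set[str]], term: str) -> list[int]:
    """Prefilter on required pairs, then confirm message by message."""
    needed = char_pairs(term)
    candidates = [uid for uid, pairs in index.items() if needed <= pairs]
    # The prefilter can produce false positives, so the real substring
    # search still has to run on the surviving candidates.
    return [uid for uid in candidates if term in messages[uid]]


messages = {1: "say hello world", 2: "lo hell", 3: "nothing here"}
index = build_index(messages)
print(sorted(char_pairs("hello")))       # the 4 required pairs
print(search(messages, index, "hello"))
```

Message 2 contains all four pairs of "hello" without containing the word, so
it passes the prefilter and is only rejected by the full search. Message 3 is
discarded by the prefilter alone.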

-- 
David Carter Email: [EMAIL PROTECTED]
University Computing Service,        Phone: (01223) 334502
New Museums Site, Pembroke Street,   Fax:   (01223) 334679
Cambridge UK. CB2 3QH.



Re: squatter running longer than 24 hours

2007-10-22 Thread Rob Mueller
 squatter would really benefit from incremental updates. At the moment a
 single new message in a mailbox containing 20k messages causes it to read
 in all the existing messages in order to regenerate the index.

We spoke to Ken about this ages back, and even offered to pay for the work 
to make it happen, but it was just around the time CMU hired him, so it 
never actually happened, pity. It would be nice to be able to dedicate a
couple of weeks to rummage around in there and actually make it happen...

Rob

