Re: prblems rebuilding conversations db

2019-01-21 Thread ellie timoney
Hi Michael,

On Tue, Jan 22, 2019, at 12:32 AM, Michael Menge wrote:
> Hi Ellie
> 
> Quoting ellie timoney :
> 
> > Hi Michael,
> >
> > On Wed, Jan 16, 2019, at 10:12 PM, Michael Menge wrote:
> >> Hi,
> >>
> >> because conversations db seems to be required for search indexes, I
> >> enabled this option
> >> on our production servers today and tried to rebuild the conversations
> >> db for all users with
> >>
> >> ctl_conversationsdb -v -b UID
> >>
> >> For most users this did take less than a second. But for some users
> >> this process would
> >> not finish. I did kill one process after about 30 Minutes (most others
> >> after 3 Minutes).
> >> The UserID.conversations has grown over 2 GB (the mailbox itself has
> >> only ~700 MB of mails,
> >> and the conversations files from finnished rebuilds are less then 20 MB)
> >
> > So, a 2GB-and-growing conversations db for a 700MB mailbox is a  
> > ratio of ~3:1, which seems obscene...
> >
> 
> thanks for confirming that this is not expected.
> 
> > For comparison, the Conversations.append_reply_200 cassandane test  
> > (two messages with 100 replies each) produces an 87Kb conversations  
> > db for a 1016Kb mailbox (ratio ~1:12, n.b inverted!), which makes  
> > your ~3:1 and growing ratio seem even more obscene...
> >
> > Obviously you can't share the bad conversations db because  
> > conversations db's are full of personally-identifying information.   
> > But I wonder if looking through it with `cyr_dbtool ... show` turns  
> > up any unusual patterns, especially in comparison to one of the good  
> > users?  I wonder if it's somehow repeating itself?
> 
> I was unable to run "cyr_dbtool ... show" or "ctl_conversationsdb -d"
> while ctl_conversationsdb -v -b was still running
> (waiting for a lock on conversations db file)
> 
> After aborting "ctl_conversationsdb -b" "cyr_dbtool show" didn't show any
> duplicates. But the string command showed more occurrences than cyr_dbtool
> did (i used an hash value (d8087b76083ec5e7) I found in the output of  
> cyr_dbtool
> 
> 
> # cyr_dbtool /path-to/userid.conversations skiplist show | grep  
> d8087b76083ec5e7 | wc -l
> 3
> # cyr_dbtool /path-to/userid.conversations skiplist show | grep  
> Bd8087b76083ec5e7 | wc -l
> 1
> # strings /path-to/userid.conversations | grep d8087b76083ec5e7 | wc -l
> 183
> # strings /path-to/userid.conversations | grep Bd8087b76083ec5e7 | wc -l
> 91
> 
> ctl_conversationsdb -d  did a cleanup because I did abort the  
> "ctl_conversationsdb -b" process
> 
> skiplist recovery /path-to/userid.conversations: found partial txn,  
> not replaying
> skiplist: recovered /path-to/userid.conversations (0 records, 144  
> bytes) in 0 seconds
> skiplist: checkpointed /path-to/userid.conversations (0 records, 144  
> bytes) in 0.010 sec
> 
> 
> I did run ctl_conversationsd -b with strace and discovered that the inbox
> (cyrus.index O_RDWR, cyrus.header O_RDONLY, cache  cyrus.O_RDWR ) of that user
> was opened over and over again, no other folder for that user was accessed.
> For other users the inbox was only opened once and then the other  
> folders followed.
> So not stopping "ctl_conversationsd -b", it would have run till my  
> filesystem was
> full.
> 

I guess something about that inbox is confusing it, and making it redo it over 
and over?  Very curious

> > Bron, does this sound like anything you've seen before, that maybe  
> > got fixed on master but not backported to 3.0?
> >
> >> Cyrus Logs show many "ctl_conversationsdb[933]: mailbox: longlock
> >> user.UserID for 1.5 seconds"
> >> every few seconds.
> >
> > This message is logged when the lock is released, if that lock had  
> > been held for >1s.  It's reporting that it held the lock for longer  
> > than it would like to have, but at least you know the lock has been  
> > released.  I think this is telling you that user-visible performance  
> > on the mailbox would have been impacted while this was occurring  
> > (because imapd/lmtpd wouldn't be able to access their mailbox while  
> > ctl_conversationsdb was holding those locks), but also tells you  
> > that ctl_conversationsdb was doing the right thing, and not just  
> > holding a single lock for the entire job.
> >
> 
> I am not sure if any other processes had a chance to acquire a lock.

If they'd tried, they would have succeeded (by blocking until the lock was 
theirs, during the gaps where ctl_conversationsdb dropped and reacquired the 
lock), but maybe nothing else wanted it.

> > Though, that raises the question: was the user accessing the mailbox  
> > while the rebuild was in progress?  I wonder if their client is  
> > doing something funny and tripping things up?
> 
> We did this while the server was accessible (didn't want to have
> several hour downtime for ~44000 users)

Fair enough!

> But as I did my recent testing on a test-server without user access,
> only ctl_conversationsdb and cyr_dbtool did accessed the mailbox during
> the conversation rebuild mentioned 

squatter issue on Cyrus 2.4

2019-01-21 Thread Marco

Hello,

 I have Cyrus 2.4 with thousand mailboxes and separated partitions for 
data and metadata, with squat files in metamaildata partitions.


Every night at 2 I run "squatter -s -a -i" to update the indexes.

Two nights ago a metamaildata partition filled out. Starting from 2am 
many GiB busy in few hours. I can't understand why. No errors in Cyrus 
logs. I didn't find the files which caused the busy space.


Last night squatter didn't reproduce the error.

Do you have ever experienced problems like this?

Thank you very much for every hints.
Kind Regards
Marco

**
name   : Cyrus IMAPD
version: v2.4.17-Invoca-RPM-2.4.17-6.el6 d1df8aff 2012-12-01
vendor : Project Cyrus
support-url: http://www.cyrusimap.org
os : Linux
os-version : 2.6.32-279.el6.x86_64
environment: Built w/Cyrus SASL 2.1.23
 Running w/Cyrus SASL 2.1.23
 Built w/OpenSSL 1.0.0-fips 29 Mar 2010
 Running w/OpenSSL 1.0.0-fips 29 Mar 2010
 Built w/zlib 1.2.3
 Running w/zlib 1.2.3
 CMU Sieve 2.4
 TCP Wrappers
 NET-SNMP
 mmap = shared
 lock = fcntl
 nonblock = fcntl
 idle = idled

**
EVENTS {
  # Squatter
  squatter  cmd="squatter -s -a -i" at=0200
**
metapartition_files: header index cache expunge squat
**

Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus