Re: sync_client problems

2017-05-11 Thread Eric Cunningham
Thanks Bron, but that doesn't seem to work for me, unless I'm missing 
something.


I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


[cyrus@imap2 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


I then restart the replication against the log-run file and it again 
fails on a folder that no longer exists in that account:


...
May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length response 
to UNMAILBOX (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: folder_delete(): failed: 
user.scramer.Trash.IS Software Management 'Bad protocol'
May 11 11:40:51 imap1 sync_client[60696]: Error in do_sync(): bailing 
out! Bad protocol
May 11 11:40:51 imap1 sync_client[60696]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol



-Eric


On 05/11/2017 02:14 AM, Bron Gondwana wrote:

Looks like you have a corrupted mailboxes database - if you run a reconstruct 
at each end then it should be fine.  Replication won't bail for an 
actually-doesn't-exist mailbox, but it can't fix broken mailboxes, that's what 
reconstruct is for.

Bron.


On Thu, 11 May 2017, at 01:01, Eric Cunningham wrote:

Hi list, I'm running cyrus replication on cyrus 2.5.10.  It seems
whenever a folder is encountered in log-run that doesn't exist on the
client and/or the server, the replication process logs "Error in
do_sync(): bailing out! Bad protocol".  Sometimes, it dies completely,
other times it restarts at the beginning of the log-run file.  But when
it re-encounters the folder in question, the same failure occurs and the
problem continuously loops until some manual action is taken.  I have a
checker script that attempts to restart a stopped sync but I haven't
tried to code it to get past these types of folder errors.  In the
meantime, replication effectively stops and the replication log
continuously builds in size.  You might imagine my horror at discovering
yesterday that replication of my production server with 1200 accounts
had been failing for over 7 weeks unnoticed, generating 9.5 million
lines in the replication log files.  I had gotten out of the habit of
daily monitoring of replication as it had been running fine and I
eventually moved on to other things.  I'm trying to get replication to
run through the enormous backlog but am running into more similar folder
issues that cause the process to fail, I make manual fixes, and start
the process again.  In my experience with this, reconstruction of the
affected folders on the master host and copy host has no effect.

Can this otherwise excellent feature be made more robust to work past
these folder issues?

Here are a couple of examples I've observed thus far of how the process
is failing for me.

--

In this example, the subfolder "Fly America Foreign Travel" no longer
existed on the master host but did exist on the copy host.  To resolve
this, I had to manually delete that folder from the copy host.

May  9 15:17:48 imap1 sync_client[34614]: inefficient replication (8 >
5) user.jtaft.PO.Fly America Foreign Travel
May  9 15:17:48 imap1 sync_client[34614]: FULLMAILBOX received NO
response: IMAP_MAILBOX_NONEXISTENT No Such Mailbox
May  9 15:17:48 imap1 sync_client[34614]: do_folders(): update failed:
user.jtaft.PO.Fly America Foreign Travel 'Mailbox does not exist'
May  9 15:17:48 imap1 sync_client[34614]: IOERROR: Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Error in do_sync(): bailing
out! Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Reprocessing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run

--

In this example, the subfolder "Sohn" did not exist on the master host
nor on the copy host.  To resolve, I had to edit log-run and completely
remove all entries for this "Sohn" subfolder.

May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length response
to MAILBOXES (end of file reached)
May 10 10:13:26 imap1 sync_c

Re: sync_client problems

2017-05-11 Thread Eric Cunningham
I have to walk this back.  In looking slightly further back in my 
logfiles, before every instance of a failure to sync some folder, I see 
a common error reported prior to every "bailing out! Bad protocol" error 
- "IOERROR: GUID mismatch /var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."


May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch 
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update failed: 
user.cdm-lit.Sent 'Bad protocol'

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length response 
to UNMAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed: 
user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing 
out! Bad protocol
May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol


When I recontructed this cdm-lit account, it ran successfully on my 
master host, but fails on my copy host:


[cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit
user.cdm-lit
user.cdm-lit.Sea Trail 2013
user.cdm-lit.Sent uid 1 found - adding
user.cdm-lit.Sent uid 2 found - adding
user.cdm-lit.Sent uid 3 found - adding
user.cdm-lit.Sent uid 4 found - adding

user.cdm-lit.Sent uid 146 found - adding
user.cdm-lit.Sent uid 147 found - adding
user.cdm-lit.Sent uid 148 found - adding
fatal error: Internal error: assertion failed: imap/mailbox.c: 2847: 
record->size


Since I couldn't get a successful reconstruct on this account, I deleted 
it and recreated it from my master host.  However, I'm still unable to 
get a successful reconstruct with "failed to read index header" for 
every subfolder and "fatal error: Internal error: assertion failed: 
imap/mailbox.c: 2847: record->size"


Any ideas on how to correct this so I can see if I can then get past the 
replication error?


Thanks!

-Eric


On 05/11/2017 11:46 AM, Eric Cunningham wrote:
Thanks Bron, but that doesn't seem to work for me, unless I'm missing 
something.


I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


[cyrus@imap2 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


I then restart the replication against the log-run file and it again 
fails on a folder that no longer exists in that account:


...
May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length response 
to UNMAILBOX (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: folder_delete(): failed: 
user.scramer.Trash.IS Software Management 'Bad protocol'
May 11 11:40:51 imap1 sync_client[60696]: Error in do_sync(): bailing 
out! Bad protocol
May 11 11:40:51 imap1 sync_client[60696]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol



Re: sync_client problems

2017-05-11 Thread Eric Cunningham
Just to close the loop on this, once the corrupted cdm-lit account on my 
copy server successfully reconstructed, the backlog of replication has 
completed successfully and is now caught up.


-Eric

On 05/11/2017 02:20 PM, Eric Cunningham wrote:
-G didn't seem to help, but I tried a "reconstruct -f -r" command 
without having previously deleted all occurrences of the cyrus.* files 
and it was then successful.




On 05/11/2017 01:44 PM, Patrick Boutilier wrote:

Try a -G with reconstruct?

-G Force  re-parsing  of the underlying message (checks GUID 
correctness).  Reconstruct with -G should fix all possible individual 
message issues, including corrupted data files.



On 05/11/2017 02:37 PM, Eric Cunningham wrote:
I have to walk this back.  In looking slightly further back in my 
logfiles, before every instance of a failure to sync some folder, I 
see a common error reported prior to every "bailing out! Bad 
protocol" error - "IOERROR: GUID mismatch 
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."


May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch 
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update 
failed: user.cdm-lit.Sent 'Bad protocol'

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to UNMAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed: 
user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing 
out! Bad protocol
May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol


When I recontructed this cdm-lit account, it ran successfully on my 
master host, but fails on my copy host:


[cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit
user.cdm-lit
user.cdm-lit.Sea Trail 2013
user.cdm-lit.Sent uid 1 found - adding
user.cdm-lit.Sent uid 2 found - adding
user.cdm-lit.Sent uid 3 found - adding
user.cdm-lit.Sent uid 4 found - adding

user.cdm-lit.Sent uid 146 found - adding
user.cdm-lit.Sent uid 147 found - adding
user.cdm-lit.Sent uid 148 found - adding
fatal error: Internal error: assertion failed: imap/mailbox.c: 2847: 
record->size


Since I couldn't get a successful reconstruct on this account, I 
deleted it and recreated it from my master host.  However, I'm still 
unable to get a successful reconstruct with "failed to read index 
header" for every subfolder and "fatal error: Internal error: 
assertion failed: imap/mailbox.c: 2847: record->size"


Any ideas on how to correct this so I can see if I can then get past 
the replication error?


Thanks!

-Eric


On 05/11/2017 11:46 AM, Eric Cunningham wrote:
Thanks Bron, but that doesn't seem to work for me, unless I'm 
missing something.


I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


[cyrus@imap2 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


I then restart the replication against the log-run file and it again 
fails on a folder that no longer exists in that account:


...

Re: sync_client problems

2017-05-11 Thread Eric Cunningham
-G didn't seem to help, but I tried a "reconstruct -f -r" command 
without having previously deleted all occurrences of the cyrus.* files 
and it was then successful.




On 05/11/2017 01:44 PM, Patrick Boutilier wrote:

Try a -G with reconstruct?

-G Force  re-parsing  of the underlying message (checks GUID 
correctness).  Reconstruct with -G should fix all possible individual 
message issues, including corrupted data files.



On 05/11/2017 02:37 PM, Eric Cunningham wrote:
I have to walk this back.  In looking slightly further back in my 
logfiles, before every instance of a failure to sync some folder, I 
see a common error reported prior to every "bailing out! Bad protocol" 
error - "IOERROR: GUID mismatch 
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."


May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch 
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update failed: 
user.cdm-lit.Sent 'Bad protocol'

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)

May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length 
response to UNMAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed: 
user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing 
out! Bad protocol
May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol


When I recontructed this cdm-lit account, it ran successfully on my 
master host, but fails on my copy host:


[cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit
user.cdm-lit
user.cdm-lit.Sea Trail 2013
user.cdm-lit.Sent uid 1 found - adding
user.cdm-lit.Sent uid 2 found - adding
user.cdm-lit.Sent uid 3 found - adding
user.cdm-lit.Sent uid 4 found - adding

user.cdm-lit.Sent uid 146 found - adding
user.cdm-lit.Sent uid 147 found - adding
user.cdm-lit.Sent uid 148 found - adding
fatal error: Internal error: assertion failed: imap/mailbox.c: 2847: 
record->size


Since I couldn't get a successful reconstruct on this account, I 
deleted it and recreated it from my master host.  However, I'm still 
unable to get a successful reconstruct with "failed to read index 
header" for every subfolder and "fatal error: Internal error: 
assertion failed: imap/mailbox.c: 2847: record->size"


Any ideas on how to correct this so I can see if I can then get past 
the replication error?


Thanks!

-Eric


On 05/11/2017 11:46 AM, Eric Cunningham wrote:
Thanks Bron, but that doesn't seem to work for me, unless I'm missing 
something.


I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


[cyrus@imap2 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


I then restart the replication against the log-run file and it again 
fails on a folder that no longer exists in that account:


...
May 11 11:40:51 imap1 sync_client[60696]: sync_mailboxes: doing 1000
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length 
response to MAILBOXES (end of file reached)
May 11 11:40:51 imap1 sync_client[60696]: IOERROR: zero length 
response 

sync_client problems

2017-05-10 Thread Eric Cunningham
Hi list, I'm running cyrus replication on cyrus 2.5.10.  It seems 
whenever a folder is encountered in log-run that doesn't exist on the 
client and/or the server, the replication process logs "Error in 
do_sync(): bailing out! Bad protocol".  Sometimes, it dies completely, 
other times it restarts at the beginning of the log-run file.  But when 
it re-encounters the folder in question, the same failure occurs and the 
problem continuously loops until some manual action is taken.  I have a 
checker script that attempts to restart a stopped sync but I haven't 
tried to code it to get past these types of folder errors.  In the 
meantime, replication effectively stops and the replication log 
continuously builds in size.  You might imagine my horror at discovering 
yesterday that replication of my production server with 1200 accounts 
had been failing for over 7 weeks unnoticed, generating 9.5 million 
lines in the replication log files.  I had gotten out of the habit of 
daily monitoring of replication as it had been running fine and I 
eventually moved on to other things.  I'm trying to get replication to 
run through the enormous backlog but am running into more similar folder 
issues that cause the process to fail, I make manual fixes, and start 
the process again.  In my experience with this, reconstruction of the 
affected folders on the master host and copy host has no effect.


Can this otherwise excellent feature be made more robust to work past 
these folder issues?


Here are a couple of examples I've observed thus far of how the process 
is failing for me.


--

In this example, the subfolder "Fly America Foreign Travel" no longer 
existed on the master host but did exist on the copy host.  To resolve 
this, I had to manually delete that folder from the copy host.


May  9 15:17:48 imap1 sync_client[34614]: inefficient replication (8 > 
5) user.jtaft.PO.Fly America Foreign Travel
May  9 15:17:48 imap1 sync_client[34614]: FULLMAILBOX received NO 
response: IMAP_MAILBOX_NONEXISTENT No Such Mailbox
May  9 15:17:48 imap1 sync_client[34614]: do_folders(): update failed: 
user.jtaft.PO.Fly America Foreign Travel 'Mailbox does not exist'

May  9 15:17:48 imap1 sync_client[34614]: IOERROR: Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Error in do_sync(): bailing 
out! Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Mailbox does not exist
May  9 15:17:48 imap1 sync_client[34614]: Reprocessing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run


--

In this example, the subfolder "Sohn" did not exist on the master host 
nor on the copy host.  To resolve, I had to edit log-run and completely 
remove all entries for this "Sohn" subfolder.


May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length response 
to MAILBOXES (end of file reached)
May 10 10:13:26 imap1 sync_client[47011]: IOERROR: zero length response 
to UNMAILBOX (end of file reached)
May 10 10:13:26 imap1 sync_client[47011]: folder_delete(): failed: 
user.gg-chair.Trash.Sohn 'Bad protocol'
May 10 10:13:26 imap1 sync_client[47011]: Error in do_sync(): bailing 
out! Bad protocol
May 10 10:13:26 imap1 sync_client[47011]: Processing sync log file 
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol


--

Often, errors are logged without "bailing out!" and replication 
continues to run:


May 10 10:45:00 imap1 sync_client[87438]: sync_mailboxes: doing 1000
May 10 10:45:01 imap1 sync_client[87438]: sync_mailboxes: doing 1000
May 10 10:45:24 imap1 sync_client[87438]: IOERROR: GUID mismatch 
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 10 10:45:37 imap1 sync_client[87438]: IOERROR: zero length response 
to MAILBOX (end of file reached)
May 10 10:45:37 imap1 sync_client[87438]: do_folders(): update failed: 
user.cdm-lit.Sent 'Bad protocol'

May 10 10:45:37 imap1 sync_client[87438]: sync_mailboxes: doing 1000

--

Thank you, list!

-Eric


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus


Re: sync_client problems

2017-05-12 Thread Eric Cunningham
I have snapshot backups of that account, that particular message and the 
various cyrus db files.  What can I provide for review?



On 5/11/17 7:38 PM, Bron Gondwana wrote:

It looked like the '148.' file was corrupted in some way.  I assume you've 
deleted all the evidence now, so we can't see how!  Oh well :(

Bron.

On Fri, 12 May 2017, at 04:55, Eric Cunningham wrote:

Just to close the loop on this, once the corrupted cdm-lit account on my
copy server successfully reconstructed, the backlog of replication has
completed successfully and is now caught up.

-Eric

On 05/11/2017 02:20 PM, Eric Cunningham wrote:

-G didn't seem to help, but I tried a "reconstruct -f -r" command
without having previously deleted all occurrences of the cyrus.* files
and it was then successful.



On 05/11/2017 01:44 PM, Patrick Boutilier wrote:

Try a -G with reconstruct?

-G Force  re-parsing  of the underlying message (checks GUID
correctness).  Reconstruct with -G should fix all possible individual
message issues, including corrupted data files.


On 05/11/2017 02:37 PM, Eric Cunningham wrote:

I have to walk this back.  In looking slightly further back in my
logfiles, before every instance of a failure to sync some folder, I
see a common error reported prior to every "bailing out! Bad
protocol" error - "IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."

May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update
failed: user.cdm-lit.Sent 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to UNMAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed:
user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing
out! Bad protocol
May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol

When I recontructed this cdm-lit account, it ran successfully on my
master host, but fails on my copy host:

[cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit
user.cdm-lit
user.cdm-lit.Sea Trail 2013
user.cdm-lit.Sent uid 1 found - adding
user.cdm-lit.Sent uid 2 found - adding
user.cdm-lit.Sent uid 3 found - adding
user.cdm-lit.Sent uid 4 found - adding

user.cdm-lit.Sent uid 146 found - adding
user.cdm-lit.Sent uid 147 found - adding
user.cdm-lit.Sent uid 148 found - adding
fatal error: Internal error: assertion failed: imap/mailbox.c: 2847:
record->size

Since I couldn't get a successful reconstruct on this account, I
deleted it and recreated it from my master host.  However, I'm still
unable to get a successful reconstruct with "failed to read index
header" for every subfolder and "fatal error: Internal error:
assertion failed: imap/mailbox.c: 2847: record->size"

Any ideas on how to correct this so I can see if I can then get past
the replication error?

Thanks!

-Eric


On 05/11/2017 11:46 AM, Eric Cunningham wrote:

Thanks Bron, but that doesn't seem to work for me, unless I'm
missing something.

I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user.scramer.Trash.IS Security
user.scramer.Trash.IS Servers
user.scramer.Trash.IS Staff
user.scramer.Trash.IS Surveys
...


[cyrus@imap2 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.sc

Re: sync_client problems

2017-05-12 Thread Eric Cunningham

Yes, file 148 is completely corrupted.


On 5/12/17 9:16 AM, Bron Gondwana wrote:

Just read the 148 file and see if there is corruption.  I suspect that's the 
cause.

Bron

On Fri, 12 May 2017, at 23:09, Eric Cunningham wrote:

I have snapshot backups of that account, that particular message and the
various cyrus db files.  What can I provide for review?


On 5/11/17 7:38 PM, Bron Gondwana wrote:

It looked like the '148.' file was corrupted in some way.  I assume you've 
deleted all the evidence now, so we can't see how!  Oh well :(

Bron.

On Fri, 12 May 2017, at 04:55, Eric Cunningham wrote:

Just to close the loop on this, once the corrupted cdm-lit account on my
copy server successfully reconstructed, the backlog of replication has
completed successfully and is now caught up.

-Eric

On 05/11/2017 02:20 PM, Eric Cunningham wrote:

-G didn't seem to help, but I tried a "reconstruct -f -r" command
without having previously deleted all occurrences of the cyrus.* files
and it was then successful.



On 05/11/2017 01:44 PM, Patrick Boutilier wrote:

Try a -G with reconstruct?

-G Force  re-parsing  of the underlying message (checks GUID
correctness).  Reconstruct with -G should fix all possible individual
message issues, including corrupted data files.


On 05/11/2017 02:37 PM, Eric Cunningham wrote:

I have to walk this back.  In looking slightly further back in my
logfiles, before every instance of a failure to sync some folder, I
see a common error reported prior to every "bailing out! Bad
protocol" error - "IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148."

May 11 12:44:09 imap1 sync_client[48590]: IOERROR: GUID mismatch
/var/spool/cyrus/mail/c/user/cdm-lit/Sent/148.
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: do_folders(): update
failed: user.cdm-lit.Sent 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: sync_mailboxes: doing 1000
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to MAILBOXES (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: IOERROR: zero length
response to UNMAILBOX (end of file reached)
May 11 12:44:22 imap1 sync_client[48590]: folder_delete(): failed:
user.guest-student-coordinator.Trash.Aarflot 'Bad protocol'
May 11 12:44:22 imap1 sync_client[48590]: Error in do_sync(): bailing
out! Bad protocol
May 11 12:44:22 imap1 sync_client[48590]: Processing sync log file
/var/spool/CyrusDB-Sieve/cyrusdb/sync/log-run failed: Bad protocol

When I recontructed this cdm-lit account, it ran successfully on my
master host, but fails on my copy host:

[cyrus@imap2 ~]$ reconstruct -f -r user.cdm-lit
user.cdm-lit
user.cdm-lit.Sea Trail 2013
user.cdm-lit.Sent uid 1 found - adding
user.cdm-lit.Sent uid 2 found - adding
user.cdm-lit.Sent uid 3 found - adding
user.cdm-lit.Sent uid 4 found - adding

user.cdm-lit.Sent uid 146 found - adding
user.cdm-lit.Sent uid 147 found - adding
user.cdm-lit.Sent uid 148 found - adding
fatal error: Internal error: assertion failed: imap/mailbox.c: 2847:
record->size

Since I couldn't get a successful reconstruct on this account, I
deleted it and recreated it from my master host.  However, I'm still
unable to get a successful reconstruct with "failed to read index
header" for every subfolder and "fatal error: Internal error:
assertion failed: imap/mailbox.c: 2847: record->size"

Any ideas on how to correct this so I can see if I can then get past
the replication error?

Thanks!

-Eric


On 05/11/2017 11:46 AM, Eric Cunningham wrote:

Thanks Bron, but that doesn't seem to work for me, unless I'm
missing something.

I ran reconstructs for this account on both my master and copy hosts:

[cyrus@imap1 ~]$ reconstruct -f -r user.scramer
user.scramer
user.scramer.Archive
user.scramer.Archives
user.scramer.Archives.2004
...
user.scramer.Trash.IS Networking and Operations
user

Missing .seen db files

2018-02-02 Thread Eric Cunningham
Hi list, I had a user report that their read message status was lost.  I 
recall having dealt with seen message db files in the past, but on my 
current implementation (2.5 on FreeBSD 10.2), I'm not able to locate any 
in or within my configdirectory, /var/spool/CyrusDB-Sieve/cyrusdb.  I 
have /var/spool/CyrusDB-Sieve/cyrusdb/user/u/username.sub but not .seen. 
 Would they be located anywhere else or perhaps the seen functionality 
is contained in another database file?  Without per-user .seen 
files, how can 'seen' issues be dealt with for individual users?  Thanks!


-Eric



Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus


Re: cyrus master fails with status 71

2016-11-07 Thread Eric Cunningham via Info-cyrus
Hi Ellie, we've been running with your patch since Oct 25 and haven't 
encountered any issues with imapd exiting, thus far.  But, now that 
imapd has had a chance to run uninterrupted for almost 2 weeks, the 
number of imapd processes/connections has steadily climbed every day. 
This morning, it was near 16,000.  This system has a total of 1400 accounts.


To try and control this growth, per 
https://cyrusimap.org/imap/faqs/o-toomanyprocesses.html I've set the 
following:


To cyrus.conf, added "-U 50" option to the SERVICES section for imapd:

  imap  cmd="imapd -U 50" listen="imap" prefork=60
  imaps cmd="imapd -s -U 50"  listen="imaps"prefork=150


To imapd.conf, added the following tcp_keepalive options:

  tcp_keepalive: 1
  tcp_keepalive_cnt: 1
  tcp_keepalive_idle: 30
  tcp_keepalive_intvl: 900

After restarting imapd, the following are now being logged repeatedly:

Nov  7 10:18:19 imap1 lmtpunix[58768]: unable to 
setsocketopt(TCP_KEEPCNT): Invalid argument
Nov  7 10:18:19 imap1 lmtpunix[58768]: unable to 
setsocketopt(TCP_KEEPIDLE): Invalid argument
Nov  7 10:18:19 imap1 lmtpunix[58768]: unable to 
setsocketopt(TCP_KEEPINTVL): Invalid argument



So, a couple of questions for the list:

Are such numbers of imapd processes to be expected?
Why is lmptunix complaining about options passed to imapd?

Thank you.

-Eric


On 10/25/16 8:23 PM, ellie timoney via Info-cyrus wrote:

Hi Eric,

Patch attached.  I'd appreciate if you could advise whether this helps.
Though I guess you won't be able to tell for a couple of weeks.

If it doesn't cause any new problems (I don't expect it to), then it
will be included in 2.5.11 (whenever that comes out).

Cheers,

ellie

On Wed, Oct 26, 2016, at 10:04 AM, ellie timoney via Info-cyrus wrote:

accept failed: Software caused connection abort


Some sleuthing suggests that "Software caused connection abort"
corresponds with "ECONNABORTED".

The man page on my system for accept(2) unhelpfully defines this as:


   ECONNABORTED
  A connection has been aborted.


But some digging around online suggests that this situation occurs when
a client connects, but subsequently disconnects  (RST) before the server
gets around to accept()ing the connection.  When the server does
eventually accept(), the accept() fails with this error.

Which sounds to me like we want to treat ECONNABORTED similarly to
EAGAIN, not as a fatal OS error.  I'll have a patch up for this shortly.

Cheers,

ellie

On Wed, Oct 26, 2016, at 09:27 AM, Eric Cunningham via Info-cyrus wrote:

Having repeatedly experienced the "status 71" issue, I've been
incrementally bumping it's value up.  It's currently set to 32768 (!)
and that value was in place when it most recently failed.


On 10/25/16 4:21 PM, Shawn Bakhtiar via Info-cyrus wrote:

H.. if that’s the case could you be hitting the the maximum number
of accepts??

Check the 11.11.1.2. kern.ipc.soacceptqueue section of the FreeBSD handbook

https://www.freebsd.org/doc/handbook/configtuning-kernel-limits.html

Given the load you described perhaps 128 is just not enough?




On Oct 24, 2016, at 1:22 PM, Eric Cunningham via Info-cyrus
<info-cyrus@lists.andrew.cmu.edu
<mailto:info-cyrus@lists.andrew.cmu.edu>> wrote:



=
 Eric Cunningham
 Information Services - http://whoi-it.whoi.edu
 Woods Hole Oceanographic Institution - http://www.whoi.edu
 Woods Hole, MA  02543-1541 phone: (508) 289-2224
 fax: (508) 457-2174   e-mail: ecunning...@whoi.edu
<mailto:ecunning...@whoi.edu>
=

On 10/24/2016 03:45 PM, Bron Gondwana via Info-cyrus wrote:

On Tue, 25 Oct 2016, at 02:45, Eric Cunningham via Info-cyrus wrote:

Hi list, we're running cyrus imap 2.5.9 built from the FreeBSD 10-2
(release-p7) ports tree.

The cyrus master process is failing periodically (every 1-2 weeks) as
follows:

Oct 22 07:38:48 imap1 master[7767]: process type:SERVICE name:imaps
path:/usr/local/cyrus/bin/imapd age:305.215s pid:32760 exited, status 71
Oct 22 07:38:48 imap1 master[7767]: service imaps/ipv4 pid 32760 in
READY state: terminated abnormally
Oct 22 07:38:48 imap1 master[7767]: too many failures for service
imaps/ipv4, disabling until next SIGHUP

This prevents new connections by clients until cyrus is restarted.  I've
looked around the web but have not seen this issue reported.

A little background:

Our initial thought on this was that we were running out of listen
queues so have upped that incrementally from the default of 32 to a
current setting of 32768 via /usr/local/etc/rc.d/imapd using the -l
option, with increased kern.ipc.soacceptqueue set to 32768, but that
hasn't helped.  Sometimes the "status 71" occurs during periods of light
use during off hours, like on Saturday mornings.

We have ~1400 imap accoun

Re: cyrus master fails with status 71

2016-10-24 Thread Eric Cunningham via Info-cyrus



=
  Eric Cunningham
  Information Services - http://whoi-it.whoi.edu
  Woods Hole Oceanographic Institution - http://www.whoi.edu
  Woods Hole, MA  02543-1541 phone: (508) 289-2224
  fax: (508) 457-2174   e-mail: ecunning...@whoi.edu
=

On 10/24/2016 03:45 PM, Bron Gondwana via Info-cyrus wrote:

On Tue, 25 Oct 2016, at 02:45, Eric Cunningham via Info-cyrus wrote:

Hi list, we're running cyrus imap 2.5.9 built from the FreeBSD 10-2
(release-p7) ports tree.

The cyrus master process is failing periodically (every 1-2 weeks) as
follows:

Oct 22 07:38:48 imap1 master[7767]: process type:SERVICE name:imaps
path:/usr/local/cyrus/bin/imapd age:305.215s pid:32760 exited, status 71
Oct 22 07:38:48 imap1 master[7767]: service imaps/ipv4 pid 32760 in
READY state: terminated abnormally
Oct 22 07:38:48 imap1 master[7767]: too many failures for service
imaps/ipv4, disabling until next SIGHUP

This prevents new connections by clients until cyrus is restarted.  I've
looked around the web but have not seen this issue reported.

A little background:

Our initial thought on this was that we were running out of listen
queues so have upped that incrementally from the default of 32 to a
current setting of 32768 via /usr/local/etc/rc.d/imapd using the -l
option, with increased kern.ipc.soacceptqueue set to 32768, but that
hasn't helped.  Sometimes the "status 71" occurs during periods of light
use during off hours, like on Saturday mornings.

We have ~1400 imap accounts, though the number of impad processes hovers
around 3,000-4,000.  There have been spikes observed as high as 12,000
imapd processes.  In that particular case, 1 user had 2 imap clients
accounting for near 6,000 of those connections.  We've attempted to
limit these high numbers using the following imapd.conf values:

maxlogins_per_host: 50
maxlogins_per_user: 30
tcp_keepalive: 1
tcp_keepalive_cnt: 1
tcp_keepalive_idle: 30
tcp_keepalive_intvl: 900

However, it seems that once these were reached, no new connections were
permitted and resulted in all manner of user complaints about not being
able to get at their email.

Any ideas on this "status 71" issue?  Could an upgrade to 2.5.10
possibly address this?  Thanks!


https://www.freebsd.org/cgi/man.cgi?query=sysexits

 EX_OSERR (71) An operating system error has been detected.  This
   is intended to be used for such things as ``cannot
   fork'', ``cannot create pipe'', or the like.  It
   includes things like getuid returning a user that
   does not exist in the passwd file.

So the question is: what failed?  Is there anything earlier in the log to 
suggest
what the imapd was doing when it died?

Bron.



Using the example I posted, I traced back imaps process id 32760 and 
found only this:


Oct 22 07:38:48 imap1 imaps[32760]: accept failed: Software caused 
connection abort


-Eric


Cyrus Home Page: http://www.cyrusimap.org/
List Archives/Info: http://lists.andrew.cmu.edu/pipermail/info-cyrus/
To Unsubscribe:
https://lists.andrew.cmu.edu/mailman/listinfo/info-cyrus