Re: GSSAPI authentication ceased working

2009-01-08 Thread Dave McMurtrie
Lars Hanke wrote:

 BTW: It's still not working. I put it to PRI2, since the important 
 ldapdb stuff is running. Kerberized imap is rarely used here, so people 
 can do without. But still I'd like to understand what is happening.

Is the keytab readable by the cyrus user (the Unix uid)?
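
A quick way to check (assuming MIT Kerberos and the default keytab path
- adjust the path, or KRB5_KTNAME, if yours differs):

ls -l /etc/krb5.keytab
sudo -u cyrus klist -k /etc/krb5.keytab    # should list the imap/... keys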

Thanks,

Dave
-- 
Dave McMurtrie, SPE
Email Systems Team Leader
Carnegie Mellon University,
Computing Services

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Storage Sizing: IOPS per mailbox

2009-01-08 Thread Blake Hudson
-------- Original Message --------
Subject: Storage Sizing: IOPS per mailbox
From: ram r...@netcore.co.in
To: info-cyrus info-cyrus@lists.andrew.cmu.edu
Date: Friday, January 02, 2009 10:40:17 PM
 When sizing a storage device for a large cyrus server, the typical
 question asked by storage vendors is what is the IOPS required per
 mailbox 
 M$$ Exchange has this concept of IOPS. and they suggest 1.5 IOPS per
 mailbox ( heavy users ) 

 If I use postfix and cyrus , on my imap server ( pure IMAP server .. All
 spam filtering , outgoing mails , authentication etc happens on
 different servers )


 If the storage is used only for imap storage , what is the typical
 IOPS requirement per user
 We will probably assume 30-50 mails a day of average 100k , and an email
 client checking for new mail every 5minutes 


   
In my experience I would estimate 0.1 IOPS per user for heavy users
(thousands of emails per day, checked every few minutes) and 0.01 IOPS
per user for typical ISP accounts (under a dozen emails, checked a few
times daily). Our systems use MySQL for authentication and account
verification and primarily skiplist databases within Cyrus. These
figures may be on the safe side as none of our systems do just postfix/imap.
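
As a rough sanity check against the original numbers (30-50 mails a day,
a poll every 5 minutes), here is a back-of-envelope; the per-delivery and
per-poll I/O counts are only assumptions:

# ~50 deliveries/day at ~5 I/Os each, plus a poll every 5 minutes at ~2 I/Os,
# averaged over an 86400-second day:
echo 'scale=4; (50*5 + (24*60/5)*2) / 86400' | bc
# => .0095, i.e. roughly the 0.01 IOPS/user figure above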

--Blake

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Janne Peltonen
Hm.

ReiserFS:

If I'm still following after reading through all this discussion,
everyone who is actually using ReiserFS (v3) appears to be very content
with it, even with very large installations. Apparently the fact that
ReiserFS uses the BKL in places doesn't hurt performance too badly, even
with multi core systems? Another thing I don't recall being mentioned
was fragmentation - ext3 appears to have a problem with it, in typical
Cyrus usage, but how does ReiserFS compare to it?

Also, the write barrier problem mentioned in response to my earlier post
on ext3 would apparently be there with ReiserFS, too, wouldn't it?

GFS:

Nobody mentioned using GFS, which /is/ a clustered file system and as
such, probably overkill if it's only mounted on one node at a time, but
I'm curious... the overhead of a clustered FS is the fact that all
metadata operations take a long time, because there is a lot of
cluster-wide locking. But how many metadata operations are there, after
all, in Cyrus?

Also, GFS is one of the two file systems available when using RH
clustering...

Ext3:

I'm using this happily, with 50k users, 24 distinct mailspools of 240G
each. Full backups take quite a while to complete (~2 days), but normal
usage is quite fast. There is the barrier problem, of course... I'm
using noatime (implying nodiratime) and data=ordered, since
data=writeback resulted in corrupted skiplist files on crash, while
data=ordered mostly didn't.
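
For concreteness, those mount options boil down to an fstab line like this
(the device and mount point are placeholders, not my actual layout):

/dev/vg0/spool01   /var/spool/imap   ext3   noatime,data=ordered   0  2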

Also, ext3 is the other FS available when using RH clustering. (Of
course, it isn't a clustered FS, so it is only available when using the
cluster in active-passive mode.)

XFS:

There was someone using this, too, and happy with it.

JFS:

Mm, apparently no comments on this - none positive, at least.

Future:

Ext4 only just went stable, so there is no real-world Cyrus user experience
with it yet. Among other things, it contains an online defragmenter. Journal
checksumming might also work around the write barrier problem on LVM
logical volumes, if I've understood correctly.

Reiser4 might have a future, at least Andrew Morton's -mm patch contains
it and there are people developing it. But I don't know if it ever will
be included in the standard kernel tree.

Btrfs is at such an early stage of development that I don't yet know what to
say about it, but it may eventually make up for ZFS being licence-incompatible
with the GPL.

Conclusion:

I'm going to continue using ext3 for now, and probably ext4 when it's
available from a certain commercial enterprise Linux vendor (personally,
I'd be using Debian, but the department has an official policy of using
RH / CentOS). I'm eagerly waiting for btrfs to appear... I probably /would/
switch to ReiserFS for now, if RH cluster supported ReiserFS filesystem
resources.  Hmm, maybe I should just start hacking... On the other hand,
the upgrade path from ext3 to ext4 is quite easy, and I don't know yet
which would be better, ReiserFS or ext4.


-- 
Janne Peltonen janne.pelto...@helsinki.fi PGP Key ID: 0x9CFAC88B
Please consider membership of the Hospitality Club 
(http://www.hospitalityclub.org)

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: Storage Sizing: IOPS per mailbox

2009-01-08 Thread Ian Batten

On 08 Jan 09, at 15:08, Blake Hudson wrote:

 -------- Original Message --------
 Subject: Storage Sizing: IOPS per mailbox
 From: ram r...@netcore.co.in
 To: info-cyrus info-cyrus@lists.andrew.cmu.edu
 Date: Friday, January 02, 2009 10:40:17 PM
 When sizing a storage device for a large cyrus server, the typical
 question asked by storage vendors is what is the IOPS required per
 mailbox
 M$$ Exchange has this concept of IOPS. and they suggest 1.5 IOPS per
 mailbox ( heavy users )

 If I use postfix and cyrus , on my imap server ( pure IMAP  
 server .. All
 spam filtering , outgoing mails , authentication etc happens on
 different servers )


 If the storage is used only for imap storage , what is the typical
 IOPS requirement per user
 We will probably assume 30-50 mails a day of average 100k , and an  
 email
 client checking for new mail every 5minutes



 In my experience I would estimate 0.1 IOPS per user for heavy users
 (thousands of emails per day, checked every few minutes) and 0.01 IOPS
 per user for typical ISP accounts (under a dozen emails, checked a few
 times daily).

Our IMAP server has, as I type, 1020 IMAP connections up, representing
most of our staff.  The metadata (both /var/imap and per-mailbox) is
in a ZFS pool configured as a two-way mirror of two-way stripes of SAS
drives.  The load on that is low.  As an example, the zfs statistics on
the metadata pool are:

mailhost-new# zpool iostat 1
               capacity     operations    bandwidth
pool         used  avail   read  write   read  write
----------  -----  -----  -----  -----  -----  -----
pool1   54.4G  23.6G 12 77   489K   499K
pool1   54.4G  23.6G  0785  0  3.74M
pool1   54.4G  23.6G  0 14  0   135K
pool1   54.4G  23.6G  0 24  0   877K
pool1   54.4G  23.6G  0 33  0   242K
pool1   54.4G  23.6G  0 10  0  43.6K
pool1   54.4G  23.6G  0417  1.48K  2.06M
pool1   54.4G  23.6G  0 22  0   139K
pool1   54.4G  23.6G  0  1  0  7.92K
pool1   54.4G  23.6G  0  7  0  31.7K
pool1   54.4G  23.6G  0 20  0  83.2K
pool1   54.4G  23.6G  0504   1013  1.93M
pool1   54.4G  23.6G  0 23  0   574K
pool1   54.4G  23.6G  2 17  96.5K   123K
pool1   54.4G  23.6G  0 40  0   285K
pool1   54.4G  23.6G  0 26  0   123K
pool1   54.4G  23.6G  0698  1.98K  3.41M
pool1   54.4G  23.6G  0  3  0  15.8K
pool1   54.4G  23.6G  0 24  0   744K
pool1   54.4G  23.6G  0 16  0   713K
pool1   54.4G  23.6G  3 15   209K   147K
pool1   54.4G  23.6G  5569   760K  2.71M
pool1   54.4G  23.6G  0 16  0   222K
^Cmailhost-new#

You can see the five-second sync.  The first line (the running averages)
isn't representative because it of course includes backup activity.

The actual messages are stored in the lowest (`archive') QoS band of a  
Pillar Axiom 500, in NAS mode.  The load is very small on each of two  
2TB-ish partitions (it's approaching four pm, so the business is going  
at close to full load):

mailpool1

I/O Operations
Read I/Os per second: 58.677
Write I/Os per second: 7.129
Average Request Time: 4.475 ms
Current MB per second: 1.003
General Statistics
Read/Write Cache Hit Percentage: 69%
Read/Write I/O Ratio: 89:11

mailpool2

I/O Operations
Read I/Os per second: 46.733
Write I/Os per second: 9.467
Average Request Time: 1.923 ms
Current MB per second: 0.544
General Statistics
Read/Write Cache Hit Percentage: 56%
Read/Write I/O Ratio: 83:17
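
A rough per-connection figure from those numbers (message-store pools
only, ignoring the metadata pool and cache effects):

echo 'scale=3; (58.677 + 7.129 + 46.733 + 9.467) / 1020' | bc
# => .119 IOPS per connected user, the same order of magnitude as the
#    estimates earlier in the thread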

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Vincent Fox
(Summary of filesystem discussion)

You left out ZFS.

Sometimes Linux admins remind me of Windows admins.

I have adminned a half-dozen UNIX variants professionally but
keep running into admins who only do ONE and for whom every
problem is solved with "how can I do this with one OS only?"

I admin numerous Linux systems in our data center (a Perdition proxy
in front of Cyrus, for one), but frankly, if you wanted me to go back into
the filesystem Dark Ages now for terabytes of mail volume, I'd throw a
professional fit.  Even the idea that I need to tune my filesystem for
inodes, and to keep it from wanting to fsck on reboot #20 or whatever,
seems like caveman discussion.  Do any of them offer cheap and
nearly-instant snapshots and online scrubbing?  No?  Then why use them
for large numbers of files of an important nature?

I love Linux, I surely do.  Virtually everything of an appliance nature here
will probably shift over to it in the long run, I think, and for good reasons.
But filesystems are one area where the bazaar model has fallen into a very
deep rut and can't muster the energy to climb out.

So far ZFS has been ticking along with no problems and low iostat numbers,
with everything in one big pool.  I have separate filesystems for data, imap,
and mail, but haven't seen any need to carve the mail spool into chunks at
all.  There were initial problems noted here on the mailing lists way back
in Solaris 10u3, but that was solved with the fsync patch, and since then
it's been like butter.  Nobody ever needs to look at the mail-store systems
because they just work.
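
For the curious, the layout is nothing fancy - roughly along these lines,
with the device names obviously being placeholders:

zpool create pool1 mirror c0t0d0 c0t1d0 mirror c0t2d0 c0t3d0
zfs create pool1/data
zfs create pool1/imap
zfs create pool1/mail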

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: GSSAPI authentication ceased working

2009-01-08 Thread Wesley Craig
On 02 Jan 2009, at 11:19, Lars Hanke wrote:
 hermod: /var/log/auth.log
 Jan  2 17:07:54 hermod imtest: GSSAPI Error: Unspecified GSS  
 failure.  Minor code may provide more information (Decrypt  
 integrity check failed)

 hel: /var/log/syslog
 Jan  2 16:07:54 hel krb5kdc[1652]: TGS_REQ (7 etypes {18 17 16 23 1  
 3 2}) 172.16.6.5: PROCESS_TGS: authtime 0,  unknown client for  
 imap/hermod@mgr, Decrypt integrity check failed

As I read this, hel is saying that the TGT is bad.  You're trying to  
obtain a service ticket for imap/hermod, but the TGT you're  
attempting to use is not accepted by the KDC.  If you klist after  
running imtest, you have no imap/hermod ticket.  I've never seen an  
error like that.  It suggests that your KDC is really broken :)
Something like the key used to encrypt your TGT isn't valid for  
obtaining service tickets.
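
If it were me, I'd start from a clean ticket cache and watch each step
(the principal and host names here are placeholders for your realm):

kdestroy
kinit lars                        # fresh TGT
kvno imap/hermod.example.org      # ask the KDC for the service ticket directly
klist                             # the imap/... ticket should show up here
imtest -m gssapi hermod.example.org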

:wes


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 05:20:00PM +0200, Janne Peltonen wrote:
 If I'm still following after reading through all this discussion,
 everyone who is actually using ReiserFS (v3) appears to be very content
 with it, even with very large installations. Apparently the fact that
 ReiserFS uses the BKL in places doesn't hurt performance too badly, even
 with multi core systems? Another thing I don't recall being mentioned
 was fragmentation - ext3 appears to have a problem with it, in typical
 Cyrus usage, but how does ReiserFS compare to it?

Yeah, I'm surprised the BKL hasn't hurt us more.  Fragmentation - yeah,
it does hurt performance a bit.  We run a patch which forces a skiplist
checkpoint every time recovery runs, which includes every
restart.  We also tune skiplists to checkpoint more frequently in
everyday use.  This helps reduce fragmentation of the meta files.
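
(For the record, one stock way to get more frequent checkpoints is the
standard checkpoint event in cyrus.conf, run more often than the shipped
default of 30 minutes - the 15 here is only an example:)

EVENTS {
  # run the database checkpoint more often than the default period=30
  checkpoint    cmd="ctl_cyrusdb -c" period=15
}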

For data fragmentation - we don't care.  Honestly.  Data IO is so rare.

The main time it matters is if someone does a body search.

Which leaves... index files.  The worst case is files that are only
ever appended to, with no records ever deleted.  Each time you expunge
a mailbox (even with delayed expunge) it causes a complete rewrite of
the cyrus.index file.

I also wrote a filthy little script (attached) which can repack cyrus
meta directories.  I'm not 100% certain that it's problem free though,
so I only run it on replicas.  Besides, it's not protected like most
of our auto-system functions, which check the database to see if the
machine is reporting high load problems and choke themselves until the
load drops back down again.
 
 I'm using this happily, with 50k users, 24 distinct mailspools of 240G
 each. Full backups take quite a while to complete (~2 days), but normal
 usage is quite fast. There is the barrier problem, of course... I'm
 using noatime (implying nodiratime) and data=ordered, since
 data=writeback resulted in corrupted skiplist files on crash, while
 data=ordered mostly didn't.

Yeah, full backups.  Ouch.  I think the last time we had to do that it
took somewhat over a week.  Mainly CPU limited on the backup server,
which is doing a LOT of gzipping!

Our incremental backups take about 4 hours.  We could probably speed
this up a little more, but given that it's now down from about 12 hours
two weeks ago, I'm happy.  We were actually rate limited by Perl
'unpack' and hash creation, believe it or not!  I wound up rewriting
Cyrus::IndexFile to provide a raw interface, and unpacking just the
fields that I needed.  I also asserted index file version == 10 in the
backup library so I can guarantee the offsets are correct.

I've described our backup system here before - it's _VERY_ custom,
based on a deep understanding of the Cyrus file structures.  In this
case it's definitely worth it - it allows us to do partial mailbox
recoveries with flags intact.  Unfortunately, seen information
is much trickier.  I've been tempted for a while to patch cyrus's
seen support to store seen information for the user themselves in the
cyrus.index file, and only seen information for unowned folders in the
user.seen files.  The way it works now seems optimised for the uncommon
case at the expense of the common.  That always annoys me!
 
 Ext4 just got stable, so there is no real world Cyrus user experience on
 it. Among other things, it contains an online defragmenter. Journal
 checksumming might also help around the write barrier problem on LVM
 logical volumes, if I've understood correctly.

Yeah, it's interesting.  Local fiddling suggests it's worse for my
Maildir performance than even btrfs, and btrfs feels more jerky than
reiser3, so I stick with reiser3.
 
 Reiser4 might have a future, at least Andrew Morton's -mm patch contains
 it and there are people developing it. But I don't know if it ever will
 be included in the standard kernel tree.

Yeah, the mailing list isn't massively active at the moment either... I
do keep an eye on it.

 Btrfs is in so early development that I don't know yet what to say about
 it, but the fact of ZFS's being incompatible with GPL might be mitigated
 by this.

Yeah, btrfs looks interesting.  Especially with their work on improving
locking - even on my little dual processor laptop (yay core processors)
I would expect to see an improvement when they merge the new locking
code.

 I'm going to continue using ext3 for now, and probably ext4 when it's
 available from certain commercial enterprise linux vendor (personally,
 I'd be using Debian, but the department has an official policy of using
 RH / Centos). I'm eagerly waiting for btrfs to appear... I probably /would/
 switch to ReiserFS for now, if RH cluster would support ReiserFS FS
 resources.  Hmm, maybe I should just start hacking... On the other hand,
 the upgrade path from ext3 to ext4 is quite easy, and I don't know yet
 which would be better, ReiserFS or ext4.

Sounds sane.  If vendor support matters, then ext4 is probably the
good choice for the immediate future.

Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
 (Summary of filesystem discussion)
 
 You left out ZFS.
 
 Sometimes Linux admins remind me of Windows admins.
 
 I have adminned a half-dozen UNIX variants professionally but
 keep running into admins who only do ONE and for whom every
 problem is solved with how can I do this with one OS only?

We run one zfs machine.  I've seen it report issues on a scrub
only to not have them on the second scrub.  While it looks shiny
and great, it's also relatively new.

Besides, we had a disk _fail_ early on in our x4500 - Sun shipped
a replacement drive, but the kernel was unable to recognise it:

---

Nothing odd about how it snaps in. We can see the connectors in the
slot - they seem fine as far as we can tell. The drive's 'ok' light is
on and the blue led lit.

Which suggests the server thinks the drive is fine, but the dmesg data
definitely suggests it isn't.

I've also included the output of hdadm display below as well, which
shows that currently it thinks the drive is not present, even though the
last thing reported in the dmesg log is that the device was connected.

Aug 14 21:59:13 backup1  SATA device attached at port 0
Aug 14 21:59:13 backup1 sata: [ID 663010 kern.info]
+/p...@2,0/pci1022,7...@8/pci11ab,1...@1 :

The output of hdadm display shows that the machine definitely thinks the
drive is NOT connected.

---

Sun's response was to wait for the next kernel upgrade - there was a bug
that made that channel unusable even after a reboot.

 So far ZFS ticking along with no problems and low iostat numbers
 with everything in one big pool.  I have separate fs for data, imap, mail
 but haven't seen any need to carve mail spool into chunks at all.
 There were initial problems noted here in the mailing lists way back
 in Solaris 10u3 but that was solved with the fsync patch and since then
 it's been like butter.  Mail-store systems nobody ever needs to look
 at them because it just works.

I'd sure hate to lose the entire basket, say due to an unknown bug in
zfs.

Besides, I _know_ Debian quite well.  We don't have any Solaris
experience in our team.  The documentation looks quite good, but there
are still a lot of things that work differently.  I tell you what,
maintaining Solaris and using the Solaris userland feels like going
back 20 years - and the whole "need a SunSolve password and only get
some patches, permission denied on others" crap.  I don't need that.

So while I appreciate that ZFS has some advantages, I'd have to say
that they need to be weighed up against the rest of the system, and
the "all the eggs in a relatively new basket" argument.  Also, the
response we've had from Linus when we find kernel issues has been
absolutely fantastic.

Bron ( Debian on the Solaris kernel would be interesting... )

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Dale Ghent
On Jan 8, 2009, at 7:46 PM, Bron Gondwana wrote:

 We run one zfs machine.  I've seen it report issues on a scrub
 only to not have them on the second scrub.  While it looks shiny
 and great, it's also relatively new.

Wait, weren't you just crowing about ext4? The filesystem that was
marked GA in the Linux kernel release that happened just a few weeks
ago? You also sound pretty enthusiastic, rather than cautious, when
talking about btrfs and tux3.

ZFS, and anyone who even remotely seriously follows Solaris would know  
this, has been GA for 3 years now. For someone who doesn't have their  
nose buried in Solaris much or with any serious attention span, I  
guess it could still seem new.

As for your x4500, I can't tell if those syslog lines you pasted were
from Aug. 2008 or 2007, but certainly since 2007 the Marvell SATA
driver has seen some huge improvements to work around some pretty
nasty bugs in the Marvell chipset. If you still have that x4500, and
have not applied the current patch for the marvell88sx driver, I
highly suggest doing so. Problems with that chip are some of the
reasons Sun switched to the LSI 1068E as the controller in the x4540.

/dale


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana


On Thu, 08 Jan 2009 20:03 -0500, Dale Ghent da...@elemental.org wrote:
 On Jan 8, 2009, at 7:46 PM, Bron Gondwana wrote:
 
  We run one zfs machine.  I've seen it report issues on a scrub
  only to not have them on the second scrub.  While it looks shiny
  and great, it's also relatively new.

 Wait, weren't you just crowing about ext4? The filesystem that was  
 marked GA in the linux kernel release that happened just a few weeks  
 ago? You also sound pretty enthusiastic, rather than cautious, when  
 talking about btrfs and tux3.

I was saying I find it interesting.  I wouldn't seriously consider
using it for production mail stores just yet.  But I have been testing
it on my laptop, where I'm running an offlineimap replicated copy of
my mail.  I wouldn't consider btrfs for production yet either, and
tux3 isn't even on the radar.  They're interesting to watch though,
as is ZFS.

I also said (or at least meant) that if you have commercial support,
ext4 is probably going to be the next evolutionary step from ext3.

 ZFS, and anyone who even remotely seriously follows Solaris would know  
 this, has been GA for 3 years now. For someone who doesn't have their  
 nose buried in Solaris much or with any serious attention span, I  
 guess it could still seem new.

Yeah, it's true - but I've heard anecdotes of people losing entire
zpools due to bugs.  Google turns up things like:

http://www.techcrunch.com/2008/01/15/joyent-suffers-major-downtime-due-to-zfs-bug/

which points to this thread:

http://www.opensolaris.org/jive/thread.jspa?threadID=49020&tstart=0

and finally this comment:

http://www.joyeur.com/2008/01/16/strongspace-and-bingodisk-update#c008480

Not something I would want happening to my entire universe.  That's why
we have ~280 separate filesystems (at the moment) with our email spread
across them: a rare filesystem bug is only likely to affect a single
store if it bites, and we can restore one store's worth of users a lot
quicker than the whole system.

It's the same reason we prefer Cyrus replication (and put a LOT of work
into making it stable - check this mailing list from a couple of years
ago.  I wrote most of the patches that stabilised replication between
2.3.3 and 2.3.8).

If all your files are on a single filesystem then a rare bug only has
to hit once.  A frequent bug, on the other hand - well, you'll know
about it pretty fast... :)  None of the filesystems mentioned have
frequent bugs (except btrfs and probably tux3 - but they ship with
big fat warnings all over).

 As for your x4500, I can't tell if those syslog lines you pasted were  
 from Aug. 2008 or 2007, but certainly since 2007 the marvel SATA  
 driver has seen some huge improvements to work around some pretty  
 nasty bugs in the marvell chipset. If you still have that x4500, and  
 have not applied the current patch for the marvell88sx driver, I  
 highly suggest doing so. Problems with that chip are some of the  
 reasons Sun switched to the LSI 1068E as the controller in the x4540.

I think it was 2007 actually.  We haven't had any trouble with it for
a while, but then it does very little.  The big zpool is just used
for backups, which are pretty much one .tar.gz and one .sqlite3 file
per user - and since the .sqlite3 file just indexes the .tar.gz file,
we can rebuild it by reading the tar file if needed.

As a counterpoint to some of the above, we had an issue with Linux
where there was a bug in 64-bit writev handling of mmapped space.  If
you were doing a writev from an mmapped region that crossed a page boundary
and the following page wasn't mapped in, it would inject spurious zero
bytes into the output where the start of the next page belonged.

It took me a few days to prove it was the kernel and create a repeatable
test case, and then backwards and forwards with Linus and a couple of
other developers we fixed it and tested it _that_day_.  I don't know
anyone with even unobtanium level support with a commercial vendor who
has actually had that sort of turnaround.

This caused pretty massive file corruption, especially of our skiplist
files, but of bits of every other meta file too.  Luckily, as per above,
we had only upgraded one machine.  We generally do that with new kernels
or software versions - upgrade one production machine and watch it for
a bit.  We also test things on testbed machines first, but you always
find something different in production.  The mmap-over-boundaries case
was pretty rare - only a few instances per day would actually cause a
crash; the others were silent corruption that wasn't detected at the time.

If something like this had hit an only machine, we would have been seriously
screwed.  Since it only hit one machine, we could apply the fix and
re-replicate all the damaged data from the other machine.  No actual
data loss.

Bron.
-- 
  Bron Gondwana
  br...@fastmail.fm


Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html

Re: choosing a file system

2009-01-08 Thread Robert Banz

On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:

 On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
 (Summary of filesystem discussion)

 You left out ZFS.

 Sometimes Linux admins remind me of Windows admins.

 I have adminned a half-dozen UNIX variants professionally but
 keep running into admins who only do ONE and for whom every
 problem is solved with how can I do this with one OS only?

 We run one zfs machine.  I've seen it report issues on a scrub
 only to not have them on the second scrub.  While it looks shiny
 and great, it's also relatively new.

You'd be surprised how unreliable disks, and the transport between the
disk and host, can be. This isn't a ZFS problem, but a statistical
certainty as we push ever larger numbers of bits down the wire.

You can, with a large enough corpus, have on-disk data corruption, or
data corruption that appeared in flight to the disk, or in the
controller, that your standard disk CRCs can't correct for. As we keep
pushing the limits, data integrity checking at the filesystem layer --
before the information is presented for your application to consume --
has basically become a requirement.

BTW, the reason that the first scrub saw the error, and the second  
scrub didn't, is that the first scrub fixed it -- that's the job of a  
ZFS scrub.
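
The usual cycle (the pool name is just an example) is simply:

zpool scrub pool1          # walks every block, repairing what it can from redundancy
zpool status -v pool1      # shows scrub progress and anything it found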

-rob

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 08:57:18PM -0800, Robert Banz wrote:

 On Jan 8, 2009, at 4:46 PM, Bron Gondwana wrote:

 On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
 (Summary of filesystem discussion)

 You left out ZFS.

 Sometimes Linux admins remind me of Windows admins.

 I have adminned a half-dozen UNIX variants professionally but
 keep running into admins who only do ONE and for whom every
 problem is solved with how can I do this with one OS only?

There's a significant upfront cost to learning a whole new system
for one killer feature, especially if it comes along with significant
regressions in lots of other features (like a non-sucky userland
out of the box).  Applying patches on Solaris seems to be a choice
between incredibly low-level command line tools or booting up a whole
graphical environment on a machine in a datacentre on the other side
of the world.

 We run one zfs machine.  I've seen it report issues on a scrub
 only to not have them on the second scrub.  While it looks shiny
 and great, it's also relatively new.

 You'd be surprised how unreliable disks and the transport between the  
 disk and host can be. This isn't a ZFS problem, but a statistical  
 certainty as we're pushing a large amount of bits down the wire.

 You can, with a large enough corpus, have on-disk data corruption, or  
 data corruption that appeared en-flight to the disk, or in the  
 controller, that your standard disk CRCs can't correct for. As we keep  
 pushing the limits, data integrity checking at the filesystem layer --  
 before the information is presented for your application to consume --  
 has basically become a requirement.

 BTW, the reason that the first scrub saw the error, and the second scrub
 didn't, is that the first scrub fixed it -- that's the job of a ZFS scrub.

# zpool status -v rpool
  pool: rpool
 state: ONLINE
status: One or more devices has experienced an error resulting in data
corruption.  Applications may be affected.
action: Restore the file in question if possible.  Otherwise restore the
entire pool from backup.
   see: http://www.sun.com/msg/ZFS-8000-8A
 scrub: scrub in progress for 0h0m, 0.69% done, 1h40m to go
config:

        NAME          STATE     READ WRITE CKSUM
        rpool         ONLINE       0     0     0
          mirror      ONLINE       0     0     0
            c5t0d0s0  ONLINE       0     0     0
            c5t4d0s0  ONLINE       0     0     0

errors: Permanent errors have been detected in the following files:

//dev/dsk

---

If that's an error that the scrub fixed, then it's a really badly
written error message.

The same error didn't appear on the next scrub, which was what confused me.

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Bron Gondwana
On Thu, Jan 08, 2009 at 08:01:04AM -0800, Vincent Fox wrote:
 (Summary of filesystem discussion)
 
 You left out ZFS.

Just to come back to this - I should say that I'm a big fan
of ZFS and what Sun have done with filesystem design.  Despite
the issues we've had with that machine, I know it's great for
people who are using it...

BUT - if someone is asking "what's the best filesystem to use
on Linux" and gets told "ZFS - and by the way you should switch
operating systems and ditch all the rest of your custom setup /
experience", then you're as bad as a Linux weenie saying "just
use Cyrus on Linux" in a "how should I tune NTFS on my
Exchange server" discussion.

From the original post:

Message-ID: 1617f8010812300849k1c7c878bl2f17e8d4287c1...@mail.gmail.com

  zfs (but we should switch to solaris or freebsd and 
   throw away our costly SAN)

I'd love to do some load testing on a ZFS box with our setup
at some point.  There would be some advantages, though I suspect
having one big mailboxes.db vs the lots of little ones we have
would be a point of contention - and fine-grained skiplist
locking is still very much a wishlist item.  I'd want to take
some time testing it before unleashing it on the world!

Bron.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Vincent Fox
Bron Gondwana wrote:
 BUT - if someone is asking what's the best filesystem to use
 on Linux and gets told ZFS, and by the way you should switch
 operating systems and ditch all the rest of your custom setup/
 experience then you're as bad as a Linux weenie saying just
 use Cyrus on Linux in a how should I tune NTFS on my 
 Exchange server discussion.

   
Point taken.  We can go around that circle all day long, but I *am*
saying there are other UNIX OSes out there than just Linux, and quite
frankly it blows my mind sometimes how people fall into ruts.

Numerous times in my career I have had to switch some application
from AIX to HP-UX, or IRIX to Linux.  The differing flavors of UNIX are
not so different to me as they perhaps are to others.  Particularly when
it's a single app on a dedicated server, I usually find it odd how people
get stuck on something and won't change.  Or they take the safe
institutional path and never fight it.  Collect your paycheck and go home
at 4.

I sleep very well at night knowing the Cyrus mail-stores are on ZFS.
Once in a while I run a scrub just for fun.  No futzing around.

This was no cakewalk.  I was pushing a boulder up a hill, particularly
when we ran head-first into the ZFS fsync bottleneck at the start of Fall
quarter.  Managers said we needed a crash program to convert everything
to Linux or Exchange or whatever.  I dug into the bugs instead, Sun
got us an interim patch to fix it, and we moved on.  Now, as I said, it's
like butter and one of those setups nobody thinks about.  There are always
excuses for why you will stick with established practice even if it's
antiquated and full of aches and pains, and I fought that and won.  It
seems to me there is no bigger deal than having a RELIABLE filesystem for
the mail-store, and this is where every other filesystem I have worked
with since 1989 has been a frigging nightmare.  Everything from bad
controllers to double-disk failures in RAID-5 sets keeps me wondering
whether I am paranoid ENOUGH.

I'll be all over btrfs when it hits beta.  I'm not married to ZFS.  But I'm
quite unashamedly looking down my nose at any filesystem that still leaves
me possibly staring at an fsck prompt.  I've done enough of that in my
career already; it's time to move beyond 30+ years' worth of cruft atop
antique designs that seemed tolerable when a huge disk was 20 gigs.

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


Re: choosing a file system

2009-01-08 Thread Robert Banz

 There's a significant upfront cost to learning a whole new system
 for one killer feature, especially if it comes along with significant
 regressions in lots of other features (like a non-sucky userland
 out of the box).

...

The "non-sucky userland" comment is simply a matter of preference, and
bait for a religious war that I'm not going to take.

What I will say is that switching between Solaris, Linux, IRIX,
Ultrix, FreeBSD, HP-UX, OSF/1 -- any *nix variant -- should not be
considered a stumbling block. Your comment shows the narrow-mindedness
of the current Linux culture; many of us were brought up supporting
and using a collection of these platforms at any one time.

(notice, didn't mention AIX. I've got my standards ;)

Patching is always an issue on any OS, and you do have the choice of
running X applications remotely (rather than booting an entire graphical
environment!?).  There are also many other tools available, such as pca,
to help you patch Solaris, and they provide many of the features that
you're used to.

-rob

Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html


32-bit to 64-bit migration seen flags

2009-01-08 Thread ram
I am migrating mailboxes from a 32-bit Cyrus (2.3.7) server to a 64-bit
Cyrus (2.3.13) server.

When I copy the mailbox seen flags (skiplist) from the 32-bit server to
the 64-bit server, they do not work: all the mails are flagged as unseen
on the new server.

Is there a way I can migrate the seen flags?
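
(I have seen cross-architecture moves suggested to go via cvt_cyrusdb's
flat format - something like the following, with purely illustrative
paths and username, and the server stopped - but I have not verified
this yet:)

# on the old 32-bit server, as the cyrus user
cvt_cyrusdb /var/lib/imap/user/r/ram.seen skiplist /tmp/ram.seen.flat flat
# copy /tmp/ram.seen.flat across, then on the new 64-bit server
cvt_cyrusdb /tmp/ram.seen.flat flat /var/lib/imap/user/r/ram.seen skiplist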

Thanks
Ram




Cyrus Home Page: http://cyrusimap.web.cmu.edu/
Cyrus Wiki/FAQ: http://cyrusimap.web.cmu.edu/twiki
List Archives/Info: http://asg.web.cmu.edu/cyrus/mailing-list.html