Re: [GTALUG] mbox vs Maildir [was Re: Linux friendly email providers?]

2023-11-24 Thread D. Hugh Redelmeier via talk
| From: Howard Gibson via talk 

|I figured that separate email files would be using up a lot of
| sectors, but my bandwidth is limited by my Blu-ray discs.  My actual
| backup is a gzipped tar file.  Sectors should not be a problem, should
| they?

Uncompressed TAR format: 


tar uses blocks.
The default block size is 512 bytes, which dates back to 7th Edition UNIX 
in the late 1970s. (For many applications you want to use a larger block 
size, but not for maximal compression.)

Each file uses one header block plus the number of blocks to hold the 
contents.  So maildir is probably significantly worse than mbox if no 
compression is used.

I expect any compression technique to be fairly good at compressing a run 
of 0 bytes.  Such bytes are expected for filling out each header block and 
the last block of each file.  I cannot quantify "fairly good".
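One way to put a number on "fairly good" for the pure-padding case is to gzip a run of NUL bytes and compare it with incompressible data. This sketch uses Python's standard gzip module and only measures padding in isolation, not a real tar stream where the padding is interleaved with message text.

```python
import gzip
import os


def gzip_ratio(data: bytes) -> float:
    """Compressed size as a fraction of the original (lower is better)."""
    return len(gzip.compress(data)) / len(data)


if __name__ == "__main__":
    size = 1024 * 1024
    zeros = bytes(size)          # stands in for tar's NUL padding
    noise = os.urandom(size)     # worst case: incompressible bytes
    print(f"NUL run:      {gzip_ratio(zeros):.4f}")
    print(f"random bytes: {gzip_ratio(noise):.4f}")
```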

mbox has none of that per-message overhead, compressed or not.

Without testing, I cannot say how much worse compressed tar files of 
maildir would be, compared with compressed tar files of mbox.
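A hedged sketch of such a test: build both layouts as in-memory tar archives with Python's tarfile module, then gzip them. The file names, the From-line text, and the synthetic messages are placeholders; feeding it real messages is what would give a meaningful answer.

```python
import gzip
import io
import tarfile


def build_tar(files: dict[str, bytes]) -> bytes:
    """Return an uncompressed ustar archive containing the given files."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()


def compare(messages: list[bytes]) -> dict[str, tuple[int, int]]:
    """Map layout name to (tar size, gzipped tar size) in bytes.

    Messages are assumed to be already From-quoted for mbox.
    """
    # Maildir: one file per message (hypothetical file names).
    maildir = {f"Maildir/cur/{i}.msg": m for i, m in enumerate(messages)}
    # mbox: every message in one file, each introduced by a "From " line.
    sep = b"From sender@example.com Fri Nov 24 01:20:20 2023\n"
    mbox = {"mbox": b"".join(sep + m + b"\n" for m in messages)}

    result = {}
    for label, files in (("maildir", maildir), ("mbox", mbox)):
        raw = build_tar(files)
        result[label] = (len(raw), len(gzip.compress(raw)))
    return result


if __name__ == "__main__":
    # Synthetic stand-ins for real mail; substitute your own messages.
    msgs = [
        (f"Subject: test {i}\n\n" + f"Line {i} of some message text.\n" * 20).encode()
        for i in range(500)
    ]
    for layout, (raw, gz) in compare(msgs).items():
        print(f"{layout:8} tar={raw:>9}  tar.gz={gz:>8}")
```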
---
Post to this mailing list talk@gtalug.org
Unsubscribe from this mailing list https://gtalug.org/mailman/listinfo/talk


Re: [GTALUG] mbox vs Maildir [was Re: Linux friendly email providers?]

2023-11-24 Thread Howard Gibson via talk
On Fri, 24 Nov 2023 01:20:20 -0500 (EST)
"D. Hugh Redelmeier via talk"  wrote:

> You mentioned that you were running out of space on your system.  If a lot 
> of that space is mail messages, I would bet that Maildir is costing you a 
> lot of it.  Each message is taking a multiple of the allocation unit size 
> (1KB? 4KB?) and a large part of that is likely unused (the tail of the 
> last unit).

Hugh,  

   My primary data device is 1TB.  My primary backup device is 50GB
Blu-ray.  I compress my backup now.  My email archives go back twenty-five
years, and I have no plans to archive any of it.

   I figured that separate email files would be using up a lot of
sectors, but my bandwidth is limited by my Blu-ray discs.  My actual
backup is a gzipped tar file.  Sectors should not be a problem, should
they?

-- 
Howard Gibson 
hgib...@eol.ca
http://home.eol.ca/~hgibson


Re: [GTALUG] mbox vs Maildir [was Re: Linux friendly email providers?]

2023-11-24 Thread Ron / BCLUG via talk

D. Hugh Redelmeier via talk wrote on 2023-11-23 22:20:

> I don't remember seeing that corruption in the last few decades of using 
> mbox.


I didn't notice it 'til transferring providers and looking back at old 
message folders. No idea how long it's been lurking around.


Probably nothing of value lost, but ... data loss is enough to make me 
very, very concerned.



> The horrors of in-band signalling are well known -- maybe the 
> software I use reflects that knowledge.


Pretty sure a power outage while writing to a multi-megabyte file would 
be enough to corrupt it with most software? Unless some form of atomic 
transaction or file-system journal is used?
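Maildir's delivery convention is built around that problem: a message is written completely into tmp/ and only then linked or renamed into new/, so a crash mid-write leaves at most a stray file in tmp/ rather than a damaged mailbox. A minimal Python sketch of the pattern; the function names are hypothetical, and the uuid-based file name is a simplification of the unique-name scheme real delivery agents use.

```python
import os
import tempfile
import uuid


def maildir_deliver(maildir: str, message: bytes) -> str:
    """Deliver one message Maildir-style; return its path in new/.

    Hypothetical helper: a uuid stands in for the unique
    time/pid/hostname names real delivery agents generate.
    """
    for sub in ("tmp", "new", "cur"):
        os.makedirs(os.path.join(maildir, sub), exist_ok=True)

    name = uuid.uuid4().hex
    tmp_path = os.path.join(maildir, "tmp", name)
    new_path = os.path.join(maildir, "new", name)

    # Step 1: write the complete message into tmp/ and flush it to disk.
    with open(tmp_path, "wb") as f:
        f.write(message)
        f.flush()
        os.fsync(f.fileno())

    # Step 2: move it into new/ in one step. rename() within a single
    # file system is atomic on POSIX, so readers see either no file or
    # the whole message. (Full durability would also fsync new/ itself.)
    os.rename(tmp_path, new_path)
    return new_path


def demo() -> tuple[list[str], list[str], bytes]:
    """Deliver into a throwaway Maildir; report tmp/, new/ and the content."""
    with tempfile.TemporaryDirectory() as md:
        path = maildir_deliver(md, b"Subject: hi\n\nhello\n")
        with open(path, "rb") as f:
            content = f.read()
        tmp_entries = os.listdir(os.path.join(md, "tmp"))
        new_entries = os.listdir(os.path.join(md, "new"))
    return tmp_entries, new_entries, content


if __name__ == "__main__":
    print(demo())
```

By contrast, an interrupted append to an mbox file leaves a partial message at the end of the one file every other message lives in.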


And when all messages are in one file, it can cascade. Apparently.


Particularly annoying when a screenful of messages (in the messages 
pane) shows no subject, date, or correspondents, and looking into the mbox 
reveals HTML messages and MIME-encoded parts: a nightmare to try to fix.



As John pointed out, a "From " line mid-message needs special handling 
too. I don't know if that is handled differently in Maildir, but it sure 
feels like a sloppy hack.




> You mentioned that you were running out of space on your system. If a 
> lot of that space is mail messages, I would bet that Maildir is costing 
> you a lot of it. Each message is taking a multiple of the allocation 
> unit size (1KB? 4KB?) and a large part of that is likely unused (the 
> tail of the last unit). My intuition would be that since mail messages 
> are usually short, and the distribution of sizes isn't uniform, you are 
> probably using at least 25% more disk space with Maildir.


Probably true, but if a file gets corrupted, I only lose one message, 
and it's more likely to be recoverable.



Also, consider RAID-1: 100% extra disk space versus storage capacity.

Often seen as worthwhile, for similar reasons.


I dunno, I may decide to switch back, but so far it's been working well.



If anyone's considering running Maildir with Dovecot, I recommend setting 
the LAYOUT=fs option in mail_location.


Otherwise a folder structure like:

Archives/2023/11

shows up as 3 entries in the server's file system, like:

.Archives
.Archives.2023
.Archives.2023.11

Ugh.
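A small sketch of the difference, modelling only the directory names (the helper is hypothetical, not Dovecot's code): with the Maildir++ layout every folder level becomes a dot-separated sibling directory, while the fs layout nests them as ordinary subdirectories.

```python
def maildir_dirs(folder: str, layout: str = "maildir++") -> list[str]:
    """Directory names created for a nested folder such as 'Archives/2023/11'.

    Hypothetical helper; models naming only, not any server's actual code.
    """
    parts = folder.split("/")
    levels = [parts[: i + 1] for i in range(len(parts))]
    if layout == "maildir++":
        # Each level is a sibling directory: leading dot, '.' as separator.
        return ["." + ".".join(level) for level in levels]
    if layout == "fs":
        # Each level is an ordinary subdirectory of the one above.
        return ["/".join(level) for level in levels]
    raise ValueError(f"unknown layout: {layout!r}")


if __name__ == "__main__":
    for layout in ("maildir++", "fs"):
        print(layout, maildir_dirs("Archives/2023/11", layout))
```

With the fs layout only "Archives" appears at the top level of the mail directory; the year and month folders live inside it.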


Cheers,

rb



Re: [GTALUG] mbox vs Maildir [was Re: Linux friendly email providers?]

2023-11-23 Thread John Sellens via talk
Ah - mbox format - still use it for my mail archives.
Convenient for grepping or loading into vi.


On Fri, 2023/11/24 01:20:20AM -0500, D. Hugh Redelmeier via talk 
 wrote:
| | From: Ron / BCLUG via talk 
| 
| | I've seen mbox files get corrupted (all mailbox
| | messages in one file, and a line like "From: " is the message delimiter.
| | Terrible!)
| 
| I don't remember seeing that corruption in the last few decades of using 
| mbox. The horrors of in-band signalling are well known -- maybe the 
| software I use reflects that knowledge.

I believe in an mbox file, the messages start with "From " (no colon),
preceded by either the beginning of the file, or a newline.

When a line in the body of a message starts with "From ", the convention
is that it must be replaced by ">From " when saving to the file,
i.e. add a > before the From.

That's how message delimiter confusion is avoided.
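A sketch of that convention in Python, assuming the original "mboxo" variant John describes (only body lines beginning with "From " get a ">"). The helper names are hypothetical and the From-line addresses and dates are placeholders; Python's standard mailbox module is what you would use on real mbox files, this just shows the mechanism.

```python
def quote_from_lines(body: str) -> str:
    """Escape body lines that would otherwise look like a message delimiter."""
    return "".join(
        ">" + line if line.startswith("From ") else line
        for line in body.splitlines(keepends=True)
    )


def to_mbox(messages: list[tuple[str, str, str]]) -> str:
    """Build mbox text from (sender, date, message) triples."""
    out = []
    for sender, date, text in messages:
        out.append(f"From {sender} {date}\n")   # the delimiter line
        body = quote_from_lines(text)
        if not body.endswith("\n"):
            body += "\n"
        out.append(body + "\n")                  # blank line before next "From "
    return "".join(out)


def split_mbox(data: str) -> list[str]:
    """Split mbox text on lines that start with 'From '."""
    messages: list[list[str]] = []
    for line in data.splitlines(keepends=True):
        if line.startswith("From "):
            messages.append([])                  # start of a new message
        elif messages:
            messages[-1].append(line)
    return ["".join(m) for m in messages]


if __name__ == "__main__":
    box = to_mbox([
        ("alice@example.com", "Fri Nov 24 01:20:20 2023",
         "Subject: a\n\nFrom here on, it gets tricky.\n"),
        ("bob@example.com", "Fri Nov 24 09:00:00 2023", "Subject: b\n\nhi\n"),
    ])
    for msg in split_mbox(box):
        print(repr(msg))
```

mboxo quoting is lossy: on reading, a body line that genuinely began with ">From " cannot be told apart from an escaped one. The "mboxrd" variant avoids that by also adding a ">" to lines that already start with one or more ">" followed by "From ", so a reader can strip exactly one ">".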

Geez, there is so much crap filling up my brain.  Cheers.

John


[GTALUG] mbox vs Maildir [was Re: Linux friendly email providers?]

2023-11-23 Thread D. Hugh Redelmeier via talk
| From: Ron / BCLUG via talk 

| I've seen mbox files get corrupted (all mailbox
| messages in one file, and a line like "From: " is the message delimiter.
| Terrible!)

I don't remember seeing that corruption in the last few decades of using 
mbox. The horrors of in-band signalling are well known -- maybe the 
software I use reflects that knowledge.

| I've recently switched to using Maildir format (server *and* Thunderbird).
| One message per file.

You mentioned that you were running out of space on your system.  If a lot 
of that space is mail messages, I would bet that Maildir is costing you a 
lot of it.  Each message is taking a multiple of the allocation unit size 
(1KB? 4KB?) and a large part of that is likely unused (the tail of the 
last unit).

My intuition would be that since mail messages are usually short, and the 
distribution of sizes isn't uniform, you are probably using at least 25% 
more disk space with Maildir.

But intuition is surprisingly bad for computer things.  Since Thunderbird 
can convert between the two formats, you could easily measure this for a 
real-world example.
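Short of a full conversion, a rough estimate can be had from the message sizes alone. A sketch, assuming a 4 KiB allocation unit and ignoring file-system tricks such as storing tiny files inline; the function names and the ~/Maildir path are hypothetical. Running du on the real directory gives the authoritative number.

```python
import os


def round_up(n: int, unit: int) -> int:
    """Round n up to a whole number of allocation units."""
    return -(-n // unit) * unit


def slack_estimate(sizes: list[int], unit: int = 4096) -> tuple[int, int, int]:
    """Return (content bytes, bytes allocated as separate files,
    bytes allocated if everything were one file)."""
    content = sum(sizes)
    as_files = sum(round_up(s, unit) for s in sizes)
    as_one_file = round_up(content, unit)
    return content, as_files, as_one_file


def maildir_message_sizes(root: str) -> list[int]:
    """Sizes of message files under cur/ and new/ (skips index files)."""
    sizes = []
    for dirpath, _dirs, files in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            sizes.extend(os.path.getsize(os.path.join(dirpath, f)) for f in files)
    return sizes


if __name__ == "__main__":
    sizes = maildir_message_sizes(os.path.expanduser("~/Maildir"))
    content, as_files, one = slack_estimate(sizes)
    if content:
        print(f"{len(sizes)} messages, {content} bytes of mail")
        print(f"as separate files: {as_files} ({as_files / content - 1:.0%} overhead)")
        print(f"as one file:       {one}")
```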
