Re: [Dovecot] v1.1.rc8 released

2008-06-04 Thread Anders
Timo Sirainen wrote:

   + deliver: Added -c parameter to provide path to delivered mail.
 This allows maildir to save identical mails to multiple recipients
 using hard links.

Now I tried this, with some trouble.

I had to set maildir_copy_with_hardlinks = yes for deliver to pick it
up, even though this is supposed to be the default.

Also, the W= size thing is not added to filenames when using -p.


Cheers,
Anders.




Re: [Dovecot] v1.1.rc8 released

2008-06-04 Thread Anders
Timo Sirainen wrote:
 On Wed, 2008-06-04 at 11:37 +0200, Anders wrote:
 Timo Sirainen wrote:

 + deliver: Added -c parameter to provide path to delivered mail.
   This allows maildir to save identical mails to multiple recipients
   using hard links.

[...]

 Also, the W= size thing is not added to filenames when using -p.

 And cache isn't updated either. These are because hard linking can be
 done without actually reading the mail contents. I won't fix this for
 v1.1, but I updated the documentation.

 I'm not sure what the best final solution to this is though. There could
 of course be a special deliver-check when the reading is done, but COPY
 command has the same problem. Should it read the files or not? If the
 file contents are already in memory it would be a good idea to read them
 and update cache, but otherwise not. I guess mincore() is the only
 potential way to check that, but mmapping the file only to check that is
 probably more trouble than it's worth.

For the delivery case, the mail will obviously be in memory, as we have
just written it to a temporary file. Is it more than a few lines of code
to add an index update to deliver.c after the hardlink? I might want to
have that as a local patch.

As an alternative, can I call something from my wrapper script to have the
index updated after delivery? I guess this will be impossible in the
general case, as there is no way to know where Sieve decided to put the
mail.

I am not sure how important the update is in our case, anyway. Most people
have the MUA open all day, and I guess a client in IDLE will fetch the
headers and have the index updated immediately, right?


Regards,
Anders.




Re: [Dovecot] v1.1.rc8 released

2008-06-04 Thread Timo Sirainen
On Wed, 2008-06-04 at 11:37 +0200, Anders wrote:
 Timo Sirainen wrote:
 
  + deliver: Added -c parameter to provide path to delivered mail.
This allows maildir to save identical mails to multiple recipients
using hard links.
 
 Now I tried this, with some trouble.
 
 I had to set maildir_copy_with_hardlinks = yes for deliver to pick it
 up, even though this is supposed to be the default.

deliver uses separate config parsing code. It looks like all boolean
settings that default to yes default to no in deliver. I'll fix it
today by adding more kludges. Hopefully v2.0 will come soon with
its unified config parsing code. :)

 Also, the W= size thing is not added to filenames when using -p.

And cache isn't updated either. These are because hard linking can be
done without actually reading the mail contents. I won't fix this for
v1.1, but I updated the documentation.
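
A minimal sketch of why the two values differ in cost, assuming S= is the plain byte size of the file and W= is the size with every line ending counted as CRLF. The helper names and the sample message are made up for illustration; this is not deliver's code.

```python
import os
import tempfile


def physical_size(path):
    """S= style value: byte size, available from stat() without reading."""
    return os.stat(path).st_size


def virtual_size(path):
    """W= style value: size as if every line ended in CRLF.

    This needs a pass over the contents to count bare LF characters.
    """
    with open(path, "rb") as f:
        data = f.read()
    bare_lf = data.count(b"\n") - data.count(b"\r\n")
    return len(data) + bare_lf


def demo():
    """Hard link one delivered file to a second name and measure it."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "delivered")
        dst = os.path.join(tmp, "recipient-copy")
        with open(src, "wb") as f:
            f.write(b"Subject: hi\n\nbody\n")
        os.link(src, dst)  # no mail contents are read here
        same_inode = os.stat(src).st_ino == os.stat(dst).st_ino
        return same_inode, physical_size(dst), virtual_size(dst)
```

The link itself and the S= style size never touch the file data, while the W= style size requires reading it, which is the step hard linking skips.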

I'm not sure what the best final solution to this is though. There could
of course be a special deliver-check when the reading is done, but COPY
command has the same problem. Should it read the files or not? If the
file contents are already in memory it would be a good idea to read them
and update cache, but otherwise not. I guess mincore() is the only
potential way to check that, but mmapping the file only to check that is
probably more trouble than it's worth.




Re: [Dovecot] v1.1.rc8 released

2008-06-04 Thread Steffen Kaiser


On Wed, 4 Jun 2008, Timo Sirainen wrote:

  Also, the W= size thing is not added to filenames when using -p.

 And cache isn't updated either. These are because hard linking can be
 done without actually reading the mail contents. I won't fix this for
 v1.1, but I updated the documentation.


Could you transfer existing attributes from the source file(name) to the 
destination filename?


Bye,

-- 
Steffen Kaiser



Re: [Dovecot] v1.1.rc8 released

2008-06-04 Thread Timo Sirainen
On Wed, 2008-06-04 at 14:38 +0200, Steffen Kaiser wrote:
  Also, the W= size thing is not added to filenames when using -p.
 
  And cache isn't updated either. These are because hard linking can be
  done without actually reading the mail contents. I won't fix this for
  v1.1, but I updated the documentation.
 
 Could you transfer existing attributes from the source file(name) to the 
 destination filename?

Looks like S=n is copied and W=n could also be copied with minimal
trouble. I'll add it to my TODO.
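
As an illustration only, copying the size fields from one maildir filename to another could look roughly like this. The sketch assumes the fields appear as comma-separated `key=value` pairs before the `:` flags separator; the function names and sample filenames are hypothetical, not Dovecot code.

```python
def split_maildir_name(name):
    """Split a maildir filename into (unique part, attribute dict, flags)."""
    base, _sep, flags = name.partition(":")
    fields = base.split(",")
    attrs = {}
    for field in fields[1:]:
        key, eq, value = field.partition("=")
        if eq and value.isdigit():
            attrs[key] = int(value)
    return fields[0], attrs, flags


def carry_over_sizes(src_name, dest_unique):
    """Build a destination name that reuses S= and W= from the source.

    Flags are left out on purpose: they belong to each recipient's copy.
    """
    _unique, attrs, _flags = split_maildir_name(src_name)
    extras = "".join(f",{key}={attrs[key]}" for key in ("S", "W") if key in attrs)
    return dest_unique + extras
```

For example, `carry_over_sizes("1212582000.M1P2.hostA,S=18,W=21:2,S", "1212582001.M3P4.hostB")` yields a name ending in `,S=18,W=21`, so neither size has to be recomputed from the file.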





Re: [Dovecot] v1.1.rc8 released

2008-06-03 Thread Ed W

Timo Sirainen wrote:
 On Mon, 2008-06-02 at 23:25 +0100, Ed W wrote:
  Hi

   + deliver: Added -c parameter to provide path to delivered mail.
     This allows maildir to save identical mails to multiple recipients
     using hard links.

  Funnily enough it was on my todo list to whip up a small perl program to
  go and scan my maildirs and figure out if this theoretical idea actually
  amounted to anything.

  Algorithm would be this:

  Open each message,
  scan for first blank line.
  SHA the rest of the message, store the SHA in a hash (along with the
  message size)
  rinse and repeat and see if we end up with any hashes showing count
  greater than 1...

  This would represent the best case that we could achieve assuming body
  content fixed and we find some way to manage variable headers.

 Somewhat faster way would be to get a list of file sizes first and not
 bother checksumming any files which have a unique size.


Could do, but I was trying to cover the case where the headers are
different but the body is the same (e.g. I suspect that mailing list
managers might deliver emails one by one (VERP), but the body is not
customised). Anyway, I just wanted to checksum the body of the message,
not the whole message.
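
A rough sketch of that body-only scan, in Python rather than perl. The function names are made up for illustration, and the sketch assumes a plain maildir layout with `cur` and `new` directories.

```python
import hashlib
import os
from collections import defaultdict


def body_digest(path):
    """Return (SHA-1 hex digest, byte size) of everything after the first blank line."""
    sha = hashlib.sha1()
    size = 0
    in_body = False
    with open(path, "rb") as f:
        for line in f:
            if in_body:
                sha.update(line)
                size += len(line)
            elif line in (b"\n", b"\r\n"):
                in_body = True  # end of headers reached
    return sha.hexdigest(), size


def maildir_files(root):
    """Yield message paths found in any cur/ or new/ directory under root."""
    for dirpath, _dirs, files in os.walk(root):
        if os.path.basename(dirpath) in ("cur", "new"):
            for name in files:
                yield os.path.join(dirpath, name)


def duplicate_bodies(paths):
    """Map (digest, size) to the sorted list of paths sharing that body."""
    groups = defaultdict(list)
    for path in paths:
        groups[body_digest(path)].append(path)
    return {key: sorted(group) for key, group in groups.items() if len(group) > 1}
```

Calling `duplicate_bodies(maildir_files("/path/to/Maildir"))` returns only the bodies seen more than once, which gives an upper bound on what body-level single-instance storage could save.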


Actually the motivation for this was I was wondering about the benefit 
of a storage backend where the body was stored per file and the headers 
were stored separately (perhaps in a maildir type format).  I haven't 
looked to see if this is what dbox does already...


I have been looking at git and brackup for backing up maildirs and it's 
got me thinking a bit more about mail storage algorithms


Ed W


Re: [Dovecot] v1.1.rc8 released

2008-06-03 Thread Geert Hendrickx
On Tue, Jun 03, 2008 at 10:27:32AM +0200, Jost Krieger wrote:
 On Tue, Jun 03, 2008 at 07:11:33AM +0100, Ed W wrote:
 
  Could do, but I was trying to cover the case where the headers are
  different but the body is the same (e.g. I suspect that mailing list
  managers might deliver emails one by one (VERP), but the body is not
  customised). Anyway, I just wanted to checksum the body of the message,
  not the whole message.
 
 That could lead to slight problems, like hardlinking totally unrelated
 messages, e.g. empty messages. Some Headers like From:, To:, Date:,
 Subject: should probably be identical.

Message-ID perhaps? :-)

 For some consistency, just removing  *locally* generated trace headers
 before fingerprinting might lead to better results.

That may still leave identical messages not hard-linked, thus wasting
space, e.g. if they come from MTAs that do recipient splitting, or if they
are routed via different systems. The Received headers will be different
but the body generally identical.

I think a better solution is what was suggested here before, i.e. to keep
the (unique) message headers in a Maildir-like format, containing links to
(single-instance stored) message bodies in a separate location.

Geert




Re: [Dovecot] v1.1.rc8 released (managesieve updated)

2008-06-03 Thread Stephan Bosch
Timo Sirainen wrote:
 http://dovecot.org/releases/1.1/rc/dovecot-1.1.rc8.tar.gz
 http://dovecot.org/releases/1.1/rc/dovecot-1.1.rc8.tar.gz.sig
I refreshed the ManageSieve patch for the new Dovecot v1.1 release:

http://www.rename-it.nl/dovecot/1.1/dovecot-1.1.rc8-managesieve-0.10.2.diff.gz
http://www.rename-it.nl/dovecot/1.1/dovecot-1.1.rc8-managesieve-0.10.2.diff.gz.sig

Regards,

Stephan.


Re: [Dovecot] v1.1.rc8 released

2008-06-03 Thread Jost Krieger
On Tue, Jun 03, 2008 at 07:11:33AM +0100, Ed W wrote:

 Could do, but I was trying to cover the case where the headers are
 different but the body is the same (e.g. I suspect that mailing list
 managers might deliver emails one by one (VERP), but the body is not
 customised). Anyway, I just wanted to checksum the body of the message,
 not the whole message.

That could lead to slight problems, like hardlinking totally unrelated
messages, e.g. empty messages. Some Headers like From:, To:, Date:,
Subject: should probably be identical.

For some consistency, just removing *locally* generated trace headers
before fingerprinting might lead to better results.
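
One way to read the first suggestion is to fingerprint a fixed set of headers together with the body, so that trace headers such as Received are ignored. The sketch below assumes that reading, adds the Message-ID suggested elsewhere in the thread, and uses made-up names. It does not try to detect which trace headers were generated locally.

```python
import hashlib
from email.parser import BytesHeaderParser

# Headers that must match for two messages to count as the same mail.
KEY_HEADERS = ("From", "To", "Date", "Subject", "Message-ID")


def split_message(data):
    """Split raw message bytes at the first blank line into (headers, body)."""
    cuts = [(data.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    cuts = [(pos, length) for pos, length in cuts if pos != -1]
    if not cuts:
        return data, b""
    pos, length = min(cuts)
    return data[:pos], data[pos + length:]


def fingerprint(data):
    """SHA-1 over the key headers plus the body; other headers are ignored."""
    head, body = split_message(data)
    headers = BytesHeaderParser().parsebytes(head)
    sha = hashlib.sha1()
    for name in KEY_HEADERS:
        value = str(headers.get(name, "")).strip()
        sha.update(name.encode("ascii") + b":" + value.encode("utf-8", "replace") + b"\n")
    sha.update(body)
    return sha.hexdigest()
```

With this, two empty messages with different subjects get different fingerprints, while copies that differ only in their Received lines still match.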

Jost
-- 
| Help eradicate spam!                HTML in mail is impolite. |
| Postmaster, JAPH, occasional soothsayer   at the RZ of the RUB |
| True words are not pleasing, pleasing words are not true.      |
|                                      Lao Tse, Tao Te King 81   |




Re: [Dovecot] v1.1.rc8 released

2008-06-03 Thread Jost Krieger
On Tue, Jun 03, 2008 at 10:45:20AM +0200, Geert Hendrickx wrote:
 On Tue, Jun 03, 2008 at 10:27:32AM +0200, Jost Krieger wrote:
...
  That could lead to slight problems, like hardlinking totally unrelated
  messages, e.g. empty messages. Some Headers like From:, To:, Date:,
  Subject: should probably be identical.
 
 Message-ID perhaps? :-)

Yep, add that ...
 
  For some consistency, just removing  *locally* generated trace headers
  before fingerprinting might lead to better results.
 
 That may still leave identical messages not hard-linked, thus wasting
 space, e.g. if they come from MTAs that do recipient splitting, or if they
 are routed via different systems. The Received headers will be different
 but the body generally identical.

True, but these headers are quite important sometimes.

 I think a better solution is what was suggested here before, i.e. to keep
 the (unique) message headers in a Maildir-like format, containing links to
 (single-instance stored) message bodies in a separate location.

Probably better, but to make this transparent for the users, it would
need quite a bit of work in dovecot.

Jost
-- 
| Help eradicate spam!                HTML in mail is impolite. |
| Postmaster, JAPH, occasional soothsayer   at the RZ of the RUB |
| True words are not pleasing, pleasing words are not true.      |
|                                      Lao Tse, Tao Te King 81   |




Re: [Dovecot] v1.1.rc8 released

2008-06-03 Thread Timo Sirainen
On Tue, 2008-06-03 at 07:11 +0100, Ed W wrote:
 Actually the motivation for this was I was wondering about the benefit 
 of a storage backend where the body was stored per file and the headers 
 were stored separately (perhaps in a maildir type format).  I haven't 
 looked to see if this is what dbox does already...

dbox is half-designed to support this. It supports arbitrary metadata
(unlike maildir) and I've already written 3 lines of code to get this
implemented ;)

/* Pointer to external message data. Format is:
   1*(start offset byte count ref) */
DBOX_METADATA_EXT_REF   = 'X',

There's no code to actually read/write such metadata though. Also I'm
not exactly sure what the ref is. Maybe just a filename used to store
the data.





Re: [Dovecot] v1.1.rc8 released

2008-06-03 Thread Kenneth Porter

--On Tuesday, June 03, 2008 3:49 PM +0300 Timo Sirainen wrote:

 dbox is half-designed to support this. It supports arbitrary metadata
 (unlike maildir) and I've already written 3 lines of code to get this
 implemented ;)

 /* Pointer to external message data. Format is:
    1*(start offset byte count ref) */
 DBOX_METADATA_EXT_REF   = 'X',

 There's no code to actually read/write such metadata though. Also I'm
 not exactly sure what the ref is. Maybe just a filename used to store
 the data.


LOL, so I'm not the only one who designs like that. I think of it like 
sculpting: Throw some clay on the table and then scrape away anything 
that's not part of my objective. It's the right-brain side of programming. 
(And the hardest part.)





[Dovecot] v1.1.rc8 released

2008-06-02 Thread Timo Sirainen
http://dovecot.org/releases/1.1/rc/dovecot-1.1.rc8.tar.gz
http://dovecot.org/releases/1.1/rc/dovecot-1.1.rc8.tar.gz.sig

I then decided to add the deliver -c feature to this release. Seems to
work in my tests, but who knows if it breaks something.. Although most
of the code is called only if -c parameter is given. Anyway we really
should have a comprehensive test suite written some day (yes, help is
really wanted for this :).

So let's hope this is the last RC release. If there aren't any major
problems I'll release v1.1.0 in a couple of weeks.

I'll also try to merge all my different development trees into a single
v1.2 code tree within a few weeks. v1.2.0 will probably be released this
summer as well, since it mainly has new features that don't change
existing code all that much (CONDSTORE is a bit invasive though).

+ deliver: Added -c parameter to provide path to delivered mail.
  This allows maildir to save identical mails to multiple recipients
  using hard links.
- rc6/rc7 broke POP3 with non-Maildir formats
- mbox: Saving a message without a body or the end-of-headers line
  could have caused an assert-crash later.
- Several dbox fixes





Re: [Dovecot] v1.1.rc8 released

2008-06-02 Thread Timo Sirainen
On Mon, 2008-06-02 at 23:25 +0100, Ed W wrote:
 Hi
 
  + deliver: Added -c parameter to provide path to delivered mail.
This allows maildir to save identical mails to multiple recipients
using hard links.

 
 
 Funnily enough it was on my todo list to whip up a small perl program to 
 go and scan my maildirs and figure out if this theoretical idea actually 
 amounted to anything. 
 
 Algorithm would be this:
 
 Open each message,
 scan for first blank line. 
 SHA the rest of the message, store the SHA in a hash (along with the 
 message size)
 rinse and repeat and see if we end up with any hashes showing count 
 greater than 1...
 
 This would represent the best case that we could achieve assuming body 
 content fixed and we find some way to manage variable headers.

Somewhat faster way would be to get a list of file sizes first and not
bother checksumming any files which have a unique size.
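
A minimal sketch of that size-first shortcut, with hypothetical function names. It only applies when whole files are compared: once the headers differ, identical bodies no longer imply identical file sizes.

```python
import hashlib
import os
from collections import defaultdict


def sha1_of_file(path):
    """SHA-1 hex digest of a whole file, read in chunks."""
    sha = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            sha.update(chunk)
    return sha.hexdigest()


def duplicate_files(paths):
    """Return sorted groups of paths whose contents are identical.

    Files with a unique size are never opened, only stat()ed.
    """
    by_size = defaultdict(list)
    for path in paths:
        by_size[os.stat(path).st_size].append(path)

    by_digest = defaultdict(list)
    for size, group in by_size.items():
        if len(group) < 2:
            continue  # unique size, cannot have a duplicate
        for path in group:
            by_digest[(size, sha1_of_file(path))].append(path)

    return sorted(sorted(group) for group in by_digest.values() if len(group) > 1)
```

On a large mail store most files have a unique size, so the stat() pass removes the bulk of the checksumming work.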

 Next up is to use a mime parser and SHA each message part.  Same idea, 
 assuming we used some kind of format to store each part individually, 
 how much gain is this really worth in terms of storage (looks tempting 
 up front, condense all those duplicated jokes, etc - however, does it 
 really bear out in practice...). 

This is in my dbox TODO list (not near future though).

 I think MS Exchange only does single instance storage like you describe 
 here with delivery time hardlinking of messages?  Never analysed what 
 that was worth (back when I had an Exchange system to fiddle with...)

No idea about Exchange, but dbmail 2.3 does single instance MIME part
storing.

 I have a feeling that gzip compression of files would be worth more than 
 this hardlinking (on many but not all mail systems...)

Or you could use both. zlib plugin already supports this with maildir.




Re: [Dovecot] v1.1.rc8 released

2008-06-02 Thread Timo Sirainen
On Mon, 2008-06-02 at 16:04 -0700, Andrew Roberts wrote:
 On Tue, 3 Jun 2008, Timo Sirainen wrote:
 
  + deliver: Added -c parameter to provide path to delivered mail.
This allows maildir to save identical mails to multiple recipients
using hard links.
 
 According to the wiki, deliver already uses a -c parameter to specify the 
 path to an alternate configuration file.  Is this incorrect, or is this 
 functionality going away?

I meant to say the -p parameter. Thanks for the reminder, I'll update the
wiki as well.


