[PATCH] Fix code extracting the MTA from Received: headers

2010-04-13 Thread Dirk Hohndel
On Tue, 13 Apr 2010 10:37:49 -0700, Carl Worth <cwo...@cworth.org> wrote:
> On Thu, 08 Apr 2010 08:07:48 -0700, Dirk Hohndel <hohn...@infradead.org> wrote:
> > Right now my plan is to do something like this:
> > 
> > 1) look for my email address in To/Cc
> > 2) look for my email in "for <em...@add.res>" in Received headers
> > 3) look for my email in X-Original-To
> > 4) look for the domain of my email in Received headers (not just 1st)
> > 5) punt and use default email address
> > 
> > Does that sound sane?
> 
> It sounds sane.

Good.

> > (and thanks for sending the headers - this really helps... can others
> > for whom the current code or the logic mentioned above wouldn't work
> > send their headers, too, please?)
> 
> I started using fetchmail many years ago and have never really needed to
> switch. So I'm still using that (but don't necessarily recommend it to
> anyone).
> 
> It seems to break the above since it delivers mail locally, so the first
> headers I get are:
> 
>   X-Original-To: cworth at localhost

Easy to detect. I'll add that as an exclusion.

>   Delivered-To: cworth at localhost
>   Received: from yoom.home.cworth.org (yoom.home.cworth.org [127.0.0.1])
>       by yoom.home.cworth.org (Postfix) with ESMTP id D391B5883A6
>       for <cworth at localhost>; Mon, 12 Apr 2010 09:11:18 -0700 (PDT)
>   MIME-Version: 1.0
>   Received: from 10.22.226.213 [10.22.226.213]
>       by yoom.home.cworth.org with IMAP (fetchmail-6.3.16)
>       for <cworth at localhost> (single-drop); Mon, 12 Apr 2010 09:11:18 -0700 (PDT)

A
(he runs screaming out of the room)

> And none of these are useful for your detection. Worse, the presence of
> "cworth.org" in the above might throw your detection off before it could
> find something useful like "intel.com" in a later Received header.

I have some choice words for these headers...
And an idea how to exclude these false positives as well... It's kind of
a hack, but I'm thinking that in order for the "Received: ... by ..."
part to be truly relevant to us, the from host should have a non-private
IP address. 

Yes, I can envision within-your-own-network cases where none of the
systems have a non-private IP address... but then hopefully your last
hop is correct... if not - your setup is even more screwed up than Carl's.
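
A minimal sketch of that "non-private IP" test, assuming IPv4 only and
treating loopback plus the RFC 1918 ranges as private. The helper name
is_private_ipv4 is hypothetical and not part of notmuch; the caller is
assumed to have already pulled the bracketed address out of the
"from ... [a.b.c.d]" part of the Received header.

```c
#include <arpa/inet.h>
#include <stdint.h>

/* Return 1 for a loopback or RFC 1918 address, 0 for any other valid
 * IPv4 address, and -1 if the text is not a dotted-quad address. */
static int
is_private_ipv4 (const char *text)
{
    struct in_addr addr;
    uint32_t ip;

    if (inet_pton (AF_INET, text, &addr) != 1)
        return -1;

    ip = ntohl (addr.s_addr);           /* compare in host byte order */

    if ((ip & 0xff000000u) == 0x7f000000u)      /* 127.0.0.0/8 */
        return 1;
    if ((ip & 0xff000000u) == 0x0a000000u)      /* 10.0.0.0/8 */
        return 1;
    if ((ip & 0xfff00000u) == 0xac100000u)      /* 172.16.0.0/12 */
        return 1;
    if ((ip & 0xffff0000u) == 0xc0a80000u)      /* 192.168.0.0/16 */
        return 1;

    return 0;
}
```

With Carl's headers, both 127.0.0.1 and 10.22.226.213 would be
classified as private, so neither Received line would be considered.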

> I'll send a complete message with full headers to you separately.

Thanks

> Perhaps I can just switch programs to transfer email and avoid this
> problem. Anyone have a recommendation for something to transfer mail
> from an imap server to the local machine (but *not* leaving it stored
> on the imap server)[*]? I don't think offlineimap supports this mode,
> does it?

Don't think so. I'm not going to comment on the usefulness of this mode
in public :-)

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center


Re: [PATCH] Fix code extracting the MTA from Received: headers

2010-04-13 Thread Carl Worth
On Thu, 08 Apr 2010 08:07:48 -0700, Dirk Hohndel <hohn...@infradead.org> wrote:
> Right now my plan is to do something like this:
> 
> 1) look for my email address in To/Cc
> 2) look for my email in "for <em...@add.res>" in Received headers
> 3) look for my email in X-Original-To
> 4) look for the domain of my email in Received headers (not just 1st)
> 5) punt and use default email address
> 
> Does that sound sane?

It sounds sane.

> (and thanks for sending the headers - this really helps... can others
> for whom the current code or the logic mentioned above wouldn't work
> send their headers, too, please?)

I started using fetchmail many years ago and have never really needed to
switch. So I'm still using that (but don't necessarily recommend it to
anyone).

It seems to break the above since it delivers mail locally, so the first
headers I get are:

  X-Original-To: cwo...@localhost
  Delivered-To: cwo...@localhost
  Received: from yoom.home.cworth.org (yoom.home.cworth.org [127.0.0.1])
      by yoom.home.cworth.org (Postfix) with ESMTP id D391B5883A6
      for <cwo...@localhost>; Mon, 12 Apr 2010 09:11:18 -0700 (PDT)
  MIME-Version: 1.0
  Received: from 10.22.226.213 [10.22.226.213]
      by yoom.home.cworth.org with IMAP (fetchmail-6.3.16)
      for <cwo...@localhost> (single-drop); Mon, 12 Apr 2010 09:11:18 -0700 (PDT)

And none of these are useful for your detection. Worse, the presence of
"cworth.org" in the above might throw your detection off before it could
find something useful like "intel.com" in a later Received header.

I'll send a complete message with full headers to you separately.

Perhaps I can just switch programs to transfer email and avoid this
problem. Anyone have a recommendation for something to transfer mail
from an imap server to the local machine (but *not* leaving it stored
on the imap server)[*]? I don't think offlineimap supports this mode,
does it?

-Carl

[*] I do separately want to start playing with remote notmuch, but I
won't use this with the imap servers currently accepting my
mail. Instead, I'd rather just rsync my mail from my local machine to a
server I own, (which could then export imap if needed), and do remote
notmuch stuff from there.


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[PATCH] Fix code extracting the MTA from Received: headers

2010-04-08 Thread Sebastian Spaeth
On 2010-04-07, Dirk Hohndel wrote:
> 
> The previous code made too many assumptions about the (sadly not
> standardized) format of the Received headers. This version should
> be more robust to deal with different variations.

This code might be useful for some, but I know it is not useful
for me. I use e.g. dreamhost.com as my mail provider and I never have my
email domain name show up after the Received: by .
See my Received headers for your message below.

On the other hand, it contains "for <address>" stating the
intended email address explicitly. IMHO, we should use this before we
start some hand-wavy guessing.
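
As a rough illustration of reading that clause, here is a sketch that
pulls the address out of the first "for <address>" in a Received value.
The function name received_for_address is hypothetical (not notmuch
API), and the sketch assumes the address is wrapped in angle brackets
and contains an '@'; clauses without one are skipped.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated copy of the address in the first
 * "for <address>" clause of a Received header value, or NULL if no
 * such clause exists. The caller frees the result. */
static char *
received_for_address (const char *received)
{
    const char *p, *start, *end;
    size_t len;
    char *addr;

    if (received == NULL)
        return NULL;

    for (p = strstr (received, "for <"); p != NULL;
         p = strstr (p + 1, "for <")) {
        /* "for" must start a word, not be the tail of another one */
        if (p != received && strchr (" \t\r\n", p[-1]) == NULL)
            continue;

        start = p + 5;                  /* first character after '<' */
        end = strchr (start, '>');
        if (end == NULL)
            return NULL;                /* unterminated clause */

        len = (size_t) (end - start);
        if (len == 0 || memchr (start, '@', len) == NULL)
            continue;                   /* not an email address */

        addr = malloc (len + 1);
        if (addr == NULL)
            return NULL;
        memcpy (addr, start, len);
        addr[len] = '\0';
        return addr;
    }

    return NULL;
}
```

The result would still need to be compared against the user's
configured addresses before it is trusted.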

Also, I have the "X-Original-To: sebastian at sspaeth.de" header. Is that
something that we could make use of before starting to guess?

Sebastian
---
Received: from segal.dreamhost.com (mx1.spunky.mail.dreamhost.com [208.97.132.47])
    by homiemail-mx12.g.dreamhost.com (Postfix) with ESMTP id 9A6602781BC
    for ; Wed,  7 Apr 2010 13:38:48 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
    by segal.dreamhost.com (Postfix) with ESMTP id 9CF8A5341BE
    for ; Wed,  7 Apr 2010 13:38:48 -0700 (PDT)
Received: from connor.dreamhost.com ([208.97.132.81])
    by localhost (segal.dreamhost.com [208.97.132.104]) (amavisd-new, port 10024)
    with ESMTP id S3IlsMcJewY1 for ;
    Wed,  7 Apr 2010 13:38:39 -0700 (PDT)
Received: from olra.theworths.org (u15218177.onlinehome-server.com [82.165.184.25])
    by connor.dreamhost.com (Postfix) with ESMTP id 33B472C9806F
    for ; Wed,  7 Apr 2010 13:38:39 -0700 (PDT)
Received: from localhost (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id 1978741733A;
    Wed,  7 Apr 2010 13:38:38 -0700 (PDT)
X-Virus-Scanned: Debian amavisd-new at olra.theworths.org
Received: from olra.theworths.org ([127.0.0.1])
    by localhost (olra.theworths.org [127.0.0.1]) (amavisd-new, port 10024)
    with ESMTP id ZbcQaubefNY6; Wed,  7 Apr 2010 13:38:37 -0700 (PDT)
Received: from olra.theworths.org (localhost [127.0.0.1])
    by olra.theworths.org (Postfix) with ESMTP id 044574196F4;
    Wed,  7 Apr 2010 13:38:35 -0700 (PDT)


[PATCH] Fix code extracting the MTA from Received: headers

2010-04-08 Thread Dirk Hohndel
On Thu, 08 Apr 2010 09:59:14 +0200, "Sebastian Spaeth"  wrote:
> On 2010-04-07, Dirk Hohndel wrote:
> > 
> > The previous code made too many assumptions about the (sadly not
> > standardized) format of the Received headers. This version should
> > be more robust to deal with different variations.
> 
> This code might be useful for some, but I know it is not useful
> for me. I use e.g. dreamhost.com as my mail provider and I never have my
> email domain name show up after the Received: by .
> See my Received headers for your message below.

That's the funny thing about heuristics - they are always based on the
cases the author has access to. I run my own mail servers and they put
in useful Received lines. Dreamhost doesn't appear to do that - I'm sure
there are many other scenarios that I don't handle yet.
Please keep them coming.

> On the other hand, it contains "for <address>" stating the
> intended email address explicitly. IMHO, we should use this before we
> start some hand-wavy guessing.
> 
> Also, I have the "X-Original-To: sebastian at sspaeth.de" header. Is that
> something that we could make use of before starting to guess?

It's complicated. Some MTAs put a bogus "for <address>" or "for
UID 1000" into Received headers. I haven't seen any incorrect
"X-Original-To" headers, but wouldn't be surprised to see those be faked
or wrong, either.
Right now my plan is to do something like this:

1) look for my email address in To/Cc
2) look for my email in "for <em...@add.res>" in Received headers
3) look for my email in X-Original-To
4) look for the domain of my email in Received headers (not just 1st)
5) punt and use default email address

Does that sound sane?
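
A minimal sketch of that lookup order, under stated assumptions: the
struct reply_headers container and the guess_from_address function are
hypothetical names for illustration (the real code would fetch each
value with notmuch_message_get_header), matching is a plain
case-sensitive substring search, and step 4 does no filtering of
local or private hops.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the header values of one message.
 * Any field may be NULL when the header is absent. */
struct reply_headers {
    const char *to_cc;          /* To and Cc values, joined */
    const char *received;       /* all Received values, joined */
    const char *x_original_to;  /* X-Original-To value */
};

static int
header_contains (const char *header, const char *needle)
{
    return header != NULL && strstr (header, needle) != NULL;
}

/* Pick one of the user's addresses following steps 1-5 of the plan. */
static const char *
guess_from_address (const struct reply_headers *h,
                    const char *const *addrs, size_t n_addrs,
                    const char *fallback)
{
    char clause[320];
    const char *at;
    size_t i;
    int len;

    /* 1) one of my addresses in To/Cc */
    for (i = 0; i < n_addrs; i++)
        if (header_contains (h->to_cc, addrs[i]))
            return addrs[i];

    /* 2) "for <address>" in any Received header */
    for (i = 0; i < n_addrs; i++) {
        len = snprintf (clause, sizeof (clause), "for <%s>", addrs[i]);
        if (len > 0 && (size_t) len < sizeof (clause) &&
            header_contains (h->received, clause))
            return addrs[i];
    }

    /* 3) one of my addresses in X-Original-To */
    for (i = 0; i < n_addrs; i++)
        if (header_contains (h->x_original_to, addrs[i]))
            return addrs[i];

    /* 4) the domain of one of my addresses in any Received header */
    for (i = 0; i < n_addrs; i++) {
        at = strchr (addrs[i], '@');
        if (at != NULL && at[1] != '\0' &&
            header_contains (h->received, at + 1))
            return addrs[i];
    }

    /* 5) punt */
    return fallback;
}
```

Because each step only runs when the earlier ones found nothing, a
wrong "for" clause can never override an address that appears in
To/Cc.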

(and thanks for sending the headers - this really helps... can others
for whom the current code or the logic mentioned above wouldn't work
send their headers, too, please?)

/D

-- 
Dirk Hohndel
Intel Open Source Technology Center


[PATCH] Fix code extracting the MTA from Received: headers

2010-04-07 Thread Carl Worth
On Wed, 07 Apr 2010 13:38:29 -0700, Dirk Hohndel  
wrote:
> The previous code made too many assumptions about the (sadly not
> standardized) format of the Received headers. This version should
> be more robust to deal with different variations.

Thanks for maintaining this. I'll have to fiddle with my mail setup
before this feature is useful for me. So I haven't tested this, (other
than to verify that it hasn't broken "notmuch reply" for me).

But I've pushed this now at least.

-Carl



[PATCH] Fix code extracting the MTA from Received: headers

2010-04-07 Thread Dirk Hohndel

The previous code made too many assumptions about the (sadly not
standardized) format of the Received headers. This version should
be more robust to deal with different variations.

Signed-off-by: Dirk Hohndel <hohn...@infradead.org>
---
 notmuch-reply.c |   23 +++++++++--------------
 1 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/notmuch-reply.c b/notmuch-reply.c
index 8eb4754..39377e1 100644
--- a/notmuch-reply.c
+++ b/notmuch-reply.c
@@ -296,28 +296,23 @@ guess_from_received_header (notmuch_config_t *config, notmuch_message_t *message
     received = notmuch_message_get_header (message, "received");
     by = strstr (received, " by ");
     if (by && *(by+4)) {
-        /* we know that there are 4 characters after by - either the 4th one
-         * is '\0' (broken header) or it is the first letter of the hostname
-         * that last received this email - which we'll use to guess the right
-         * from email address
+        /* sadly, the format of Received: headers is a bit inconsistent,
+         * depending on the MTA used. So we try to extract just the MTA
+         * here by removing leading whitespace and assuming that the MTA
+         * name ends at the next whitespace
+         * we test for *(by+4) to be non-'\0' to make sure there's something
+         * there at all - and then assume that the first whitespace delimited
+         * token that follows is the last receiving server
          */
         mta = strdup (by+4);
         if (mta == NULL)
             return NULL;
-
-        /* After the MTA comes its IP address (or HELO response) in parenthesis.
-         * so let's terminate the string there
-         */
-        if ((ptr = strchr (mta, '(')) == NULL) {
-            free (mta);
+        token = strtok(mta," \t");
+        if (token == NULL)
             return NULL;
-        }
-        *ptr = '\0';
-
         /* Now extract the last two components of the MTA host name
          * as domain and tld
          */
-        token = mta;
         while ((ptr = strsep (&token, delim)) != NULL) {
             if (*ptr == '\0')
                 continue;
-- 
1.6.6.1


-- 
Dirk Hohndel
Intel Open Source Technology Center
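
For readers who want to try the idea outside of notmuch, here is a
self-contained sketch of what the patched code aims for: take the
first whitespace-delimited token after " by " and keep its last two
dot-separated components. It is not the committed code: the name
received_by_domain is hypothetical, and it swaps strtok/strsep for
strspn/strcspn so the input is never modified. It assumes the header
has already been unfolded onto one line.

```c
#include <stdlib.h>
#include <string.h>

/* Return a newly allocated "domain.tld" for the host that follows the
 * first " by " in a Received header value, or NULL if there is no such
 * host or it has no dot. The caller frees the result. */
static char *
received_by_domain (const char *received)
{
    const char *by, *host, *end, *start, *p;
    size_t len;
    int dots = 0;
    char *result;

    if (received == NULL)
        return NULL;

    by = strstr (received, " by ");
    if (by == NULL)
        return NULL;

    /* skip any extra whitespace, then stop at the next whitespace */
    host = by + 4;
    host += strspn (host, " \t\r\n");
    end = host + strcspn (host, " \t\r\n");
    if (end == host)
        return NULL;

    /* walk backwards to just after the second-to-last dot */
    start = host;
    for (p = end; p > host; ) {
        p--;
        if (*p == '.' && ++dots == 2) {
            start = p + 1;
            break;
        }
    }
    if (dots == 0)
        return NULL;            /* e.g. "localhost" */

    len = (size_t) (end - start);
    result = malloc (len + 1);
    if (result == NULL)
        return NULL;
    memcpy (result, start, len);
    result[len] = '\0';
    return result;
}
```

Fed the fetchmail line quoted elsewhere in this thread ("from
10.22.226.213 ... by yoom.home.cworth.org with IMAP"), this yields
"cworth.org", which is exactly the false positive discussed there.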

