[PATCH] Clean up author display for some "Last, First" cases

2010-04-24 Thread Dirk Hohndel
On Sat, 24 Apr 2010 08:30:22 -0700, Carl Worth wrote:
> On Wed, 21 Apr 2010 22:04:39 -0700, Dirk Hohndel wrote:
> > +/* clean up the ugly "Lastname, Firstname" format that some mail systems
> > + * (most notably, Exchange) are creating to be "Firstname Lastname" 
> > + * To make sure that we don't change other potential situations where a 
> > + * comma is in the name, we check that we match one of these patterns
> > + * "Last, First" 
> > + * "Last, First MI" 
> 
> This is an interesting idea. We could make it a little more flexible by
> doing a regexp comparison of "first.*last" against the email address,
> (perhaps people have email addresses like carl_worth at example.com?)

I'll look into that. We actually had some discussion about this on IRC
and I was thinking of taking this feature to a new level... something
like: 
- by default we show names as they come in (least surprise)
- we offer to reverse Last, First
- we offer to shorten to FirstL
- we offer an alias map
So I could define that mail from "cworth at cworth.org" gets the author
listed as "cworth". Or as CarlW.
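To make the last two options concrete, here is a minimal sketch of an alias lookup and a "FirstL" shortener. Everything in it is hypothetical: `author_alias_t`, `author_alias_lookup`, and `author_shorten` are illustrative names and not part of notmuch, and the hard-coded table stands in for whatever user configuration a real implementation would read.

```c
#include <stddef.h>
#include <string.h>
#include <strings.h>

/* Hypothetical alias entry: maps a sender address to a display name. */
typedef struct {
    const char *address;
    const char *display;
} author_alias_t;

/* Illustrative table only; a real one would come from user configuration. */
static const author_alias_t author_aliases[] = {
    { "cworth@cworth.org", "cworth" },
};

/* Return the configured alias for 'address', or NULL if there is none. */
static const char *
author_alias_lookup (const char *address)
{
    size_t i;

    for (i = 0; i < sizeof (author_aliases) / sizeof (author_aliases[0]); i++) {
        if (strcasecmp (address, author_aliases[i].address) == 0)
            return author_aliases[i].display;
    }
    return NULL;
}

/* Shorten "First Last" to "FirstL": the first word plus the initial of
 * the last word. Names without a usable last word are copied unchanged.
 * Returns 0 on success, -1 if 'out' (capacity 'out_size') is too small. */
static int
author_shorten (const char *name, char *out, size_t out_size)
{
    const char *space = strrchr (name, ' ');
    size_t first_len;

    if (space == NULL || space[1] == '\0') {
        if (strlen (name) + 1 > out_size)
            return -1;
        strcpy (out, name);
        return 0;
    }

    first_len = strcspn (name, " ");
    if (first_len + 2 > out_size)
        return -1;
    memcpy (out, name, first_len);
    out[first_len] = space[1];
    out[first_len + 1] = '\0';
    return 0;
}
```

With this sketch, `author_alias_lookup ("cworth@cworth.org")` yields "cworth", and `author_shorten` turns "Carl Worth" into "CarlW". Checking the alias table first and falling back to the other modes would keep the default (names shown as they arrive) unchanged for everyone who has not configured anything.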

> > +char *cleanauthor,*testauthor;
> 
> I'd much rather see an underscore separating two words in a single
> identifier, (so clean_author, test_author).

Happy to comply with your preferences in the future.

> > +   /* let's assemble what we think is the correct name */
> > +   lname = comma - author;
> > +   fname = strlen(author) - lname - 2;
> > +   strncpy(cleanauthor, comma + 2, fname);
> > +   *(cleanauthor+fname) = ' ';
> > +   strncpy(cleanauthor + fname + 1, author, lname);
> > +   *(cleanauthor+fname+1+lname) = '\0';
> 
> The comment above, ("what we think is the correct name"), didn't help me
> understand what the code is doing. And the code is hard enough to follow
> that I could really use some help. Something like:
> 
> /* Break at comma and reverse: "Last, First etc." -> "First Last etc." */

OK, I'll try to be more explicit in documenting algorithms.

> Lots of little additions here and there so plenty of chance for an
> off-by-one. Do we have a test case for this yet?

Nope. Will do.

> > +   /* make a temporary copy and see if it matches the email */
> > +   testauthor = xstrdup(cleanauthor);
> 
> It would be preferable to use talloc functions consistently. (Existing
> occurrences of xstrdup in the code base are for the sake of
> talloc-unfriendly glib data structures like GHashTable.)
> 
> As is, testauthor is leaking.

Oops.

/D


[PATCH] Clean up author display for some "Last, First" cases

2010-04-24 Thread Carl Worth
On Wed, 21 Apr 2010 22:04:39 -0700, Dirk Hohndel wrote:
> +/* clean up the ugly "Lastname, Firstname" format that some mail systems
> + * (most notably, Exchange) are creating to be "Firstname Lastname" 
> + * To make sure that we don't change other potential situations where a 
> + * comma is in the name, we check that we match one of these patterns
> + * "Last, First" 
> + * "Last, First MI" 

This is an interesting idea. We could make it a little more flexible by
doing a regexp comparison of "first.*last" against the email address,
(perhaps people have email addresses like carl_worth at example.com?)
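A rough sketch of that regexp check, using POSIX `regcomp`/`regexec`. The function names `is_plain_word` and `name_matches_address` are made up for illustration. Restricting name parts to ASCII alphanumerics is an assumption made here so that the name can be pasted into the pattern without escaping; real names with dots, hyphens, or non-ASCII characters would need more care.

```c
#include <ctype.h>
#include <regex.h>
#include <stdio.h>

/* Return 1 if 's' is non-empty and purely alphanumeric, so it can be
 * embedded in a regular expression without escaping. */
static int
is_plain_word (const char *s)
{
    if (*s == '\0')
        return 0;
    for (; *s != '\0'; s++) {
        if (! isalnum ((unsigned char) *s))
            return 0;
    }
    return 1;
}

/* Return 1 if "first.*last" matches 'address' case-insensitively,
 * 0 if it does not (or if the pattern cannot be built). */
static int
name_matches_address (const char *first, const char *last,
                      const char *address)
{
    char pattern[256];
    regex_t re;
    int n, matched;

    if (! is_plain_word (first) || ! is_plain_word (last))
        return 0;

    n = snprintf (pattern, sizeof (pattern), "%s.*%s", first, last);
    if (n < 0 || (size_t) n >= sizeof (pattern))
        return 0;

    if (regcomp (&re, pattern, REG_EXTENDED | REG_ICASE | REG_NOSUB) != 0)
        return 0;
    matched = (regexec (&re, address, 0, NULL, 0) == 0);
    regfree (&re);

    return matched;
}
```

This accepts both carl.worth@example.com and carl_worth@example.com for "Carl" and "Worth". Note that ".*" is loose enough to match across the "@" as well, so limiting the match to the local part of the address may be worth considering.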

> +char *cleanauthor,*testauthor;

I'd much rather see an underscore separating two words in a single
identifier, (so clean_author, test_author).

> + /* let's assemble what we think is the correct name */
> + lname = comma - author;
> + fname = strlen(author) - lname - 2;
> + strncpy(cleanauthor, comma + 2, fname);
> + *(cleanauthor+fname) = ' ';
> + strncpy(cleanauthor + fname + 1, author, lname);
> + *(cleanauthor+fname+1+lname) = '\0';

The comment above, ("what we think is the correct name"), didn't help me
understand what the code is doing. And the code is hard enough to follow
that I could really use some help. Something like:

/* Break at comma and reverse: "Last, First etc." -> "First Last etc." */

Lots of little additions here and there so plenty of chance for an
off-by-one. Do we have a test case for this yet?
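For illustration, a sketch of the reversal with explicit lengths and bounds checks, which also makes it easy to test in isolation. `reverse_last_first` is a hypothetical helper, not code from the patch. It follows the patch's behavior of moving everything after the comma to the front, so "Last, First MI" becomes "First MI Last". Unlike the patch, it does not assume exactly one space after the comma, and it writes into a caller-supplied buffer.

```c
#include <stddef.h>
#include <string.h>

/* Break at the comma and reverse: "Last, First etc." -> "First etc. Last".
 * Input without a comma, or with nothing on one side of it, is copied
 * unchanged. Returns 0 on success, -1 if 'out' (capacity 'out_size')
 * is too small. */
static int
reverse_last_first (const char *author, char *out, size_t out_size)
{
    const char *comma = strchr (author, ',');
    const char *first = NULL;
    size_t last_len = 0, first_len = 0;

    if (comma != NULL) {
        last_len = (size_t) (comma - author);
        first = comma + 1;
        while (*first == ' ')   /* tolerate zero or more spaces */
            first++;
        first_len = strlen (first);
    }

    if (comma == NULL || last_len == 0 || first_len == 0) {
        size_t len = strlen (author);

        if (len + 1 > out_size)
            return -1;
        memcpy (out, author, len + 1);
        return 0;
    }

    /* first + ' ' + last + '\0' */
    if (first_len + 1 + last_len + 1 > out_size)
        return -1;
    memcpy (out, first, first_len);
    out[first_len] = ' ';
    memcpy (out + first_len + 1, author, last_len);
    out[first_len + 1 + last_len] = '\0';
    return 0;
}
```

Inputs worth covering in a test are "Last, First", "Last, First MI", a name with no comma, a comma with no space after it, and a trailing comma with nothing after it. The last two are the cases where the fixed `comma + 2` offset in the patch would skip a character or produce a negative length.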

> + /* make a temporary copy and see if it matches the email */
> + testauthor = xstrdup(cleanauthor);

It would be preferable to use talloc functions consistently. (Existing
occurrences of xstrdup in the code base are for the sake of
talloc-unfriendly glib data structures like GHashTable.)

As is, testauthor is leaking.
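Besides switching the copy to `talloc_strdup` (as the patch already does for cleanauthor), one way to sidestep the leak is to skip the temporary copy altogether and compare in place. The following is only a sketch of that alternative: `name_in_address` is a hypothetical helper that reads each space in the name as a '.' while searching the address case-insensitively, which is what the patch's replace-then-`strcasestr` sequence does for non-empty names.

```c
#include <ctype.h>
#include <stddef.h>

/* Return 1 if 'name', with every space read as '.', occurs anywhere in
 * 'from' (compared case-insensitively), 0 otherwise. No allocation is
 * needed, so there is nothing to free. An empty name never matches. */
static int
name_in_address (const char *name, const char *from)
{
    size_t i;

    if (*name == '\0')
        return 0;

    for (; *from != '\0'; from++) {
        for (i = 0; name[i] != '\0'; i++) {
            unsigned char want = (unsigned char) (name[i] == ' ' ? '.' : name[i]);
            unsigned char have = (unsigned char) from[i];

            /* Stop at the end of 'from' or at the first mismatch. */
            if (have == '\0' || tolower (have) != tolower (want))
                break;
        }
        if (name[i] == '\0')
            return 1;
    }
    return 0;
}
```

For example, "First Last" is found in an address such as first.last@example.com, while "Carl Worth" is not found in cworth@cworth.org, so the original author string would be kept in that case.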

-Carl



[PATCH] Clean up author display for some "Last, First" cases

2010-04-21 Thread Dirk Hohndel

We specifically check whether the author matches one of these two patterns:
 "Last, First" 
 "Last, First MI" 
If it does, we rewrite the author name in a more reader-friendly
"First Last" form.

Signed-off-by: Dirk Hohndel 
---
 lib/thread.cc |   51 +--
 1 files changed, 49 insertions(+), 2 deletions(-)

diff --git a/lib/thread.cc b/lib/thread.cc
index baa0d7f..7e72114 100644
--- a/lib/thread.cc
+++ b/lib/thread.cc
@@ -144,6 +144,51 @@ _thread_move_matched_author (notmuch_thread_t *thread,
 return;
 }

+/* clean up the ugly "Lastname, Firstname" format that some mail systems
+ * (most notably, Exchange) are creating to be "Firstname Lastname" 
+ * To make sure that we don't change other potential situations where a 
+ * comma is in the name, we check that we match one of these patterns
+ * "Last, First" 
+ * "Last, First MI" 
+ */
+char *
+_thread_cleanup_author (notmuch_thread_t *thread,
+   const char *author, const char *from)
+{
+char *cleanauthor,*testauthor;
+const char *comma;
+char *blank;
+int fname,lname;
+
+cleanauthor = talloc_strdup(thread, author);
+if (cleanauthor == NULL)
+   return NULL;
+comma = strchr(author,',');
+if (comma) {
+   /* let's assemble what we think is the correct name */
+   lname = comma - author;
+   fname = strlen(author) - lname - 2;
+   strncpy(cleanauthor, comma + 2, fname);
+   *(cleanauthor+fname) = ' ';
+   strncpy(cleanauthor + fname + 1, author, lname);
+   *(cleanauthor+fname+1+lname) = '\0';
+   /* make a temporary copy and see if it matches the email */
+   testauthor = xstrdup(cleanauthor);
+   
+   blank=strchr(testauthor,' ');
+   while (blank != NULL) {
+   *blank = '.';
+   blank=strchr(testauthor,' ');
+   }
+   if (strcasestr(from, testauthor) == NULL)
+   /* we didn't identify this as part of the email address 
+   * so let's punt and return the original author */
+   strcpy (cleanauthor, author);
+  
+}
+return cleanauthor;
+}
+
 /* Add 'message' as a message that belongs to 'thread'.
  *
  * The 'thread' will talloc_steal the 'message' and hold onto a
@@ -158,6 +203,7 @@ _thread_add_message (notmuch_thread_t *thread,
 InternetAddressList *list;
 InternetAddress *address;
 const char *from, *author;
+char *cleanauthor;

 _notmuch_message_list_add_message (thread->message_list,
   talloc_steal (thread, message));
@@ -178,8 +224,9 @@ _thread_add_message (notmuch_thread_t *thread,
mailbox = INTERNET_ADDRESS_MAILBOX (address);
author = internet_address_mailbox_get_addr (mailbox);
}
-   _thread_add_author (thread, author);
-   notmuch_message_set_author (message, author);
+   cleanauthor = _thread_cleanup_author (thread, author, from);
+   _thread_add_author (thread, cleanauthor);
+   notmuch_message_set_author (message, cleanauthor);
}
g_object_unref (G_OBJECT (list));
 }
-- 
1.6.6.1


-- 
Dirk Hohndel
Intel Open Source Technology Center