[notmuch] [PATCH] notmuch-new: Eliminate tallocs whilst construct filenames.

2009-11-27 Thread Jeffrey Ollie
On Fri, Nov 27, 2009 at 11:41 PM, Carl Worth  wrote:
>
> But yes, we need a test suite.

I have zero experience, but Check[1] looks interesting.

> Oh, and we'll also need to deal with remaining glib usage inside of
> notmuch, (and inside of GMime as well), before we can do good testing
> for memory-fault injection. Maybe what we'll end up with is a patch to
> de-glib-ify GMime? I'm not sure.

Rather than de-glib-ify'ing GMime wouldn't it be better to find a non
glib-based library for MIME processing?  I do most of my development
in Python these days so I don't know of any examples.  My googling
turned up several C++ libraries or c-client and I wouldn't want to go
down either of those paths.

[1] http://check.sourceforge.net/

-- 
Jeff Ollie


[notmuch] [PATCH] notmuch-new: Eliminate tallocs whilst construct filenames.

2009-11-27 Thread Carl Worth
On Fri, 27 Nov 2009 14:17:02 +, Chris Wilson  
wrote:
> > I *know* I composed a reply to this message earlier, but apparently
> > you're right that it never went out. (*sigh*---if only I had a reliable
> > mail client[*]).
> 
> I hear there's one called sup... ;-)

Heh. But seriously, I hit a lot of crashes with sup, and that invariably
led to *lots* of lost tag changes. I'm willing to live with lots of
Xapian-defect-250 pain right now to avoid that lossage.

> The issue I see with the "error, continue" pattern is that we are in
> danger of not reporting the first error but the last one.

OK. That would be a problem, yes.

> Is notmuch ready for fault-injection yet? Maybe once you have a nasty
> testsuite...

It's not "ready" in the sense that there is going to be a huge series of
fixes that fault-injection will find. But it's definitely "ready" in
the sense that I want to start doing this kind of testing.

But yes, we need a test suite.

Oh, and we'll also need to deal with remaining glib usage inside of
notmuch, (and inside of GMime as well), before we can do good testing
for memory-fault injection. Maybe what we'll end up with is a patch to
de-glib-ify GMime? I'm not sure.

-Carl
-- next part --
A non-text attachment was scrubbed...
Name: not available
Type: application/pgp-signature
Size: 189 bytes
Desc: not available
URL: 



[notmuch] [PATCH] notmuch-new: Eliminate tallocs whilst construct filenames.

2009-11-27 Thread Chris Wilson
Excerpts from Carl Worth's message of Fri Nov 27 13:23:06 + 2009:
> On Sun, 22 Nov 2009 00:57:10 +, Chris Wilson  chris-wilson.co.uk> wrote:
> > The majority of filenames will fit within PATH_MAX [4096] (because
> > that's a hard limit imposed by the filesystems) so we can avoid an
> > allocation per lookup and thereby eliminate a large proportion of the
> > overhead of scanning a maildir.
> 
> Hi Chris,
> 
> I *know* I composed a reply to this message earlier, but apparently
> you're right that it never went out. (*sigh*---if only I had a reliable
> mail client[*]).

I hear there's one called sup... ;-)

> Anyway, on to the promised review of the patch:
> 
> The basic idea of this I really like, of course. Thanks for helping to
> improve the efficiency of notmuch. But this part I don't:

It's a bit outdated now, the impact of the asprintf overhead is lost
with the introduction of the scandir. I'm sure the profiles will
indicate something else to improve beyond xapian...

> One might argue that the error-cleanup goto is a common and
> well-understood idiom, so that it's not bad to have. The problem I have
> with it is how much work it is to verify. If I'm reading one line of
> code in the middle of a function that's testing for an error case and
> handling it with goto, then I need to:
> 
> 1. Verify this condition, and that a return value variable gets
>set.
> 
> 2. Check down at the end of the function to ensure the correct
>objects are freed and the correct return value is returned.
> 
> 3. Check back up at the beginning of the function to ensure the
>relevant objects are initialized to NULL.
> 
> And beyond verification, one has to code in these three places
> simultaneously as well.
> 
> Meanwhile, by taking advantage of talloc like I did in the original
> version of this code, an error return becomes a much more local decision
> and is much simpler to code.

The issue I see with the "error, continue" pattern is that we are in
danger of not reporting the first error but the last one. The common
practice is abort on error and cleanup, and this single instance is
inconsistent with the rest of the error handling everywhere else in
notmuch-new.c. The argument to counter your 3 points is the unified
unwind approach where there is just a single exit path that handles
both error and normal returns. (You always have to set the appropriate
error value whether you continue or abort.)

The advantage of talloc is that it provides a convenient allocation
context that not only groups object by their associated lifetimes, but
provides a single point of access for reaping allocations on unwind. I
don't see how talloc affects the decision process on how to actually
handle errors, but it does make it easier to cleanup afterwards.

Is notmuch ready for fault-injection yet? Maybe once you have a nasty
testsuite...
-ickle
-- 
Chris Wilson, Intel Open Source Technology Centre


Re: [notmuch] [PATCH] notmuch-new: Eliminate tallocs whilst construct filenames.

2009-11-27 Thread Carl Worth
On Sun, 22 Nov 2009 00:57:10 +, Chris Wilson ch...@chris-wilson.co.uk 
wrote:
 The majority of filenames will fit within PATH_MAX [4096] (because
 that's a hard limit imposed by the filesystems) so we can avoid an
 allocation per lookup and thereby eliminate a large proportion of the
 overhead of scanning a maildir.

Hi Chris,

I *know* I composed a reply to this message earlier, but apparently
you're right that it never went out. (*sigh*---if only I had a reliable
mail client[*]).

  [*] I tried and tried to figure out how to get gnus to save an Fcc (a
  file copy of all outgoing messages), and failed to configure the
  various fake newsgroup things that gnus wanted for me to be able
  to do this. I also failed to get message-mode to do an Fcc. I
  settled instead for a Bcc to myself on all messages, deciding that
  it would be nice to see that mail actually went *out* and not just
  that I thought I sent it. Maybe that failed here, I don't know.

  The other piece I want is for unsent mail drafts to automatically
  be saved, and for notmuch to prompt me to continue with a draft if
  I start composing a new message while a draft is
  around. Currently, I can save a draft while composing in
  message-mode with C-x C-s but the big bug here is that the
  drafts never get deleted when I send the message, so I can't tell
  unfinished drafts apart from completed-and-sent messages.

Anyway, on to the promised review of the patch:

The basic idea of this I really like, of course. Thanks for helping to
improve the efficiency of notmuch. But this part I don't:

 - continue;
 - }
 -
 - if (S_ISREG (st-st_mode)) {
 - /* If the file hasn't been modified since the last
 -  * add_files, then we need not look at it. */
 - if (path_dbtime == 0 || st-st_mtime  path_dbtime) {
 - state-processed_files++;
 -
 - status = notmuch_database_add_message (notmuch, next, message);
 - switch (status) {
 - /* success */
 + } else {

It's true that in a former life, one of your primary jobs was to clean
up after me, especially cleaning up things like memory leaks on error
paths. But in my new talloc-enabled world, I'm quite happy to keep
cleaner, easier-to-understand code for the price of just having a
talloced object live slightly longer on an error path.

We do still have to be careful that we don't let such object accumulate
without bounds in some error case, and that they don't lock up important
resources. But when it's just a matter of a dozen bytes or so, talloced
into the parent's context which is going to be freed in the error path
above, then I'm much happier to allow for this transient leak rather
than convoluting the code with more cleanup gotos.

One might argue that the error-cleanup goto is a common and
well-understood idiom, so that it's not bad to have. The problem I have
with it is how much work it is to verify. If I'm reading one line of
code in the middle of a function that's testing for an error case and
handling it with goto, then I need to:

1. Verify this condition, and that a return value variable gets
   set.

2. Check down at the end of the function to ensure the correct
   objects are freed and the correct return value is returned.

3. Check back up at the beginning of the function to ensure the
   relevant objects are initialized to NULL.

And beyond verification, one has to code in these three places
simultaneously as well.

Meanwhile, by taking advantage of talloc like I did in the original
version of this code, an error return becomes a much more local decision
and is much simpler to code.

Chris, I'd be interested to hear any thoughts you have on this pattern.

-Carl
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [notmuch] [PATCH] notmuch-new: Eliminate tallocs whilst construct filenames.

2009-11-27 Thread Carl Worth
On Fri, 27 Nov 2009 14:17:02 +, Chris Wilson ch...@chris-wilson.co.uk 
wrote:
  I *know* I composed a reply to this message earlier, but apparently
  you're right that it never went out. (*sigh*---if only I had a reliable
  mail client[*]).
 
 I hear there's one called sup... ;-)

Heh. But seriously, I hit a lot of crashes with sup, and that invariably
led to *lots* of lost tag changes. I'm willing to live with lots of
Xapian-defect-250 pain right now to avoid that lossage.

 The issue I see with the error, continue pattern is that we are in
 danger of not reporting the first error but the last one.

OK. That would be a problem, yes.

 Is notmuch ready for fault-injection yet? Maybe once you have a nasty
 testsuite...

It's not ready in the sense that there is going to be a huge series of
fixes that fault-injection will find. But it's definitely ready in
the sense that I want to start doing this kind of testing.

But yes, we need a test suite.

Oh, and we'll also need to deal with remaining glib usage inside of
notmuch, (and inside of GMime as well), before we can do good testing
for memory-fault injection. Maybe what we'll end up with is a patch to
de-glib-ify GMime? I'm not sure.

-Carl


pgp8mU3VrPhzD.pgp
Description: PGP signature
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] notmuch-new: Eliminate tallocs whilst construct filenames.

2009-11-22 Thread Chris Wilson
The majority of filenames will fit within PATH_MAX [4096] (because
that's a hard limit imposed by the filesystems) so we can avoid an
allocation per lookup and thereby eliminate a large proportion of the
overhead of scanning a maildir.

Signed-off-by: Chris Wilson 
---
 notmuch-new.c |   75 ++--
 1 files changed, 45 insertions(+), 30 deletions(-)

diff --git a/notmuch-new.c b/notmuch-new.c
index 0dd2784..13559d1 100644
--- a/notmuch-new.c
+++ b/notmuch-new.c
@@ -107,6 +107,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 add_files_state_t *state)
 {
 DIR *dir = NULL;
+char buf[4096];
 struct dirent *entry = NULL;
 char *next = NULL;
 time_t path_mtime, path_dbtime;
@@ -114,6 +115,7 @@ add_files_recursive (notmuch_database_t *notmuch,
 notmuch_message_t *message = NULL;
 struct dirent **namelist = NULL;
 int num_entries;
+int path_len, dname_len;

 /* If we're told to, we bail out on encountering a read-only
  * directory, (with this being a clear clue from the user to
@@ -140,6 +142,12 @@ add_files_recursive (notmuch_database_t *notmuch,

 int i=0;

+path_len = strlen (path);
+if (path_len + 2 < (int) sizeof (buf)) {
+   memcpy (buf, path, path_len);
+   buf[path_len] = '/';
+}
+
 while (!interrupted) {
if (i == num_entries)
break;
@@ -164,37 +172,42 @@ add_files_recursive (notmuch_database_t *notmuch,
continue;
}

-   next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
+   dname_len = strlen (entry->d_name);
+   if (path_len + dname_len + 2 < (int) sizeof (buf)) {
+   memcpy (buf + path_len + 1, entry->d_name, dname_len);
+   buf[path_len + dname_len + 1] = '\0';
+   next = buf;
+   } else {
+   next = talloc_asprintf (notmuch, "%s/%s", path, entry->d_name);
+   }

if (stat (next, st)) {
fprintf (stderr, "Error reading %s: %s\n",
 next, strerror (errno));
ret = NOTMUCH_STATUS_FILE_ERROR;
-   continue;
-   }
-
-   if (S_ISREG (st->st_mode)) {
-   /* If the file hasn't been modified since the last
-* add_files, then we need not look at it. */
-   if (path_dbtime == 0 || st->st_mtime > path_dbtime) {
-   state->processed_files++;
-
-   status = notmuch_database_add_message (notmuch, next, );
-   switch (status) {
-   /* success */
+   } else {
+   if (S_ISREG (st->st_mode)) {
+   /* If the file hasn't been modified since the last
+* add_files, then we need not look at it. */
+   if (path_dbtime == 0 || st->st_mtime > path_dbtime) {
+   state->processed_files++;
+
+   status = notmuch_database_add_message (notmuch, next, 
);
+   switch (status) {
+   /* success */
case NOTMUCH_STATUS_SUCCESS:
state->added_messages++;
tag_inbox_and_unread (message);
break;
-   /* Non-fatal issues (go on to next file) */
+   /* Non-fatal issues (go on to next file) */
case NOTMUCH_STATUS_DUPLICATE_MESSAGE_ID:
-   /* Stay silent on this one. */
+   /* Stay silent on this one. */
break;
case NOTMUCH_STATUS_FILE_NOT_EMAIL:
fprintf (stderr, "Note: Ignoring non-mail file: %s\n",
 next);
break;
-   /* Fatal issues. Don't process anymore. */
+   /* Fatal issues. Don't process anymore. */
case NOTMUCH_STATUS_READONLY_DATABASE:
case NOTMUCH_STATUS_XAPIAN_EXCEPTION:
case NOTMUCH_STATUS_OUT_OF_MEMORY:
@@ -210,25 +223,27 @@ add_files_recursive (notmuch_database_t *notmuch,
case NOTMUCH_STATUS_LAST_STATUS:
INTERNAL_ERROR ("add_message returned unexpected value: 
%d",  status);
goto DONE;
-   }
+   }

-   if (message) {
-   notmuch_message_destroy (message);
-   message = NULL;
-   }
+   if (message) {
+   notmuch_message_destroy (message);
+   message = NULL;
+   }

-   if (do_add_files_print_progress) {
-   do_add_files_print_progress = 0;
-   add_files_print_progress (state);
+   if (do_add_files_print_progress) {
+   do_add_files_print_progress = 0;
+   add_files_print_progress (state);
+