[notmuch] interesting project!

2009-11-24 Thread Dirk-Jan C. Binnema
Hi Carl,

> "Carl" == Carl Worth  writes:

Carl> I agree that trying to support OOM doesn't make sense without
Carl> testing. But that's why I want to test notmuch with memory-fault
Carl> injection. We've been doing this with the cairo library with good
Carl> success for a while.

Carl> As for "unlikely that malloc ever returns NULL", that's simply a
Carl> system-configuration away (just turn off overcommit). And I can imagine
Carl> notmuch being used in lots of places, (netbooks, web servers, etc.), so
Carl> I do want to make it as robust as possible.

That is a very laudable goal! But it's also quite hard to achieve, considering
that both GMime and Xapian may have some different ideas about that. And at
least in the current code, I see fprintfs in 'malloc-returns-NULL'-cases --
but fprintf itself will probably allocate memory too. Also, at least now, the
bad_alloc exceptions for C++ are not caught. Of course, that can be changed,
but it's just to show that these things are hard to get right.
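
To make the fprintf concern concrete: whether fprintf allocates depends on the C library and on how the stream is buffered, so one cautious option is to report the failure with write(2) and a fixed message. This is a minimal sketch under that assumption. The names `oom_message` and `report_out_of_memory` are hypothetical and are not taken from notmuch.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Fixed message in static storage: nothing is allocated at report time. */
static const char oom_message[] = "Out of memory\n";

/* Write the fixed message to fd without going through stdio.
 * Returns 0 on success, -1 if write(2) fails. */
int report_out_of_memory(int fd)
{
    const char *p = oom_message;
    size_t left = sizeof oom_message - 1;   /* do not write the trailing NUL */

    assert(fd >= 0);
    while (left > 0) {
        ssize_t n = write(fd, p, left);
        if (n < 0) {
            if (errno == EINTR)
                continue;                   /* interrupted: retry */
            return -1;
        }
        p += n;                             /* handle short writes */
        left -= (size_t) n;
    }
    return 0;
}
```

A caller would typically invoke `report_out_of_memory(STDERR_FILENO)` from the branch that handles a NULL return from malloc.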

Carl> Thanks for mentioning the hash table. The hash table is one of the few
Carl> things that I *am* using from glib right now in notmuch. It's got a
Carl> couple of bizarre things about it:

Carl>   1. The simpler-appearing g_hash_table_new function is useless
Carl>      for common cases like hashing strings. It will just leak
Carl>      memory. So g_hash_table_new_full is the only one worth using.

Hmmm, I never noticed that behavior. If you are using dynamically allocated
strings, GHashTable won't free them for you -- but I can't really see how it
could (given that it takes generic pointers), so you have to free those
yourself. But any memleaks beyond that?

Carl>   2. There are two lookup functions, g_hash_table_lookup, and
Carl>      g_hash_table_lookup_extended.

Carl>      So, it might make sense if a hash-table interface supported
Carl>      these two modes well. What's bizarre about GHashTable though,
Carl>      is that in the "just a set" case, we only use NULL as the
Carl>      value when inserting. And distinguishing "previously inserted
Carl>      with NULL" from "never inserted" is the one thing that
Carl>      g_hash_table_lookup can't do. So I've only found that I could
Carl>      ever use g_hash_table_lookup_extended, (and pass a pair of
Carl>      NULLs for the return arguments I don't need).

Hmm, well, I found that returning NULL for 'not set' works in many cases,
and it keeps the common case quite easy. If you need to distinguish between NULL
and 'not set', you can use either the _extended version as you mention, or use
some special NOT_SET static ptr you can compare with (and handle it
appropriately in the destructor).
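
The two options discussed here can be sketched with a tiny, fixed-size open-addressed table. This is not GLib's code, and not Eric's hash table either. All names (`struct table`, `table_lookup`, `table_lookup_extended`, `NOT_SET`) are hypothetical, and the fixed size and borrowed key pointers are simplifying assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TABLE_SLOTS 64          /* power of two; no resizing in this sketch */

struct entry { const char *key; void *value; };
struct table { struct entry slots[TABLE_SLOTS]; size_t used; };

/* Sentinel value meaning "key is present but carries no real value". */
static char not_set_marker;
#define NOT_SET ((void *) &not_set_marker)

static size_t hash_string(const char *s)
{
    size_t h = 5381;            /* djb2 string hash */
    while (*s)
        h = h * 33 + (unsigned char) *s++;
    return h;
}

/* Return the slot holding key, or the empty slot where it would be stored. */
static struct entry *find_slot(struct table *t, const char *key)
{
    size_t i = hash_string(key) & (TABLE_SLOTS - 1);

    assert(t != NULL && key != NULL);
    while (t->slots[i].key != NULL && strcmp(t->slots[i].key, key) != 0)
        i = (i + 1) & (TABLE_SLOTS - 1);    /* linear probing */
    return &t->slots[i];
}

/* Insert or overwrite. The key pointer is borrowed, not copied. */
bool table_insert(struct table *t, const char *key, void *value)
{
    struct entry *e = find_slot(t, key);

    if (e->key == NULL) {
        if (t->used == TABLE_SLOTS - 1)
            return false;       /* keep one slot empty so probing terminates */
        e->key = key;
        t->used++;
    }
    e->value = value;
    return true;
}

/* Simple lookup: NULL means either "absent" or "stored with NULL". */
void *table_lookup(struct table *t, const char *key)
{
    return find_slot(t, key)->value;
}

/* Extended lookup: the return value says whether the key is present. */
bool table_lookup_extended(struct table *t, const char *key, void **value)
{
    struct entry *e = find_slot(t, key);

    if (e->key == NULL)
        return false;
    if (value != NULL)
        *value = e->value;
    return true;
}
```

With a zero-initialized `struct table`, inserting a set member with `NOT_SET` instead of NULL lets the simple lookup tell "present" from "absent", while the extended lookup works either way.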

Carl> I definitely like the idea of having tiny, focused libraries that do
Carl> one thing and do it well, (and maybe even some things so tiny that
Carl> they are actually designed to be copied into the application---like
Carl> with gnulib and with Eric's new hash table).

Ok; glib fills the role pretty well for me, and I don't really pay for the
parts that I don't use. But tastes differ, no problem ;-)

Carl> Thanks for understanding. :-)
Carl> And I enjoy the conversation,

Same here :) 

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema  Helsinki, Finland
e:djcb at djcbsoftware.nl   w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C


[notmuch] interesting project!

2009-11-23 Thread Carl Worth
On Mon, 23 Nov 2009 09:08:34 +0200, Dirk-Jan C. Binnema  wrote:
> Well, the counterpoint to the OOM problems is that in many programs,
> the 'malloc returns NULL'-case is often not very well tested (because it's
> rather hard to test), and that at least on Linux, it's unlikely that malloc
> ever does return NULL. Lennart Poettering wrote this up in some more
> detail[1]. Of course, the requirements for notmuch may be a bit different and
> I definitely don't want to suggest any radical change here after only finding
> out about notmuch a few days ago :)

No problem. I'm glad to discuss things. That's how I learn and find out
whether my decisions are sound or not. :-)

I agree that trying to support OOM doesn't make sense without
testing. But that's why I want to test notmuch with memory-fault
injection. We've been doing this with the cairo library with good
success for a while.

As for "unlikely that malloc ever returns NULL", that's simply a
system-configuration away (just turn off overcommit). And I can imagine
notmuch being used in lots of places, (netbooks, web servers, etc.), so
I do want to make it as robust as possible.
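
As a rough illustration of memory-fault injection in general (not the harness used for cairo, and not notmuch code), an allocator wrapper can be told to fail the N-th allocation, and the test then walks N across every allocation site. All names below (`fault_malloc`, `fault_malloc_fail_at`, `pair_new`, and so on) are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

static unsigned long calls_until_failure = 0;   /* 0 = injection disabled */
static long live_allocations = 0;               /* for leak checking */

/* Make the n-th subsequent fault_malloc call fail (1-based); 0 disables. */
void fault_malloc_fail_at(unsigned long n) { calls_until_failure = n; }

long fault_live_allocations(void) { return live_allocations; }

void *fault_malloc(size_t size)
{
    void *p;

    if (calls_until_failure > 0 && --calls_until_failure == 0)
        return NULL;                            /* simulated OOM */
    p = malloc(size);
    if (p != NULL)
        live_allocations++;
    return p;
}

void fault_free(void *p)
{
    if (p != NULL) {
        live_allocations--;
        free(p);
    }
}

/* Example code under test: three allocations, each of which may fail. */
struct pair { char *key; char *value; };

static char *dup_string(const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = fault_malloc(len);

    if (copy != NULL)
        memcpy(copy, s, len);
    return copy;
}

void pair_free(struct pair *p)
{
    if (p == NULL)
        return;
    fault_free(p->key);
    fault_free(p->value);
    fault_free(p);
}

/* Returns NULL on allocation failure, releasing anything already allocated. */
struct pair *pair_new(const char *key, const char *value)
{
    struct pair *p;

    assert(key != NULL && value != NULL);
    p = fault_malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->key = NULL;
    p->value = NULL;
    if ((p->key = dup_string(key)) == NULL ||
        (p->value = dup_string(value)) == NULL) {
        pair_free(p);
        return NULL;
    }
    return p;
}
```

A test loop would call `fault_malloc_fail_at(n)` for n = 1, 2, 3, check that `pair_new` returns NULL each time, and check that `fault_live_allocations()` is back to zero, which exercises every failure path and catches leaks on them.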

> (BTW, there is a hashtable implementation in libc, (hcreate(3) etc.). Is that
> one not sufficiently 'talloc-friendly'? It's not very user-friendly, but
> that's another matter)

Thanks for mentioning the hash table. The hash table is one of the few
things that I *am* using from glib right now in notmuch. It's got a
couple of bizarre things about it:

1. The simpler-appearing g_hash_table_new function is useless
   for common cases like hashing strings. It will just leak
   memory. So g_hash_table_new_full is the only one worth using.

2. There are two lookup functions, g_hash_table_lookup, and
   g_hash_table_lookup_extended.

   And a program like notmuch really does use the hash table in
   two ways. In the simpler case, we're using the hash to simply
   implement a set, (such as avoiding duplicates in a set of
   tags). In the more complex case, we're associating actual
   objects with the keys, (such as when linking messages
   together into a tree for the thread).

   So, it might make sense if a hash-table interface supported
   these two modes well. What's bizarre about GHashTable though,
   is that in the "just a set" case, we only use NULL as the
   value when inserting. And distinguishing "previously inserted
   with NULL" from "never inserted" is the one thing that
   g_hash_table_lookup can't do. So I've only found that I could
   ever use g_hash_table_lookup_extended, (and pass a pair of
   NULLs for the return arguments I don't need).

Fortunately, Eric Anholt spent *his* flight home coding up a nice
implementation of an open-addressed hash designed specifically to be a
tiny little implementation suitable for copying directly into a
project. He's testing it with Mesa now, and I might pull it into notmuch
later.

> I could imagine the string functions could replace the ones in talloc. There
> are many more string functions, e.g., for handling file names / paths, which
> are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that
> would make the ones in notmuch unneeded, and a lot of compatibility things
> like G_DIR_SEPARATOR. And the data structures (GSlice/GList/GHashTable) are
> nice. The UTF8 functionality might come in handy.

Yes. The portability stuff I think is actually interesting. I've thought
it really might make sense to have something that gave you *just* that,
(without a main loop, an object system, several memory allocators or
pieces for making your own memory allocators, etc). I haven't had a
chance to look into gnulib yet, but I'd like to.

As for a list, I almost always find it cleaner to be able to just have
my own list data structures, (to avoid casts, etc.).

And for a hash table, I'm interested in what Eric's doing.

I'm really not prejudiced against using code that's already been
written, (in spite of how it might appear, I don't feel the need to
re-solve every problem that's already been solved). But I have long
thought that we could have better support for a "C programmer's toolkit"
of commonly needed things than we have had before.

I definitely like the idea of having tiny, focused libraries that do one
thing and do it well, (and maybe even some things so tiny that they are
actually designed to be copied into the application---like with gnulib
and with Eric's new hash table).

> Anyway, I was just curious, people have survived without GLib before, and if
> you dislike the OOM-strategy, it's a bit of a no-no of course.

Thanks for understanding. :-)

And I enjoy the conversation,

-Carl


[notmuch] interesting project!

2009-11-23 Thread Dirk-Jan C. Binnema
Hi Carl,

> "Carl" == Carl Worth  writes:

Carl> On Sun, 22 Nov 2009 14:23:10 +0200, Dirk-Jan C. Binnema
Carl>  wrote:
>> A small question: it seems that notmuch is avoiding the use of GLib directly
>> (of course, it depends on it anyway through GMime); is this because of
>> OOM-handling? It'd be nice if GLib could be used, it would make some things
>> quite a bit easier.

Carl> It's true that I don't like the OOM handling in glib. I also think that
Carl> glib tries to be too many different things at the same time. And
Carl> finally, having some talloc-friendly data structures (like a hash-table)
Carl> would be really nice.

Well, the counterpoint to the OOM problems is that in many programs,
the 'malloc returns NULL'-case is often not very well tested (because it's
rather hard to test), and that at least on Linux, it's unlikely that malloc
ever does return NULL. Lennart Poettering wrote this up in some more
detail[1]. Of course, the requirements for notmuch may be a bit different and
I definitely don't want to suggest any radical change here after only finding
out about notmuch a few days ago :)

(BTW, there is a hashtable implementation in libc, (hcreate(3) etc.). Is that
one not sufficiently 'talloc-friendly'? It's not very user-friendly, but
that's another matter)
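
For reference, here is a minimal sketch of the libc interface mentioned above, used as a set of tags. `hcreate`, `hsearch` and `hdestroy` are the POSIX functions from `<search.h>`. The wrapper `tag_set_add` is a hypothetical name. The sketch also shows some of the rough edges: there is one global table per process, its size is fixed at creation, and the table stores the key pointer itself, so the caller must keep the string alive.

```c
#include <assert.h>
#include <search.h>
#include <stddef.h>

/* Add tag to the process-wide hsearch table.
 * Returns 1 if the tag was already present, 0 if it was newly added,
 * and -1 if it could not be added (table full or out of memory).
 * hcreate() must have been called successfully beforehand. */
int tag_set_add(char *tag)
{
    ENTRY item;

    assert(tag != NULL);
    item.key = tag;     /* stored by pointer, not copied */
    item.data = NULL;   /* set semantics: no associated value */

    if (hsearch(item, FIND) != NULL)
        return 1;
    return hsearch(item, ENTER) != NULL ? 0 : -1;
}
```

Typical use is `hcreate(n)` once with an upper bound on the number of entries, then `tag_set_add` per tag, then `hdestroy()` when done.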

Carl> In the meantime, as you say, we're already linking with glib because of
Carl> GMime, so there's really no reason not to call functions that are there
Carl> and that do what we want. What kinds of things were you thinking of that
Carl> would be easier with glib?

I could imagine the string functions could replace the ones in talloc. There
are many more string functions, e.g., for handling file names / paths, which
are quite useful. Then there are wrappers for gcc'isms (G_UNLIKELY etc.) that
would make the ones in notmuch unneeded, and a lot of compatibility things
like G_DIR_SEPARATOR. And the data structures (GSlice/GList/GHashTable) are
nice. The UTF8 functionality might come in handy.

Anyway, I was just curious, people have survived without GLib before, and if
you dislike the OOM-strategy, it's a bit of a no-no of course.

Best wishes,
Dirk.


[1] http://article.gmane.org/gmane.comp.audio.jackit/19998

-- 
Dirk-Jan C. Binnema  Helsinki, Finland
e:djcb at djcbsoftware.nl   w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C


[notmuch] interesting project!

2009-11-22 Thread Carl Worth
On Sun, 22 Nov 2009 14:23:10 +0200, Dirk-Jan C. Binnema  wrote:
> A small question: it seems that notmuch is avoiding the use of GLib directly
> (of course, it depends on it anyway through GMime); is this because of
> OOM-handling? It'd be nice if GLib could be used, it would make some things
> quite a bit easier.

It's true that I don't like the OOM handling in glib. I also think that
glib tries to be too many different things at the same time. And
finally, having some talloc-friendly data structures (like a hash-table)
would be really nice.

In the meantime, as you say, we're already linking with glib because of
GMime, so there's really no reason not to call functions that are there
and that do what we want. What kinds of things were you thinking of that
would be easier with glib?

-Carl


[notmuch] interesting project!

2009-11-22 Thread Dirk-Jan C. Binnema
Hi Carl,

> "Carl" == Carl Worth  writes:

>> Anyhow, I'll study the notmuch code and see if there are some useful
>> bits in my code that might make sense there, e.g., various dir scanning
>> optimizations, see [2].

Carl> That sounds great. It's also good to have people with experience in
Carl> this area join and help out. I'll look forward to any ideas or other
Carl> contributions you will have.

Thanks for the nice words!

A small question: it seems that notmuch is avoiding the use of GLib directly
(of course, it depends on it anyway through GMime); is this because of
OOM-handling? It'd be nice if GLib could be used, it would make some things
quite a bit easier.

Best wishes,
Dirk.

-- 
Dirk-Jan C. Binnema  Helsinki, Finland
e:djcb at djcbsoftware.nl   w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C


[notmuch] interesting project!

2009-11-21 Thread Carl Worth
On Sat, 21 Nov 2009 11:01:46 +0200, Dirk-Jan C. Binnema  wrote:
> Hi all,

Hi, Dirk. Welcome to notmuch!

> Wow, 'notmuch' looks like a very interesting project. In 2008, I wrote an
> e-mail (Maildir) search tool called 'mu'[1], also using Xapian and GMime; my
> plan was at some point to turn it into a mail reader (use
> offlineimap/fetchmail etc. for getting the mail, and something else for
> sending it), but I never got that far. Search works pretty well
> though. Anyhow, it seems notmuch is getting there quickly.

Ah, how ignorant I was. I probably could have saved myself a bunch of
work if I had just started with mu. Oh, well.

> Anyhow, I'll study the notmuch code and see if there are some useful bits in
> my code that might make sense there, e.g., various dir scanning optimizations,
> see [2].

That sounds great. It's also good to have people with experience in this
area join and help out. I'll look forward to any ideas or other
contributions you will have.

> [2] http://djcbflux.blogspot.com/2008/10/seek-destroy.html

Thanks. Stewart Smith contributed a patch to notmuch a couple of days
ago that added inode sorting, (which I was totally unaware of as an
optimization idea):

Read mail directory in inode number order
http://git.notmuchmail.org/git/notmuch?a=commitdiff;h=a45ff8c36112a2f17c1ad5c20a16c30a47759797
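
The general idea of inode sorting can be sketched as follows: collect the directory entries first, sort them by inode number, and only then open the files, which on many filesystems tends to reduce disk seeks. This is only an illustration using POSIX scandir(3), not the code from the commit above. `by_inode` and `scan_by_inode` are hypothetical names.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <stdlib.h>

/* scandir comparator: order entries by ascending inode number. */
static int by_inode(const struct dirent **a, const struct dirent **b)
{
    return ((*a)->d_ino > (*b)->d_ino) - ((*a)->d_ino < (*b)->d_ino);
}

/* Read all entries of path, sorted by inode number.
 * Returns the number of entries, or -1 on error. On success the caller
 * must free() each entry and then the array itself. */
int scan_by_inode(const char *path, struct dirent ***entries)
{
    assert(path != NULL && entries != NULL);
    return scandir(path, entries, NULL, by_inode);
}
```

A caller would then iterate over the returned array in order, processing each file before freeing its entry.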

-Carl


[notmuch] interesting project!

2009-11-21 Thread Jameson Greaf Rollins
On Sat, Nov 21, 2009 at 01:10:42PM +0100, Carl Worth wrote:
> On Sat, 21 Nov 2009 11:01:46 +0200, Dirk-Jan C. Binnema wrote:
> > Anyhow, I'll study the notmuch code and see if there are some useful bits in
> > my code that might make sense there, e.g., various dir scanning optimizations,
> > see [2].
> 
> That sounds great. It's also good to have people with experience in this
> area join and help out. I'll look forward to any ideas or other
> contributions you will have.

I've been using mu for a while now and have found it incredibly
useful.  I just heard about notmuch and it seems like the mail
processing system I've been waiting for, so I'm incredibly excited.
The idea of the mu and notmuch folks working together sounds
incredibly awesome.  I am really encouraged.

jamie.



[notmuch] interesting project!

2009-11-21 Thread Dirk-Jan C. Binnema
Hi all,

Wow, 'notmuch' looks like a very interesting project. In 2008, I wrote an
e-mail (Maildir) search tool called 'mu'[1], also using Xapian and GMime; my
plan was at some point to turn it into a mail reader (use
offlineimap/fetchmail etc. for getting the mail, and something else for
sending it), but I never got that far. Search works pretty well
though. Anyhow, it seems notmuch is getting there quickly.

Anyhow, I'll study the notmuch code and see if there are some useful bits in
my code that might make sense there, e.g., various dir scanning optimizations,
see [2].

Good luck!
Dirk.


[1] http://www.djcbsoftware.nl/code/mu/
[2] http://djcbflux.blogspot.com/2008/10/seek-destroy.html

-- 
Dirk-Jan C. Binnema  Helsinki, Finland
e:djcb at djcbsoftware.nl   w:www.djcbsoftware.nl
pgp: D09C E664 897D 7D39 5047 A178 E96A C7A1 017D DA3C