Error while compacting: Bad position key

2018-07-12 Thread Mike Hommey
Hi,

When running `notmuch compact` today, it stopped with the following
output:

Compacting database...
compacting table postlist
 Reduced by 25% 648656K (2498904K -> 1850248K)
compacting table docdata
 Reduced by 15% 24K (152K -> 128K)
compacting table termlist
 Reduced by 1% 27008K (2211800K -> 2184792K)
compacting table position
Error while compacting: Bad position key

Compaction failed: A Xapian exception occurred

Running xapian-check says:

docdata:
blocksize=8K items=2677 firstunused=19 revision=15425 levels=1 root=17
B-tree checked okay
docdata table structure checked OK

termlist:
blocksize=8K items=2986940 firstunused=276475 revision=15425 levels=2 root=271786
B-tree checked okay
doclen not within bounds
(...)
doclen not within bounds
termlist table errors found: 107982

postlist:
blocksize=8K items=16090818 firstunused=312363 revision=15425 levels=3 root=249894
B-tree checked okay
postlist table structure checked OK

position:
blocksize=8K items=236476398 firstunused=653990 revision=15425 levels=3 root=598684
xapian-check: DatabaseError: Block 459158 item 179: not in sorted order


Is there something I can do, or do I essentially need to completely
rebuild the database? And if so, what's the best way to do it?

Cheers,

Mike
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


[notmuch] [PATCH] Build and link against notmuch shared library

2010-01-20 Thread Mike Hommey
On Wed, Jan 20, 2010 at 03:07:27PM -0500, Ben Gamari wrote:
> + install lib/libnotmuch.so $(DESTDIR)$(prefix)/lib/

> +$(dir)/libnotmuch.so: $(libnotmuch_modules)
> + $(call quiet,CXX,$(LDFLAGS)) $^ $(FINAL_LDFLAGS) -shared -o $@

If you're going to install that in $(prefix)/lib, you'd better make that
a library with a SONAME. -Wl,-soname,$(notdir $@) should do it, and
you'd obviously have to change the target name to add a SO version.

Mike


[notmuch] Quick thoughts on a notmuch daemon

2010-01-09 Thread Mike Hommey
On Fri, Jan 08, 2010 at 11:26:31PM +1300, martin f krafft wrote:
> also sprach Mike Hommey <mh+notmuch at glandium.org> [2010.01.08.2220 +1300]:
> > FYI, I have a good experience writing fuse filesystems, both with
> > high-level and low-level APIs. I'd advise to use the low-level API,
> > which allows for better performance.
> 
> I don't have any experience with FUSE yet, but the examples in
> /usr/share/doc/libfuse-dev/examples/ look trivial. This is where
> I would start, one function at a time. If you have a better
> suggestion, I'd love to hear it; or to clone your repo! ;)

As I said above, there are 2 sets of APIs in FUSE.

The high-level API sends the full path for the file being accessed for
every system call. And except for specific cases such as read(), write()
or readdir() you have nothing else to identify the file you are referring
to, which means you have to parse the path, and find the proper file
accordingly.
In notmuch's case, that would mean doing a search for most system calls.
Try to imagine how many syscalls that are not read(), write() or
readdir() mutt does when opening a Maildir.

The low-level API, otoh, uses inode numbers extensively (again, except
for read, write and readdir). The lookup call is responsible for resolving
the paths, given an inode and a name. Its results are cached by the kernel.
So, for example reading foo/bar from your fuse mount point will lookup
foo in the inode 1 (FUSE_ROOT_ID) and then do another lookup for bar in
the first result.
One of the problems with this API is that the inode number type is
unsigned long, which means you can't necessarily map real inode numbers,
which can be 64 bits. And even if you could, afaik, there is no quick way
to get a file from its inode, sadly.
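
To make the lookup walk concrete, here is a minimal sketch in C. It does
not use libfuse at all: the table, struct dentry, lookup() and
resolve_path() are made up for illustration. Only the convention that the
root is inode 1 (FUSE_ROOT_ID) comes from FUSE itself.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not the libfuse API: each entry maps
 * (parent inode, name) to a child inode, which is the question the
 * low-level lookup call has to answer. */
#define ROOT_INO 1UL /* same value as FUSE_ROOT_ID */

struct dentry {
    unsigned long parent;
    const char *name;
    unsigned long ino;
};

static const struct dentry table[] = {
    { ROOT_INO, "foo", 2 },
    { 2,        "bar", 3 },
};

/* Resolve one path component; returns 0 when there is no such entry. */
unsigned long lookup(unsigned long parent, const char *name, size_t len)
{
    size_t i;

    assert(parent != 0); /* 0 is reserved for "not found" */
    for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
        if (table[i].parent == parent &&
            strlen(table[i].name) == len &&
            memcmp(table[i].name, name, len) == 0)
            return table[i].ino;
    }
    return 0;
}

/* Walk a relative path such as "foo/bar" one component at a time,
 * starting from the root inode. */
unsigned long resolve_path(const char *path)
{
    unsigned long ino = ROOT_INO;

    while (*path && ino != 0) {
        size_t len = strcspn(path, "/");
        if (len > 0)
            ino = lookup(ino, path, len);
        path += len;
        if (*path == '/')
            path++;
    }
    return ino;
}
```

With this table, resolve_path("foo/bar") does one lookup for foo in inode
1 and a second one for bar in the result. In a real filesystem the table
would be replaced by whatever maps notmuch searches and real files to
inode numbers, which is the hard part.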

All in all, in the high-level API case, that means we would badly need
lookup caching, and in the low-level API case, some fast way to map, on
one hand, virtual directories to inode numbers, and on the other hand,
real files to inode numbers.

Some quick thoughts, about the whole thing:
- We will need to be careful about deduplication: if you copy a file
  from one directory to another, you don't want to have the copy in the
  underlying Maildir. But as you won't know until the file is totally
  written and closed...
- We should probably allow extra files to be stored in the virtual
  Maildir (for example, courierimap stores stuff in a Maildir)
- We may not need a client program at all, the "search directories"
  configuration could be handled via extended file attributes.

I also had another not quite unrelated idea a while ago, that could have
its value here: a generic data store, very much like the git object
database (an idea would be to have the git object datastore be a special
case of this generic data store, for possibly interesting compatibility),
which would allow for better storage of the messages: if the maildir is
exposed via fuse, why would you need a raw maildir? It would also
allow easier deduplication of messages that are different but not quite:
- Mailing list replies you get both directly and from the mailing
  list software, their headers have differences, but the files are mostly
  equivalent
- Mail quotes are found in both the original message and its response.
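
As a rough illustration of the content-addressed idea, here is a toy
store in C that keys objects by a hash of their content. Everything in it
is an assumption made for the example: fnv1a() stands in for a real
cryptographic hash such as the SHA-1 git uses, and store_put() and the
fixed-size key array are hypothetical. It only covers the exact-duplicate
case (a file copied from one directory to another). The "different but
not quite" cases above would need headers and bodies to be stored as
separate objects, which is not shown.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 64-bit FNV-1a, used here only as a stand-in for a real
 * cryptographic hash. */
uint64_t fnv1a(const unsigned char *data, size_t len)
{
    uint64_t h = 14695981039346656037ULL; /* FNV offset basis */
    size_t i;

    for (i = 0; i < len; i++) {
        h ^= data[i];
        h *= 1099511628211ULL; /* FNV prime */
    }
    return h;
}

#define STORE_MAX 16

static uint64_t store_keys[STORE_MAX];
static size_t store_count;

/* Store an object under the hash of its content. Returns 1 if it was
 * new, 0 if identical content was already present (deduplicated),
 * -1 if the toy store is full. */
int store_put(const void *data, size_t len)
{
    uint64_t key;
    size_t i;

    assert(data != NULL || len == 0);
    key = fnv1a(data, len);
    for (i = 0; i < store_count; i++)
        if (store_keys[i] == key)
            return 0;
    if (store_count == STORE_MAX)
        return -1;
    store_keys[store_count++] = key;
    return 1;
}
```

Since the key depends on the whole content, it can only be computed once
the file is totally written and closed, which is the same constraint as
in the deduplication point above.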

Mike


[notmuch] Quick thoughts on a notmuch daemon

2010-01-08 Thread Mike Hommey
On Fri, Jan 08, 2010 at 10:03:21PM +1300, martin f krafft wrote:
> also sprach Mike Hommey <mh+notmuch at glandium.org> [2010.01.08.2106 +1300]:
> > I'm in \o_ (though I won't be in Wellington). I've been thinking
> > about a fuse filesystem on top of notmuch for a while.
> 
> Grand news to see you interested! A FUSE filesystem is <25 functions
> to implement, and each function is basically an entity of its own
> and thus highly parallelisable. Once we agreed on a general mapping
> between filesystem i/o and notmuch interaction, 25 of us can write
> a function each and be done. How's that for collaboration? ;)

FYI, I have a good experience writing fuse filesystems, both with
high-level and low-level APIs. I'd advise to use the low-level API, which
allows for better performance.

Mike


[notmuch] indexing encrypted messages (was: OpenPGP support)

2010-01-08 Thread Mike Hommey
On Fri, Jan 08, 2010 at 03:56:10PM +1300, martin f krafft wrote:
> also sprach Jameson Graef Rollins [2009.11.26.1901 +1300]:
> > I would really like to start using notmuch with emacs beyond just
> > testing, but I really need to be able to handle/read/send mail with
> > PGP/MIME encoded attachments.  Do folks have any suggestions on how to
> > handle this?  Is there a separate emacs mode that people use for
> > signing/verifying/{de,en}crypting mail buffers, or is this something
> > that is going to have to be integrated into the notmuch mode?  I guess
> > the notmuch-show mode at least will need to do some verifying and
> > decrypting.
> 
> How about indexing GPG-encrypted messages?

That may leak decrypted form in the xapian index, though in a split
manner. But that'd still be a problem IMHO.

Mike


[notmuch] Quick thoughts on a notmuch daemon

2010-01-08 Thread Mike Hommey
On Fri, Jan 08, 2010 at 03:56:20PM +1300, martin f krafft wrote:
> These ideas are not new, and I've written about them before:
> 
> http://madduck.net/blog/2007.07.24:a-user-space-filesystem-for-mail-labeling/
> 
> notmuch seems an excellent base for implementing such a filesystem.
> I will try to make time before LCA to get up to speed on fuse, then
> maybe Carl and Micah and I (and whoever else will be in Wellington)
> can hack this up in a few hours and over a few beers.
> 
> If this resonates, or you want to work on this too, let's hear from
> you!

I'm in \o_ (though I won't be in Wellington). I've been thinking about a
fuse filesystem on top of notmuch for a while.

Mike


[notmuch] 25 minutes load time with emacs -f notmuch

2009-11-22 Thread Mike Hommey
On Sat, Nov 21, 2009 at 05:36:18PM -0500, Brett Viren wrote:
> On Sat, Nov 21, 2009 at 12:07 PM, Carl Worth wrote:
> 
> > Though, frankly, I think we need to fix "notmuch new" to do much better
> > than 40 files/sec.
> 
> Just a "me too".
> 
> Processed 130871 total files in 38m 7s (57 files/sec.).
> Added 102723 new messages to the database (not much, really).
> 
> This was ~2GB of mail on a 2.5GHz CPU.  That seems pretty reasonable
> to me but I'd like to rerun the "notmuch new" under google perftools
> to see if there are any obvious bottlenecks that might be cleaned up.

FWIW, my 90k+ messages mailbox was imported at a pace of 130 files/sec,
and my CPU is "only" 2.2GHz, but I have a SSD. A good share of the
bottlenecks is "simply" I/O. Don't forget having a lot of small files
sucks I/O wise, as files are most likely spread all over the disk.

A good test, if you have enough memory, would be to put your mailbox in
a tmpfs, and see how fast that imports.

Mike


[notmuch] Segfault with weird Message-ID

2009-11-21 Thread Mike Hommey
On Fri, Nov 20, 2009 at 10:05:56PM +0100, Mike Hommey wrote:
> On Fri, Nov 20, 2009 at 09:53:37PM +0100, Carl Worth wrote:
> > > On Fri, 20 Nov 2009 14:26:25 +0100, Mike Hommey <mh+notmuch at glandium.org> wrote:
> > > - for some reason, xapian doesn't want to add the document corresponding
> > >   to this old spam message: notmuch->xapian_db->add_document throws an
> > >   exception.
> > 
> > I think things had just gone wrong long before then.
> 
> I *did* see it throwing an exception from there. The sad thing is that I
> can't reproduce the problem anymore :-/
> 
> > > I can provide the spam if necessary, or can continue debugging the issue
> > > with some guidance.
> > 
> > Thanks for providing it. It turns out that the giant Message-Id value
> > wasn't causing the problem. Instead the message was corrupt by having a
> > stray new line at the third line. (So GMime is seeing only the first two
> > lines of headers). We *used* to have working code to detect this kind of
> > file as "not an email" but again, this broke when we changed
> > notmuch_message_get_header to return "" instead of NULL for missing
> > headers.
> 
> Interestingly, when I first traced on what message the crash was
> happening, I did see notmuch having the message-id in the message_id
> variable.

I just was able to reproduce after starting over.

header isn't "", and message_id is correctly filled. I can also confirm
the exception is thrown from notmuch->xapian_db->add_document.

> FWIW, that was using c05c3f1.

With 3ae12b1, I get the following output:
Error: A Xapian exception occurred. Halting processing.

But I confirm there is no crash, now.

Cheers,

Mike


[notmuch] Segfault with weird Message-ID

2009-11-20 Thread Mike Hommey
On Fri, Nov 20, 2009 at 09:53:37PM +0100, Carl Worth wrote:
> On Fri, 20 Nov 2009 14:26:25 +0100, Mike Hommey <mh+notmuch at glandium.org> wrote:
> > - for some reason, xapian doesn't want to add the document corresponding
> >   to this old spam message: notmuch->xapian_db->add_document throws an
> >   exception.
> 
> I think things had just gone wrong long before then.

I *did* see it throwing an exception from there. The sad thing is that I
can't reproduce the problem anymore :-/

> > I can provide the spam if necessary, or can continue debugging the issue
> > with some guidance.
> 
> Thanks for providing it. It turns out that the giant Message-Id value
> wasn't causing the problem. Instead the message was corrupt by having a
> stray new line at the third line. (So GMime is seeing only the first two
> lines of headers). We *used* to have working code to detect this kind of
> file as "not an email" but again, this broke when we changed
> notmuch_message_get_header to return "" instead of NULL for missing
> headers.

Interestingly, when I first traced on what message the crash was
happening, I did see notmuch having the message-id in the message_id
variable.

FWIW, that was using c05c3f1.

I'll see if I can reproduce my segfault when starting from scratch
again, and will also give your patches a try.

Cheers,

Mike


[notmuch] Segfault with weird Message-ID

2009-11-20 Thread Mike Hommey
Hi,

I got a segfault when importing my maildir. It happened because of an
old weird email, where the message-id is the following:
Message-ID: <22b17a1f$4fbe$0550 at myrop (ew6.southwind.net [216.53.98.70])
 by onyx.southwind.net from homepage.com (114.230.197.216)
 by newmail.spectraweb.ch from default (m202.2-25.warwick.net [218.242.202.80])
 by host.warwick.net (8.10.0.Beta10/8.10.0.Beta10) with SMTP id e9GKEKk19201>

I have absolutely no idea how it got this value, but the mail being
an archived 8-year-old spam, I'm not exactly sure anyone would
still expect such a message id to occur.

Anyways, the stack dump is the following:
#0  0x76d1e598 in Xapian::Document::add_term(std::string const&, unsigned int) () from /usr/lib/libxapian.so.15
#1  0x0040f5ff in _notmuch_message_add_term (message=0x0, prefix_name=0x41ad7f "tag", value=0x4191b0 "inbox") at lib/message.cc:587
#2  0x0040f827 in notmuch_message_add_tag (message=0x0, tag=0x4191b0 "inbox") at lib/message.cc:668
#3  0x00407bc8 in tag_inbox_and_unread (message=0x0) at notmuch-new.c:44
#4  0x00407f63 in add_files_recursive (notmuch=0x62cc20, path=0x832e90 "/home/mh/Maildir/saved-messages/cur", st=0x7fffe000, state=0x7fffe240) at notmuch-new.c:185
#5  0x00408036 in add_files_recursive (notmuch=0x62cc20, path=0x832de0 "/home/mh/Maildir/saved-messages", st=0x7fffe000, state=0x7fffe240) at notmuch-new.c:223
#6  0x00408036 in add_files_recursive (notmuch=0x62cc20, path=0x62c920 "/home/mh/Maildir", st=0x7fffe000, state=0x7fffe240) at notmuch-new.c:223
#7  0x00408245 in add_files (notmuch=0x62cc20, path=0x62c920 "/home/mh/Maildir", state=0x7fffe240) at notmuch-new.c:287
#8  0x00408704 in notmuch_new_command (ctx=0x61f140, argc=0, argv=0x7fffe3e8) at notmuch-new.c:431
#9  0x00406ea8 in main (argc=2, argv=0x7fffe3d8) at notmuch.c:400

And the most likely problem is that message is NULL.

Now, looking at the code, it seems to me there are actually 3 problems:
- _notmuch_message_create_for_message_id can return NULL, and while
  there is a test for it in notmuch_database_add_message, the function
  still returns a success code
- things are still going on even when message is NULL in
  add_files_recursive
- for some reason, xapian doesn't want to add the document corresponding
  to this old spam message: notmuch->xapian_db->add_document throws an
  exception.
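
The first two points could be handled along these lines. This is a
self-contained sketch with stand-in types and names: status_t, message_t,
create_message(), add_message() and tag_new_message() are all
hypothetical and only mirror the shape of the notmuch functions named
above. The 64-byte limit is arbitrary and just gives the stand-in
creation function a way to fail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in types for illustration only; not the real notmuch API. */
typedef enum { STATUS_SUCCESS, STATUS_FAILURE } status_t;

typedef struct {
    const char *message_id;
    int tagged;
} message_t;

static message_t storage;

/* Pretend creation fails for a missing, empty or oversized message id,
 * the way the real creation function can return NULL. */
message_t *create_message(const char *message_id)
{
    if (message_id == NULL || *message_id == '\0' || strlen(message_id) > 64)
        return NULL;
    storage.message_id = message_id;
    storage.tagged = 0;
    return &storage;
}

/* Problem 1: report failure instead of success when creation fails. */
status_t add_message(const char *message_id, message_t **message_ret)
{
    message_t *message;

    assert(message_ret != NULL);
    message = create_message(message_id);
    *message_ret = message;
    return message ? STATUS_SUCCESS : STATUS_FAILURE;
}

/* Problem 2: the caller only tags when it really has a message.
 * Returns 1 if the message was tagged, 0 if the file was skipped. */
int tag_new_message(const char *message_id)
{
    message_t *message = NULL;

    if (add_message(message_id, &message) != STATUS_SUCCESS || message == NULL)
        return 0; /* skip the file rather than dereference NULL */
    message->tagged = 1;
    return 1;
}
```

Checking both the status and the pointer is redundant once the status is
reliable, but it keeps the caller safe if the two ever disagree again.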

I can provide the spam if necessary, or can continue debugging the issue
with some guidance.

Cheers,

Mike