Re: notmuch for documents

2010-11-08 Thread Carl Worth
On Sat, 06 Nov 2010 16:12:17 -0400, Jameson Rollins wrote:
> expected.  Try it: it really works!  There are only a couple of very
> little things that are a little funky:

Hey, that's pretty cool. I'm glad it's working so well for you.

>   * allow me to specify which "headers" from my ebooks I want indexed
> ("Author", "Publisher", etc.)

Fortunately, that's a feature we're already planning to add (quite
soon!) even just to better support the indexing of email.

> So what do people think about this idea?  Does it make sense to look
> into extending notmuch to handle non-mail documents? 

I'm cautious about making any big changes to support a much wider use
case. But where little changes can make a big improvement here, I'm
ready to listen.

-Carl

-- 
carl.d.wo...@intel.com


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


notmuch for documents

2010-11-06 Thread Darren McGuicken
On Sat, 06 Nov 2010 23:28:21 +, Darren McGuicken  wrote:
> I've now had a chance to play with this a little and while indexing,
> tagging and searching all seem to work as expected, I am getting the
> error 'Stack overflow in regexp matcher' when I try to view any of the
> ebooks which either leaves the buffer basically useless (no notmuch key
> shortcuts will work) or leads to a full segfault in emacs (23.1.1).

Hmm, looks like Michal's patch back in July fixes this behaviour:

 id:"1279279955-3110-1-git-send-email-sojkam1 at fel.cvut.cz"



notmuch for documents

2010-11-06 Thread Darren McGuicken
On Sat, 06 Nov 2010 16:12:17 -0400, Jameson Rollins  wrote:
> But that's it!  Everything else works as a perfect ebook indexer.  I
> can of course even add tags to my books.  Beautiful.  It's really
> quite incredible how well it works for this out of the box.  The only
> other issue is that my ebooks don't come in rfc5322-formatted files.
> I have to translate them for notmuch to work.

I've now had a chance to play with this a little and while indexing,
tagging and searching all seem to work as expected, I am getting the
error 'Stack overflow in regexp matcher' when I try to view any of the
ebooks which either leaves the buffer basically useless (no notmuch key
shortcuts will work) or leads to a full segfault in emacs (23.1.1).

The trace begins:

Debugger entered--Lisp error: (error "Stack overflow in regexp matcher")
  re-search-forward("\\(^[^>]+\\)\n>" nil t)
  notmuch-wash-tidy-citations(0)
  run-hook-with-args(notmuch-wash-tidy-citations 0)
  notmuch-show-insert-part-text/plain((:body ((:content "The Project
  Gutenberg EBook of The Adventures of Sherlock Holmes\nby Sir Art...

The contents of the ':content' part appear to be the complete text of
the novel.
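
For illustration, here is a small Python sketch of how the pattern in
the trace behaves.  It uses Python's re module, not the Emacs matcher,
so it does not reproduce the overflow itself.  It only shows that
[^>]+ also matches newlines, so the group can run across many lines
before a '>' is found.  That this is what exhausts the matcher's stack
on a novel-length part is an assumption, not something confirmed in
the thread.

```python
import re

# Python translation of the Emacs pattern "\\(^[^>]+\\)\n>".
# re.MULTILINE lets ^ match at the start of every line.
TIDY_CITATIONS = re.compile(r"(^[^>]+)\n>", re.MULTILINE)


def first_group_span(text):
    """Return the text captured by the group, or None if nothing matches."""
    match = TIDY_CITATIONS.search(text)
    return match.group(1) if match else None


sample = "line one\nline two\n> quoted line"
captured = first_group_span(sample)
# The group covers both unquoted lines, newline included.
print(repr(captured))
```

With a whole book in a single text part, the same group would have to
cover (and backtrack over) the entire body.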



notmuch for documents

2010-11-06 Thread Darren McGuicken
On Sat, 06 Nov 2010 16:59:17 -0400, Jameson Rollins  wrote:
> Hey, Darren.  Total side note, but you might be interested in the recent
> message from Kristoffer about his nice rss2message utility, "sluk", that
> works great for translating rss feeds into notmuch-indexable messages:

Thanks for the heads up Jamie - I saw this come through but at the time
it didn't seem to support most of the feeds I was reading.  I didn't
look too closely since feed2imap worked for me.  I'll check out the
latest version and see if that has changed.  I do like the idea of
notmuch-to-feed as a companion tool.



notmuch for documents

2010-11-06 Thread Darren McGuicken
On Sat, 06 Nov 2010 16:12:17 -0400, Jameson Rollins  wrote:
> Notmuch is the best damn mail system there ever was and we wouldn't
> want to mess with that.  Does abstracting everything in notmuch from
> "messages" -> "documents" hurt it as a mail system?  What if just the
> back-end were abstracted, to allow for different front-ends for
> different classes of documents, i.e. "messages", "articles", "books",
> "rss feeds", etc.?  Are there any big problems with this proposal that
> I'm overlooking?
> 
> I'm very interested to hear what others think about this idea.

Absolutely 100% in agreement - the reason I use emacs for everything I
possibly can from web browsing to writing to coding to mailing is
because it's all just (occasionally arbitrarily formatted) text and all
text can be dealt with across buffers in exactly the same manner.  The
ability to use a single indexer and search interface for any of that
same text gets a big +1 from me.

I recently started using feed2imap in order to get notmuch tagging and
searching for rss, I'm sure I'm not alone.  I was just wondering how on
earth to go about adding headers to my sets of org-mode formatted notes,
for instance, so that notmuch could pick them up and index them for me.
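
One possible approach, as a rough Python sketch: read the #+TITLE,
#+AUTHOR and #+DATE keyword lines at the top of an org file and emit
them as mail-style headers ahead of the body.  The mapping of keywords
to headers and the function name org_to_document are assumptions made
for illustration; nothing here is part of notmuch.

```python
import re

# Org keyword -> mail-style header.  This mapping is a guess.
KEYWORD_TO_HEADER = {"TITLE": "Subject", "AUTHOR": "From", "DATE": "Date"}
KEYWORD_RE = re.compile(r"^#\+(\w+):\s*(.*)$")


def org_to_document(org_text):
    """Turn leading #+KEYWORD lines into headers; the rest becomes the body."""
    headers, body = [], []
    in_preamble = True
    for line in org_text.splitlines():
        match = KEYWORD_RE.match(line) if in_preamble else None
        if match:
            # Keywords without a mapping (e.g. #+OPTIONS) are dropped.
            header = KEYWORD_TO_HEADER.get(match.group(1).upper())
            if header:
                headers.append(f"{header}: {match.group(2)}")
        else:
            # The first non-keyword line ends the preamble.
            in_preamble = False
            body.append(line)
    return "\n".join(headers) + "\n\n" + "\n".join(body) + "\n"


# Made-up note used only to exercise the function.
notes = "#+TITLE: Reading notes\n#+AUTHOR: Example Author\n\n* First heading\nSome text.\n"
converted = org_to_document(notes)
print(converted)
```

The converted files would live in their own directory, leaving the
original org files untouched.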

In fact, the less of a distinction between the types of 'document' I'm
dealing with the better.  Maybe different document types get different
default keybindings - I may not want to 'reply' to an ebook but I
absolutely may want to forward it to someone?



notmuch for documents

2010-11-06 Thread Nicolás Reynolds
On 06/11/10 04:59, Jameson Rollins said:
> On Sat, 06 Nov 2010 20:40:10 +, Darren McGuicken wrote:
> > I recently started using feed2imap in order to get notmuch tagging and
> > searching for rss, I'm sure I'm not alone.
> 
> Hey, Darren.  Total side note, but you might be interested in the recent
> message from Kristoffer about his nice rss2message utility, "sluk", that
> works great for translating rss feeds into notmuch-indexable messages:
> 
> http://github.com/krl/sluk/
> id:"87d3rgzdyj.fsf at rymdkoloni.se"

thanks to both of you, i was just looking for this a few weeks ago! :)


-- 
Salud!
Nicolás Reynolds,
xmpp:fauno at kiwwwi.com.ar
omb:http://identi.ca/fauno
blog:http://selfdandi.com.ar/
gnu/linux user #455044

http://librecultivo.org.ar
http://parabolagnulinux.org



notmuch for documents

2010-11-06 Thread Jameson Rollins
On Sat, 06 Nov 2010 20:40:10 +, Darren McGuicken  wrote:
> I recently started using feed2imap in order to get notmuch tagging and
> searching for rss, I'm sure I'm not alone.

Hey, Darren.  Total side note, but you might be interested in the recent
message from Kristoffer about his nice rss2message utility, "sluk", that
works great for translating rss feeds into notmuch-indexable messages:

http://github.com/krl/sluk/
id:"87d3rgzdyj.fsf at rymdkoloni.se"

jamie.



notmuch for documents

2010-11-06 Thread Jameson Rollins
A little while ago on #notmuch, madduck and ojwb mentioned that they
thought notmuch was overly focused on mail.  At the time I thought this
was a silly criticism and defended notmuch as doing what it does
*really* well and that we shouldn't expect notmuch to be all things to
all people.

Yesterday, however, I had the profound realization that madduck and ojwb
are right.

Notmuch stores database entries for email messages.  However, these
messages are nothing more than simple rfc5322 [0] structured documents.
They include nothing more than headers and a text body.

Imagine now that I have a collection of ebooks, each stored in a single
rfc5322-formatted text file:


From: Italo Calvino 
Subject: If on a winter's night a traveler
Date: 1979

You are about to begin reading Italo Calvino's new novel,
...
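
A file like the one above could be produced with something along
these lines.  This is a minimal Python sketch, assuming plain
single-line header values; the function name format_ebook is made up
for illustration, and the sketch does not deal with the extra
formatting of the "From" field mentioned further down.

```python
from email import message_from_string


def format_ebook(author, title, year, body):
    """Lay out a book as headers, a blank line, then the body text."""
    headers = [
        ("From", author),     # stands in for "Author"
        ("Subject", title),   # stands in for "Title"
        ("Date", str(year)),
    ]
    head = "\n".join(f"{name}: {value}" for name, value in headers)
    return head + "\n\n" + body


doc = format_ebook(
    "Italo Calvino",
    "If on a winter's night a traveler",
    1979,
    "You are about to begin reading Italo Calvino's new novel,\n...\n",
)

# Parse the result back with the standard library to confirm that it
# reads as headers plus body.
parsed = message_from_string(doc)
print(parsed["Subject"])
```
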


I store them all in a directory.  I now create a NOTMUCH_CONFIG with a
database.path that points to that directory, and run notmuch.  Notmuch
works *out of the box* (almost) perfectly to index my collection of
ebooks.  All the notmuch commands work exactly as expected.  I can
search through the bodies, search the titles, search for an author,
search for a publication date, etc.  The emacs interface even works as
expected.  Try it: it really works!  There are only a couple of very
little things that are a little funky:

  * the "headers" in my ebooks aren't exactly intuitive ("From" instead
    of "Author", "Subject" instead of "Title", etc.) and there are some
    missing headers ("Publisher").  I also had to format some of them in
    a strange way (I had to add "" in the "From"
    field in order to get it to index properly for some reason).

  * The documentation keeps referring to "messages", even though my
    documents are books.  And there are some subcommands that don't seem
    to make sense ("reply" to a book?).
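
The separate-config setup described above can be sketched roughly as
follows in Python.  The directory and file names are hypothetical, and
a real config may need more than the single database.path entry; this
only shows the idea of pointing NOTMUCH_CONFIG at a second config
before running "notmuch new".

```python
import configparser
import os
import shutil
import subprocess
import tempfile


def write_ebook_config(library_dir, config_path):
    """Write a minimal config whose database.path is library_dir."""
    config = configparser.ConfigParser(interpolation=None)
    config["database"] = {"path": library_dir}
    with open(config_path, "w") as handle:
        # Write "path=..." without spaces around the delimiter.
        config.write(handle, space_around_delimiters=False)
    return config_path


def index_library(config_path):
    """Run `notmuch new` with NOTMUCH_CONFIG set; False if notmuch is absent."""
    if shutil.which("notmuch") is None:
        return False
    env = dict(os.environ, NOTMUCH_CONFIG=config_path)
    subprocess.run(["notmuch", "new"], env=env, check=True)
    return True


workdir = tempfile.mkdtemp()
library = os.path.join(workdir, "ebooks")  # hypothetical library location
os.makedirs(library)
config_file = write_ebook_config(
    library, os.path.join(workdir, "notmuch-ebooks.conf")
)
```

Calling index_library(config_file) afterwards would index the
directory, leaving the regular mail configuration alone.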

But that's it!  Everything else works as a perfect ebook indexer.  I can
of course even add tags to my books.  Beautiful.  It's really quite
incredible how well it works for this out of the box.  The only other
issue is that my ebooks don't come in rfc5322-formatted files.  I have
to translate them for notmuch to work.

So what would have to be tweaked in notmuch to make it work even better
as an ebook indexer?

  * add some sort of translator to extract the "headers" and "body" from
    my non-rfc5322-formatted ebook files

  * allow me to specify which "headers" from my ebooks I want indexed
    ("Author", "Publisher", etc.)

  * tweak notmuch show to just open the ebook itself in an ebook reader
    instead of outputting it to stdout

  * tweak the documentation

Those are not very big changes.  And yet, with these changes notmuch can
now work for *many* other large classes of structured documents.

Another real world example:

I have hundreds of scientific journal articles on my computer.  They are
all pdf files and each has a corresponding bibtex entry in a flat text
file.  If notmuch could read the headers from the bibtex file and the
body from the text in the pdf (ps2ascii), notmuch would work *perfectly*
as an indexer for my scientific journal articles.
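
Until something like that exists, the translation could be done
outside notmuch.  Below is a rough Python sketch, assuming the pdf
text has already been extracted (e.g. with ps2ascii) and that the
bibtex fields are simple one-line values without nested braces.  The
field-to-header mapping, the function name bibtex_to_document and the
sample entry are all made up for illustration.

```python
import re

# Which bibtex field stands in for which mail-style header (a guess).
FIELD_TO_HEADER = {"author": "From", "title": "Subject", "year": "Date"}

# Matches simple `name = {value}` or `name = "value"` fields only.
FIELD_RE = re.compile(r'(\w+)\s*=\s*(?:\{([^{}]*)\}|"([^"]*)")')


def bibtex_to_document(entry, body):
    """Combine one bibtex entry and already-extracted text into one document."""
    fields = {}
    for name, braced, quoted in FIELD_RE.findall(entry):
        fields[name.lower()] = braced or quoted
    lines = [
        f"{header}: {fields[field]}"
        for field, header in FIELD_TO_HEADER.items()
        if field in fields
    ]
    return "\n".join(lines) + "\n\n" + body


# Invented entry, used only to exercise the function.
entry = """@article{example2010,
  author = {A. N. Author},
  title = {An Example Article},
  year = {2010}
}"""
document = bibtex_to_document(entry, "Text extracted from the pdf goes here.\n")
print(document)
```

A real bibtex file would need a proper parser; the regular expression
here is only enough for the simplest entries.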

So what do people think about this idea?  Does it make sense to look
into extending notmuch to handle non-mail documents?  We definitely
would *not* want to compromise notmuch as a mail indexer/reader.
Notmuch is the best damn mail system there ever was and we wouldn't want
to mess with that.  Does abstracting everything in notmuch from
"messages" -> "documents" hurt it as a mail system?  What if just the
back-end were abstracted, to allow for different front-ends for
different classes of documents, i.e. "messages", "articles", "books",
"rss feeds", etc.?  Are there any big problems with this proposal that
I'm overlooking?

I'm very interested to hear what others think about this idea.

jamie.

[0] http://tools.ietf.org/html/rfc5322


