Folder search semantics (was Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more)

2011-02-03 Austin Clements
Quoth Carl Worth on Feb 02 at  2:48 pm:
> Restricting my reply to one tiny bit of your mail:
> 
> You wrote:
> > non-recursive is the only thing that makes sense for Maildir++ folders
> 
> Either I'm not understanding Maildir++ folders, or I don't agree with
> you.
> 
> I might have an email archive that looks like this:
> 
>   Maildir
>     .work
>       .project1
>       .project2
>       .etc...
>     .family
>       .dad
>       .mom
>       .brother
>       .etc...
> 
> With the above setup, what would be unreasonable about wanting to search
> for all work-related messages (across all projects, say) with a string
> like "folder:work" ?
> 
> Now, a person might definitely want to search for messages in the
> ".work" folder directly, (not including the sub-folders), so we should
> provide support for users to get at that behavior as well, (such as a
> proposed "folder:work$" or so).
> 
> To me, both cases are perfectly legitimate, and I don't understand an
> argument that claims that only one makes sense. (Or again, I may be
> misunderstanding something.)

(Somebody with more first-hand Maildir++ experience should jump in here.
I stopped using Maildir++ a long time ago, so I may have no idea what
I'm talking about.)

Both cases are perfectly legitimate.

However, the issue with Maildir++ is that the inbox is stored in the
top-level directory:

  Maildir
    cur
    new
    tmp
    .work
    .work.project1

As a consequence, all folders are subfolders of the inbox.  With
recursive search, a search for your inbox folder returns *all* of your
messages.  I wasn't trying to say that we shouldn't support recursive
search (I'm all for flexibility), but it's a confusing default for
Maildir++ because of this.
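To make this concrete, here is a minimal Python sketch (not notmuch code) that models a recursive folder match as "the folder itself or anything nested beneath it".  It assumes a hypothetical representation in which each Maildir++ directory name is split into its dot-separated components, with the top-level inbox as the empty path.  Because every other folder is nested under that empty path, a recursive match on the inbox selects everything:

```python
from __future__ import annotations


def components(dirname: str) -> tuple[str, ...]:
    """Split a Maildir++ directory name into folder components.

    '' is the top-level inbox (the mail root) and '.work.project1' is
    the project1 subfolder of work.
    """
    if dirname == "":
        return ()
    return tuple(dirname.lstrip(".").split("."))


def recursive_match(query: tuple[str, ...], folder: tuple[str, ...]) -> bool:
    """True if folder is query itself or nested anywhere beneath it."""
    return folder[: len(query)] == query


# Hypothetical folders in a Maildir++ tree ('' is the inbox at the root).
folders = ["", ".work", ".work.project1", ".family", ".family.dad"]

inbox = components("")
work = components(".work")

print([f for f in folders if recursive_match(inbox, components(f))])
print([f for f in folders if recursive_match(work, components(f))])
```

The first line printed lists all five folders; the second lists only .work and .work.project1.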

Maildir++ has the added twist that the inbox folder has no name.  As a
result, currently notmuch can't search for a Maildir++ inbox folder,
which needs to be addressed somehow.  The least surprising approach
would be compatibility with the Maildir++ convention of calling the
top-level folder INBOX, the subfolder INBOX.work, etc.
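A sketch of that naming convention, in Python for brevity.  It assumes the directory name is taken relative to the mail root with the trailing cur or new already removed; the function name is hypothetical.  The root becomes INBOX and a dot-prefixed subfolder keeps its dotted name under INBOX:

```python
def maildirpp_folder_name(dirname: str) -> str:
    """Name a Maildir++ folder using the INBOX convention.

    dirname is relative to the mail root with any trailing cur/new
    component already removed: '' is the top-level inbox, and '.work'
    or '.work.project1' are subfolders.
    """
    if dirname == "":
        return "INBOX"
    if dirname.startswith(".") and "/" not in dirname:
        return "INBOX" + dirname
    raise ValueError(f"not a Maildir++ folder: {dirname!r}")


for d in ["", ".work", ".work.project1"]:
    print(repr(d), "->", maildirpp_folder_name(d))
```

Non-Maildir++ directories are rejected here only to keep the sketch small.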


Maildir++ issues aside, I submit that rooted, non-recursive folder
searches are a more natural default with a more conventional syntactic
extension to non-rooted/recursive searches.  In
id:87aaiy3u65.fsf@yoom.home.cworth.org, you mentioned that you
implemented non-rooted folder search to mimic subject search.  But file
system paths are not natural language like subject lines.  File system
paths are hierarchical and rooted.

Of course, special query operators like ^ and $ can mitigate this, but
these queries *aren't* regexps and, furthermore, people don't usually
apply regexps to file names.  They apply globs.  Glob syntax has the
added benefit of congruity with Xapian wildcard syntax.  This naturally
leads to a rooted, non-recursive syntax by default (like globs), where a
* at the end means recursive and a * at the beginning means non-rooted.
In fact, we could easily generalize this to arbitrary shell globs.
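A minimal Python sketch of these glob semantics, assuming folders are named with the INBOX convention and using a hypothetical helper name.  It relies on fnmatch.fnmatchcase, which, like C's fnmatch(3) called without FNM_PATHNAME or FNM_PERIOD, must match the whole name and lets * match any characters, including the '.' separators:

```python
from __future__ import annotations

from fnmatch import fnmatchcase

# Hypothetical folder names, using the Maildir++ INBOX naming convention.
FOLDERS = [
    "INBOX",
    "INBOX.work",
    "INBOX.work.project1",
    "INBOX.work.project2",
    "INBOX.family",
    "INBOX.family.dad",
]


def folder_search(pattern: str) -> list[str]:
    """Return the folders whose whole name matches the shell glob pattern.

    fnmatchcase() anchors the pattern at both ends, so a bare name is a
    rooted, non-recursive match; '*' may match any run of characters,
    including the '.' separators between folder levels.
    """
    return [name for name in FOLDERS if fnmatchcase(name, pattern)]


print(folder_search("INBOX"))        # rooted, non-recursive: ['INBOX']
print(folder_search("INBOX.work*"))  # recursive: INBOX.work and both projects
print(folder_search("*.project1"))   # non-rooted: ['INBOX.work.project1']
```

One thing the sketch makes visible: a trailing * is a string prefix rather than a folder-level one, so INBOX.work* would also match a sibling such as INBOX.workshop if one existed, while INBOX.work.* avoids that but excludes INBOX.work itself.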


Here's a proposal that, I think, addresses Maildir++ inboxes and
subfolders; rooted, non-rooted, recursive, and non-recursive queries;
and then some.  Plus, it wouldn't require many code changes; you've
already done the hard work.

Switch XFOLDER from a probabilistic prefix with word-splitting to a
boolean prefix without word-splitting.  When indexing, strip off the
trailing cur or new component and examine the resulting directory name.
If it's the mail root, this is a Maildir++ inbox, so add the term
XFOLDERINBOX.  If it starts with a dot, it's a Maildir++ subfolder, so
add the term XFOLDERINBOX<.dirname>.  Otherwise, add the term
XFOLDER<dirname>.
Then, using a custom query transform for the "folder:" prefix, enumerate
XFOLDER terms and form a synonym query out of those that fnmatch the
user's folder query.
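Both halves of this proposal can be sketched in a few lines of Python.  This is an illustration under stated assumptions, not notmuch's implementation: the helper names are hypothetical, paths are taken relative to the mail root, and a plain set stands in for the database's XFOLDER terms.  In C++ against Xapian, one would presumably enumerate the terms with Database::allterms_begin("XFOLDER") and combine the matches with Query::OP_SYNONYM.

```python
from __future__ import annotations

import posixpath
from fnmatch import fnmatchcase

PREFIX = "XFOLDER"  # boolean term prefix named in the proposal


def folder_term(relative_dir: str) -> str:
    """Compute the boolean folder term for a message's directory.

    relative_dir is relative to the mail root, e.g. 'cur', '.work/new',
    or 'archive/2010/cur' (hypothetical examples).
    """
    head, tail = posixpath.split(relative_dir)
    # Strip off the trailing cur or new component.
    dirname = head if tail in ("cur", "new") else relative_dir
    if dirname == "":
        return PREFIX + "INBOX"            # Maildir++ inbox (the mail root)
    if dirname.startswith("."):
        return PREFIX + "INBOX" + dirname  # Maildir++ subfolder
    return PREFIX + dirname                # any other folder, stored whole


def expand_folder_query(pattern: str, all_terms: set[str]) -> list[str]:
    """Return, sorted, the folder terms whose folder name fnmatches pattern.

    A real query transform would OR these terms together in a single
    synonym query; this sketch just returns them.
    """
    return sorted(
        term
        for term in all_terms
        if term.startswith(PREFIX) and fnmatchcase(term[len(PREFIX):], pattern)
    )


dirs = ["cur", ".work/new", ".work.project1/cur", "archive/2010/cur"]
terms = {folder_term(d) for d in dirs}
print(expand_folder_query("INBOX.work*", terms))
# ['XFOLDERINBOX.work', 'XFOLDERINBOX.work.project1']
print(expand_folder_query("archive/*", terms))
# ['XFOLDERarchive/2010']
```

Because the terms are boolean and not word-split, a path like archive/2010 is stored as one term, which is what lets the glob see the whole folder name.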


Re: [RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more

2011-02-03 Carl Worth
Restricting my reply to one tiny bit of your mail:

You wrote:
> non-recursive is the only thing that makes sense for Maildir++ folders

Either I'm not understanding Maildir++ folders, or I don't agree with
you.

I might have an email archive that looks like this:

  Maildir
    .work
      .project1
      .project2
      .etc...
    .family
      .dad
      .mom
      .brother
      .etc...

With the above setup, what would be unreasonable about wanting to search
for all work-related messages (across all projects, say) with a string
like "folder:work" ?

Now, a person might definitely want to search for messages in the
".work" folder directly, (not including the sub-folders), so we should
provide support for users to get at that behavior as well, (such as a
proposed "folder:work$" or so).

To me, both cases are perfectly legitimate, and I don't understand an
argument that claims that only one makes sense. (Or again, I may be
misunderstanding something.)

-Carl

-- 
carl.d.wo...@intel.com


___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


[RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more

2011-02-02 Austin Clements
I rebased the query parser against current master.  It's on the
qparser-3 branch at
  http://awakening.csail.mit.edu/git/notmuch.git

At cworth's request, I've folded the database closing bug fix into
the appropriate patch.  I also stripped out my implementation of
folder searching, since it obviously conflicts with cworth's [1].  I'm
not planning to resend the patches unless asked because there were no
actual code changes.

[1] I still assert the "correct" folder solution is somewhere between
mine and cworth's: rooted and non-recursive by default (non-recursive
is the only thing that makes sense for Maildir++ folders and it would
be silly for the default to depend on folder type), but leveraging the
flexibility of custom query transforms to support Maildir++ folders
and recursive (and maybe non-rooted) searches as well.  Non-rooted and
recursive searches even map to natural syntaxes that align with
Xapian's existing wildcard syntax.

Quoth myself on Jan 16 at  3:10 am:
> This is version 2 of the custom query parser.  It now supports date
> searches with sane syntax, folder searches (without any additions or
> changes to the database, unlike cworth's recent commit), and "tag:*"
> and "-tag:*" queries for finding tagged and untagged messages.  I used
> these features to guide changes to the original design and to validate
> the approach.  This is still RFC, but it's much less raw now.
> 
> In addition to the new features, the core query parser has a bunch of
> cleanups and changes, including completely redone NEAR and ADJ
> operators that now behave essentially the same as they do in Xapian's
> query parser.  I also split the implementation of these out into a
> separate patch for ease of review.
> 
> There's a notable lack of tests in this current series.  I do have a
> pile of tests for the lexer, parser, and generator, but the
> infrastructure for testing them needs cleanup before I send that out.
> 

-- 
Austin Clements  MIT/'06/PhD/CSAIL
amdragon at mit.edu   http://web.mit.edu/amdragon
   Somewhere in the dream we call reality you will find me,
  searching for the reality we call dreams.


[RFC PATCH v2 0/8] Custom query parser, date search, folder search, and more

2011-01-16 Austin Clements
This is version 2 of the custom query parser.  It now supports date
searches with sane syntax, folder searches (without any additions or
changes to the database, unlike cworth's recent commit), and "tag:*"
and "-tag:*" queries for finding tagged and untagged messages.  I used
these features to guide changes to the original design and to validate
the approach.  This is still RFC, but it's much less raw now.

In addition to the new features, the core query parser has a bunch of
cleanups and changes, including completely redone NEAR and ADJ
operators that now behave essentially the same as they do in Xapian's
query parser.  I also split the implementation of these out into a
separate patch for ease of review.

There's a notable lack of tests in this current series.  I do have a
pile of tests for the lexer, parser, and generator, but the
infrastructure for testing them needs cleanup before I send that out.


