Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-05 Thread Tomi Ollila
On Tue, Feb 04 2014, Austin Clements amdra...@mit.edu wrote:

 Quoth Jani Nikula on Feb 01 at  4:54 pm:

 I kind of like the /** suffix for recursive, but there's two small
 wrinkles: 1) it needs quoting on the command line (unlike my original
 suggestion of just / suffix), and 2) what should the top level
 recursive search be? path:** or path:/** or path:./**? I guess the
 first one is most obvious?

 The shell quoting is annoying, but depending on the shell, it should
 at least give an error (zsh) or Just Work (apparently bash and sh pass
 the unexpanded glob through if it doesn't match anything?).


In zsh:

$ echo whatever:/**
whatever:/**

Quick check with:
ksh-20100621-12.el6.x86_64,
dash-0.5.5.1-3.1.el6.x86_64,
busybox-1.15.1-20.el6.x86_64 (busybox sh and busybox ash)
and
http://sourceforge.net/projects/heirloom/files/heirloom-sh/050706/heirloom-sh-050706.tar.bz2/download

all do the same (non-)expansion.


I vaguely remember some shells puking an error when expansion yielded
no results... maybe some shell option does it. Definitely not a mainstream
feature.

Tomi
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-05 Thread Tomi Ollila
On Wed, Feb 05 2014, Tomi Ollila tomi.oll...@iki.fi wrote:

 On Tue, Feb 04 2014, Austin Clements amdra...@mit.edu wrote:


 In zsh:

 $ echo whatever:/**
 whatever:/**

Except (retested after seeing related IRC msg from Austin):

$ unsetopt no_nomatch
$ echo whatever:/**
zsh: no matches found: whatever:/**

Maybe we can document this (and bash's nullglob option) so users can
decide how they want their shells to behave...
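As a starting point for such documentation, here is a sketch of what bash does with an unquoted recursive term. It assumes bash is installed and that nothing in the current directory matches the pattern (there is no directory named path:foo); path:foo/** stands for the recursive syntax proposed in this thread, and each number is how many arguments a command would receive.

```shell
#!/bin/sh
# Sketch: how many arguments does an unquoted path:foo/** become in bash?
# Assumes bash is installed and nothing in the current directory matches
# the pattern (there is no directory named "path:foo").

# Default bash: an unmatched glob is passed through literally.
default_count=$(bash -c 'set -- path:foo/**; echo "$#"')

# With nullglob set, an unmatched glob expands to nothing, so the search
# term silently disappears from the command line.
nullglob_count=$(bash -c 'shopt -s nullglob; set -- path:foo/**; echo "$#"')

# Quoting disables pathname expansion regardless of shell options.
quoted_count=$(bash -c "shopt -s nullglob; set -- 'path:foo/**'; echo \"\$#\"")

echo "default: $default_count, nullglob: $nullglob_count, quoted: $quoted_count"
```

On a typical bash this prints "default: 1, nullglob: 0, quoted: 1". Quoting the term behaves the same in every shell and with every option, so it is probably the simplest thing to recommend.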

Tomi


 Quick check with:
 ksh-20100621-12.el6.x86_64,
 dash-0.5.5.1-3.1.el6.x86_64,
 busybox-1.15.1-20.el6.x86_64 (busybox sh and busybox ash)
 and
 http://sourceforge.net/projects/heirloom/files/heirloom-sh/050706/heirloom-sh-050706.tar.bz2/download

 all do the same (non-)expansion.


 I vaguely remember some shells puking an error when expansion yielded
 no results... maybe some shell option does it. Definitely not a mainstream
 feature.

... or maybe it is after all ;/


 Tomi



Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-04 Thread Austin Clements
Quoth Jani Nikula on Feb 01 at  4:54 pm:
 On Fri, 31 Jan 2014, Austin Clements amdra...@mit.edu wrote:
  What if we introduce two prefixes, say folder: and path: (maybe dir:?)
  to address both use cases, each as naturally as possible?  Both would
  be boolean prefixes because of the limitations of probabilistic
  prefixes, but we could take advantage of Jani's idea of generating
  several boolean terms.
 
 Agreed. On to details:
 
  folder: could work the way I suggested (simply the path to the file,
  with {cur,new} stripped off).
 
 What if the file is not in a folder named cur/new? I suggest indexing
 the folder as-is, if only for some backwards compatibility.

Agreed.  I believe this will also support MH, if I understand MH
correctly (does anyone actually use MH?)

 What if not all of the cur/new/tmp folders are present? I suggest
 ignoring that and looking only at the path to the file being indexed.
 This is simplest to implement, it does not matter if the sibling
 directories come and go, and for this reason it is also unsurprising.

That sounds good to me.

 For top-level cur/new, index the empty string "".

Yes.

  path: would support file system search
  uses.  These seem more varied, but I think fall into exact match and
  recursive match.  Since I don't have this use case, I can't have any
  strong opinions about syntax, but I'll throw out an idea: many shells
  support ** for recursive path matching and people are already quite
  familiar with glob patterns for paths, so why not simply adopt this?
  In other words, when adding the path a/b/cur/x:2, add path: terms
  a/b/cur and a/b/** and a/** and **.
 
 Since folder: would cover the cur/new cases, I suggest the non-recursive
 variant of path: prefix is the exact filesystem folder name as-is (with
 the top level being the empty string ""). I presume this is what you
 meant too.

Yes.  I suppose I didn't actually say it, but that's what I was
thinking.

 I kind of like the /** suffix for recursive, but there's two small
 wrinkles: 1) it needs quoting on the command line (unlike my original
 suggestion of just / suffix), and 2) what should the top level
 recursive search be? path:** or path:/** or path:./**? I guess the
 first one is most obvious?

The shell quoting is annoying, but depending on the shell, it should
at least give an error (zsh) or Just Work (apparently bash and sh pass
the unexpanded glob through if it doesn't match anything?).

 So here's what my original suggestions would become:
 
  Here's a thought. With boolean prefix folder:, we can devise a scheme
  where the folder: query defines what is to be matched.
  
  For example:
  
  folder:foo match files in foo, foo/new, and foo/cur.
 
 - folder:foo
 
  folder:foo/   match all files in all subdirectories under foo (this
 would handle Tomi's use case), including foo/new and foo/cur.
 
 - path:foo/**
 
  folder:foo/.   match in foo only, and specifically not in foo/cur or 
  foo/new.
 
 - path:foo
 
  folder:foo/new  match in foo/new, and specifically not in foo/cur (this
 allows distinguishing between messages in cur and new).
 
 - path:foo/new
 
  folder:/   match everything.
 
 - path:**
 
  folder:/.  match in top level maildir only.
 
 - path:
 
  folder:  match in top level maildir, including cur/new.
 
 - folder:
 
 
 I'd like these details to be ironed out and agreed on before I send the
 next version.

This all looks good to me.

 BR,
 Jani.


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-04 Thread Austin Clements
Quoth Rob Browning on Jan 31 at  1:19 pm:
 Austin Clements amdra...@mit.edu writes:
 
  folder: could work the way I suggested (simply the path to the file,
  with {cur,new} stripped off).
 
 Hmm, so would notmuch try to guess whether or not it's dealing with a
 maildir++ tree, and if so convert folder:foo to a search of .foo, and/or
 folder:foo/bar to .foo.bar?  Or would the user just need to know to say
 folder:.foo and folder:.foo.bar?

My opinion on this has changed over time, but I don't think we should
try to interpret Maildir++ trees specially.  That is, the user would
have to say folder:.foo.bar if they're using Maildir++.  The "." seems
as good as "/" for a separator, so we might as well not translate
it.  The leading "." is annoying, but *shrug* so is Maildir++.

 And if we're only planning special treatment for maildir-like
 stores, then I wonder if the term should just be maildir:?

The simple algorithm of taking the relative path and stripping
{new,cur} (if present) does a good job of supporting both Maildir and
non-Maildir stores (while balancing this support with simplicity,
predictability, and usability).

 Though folder: would make more sense if the long-term goal was to have a
 DTRT term.  But in that case, I wonder if it might eventually be
 expected to support mixed trees, i.e. say a tree containing maildir++
 and mh subdirs, and if so, how that should be handled.

The simple {new,cur}-stripping algorithm already does fairly well at
this.  Worrying more about mixed Maildir++ and MH stores seems
unnecessary to me unless someone demonstrates an actual need.

  many shells support ** for recursive path matching and people are
  already quite familiar with glob patterns for paths, so why not simply
  adopt this?
 
 rsync too.

Ah, sure enough.  Even better!


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-04 Thread Rob Browning
Austin Clements amdra...@mit.edu writes:

 The simple algorithm of taking the relative path and stripping
 {new,cur} (if present) does a good job of supporting both Maildir and
 non-Maildir stores (while balancing this support with simplicity,
 predictability, and usability).

Unless, of course, the user has legitimate folders named cur and new,
but perhaps that'll just end up a "don't do that then" FAQ...

-- 
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-04 Thread Carl Worth
Austin Clements amdra...@mit.edu writes:
 Agreed.  I believe this will also support MH, if I understand MH
 correctly (does anyone actually use MH?)

When I started notmuch, I had all of my mail in one-message-per-file in
various directories, (without these silly cur and new directories
that maildir uses).

At some point, I did a mass conversion of all of my directories to be
roughly:

~/mail//MM

That way I keep directories small by delivering just a month's worth of
mail to each directory. This conversion (and the delivery agent I am
currently using, maildrop) happen to create the silly cur and new
directories.

So most of my mail still is in maildir format now.

But I do have a few messages in non-maildir directories. These have
generally come into being in cases such as someone providing me a
message to demonstrate a notmuch bug or use case. So in cases like this
I did things like:

mkdir ~/mail/bug-description
cp example-file ~/mail/bug-description

I also have a few directories created similarly when I've copied some
downloaded archives from a mailing list into my mail storage. (But often
I've used mb2md for those so the conversion has accidentally created
maildir directories).

I don't know if the non-maildir directories I have are strict mh
format, (did it have filenames with sequential numbers? I don't
recall).

But my intention with notmuch from the beginning was to support any
one-message-per-file layout without enforcing any particular naming of
directories or files. And I would like to see that preserved.

Since then, we have also supported various semantics when people do
encode information in directories and filenames, (such as ignoring
cur/new and interpreting maildir flags). This kind of thing does
seem good.

-Carl




Re: [PATCH 0/5] lib: make folder: prefix literal

2014-02-01 Thread Jani Nikula
On Fri, 31 Jan 2014, Austin Clements amdra...@mit.edu wrote:
 What if we introduce two prefixes, say folder: and path: (maybe dir:?)
 to address both use cases, each as naturally as possible?  Both would
 be boolean prefixes because of the limitations of probabilistic
 prefixes, but we could take advantage of Jani's idea of generating
 several boolean terms.

Agreed. On to details:

 folder: could work the way I suggested (simply the path to the file,
 with {cur,new} stripped off).

What if the file is not in a folder named cur/new? I suggest indexing
the folder as-is, if only for some backwards compatibility.

What if not all of the cur/new/tmp folders are present? I suggest
ignoring that and looking only at the path to the file being indexed.
This is simplest to implement, it does not matter if the sibling
directories come and go, and for this reason it is also unsurprising.

For top-level cur/new, index the empty string "".
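A minimal sketch of these folder: rules as a POSIX shell function. The name folder_term is hypothetical, and the argument is assumed to be the file's path relative to the mail root.

```shell
#!/bin/sh
# Hypothetical helper: compute the folder: term for one mail file.
# $1 is the file's path relative to the mail root, e.g. "a/b/cur/x:2,S".
folder_term() {
    case "$1" in
        */*) dir=${1%/*} ;;   # drop the file name
        *)   dir= ;;          # file sits directly in the mail root
    esac
    case "$dir" in
        cur|new)     dir= ;;             # top-level cur/new -> ""
        */cur|*/new) dir=${dir%/*} ;;    # strip a trailing cur/new
    esac                                 # anything else is indexed as-is
    printf '%s\n' "$dir"
}

folder_term 'a/b/cur/x:2,S'   # prints: a/b
folder_term 'a/b/x'           # prints: a/b  (not in cur/new, kept as-is)
folder_term 'new/x'           # prints an empty line (top-level new)
```

Only the file's own path is inspected; whether tmp (or any other sibling) exists is never consulted, so the term cannot change when sibling directories come and go.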

 path: would support file system search
 uses.  These seem more varied, but I think fall into exact match and
 recursive match.  Since I don't have this use case, I can't have any
 strong opinions about syntax, but I'll throw out an idea: many shells
 support ** for recursive path matching and people are already quite
 familiar with glob patterns for paths, so why not simply adopt this?
 In other words, when adding the path a/b/cur/x:2, add path: terms
 a/b/cur and a/b/** and a/** and **.

Since folder: would cover the cur/new cases, I suggest the non-recursive
variant of path: prefix is the exact filesystem folder name as-is (with
the top level being the empty string ""). I presume this is what you
meant too.

I kind of like the /** suffix for recursive, but there's two small
wrinkles: 1) it needs quoting on the command line (unlike my original
suggestion of just / suffix), and 2) what should the top level
recursive search be? path:** or path:/** or path:./**? I guess the
first one is most obvious?

So here's what my original suggestions would become:

 Here's a thought. With boolean prefix folder:, we can devise a scheme
 where the folder: query defines what is to be matched.
 
 For example:
 
 folder:foo   match files in foo, foo/new, and foo/cur.

- folder:foo

 folder:foo/  match all files in all subdirectories under foo (this
  would handle Tomi's use case), including foo/new and foo/cur.

- path:foo/**

 folder:foo/. match in foo only, and specifically not in foo/cur or foo/new.

- path:foo

 folder:foo/new  match in foo/new, and specifically not in foo/cur (this
  allows distinguishing between messages in cur and new).

- path:foo/new

 folder:/ match everything.

- path:**

 folder:/.    match in top level maildir only.

- path:

 folder:      match in top level maildir, including cur/new.

- folder:


I'd like these details to be ironed out and agreed on before I send the
next version.

BR,
Jani.



Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-31 Thread Rob Browning
Austin Clements amdra...@mit.edu writes:

 folder: could work the way I suggested (simply the path to the file,
 with {cur,new} stripped off).

Hmm, so would notmuch try to guess whether or not it's dealing with a
maildir++ tree, and if so convert folder:foo to a search of .foo, and/or
folder:foo/bar to .foo.bar?  Or would the user just need to know to say
folder:.foo and folder:.foo.bar?

And if we're only planning special treatment for maildir-like
stores, then I wonder if the term should just be maildir:?

Though folder: would make more sense if the long-term goal was to have a
DTRT term.  But in that case, I wonder if it might eventually be
expected to support mixed trees, i.e. say a tree containing maildir++
and mh subdirs, and if so, how that should be handled.

 many shells support ** for recursive path matching and people are
 already quite familiar with glob patterns for paths, so why not simply
 adopt this?

rsync too.



Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-30 Thread Austin Clements
Quoth Jani Nikula on Jan 25 at  5:38 pm:
 On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
  Perhaps we need to have two prefixes, one of which is the literal
  filesystem folder and another which hides the implementation details,
  like I mentioned in my mail to Peter [1]. But consider this: my proposed
  implementation does cover *all* use cases.
 
 Here's a thought. With boolean prefix folder:, we can devise a scheme
 where the folder: query defines what is to be matched.
 
 For example:
 
 folder:foo      match files in foo, foo/new, and foo/cur.
 folder:foo/   match all files in all subdirectories under foo (this
   would handle Tomi's use case), including foo/new and foo/cur.
 folder:foo/.  match in foo only, and specifically not in foo/cur or foo/new.
 folder:foo/new  match in foo/new, and specifically not in foo/cur (this
   allows distinguishing between messages in cur and new).
 folder:/  match everything.
 folder:/. match in top level maildir only.
 folder: match in top level maildir, including cur/new.
 
 This requires indexing all the path components with suitable
 suffixes. For example, a file foo/new/baz would get terms "/", "foo",
 "foo/", "foo/new", and "foo/new/.". A file foo/bar would get terms "/",
 "foo", "foo/", and "foo/.".
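The message does not spell out the general rule, so the sketch below is only one reading that reproduces both examples: always "/", then the Maildir folder (the directory with a trailing cur/new stripped), then "P/" for every prefix P of that folder, then the exact directory, and finally the directory followed by "/.". The function name folder_terms is hypothetical.

```shell
#!/bin/sh
# Hypothetical: one rule consistent with the two examples above.
# $1 is the file's path relative to the mail root.
folder_terms() {
    case "$1" in
        */*) dir=${1%/*} ;;
        *)   dir= ;;
    esac
    case "$dir" in
        cur|new)     maildir= ;;
        */cur|*/new) maildir=${dir%/*} ;;
        *)           maildir=$dir ;;
    esac
    {
        printf '%s\n' '/'              # folder:/    everything
        printf '%s\n' "$maildir"       # folder:foo  foo, foo/new, foo/cur
        prefix= rest=$maildir
        while [ -n "$rest" ]; do       # folder:P/   everything under P
            comp=${rest%%/*}
            case "$rest" in */*) rest=${rest#*/} ;; *) rest= ;; esac
            prefix=${prefix:+$prefix/}$comp
            printf '%s/\n' "$prefix"
        done
        printf '%s\n' "$dir"           # folder:foo/new  exact directory
        printf '%s/.\n' "$dir"         # folder:foo/.    this directory only
    } | awk '!seen[$0]++'              # drop duplicates, keep first order
}

folder_terms foo/new/baz   # /, foo, foo/, foo/new, foo/new/.  (one per line)
folder_terms foo/bar       # /, foo, foo/, foo/.
```

For a file directly in the mail root the same rule yields "/", "" and "/.", which lines up with the folder:/, folder: and folder:/. rows of the table.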
 
 It's obviously a concern this increases the database size; not sure how
 it would compare with the current stemmed probabilistic prefix.
 
 Opinions on this? This would really cover all use cases, and address
 Austin's interface and backward compatibility concerns.

I like this idea in general, though I agree with others that the
specific syntax seems a little wanting.  The concept of adding several
boolean terms seems powerful, and I would be surprised if the extra
terms had any substantive effect on database size.

However, it seems like this is overloading one prefix for two
meanings.  And I think that's because people want two similar but
distinct things.  Several of us want a simple, natural Maildir-aware
folder search (the Maildir folder of a/b/cur/x:2 is a/b).  Others
want file system search.  It's easy to conflate these because Maildir
represents folders as directory paths, but maybe they need to be
treated as distinct things.

What if we introduce two prefixes, say folder: and path: (maybe dir:?)
to address both use cases, each as naturally as possible?  Both would
be boolean prefixes because of the limitations of probabilistic
prefixes, but we could take advantage of Jani's idea of generating
several boolean terms.

folder: could work the way I suggested (simply the path to the file,
with {cur,new} stripped off).  path: would support file system search
uses.  These seem more varied, but I think fall into exact match and
recursive match.  Since I don't have this use case, I can't have any
strong opinions about syntax, but I'll throw out an idea: many shells
support ** for recursive path matching and people are already quite
familiar with glob patterns for paths, so why not simply adopt this?
In other words, when adding the path a/b/cur/x:2, add path: terms
a/b/cur and a/b/** and a/** and **.
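A sketch of that term generation, assuming the argument is the file's path relative to the mail root; path_terms is a hypothetical name. It follows the list literally: the exact directory, "P/**" for every proper ancestor P of that directory, and "**".

```shell
#!/bin/sh
# Hypothetical: path: terms for one mail file, following the list above.
# $1 is the file's path relative to the mail root, e.g. "a/b/cur/x:2".
path_terms() {
    case "$1" in
        */*) dir=${1%/*} ;;
        *)   dir= ;;                 # top level is the empty string
    esac
    printf '%s\n' "$dir"             # exact match: path:a/b/cur
    parent=$dir
    while :; do
        case "$parent" in
            */*) parent=${parent%/*}
                 printf '%s/**\n' "$parent" ;;   # path:a/b/**, path:a/**
            *)   break ;;
        esac
    done
    printf '%s\n' '**'               # path:** matches everything
}

path_terms 'a/b/cur/x:2'   # a/b/cur, a/b/**, a/**, **  (one per line)
```

Taken literally, the list gives the file's own directory no "/**" term, so path:a/b/cur/** would not match a/b/cur/x:2; whether to add that term as well is a detail the proposal leaves open.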

 BR,
 Jani.


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-29 Thread Jani Nikula
On Sun, 26 Jan 2014, Carl Worth cwo...@cworth.org wrote:
 Jani Nikula j...@nikula.org writes:
 Here's a thought. With boolean prefix folder:, we can devise a scheme
 where the folder: query defines what is to be matched.

 I like the idea, but I tried to infer the rules from the examples, and I
 failed. It looks like there are two new symbols, "/" and "/.", but I
 couldn't decipher the exact semantics of each.

 I think a proposal like this should not re-use the '/' symbol as we
 already have that as a path divider. (See rsync for lots of user
 confusion with a significant trailing '/').

 I propose a similar, but slightly different approach, where we add two
 additional symbols:

   '^' Matches the beginning of a path

   '$' Matches the end of a path

 [Obviously, I chose these symbols from regular expressions. I would be
 OK with alternate symbols, ('$' seems like it might be problematic in
 the shell, but perhaps not too much if it's always at the end of a
 phrase.)]

 This way, one could search for:

   folder:foo  Works like folder: historically

   folder:^full/path$  Works like Jani's proposal

   folder:^path/prefix Satisfies Tomi's use case, (as well as anyone
   who doesn't want to have to specify or
   distinguish between /cur or /new).

 Any extra '/' at the beginning or end of a search string, (such as
 folder:^/full/path/$) would not change the semantics.

 Further, I think we can implement this with less database bloat by
 leaving folder as probabilistic and simply indexing two new terms to
 indicate the beginning of the path and the end of the path.

 Finally, we could also extend the scheme to other things like subject:
 to allow for an exact subject search like:

   subject:^lib: make folder: prefix literal$

 It was with an eye toward something like this that I chose to make
 folder: probabilistic in the first place. (I probably would have indexed
 things appropriately in the first place as well, but at the time doing
 the necessary query parsing for '^' and '$' seemed daunting).

Unfortunately, I haven't had the time to experiment with this. But it
bugs me that the probabilistic folder: prefix has stemming and it's case
insensitive. It's possible to work around the stemming with the anchors
you suggest or by quoting, but is there a way to have case sensitive
probabilistic prefixes?

BR,
Jani.


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-29 Thread Carl Worth
Jani Nikula j...@nikula.org writes:
 Unfortunately, I haven't had the time to experiment with this. But it
 bugs me that the probabilistic folder: prefix has stemming and it's case
 insensitive. It's possible to work around the stemming with the anchors
 you suggest or by quoting, but is there a way to have case sensitive
 probabilistic prefixes?

The stemming and case insensitivity just has to do with which terms are
shoved into the database, (you have to add extra terms to get these
features). If we're getting those features for folder now, (and I agree
that we don't want them), it's because we're calling some Xapian
convenience function along the lines of "create a bunch of terms for
this chunk of text".

The fix for that is to do the simple thing and simply break the path at
each '/' and add a term for each component. Then these problems all go
away.
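A minimal sketch of that splitting, assuming the term text is taken verbatim from the directory path: each component becomes one term, with no lowercasing or stemming. The function name and the example path are hypothetical, and how the terms would then be handed to Xapian is not shown.

```shell
#!/bin/sh
# Hypothetical: split a mail file's directory into one term per component.
# $1 is the file's path relative to the mail root.
component_terms() {
    case "$1" in
        */*) printf '%s\n' "${1%/*}" | tr '/' '\n' ;;
        *)   ;;   # file in the mail root: no directory components
    esac
}

component_terms 'Lists/Notmuch/cur/1234:2,S'   # Lists, Notmuch, cur (case kept)
```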

So fixes for this should not require switching from a probabilistic to a
Boolean prefix.

-Carl

-- 
carl.d.wo...@intel.com




Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-29 Thread Jani Nikula
On Wed, 29 Jan 2014, Carl Worth cwo...@cworth.org wrote:
 Austin Clements acleme...@csail.mit.edu writes:
 I think you're assuming we have much more control over this than we
 do.

 To be fair, I only started discussing my proposal for '^' and '$' in
 response to Jani's proposal with special semantics for trailing '/' and
 '/.'.

I only chose those to avoid any collisions with actual file names,
without much further thought. I'm not a fan of rsync's trailing '/'
semantics either. The main point was to demonstrate that if folder: were
a boolean prefix, it would be possible to index folder terms in a way
that would address the issues with the current folder: prefix.

I don't have good counter-proposals now, but an *example* is having
sub-prefixes like folder:recursive::foo or folder:maildir::bar,
where the former would match anything under foo and the latter would
match anything in bar/new and bar/cur. These recursive:: and
maildir:: prefixes would be just part of the indexed boolean terms.

 Support for any of this magic syntax would require a custom query
 parser, yes.

 Austin, haven't you been proposing a custom query parser for ages? Where
 does that work stand now?

That is the unicorn... many of the query improvements I have in mind
depend on a custom query parser. So I'd like to have that. And a
pony. But in the meantime, I'm left wondering whether I should pursue
folder: as a boolean prefix, or try to figure out if there are
improvements to be made as a probabilistic prefix, or just put this work
on hold. With the db upgrade and upgrade tests, it's not exactly a
trivial amount of work.

BR,
Jani.


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread Jani Nikula
On Fri, 24 Jan 2014, Austin Clements acleme...@csail.mit.edu wrote:
 On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote:
 Hi all, this series makes the folder: search prefix literal, or switches
 it from a probabilistic prefix to a boolean prefix. With this, you have
 to give the path from the maildir root to the folder you want in full,
 including the maildir cur/new component, if any. Examples:

 I strongly disagree with requiring the cur/new component.  The cur/new
 directory is an internal implementation detail of Maildir (and a rather
 broken one at that) and no more a part of the folder of a piece of
 mail than its final file name component.  It's also the less obvious
 user interface; if we require the cur/new component, we *will* get
 people asking why their folder searches aren't working, but if we strip
 the cur/new component, nobody will be surprised.

 I think the question is not whether we should strip cur/new, but when.
 We've already defined a _filename_is_in_maildir test in
 lib/message.cc, which we depend on for flag sync.  It's simple, but I
 think this would be the right thing to use for consistency.

I'd like to discuss some of the reasons I chose to include the cur/new
components. Admittedly, none of them are very strong on their own, but
all of them together tilted my opinion towards requiring them.

The way I see it, notmuch supports maildir, but does not require it. In
many ways the messages are just files somewhere in the directory
hierarchy. There are only a few cases where it matters that there are
cur/new/tmp directories within a directory.

If you strip cur and new, it becomes impossible to distinguish between
files in foo, foo/cur, and foo/new - and one of the reasons for changing
folder: in the first place is to be able to better distinguish between
folders.

Apparently mutt presents the difference between messages in new and cur
to its users (so I've been told; I've never really used mutt), and our
integration with mutt lacks that distinction. We could fix that by
requiring the cur/new components in folder: searches.

Speaking of consistency, compare _filename_is_in_maildir() with
_entries_resemble_maildir() in notmuch-new.c. What should the indexed
folder: prefix be if not all of cur, new, and tmp are present? We will
actually index files in tmp if cur or new is not present! What if the
missing sibling directories are added (or existing ones removed) later?
Where's the consistency compared to new.ignore config, which also
matches the cur/new components if so desired? Or consistency with
notmuch search --output=files?
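The inconsistency can be illustrated with two hypothetical shell checks, modelled loosely on the two C functions named above. They are not ports of notmuch's code, and the exact conditions notmuch uses are an assumption here: one check looks only at the file's path, the other at the sibling directories on disk.

```shell
#!/bin/sh
# Path-based check: is the file's parent directory named cur or new?
in_maildir_by_path() {
    case "$1" in
        */*) dir=${1%/*} ;;
        *)   return 1 ;;
    esac
    case "${dir##*/}" in
        cur|new) return 0 ;;
        *)       return 1 ;;
    esac
}

# Directory-based check: does the directory have cur, new and tmp subdirs?
resembles_maildir() {
    [ -d "$1/cur" ] && [ -d "$1/new" ] && [ -d "$1/tmp" ]
}

root=$(mktemp -d)
mkdir -p "$root/foo/cur" "$root/foo/new"     # note: no foo/tmp yet

in_maildir_by_path foo/cur/x && echo "by path: foo/cur/x is in a maildir"
resembles_maildir "$root/foo" || echo "on disk: foo is not a maildir"

mkdir "$root/foo/tmp"                        # a sibling appears later
resembles_maildir "$root/foo" && echo "on disk: now foo is a maildir"
rm -rf "$root"
```

The path-based answer never changes, while the directory-based one depends on whether tmp exists at scan time, which is the consistency question raised above.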

My conclusion was that requiring *all* filesystem folder components
as-is is consistent, most versatile, agnostic to Maildir or Maildir++
implementation details wrt directory naming or hierarchy, without
difficult corner cases, simplest to implement, and unsurprising (once
you understand the cur/new distinction).

For *me* this is the more obvious user interface. And hey, I'm a user
too.

Perhaps we need to have two prefixes, one of which is the literal
filesystem folder and another which hides the implementation details,
like I mentioned in my mail to Peter [1]. But consider this: my proposed
implementation does cover *all* use cases.


BR,
Jani.


[1] id:8761ppurfz@nikula.org


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread Tomi Ollila
On Sat, Jan 25 2014, Jani Nikula j...@nikula.org wrote:

 On Fri, 24 Jan 2014, Austin Clements acleme...@csail.mit.edu wrote:
 On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote:
 Hi all, this series makes the folder: search prefix literal, or switches
 it from a probabilistic prefix to a boolean prefix. With this, you have
 to give the path from the maildir root to the folder you want in full,
 including the maildir cur/new component, if any. Examples:

 I strongly disagree with requiring the cur/new component.  The cur/new
 directory is an internal implementation detail of Maildir (and a rather
 broken one at that) and no more a part of the folder of a piece of
 mail than its final file name component.  It's also the less obvious
 user interface; if we require the cur/new component, we *will* get
 people asking why their folder searches aren't working, but if we strip
 the cur/new component, nobody will be surprised.

 I think the question is not whether we should strip cur/new, but when.
 We've already defined a _filename_is_in_maildir test in
 lib/message.cc, which we depend on for flag sync.  It's simple, but I
 think this would be the right thing to use for consistency.

 I'd like to discuss some of the reasons I chose to include the cur/new
 components. Admittedly, none of them are very strong on their own, but
 all of them together tilted my opinion towards requiring them.

 The way I see it, notmuch supports maildir, but does not require it. In
 many ways the messages are just files somewhere in the directory
 hierarchy. There are only a few cases where it matters that there are
 cur/new/tmp directories within a directory.

 If you strip cur and new, it becomes impossible to distinguish between
 files in foo, foo/cur, and foo/new - and one of the reasons for changing
 folder: in the first place is to be able to better distinguish between
 folders.

 Apparently mutt presents the difference between messages in new and cur
 to its users (so I've been told; I've never really used mutt), and our
 integration with mutt lacks that distinction. We could fix that by
 requiring the cur/new components in folder: searches.

 Speaking of consistency, compare _filename_is_in_maildir() with
 _entries_resemble_maildir() in notmuch-new.c. What should the indexed
 folder: prefix be if not all of cur, new, and tmp are present? We will
 actually index files in tmp if cur or new is not present! What if the
 missing sibling directories are added (or existing ones removed) later?
 Where's the consistency compared to new.ignore config, which also
 matches the cur/new components if so desired? Or consistency with
 notmuch search --output=files?

 My conclusion was that requiring *all* filesystem folder components
 as-is is consistent, most versatile, agnostic to Maildir or Maildir++
 implementation details wrt directory naming or hierarchy, without
 difficult corner cases, simplest to implement, and unsurprising (once
 you understand the cur/new distinction).

 For *me* this is the more obvious user interface. And hey, I'm a user
 too.

 Perhaps we need to have two prefixes, one of which is the literal
 filesystem folder and another which hides the implementation details,
 like I mentioned in my mail to Peter [1]. But consider this: my proposed
 implementation does cover *all* use cases.

I challenge that with my use case; my mail is arranged as follows:

Head of the contents of the notmuch archive prior to my involvement:

$ find notmuch | head -5
notmuch
notmuch/6b
notmuch/6b/de820df0697ab2d235fbc8e32510d7
notmuch/6b/917afddb116a03c45371282be58388
notmuch/6b/10eb0bc1406f6767161f5883f328f7

Head of the contents of received mail:

$ find received | head -5
received
received/rawmail2
received/6b
received/6b/86a8937aac57721ad87f0e0e5fe6c3
received/6b/3278d6c4c1fe7604f1404bc09acff7

Interestingly, find started with subdirectory '6b' in both cases...
Anyway, I have 0xff + 1 subdirectories in each mail directory, and each
file is stored under the md5sum of its contents:

$ md5sum received/6b/86a8937aac57721ad87f0e0e5fe6c3
6b86a8937aac57721ad87f0e0e5fe6c3  received/6b/86a8937aac57721ad87f0e0e5fe6c3
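The layout shown by that md5sum output can be sketched as follows: the first two hex digits of the file's MD5 sum name the subdirectory and the remaining 30 name the file, giving 256 subdirectories (00..ff). This is a reconstruction from the output above, not Tomi's actual tooling; hashed_path is a hypothetical name, and md5sum from GNU coreutils is assumed.

```shell
#!/bin/sh
# Hypothetical: content-addressed location of a mail file, e.g.
# 6b/86a8937aac57721ad87f0e0e5fe6c3 for MD5 6b86a8937aac57721ad87f0e0e5fe6c3.
hashed_path() {
    sum=$(md5sum < "$1" | cut -d ' ' -f 1)   # 32 hex digits
    rest=${sum#??}                            # drop the first two digits
    first=${sum%"$rest"}                      # ...which are what remains here
    printf '%s/%s\n' "$first" "$rest"
}

demo=$(mktemp)                 # an empty file
hashed_path "$demo"            # d4/1d8cd98f00b204e9800998ecf8427e (MD5 of "")
rm -f "$demo"
```

Because the directory names are effectively random, a literal folder: search would have to list all 256 of them, which is why a recursive, prefix-style match suits this layout.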

For me the current folder: works as I don't have collisions.

For me, a folder: search that just works as a prefix, i.e. matches
anything under a given directory hierarchy, would work best.

In the end I might have to hack the search for my purposes; the more
important/interesting question is whether I would need to use an
incompatible database format :O


Tomi



 BR,
 Jani.


 [1] id:8761ppurfz@nikula.org


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread Jani Nikula
On Sat, 25 Jan 2014, Tomi Ollila tomi.oll...@iki.fi wrote:
 On Sat, Jan 25 2014, Jani Nikula j...@nikula.org wrote:
 Perhaps we need to have two prefixes, one of which is the literal
 filesystem folder and another which hides the implementation details,
 like I mentioned in my mail to Peter [1]. But consider this: my proposed
 implementation does cover *all* use cases.

 I challenge that with my use case: my mails are arranged as follows: 

[snip]

 For me the current folder: works as I don't have collisions.

Fair enough, your use case would be *very inconvenient* with the
proposed changes to the folder: prefix, *regardless* of whether the leaf
cur/new is indexed and required or not.

(Very inconvenient, or practically impossible, as you'd have to include
all those 00..ff directories in your searches.)

 For me a folder: search which would just work as a prefix i.e. match
 anything under given directory hierarchy would work best.

Indeed. Your use case is not an argument either way on whether cur/new
should be included or not.

That recursive folder prefix suggestion is, I think, incompatible with
the requirements for the literal folder: prefix we've been considering.


BR,
Jani.



Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread Jani Nikula
On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
 Perhaps we need to have two prefixes, one of which is the literal
 filesystem folder and another which hides the implementation details,
 like I mentioned in my mail to Peter [1]. But consider this: my proposed
 implementation does cover *all* use cases.

Here's a thought. With boolean prefix folder:, we can devise a scheme
where the folder: query defines what is to be matched.

For example:

folder:foo      match files in foo, foo/new, and foo/cur.
folder:foo/     match all files in all subdirectories under foo (this
                would handle Tomi's use case), including foo/new and foo/cur.
folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
folder:foo/new  match in foo/new, and specifically not in foo/cur (this
                allows distinguishing between messages in cur and new).
folder:/        match everything.
folder:/.       match in top level maildir only.
folder:""       match in top level maildir, including cur/new.

This requires indexing all the path components with suitable
suffixes. For example, a file foo/new/baz would get the terms "/",
"foo", "foo/", "foo/new", and "foo/new/.". A file foo/bar would get the
terms "/", "foo", "foo/", and "foo/.".
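One rule that reproduces both example term lists, sketched in Python: every
ancestor of the file's folder (with a trailing cur/new stripped), root
included, gets a term with a trailing "/"; that stripped folder gets a bare
term; the full folder path gets a bare term and a "/." term. This is a
hypothetical reconstruction, not code from the series: the function names
and the exact cur/new handling are assumptions. A boolean folder: query then
matches a file exactly when the query string is one of its terms.

```python
import posixpath

MAILDIR_LEAVES = ("cur", "new")

def folder_terms(relpath: str) -> set:
    """Hypothetical boolean folder: terms for a file at relpath (relative to the maildir root)."""
    comps = [c for c in posixpath.dirname(relpath).split("/") if c]
    # The "maildir folder" is the directory path with a trailing cur/new removed.
    base = comps[:-1] if comps and comps[-1] in MAILDIR_LEAVES else comps
    terms = set()
    # Recursive terms: each ancestor of the maildir folder, root included, plus "/".
    # For the root this yields "/" itself, which therefore matches everything.
    for i in range(len(base) + 1):
        terms.add("/".join(base[:i]) + "/")
    terms.add("/".join(base))          # folder:foo covers foo, foo/new and foo/cur
    terms.add("/".join(comps))         # folder:foo/new (same as above without cur/new)
    terms.add("/".join(comps) + "/.")  # folder:foo/. is this directory only; root gives "/."
    return terms

def matches(query: str, relpath: str) -> bool:
    """A boolean prefix matches only on an exact term."""
    return query in folder_terms(relpath)

if __name__ == "__main__":
    print(sorted(folder_terms("foo/bar")))  # ['/', 'foo', 'foo/', 'foo/.']
```

Each row of the table above then reduces to a single set lookup; the cost is
roughly one recursive term per directory level plus up to three others per
file, which is where the database-size question comes from.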

It's obviously a concern that this increases the database size; I'm not
sure how it would compare with the current stemmed probabilistic prefix.

Opinions on this? This would really cover all use cases, and address
Austin's interface and backward compatibility concerns.

BR,
Jani.


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread David Bremner
Jani Nikula j...@nikula.org writes:

 On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
 Perhaps we need to have two prefixes, one of which is the literal
 filesystem folder and another which hides the implementation details,
 like I mentioned in my mail to Peter [1]. But consider this: my proposed
 implementation does cover *all* use cases.

 Here's a thought. With boolean prefix folder:, we can devise a scheme
 where the folder: query defines what is to be matched.

 For example:

 folder:foo      match files in foo, foo/new, and foo/cur.
 folder:foo/     match all files in all subdirectories under foo (this
                 would handle Tomi's use case), including foo/new and
                 foo/cur.

handling hierarchies sounds useful and natural

 folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
 folder:foo/new  match in foo/new, and specifically not in foo/cur (this
                 allows distinguishing between messages in cur and new).

Is new special-cased here, or do you rely on it being a leaf
directory?

 folder:/        match everything.
 folder:/.       match in top level maildir only.
 folder:""       match in top level maildir, including cur/new.

I could certainly support this UI, assuming the database bloat is not
too bad.

I started to wonder about using 3 prefixes instead, but then I read your
message again and a light went on. ;).

d



Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread Jani Nikula
On Sat, 25 Jan 2014, David Bremner da...@tethera.net wrote:
 Jani Nikula j...@nikula.org writes:

 On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
 Perhaps we need to have two prefixes, one of which is the literal
 filesystem folder and another which hides the implementation details,
 like I mentioned in my mail to Peter [1]. But consider this: my proposed
 implementation does cover *all* use cases.

 Here's a thought. With boolean prefix folder:, we can devise a scheme
 where the folder: query defines what is to be matched.

 For example:

 folder:foo      match files in foo, foo/new, and foo/cur.
 folder:foo/     match all files in all subdirectories under foo (this
                 would handle Tomi's use case), including foo/new and
                 foo/cur.

 handling hierarchies sounds useful and natural

 folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
 folder:foo/new  match in foo/new, and specifically not in foo/cur (this
                 allows distinguishing between messages in cur and new).

 is new special cased here? or do you rely on it being a leaf
 directory?

A little bit of both I guess; not too bad.

An alternative might be to make the variant without the trailing /
recursive, so folder:foo would match all files in all subdirectories
under foo, including foo/new and foo/cur. This would be more compatible
with the current folder: prefix too.

Then, if you wanted to match without recursion, you'd have to write
folder:foo/. OR folder:foo/new OR folder:foo/cur, assuming new and cur
are leaf nodes (and, if they are not, with /. appended to each).

But you'd have to decide what to do with folder:foo/, which would then
match nothing.
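A hypothetical Python sketch of this alternative, assuming each file is
indexed with a bare term for every ancestor of its folder plus one "/." term
for the folder itself (function and constant names are illustrative only).
Under that assumption folder:foo is recursive, the non-recursive search is an
OR of three terms, and no term ever ends in a bare "/", so folder:foo/
matches nothing.

```python
import posixpath

def alt_folder_terms(relpath: str) -> set:
    """Hypothetical terms for the variant where folder:foo is recursive."""
    comps = [c for c in posixpath.dirname(relpath).split("/") if c]
    # Bare term for every ancestor: folder:foo matches anything under foo.
    terms = {"/".join(comps[:i]) for i in range(1, len(comps) + 1)}
    # "/." term for the file's own directory: folder:foo/. is non-recursive.
    terms.add("/".join(comps) + "/.")
    return terms

def matches_any(queries, relpath: str) -> bool:
    """An OR of boolean folder: terms."""
    terms = alt_folder_terms(relpath)
    return any(q in terms for q in queries)

# Non-recursive search of foo, assuming cur and new are leaf directories.
NON_RECURSIVE_FOO = ["foo/.", "foo/new", "foo/cur"]
```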

There's definitely room for thoughts and discussion.

 folder:/        match everything.
 folder:/.       match in top level maildir only.
 folder:""       match in top level maildir, including cur/new.

 I could certainly support this UI, assuming the database bloat is not
 too bad.

 I started to wonder about using 3 prefixes instead, but then I read your
 message again and a light went on. ;).

\o/


BR,
Jani.


Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-25 Thread Carl Worth
Jani Nikula j...@nikula.org writes:
 Here's a thought. With boolean prefix folder:, we can devise a scheme
 where the folder: query defines what is to be matched.

I like the idea, but I tried to infer the rules from the examples, and I
failed. It looks like there are two new symbols, '/' and '/.', but I
couldn't decipher the exact semantics of each.

I think a proposal like this should not re-use the '/' symbol as we
already have that as a path divider. (See rsync for lots of user
confusion with a significant trailing '/').

I propose a similar, but slightly different approach, where we add two
additional symbols:

  '^'   Matches the beginning of a path

  '$'   Matches the end of a path

[Obviously, I chose these symbols from regular expressions. I would be
OK with alternate symbols ('$' seems like it might be problematic in
the shell, but perhaps not too much if it's always at the end of a
phrase).]

This way, one could search for:

  folder:foo              Works like folder: historically

  folder:^full/path$      Works like Jani's proposal

  folder:^path/prefix     Satisfies Tomi's use case (as well as anyone
                          who doesn't want to have to specify or
                          distinguish between /cur or /new).

Any extra '/' at the beginning or end of a search string (such as
folder:^/full/path/$) would not change the semantics.

Further, I think we can implement this with less database bloat by
leaving folder as probabilistic and simply indexing two new terms to
indicate the beginning of the path and the end of the path.
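A toy Python model of the '^'/'$' idea, assuming the folder path is indexed
as a positional sequence of its components bracketed by two sentinel terms,
and that a query is matched as a phrase (a contiguous run of terms). The
sentinel values and function names are hypothetical, and the real folder:
indexing goes through Xapian's term generator and stemmer, which this sketch
ignores.

```python
START, END = "^", "$"  # hypothetical sentinel terms for path start and end

def index_path(folder: str) -> list:
    """Positional terms: start sentinel, path components, end sentinel."""
    return [START] + [c for c in folder.split("/") if c] + [END]

def parse_query(q: str) -> list:
    """Turn '^a/b$'-style queries into a phrase; extra '/' at either end is ignored."""
    anchored_start = q.startswith("^")
    anchored_end = q.endswith("$")
    body = q[1 if anchored_start else 0 : len(q) - 1 if anchored_end else len(q)]
    words = [c for c in body.split("/") if c]
    return ([START] if anchored_start else []) + words + ([END] if anchored_end else [])

def phrase_match(query: str, folder: str) -> bool:
    """True if the query's terms appear contiguously in the indexed path."""
    needle, hay = parse_query(query), index_path(folder)
    n = len(needle)
    return any(hay[i : i + n] == needle for i in range(len(hay) - n + 1))
```

Unanchored queries keep the historical "components anywhere in the path"
behaviour, while the anchors cost only the two extra terms per file.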

Finally, we could also extend the scheme to other things like subject:
to allow for an exact subject search like:

subject:^lib: make folder: prefix literal$

It was with an eye toward something like this that I chose to make
folder: probabilistic in the first place. (I probably would have indexed
things appropriately in the first place as well, but at the time doing
the necessary query parsing for '^' and '$' seemed daunting).

-Carl

-- 
carl.d.wo...@intel.com




Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-24 Thread Austin Clements
On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote:
 Hi all, this series makes the folder: search prefix literal, or switches
 it from a probabilistic prefix to a boolean prefix. With this, you have
 to give the path from the maildir root to the folder you want in full,
 including the maildir cur/new component, if any. Examples:

I strongly disagree with requiring the cur/new component.  The cur/new
directory is an internal implementation detail of Maildir (and a rather
broken one at that) and no more a part of the folder of a piece of
mail than its final file name component.  It's also the less obvious
user interface; if we require the cur/new component, we *will* get
people asking why their folder searches aren't working, but if we strip
the cur/new component, nobody will be surprised.

I think the question is not whether we should strip cur/new, but when.
We've already defined a _filename_is_in_maildir test in
lib/message.cc, which we depend on for flag sync.  It's simple, but I
think this would be the right thing to use for consistency.
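A rough Python analogue of the stripping described above, for illustration
only. It assumes a file counts as being in a Maildir when its parent
directory is named cur or new, which is roughly the test
_filename_is_in_maildir is meant to perform (the real C helper in
lib/message.cc works on full file names and may differ in details), and
indexes the folder with that component removed. Both function names are
hypothetical.

```python
import posixpath

def in_maildir_leaf(relpath: str) -> bool:
    """Assumed test: the file's parent directory is named cur or new."""
    parent = posixpath.basename(posixpath.dirname(relpath))
    return parent in ("cur", "new")

def search_folder(relpath: str) -> str:
    """Folder value to index: the parent directory, minus a trailing cur/new."""
    folder = posixpath.dirname(relpath)
    return posixpath.dirname(folder) if in_maildir_leaf(relpath) else folder
```

With this, folder:foo/bar would find foo/bar/cur/123:2,S and
foo/bar/new/456 alike, and folder:"" would cover messages in the root's own
cur/new.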

 folder:cur
 folder:foo/bar
 folder:""

 The last one can be used to refer to the maildir root (note that in
 shell you'll need quoting to pass the double quotes to xapian,
 folder:'""').

 The old probabilistic folder: prefix is problematic in a number of
 ways. It's not possible to refer to the maildir root. It does stemming,
 so inboxing would match inbox too. cur for the folder in maildir
 root would match all cur folders across the maildir hierarchy. Likely
 some others I forgot.

 WARNING! The change requires a database format version bump, and a
 database upgrade, which is automatically done on 'notmuch new'. The
 upgrade is irreversible if you want to try this on your database! A
 complete database rebuild is required for reverting the database format
 version. Make sure your backups are in order!

 The series includes some tests, including an initial upgrade test, along
 with a test database in the previous format version.


 BR,
 Jani.



 Jani Nikula (5):
   lib: make folder: prefix literal
   test: fix insert folder: searches
   test: fix test for literal folder: search
   test: add test database in format version 1
   test: add database upgrade test from format version 1

  lib/database.cc                        |  39 -
  lib/message.cc                         | 154 +
  lib/notmuch-private.h                  |   3 +
  test/insert                            |  10 +--
  test/notmuch-test                      |   1 +
  test/search-by-folder                  |  24 -
  test/test-databases/README             |   5 ++
  test/test-databases/database-v1.tar.gz | Bin 0 -> 252243 bytes
  test/upgrade                           |  25 ++
  9 files changed, 174 insertions(+), 87 deletions(-)
  create mode 100644 test/test-databases/README
  create mode 100644 test/test-databases/database-v1.tar.gz
  create mode 100755 test/upgrade

 -- 
 1.8.5.2



Re: [PATCH 0/5] lib: make folder: prefix literal

2014-01-24 Thread David Bremner
Austin Clements acleme...@csail.mit.edu writes:

 On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote:

 I strongly disagree with requiring the cur/new component.  The cur/new
 directory is an internal implementation detail of Maildir (and a rather
 broken one at that) and no more a part of the folder of a piece of
 mail than its final file name component.  It's also the less obvious
 user interface; if we require the cur/new component, we *will* get
 people asking why their folder searches aren't working, but if we strip
 the cur/new component, nobody will be surprised.


My gut instinct agrees with Austin here, at least for a prefix named
folder. Some factors I remember:

- are there some corner cases for people not using Maildir?
- isn't there some information encoded in whether a message is in cur or
  new? There was some discussion of mutt presenting this information.

d