Re: [PATCH 0/5] lib: make folder: prefix literal
On Tue, Feb 04 2014, Austin Clements amdra...@mit.edu wrote:

> Quoth Jani Nikula on Feb 01 at 4:54 pm:
>> I kind of like the /** suffix for recursive, but there's two small wrinkles: 1) it needs quoting on the command line (unlike my original suggestion of just / suffix), and 2) what should the top level recursive search be? path:** or path:/** or path:./**? I guess the first one is most obvious?
>
> The shell quoting is annoying, but depending on the shell, it should at least give an error (zsh) or Just Work (apparently bash and sh pass the unexpanded glob through if it doesn't match anything?).

In zsh:

$ echo whatever:/**
whatever:/**

Quick check with:

ksh-20100621-12.el6.x86_64,
dash-0.5.5.1-3.1.el6.x86_64
busybox-1.15.1-20.el6.x86_64 (busybox sh busybox ash)
and http://sourceforge.net/projects/heirloom/files/heirloom-sh/050706/heirloom-sh-050706.tar.bz2/download

all do the same (non-)expansion.

I vaguely remember some shells did puke some error when expansion yielded no results... maybe some shell option does it. Definitely not mainstream feature.

Tomi
___
notmuch mailing list
notmuch@notmuchmail.org
http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
On Wed, Feb 05 2014, Tomi Ollila tomi.oll...@iki.fi wrote:

> On Tue, Feb 04 2014, Austin Clements amdra...@mit.edu wrote:
>
> In zsh:
>
> $ echo whatever:/**
> whatever:/**

Except (retested after seeing related IRC msg from Austin):

$ unsetopt no_nomatch
$ echo whatever:/**
zsh: no matches found: whatever:/**

We can maybe document this (and bash nullglob) for users to decide how they want their shells to behave...

Tomi

> Quick check with: ksh-20100621-12.el6.x86_64, dash-0.5.5.1-3.1.el6.x86_64 busybox-1.15.1-20.el6.x86_64 (busybox sh busybox ash) and http://sourceforge.net/projects/heirloom/files/heirloom-sh/050706/heirloom-sh-050706.tar.bz2/download all do the same (non-)expansion.
>
> I vaguely remember some shells did puke some error when expansion yielded no results... maybe some shell option does it. Definitely not mainstream feature.

... or maybe it is after all ;/

Tomi
Re: [PATCH 0/5] lib: make folder: prefix literal
Quoth Jani Nikula on Feb 01 at 4:54 pm: On Fri, 31 Jan 2014, Austin Clements amdra...@mit.edu wrote: What if we introduce two prefixes, say folder: and path: (maybe dir:?) to address both use cases, each as naturally as possible? Both would be boolean prefixes because of the limitations of probabilistic prefixes, but we could take advantage of Jani's idea of generating several boolean terms. Agreed. On to details: folder: could work the way I suggested (simply the path to the file, with {cur,new} stripped off). What if the file is not in a folder named cur/new? I suggest indexing the folder as-is, if only for some backwards compatibility. Agreed. I believe this will also support MH, if I understand MH correctly (does anyone actually use MH?) What if there is not all of cur/new/tmp folders? I suggest ignoring that, and only look at the path to the file being indexed. This is simplest to implement, and it does not matter if the sibling directories come and go, and for this reason also unsurprising. That sounds good to me. For top level cur/new, index the empty string . Yes. path: would support file system search uses. These seem more varied, but I think fall into exact match and recursive match. Since I don't have this use case, I can't have any strong opinions about syntax, but I'll throw out an idea: many shells support ** for recursive path matching and people are already quite familiar with glob patterns for paths, so why not simply adopt this? In other words, when adding the path a/b/cur/x:2, add path: terms a/b/cur and a/b/** and a/** and **. Since folder: would cover the cur/new cases, I suggest the non-recursive variant of path: prefix is the exact filesystem folder name as-is (with the top level being the empty string ). I presume this is what you meant too. Yes. I suppose I didn't actually say it, but that's what I was thinking. 
I kind of like the /** suffix for recursive, but there's two small wrinkles: 1) it needs quoting on the command line (unlike my original suggestion of just / suffix), and 2) what should the top level recursive search be? path:** or path:/** or path:./**? I guess the first one is most obvious?

The shell quoting is annoying, but depending on the shell, it should at least give an error (zsh) or Just Work (apparently bash and sh pass the unexpanded glob through if it doesn't match anything?).

So here's what my original suggestions would become:

Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. For example:

folder:foo      match files in foo, foo/new, and foo/cur.
- folder:foo

folder:foo/     match all files in all subdirectories under foo (this would handle Tomi's use case), including foo/new and foo/cur.
- path:foo/**

folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
- path:foo

folder:foo/new  match in foo/new, and specifically not in foo/cur (this allows distinguishing between messages in cur and new).
- path:foo/new

folder:/        match everything.
- path:**

folder:/.       match in top level maildir only.
- path:

folder:         match in top level maildir, including cur/new.
- folder:

I'd like these details to be ironed out and agreed on before I send the next version.

This all looks good to me.

BR, Jani.
Re: [PATCH 0/5] lib: make folder: prefix literal
Quoth Rob Browning on Jan 31 at 1:19 pm: Austin Clements amdra...@mit.edu writes: folder: could work the way I suggested (simply the path to the file, with {cur,new} stripped off). Hmm, so would notmuch try to guess whether or not it's dealing with a maildir++ tree, and if so convert folder:foo to a search of .foo, and/or folder:foo/bar to .foo.bar? Or would the user just need to know to say folder:.foo and folder:.foo.bar? My opinion on this has changed over time, but I don't think we should try to interpret Maildir++ trees specially. That is, the user would have to say folder:.foo.bar if they're using Maildir++. The . seems as good as a / for a separator, so we might as well not translate it. The leading . is annoying, but *shrug* so is Maildir++. And if we're only planning special treatment for for maildir-like stores, then I wonder if the term should just be maildir:? The simple algorithm of taking the relative path and stripping {new,cur} (if present) does a good job of supporting both Maildir and non-Maildir stores (while balancing this support with simplicity, predictability, and usability). Though folder: would make more sense if the long-term goal was to have a DTRT term. But in that case, I wonder if it might eventually be expected to support mixed trees, i.e. say a tree containing maildir++ and mh subdirs, and if so, how that should be handled. The simple {new,cur}-stripping algorithm already does fairly well at this. Worrying more about mixed Maildir++ and MH stores seems unnecessary to me unless someone demonstrates and actual need. many shells support ** for recursive path matching and people are already quite familiar with glob patterns for paths, so why not simply adopt this? rsync too. Ah, sure enough. Even better! ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
Austin Clements amdra...@mit.edu writes:

> The simple algorithm of taking the relative path and stripping {new,cur} (if present) does a good job of supporting both Maildir and non-Maildir stores (while balancing this support with simplicity, predictability, and usability).

Unless, of course, the user has legitimate folders named cur and new, but perhaps that'll just end up a "don't do that then" FAQ...

--
Rob Browning
rlb @defaultvalue.org and @debian.org
GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A
GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4
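A minimal Python sketch of the {new,cur}-stripping rule discussed in this message, assuming paths are relative to the mail root and use '/' as the separator. The name folder_of is hypothetical and this is not notmuch's implementation; it only illustrates the rule and the caveat about real folders named cur or new.

```python
def folder_of(file_path):
    """Return the folder for a message file: its directory, minus a trailing cur/new."""
    parts = file_path.split("/")[:-1]      # drop the final file name component
    if parts and parts[-1] in ("cur", "new"):
        parts = parts[:-1]                 # strip the Maildir leaf, if present
    return "/".join(parts)                 # top level becomes the empty string


# Maildir, non-Maildir, and Maildir++ style paths go through the same rule.
print(folder_of("a/b/cur/x:2"))                    # a/b
print(folder_of("bug-description/example-file"))   # bug-description
print(folder_of(".foo.bar/new/msg"))               # .foo.bar
# The caveat: a genuine folder named "new" is stripped too.
print(folder_of("projects/new/msg"))               # projects
```

The last call shows why a store with legitimate cur or new folders cannot be told apart from a Maildir leaf by path alone.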
Re: [PATCH 0/5] lib: make folder: prefix literal
Austin Clements amdra...@mit.edu writes: Agreed. I believe this will also support MH, if I understand MH correctly (does anyone actually use MH?) When I started notmuch, I had all of my mail in one-message-per-file in various directories, (without these silly cur and new directories that maildir uses). At some point, I did a mass conversion of all of my directories to be roughly: ~/mail//MM So that I keep directories small by just delivering a month's worth of mail to each directory. This conversion, (and the delivery agent I am currently using, maildrop), happen to create the silly cur and new directories. So most of my mail still is in maildir format now. But I do have a few messages in non-maildir directories. These have generally come into being in cases such as someone providing me a message to demonstrate a notmuch bug or use case. So in cases like this I did things like: mkdir ~/mail/bug-description cp example-file ~/mail/bug-description I also have a few directories created similarly when I've copied some downloaded archives from a mailing list into my mail storage. (But often I've used mb2md for those so the conversion has accidentally created maildir directories). I don't know if the non-maildir directories I have are strict mh format, (did it have filenames with sequential numbers? I don't recall). But my intention with notmuch from the beginning was to support any one-message-per-file layout without enforcing any particular naming of directories or files. And I would like to see that preserved. Since then, we have also supported various semantics when people do encode information in directories and filenames, (such as ignoring cur/new and interpreting maildir flags). This kind of thing does seem good. -Carl pgpZXkuCQyczA.pgp Description: PGP signature ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
On Fri, 31 Jan 2014, Austin Clements amdra...@mit.edu wrote: What if we introduce two prefixes, say folder: and path: (maybe dir:?) to address both use cases, each as naturally as possible? Both would be boolean prefixes because of the limitations of probabilistic prefixes, but we could take advantage of Jani's idea of generating several boolean terms. Agreed. On to details: folder: could work the way I suggested (simply the path to the file, with {cur,new} stripped off). What if the file is not in a folder named cur/new? I suggest indexing the folder as-is, if only for some backwards compatibility. What if there is not all of cur/new/tmp folders? I suggest ignoring that, and only look at the path to the file being indexed. This is simplest to implement, and it does not matter if the sibling directories come and go, and for this reason also unsurprising. For top level cur/new, index the empty string . path: would support file system search uses. These seem more varied, but I think fall into exact match and recursive match. Since I don't have this use case, I can't have any strong opinions about syntax, but I'll throw out an idea: many shells support ** for recursive path matching and people are already quite familiar with glob patterns for paths, so why not simply adopt this? In other words, when adding the path a/b/cur/x:2, add path: terms a/b/cur and a/b/** and a/** and **. Since folder: would cover the cur/new cases, I suggest the non-recursive variant of path: prefix is the exact filesystem folder name as-is (with the top level being the empty string ). I presume this is what you meant too. I kind of like the /** suffix for recursive, but there's two small wrinkles: 1) it needs quoting on the command line (unlike my original suggestion of just / suffix), and 2) what should the top level recursive search be? path:** or path:/** or path:./**? I guess the first one is most obvious? So here's what my original suggestions would become: Here's a thought. 
With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. For example:

folder:foo      match files in foo, foo/new, and foo/cur.
- folder:foo

folder:foo/     match all files in all subdirectories under foo (this would handle Tomi's use case), including foo/new and foo/cur.
- path:foo/**

folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
- path:foo

folder:foo/new  match in foo/new, and specifically not in foo/cur (this allows distinguishing between messages in cur and new).
- path:foo/new

folder:/        match everything.
- path:**

folder:/.       match in top level maildir only.
- path:

folder:         match in top level maildir, including cur/new.
- folder:

I'd like these details to be ironed out and agreed on before I send the next version.

BR, Jani.
Re: [PATCH 0/5] lib: make folder: prefix literal
Austin Clements amdra...@mit.edu writes: folder: could work the way I suggested (simply the path to the file, with {cur,new} stripped off). Hmm, so would notmuch try to guess whether or not it's dealing with a maildir++ tree, and if so convert folder:foo to a search of .foo, and/or folder:foo/bar to .foo.bar? Or would the user just need to know to say folder:.foo and folder:.foo.bar? And if we're only planning special treatment for for maildir-like stores, then I wonder if the term should just be maildir:? Though folder: would make more sense if the long-term goal was to have a DTRT term. But in that case, I wonder if it might eventually be expected to support mixed trees, i.e. say a tree containing maildir++ and mh subdirs, and if so, how that should be handled. many shells support ** for recursive path matching and people are already quite familiar with glob patterns for paths, so why not simply adopt this? rsync too. -- Rob Browning rlb @defaultvalue.org and @debian.org GPG as of 2011-07-10 E6A9 DA3C C9FD 1FF8 C676 D2C4 C0F0 39E9 ED1B 597A GPG as of 2002-11-03 14DD 432F AE39 534D B592 F9A0 25C8 D377 8C7E 73A4 ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
Quoth Jani Nikula on Jan 25 at 5:38 pm:
> On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
>> Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases.
>
> Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. For example:
>
> folder:foo      match files in foo, foo/new, and foo/cur.
>
> folder:foo/     match all files in all subdirectories under foo (this would handle Tomi's use case), including foo/new and foo/cur.
>
> folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
>
> folder:foo/new  match in foo/new, and specifically not in foo/cur (this allows distinguishing between messages in cur and new).
>
> folder:/        match everything.
>
> folder:/.       match in top level maildir only.
>
> folder:         match in top level maildir, including cur/new.
>
> This requires indexing all the path components with suitable suffixes. For example, a file foo/new/baz would get terms "/", "foo", "foo/", "foo/new", and "foo/new/.". A file foo/bar would get terms "/", "foo", "foo/", and "foo/.".
>
> It's obviously a concern this increases the database size; not sure how it would compare with the current stemmed probabilistic prefix.
>
> Opinions on this? This would really cover all use cases, and address Austin's interface and backward compatibility concerns.

I like this idea in general, though I agree with others that the specific syntax seems a little wanting. The concept of adding several boolean terms seems powerful, and I would be surprised if the extra terms had any substantive effect on database size.

However, it seems like this is overloading one prefix for two meanings. And I think that's because people want two similar but distinct things. Several of us want a simple, natural Maildir-aware folder search (the Maildir folder of a/b/cur/x:2, is a/b).
Others want file system search. It's easy to conflate these because Maildir represents folders as directory paths, but maybe they need to be treated as distinct things. What if we introduce two prefixes, say folder: and path: (maybe dir:?) to address both use cases, each as naturally as possible? Both would be boolean prefixes because of the limitations of probabilistic prefixes, but we could take advantage of Jani's idea of generating several boolean terms. folder: could work the way I suggested (simply the path to the file, with {cur,new} stripped off). path: would support file system search uses. These seem more varied, but I think fall into exact match and recursive match. Since I don't have this use case, I can't have any strong opinions about syntax, but I'll throw out an idea: many shells support ** for recursive path matching and people are already quite familiar with glob patterns for paths, so why not simply adopt this? In other words, when adding the path a/b/cur/x:2, add path: terms a/b/cur and a/b/** and a/** and **. BR, Jani. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
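The path: term generation described in this message can be sketched in Python as follows. This is only an illustration that reproduces the terms listed for a/b/cur/x:2; the function name path_terms is hypothetical, and the thread does not say whether the file's own directory should also get a /** term, so none is generated here.

```python
def path_terms(file_path):
    """Return the boolean path: terms for one message file, per the example in the thread."""
    directory = file_path.rsplit("/", 1)[0] if "/" in file_path else ""
    parts = [p for p in directory.split("/") if p]
    terms = [directory]                      # exact match on the directory as-is
    # One recursive term per proper ancestor, deepest first (assumption: the
    # directory itself gets no "/**" term, matching the example in the thread).
    for i in range(len(parts) - 1, 0, -1):
        terms.append("/".join(parts[:i]) + "/**")
    terms.append("**")                       # top-level recursive match
    return terms


print(path_terms("a/b/cur/x:2"))  # ['a/b/cur', 'a/b/**', 'a/**', '**']
```

A query such as path:a/** then becomes a plain boolean term lookup, with no glob expansion at search time.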
Re: [PATCH 0/5] lib: make folder: prefix literal
On Sun, 26 Jan 2014, Carl Worth cwo...@cworth.org wrote: Jani Nikula j...@nikula.org writes: Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. I like the idea, but I tried to infer the rules from the examples, and I failed. It looks like there are two new symbols, / and /. but I couldn't decipher the exact semantics of each. I think a proposal like this should not re-use the '/' symbol as we already have that as a path divider. (See rsync for lots of user confusion with a significant trailing '/'). I propose a similar, but slightly different approach, where we add two additional symbols: '^' Matches the beginning of a path '$' Matches the end of a path [Obviously, I chose these symbols from regular expressions. I would be OK with alternate symbols, ('$' seems like it might be problematic in the shell, but perhaps not too much if it's always at the end of a phrase.)] This way, one could search for: folder:foo Works like folder: historically folder:^full/path$ Works like Jani's proposal folder:^path/prefix Satisfies Tomi's use case, (as well as anyone who doesn't want to have to specify or distinguish between /cur or /new. Any extra '/' at the beginning or end of a search string, (such as folder:^/full/path/$) would not change the semantics. Further, I think we can implement this with less database bloat by leaving folder as probabilistic and simply indexing two new terms to indicate the beginning of the path and the end of the path. Finally, we could also extend the scheme to other things like subject: to allow for an exact subject search like: subject:^lib: make folder: prefix literal$ It was with an eye toward something like this that I chose to make folder: probabilistic in the first place. (I probably would have indexed things appropriately in the first place as well, but at the time doing the necessary query parsing for '^' and '$' seemed daunting). 
Unfortunately, I haven't had the time to experiment with this. But it bugs me that the probabilistic folder: prefix has stemming and it's case insensitive. It's possible to work around the stemming with the anchors you suggest or by quoting, but is there a way to have case sensitive probabilistic prefixes? BR, Jani. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
Jani Nikula j...@nikula.org writes:

> Unfortunately, I haven't had the time to experiment with this. But it bugs me that the probabilistic folder: prefix has stemming and it's case insensitive. It's possible to work around the stemming with the anchors you suggest or by quoting, but is there a way to have case sensitive probabilistic prefixes?

The stemming and case insensitivity just has to do with which terms are shoved into the database, (you have to add extra terms to get these features).

If we're getting those features for folder now, (and I agree that we don't want them), it's because we're calling some Xapian convenience function along the lines of "create a bunch of terms for this chunk of text". The fix for that is to do the simple thing and simply break the path at each '/' and add a term for each component. Then these problems all go away.

So fixes for this should not require switching from a probabilistic to a Boolean prefix.

-Carl

--
carl.d.wo...@intel.com
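As a rough Python sketch of "break the path at each '/' and add a term for each component": the helper name component_terms, the "XFOLDER:" prefix string, and the use of 1-based positions are all assumptions made for illustration, not the actual notmuch or Xapian API. The point is only that no stemming or case folding happens when terms are built this way.

```python
def component_terms(directory, prefix="XFOLDER:"):
    """Return (term, position) pairs, one per path component, unstemmed and case-preserved."""
    components = [c for c in directory.split("/") if c]   # ignore empty components
    return [(prefix + c, pos) for pos, c in enumerate(components, start=1)]


for term, pos in component_terms("Lists/Notmuch/cur"):
    print(pos, term)
```

Keeping positions would let consecutive components be matched as a phrase, which is how a multi-component folder query could still work with a probabilistic prefix.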
Re: [PATCH 0/5] lib: make folder: prefix literal
On Wed, 29 Jan 2014, Carl Worth cwo...@cworth.org wrote: Austin Clements acleme...@csail.mit.edu writes: I think you're assuming we have much more control over this than we do. To be fair, I only started discussing my proposal for '^' and '$' in response to Jani's proposal with special semantics for trailing '/' and /.. I only chose those to avoid any collisions with actual file names, without much further thought. I'm not a fan of rsync's trailing '/' semantics either. The main point was to demonstrate that if folder: were a boolean prefix, it would be possible to index folder terms in a way that would address the issues with the current folder: prefix. I don't have good counter-proposals now, but an *example* is having sub-prefixes like folder:recursive::foo or folder:maildir::bar, where the former would match anything under foo and the latter would match anything in bar/new and bar/cur. These recursive:: and maildir:: prefixes would be just part of the indexed boolean terms. Support for any of this magic syntax would require a custom query parser, yes. Austin, haven't you been proposing a custom query parser for ages? Where does that work stand now? That is the unicorn... many of the query improvements I have in mind depend on a custom query parser. So I'd like to have that. And a pony. But in the mean time, I'm left wondering whether I should pursue folder: as a boolean prefix, or try to figure out if there are improvements to be made as a probabilistic prefix, or just put this work on hold. With the db upgrade and upgrade tests, it's not exactly a trivial amount of work. BR, Jani. ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
On Fri, 24 Jan 2014, Austin Clements acleme...@csail.mit.edu wrote: On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote: Hi all, this series makes the folder: search prefix literal, or switches it from a probabilistic prefix to a boolean prefix. With this, you have to give the path from the maildir root to the folder you want in full, including the maildir cur/new component, if any. Examples: I strongly disagree with requiring the cur/new component. The cur/new directory is an internal implementation detail of Maildir (and a rather broken one at that) and no more a part of the folder of a piece of mail than its final file name component. It's also the less obvious user interface; if we require the cur/new component, we *will* get people asking why their folder searches aren't working, but if we strip the cur/new component, nobody will be surprised. I think the question is not whether we should strip cur/new, but when. We've already defined a _filename_is_in_maildir test in lib/message.cc, which we depend on for flag sync. It's simple, but I think this would be the right thing to use for consistency. I'd like to discuss some of the reasons I chose to include the cur/new components. Admittedly, none of them are very strong on their own, but all of them together tilted my opinion towards requiring them. The way I see it, notmuch supports maildir, but does not require it. In many ways the messages are just files somewhere in the directory hierarchy. There are only a few cases where it matters that there are cur/new/tmp directories within a directory. If you strip cur and new, it becomes impossible to distinguish between files in foo, foo/cur, and foo/new - and one of the reasons for changing folder: in the first place is to be able to better distinguish between folders. Apparently mutt presents the difference between messages in new and cur to its users (so I've been told; I've never really used mutt), and our integration with mutt lacks that distinction. 
We could fix that by requiring the cur/new components in folder: searches. Speaking of consistency, compare _filename_is_in_maildir() with _entries_resemble_maildir() in notmuch-new.c. What should the indexed folder: prefix be if there is not all of cur, new, and tmp? We will actually index files in tmp if cur or new is not present! What if the missing sibling directories are added (or existing ones removed) later? Where's the consistency compared to new.ignore config, which also matches the cur/new components if so desired? Or consistency with notmuch search --output=files? My conclusion was that requiring *all* filesystem folder components as-is is consistent, most versatile, agnostic to Maildir or Maildir++ implementation details wrt directory naming or hierarchy, without difficult corner cases, simplest to implement, and unsurprising (once you understand the cur/new distinction). For *me* this is the more obvious user interface. And hey, I'm a user too. Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases. BR, Jani. [1] id:8761ppurfz@nikula.org ___ notmuch mailing list notmuch@notmuchmail.org http://notmuchmail.org/mailman/listinfo/notmuch
Re: [PATCH 0/5] lib: make folder: prefix literal
On Sat, Jan 25 2014, Jani Nikula j...@nikula.org wrote: On Fri, 24 Jan 2014, Austin Clements acleme...@csail.mit.edu wrote: On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote: Hi all, this series makes the folder: search prefix literal, or switches it from a probabilistic prefix to a boolean prefix. With this, you have to give the path from the maildir root to the folder you want in full, including the maildir cur/new component, if any. Examples: I strongly disagree with requiring the cur/new component. The cur/new directory is an internal implementation detail of Maildir (and a rather broken one at that) and no more a part of the folder of a piece of mail than its final file name component. It's also the less obvious user interface; if we require the cur/new component, we *will* get people asking why their folder searches aren't working, but if we strip the cur/new component, nobody will be surprised. I think the question is not whether we should strip cur/new, but when. We've already defined a _filename_is_in_maildir test in lib/message.cc, which we depend on for flag sync. It's simple, but I think this would be the right thing to use for consistency. I'd like to discuss some of the reasons I chose to include the cur/new components. Admittedly, none of them are very strong on their own, but all of them together tilted my opinion towards requiring them. The way I see it, notmuch supports maildir, but does not require it. In many ways the messages are just files somewhere in the directory hierarchy. There are only a few cases where it matters that there are cur/new/tmp directories within a directory. If you strip cur and new, it becomes impossible to distinguish between files in foo, foo/cur, and foo/new - and one of the reasons for changing folder: in the first place is to be able to better distinguish between folders. 
Apparently mutt presents the difference between messages in new and cur to its users (so I've been told; I've never really used mutt), and our integration with mutt lacks that distinction. We could fix that by requiring the cur/new components in folder: searches. Speaking of consistency, compare _filename_is_in_maildir() with _entries_resemble_maildir() in notmuch-new.c. What should the indexed folder: prefix be if there is not all of cur, new, and tmp? We will actually index files in tmp if cur or new is not present! What if the missing sibling directories are added (or existing ones removed) later? Where's the consistency compared to new.ignore config, which also matches the cur/new components if so desired? Or consistency with notmuch search --output=files? My conclusion was that requiring *all* filesystem folder components as-is is consistent, most versatile, agnostic to Maildir or Maildir++ implementation details wrt directory naming or hierarchy, without difficult corner cases, simplest to implement, and unsurprising (once you understand the cur/new distinction). For *me* this is the more obvious user interface. And hey, I'm a user too. Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases. I challenge that with my use case: my mails are arranged as follows: head of contents of notmuch archive prior to my involvement: $ find notmuch | head -5 notmuch notmuch/6b notmuch/6b/de820df0697ab2d235fbc8e32510d7 notmuch/6b/917afddb116a03c45371282be58388 notmuch/6b/10eb0bc1406f6767161f5883f328f7 head of contents of received mail $ find notmuch | head -5 received received/rawmail2 received/6b received/6b/86a8937aac57721ad87f0e0e5fe6c3 received/6b/3278d6c4c1fe7604f1404bc09acff7 Interestingly find started with subdirectory '6b' in both cases... 
-- anyway I have 0xff + 1 subdirectories in each mail directory and

$ md5sum received/6b/86a8937aac57721ad87f0e0e5fe6c3

outputs

6b86a8937aac57721ad87f0e0e5fe6c3  received/6b/86a8937aac57721ad87f0e0e5fe6c3

For me the current folder: works as I don't have collisions.

For me a folder: search which would just work as a prefix i.e. match anything under given directory hierarchy would work best.

At the end it might be that I have to hack the search for my purposes; more important/interesting thing is whether I need to use incompatible database format :O

Tomi

BR, Jani.

[1] id:8761ppurfz@nikula.org
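The layout described here (a file named by its MD5 digest, with the first two hex digits used as the subdirectory) can be sketched in Python like this. The function name md5_layout_path and the sample message bytes are hypothetical; only the naming pattern is taken from the md5sum output above.

```python
import hashlib


def md5_layout_path(message_bytes, top="received"):
    """Return top/<first two hex digits>/<remaining 30 hex digits> for a message."""
    digest = hashlib.md5(message_bytes).hexdigest()
    return f"{top}/{digest[:2]}/{digest[2:]}"


# Up to 0xff + 1 subdirectories can exist under each top-level directory.
subdirs = [f"{i:02x}" for i in range(0x100)]
print(len(subdirs))
print(md5_layout_path(b"Subject: example\n\nbody\n"))
```

With a literal, non-recursive folder: prefix, searching one top-level directory in such a store would need one term per subdirectory, which is why a prefix-style match fits this layout better.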
Re: [PATCH 0/5] lib: make folder: prefix literal
On Sat, 25 Jan 2014, Tomi Ollila tomi.oll...@iki.fi wrote:
> On Sat, Jan 25 2014, Jani Nikula j...@nikula.org wrote:
>> Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases.
>
> I challenge that with my use case: my mails are arranged as follows:

[snip]

> For me the current folder: works as I don't have collisions.

Fair enough, your use case would be *very inconvenient* with the proposed changes to the folder: prefix, *regardless* of whether the leaf cur/new is indexed and required or not. (Very inconvenient, or practically impossible, as you'd have to include all those 01..ff directories in your searches.)

> For me a folder: search which would just work as a prefix i.e. match anything under given directory hierarchy would work best.

Indeed. Your use case is not an argument in whether cur/new should be included or not. That recursive folder prefix suggestion is, I think, incompatible with the requirements for the literal folder: prefix we've been considering.

BR,
Jani.
Re: [PATCH 0/5] lib: make folder: prefix literal
On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
> Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases.

Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. For example:

folder:foo      match files in foo, foo/new, and foo/cur.

folder:foo/     match all files in all subdirectories under foo (this would handle Tomi's use case), including foo/new and foo/cur.

folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.

folder:foo/new  match in foo/new, and specifically not in foo/cur (this allows distinguishing between messages in cur and new).

folder:/        match everything.

folder:/.       match in top level maildir only.

folder:         match in top level maildir, including cur/new.

This requires indexing all the path components with suitable suffixes. For example, a file foo/new/baz would get the terms "/", "foo", "foo/", "foo/new", and "foo/new/.". A file foo/bar would get the terms "/", "foo", "foo/", and "foo/.".

It's obviously a concern that this increases the database size; not sure how it would compare with the current stemmed probabilistic prefix.

Opinions on this? This would really cover all use cases, and address Austin's interface and backward compatibility concerns.

BR,
Jani.
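To make the indexing side concrete, here is a minimal Python sketch that generates the terms for a file path under this scheme. It is only an illustration: folder_terms is a hypothetical name, the handling of files directly in the maildir root is an assumption read off the folder:/. and folder: examples, and nothing here is taken from the actual notmuch code. It does reproduce the two worked examples above.

```python
import posixpath

MAILDIR_LEAVES = ("cur", "new")


def folder_terms(relpath):
    """Return the set of folder: terms for a file path relative to the mail root."""
    directory = posixpath.dirname(relpath)
    parts = directory.split("/") if directory else []

    # Split off a trailing cur/new component, if present.
    leaf = None
    if parts and parts[-1] in MAILDIR_LEAVES:
        leaf = parts.pop()
    base = "/".join(parts)

    # "/" matches everything; the bare folder name matches foo, foo/new, foo/cur.
    terms = {"/", base}

    # One trailing-slash term per ancestor, for recursive matches (folder:foo/).
    for i in range(1, len(parts) + 1):
        terms.add("/".join(parts[:i]) + "/")

    if leaf is None:
        # File directly in the folder: matched by folder:foo/.
        terms.add(base + "/.")
    else:
        # File in foo/new or foo/cur: matched by folder:foo/new (and foo/new/.).
        leaf_dir = base + "/" + leaf if base else leaf
        terms.add(leaf_dir)
        terms.add(leaf_dir + "/.")
    return terms


if __name__ == "__main__":
    for path in ("foo/new/baz", "foo/bar"):
        print(path, sorted(folder_terms(path)))
```

Each file gets one extra term per directory level plus a handful of fixed ones, which gives a rough feel for the database growth in question.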
Re: [PATCH 0/5] lib: make folder: prefix literal
Jani Nikula j...@nikula.org writes:
> On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
>> Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases.
>
> Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. For example:
>
> folder:foo      match files in foo, foo/new, and foo/cur.
>
> folder:foo/     match all files in all subdirectories under foo (this would handle Tomi's use case), including foo/new and foo/cur.

handling hierarchies sounds useful and natural

> folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
>
> folder:foo/new  match in foo/new, and specifically not in foo/cur (this allows distinguishing between messages in cur and new).

is new special cased here? or do you rely on it being a leaf directory?

> folder:/        match everything.
>
> folder:/.       match in top level maildir only.
>
> folder:         match in top level maildir, including cur/new.

I could certainly support this UI, assuming the database bloat is not too bad. I started to wonder about using 3 prefixes instead, but then I read your message again and a light went on. ;).

d
Re: [PATCH 0/5] lib: make folder: prefix literal
On Sat, 25 Jan 2014, David Bremner da...@tethera.net wrote:
> Jani Nikula j...@nikula.org writes:
>> On Sat, 25 Jan 2014, Jani Nikula j...@nikula.org wrote:
>>> Perhaps we need to have two prefixes, one of which is the literal filesystem folder and another which hides the implementation details, like I mentioned in my mail to Peter [1]. But consider this: my proposed implementation does cover *all* use cases.
>>
>> Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched. For example:
>>
>> folder:foo      match files in foo, foo/new, and foo/cur.
>>
>> folder:foo/     match all files in all subdirectories under foo (this would handle Tomi's use case), including foo/new and foo/cur.
>
> handling hierarchies sounds useful and natural
>
>> folder:foo/.    match in foo only, and specifically not in foo/cur or foo/new.
>>
>> folder:foo/new  match in foo/new, and specifically not in foo/cur (this allows distinguishing between messages in cur and new).
>
> is new special cased here? or do you rely on it being a leaf directory?

A little bit of both I guess; not too bad.

An alternative might be to make the variant without the trailing / recursive, so folder:foo would match all files in all subdirectories under foo, including foo/new and foo/cur. This would be more compatible with the current folder: prefix too. Then, if you wanted to match without recursion, you'd have to have

  folder:foo/. OR folder:foo/new OR folder:foo/cur

assuming new and cur are leaf nodes, and if not, with /. at the end. But you'd have to decide what to do with folder:foo/ which would then match nothing.

There's definitely room for thoughts and discussion.

>> folder:/        match everything.
>>
>> folder:/.       match in top level maildir only.
>>
>> folder:         match in top level maildir, including cur/new.
>
> I could certainly support this UI, assuming the database bloat is not too bad. I started to wonder about using 3 prefixes instead, but then I read your message again and a light went on. ;).

\o/

BR,
Jani.
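As a rough illustration of the alternative, where the bare folder:foo is recursive, here is a short Python sketch that spells out the non-recursive query a user would then have to type. The helper name non_recursive_query and its flag are hypothetical; the query strings follow the folder:foo/. OR folder:foo/new OR folder:foo/cur form given in the message.

```python
def non_recursive_query(folder, leaves_are_leaf_nodes=True):
    """Build the query matching only folder itself plus its new/cur.

    If new and cur may contain subdirectories of their own, append "/."
    to them as well so that the match does not recurse.
    """
    suffix = "" if leaves_are_leaf_nodes else "/."
    terms = [f"folder:{folder}/."]
    terms += [f"folder:{folder}/{leaf}{suffix}" for leaf in ("new", "cur")]
    return " OR ".join(terms)


if __name__ == "__main__":
    print(non_recursive_query("foo"))
    print(non_recursive_query("foo", leaves_are_leaf_nodes=False))
```

The common non-recursive case becomes three terms instead of one, which is the trade-off against staying closer to the current folder: behaviour.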
Re: [PATCH 0/5] lib: make folder: prefix literal
Jani Nikula j...@nikula.org writes:
> Here's a thought. With boolean prefix folder:, we can devise a scheme where the folder: query defines what is to be matched.

I like the idea, but I tried to infer the rules from the examples, and I failed. It looks like there are two new symbols, / and /. but I couldn't decipher the exact semantics of each.

I think a proposal like this should not re-use the '/' symbol as we already have that as a path divider. (See rsync for lots of user confusion with a significant trailing '/').

I propose a similar, but slightly different approach, where we add two additional symbols:

  '^'   Matches the beginning of a path
  '$'   Matches the end of a path

[Obviously, I chose these symbols from regular expressions. I would be OK with alternate symbols, ('$' seems like it might be problematic in the shell, but perhaps not too much if it's always at the end of a phrase.)]

This way, one could search for:

  folder:foo            Works like folder: historically

  folder:^full/path$    Works like Jani's proposal

  folder:^path/prefix   Satisfies Tomi's use case (as well as anyone who doesn't want to have to specify or distinguish between /cur or /new).

Any extra '/' at the beginning or end of a search string, (such as folder:^/full/path/$) would not change the semantics.

Further, I think we can implement this with less database bloat by leaving folder as probabilistic and simply indexing two new terms to indicate the beginning of the path and the end of the path.

Finally, we could also extend the scheme to other things like subject: to allow for an exact subject search like:

  subject:^lib: make folder: prefix literal$

It was with an eye toward something like this that I chose to make folder: probabilistic in the first place. (I probably would have indexed things appropriately in the first place as well, but at the time doing the necessary query parsing for '^' and '$' seemed daunting).
-Carl

-- 
carl.d.wo...@intel.com
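A rough Python sketch of how the '^' and '$' idea could behave, modelled as a phrase match over path components with two extra marker tokens for the beginning and end of the path. This is an illustration under several assumptions: it ignores stemming and everything else about probabilistic matching, the marker strings and function names are made up for the example, and it says nothing about how Xapian would actually store or query the terms.

```python
BEGIN, END = "<begin>", "<end>"  # hypothetical marker terms


def index_tokens(path):
    """Tokens indexed for a folder path: begin marker, components, end marker."""
    return [BEGIN] + [c for c in path.split("/") if c] + [END]


def query_tokens(query):
    """Translate a folder: query with optional ^ and $ into a token phrase."""
    anchored_start = query.startswith("^")
    anchored_end = query.endswith("$")
    core = query[1:] if anchored_start else query
    core = core[:-1] if anchored_end else core
    # Empty components are dropped, so extra leading/trailing '/' is harmless.
    tokens = [c for c in core.split("/") if c]
    return ([BEGIN] if anchored_start else []) + tokens + ([END] if anchored_end else [])


def matches(query, path):
    """True if the query phrase occurs contiguously in the path's tokens."""
    needle, hay = query_tokens(query), index_tokens(path)
    n = len(needle)
    return any(hay[i:i + n] == needle for i in range(len(hay) - n + 1))


if __name__ == "__main__":
    for q, p in [("foo", "bar/foo/baz"),
                 ("^full/path$", "full/path"),
                 ("^full/path$", "full/path/sub"),
                 ("^path/prefix", "path/prefix/6b")]:
        print(q, p, matches(q, p))
```

Only two extra terms per path are needed in this model, compared with one term per directory level in the suffix-based scheme.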
Re: [PATCH 0/5] lib: make folder: prefix literal
On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote:
> Hi all, this series makes the folder: search prefix literal, or switches it from a probabilistic prefix to a boolean prefix. With this, you have to give the path from the maildir root to the folder you want in full, including the maildir cur/new component, if any. Examples:

I strongly disagree with requiring the cur/new component. The cur/new directory is an internal implementation detail of Maildir (and a rather broken one at that) and no more a part of the folder of a piece of mail than its final file name component. It's also the less obvious user interface; if we require the cur/new component, we *will* get people asking why their folder searches aren't working, but if we strip the cur/new component, nobody will be surprised.

I think the question is not whether we should strip cur/new, but when. We've already defined a _filename_is_in_maildir test in lib/message.cc, which we depend on for flag sync. It's simple, but I think this would be the right thing to use for consistency.

> folder:cur
> folder:foo/bar
> folder:""
>
> The last one can be used to refer to the maildir root (note that in shell you'll need quoting to pass the double quotes to xapian, folder:'""').
>
> The old probabilistic folder: prefix is problematic in a number of ways. It's not possible to refer to the maildir root. It does stemming, so inboxing would match inbox too. cur for the folder in maildir root would match all cur folders across the maildir hierarchy. Likely some others I forgot.
>
> WARNING! The change requires a database format version bump, and a database upgrade, which is automatically done on 'notmuch new'. The upgrade is irreversible if you want to try this on your database! A complete database rebuild is required for reverting the database format version. Make sure your backups are in order!
>
> The series includes some tests, including an initial upgrade test, along with a test database in the previous format version.
>
> BR,
> Jani.
>
> Jani Nikula (5):
>   lib: make folder: prefix literal
>   test: fix insert folder: searches
>   test: fix test for literal folder: search
>   test: add test database in format version 1
>   test: add database upgrade test from format version 1
>
>  lib/database.cc                        |  39 -
>  lib/message.cc                         | 154 +
>  lib/notmuch-private.h                  |   3 +
>  test/insert                            |  10 +-
>  test/notmuch-test                      |   1 +
>  test/search-by-folder                  |  24 -
>  test/test-databases/README             |   5 ++
>  test/test-databases/database-v1.tar.gz | Bin 0 -> 252243 bytes
>  test/upgrade                           |  25 ++
>  9 files changed, 174 insertions(+), 87 deletions(-)
>  create mode 100644 test/test-databases/README
>  create mode 100644 test/test-databases/database-v1.tar.gz
>  create mode 100755 test/upgrade
>
> --
> 1.8.5.2
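For illustration, stripping the cur/new component before indexing could look roughly like the Python sketch below. It assumes the simplest possible rule, dropping a final directory component named cur or new; it is not the _filename_is_in_maildir test from lib/message.cc, whose exact conditions are not shown in this thread, and folder_of is a hypothetical name.

```python
import posixpath


def folder_of(relpath):
    """Return the folder for a message file, without a trailing cur/new.

    Assumption: any last directory component named cur or new is treated
    as a Maildir implementation detail and dropped.
    """
    directory = posixpath.dirname(relpath)
    head, tail = posixpath.split(directory)
    return head if tail in ("cur", "new") else directory


if __name__ == "__main__":
    for path in ("foo/bar/cur/msg1", "foo/bar/msg1", "cur/msg1"):
        print(repr(path), "->", repr(folder_of(path)))
```

Under this rule a message in foo/bar/cur and one in foo/bar both end up in folder foo/bar, so a search for that folder needs no knowledge of the Maildir layout.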
Re: [PATCH 0/5] lib: make folder: prefix literal
Austin Clements acleme...@csail.mit.edu writes:
> On Thu, 09 Jan 2014, Jani Nikula j...@nikula.org wrote:
>
> I strongly disagree with requiring the cur/new component. The cur/new directory is an internal implementation detail of Maildir (and a rather broken one at that) and no more a part of the folder of a piece of mail than its final file name component. It's also the less obvious user interface; if we require the cur/new component, we *will* get people asking why their folder searches aren't working, but if we strip the cur/new component, nobody will be surprised.

My gut instinct agrees with Austin here, at least for a prefix named folder. Some factors I remember

- are there some corner cases for people not using Maildir?

- isn't there some information encoded in whether a message is in cur or new? there was some discussion of mutt presenting this information

d