Add support for `--limit=N` to `notmuch show`
In a previous email (about the `thread:...` field in the JSON output of `notmuch show`), I described one of my use-cases for notmuch.

Extending upon that, if one were to implement an email client that provides the user with search, there are two approaches:

* use `notmuch search -- {query}` and, based on that output, display a thread list like GMail does;
* use `notmuch show -- {query}` and, based on that, display a page with all the emails that matched, grouped by thread; (I prefer this variant, as it gives me a quicker glance when I search for something specific;)

Now the problem with `notmuch show` is that if I give it too "broad" a query, like `*`, it will chew a lot of CPU and RAM (and in my case eventually crash).

`notmuch search` does have a `--limit=N` argument that limits the search output to only the first `N` items.

My feature request is to add such a flag to `notmuch show` as well, which should:

* limit the number of threads in all cases except `--format=raw`;
* not be allowed in case of `--format=raw` or `--part=P`;

As a work-around I could use `notmuch search --output=threads --limit={limit} -- {query}`, then take those thread IDs and issue a `notmuch show -- thread:... thread:...`. But this has the following problems:

* it requires two `notmuch` CLI calls;
* and most importantly it renders the `--entire-thread=false` feature useless; (as `notmuch show` now matches the entire threads, as opposed to only the messages matched by `notmuch search`;)

Thanks,
Ciprian.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch
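The two-call work-around described above could be sketched roughly as follows. This is only an illustration, not part of notmuch: the helper names (`build_search_cmd`, `build_show_cmd`, `show_limited`) are made up, and it assumes that each output line of `notmuch search --output=threads` is already a `thread:...` term. The thread terms are joined with an explicit `or` into a single query argument.

```python
import subprocess


def build_search_cmd(query, limit):
    """First call: list at most `limit` matching thread IDs."""
    return ["notmuch", "search", "--output=threads",
            "--limit=%d" % limit, "--", query]


def build_show_cmd(thread_terms):
    """Second call: show exactly the threads found by the first call."""
    return ["notmuch", "show", "--format=json", "--",
            " or ".join(thread_terms)]


def show_limited(query, limit):
    """Run both calls (requires `notmuch` on PATH); returns the JSON text."""
    found = subprocess.run(build_search_cmd(query, limit), check=True,
                           capture_output=True, text=True).stdout.split()
    if not found:
        return "[]"  # nothing matched, so there is nothing to show
    return subprocess.run(build_show_cmd(found), check=True,
                          capture_output=True, text=True).stdout
```

Note that the second call selects whole threads, so this sketch has the same drawback mentioned above: the original query no longer decides which messages inside each thread count as matched.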
Re: Add support for `thread` field in `notmuch show`
On Fri, May 1, 2020 at 3:09 PM David Bremner wrote:
> Ciprian Dorin Craciun writes:
> > I know that one can use `thread:{id:MESSAGE_ID}` to achieve the same result, however:
> > * it is somewhat cumbersome for the integrator;
>
> Out of curiosity, what is harder about it? In both cases you have to extract one value from the JSON.

It is cumbersome because:

* for one, you now need to run a new `notmuch search` instance for each email to get that email's thread id; this is very sub-optimal when you have more than a handful of emails;
* secondly, it adds more code to the client;

To understand my use-case: I currently intend to use `notmuch show --format=json` to search my emails, and based on that to generate a nice HTML page displaying all the emails found. Now I want to include in each email's section a link that displays only its thread. In order to do that, I have to obtain (by using the technique described above) the `thread:...` for each of those emails, which in turn means one CLI call per email. (And 99% of the time perhaps I don't even click the thread link.)

(Another option would be to use `thread:{id:...}` for that link, but I find this quite a hack.)

> > * having the thread identifier explicitly, could be used as a key in a cache, or other internal lookups;
> >
> > In fact the only way one can extract the thread identifier via the `notmuch` CLI is to use `notmuch search --output=threads -- id:MESSAGE_ID`
>
> Offhand I have no strong objection to someone (who is not me) adding this. I think it's important to be aware that thread id's are ephemeral, and subject to change e.g. if the database is re-built from scratch.

I understand that `thread:...` is tied to a particular database, but that shouldn't be an issue, as people don't often regenerate their databases, and the caches are usually short-lived.

This weekend I'll try to take a stab at adding this to `notmuch`.

Ciprian.
Re: Any updates on the `List-Id` indexing feature?
On Wed, Apr 29, 2020 at 8:08 PM David Bremner wrote:
> > I've also read the FAQ:
> > * https://notmuchmail.org/faq/#index8h2
>
> Oops, that needs to be updated.
>
> It is implemented. See notmuch-config(1), under "index.header"

That's perfect. However the `search-terms` man page doesn't say how it should be used.

Should I gather (from the `config` manpage) that these "prefixes" should always start with an upper-case letter, as in `notmuch search -- 'List:some-id'`?

Thanks,
Ciprian.
Any updates on the `List-Id` indexing feature?
I've searched the mailing list archives about the `List-Id` feature:

* https://www.mail-archive.com/notmuch@notmuchmail.org/msg43214.html
* https://www.mail-archive.com/notmuch@notmuchmail.org/msg22092.html
* https://www.mail-archive.com/notmuch@notmuchmail.org/msg14146.html

I've also read the FAQ:

* https://notmuchmail.org/faq/#index8h2

I understand why it's not implemented right now; however, given how important it is to correctly handle emails from mailing lists, I wanted to ask whether any progress has been made in this regard.

Should I try to handle it myself in my own workflow, or is it on the "roadmap"? :)

Thanks,
Ciprian.
Add support for `thread` field in `notmuch show`
According to `devel/schemata`, the message object doesn't contain the identifier of the thread to which it was assigned in the database.

Sometimes, for example in a UI that displays a search result at message level, it would be useful to know the thread each message belongs to, so the user can easily switch to the entire thread.

I know that one can use `thread:{id:MESSAGE_ID}` to achieve the same result, however:

* it is somewhat cumbersome for the integrator;
* having the thread identifier explicitly would allow it to be used as a key in a cache, or in other internal lookups;

In fact the only way one can extract the thread identifier via the `notmuch` CLI is to use `notmuch search --output=threads -- id:MESSAGE_ID`.

Ciprian.
Re: Inconsistencies in handling command flags: `--flag=value` different than `--flag value`
On Wed, Apr 29, 2020 at 6:39 PM David Bremner wrote:
> I guess I'm a bit leery of removing UI features that presumably at least some people rely on. It's pretty upsetting to have software break one's muscle memory.

I think there are two completely different use-cases for the `notmuch` binary:

* a simple CLI to query the database, in which case the current flags seem OK;
* a "poor man's" API to query the database, more below;

(I know there already exists a `libnotmuch` API accessible from many programming languages. However for prototyping, and even for safety and robustness, when performance isn't an issue, I find the tool-based approach much more resilient.)

Now about the "API" use-case: I assume that at the moment many users have already integrated `notmuch` as it is, with the current flags and behaviour. Thus I agree that changing any flags in a backward-incompatible way would make a lot of people unhappy, and would perhaps generate quite a bit of "customer support". :)

However, even with my `--strict` argument, I was perhaps leaning toward adding a more API-friendly command line parser, that would basically only take arguments in the form `--flag=value`, anything else being considered a search term, and anything that is not a flag but appears before a single `--` being considered an error.

Regarding `--boolean` vs `--no-boolean`: it does solve the strictness problem, however it makes the life of script developers quite hard, as they now need a `case` or `if/then/else`. Therefore I would say that `--flag=value` is the best option, as it can be simply written as `--flag=${FLAG:-true}` in a shell script, or in Python for example `"--flag=%s" % _flag`.

Thinking even further upon this, perhaps an even simpler idea would be to provide a new command, for example `notmuch api`, that takes on `stdin` a JSON document with a specific format and does its job.

Ciprian.
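To illustrate why the uniform `--flag=value` form is convenient for scripts, here is a minimal Python sketch. The helper names `build_flags` and `build_command` are hypothetical, not part of notmuch or its bindings; the only assumption about the CLI is that options are accepted as `--name=value` and that `--` separates them from the search terms, as discussed in this thread.

```python
def build_flags(options):
    """Render every option as a single `--name=value` argument."""
    args = []
    for name, value in options.items():
        if isinstance(value, bool):
            # Booleans need no special casing, just a different value.
            value = "true" if value else "false"
        args.append("--%s=%s" % (name, value))
    return args


def build_command(subcommand, options, query_terms):
    """Assemble an argv list; `--` separates options from search terms."""
    return ["notmuch", subcommand] + build_flags(options) + ["--"] + list(query_terms)
```

With this shape there is no `if/then/else` per boolean flag, which is the point made above: `build_command("show", {"body": False}, ["tag:inbox"])` simply yields `--body=false`.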
Re: Inconsistencies in handling command flags: `--flag=value` different than `--flag value`
On Mon, Apr 27, 2020 at 9:21 PM Tomi Ollila wrote:
> > On Mon 2020-04-27 14:53:07 -0300, David Bremner wrote:
> >> Quoting notmuch(1)
> >>
> >>   OPTION SYNTAX
> >>   All options accepting an argument can be used with '=' or ':' as a separator. For the cases where it's not ambiguous (in particular excluding boolean options), a space can also be used.

I definitely skipped over that warning, mainly because I was reading the man pages for the specific commands (i.e. `notmuch-search`, etc.), which don't feature that warning.

Please note that I understand "why" I get this behavior, and I definitely agree that it's my fault. However my initial report was intended to find a way to keep new users from shooting themselves in the foot, especially since many will use `notmuch` from a script, and sometimes they don't thoroughly check the arguments passed by the user.

> > Alternately, we could deprecate using whitespace for all options, produce explicit warnings to stderr when whitespace appears on the next
>
> was it so, that originally we did not support whitespace, but David added that in some commit...

From a "correctness" point of view, this would be the best approach. However I think it could be too late to introduce it, and it would break too many integrations.

> > release, remove the suggestion to use a whitespace separator from the documentation, and eventually phase it out entirely in some future release.
>
> Alternatively we could check that next arg is (case-insensitively) (subset of) 'true', 'false', 'yes', 'no', '0', '1', 't', 'nil' (but not tpyoes of these ;) and in that case have that as an option value...

This would be perhaps the best approach. However I don't think it would solve the issues for integrators that would not see these warnings in the logs, until it is too late.

> ... would that work better for human user who just wants to be fluent on command line -- frontends can then always use = and option values...

Perhaps there could be an additional option (either on the command line or in the configuration) that would apply "strict" checking, not allowing any form other than `--argument=value` (including for the boolean flags), and failing loudly.

I think this third option would enable much safer integrations. (BTW, this "strict" option could also apply to the parsing of the search terms, which most of the time are under the control of the end user.)

Ciprian.
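Until something like a "strict" mode exists, a wrapper can apply the same check itself before calling `notmuch`. The sketch below is only an assumption of what such a check might look like: `check_strict` and its error messages are made up for illustration, and the option-name pattern is a guess rather than notmuch's actual grammar.

```python
import re

# Hypothetical pattern for a strict option: `--name=value`, nothing else.
_STRICT_OPTION = re.compile(r"--[a-z][a-z0-9-]*=.*", re.DOTALL)


def check_strict(args):
    """Split the arguments after the sub-command into (options, search terms).

    Raises ValueError unless every argument before `--` has the form
    `--flag=value`, so that `--body false` fails loudly instead of being
    silently treated as a flag plus a search term.
    """
    if "--" not in args:
        raise ValueError("missing `--` separator before the search terms")
    split = args.index("--")
    options, terms = args[:split], args[split + 1:]
    for option in options:
        if not _STRICT_OPTION.fullmatch(option):
            raise ValueError("not in `--flag=value` form: %r" % option)
    return options, terms
```

Everything after the first `--` is passed through untouched as search terms; validating those as well would need knowledge of the query syntax and is left out here.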
Inconsistencies in handling command flags: `--flag=value` different than `--flag value`
[Again sorry for double reporting. BTW, where should I search for previous bugs? I've currently tried the mailing list archive.]

Trying to play with `notmuch` from a wrapper, I've stumbled upon the following command line flags handling bug:

    notmuch show --format json --entire-thread true --body false -- 'cipr...@volution.ro'
    notmuch show --format json --entire-thread true --body=false -- 'cipr...@volution.ro'
    #=> yields nothing

    notmuch show --format json --entire-thread=true --body false -- 'cipr...@volution.ro'
    #=> yields some emails

    notmuch show --format json --entire-thread=true --body=false -- 'cipr...@volution.ro'
    #=> yields lots of emails

I would expect `--flag value` and `--flag=value` to be equivalent, at least for the options that the manual documents as `--flag=(true|false)`. However, based on the previous experiments, it seems that using anything except `--flag=value` yields inconsistent results.

Hope it helps,
Ciprian.
Strip spaces in `tags` in `~/.notmuch-config` (and other fields)
[Sorry if I'm double reporting this. I've tried my best to search for previous discussions.]

I've tried to manually edit my `~/.notmuch-config`, and I've seen that the field `tags` was written as `tags=unread;inbox;`. In order to increase readability I've decided to update my configuration file by adding spaces around `=` and `;`, as in `tags = unread ; inbox ;`. Everything worked without a warning, until it didn't... :)

What happened: all my emails are now tagged with `unread ` and ` inbox ` (i.e. whitespace in tags).

Given that `~/.notmuch-config` resembles an INI file, and given how lax the actual syntax is in general, I would suggest the following:

* allow whitespace around `[ section ]` and `field = value`;
* strip whitespace (left and right) from values like `tags = unread ; inbox ;`; (but not infix whitespace, as in `tag = some tag ; some other tag;`;)
* allow skipping the last `;` separator in `tags` and similar fields;

Failing that, perhaps add a warning when parsing the configuration file.

Hope it helps,
Ciprian.
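The stripping rules suggested above can be sketched in a few lines of Python. This is not notmuch's configuration parser, only an illustration of the proposed behaviour; `parse_list_value` and `parse_line` are hypothetical names.

```python
def parse_list_value(raw):
    """Split a `;`-separated value, stripping outer whitespace per item.

    Whitespace inside an item ("some tag") is kept, and an empty trailing
    item (from a final `;`, or from omitting it) is ignored.
    """
    items = [item.strip() for item in raw.split(";")]
    return [item for item in items if item]


def parse_line(line):
    """Parse one `field = value` line into (field, list of values)."""
    key, _, value = line.partition("=")
    return key.strip(), parse_list_value(value)
```

Under these rules `tags=unread;inbox;` and `tags = unread ; inbox` parse to the same two tags, which is the behaviour requested in the report.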
Alternative (raw) message store (i.e. instead of maildir)
On Tue, Aug 14, 2012 at 7:50 PM, Vladimir Marek wrote:
>> On the other hand I strongly support having a more optimized backend for emails, especially for such cases. For example a BerkeleyDB would perfectly fit such a use case, especially if we store the body and the headers in separate databases.
>>
>> Just a small experiment, below are the R `summary(emails)` of the sizes of my 700k emails:
>>
>>    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>       8    4364    5374   11510    7042    3109
>>
>> As seen 75% of the emails are below 7k, and this without any compression...
>>
>> Moreover we could organize the keys so that in a B-Tree structure the emails in the same thread are closer together...
>
> Now I'm not sure if you talk about some berkeley-db fuse filesystem or direct support in notmuch.

No tricks. :) I proposed (or rather, asked whether it would be possible, or at least wished) to have an internal interface (SPI) that any mail store would have to implement in order to be indexed and used by notmuch.

I guess the interface would be quite lightweight, and would need just the following:

* open the store;
* create a cursor iterating through all the emails, yielding only the keys;
* read the envelope (as a byte blob) of a particular key; (used only for displaying thread lists, etc.;)
* read the body (as a byte blob) of a particular key;
* maybe create a cursor iterating over all those emails that have changed since a particular timestamp;

> I don't have enough cycles to modify notmuch, so I started to look at simpler (codewise) solution ...
>
> To summarize, what I personally want from the mail storage

We need to make a distinction between current storage (like maildir) and archival storage (like the Zip or my proposal).

> - ability to read and write mails

It could be done through a small CLI over the proposed API.

> - should work with mutt (or mutt-kz)

This would eliminate any proposal not involving a FUSE wrapper...

> - simple backup to windows drive (files can't contain double colon ':')

This could be done via a dump-like facility. (BerkeleyDB supports this natively through a tool.)
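The interface listed above could look roughly like the following Python sketch. It is purely illustrative: notmuch has no such SPI, and the names `MailStore` and `InMemoryStore`, as well as the method names, are assumptions. The in-memory implementation exists only to show that the five operations are enough to drive an indexer.

```python
from abc import ABC, abstractmethod
from typing import Dict, Iterator, Tuple


class MailStore(ABC):
    """Hypothetical store interface; "open store" maps to the constructor."""

    @abstractmethod
    def keys(self) -> Iterator[str]:
        """Cursor over all emails, yielding only their keys."""

    @abstractmethod
    def read_envelope(self, key: str) -> bytes:
        """Headers of one email as a byte blob (enough for thread lists)."""

    @abstractmethod
    def read_body(self, key: str) -> bytes:
        """Body of one email as a byte blob."""

    @abstractmethod
    def changed_since(self, timestamp: float) -> Iterator[str]:
        """Cursor over the keys of emails changed after `timestamp`."""


class InMemoryStore(MailStore):
    """Toy implementation backed by a dict: key -> (mtime, envelope, body)."""

    def __init__(self) -> None:
        self._messages: Dict[str, Tuple[float, bytes, bytes]] = {}

    def put(self, key: str, mtime: float, envelope: bytes, body: bytes) -> None:
        self._messages[key] = (mtime, envelope, body)

    def keys(self) -> Iterator[str]:
        return iter(sorted(self._messages))

    def read_envelope(self, key: str) -> bytes:
        return self._messages[key][1]

    def read_body(self, key: str) -> bytes:
        return self._messages[key][2]

    def changed_since(self, timestamp: float) -> Iterator[str]:
        return (key for key in sorted(self._messages)
                if self._messages[key][0] > timestamp)
```

A maildir, a BerkeleyDB file or a Zip archive would each be one more subclass; keeping envelope and body as separate reads mirrors the idea of storing headers and bodies in separate databases.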
Alternative (raw) message store (i.e. instead of maildir)
On Tue, Aug 14, 2012 at 7:04 PM, Vladimir Marek wrote:
>> > - fuse zip stores all changes in memory until unmounted
>> > - fuse zip (and libzip for that matter) creates new temporary file when updating archive, which takes considerable time when the archive is very big.
>>
>> This isn't much of a hassle if you have maildir per time period and archive off. Maybe if you sync flags it may be...
>
> That might be interesting solution, maildir per time period.

Using a zip file through FUSE as a maildir store is not much better, in my opinion, because it still doesn't solve the syscall overhead. For example just going through the list of files to find those that changed requires the following syscalls:

* reading the next directory entry (which is amortized as it reads them in a batch, but the batch size is limited, should we say 1 syscall per 10 files?);
* stat-ing the file;

Now by adding FUSE we add an extra context switch for each syscall... This issue would be problematic only for reindexing, but still...

> But still fuse zip caches all the data until unmounted. So even with just reading it keeps growing (I hope I'm not accusing fuse zip here, but this is my understanding from the code). This could be simply alleviated by having it periodically unmounted and mounted again (perhaps from cron).

I think there is a FUSE mount option to specify whether the data should be cached by the kernel or not; as such this shouldn't be a problem for FUSE itself, unless the Zip FUSE handler does some extra caching.

>> > Of course this solution would have some disadvantages too, but for me the advantages would win. At the moment I'm not sure if I want to continue working on that. Maybe if there would be more interested guys
>>
>> I'm *really* tempted to investigate making this work for archived mail. Of course, the list of mounted file systems could get insane depending on granularity I guess...
>
> Well, if your granularity will be one archive per year of mail, it should not be that bad ...

On the other hand I strongly support having a more optimized backend for emails, especially for such cases. For example a BerkeleyDB would perfectly fit such a use case, especially if we store the body and the headers in separate databases.

Just a small experiment, below are the R `summary(emails)` of the sizes of my 700k emails:

       Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
          8    4364    5374   11510    7042    3109

As seen 75% of the emails are below 7k, and this without any compression...

Moreover we could organize the keys so that in a B-Tree structure the emails in the same thread are closer together...

Ciprian.
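The directory walk described above (read directory entries, then stat each file) can be sketched in Python as follows. This is only an illustration of the access pattern, not of how notmuch scans a maildir; `changed_since` is a hypothetical helper, and the remark about syscalls assumes a typical Linux system where `os.scandir` reads entries in batches and `DirEntry.stat` issues one stat-like call per file.

```python
import os


def changed_since(directory, timestamp):
    """Return the names of regular files modified after `timestamp`.

    Every file costs at least one stat-like call, regardless of how few
    files actually changed; that per-file cost is what adds up on a very
    large maildir.
    """
    changed = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            # One stat per file, cached on the entry after the first call.
            if entry.stat(follow_symlinks=False).st_mtime > timestamp:
                changed.append(entry.name)
    return sorted(changed)
```

A store that keeps its own "changed since" index could answer the same question without touching every entry, which is the motivation for the embedded-database idea in this thread.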
Alternative (raw) message store (i.e. instead of maildir)
On Sat, Aug 11, 2012 at 11:50 PM, Jameson Graef Rollins wrote:
> On Sat, Aug 11 2012, Ciprian Dorin Craciun wrote:
>> My problem with it is that it doesn't scale... And I don't mean this in a theoretical sense, I mean it in the concrete one: I have about 661k emails... And a single `notmuch new` takes a few tens of seconds...
>
> Hey, Ciprian. That sounds really slow, which makes me wonder if there are other things going on here.
> I have 155k messages, but notmuch new takes a fraction of a second for me. This initial indexing certainly takes a long time (hours potentially), but additions after that should be really fast. What version of notmuch are you using? What version of xapian?

I don't think there is anything wrong here... It just drags because of the file system...

So just to give complete info:

* hardware: Core i5, 8GiB RAM (7.5GiB of which is the FS cache), SSD (about 175MiB raw disk access);
* `notmuch --version`: 0.13 (built from sources on latest ArchLinux);
* `notmuch count`: 701820;
* `notmuch new` (after adding 5925 new emails, and touching others):

      Processed 7017 total files in 3m 19s (35 files/sec.).
      Added 6061 new messages to the database. Detected 1116 file renames.

* actually the entire thing took almost 5 minutes, but for the first two it didn't display anything, just accessing the disk;
* `notmuch new` (another go, but this time I've `time`-d it):

      No new mail.
      real 0m40.546s
      user 0m4.523s
      sys  0m17.506s

* `notmuch new` (yet another go, no change):

      No new mail.
      real 0m39.190s
      user 0m4.229s
      sys  0m17.697s

* just to `du` the maildir (there are also 40k other files in other maildirs not included in this count):

      8.7G .
      real 0m22.229s
      user 0m1.023s
      sys  0m7.890s

* on `new` no hooks are run;
* the file system in question is JFS;

As such I doubt the problem is with notmuch itself, and I guess it's the file system interaction...

Now I know I have a really obscure corner case, and I'm positively amazed at how well notmuch handles this situation. I just wondered if I could fix my problem by moving to an embedded DB, thus skipping all that syscall overhead...

Ciprian.
Alternative (raw) message store (i.e. instead of maildir)
On Sat, Aug 11, 2012 at 12:46 PM, Vladimir Marek wrote:
> Hi,
>
> I have objections against maildir too,

Just for the record I have nothing against maildir (or at least not when compared to the mbox format). On the contrary I find it quite easy to fiddle with...

My problem with it is that it doesn't scale... And I don't mean this in a theoretical sense, I mean it in the concrete one: I have about 661k emails... And a single `notmuch new` takes a few tens of seconds...

(Of course my problem could be partially solved by moving to a fanout maildir folder, i.e. multiple maildirs. But this doesn't solve the scalability, it just delays the problem...)

> but I tried to tackle it from different perspective. Store the maildir in zip file and use fuse-zip to manage it. It works sort of but it has two major disadvantages:

I also thought of using either FUSE or 9p for this. Unfortunately it doesn't quite solve my issue, as seen above...

Now about other hacks for my problem:

* I'm aware that I can feed notmuch with individual file paths to be indexed, but it still needs a path where to find an email;
* use the aforementioned fanout solution;
* others?

But regardless, having 600k emails on my disk (currently in the same folder) is insane... Moreover I would have loved to be able to use some Git plumbing as a store, or maybe CouchDB, etc...

Ciprian.
Alternative (raw) message store (i.e. instead of maildir)
Hello all!

My question -- rather a curiosity -- is if one could easily implement an alternative message store instead of maildir. (I actually have in mind a KV store like BerkeleyDB, or even a database like CouchDB...)

(I'm not also implying the same for the index, which I'm aware is based on Xapian, which requires BerkeleyDB, which in turn needs a local file system.)

After quickly looking over the code (2 minutes actually) I saw that currently this is not easily possible without touching a lot of files... Or am I wrong?

Better said: is such an abstract email store interface on the to-do list, or even acceptable to have if someone provides it?

Thanks,
Ciprian.
Exporting a single email as JSON
On Sun, Dec 11, 2011 at 01:19, Jameson Graef Rollins wrote:
> On Sun, 11 Dec 2011 00:46:51 +0200, Ciprian Dorin Craciun wrote:
>>     * in my use-case I would need each line of the output to be a standalone JSON object of an individual message; (thus I can script with Bash `notmuch ... | while read message ; do ... ; done`;)
>
> This is actually a slightly different idea than what I thought you were originally proposing. Outputting a series of json objects rather than a single list has been talked about for notmuch search as well. I don't have a good sense of whether this is a sensible idea or not.
>
>>     * maybe someone else would need that the output to contain **exactly one** such message (maybe the first);
>
> This is what I thought we were talking about. This is an option I would like to see, at least.

Indeed exporting multiple messages as top / root JSON objects isn't really usable outside of limited import / export use-cases, thus what you propose is more sensible.

And in the end, with this possibility I could implement the solution I'm seeking as simply as:

    notmuch search --output=messages -- {criteria} \
    | xargs -L 1 -- notmuch show --format=json -- \
    | while read message_json ; do ... ; done

But there is one problem with such an approach: efficiency. With the snippet above I'll have as many `notmuch` process executions as messages. (And I do have quite a few of them.) Thus although Notmuch is quite fast -- as in imperceptible to a human -- opening and closing the Xapian database so many times still has quite an overhead.

So in the end I think a discussion about the needed (/ wanted) use-cases would be better.

Ciprian.

P.S.: I could help implement (or at least prototype) some of these use-cases. Thus I'll watch the thread you've pointed me to.
Exporting a single email as JSON
On Sat, Dec 10, 2011 at 22:15, Jameson Graef Rollins wrote: > On Sat, 10 Dec 2011 20:32:22 +0200, Ciprian Dorin Craciun gmail.com> wrote: >> ? ? Quick question: why isn't it reasonable to export a **single** >> email in JSON format (by using the `show` sub-command)? (I mean I >> understand that in order to be able to correctly parse the output we >> need only one "object" (i.e. a list of threads, containing a list of >> emails, etc.). But there might be use cases in which we need a >> "twist".) > > Hi, Ciprian. ?I agree that it would be nice too have the ability to > output single messages without the rest of their thread. ?I have on > occasion wanted this functionality, but never enough to get around to > implementing it. ?It definitely wouldn't be that hard to implement, > though. > > The notmuch show function is actually going through a pretty major > overhaul at the moment. ?I bet as soon as that's done we can get some > sort of single-message output going. > > jamie. I've given a quick look into `notmuch-show.c` (commit from December 4) and indeed it seems quite trivial to add new formats. Thus I wonder: a) Is the code suitable for experimenting such a feature? (I mean is the "overhaul" almost done, or still in progress?) b) What would be the estimate for the "overhaul" completion? (To start prototyping such a feature...) c) Would someone else be interested in such a feature? (Or it's something so remote that only the two of us stumbled upon it?) I think it's quite hard to get this feature "right". I.e. I can see the following different -- but equally likely -- use-cases: * in my use-case I would need each line of the output to be a standalone JSON object of an individual message; (thus I can script with Bash `notmuch ... | while read message ; do ... 
; done`;) * maybe someone else would need that the output to contain **exactly one** such message (maybe the first); * and maybe for someone else the use case involves having no `--entire-thread` by default; * further more someone else would actually prefer a "flatten" list of messages (not the currently nested list); * or maybe the separator in the first use case should be `\0` instead of `\n`; Thanks, Ciprian. P.S.: I think all sub-commands that output line-feed separated records should also have the option to split them instead with `\0`. (I.e. `xargs` insists upon this I think, if not it separates by space or new-line.)
Exporting a single email as JSON
Hello all! Quick question: why isn't it reasonable to export a **single** email in JSON format (by using the `show` sub-command)? (I mean I understand that in order to be able to correctly parse the output we need only one "object" (i.e. a list of threads, containing a list of emails, etc.). But there might be use cases in which we need a "twist".) My current use case is: I want to import the JSON representation of my emails in CouchDB, each email in a single document. And as I already have my emails indexed with Notmuch, I hopped that -- with the help of some Bash-fu and Curl -- it would have been trivial to instruct notmuch to export all emails matching a certain criteria as JSON... What would have been perfect in this case: each matching email (with or without the `--entire-thread` flag) should be exported as a single JSON object on a single line, thus each different email on a single line. Thus I could have easily used `notmuch show --output=json-line -- {criteria} | xargs -L 1 -- curl {couchdb-magic}`.) For now I'll pre-process the current output in JavaScript. Thanks, Ciprian.
Re: Exporting a single email as JSON
On Sun, Dec 11, 2011 at 01:19, Jameson Graef Rollins wrote:
> On Sun, 11 Dec 2011 00:46:51 +0200, Ciprian Dorin Craciun wrote:
>> * in my use-case I would need each line of the output to be a
>> standalone JSON object of an individual message; (thus I can script
>> with Bash `notmuch ... | while read message ; do ... ; done`;)
>
> This is actually a slightly different idea than what I thought you were
> originally proposing. Outputting a series of json objects rather than a
> single list has been talked about for notmuch search as well. I'm don't
> have a good sense of whether this is a sensible idea or not.
>
>> * maybe someone else would need that the output to contain
>> **exactly one** such message (maybe the first);
>
> This is what I thought we were talking about. This is an option I would
> like to see, at least.

Indeed, exporting multiple messages as top / root JSON objects isn't quite usable except for limited import / export use-cases, thus what you propose is more sensible. And in the end, with this possibility I could implement the solution I'm seeking as simply as:

    notmuch search --output=messages -- {criteria} \
    | xargs -L 1 -- notmuch show --format=json -- \
    | while read message_json ; do ... ; done

But there is one problem with such an approach: efficiency. With the snippet above I'll have as many `notmuch` process executions as messages. (And I do have quite a few of them.) Thus although Notmuch is quite fast -- as in humanly imperceptible -- opening and closing the Xapian database so many times still has quite an overhead.

So in the end I think a discussion about the needed (/ wanted) use-cases would be better.

Ciprian.

P.S.: I could help implement (or at least prototype) some of these use-cases. Thus I'll watch over the thread you've pointed me to.
Re: Exporting a single email as JSON
On Sat, Dec 10, 2011 at 22:15, Jameson Graef Rollins wrote:
> On Sat, 10 Dec 2011 20:32:22 +0200, Ciprian Dorin Craciun wrote:
>> Quick question: why isn't it reasonable to export a **single**
>> email in JSON format (by using the `show` sub-command)? (I mean I
>> understand that in order to be able to correctly parse the output we
>> need only one "object" (i.e. a list of threads, containing a list of
>> emails, etc.). But there might be use cases in which we need a
>> "twist".)
>
> Hi, Ciprian. I agree that it would be nice too have the ability to
> output single messages without the rest of their thread. I have on
> occasion wanted this functionality, but never enough to get around to
> implementing it. It definitely wouldn't be that hard to implement,
> though.
>
> The notmuch show function is actually going through a pretty major
> overhaul at the moment. I bet as soon as that's done we can get some
> sort of single-message output going.
>
> jamie.

I've taken a quick look into `notmuch-show.c` (commit from December 4) and indeed it seems quite trivial to add new formats. Thus I wonder:

a) Is the code suitable for experimenting with such a feature? (I mean is the "overhaul" almost done, or still in progress?)
b) What would be the estimate for the "overhaul" completion? (To start prototyping such a feature...)
c) Would someone else be interested in such a feature? (Or is it something so remote that only the two of us stumbled upon it?)

I think it's quite hard to get this feature "right". I.e. I can see the following different -- but equally likely -- use-cases:

* in my use-case I would need each line of the output to be a standalone JSON object of an individual message; (thus I can script with Bash `notmuch ... | while read message ; do ... ; done`;)
* maybe someone else would need the output to contain **exactly one** such message (maybe the first);
* and maybe for someone else the use case involves having no `--entire-thread` by default;
* furthermore someone else would actually prefer a "flattened" list of messages (not the currently nested list);
* or maybe the separator in the first use case should be `\0` instead of `\n`;

Thanks, Ciprian.

P.S.: I think all sub-commands that output line-feed separated records should also have the option to separate them with `\0` instead. (I think `xargs` needs this to be safe: without `-0` it splits its input on spaces and new-lines.)
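The "one record per message" and "flattened list" use-cases can be approximated today by post-processing the existing output. The sketch below assumes that `notmuch show --format=json` emits nested lists in which every message is a JSON object and messages that did not match appear as `null`; the function names `iter_messages` and `to_records` are made up for this illustration.

```python
import json
from typing import Any, Iterator


def iter_messages(node: Any) -> Iterator[dict]:
    """Yield every message object found in the nested thread structure.

    Lists are treated as nesting and descended into; objects are treated
    as messages and yielded whole; anything else (e.g. null) is skipped.
    """
    if isinstance(node, dict):
        yield node
    elif isinstance(node, list):
        for child in node:
            yield from iter_messages(child)


def to_records(show_output: str, separator: str = "\n") -> str:
    """Re-encode the nested output as one JSON object per record."""
    threads = json.loads(show_output)
    # json.dumps never emits a raw newline or NUL, so both separators are safe.
    return "".join(json.dumps(msg) + separator for msg in iter_messages(threads))
```

Passing `separator="\0"` gives NUL-terminated records suitable for `xargs -0`. Note that this still loads the whole output into memory, so it does not help with very broad queries.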
Exporting a single email as JSON
Hello all!

Quick question: why isn't it reasonable to export a **single** email in JSON format (by using the `show` sub-command)? (I mean I understand that in order to be able to correctly parse the output we need only one "object" (i.e. a list of threads, containing a list of emails, etc.). But there might be use cases in which we need a "twist".)

My current use case is: I want to import the JSON representation of my emails into CouchDB, each email in a single document. And as I already have my emails indexed with Notmuch, I hoped that -- with the help of some Bash-fu and Curl -- it would be trivial to instruct notmuch to export all emails matching certain criteria as JSON...

What would have been perfect in this case: each matching email (with or without the `--entire-thread` flag) should be exported as a single JSON object on a single line, thus each different email on its own line. (Thus I could have easily used `notmuch show --output=json-line -- {criteria} | xargs -L 1 -- curl {couchdb-magic}`.)

For now I'll pre-process the current output in JavaScript.

Thanks, Ciprian.
RFC: notmuch powered (personal) (end-to-end) e-mail system
Hello all! (Sorry for the long email.)

I've been "struggling" for some time to get rid of the current "de-facto" email solutions (e.g. GMail, Zimbra), and I've passively observed the notmuch project and community for some time.

Although I've forwarded all my email to a single account, I'm currently mirroring my GMail account locally (by using `mbsync`) and indexing it with notmuch, and I collect spam mails for later filter training, unfortunately I'm unable to "convert" because the current notmuch-powered solutions have (some of) the following shortcomings (I don't want to offend anyone, so please take these as observations):

* the most full-featured UI is the Emacs one -- thus limited remote access (I mean from an arbitrary computer with only a web-browser); (and I'm not a very big fan of Emacs;)
* most are still dependent on external IMAP systems -- this is not a problem with notmuch itself, but with the integrating clients;
* SPAM -- as above -- is not integrated;
* filtering (tag applying) is not automatic (as in integrated in notmuch itself or the client), but triggered through external scripts;

As such I'm thinking of implementing a custom end-to-end email system and I would like to hear your feedback before embarking on such a task.

I'm targeting the following features:

* (inbound) SMTP integration, thus once an email is received it is automatically pushed through the system; (I'm primarily targeting those users who can afford to run their own SMTP server; but the solution could still be adapted for those that only want the other features;)
* automatic spam filtering, and tag applying;
* automatic email triggers based on tags (such as user notifications, forwarding, etc.);
* remote RPC-like access to the whole system;
* remote Web user interface;

About the overall architecture I'm thinking of adopting the following:

* in general the whole system is decomposed into independent components (long-lived OS daemons), each of which does one particular job (see below);
* all the components communicate with each other through a message queue system (for example ZeroMQ or RabbitMQ);
* all the communication is JSON based;

The components would be:

* SMTP inbound gateway -- for example I could take qmail or Postfix and replace the delivery agent with a custom process that pushes the email into the system; (any other solution suggestions?);
* email store -- as the name suggests it is a simple key-value-like store that should persist raw email messages; it should be as robust as possible, and its contents should be the only thing needed to reconstruct all the other derived data; (I could use here a simple process that maintains a maildir, I could also go with a BerkeleyDB wrapper, or even something more sophisticated;)
* spam filter -- which either classifies the email or trains the spam filter; (for example I would use bogofilter;)
* email index -- this is where notmuch would come into play; it would be fed with emails, to which it would automatically apply tags, and it would issue trigger notifications based on tags; it also maintains a set of filters and tags to automatically apply;
* (maybe) a coordinator that should delegate and monitor requests to the above components; but if I'm using RabbitMQ and carefully designing the above components, they could drive each other;
* RESTful web service that would intermediate access to all the above components;

For now I have the following uncertainties:

* how should I handle multiple users? I think each user should have its own store / notmuch / bogofilter instance (at least in terms of storage if not even in terms of separate daemon);
* should I keep the emails in a file system, or a key-value store? (the file system is more bug-free, but I'm confident that a BerkeleyDB instance would be more efficient);
* should I use libnotmuch or for starters just make a notmuch tool wrapper;
* and the most pressing one, transactions: I would like that at no point does a message get half processed or lost; as such I need notmuch to behave transactionally -- indexing the message and tagging it should be atomic and durable; (is there a way with libnotmuch to control the underlying BerkeleyDB database?)

Suggestions? Considerations?

Ciprian.
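The "independent components exchanging JSON over a queue" idea can be sketched in a few lines. Everything here is an assumption made for illustration: the envelope fields (`key`, `subject`, `tags`), the function names, and the use of the standard-library `queue.Queue` as a stand-in for ZeroMQ or RabbitMQ. The keyword check is only a placeholder for a real classifier such as bogofilter.

```python
import json
import queue
from typing import Callable


def spam_stage(envelope: dict) -> dict:
    """Placeholder spam filter: tags the message as 'spam' or 'inbox'."""
    tags = set(envelope["tags"])
    # A real component would hand the raw message to an actual classifier.
    tags.add("spam" if "lottery" in envelope["subject"].lower() else "inbox")
    return {**envelope, "tags": sorted(tags)}


def run_stage(inbox: queue.Queue, outbox: queue.Queue,
              handler: Callable[[dict], dict]) -> None:
    """Drain one queue, apply the handler, forward the JSON-encoded result."""
    while True:
        try:
            raw = inbox.get_nowait()
        except queue.Empty:
            return
        outbox.put(json.dumps(handler(json.loads(raw))))
```

Because every hop carries only JSON text, a stage could be moved to another process or host without changing its handler. The sketch says nothing about the transactional guarantees asked about above; with an in-process queue a crash simply loses whatever was in flight.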
Re: `notmuch setup` replaces `~/.notmuch-config` instead of truncating it
On Tue, Nov 16, 2010 at 22:42, Daniel Kahn Gillmor wrote:
> On 11/16/2010 03:37 PM, Daniel Kahn Gillmor wrote:
>> On 11/16/2010 03:26 PM, Ciprian Dorin, Craciun wrote:
>>> So in the light of the above quoted "glitches", my question is:
>>> due to the small chance of a power loss happening right when we write
>>> such a small file, doesn't the inconvenience weight more than the
>>> (fairly remote probable) file loss?
>>
>> What inconvenience? The inconvenience of writing the code correctly?
>
> Ah sorry -- on re-reading, i see you're probably referring to the
> inconvenience of breaking hardlinks, dropping permissions, ACLs and
> other metadata.

Yes, this is exactly what I'm referring to: the metadata that is not copied when a new file is created.

> That's an open question, as far as i'm concerned. it'd be nice if there
> was a way to "clone" a file's permissions and metadata to get the best
> of both worlds. maybe someone knows some tricks to do that?
>
> --dkg
Re: `notmuch setup` replaces `~/.notmuch-config` instead of truncating it
On Tue, Nov 16, 2010 at 22:37, Daniel Kahn Gillmor wrote:
> On 11/16/2010 03:26 PM, Ciprian Dorin, Craciun wrote:
>> P.S.: I say "pseudo" atomic because only the rename is atomic,
>> thus in order to override file `a` for the target file `b` which
>> exists, we must execute two **non-atomic** operations as a whole, but
>> each atomic in part, rename operations: make `b` -> `c`, and then
>> rename `a` -> `b`. So there is actually a small time-frame when I can
>> be left with two files (`a` and `c`), none of which is my config file
>> `b`. (This can be solved when opening the config file by checking if
>> there isn't any leftover `c` or `a` file, in which case I take the `a`
>> file and complete the rename.)
>
> There is only one ".notmuch-config" entry in the inode directory that is
> your homedir. it points either to the old file, or the new file. it
> cannot point to both, and it will not point to anything but those two
> possibilities. This is what the atomicity of the operation is expected
> to guarantee.
>
> --dkg

Actually I was wrong about this... I thought that the file was "overwritten" either by two `rename` calls, or by a `link`, followed by another `link`, and finally an `unlink` (this is what I tried to explain in my previous email). In fact I thought that the `rename` OS call couldn't overwrite a file if it exists. But after reading the man page of `rename(2)` -- quoted below -- I see I was indeed wrong to call the atomicity "pseudo".

    int rename(const char *oldpath, const char *newpath);
    ...
    If newpath already exists it will be atomically replaced (subject to a
    few conditions; see ERRORS below), so that there is no point at which
    another process attempting to access newpath will find it missing.
    ...
    If newpath exists but the operation fails for some reason rename()
    guarantees to leave an instance of newpath in place.

So indeed the behavior is completely atomic.

Ciprian.
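The write-then-rename pattern that relies on this guarantee can be sketched as follows. This is an illustration in Python, not the code notmuch or glib actually uses; the function name `atomic_write` is made up, and `os.replace` is used because it maps to `rename(2)` on POSIX systems.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Replace the contents of path so readers see either the old or the new file."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the target,
    # otherwise the final rename cannot be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp, path)  # atomically swaps the directory entry
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

The new file is created by `mkstemp` with fresh, restrictive permissions, which is exactly the metadata loss discussed in this thread: the directory entry is swapped atomically, but the inode behind it is a new one.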
Re: `notmuch setup` replaces `~/.notmuch-config` instead of truncating it
On Tue, Nov 16, 2010 at 21:09, Carl Worth wrote:
> On Tue, 16 Nov 2010 15:33:30 +0200, "Ciprian Dorin, Craciun" wrote:
>> So my question is: is this behaviour (of deleting the file and
>> creating a new one) deliberate? If not, could it be fixed (I could
>> provide a patch) to just update the file in place?
>
> Daniel gave the perfect answer later in the thread. It is intentional to
> replace the file with a new, complete version, (to avoid loss/corruption
> of the file if "notmuch setup" is interrupted). But we should fix this
> to replace the target of any symlinks.
>
> -Carl

I understand now the reason for your file replacement choice. (I'll look it over tomorrow to see if I can provide a patch along the lines Daniel has described.)

But -- in general, and totally overlooking the "pseudo" atomic effect obtained from POSIX file semantics -- doesn't this practice mislead some software (like backup systems, etc.) that might rely on the inode number as part of the identity of the file? Moreover, what if the user has set any ACLs or extended attributes on the file, wouldn't these be lost? (Wouldn't SELinux also be bothered?)

So after browsing through the source code, I've found in `notmuch-config.c`, inside the function `notmuch_config_save`, the call which actually overrides the file: `g_file_set_contents (config->filename, data, length, &error)`.

Now searching for the documentation of this function, I've stumbled upon the following page, from which I cite (I've never used glib before, so maybe the link is not the best one):

http://library.gnome.org/devel/glib/unstable/glib-File-Utilities.html

    On Unix, if filename already exists hard links to filename will break.
    Also since the file is recreated, existing permissions, access control
    lists, metadata etc. may be lost. If filename is a symbolic link, the
    link itself will be replaced, not the linked file.

So in the light of the above quoted "glitches", my question is: given the small chance of a power loss happening right when we write such a small file, doesn't the inconvenience weigh more than the (fairly improbable) file loss? (I must admit I once lost the `/etc/network/interfaces` file after an edit immediately followed by a quick cold reboot, but it was my fault as I had not sync-ed the file system.)

Ciprian.

P.S.: I say "pseudo" atomic because only the rename is atomic. Thus in order to replace the existing target file `b` with file `a`, we must execute two rename operations, each atomic on its own but **non-atomic** as a whole: rename `b` -> `c`, and then rename `a` -> `b`. So there is actually a small time-frame when I can be left with two files (`a` and `c`), neither of which is my config file `b`. (This can be solved when opening the config file by checking whether there is any leftover `c` or `a` file, in which case I take the `a` file and complete the rename.)
`notmuch setup` replaces `~/.notmuch-config` instead of truncating it
Hello all!

First, congratulations on the nice software! I can hardly wait for a notmuch native (i.e. libnotmuch) and curses client (like `ner`) to become more stable, so that I'll be able to ditch GMail. :)

But until then a small glitch... While upgrading from notmuch 0.4 to 0.5, I've re-run `notmuch config` as suggested in the release email. But in my particular case `~/.notmuch-config` is symlinked to an application configuration directory which is versioned. Thus I expected that when notmuch updates the config, it opens it for read-write, but with the truncation flag (which as a consequence would have modified the symlinked file). But instead it deleted the symlink, and replaced it with a newly created file (thus breaking my custom configuration backup system).

So my question is: is this behaviour (of deleting the file and creating a new one) deliberate? If not, could it be fixed (I could provide a patch) to just update the file in place?

Thanks, Ciprian.
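One way to keep the symlink while still replacing the file in a single step is to resolve the link first and rename over its target. The sketch below is an assumption about how such a fix could look, written in Python rather than against the notmuch sources; the function name `replace_through_symlink` is hypothetical. It also copies the permission bits of the old file, which covers only part of the metadata concerns raised in the thread.

```python
import os
import shutil
import tempfile


def replace_through_symlink(path: str, data: bytes) -> None:
    """Replace the file that path points to, leaving any symlink in place."""
    target = os.path.realpath(path)  # follow symlinks to the real file
    directory = os.path.dirname(target)
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
        if os.path.exists(target):
            # Carry over the permission bits of the file being replaced.
            shutil.copymode(target, tmp)
        os.replace(tmp, target)  # the symlink itself is never touched
    except BaseException:
        # Remove the temporary file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Hard links to the old file still break, and ownership, ACLs and extended attributes are not copied here, so this answers the symlink complaint but not the wider metadata question.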