Add support for `--limit=N` to `notmuch show`

2020-05-01 Thread Ciprian Dorin Craciun
In a previous email (about `thread:...` field in JSON output of
`notmuch show`), I described one of my use-cases for notmuch.

Now extending upon that, if one were to implement an email client
that provides the user with search, there are two approaches:

* use `notmuch search -- {query}` and based on that output display a
thread list like GMail does;

* use `notmuch show -- {query}` and based on that display a page with
all emails that matched, grouping them by thread;  (I prefer this
variant, as it gives me a quicker glance if I search for something
specific;)



Now the problem with `notmuch show` is that if I give it too "broad" a
query, like `*`, it will chew a lot of CPU and RAM (and in my case
eventually crash).

`notmuch search` does have a `--limit=N` argument that limits the
search output only to the first `N` items.  My feature request is to
add such a flag also to `notmuch show` that should:

* limit the number of threads in all cases except `--format=raw`;
* not be allowed in case of `--format=raw` or `--part=P`;



As a work-around I could use `notmuch search --output=threads
--limit={limit} -- {query}`, then take those thread IDs and issue a
`notmuch show -- thread:... thread:...`.  But this has the following
problems:
* it requires two `notmuch` CLI calls;
* and most importantly it renders the `--entire-thread=false` feature
useless;  (as `notmuch show` then matches every message in those
threads, not only the messages matched by the original query;)
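
The two-call work-around described above can be sketched in Python as
follows.  This is only an illustration: the helper names are
hypothetical, the commands are constructed but not executed, and it
assumes that `notmuch search --output=threads` prints one `thread:...`
term per line.

```python
def search_threads_argv(query, limit):
    """Build the first call: list at most `limit` matching thread IDs."""
    return ["notmuch", "search", "--output=threads",
            "--limit=%d" % limit, "--", query]


def parse_thread_ids(output):
    """Keep the non-empty `thread:...` lines (assumed output format)."""
    lines = (line.strip() for line in output.splitlines())
    return [line for line in lines if line.startswith("thread:")]


def show_threads_argv(thread_ids):
    """Build the second call: show the selected threads as JSON."""
    if not thread_ids:
        raise ValueError("no threads to show")
    # Joined with an explicit `or`, so the sketch does not rely on how
    # adjacent `thread:` terms are combined by default.
    return ["notmuch", "show", "--format=json", "--",
            " or ".join(thread_ids)]
```

Note that the original query no longer appears in the second call,
which is why `--entire-thread=false` stops being useful there.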

Thanks,
Ciprian.
___
notmuch mailing list
notmuch@notmuchmail.org
https://notmuchmail.org/mailman/listinfo/notmuch


Re: Add support for `thread` field in `notmuch show`

2020-05-01 Thread Ciprian Dorin Craciun
On Fri, May 1, 2020 at 3:09 PM David Bremner  wrote:
> Ciprian Dorin Craciun  writes:
> > I know that one can use `thread:{id:MESSAGE_ID}` to achieve the same
> > result, however:
> > * it is somewhat cumbersome for the integrator;
>
> Out of curiosity, what is harder about it? In both cases you have to
> extract one value from the JSON.


It is cumbersome because:
* firstly, you now need to run a new `notmuch search` instance for each
email to get that email's thread id;  this is very sub-optimal when
you have more than a handful of emails;
* secondly, it adds more code to the client;

To understand my use-case:  I currently intend to use `notmuch show
--format=json` to search my emails, and based on that to generate a
nice HTML page, displaying all found emails.  Now I want to include in
each email's section a link to only display the thread.

In order to do that, I have to generate (by using the technique
described above) the `thread:...` for each of those emails, which in
turn requires one CLI call per email.  (And 99% of the time perhaps I
don't even click the thread.)  (Another option would be to use the
`thread:{id:...}` for that link, but I find this quite a hack.)
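
For comparison, the `thread:{id:...}` alternative needs no extra CLI
call per email.  A minimal sketch, assuming a hypothetical
`/search?q=` URL layout in the generated HTML page and message IDs
that need no extra quoting inside the query:

```python
from urllib.parse import quote


def thread_query(message_id):
    """Return a query selecting the whole thread of one message."""
    return "thread:{id:%s}" % message_id


def thread_link(message_id, base="/search?q="):
    """Return an HTML link (hypothetical URL layout) to that thread."""
    href = base + quote(thread_query(message_id), safe="")
    return '<a href="%s">thread</a>' % href
```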



> > * having the thread identifier explicitly, could be used as a key in a
> > cache, or other internal lookups;
> >
> > In fact the only way one can extract the thread identifier via the
> > `notmuch` CLI is to use `notmuch search --output=threads --
> > id:MESSAGE_ID`
>
> Offhand I have no strong objection to someone (who is not me) adding
> this. I think it's important to be aware that thread id's are ephemeral,
> and subject to change e.g. if the database is re-built from
> scratch.

I understand that `thread:...` is tied to a particular database, but
that shouldn't be an issue, as people don't often regenerate their
databases, and the caches are usually short-lived.


This weekend I'll try to take a stab at adding this to `notmuch`.

Ciprian.


Re: Any updates on the `List-Id` indexing feature?

2020-04-29 Thread Ciprian Dorin Craciun
On Wed, Apr 29, 2020 at 8:08 PM David Bremner  wrote:
> > I've also read the FAQ:
> > * https://notmuchmail.org/faq/#index8h2
>
> Oops, that needs to be updated.
>
> It is implemented. See notmuch-config(1), under "index.header"


That's perfect.  However the `search-terms` man page doesn't say how
it should be used.

Should I gather (from the `config` manpage) that these "prefixes"
should always start with an upper-case letter, as in: `notmuch search
-- 'List:some-id'`?

Thanks,
Ciprian.


Any updates on the `List-Id` indexing feature?

2020-04-29 Thread Ciprian Dorin Craciun
I've searched the mailing list archives about the `List-Id` feature:
* https://www.mail-archive.com/notmuch@notmuchmail.org/msg43214.html
* https://www.mail-archive.com/notmuch@notmuchmail.org/msg22092.html
* https://www.mail-archive.com/notmuch@notmuchmail.org/msg14146.html

I've also read the FAQ:
* https://notmuchmail.org/faq/#index8h2

Although I understand why it's not implemented right now, given how
important it is to correctly handle emails from mailing lists, I
wanted to ask if there has been any progress in this regard.

Should I try to handle it myself in my own workflow, or is it on the
"roadmap"?  :)

Thanks,
Ciprian.


Add support for `thread` field in `notmuch show`

2020-04-29 Thread Ciprian Dorin Craciun
According to the `devel/schemata` the message object doesn't contain
the thread identifier to which it was assigned in the database.

Sometimes, for example in a UI that displays a search result at
message level, it would be useful to know the thread each message
belongs to, so the user can easily switch to the entire thread.

I know that one can use `thread:{id:MESSAGE_ID}` to achieve the same
result, however:
* it is somewhat cumbersome for the integrator;
* if the thread identifier were available explicitly, it could be used
as a key in a cache, or in other internal lookups;

In fact the only way one can extract the thread identifier via the
`notmuch` CLI is to use `notmuch search --output=threads --
id:MESSAGE_ID`

Ciprian.


Re: Inconsistencies in handling command flags: `--flag=value` different than `--flag value`

2020-04-29 Thread Ciprian Dorin Craciun
On Wed, Apr 29, 2020 at 6:39 PM David Bremner  wrote:
> I guess I'm a bit leery of removing UI features that presumably at least
> some people rely on. It's pretty upsetting to have software break one's
> muscle memory.


I think there are two completely different use-cases for the `notmuch` binary:
* a simple CLI to query the database, in which case the current flags seem OK;
* a "poor man's" API to query the database, more below;

(I know there already exists a `libnotmuch` API accessible in many
programming languages.  However for prototyping, and even for safety
and robustness, when performance isn't an issue, I find the tool-based
approach much more resilient.)

Now about the "API" use-case, I assume that at the moment many users
have already integrated `notmuch` as it is with the current flags and
behaviour.  Thus I agree that changing any flags in a backward
incompatible way would make a lot of people unhappy, and would generate
perhaps quite a bit of "customer support".  :)



However, even with my `--strict` argument, I was perhaps leaning
toward adding a more API-friendly command line parser, that would
basically only take arguments in the form `--flag=value`, anything
else being considered a search term, and anything not a flag but
before a single `--` should be considered an error.

Regarding the `--boolean` vs `--no-boolean` it does solve the
strictness problem, however it makes the life of script developers
quite hard, as they now need a `case` or `if/then/else`.  Therefore I
would say that `--flag=value` is the best option as it can be simply
written as `--flag=${FLAG:-true}` or in Python for example `"--flag=%s"
% _flag`.
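
As a minimal sketch of that Python variant (the helper names are
hypothetical; the flags are the `notmuch show` ones used elsewhere in
this thread), a wrapper can render every boolean in the unambiguous
`--flag=value` form without branching on the value:

```python
def bool_flag(name, value=True):
    """Render a boolean option in the unambiguous `--name=value` form."""
    return "--%s=%s" % (name, "true" if value else "false")


def show_argv(query, entire_thread=True, body=True):
    """Build a `notmuch show` command line without any `if/else`."""
    return ["notmuch", "show", "--format=json",
            bool_flag("entire-thread", entire_thread),
            bool_flag("body", body),
            "--", query]
```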



Thinking even further upon this, perhaps an even simpler idea would
be to provide a new command, like for example `notmuch api` that takes
on `stdin` a JSON with a specific format and does its job.


Ciprian.


Re: Inconsistencies in handling command flags: `--flag=value` different than `--flag value`

2020-04-27 Thread Ciprian Dorin Craciun
On Mon, Apr 27, 2020 at 9:21 PM Tomi Ollila  wrote:
> > On Mon 2020-04-27 14:53:07 -0300, David Bremner wrote:
> >> Quoting notmuch(1)
> >>
> >>OPTION SYNTAX
> >>All options accepting an argument can be used with '='
> >>or ':' as a separator. For the cases where it's not ambiguous
> >>(in particular excluding boolean options), a space can also be
> >>used.


I definitely skipped over that warning, mainly because I was reading
the man-pages for the specific commands (i.e. `notmuch-search`, etc.)
that don't feature that warning.

Please note that I understand "why" I get this behavior, and
I definitely agree that it's my fault.  However my initial report
was intended to find a way that new users don't shoot themselves in
the foot, especially since many will use `notmuch` from a script, and
sometimes they don't thoroughly check the arguments passed by the
user.



> > Alternately, we could deprecate using whitespace for all options,
> > produce explicit warnings to stderr when whitespace appears on the next
>
> was it so, that originally we did not support whitespace, but David
> added that in some commit...


From a "correctness" point of view, this would be the best approach.
However I think it could be too late to introduce it, and it would
break too many integrations.



> > release, remove the suggestion to use a whitespace separator from the
> > documentation, and eventually phase it out entirely in some future
> > release.
>
> Alternatively we could check that next arg is (case-insensitively)
> (subset of) 'true', 'false', 'yes', 'no', '0', '1', 't', 'nil'
> (but not tpyoes of these ;) and in that case have that as an option
> value...


This would be perhaps the best approach.  However I don't think it
would solve the issues for integrators that would not see these
warnings in the logs, until it is too late.



> ... would that work better for human user who just wants to be
> fluent on command line -- frontends can then always use = and option
> values...


Perhaps there could be an additional option (either on the command
line or in the configuration) that would apply "strict" checking,
rejecting any form other than `--argument=value` (including for the
boolean flags) and failing loudly.

I think this third option would enable much safer integrations.

(BTW, this "strict" option could also apply to the parsing of the
search terms, which most of the time are under the control of the end
user.)
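
Until something like that exists, an integrator could apply the same
rule in the wrapper itself.  A minimal sketch (the function name and
the exact flag pattern are assumptions of this example, not part of
`notmuch`):

```python
import re

_FLAG = re.compile(r"^--[a-z][a-z0-9-]*=.*$", re.DOTALL)


def check_strict(args):
    """Split `args` into (flags, search_terms), or raise ValueError.

    Everything before the first `--` must look like `--flag=value`;
    everything after it is treated as a search term.
    """
    if "--" not in args:
        raise ValueError("missing `--` separator")
    split = args.index("--")
    flags, terms = args[:split], args[split + 1:]
    for arg in flags:
        if not _FLAG.match(arg):
            raise ValueError("not in `--flag=value` form: %r" % arg)
    return flags, terms
```

With this check, `--body false` is rejected up front instead of
`false` silently ending up somewhere unexpected.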

Ciprian.


Inconsistencies in handling command flags: `--flag=value` different than `--flag value`

2020-04-24 Thread Ciprian Dorin Craciun
[Again sorry for double reporting.  BTW, where should I search for
previous bugs?  I've currently tried the mailing list archive.]


Trying to play with `notmuch` from a wrapper, I've stumbled upon the
following command line flags handling bug:


notmuch show --format json --entire-thread true --body false -- 'cipr...@volution.ro'
notmuch show --format json --entire-thread true --body=false -- 'cipr...@volution.ro'
#=> yields nothing

notmuch show --format json --entire-thread=true --body false -- 'cipr...@volution.ro'
#=> yields some emails

notmuch show --format json --entire-thread=true --body=false -- 'cipr...@volution.ro'
#=> yields lots of emails



I would expect that `--flag value` and `--flag=value` are equivalent,
at least for the options for which the manual states
`--flag=(true|false)`.

However based on the previous experiments it seems that using anything
except `--flag=value` yields inconsistent results.

Hope it helps,
Ciprian.


Strip spaces in `tags` in `~/.notmuch-config` (and other fields)

2020-04-24 Thread Ciprian Dorin Craciun
[Sorry if I'm double reporting this.  I've tried my best to search for
previous discussions.]


I've tried to manually edit my `~/.notmuch-config`, and I've seen that
the field `tags` was written as `tags=unread;inbox;`.  In order to
increase readability I've decided to update my configuration file by
adding spaces around `=` and `;` as in `tags = unread ; inbox ;`.
Everything worked without a warning, until it didn't...  :)

What happened:  all my emails are now tagged with `unread ` and
` inbox `;  (i.e. whitespace in tags).


Given that the `~/.notmuch-config` resembles an INI file, and given
how lax the actual syntax is in general, I would suggest the
following:

* allow white-spaces around `[ section ]`, and `field = value`;
* strip white-spaces (left and right) from values like `tags = unread
; inbox ;`;  (but not infix like `tag = some tag ; some other tag;`;)
* allow skipping the last `;` separator from `tags` and similar;

Failing that, perhaps add a warning when parsing the configuration file.
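
The suggested handling of `;`-separated values can be sketched like
this (a stand-alone illustration with a hypothetical function name,
not how `notmuch` currently parses its configuration):

```python
def parse_list_value(raw):
    """Split a `;`-separated value, trimming each item.

    Leading/trailing whitespace is stripped, infix whitespace is kept,
    and empty items (e.g. from a trailing `;`) are dropped.
    """
    items = (item.strip() for item in raw.split(";"))
    return [item for item in items if item]
```

Under these rules ` unread ; inbox ;` and `unread;inbox` give the same
two tags, while `some tag ; some other tag` keeps its inner spaces.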


Hope it helps,
Ciprian.


Alternative (raw) message store (i.e. instead of maildir)

2012-08-14 Thread Ciprian Dorin Craciun
On Tue, Aug 14, 2012 at 7:50 PM, Vladimir Marek
 wrote:
>> On the other hand I strongly support having a more optimized
>> backend for emails, especially for such cases. For example a
>> BerkeleyDB would perfectly fit such a use case, especially if we store
>> the body and the headers in separate databases.
>>
>> Just a small experiment, below are the R `summary(emails)` of the
>> sizes of my 700k emails:
>> 
>>     Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
>>        8    4364    5374   11510    7042    3109
>> 
>>
>> As seen 75% of the emails are below 7k, and this without any 
>> compression...
>>
>> Moreover we could organize the keys so that in a B-Tree structure
>> the emails in the same thread are closer together...
>
> Now I'm not sure if you talk about some berkeley-db fuse filesystem or
> direct support in notmuch.

No tricks. :)

I proposed (or, better said, asked whether it would be possible, or at
least wished) to have an internal interface (SPI) that any mail store
would have to implement in order to be indexed and used by notmuch. I
guess the interface would be quite lightweight, and would need just the
following:
* open store;
* create a cursor iterating through all the emails, yielding only the keys;
* read the envelope (as a byte blob) of a particular key; (used
only for displaying thread lists, etc.;)
* read the body (as a byte blob) of a particular key;
* maybe create a cursor iterating over all those emails that have
changed since a particular timestamp;
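
The operations listed above could look roughly like this.  It is only a
sketch of the idea: the class and method names are hypothetical,
"open store" is represented by the constructor, and the in-memory
implementation exists just to make the example complete.

```python
from abc import ABC, abstractmethod


class MailStore(ABC):
    """Hypothetical store interface mirroring the operations listed above."""

    @abstractmethod
    def keys(self):
        """Iterate over the keys of all emails."""

    @abstractmethod
    def envelope(self, key):
        """Return the envelope (headers) of `key` as bytes."""

    @abstractmethod
    def body(self, key):
        """Return the body of `key` as bytes."""

    @abstractmethod
    def changed_since(self, timestamp):
        """Iterate over keys changed after `timestamp`."""


class MemoryStore(MailStore):
    """Toy in-memory implementation, for illustration only."""

    def __init__(self):
        self._items = {}  # key -> (envelope, body, mtime)

    def put(self, key, envelope, body, mtime):
        self._items[key] = (envelope, body, mtime)

    def keys(self):
        return iter(sorted(self._items))

    def envelope(self, key):
        return self._items[key][0]

    def body(self, key):
        return self._items[key][1]

    def changed_since(self, timestamp):
        return (k for k in sorted(self._items)
                if self._items[k][2] > timestamp)
```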


> I don't have enough cycles to modify notmuch,
> so I started to look at simpler (codewise) solution ...
>
> To summarize, what I personally want from the mail storage

We need to make a distinction between current storage (like
maildir) and archival storage (like the Zip or my proposal).


> - ability to read and write mails

It could be done through a small CLI over the proposed API.

> - should work with mutt (or mutt-kz)

This would eliminate any proposal not involving a FUSE wrapper...

> - simple backup to windows drive (files can't contain double colon ':')

This could be done via a dump-like facility. (BerkeleyDB supports
this natively through a tool.)


Alternative (raw) message store (i.e. instead of maildir)

2012-08-14 Thread Ciprian Dorin Craciun
On Tue, Aug 14, 2012 at 7:04 PM, Vladimir Marek
 wrote:
>> >  - fuse zip stores all changes in memory until unmounted
>> >  - fuse zip (and libzip for that matter) creates new temporary file when
>> >updating archive, which takes considerable time when the archive is
>> >very big.
>>
>> This isn't much of a hassle if you have maildir per time period and
>> archive off. Maybe if you sync flags it may be...
>
> That might be interesting solution, maildir per time period.


Using a zip file through FUSE as a maildir store is not
much better, in my opinion.

This is because it still doesn't solve the syscall overhead. For
example just going through the list of files to find those that
changed requires the following syscalls:
* reading the next directory entry (which is amortized as it reads
them in a batch, but the batch size is limited, should we say 1
syscall per 10 files?);
* stat-ing the file;

Now by adding FUSE we add an extra context switch for each syscall...

Granted, this issue would be problematic only for reindexing, but still...
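
To put a rough number on that, here is a back-of-the-envelope sketch.
The batch size of 10 files per directory read is the guess made above,
and the function name is hypothetical:

```python
def scan_syscalls(files, batch=10):
    """Estimate syscalls for one scan: batched directory reads + one stat each."""
    directory_reads = -(-files // batch)  # ceiling division
    return directory_reads + files
```

For 700k files this gives 770000 syscalls per scan, each of which gains
an extra context switch once FUSE is involved.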


> But still
> fuse zip caches all the data until unmounted. So even with just reading
> it keeps growing (I hope I'm not accusing fuse zip here, but this is my
> understanding from the code). This could be simply alleviated by having
> it periodically unmounted and mounted again (perhaps from cron).

I think there is an option for FUSE mount to specify if the data
should be cached by the kernel or not; as such this shouldn't be a
problem for FUSE itself, except if the Zip FUSE handler does some
extra caching.


>> > Of course this solution would have some disadvantages too, but for me
>> > the advantages would win. At the moment I'm not sure if I want to
>> > continue working on that. Maybe if there would be more interested guys
>>
>> I'm *really* tempted to investigate making this work for archived
>> mail. Of course, the list of mounted file systems could get insane
>> depending on granularity I guess...
>
> Well, if your granularity will be one archive per year of mail, it
> should not be that bad ...


On the other hand I strongly support having a more optimized
backend for emails, especially for such cases. For example a
BerkeleyDB would perfectly fit such a use case, especially if we store
the body and the headers in separate databases.

Just a small experiment, below are the R `summary(emails)` of the
sizes of my 700k emails:

    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
       8    4364    5374   11510    7042    3109


As seen 75% of the emails are below 7k, and this without any compression...

Moreover we could organize the keys so that in a B-Tree structure
the emails in the same thread are closer together...
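
One way to get that locality is to put the thread first in the key.  A
minimal sketch, assuming numeric thread identifiers and timestamps (the
key layout and the function name are assumptions of this example):

```python
import struct


def make_key(thread_id, timestamp, message_id):
    """Build a key that sorts by thread, then date, then message id.

    The layout (fixed-width big-endian integers followed by the id)
    is an assumption of this sketch.
    """
    return struct.pack(">QQ", thread_id, timestamp) + message_id.encode("utf-8")
```

Because the integers are fixed-width and big-endian, plain byte-wise
ordering (what a B-Tree store typically uses by default) keeps all
messages of one thread adjacent and in date order.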

Ciprian.


Alternative (raw) message store (i.e. instead of maildir)

2012-08-13 Thread Ciprian Dorin Craciun
On Sat, Aug 11, 2012 at 11:50 PM, Jameson Graef Rollins
 wrote:
> On Sat, Aug 11 2012, Ciprian Dorin Craciun wrote:
>> My problem with it is that it doesn't scale... And I don't mean
>> this in a theoretical sense, I mean it in the concrete one: I have
>> about 661k emails... And a single `notmuch new` takes a few tens of
>> seconds...
>
> Hey, Ciprian.  That sounds really slow, which makes me wonder if there
> are other things going on here.
> I have 155k messages, but notmuch new
> takes a fraction of a second for me.  This initial indexing certainly
> takes a long time (hours potentially), but additions after that should
> be really fast.  What version of notmuch are you using?  What version of
> xapian?


I don't think there is anything wrong here... It just drags
because of the file system...

So just to give complete information:
* hardware: Core i5, 8GiB RAM (7.5GiB of which is the FS cache),
SSD (about 175MiB raw disk access);
* `notmuch --version`: 0.13 (built from sources on latest ArchLinux);
* `notmuch count`: 701820;
* `notmuch new` (after adding 5925 new emails, and touching others):

Processed 7017 total files in 3m 19s (35 files/sec.).
Added 6061 new messages to the database. Detected 1116 file renames.

* actually the entire thing took almost 5 minutes, but for the first
two it didn't display anything, just accessing the disk;
* `notmuch new` (another go, but this time I've `time`-d it):

No new mail.
real    0m40.546s
user    0m4.523s
sys     0m17.506s

* `notmuch new` (yet another go, no change):

No new mail.
real    0m39.190s
user    0m4.229s
sys     0m17.697s

* just to `du` the maildir (there are also 40k other files in
other maildirs not included in this count):

8.7G    ..
real    0m22.229s
user    0m1.023s
sys     0m7.890s

* on `new` no hooks are run;
* the file system in question is JFS;


As such I doubt the problem is with notmuch itself, and I guess
it's the file system interaction...

Now I know I have a really obscure corner case, and I'm positively
amazed at how well notmuch handles this situation. I just wondered if
I could have fixed my problem by moving to an embedded DB, thus
skipping all that syscall overhead...

Ciprian.


Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Ciprian Dorin Craciun
On Sat, Aug 11, 2012 at 12:46 PM, Vladimir Marek
 wrote:
> Hi,
>
> I have objections against maildir too,

Just for the record I have nothing against maildir (or at least
when compared to mbox format). On the contrary I find it quite easy to
fiddle with...

My problem with it is that it doesn't scale... And I don't mean
this in a theoretical sense, I mean it in the concrete one: I have
about 661k emails... And a single `notmuch new` takes a few tens of
seconds...

(Of course my problem could be partially solved by moving to a
fanout maildir folder, i.e. multiple maildirs. But this doesn't solve
the scalability issue, it just delays the problem...)
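
By "fanout" I mean something along these lines: spreading the files
over sub-folders chosen by a hash of the file name.  A minimal sketch
(the function name, the use of SHA-1 and the two-hex-digit levels are
assumptions of this example; a real layout would also need the
`cur`/`new`/`tmp` sub-folders of each maildir):

```python
import hashlib
import os.path


def fanout_path(root, name, levels=1):
    """Return `root/<hash prefix>/.../name`, using 2 hex digits per level."""
    digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
    parts = [digest[2 * i:2 * i + 2] for i in range(levels)]
    return os.path.join(root, *parts, name)
```

With one level, 600k files end up in 256 folders of roughly 2.3k files
each, which is smaller per folder but the total keeps growing all the
same.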


> but I tried to tackle it from
> different perspective. Store the maildir in zip file and use fuse-zip to
> manage it. It works sort of but it has two major disadvantages:

I also thought of using either FUSE or 9p for this. Unfortunately
it doesn't quite solve my issue as seen above...


Now about other hacks to my problem:
* I'm aware that I can feed notmuch with individual file paths to
be indexed, but it still needs a path where to find an email;
* use the aforementioned fanout solution;
* others?

But regardless, having 600k emails on my disk (currently in the
same folder) is insane... Moreover I would have loved to be able to
use some Git plumbing as a store, or maybe CouchDB, etc...

Ciprian.


Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Ciprian Dorin Craciun
Hello all!

My question -- rather a curiosity -- is if one could easily
implement an alternative message store instead of maildir. (I actually
have in mind a KV store like BerkeleyDB, or even a database like
CouchDB...) (I'm not implying the same for the index, which I'm
aware is based on Xapian, which in turn needs a local file system.)

After quickly looking over the code (2 minutes actually) I saw
that currently this is not easily possible without touching a lot of
files... Or am I wrong?

Better said: is such an abstract email store interface on the
to-do list, or even acceptable to have if someone provides it?

Thanks,
Ciprian.


Re: Alternative (raw) message store (i.e. instead of maildir)

2012-08-11 Thread Ciprian Dorin Craciun
On Sat, Aug 11, 2012 at 12:46 PM, Vladimir Marek
 wrote:
> Hi,
>
> I have objections against maildir too,

Just for the record I have nothing against maildir (or at least
when compared to mbox format). On the contrary I find it quite easy to
fiddle with...

My problem with it is that it doesn't scale... And I don't mean
this in a theoretical sense, I mean it in the concrete one: I have
about 661k emails... And a single `notmuch sync` takes a few tens of
seconds...

(Of course my problem could be partially solved by moving to a
fanout maildir folder, i.e. multiple maildirs. But this doesn't solve
the scalability it just delays the problem...)


> but I tried to tackle it from
> different perspective. Store the maildir in zip file and use fuse-zip to
> manage it. It works sort of but it has two major disadvantages:

I also thought of using either FUSE or 9p for this. Unfortunately
it doesn't quite solve my issue as seen above...


Now about other hacks to my problem:
* I'm aware that I can feed notmuch with individual file paths to
be indexed, but it still needs a path where to find an email;
* use the before mentioned fanout solution;
* others?

But regardless, having 600k emails on my disk (currently in the
same folder) is insane... Moreover I would have loved to be able to
use some Git plumbing as a store, or maybe CouchDB, etc...

Ciprian.


Exporting a single email as JSON

2011-12-11 Thread Ciprian Dorin Craciun
On Sun, Dec 11, 2011 at 01:19, Jameson Graef Rollins
 wrote:
> On Sun, 11 Dec 2011 00:46:51 +0200, Ciprian Dorin Craciun wrote:
>>     * in my use-case I would need each line of the output to be a
>> standalone JSON object of an individual message; (thus I can script
>> with Bash `notmuch ... | while read message ; do ... ; done`;)
>
> This is actually a slightly different idea than what I thought you were
> originally proposing.  Outputting a series of json objects rather than a
> single list has been talked about for notmuch search as well.  I don't
> have a good sense of whether this is a sensible idea or not.
>
>>     * maybe someone else would need the output to contain
>> **exactly one** such message (maybe the first);
>
> This is what I thought we were talking about.  This is an option I would
> like to see, at least.


Indeed exporting multiple messages as top / root JSON objects
isn't really usable outside limited import / export use-cases, thus what
you propose is more sensible. And in the end, with this
possibility I could implement the solution I'm seeking as
simply as:

    notmuch search --output=messages -- {criteria} \
    | xargs -L 1 -- notmuch show --format=json -- \
    | while read message_json ; do ... ; done


But there is one problem with such an approach: efficiency.
With the snippet above I'll have as many `notmuch` process executions
as messages. (And I do have quite a few of them.) Thus although
Notmuch is quite fast -- as in imperceptible to a human -- opening
and closing the Xapian database so many times still adds quite an
overhead.
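
One way to reduce the number of process launches would be to group message
IDs and query for several of them at once. The sketch below only builds the
argument lists; `batched_show_commands` is a hypothetical helper, it assumes
bare message IDs as input (without the `id:` prefix), and it does not handle
IDs that would need quoting inside a query.

```python
from typing import Iterable, Iterator, List


def batched_show_commands(message_ids: Iterable[str],
                          batch_size: int = 100) -> Iterator[List[str]]:
    """Yield one `notmuch show` argv per batch of up to batch_size message IDs."""
    batch: List[str] = []
    for message_id in message_ids:
        batch.append("id:" + message_id)
        if len(batch) == batch_size:
            # Join the terms with "or" so a single invocation covers the batch.
            yield ["notmuch", "show", "--format=json", "--", " or ".join(batch)]
            batch = []
    if batch:  # emit the final, possibly shorter batch
        yield ["notmuch", "show", "--format=json", "--", " or ".join(batch)]
```

Each yielded list could be passed to `subprocess.run`. The output of every
invocation is then a nested list covering the whole batch, so it still has
to be split into individual messages afterwards.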

So in the end I think a discussion about the needed (/ wanted)
use-cases would be better.

Ciprian.

P.S.: I could help implement (or at least prototype) some of these
use-cases. Thus I'll watch over the thread you've pointed me to.


Exporting a single email as JSON

2011-12-11 Thread Ciprian Dorin Craciun
On Sat, Dec 10, 2011 at 22:15, Jameson Graef Rollins
 wrote:
> On Sat, 10 Dec 2011 20:32:22 +0200, Ciprian Dorin Craciun wrote:
>>     Quick question: why isn't it reasonable to export a **single**
>> email in JSON format (by using the `show` sub-command)? (I mean I
>> understand that in order to be able to correctly parse the output we
>> need only one "object" (i.e. a list of threads, containing a list of
>> emails, etc.). But there might be use cases in which we need a
>> "twist".)
>
> Hi, Ciprian.  I agree that it would be nice to have the ability to
> output single messages without the rest of their thread.  I have on
> occasion wanted this functionality, but never enough to get around to
> implementing it.  It definitely wouldn't be that hard to implement,
> though.
>
> The notmuch show function is actually going through a pretty major
> overhaul at the moment.  I bet as soon as that's done we can get some
> sort of single-message output going.
>
> jamie.
>
> jamie.


I've had a quick look at `notmuch-show.c` (commit from
December 4) and indeed it seems quite trivial to add new formats.

Thus I wonder:
a) Is the code suitable for experimenting with such a feature? (I mean
is the "overhaul" almost done, or still in progress?)
b) What would be the estimate for the "overhaul" completion? (To
start prototyping such a feature...)
c) Would someone else be interested in such a feature? (Or is it
something so remote that only the two of us stumbled upon it?)

I think it's quite hard to get this feature "right". I.e. I can
see the following different -- but equally likely -- use-cases:
* in my use-case I would need each line of the output to be a
standalone JSON object of an individual message; (thus I can script
with Bash `notmuch ... | while read message ; do ... ; done`;)
* maybe someone else would need the output to contain
**exactly one** such message (maybe the first);
* and maybe for someone else the use case involves having no
`--entire-thread` by default;
* furthermore someone else would actually prefer a "flattened" list
of messages (not the currently nested list);
* or maybe the separator in the first use case should be `\0`
instead of `\n`;
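
Several of these variants can be emulated outside notmuch by post-processing
the nested output. The sketch below assumes the layout described in this
thread (a list of threads, each a list of nodes), and additionally assumes
that every node is a `[message, [child nodes]]` pair whose message may be
null when it did not match. The helper names are made up for this example.

```python
import json
from typing import Any, Dict, Iterator, List


def iter_messages(thread_set: List[Any]) -> Iterator[Dict[str, Any]]:
    """Walk the nested structure depth-first and yield each message object."""
    def walk(nodes: List[Any]) -> Iterator[Dict[str, Any]]:
        for message, replies in nodes:
            if message is not None:  # assumed: unmatched messages show up as null
                yield message
            yield from walk(replies)

    for thread in thread_set:
        yield from walk(thread)


def to_records(thread_set: List[Any], separator: str = "\n") -> str:
    """Serialize every message as one compact JSON object plus a separator."""
    # json.dumps escapes embedded newlines, so each record stays on one line.
    return "".join(json.dumps(m, sort_keys=True) + separator
                   for m in iter_messages(thread_set))
```

Passing `separator="\0"` gives the NUL-separated variant, and taking only the
first item of `iter_messages` covers the "exactly one message" case.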

Thanks,
Ciprian.

P.S.: I think all sub-commands that output line-feed separated
records should also have the option to separate them with `\0` instead.
(`xargs`, for instance, splits its input on blanks and new-lines unless
it is told to expect `\0` separators.)


Exporting a single email as JSON

2011-12-10 Thread Ciprian Dorin Craciun
Hello all!

Quick question: why isn't it reasonable to export a **single**
email in JSON format (by using the `show` sub-command)? (I mean I
understand that in order to be able to correctly parse the output we
need only one "object" (i.e. a list of threads, containing a list of
emails, etc.). But there might be use cases in which we need a
"twist".)

My current use case is: I want to import the JSON representation
of my emails into CouchDB, each email as a single document. And as I
already have my emails indexed with Notmuch, I hoped that -- with the
help of some Bash-fu and Curl -- it would be trivial to
instruct notmuch to export all emails matching certain criteria as
JSON...

What would have been perfect in this case: each matching email
(with or without the `--entire-thread` flag) should be exported as a
single JSON object on a single line, thus each different email on its
own line. Thus I could have easily used `notmuch show
--output=json-line -- {criteria} | xargs -L 1 -- curl
{couchdb-magic}`.
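
As an alternative to one HTTP request per email, the messages could be sent
in batches. The sketch below only builds a request body of the shape
CouchDB's `_bulk_docs` endpoint accepts (`{"docs": [...]}`); using the
message's `id` field as the document `_id` is an assumption made for this
example, and `bulk_docs_payload` is a hypothetical helper.

```python
import json
from typing import Any, Dict, Iterable


def bulk_docs_payload(messages: Iterable[Dict[str, Any]]) -> str:
    """Build a JSON body holding one CouchDB document per message."""
    docs = []
    for message in messages:
        doc = dict(message)         # shallow copy, the input stays untouched
        doc["_id"] = message["id"]  # assumed: "id" holds the Message-ID
        docs.append(doc)
    return json.dumps({"docs": docs})
```

The resulting string could then be POSTed with Curl or any HTTP client;
authentication, revisions and conflict handling are left out here.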

For now I'll pre-process the current output in JavaScript.

Thanks,
Ciprian.


RFC: notmuch powered (personal) (end-to-end) e-mail system

2011-03-20 Thread Ciprian Dorin Craciun
Hello all! (Sorry for the long email.)

I've been "struggling" for some time to get rid of the current
"de-facto" email solutions (i.e. GMail, Zimbra), and I've passively
observed the notmuch project and community for a while.

Although I've forwarded all my email to a single account, and I'm
currently mirroring my GMail account locally (by using `mbsync`),
indexing it with notmuch, and collecting spam mails for later filter
training, unfortunately I'm unable to "convert" because the current
notmuch-powered solutions have (some of) the following shortcomings (I
don't want to offend anyone, so please take these as observations):
* the most full-featured UI is the Emacs one -- thus limited remote
access (I mean from an arbitrary computer with only a web-browser);
(and I'm not a very big fan of Emacs;)
* most are still dependent on external IMAP systems -- this is not
a problem with notmuch itself, but with the integrating clients;
* SPAM filtering -- as above -- is not integrated;
* filtering (tag applying) is not automatic (as in integrated in
notmuch itself or the client), but triggered through external scripts;

As such I'm thinking of implementing a custom end-to-end email
system and I would like to hear your feedback before embarking on such
a task.

I'm targeting the following features:
* (inbound) SMTP integration, thus once an email is received it is
automatically pushed through the system; (I'm primarily targeting
those users that can afford to run their own SMTP server; but the
solution could still be adapted for those that only want the other
features;)
* automatic spam filtering, and tag applying;
* automatic email triggers based on tags (such as user
notifications, forwarding, etc.);
* remote RPC-like access to the whole system;
* remote Web user interface;

About the overall architecture I'm thinking of adopting the following:
* in general the whole system is decomposed into independent
components (long-lived OS daemons), each of which does a particular job
(see below);
* all the components communicate with each other through a
message queue system (for example ZeroMQ or RabbitMQ);
* all the communication is JSON based;

The components would be:
* SMTP inbound gateway -- for example I could take qmail or
Postfix and replace the delivery agent with a custom process that
pushes the email into the system; (any other solution suggestions?);
* email store -- as the name suggests it is a simple
key-value-like store that should persist raw email-messages; it should
be as robust as possible, and its contents should be the only thing
needed to reconstruct all the other derived data; (I could use here a
simple process that maintains a maildir, I could go also with a
BerkeleyDB wrapper, or even something more sophisticated;)
* spam filter -- which either classifies the email or trains the
spam filter; (for example I would use bogofilter;)
* email index -- this is where notmuch would come into play; it
would be fed with emails, to which it would automatically apply tags,
and it would issue trigger notifications based on those tags; it also
maintains the set of filters and tags to apply automatically;
* (maybe) a coordinator that should delegate and monitor requests
to the above components; but if I'm using RabbitMQ and carefully
designing the above components, they could drive each other;
* restful web service that would intermediate access to all the
above components;
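
To give a feel for how such components could hand JSON messages to each
other, here is a toy, single-process sketch. It uses Python's in-memory
`queue.Queue` in place of ZeroMQ or RabbitMQ, a keyword check in place of
bogofilter, and a plain list in place of notmuch; the event fields (`key`,
`raw`, `spam`) are invented for this example.

```python
import json
import queue
from typing import Dict, List


def spam_filter(inbox: "queue.Queue[str]", outbox: "queue.Queue[str]") -> None:
    """Classify each queued event; the keyword test is a placeholder classifier."""
    while not inbox.empty():
        event = json.loads(inbox.get())
        event["spam"] = "viagra" in event["raw"].lower()
        outbox.put(json.dumps(event))


def indexer(inbox: "queue.Queue[str]", rules: Dict[str, str]) -> List[dict]:
    """Apply tag rules; a real component would hand the message to notmuch here."""
    indexed = []
    while not inbox.empty():
        event = json.loads(inbox.get())
        tags = ["spam"] if event["spam"] else ["inbox"]
        # Each rule maps a substring of the raw message to a tag.
        tags += [tag for needle, tag in rules.items() if needle in event["raw"]]
        indexed.append({"key": event["key"], "tags": sorted(tags)})
    return indexed
```

Because every hop carries a self-describing JSON event, each stage could
later be moved into its own daemon without changing the message format.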

For now I have the following uncertainties:
* how should I handle multiple users? I think each user should
have their own store / notmuch / bogofilter instance (at least in terms
of storage if not even in terms of separate daemons);
* should I keep the emails in a file-system, or a key-value store?
(the file-system is more bug-free, but I'm confident that a BerkeleyDB
instance would be more efficient);
* should I use libnotmuch or, for starters, just make a notmuch tool wrapper;
* and the most pressing one, transactions: I would like that at no
point does a message get half processed or lost; as such I need
notmuch to behave transactionally -- indexing the message and tagging
it should be atomic and durable; (is there a way with libnotmuch to
control the underlying Xapian database?)

Suggestions? Considerations?

Ciprian.


`notmuch setup` replaces `~/.notmuch-config` instead of truncating it

2010-11-17 Thread Ciprian Dorin, Craciun
On Tue, Nov 16, 2010 at 22:42, Daniel Kahn Gillmor
 wrote:
> On 11/16/2010 03:37 PM, Daniel Kahn Gillmor wrote:
>> On 11/16/2010 03:26 PM, Ciprian Dorin, Craciun wrote:
>>>     So in the light of the above quoted "glitches", my question is:
>>> due to the small chance of a power loss happening right when we write
>>> such a small file, doesn't the inconvenience weight more than the
>>> (fairly remote probable) file loss?
>>
>> What inconvenience?  The inconvenience of writing the code correctly?
>
> Ah sorry -- on re-reading, i see you're probably referring to the
> inconvenience of breaking hardlinks, dropping permissions, ACLs and
> other metadata.

Yes, exactly this is what I'm referring to: the meta-data that is
not copied when a new file is created.


> That's an open question, as far as i'm concerned.  it'd be nice if there
> was a way to "clone" a file's permissions and metadata to get the best
> of both worlds.  maybe someone knows some tricks to do that?
>
>        --dkg
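
A partial answer, sketched here for POSIX systems only: some of the metadata
can be copied onto the freshly written file before it is renamed into place,
and the symlink can be resolved first so that the rename replaces the linked
file rather than the link. Both helper names are made up for this example,
and hard links and the inode number are still lost.

```python
import os
import shutil


def clone_metadata(source: str, fresh: str) -> None:
    """Copy permission bits and, where permitted, ownership from source to fresh."""
    shutil.copymode(source, fresh)  # permission bits only
    info = os.stat(source)
    try:
        os.chown(fresh, info.st_uid, info.st_gid)  # may require privileges
    except PermissionError:
        pass  # keep the ownership of whoever created the fresh file


def resolve_target(path: str) -> str:
    """Follow symlinks so a later rename replaces the linked file, not the link."""
    return os.path.realpath(path)
```

`shutil.copystat` would additionally copy timestamps and, on some platforms,
extended attributes, but copying the old modification time onto new contents
may not be what one wants for a configuration file.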


`notmuch setup` replaces `~/.notmuch-config` instead of truncating it

2010-11-17 Thread Ciprian Dorin, Craciun
On Tue, Nov 16, 2010 at 22:37, Daniel Kahn Gillmor
 wrote:
> On 11/16/2010 03:26 PM, Ciprian Dorin, Craciun wrote:
>>     P.S.: I say "pseudo" atomic because only the rename is atomic,
>> thus in order to override file `a` for the target file `b` which
>> exists, we must execute two **non-atomic** operations as a whole, but
>> each atomic in part, rename operations: make `b` -> `c`, and then
>> rename `a` -> `b`. So there is actually a small time-frame when I can
>> be left with two files (`a` and `c`), none of which is my config file
>> `b`. (This can be solved when opening the config file by checking if
>> there isn't any leftover `c` or `a` file, in which case I take the `a`
>> file and complete the rename.)
>
> There is only one ".notmuch-config" entry in the inode directory that is
> your homedir.  it points either to the old file, or the new file.  it
> cannot point to both, and it will not point to anything but those two
> possibilities.  This is what the atomicity of the operation is expected
> to guarantee.
>
>        --dkg


Actually I was wrong about this... I thought that the way
the file is "overwritten" is by either two `rename`
calls or by `link`, followed by another `link`, and then finally an
`unlink` (this is what I tried to explain in my previous email).

In fact I thought that the `rename` OS call can't overwrite a
file if it exists.

But after reading the man page of `rename(2)` -- quoted below -- I
see I was indeed wrong to call the atomicity "pseudo".


   int rename(const char *oldpath, const char *newpath);
...
   If newpath already exists it will be atomically replaced
(subject to a few conditions; see ERRORS below),  so  that  there  is
no
   point at which another process attempting to access newpath
will find it missing.
...
   If newpath exists but the operation fails for some reason
rename() guarantees to leave an instance of newpath in place.


So indeed the behavior is completely atomic.
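
The usual pattern built on this guarantee is "write a temporary file in the
same directory, then rename it over the target". A minimal sketch follows;
`atomic_write` is a hypothetical helper, and this is not a transcription of
what glib's `g_file_set_contents` does internally.

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write data to a temporary sibling file, then rename it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temporary file must live on the same file system as the target.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as handle:
            handle.write(data)
            handle.flush()
            os.fsync(handle.fileno())  # make the contents durable first
        os.replace(tmp, path)          # replaces an existing path in one step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)             # do not leave the temporary file behind
        raise
```

A reader opening `path` at any moment sees either the complete old contents
or the complete new contents. The price is the one discussed in this thread:
the directory entry now refers to a new file, so a symlink at `path` is
replaced and the old file's metadata is not carried over.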

Ciprian.


`notmuch setup` replaces `~/.notmuch-config` instead of truncating it

2010-11-16 Thread Ciprian Dorin, Craciun
On Tue, Nov 16, 2010 at 21:09, Carl Worth wrote:
> On Tue, 16 Nov 2010 15:33:30 +0200, "Ciprian Dorin, Craciun" wrote:
>>     So my question is: is this behaviour (of deleting the file and
>> creating a new one) deliberate? If not, could it be fixed (I could
>> provide a patch) to just update the file in place?
>
> Daniel gave the perfect answer later in the thread. It is intentional to
> replace the file with a new, complete version, (to avoid loss/corruption
> of the file if "notmuch setup" is interrupted). But we should fix this
> to replace the target of any symlinks.
>
> -Carl


I understand now the reason for your file replacement choice.
(I'll look into it tomorrow to see if I can provide a patch along the
lines Daniel has described.)

But -- in general, and totally overlooking the "pseudo" atomic
effect obtained from POSIX file semantics -- doesn't this practice
mislead some software (like backup systems, etc.) that might rely
on the inode number as part of the identity of the file?
Moreover, what if the user has set any ACLs or extended attributes on
the file, wouldn't these be lost? (Wouldn't SELinux also be bothered?)

So after browsing through the source code, I've found inside
`notmuch-config.c`, inside the function `notmuch_config_save`, the
call which actually overwrites the file: `g_file_set_contents
(config->filename, data, length, &error)`. Searching for the
documentation of this function, I've stumbled upon the page below, from
which I cite (I've never used glib before, so maybe the link is not the
best one):
http://library.gnome.org/devel/glib/unstable/glib-File-Utilities.html

On Unix, if filename already exists hard links to filename will break.
Also since the file is recreated, existing permissions, access control
lists, metadata etc. may be lost. If filename is a symbolic link, the
link itself will be replaced, not the linked file.


So in the light of the above quoted "glitches", my question is:
given the small chance of a power loss happening right when we write
such a small file, doesn't the inconvenience weigh more than the
(fairly improbable) file loss? (I must admit I once lost the
`/etc/network/interfaces` file after an edit immediately followed by a
quick cold reboot, but it was my fault as I hadn't sync-ed the file
system.)

Ciprian.

P.S.: I say "pseudo" atomic because only the rename is atomic,
thus in order to replace the existing target file `b` with file `a`,
we must execute two rename operations, each atomic on its own but
**non-atomic** as a whole: rename `b` -> `c`, and then
rename `a` -> `b`. So there is actually a small time-frame when I can
be left with two files (`a` and `c`), neither of which is my config file
`b`. (This can be solved when opening the config file by checking
whether there is any leftover `c` or `a` file, in which case I take the
`a` file and complete the rename.)


`notmuch setup` replaces `~/.notmuch-config` instead of truncating it

2010-11-16 Thread Ciprian Dorin, Craciun
Hello all!

First, congratulations on the nice software! I can hardly wait for a
notmuch native (i.e. libnotmuch) and curses client (like `ner`) to
become more stable, so that I'll be able to ditch GMail. :) But until
then a small glitch...

While upgrading from notmuch 0.4 to 0.5, I've re-run `notmuch
config` as suggested in the release email.

But in my particular case `~/.notmuch-config` is symlinked into an
application configuration directory which is versioned. Thus I
expected that when notmuch updates the config, it opens it for
read-write, but with the truncation flag (which as a consequence would
have modified the symlinked file). But instead it deleted the symlink,
and replaced it with a newly created file (thus breaking my custom
configuration backup system.)

So my question is: is this behaviour (of deleting the file and
creating a new one) deliberate? If not, could it be fixed (I could
provide a patch) to just update the file in place?

Thanks,
Ciprian.


Re: `notmuch setup` replaces `~/.notmuch-config` instead of truncating it

2010-11-16 Thread Ciprian Dorin, Craciun
On Tue, Nov 16, 2010 at 21:09, Carl Worth wrote:
> On Tue, 16 Nov 2010 15:33:30 +0200, "Ciprian Dorin, Craciun" wrote:
>>     So my question is: is this behaviour (of deleting the file and
>> creating a new one) deliberate? If not, could it be fixed (I could
>> provide a patch) to just update the file in place?
>
> Daniel gave the perfect answer later in the thread. It is intentional to
> replace the file with a new, complete version, (to avoid loss/corruption
> of the file if "notmuch setup" is interrupted). But we should fix this
> to replace the target of any symlinks.
>
> -Carl


I now understand the reason for your file replacement choice.
(I'll look into it tomorrow to see if I can provide a patch along the
lines Daniel has described.)

But in general, and setting aside the "pseudo" atomic effect
obtained from POSIX file semantics, doesn't this practice mislead
software (backup systems, for example) that may rely on the inode
number as part of the identity of the file? Moreover, what if the user
has set ACLs or extended attributes on the file; wouldn't these be
lost? (Wouldn't SELinux be bothered as well?)

So after browsing through the source code, I found in
`notmuch-config.c`, inside the function `notmuch_config_save`, the
call which actually replaces the file: `g_file_set_contents
(config->filename, data, length, &error)`. Searching for the
documentation of this function, I found the page below, from which I
quote (I've never used glib before, so maybe the link is not the best
one):
http://library.gnome.org/devel/glib/unstable/glib-File-Utilities.html

On Unix, if filename already exists hard links to filename will break.
Also since the file is recreated, existing permissions, access control
lists, metadata etc. may be lost. If filename is a symbolic link, the
link itself will be replaced, not the linked file.


So in light of the "glitches" quoted above, my question is: given
the small chance of a power loss happening right when we write such a
small file, doesn't the inconvenience outweigh the (fairly improbable)
file loss? (I must admit I once lost the `/etc/network/interfaces`
file after an edit followed immediately by a quick cold reboot, but it
was my fault as I had not synced the file system.)

Ciprian.

P.S.: I say "pseudo" atomic because only the rename is atomic.
Thus, in order to replace the existing target file `b` with file `a`,
we must execute two rename operations, each atomic on its own but
**non-atomic** as a whole: rename `b` -> `c`, and then rename `a` ->
`b`. So there is actually a small time-frame in which I can be left
with two files (`a` and `c`), neither of which is my config file `b`.
(This can be solved when opening the config file by checking whether
there is a leftover `c` or `a` file, in which case I take the `a` file
and complete the rename.)


`notmuch setup` replaces `~/.notmuch-config` instead of truncating it

2010-11-16 Thread Ciprian Dorin, Craciun
Hello all!

First, congratulations on the nice software! I can hardly wait for a
native notmuch (i.e. libnotmuch) curses client (like `ner`) to
become more stable, so that I'll be able to ditch GMail. :) But until
then, a small glitch...

While upgrading from notmuch 0.4 to 0.5, I re-ran `notmuch
config` as suggested in the release email.

But in my particular case `~/.notmuch-config` is symlinked into an
application configuration directory which is versioned. Thus I
expected that when notmuch updates the config, it would open it for
read-write with the truncation flag (which as a consequence would
have modified the symlinked file). Instead it deleted the symlink
and replaced it with a newly created file (thus breaking my custom
configuration backup system).

So my question is: is this behaviour (of deleting the file and
creating a new one) deliberate? If not, could it be fixed (I could
provide a patch) to just update the file in place?

Thanks,
Ciprian.