Re: [mmh] folder_read with scandir()

2020-03-21 Thread Philipp Takacs
[2020-03-21 23:01] markus schnalke 
> [2020-03-21 17:48] Philipp Takacs 
> > [2020-03-21 10:28] markus schnalke 
> > >
> > > Some comments:
> > >
> > > [2020-03-19 12:45] Philipp Takacs 
> > > >
> > > > Also noticed the other flag, not shoure if we need this realy. It's
> > > > only used in folder and I don't see the point of knowing that there
> > > > are other files in a folder.
> > >
> > > We should be a bit careful at that point to understand the
> > > situation. Here are some aspects to think about:
> > >
> > > - Files starting with a comma. These were backups of deleted messages
> > > (see the section on the trash folder in my master's thesis). Thus
> > > they are considered MH files here but not further regarded. I'm
> > > not sure if the code had been this way all the time, because there
> > > used to the a `--with-hash-prefix' option (or similar) that
> > > replaced the comma with a hash symbol as the prefix for deleted
> > > messages backups. Anyways, we don't have such comma files anymore.
> > > Mmh uses the trash folder for that case, i.e. we can get rid of
> > > the comma prefix file handling ... possibly. Does anyone know the
> > > situation in nmh: have they switched to a trash folder as well?
> > > Because we're still (basically) mail storage compatible to nmh (I
> > > think), we should not be fully ignorant of that situation. But we
> > > could handle those backup files as `other' files. 
> > 
> > The files starting with a comma/dot aren't the problem they are totally
> > ignored. The problem are files witch aren't totally ignored. I have a ugly
> > workaround in the filter and I want to get rid of it.
>
> How I understand the situation (without looking at the code): A
> folder can contain:
>
> - messages
> - `.' and `..'
> - `.mh_sequences'
> - subfolders
> - other files
>
> Other files are everything else, no matter if they are named
> `.foo' or `,foo' or just `foo'.
>
> When we read a folder, only messages are of relevance. All the
> rest is ignored.

expect for the OTHERS bit set for files which aren't messages and don't
start with a '.' or a ','.

> Do we need to know about subfolders and other files? I'd say that
> it would be nice, because there are mmh tools that use such
> information.
>
> Does it have to be part of this function? Not necessarily.

That's why I have mentioned it.

> (Sorry for not looking at the code right now. I hope my thoughts
> help nonetheless.)

No problem, it's helps to have a different view on the problems

> > > - `Other' files seem to be files in the mail storage that have
> > > nothing to do with MH. E.g. any random file someone created in the
> > > mail storage without any connection to mmh. Normally mmh would
> > > simply ignore them, i.e. act as if they wouldn't be there at all.
> > 
> > Not completely other files are also sub directories. folder uses this to
> > check if it's required to search for sub directories with the recursive
> > flag. This is an other ugly code path I found.
>
> Like I layed out above, these are different classes of files that
> could be in a folder. Merging subdirectories and other files (or
> ignoring everything that starts with a dot) does not match the
> problem structure. It might work and it might have been good
> enough -- it might still be good enough -- but the goal should be
> that the code reflects the problem domain.
>
>
> > > There's one place, however, where `other' files are relevant:
> > > rmf(1). We should not delete a mail folder that includes `other'
> > > files. It seems this check is missing there. Hence, I would keep
> > > the `other' files marker. It adds few complexity and can be
> > > useful. Plus, we should use it in rmf(1).
> > 
> > rmf(1) just don't use folder_read(), therefor it's completely useless
> > for it. It does check for other files on it's own.
>
> Sounds a bit like code duplication. Surely both cases -- reading
> message numbers from a folder and deleting a folder -- are quite
> different, but the classification of files within a folder is the
> same and maybe should be outfactored and reused (maybe with an
> enum fileclass or such).

Yes it's code duplication. folder_read() covers most cases, but in some
cases there need to be done something a little different. Some refactoring
to have less code duplication would be nice. The best function for this
is the msgfilter().

> > > > +static int others;
> > > > +
> > > > +static int
> > > > +msgcmp(const struct dirent **d1, const struct dirent **d2)
> > > > +{
> > > > +   size_t l1, l2;
> > > > +
> > > > +   l1 = strlen((*d1)->d_name);
> > > > +   l2 = strlen((*d2)->d_name);
> > > > +   if (l1 < l2) {
> > > > +   return -1;
> > > > +   }
> > > > +   if (l1 > l2) {
> > > > +   return 1;
> > > > +   }
> > > > +   return strcmp((*d1)->d_name, (*d2)->d_name);
> > > > +}
> > >
> > > Isn't that a strange comparison algorithm, for first compare
> > > string lengths and only if eaqual their 

Re: [mmh] Previous-Sequence

2020-03-21 Thread Philipp Takacs
Hi

[2020-03-21 09:54] markus schnalke 
> [2019-12-15 17:55] Philipp Takacs 
> >
> > During that I found the Previous-Sequence again. I don't see a reason
> > why we need this feature. The usecases, I see, for the feature are better
> > handled with mark or the backlog of your shell. But I don't use this
> > feature.

During reading some old mails on the nmh mailing list I found a use case.
Put mails in a sequence after a refile[0]. Not sure if this works, but
for this I would suggest to add a -preserve switch to refile instead of
using previous-sequence for this, like nmh has done.

> Just for reference, the section from mh-sequence(7):
>
>The Previous Sequence
>Mmh provides the ability to remember the `msgs' or  `msg'
>argument last given to an mmh command.  The entry `Previ‐
>ous-Sequence' should be defined in the mmh  profile;  its
>value  should  be  a  sequence  name or multiple sequence
>names separated by spaces.  If  this  entry  is  defined,
>when   an  mmh  command  finishes,  it  will  define  the
>sequence(s) named in the value of this entry to be  those
>messages  that  were  specified to the command.  Hence, a
>profile entry of
>
> Previous-Sequence: pseq
>
>directs any mmh command that accepts a  `msg'  or  `msgs'
>argument  to define the sequence `pseq' as those messages
>when it finishes.
>
>Note: there can be a performance  penalty  in  using  the
>`Previous-Sequence'  facility.   If  it  is used, all mmh
>programs have to write the sequence  information  to  the
>.mh_sequences file for the folder each time they run.  If
>the `Previous-Sequence' profile entry  is  not  included,
>only pick and mark will write to the .mh_sequences file.

Actually the last sentence is wrong. A lot more mmh programs write
the .mh_sequences file, like show, inc, ...

> Besides: It does introduce a lot of write accesses to the sequence
> files.

Would it only be the write access itself, it wouldn't be that bad. You
still need to enable it.

> Before we remove it, I just would like to know why it was
> introduced in MH the first time and what possible other usecases
> there could be. This means asking on the nmh-workers mailing list.
> Both, out of curiosity and as a double-check in case we've not
> thought on something. Do you wanna write this message or should
> I?

This is a good idea. I want to write this mail.

So conclusion ask the nmh guys for use cases we missed and if there
aren't any, remove it.

Philipp

[0] https://lists.nongnu.org/archive/html/nmh-workers/2005-12/msg00086.html



Re: [mmh] show sequences racecondition

2020-03-21 Thread markus schnalke
Hoi.

[2020-03-21 16:15] Philipp Takacs 
> [2020-03-21 09:32] markus schnalke 
> > [2020-03-19 12:14] Philipp Takacs 
> > > [2019-12-15 17:55] Philipp Takacs 
> > > >
> > > > I have looked at the problem with open pager for show and change the
> > > > sequences during the pager is open. This happen often, if you show a
> > > > message and recieve a new message.
> >
> > This is an annoying inconvenience (actually a bug but not too
> > severe) that we probably all suffer from. Improving the situation
> > would be great.
> >
> > We can't get fully rid of the race condition, or? We have multiple
> > processes writing the same file, thus we will inevitably have a
> > conflict window. Correct? Could you explain quickly how we react
> > in this case of conflict?
> 
> So currently and after my path last writer winns. Currenly a programm
> reads the sequences and does it work. If the work includes changing
> the sequences, it writes the sequences to disk.
> 
> My patches reads the sequences after show is finisched with the main
> work. The changes to the sequences are then aplied to the new set of
> sequences. This makes the conflict window smaler, but the race
> condition is still there. My patch does this only for show, but I want
> to add this to seq_save.

Thanks for the explanation. Show(1) is the only case where I've
noticed the problem. Thus it's the most important.


> I found then, that seq_read and seq_list ignores mails witch it thinks
> don't exist. This I have disabled this, to avoid a secound read of the 
> of the whole directory.
> 
> Now the conflict window is small, but still there. The best way to
> handle this is locking.

This whole topic is a special one ... One case that one should
have in mind as well: Stuff shouldn't mess up, if you have show(1)
running while deleting/refiling/incing messages, not only to the
end but possibly on previously used message numbers. For instance,
I use `folder -pack' on the inbox quite often ... there might be a
`repl' job in background at the same time ...


I really don't want to burden you with this stuff. I think that
improving the most annoying problem is already a great step
forward. The rest is more of a separate topic to dig into and
solve consistently and generally. I only wanted to put the
information onto the list, to have it documented.


meillo


P.S.
Not related to this topic, more in general: I really appreciate
that you supported your changes with test cases. This is good
work. It will make mmh increasingly stronger. Thank you!



Re: [mmh] folder_add and msgstats

2020-03-21 Thread markus schnalke
Hoi.

[2020-03-21 18:43] Philipp Takacs 
> [2020-03-21 10:05] markus schnalke 
> > [2020-03-19 12:33] Philipp Takacs 
> > > I have also removed the
> > > clear_msg_flags loop from folder_read and folder_realloc. In most cases
> > > we use calloc, so this isn't necesarry. At one place in folder_realloc
> > > realloc is used. There I have add a memset.
> > > 
> > > comment?
> >
> > Why were (in the section down below) only the flags outside the
> > current message range cleared? Now you clear all flags.
> 
> Not sure whats make you think this. But if I had to guess I would
> say it's the "- lo".

No, it's the comment a bit more down in the code:

> > -   /*
> > -   ** Clear all the flags for entries outside
> > -   ** the current message range for this folder.
> > -   */

Maybe I was irritated that you removed this comment but still
performed a similar action. The memset() line is pretty complex
to understand. You could add a similar two-line comment to it,
explaining what you explained to me here (only a bit more
destilled):

> So a shourt exircourse to the msgstats. msgstats
> is an array of the lenght ``mp->hghoff - mp->lowoff''. lowoff is the
> starting offset, so if you want the stats of message 5 you access
> ``mp->msgstats[5 - mp->mp->lowoff]. hghoff is the end offset so you
> can change flags for all messagestats in the range lowoff <= i <= hghoff.
> folder_realloc in this code path increases the hghoff. Now I have to
> clear all data after the old hghoff. by the way there was a off by one
> in my path. I fixed it now it looks like this:
> 
> > > memset(mp->msgstats + mp->hghoff - lo + 1, 0, hi - mp->hghoff);
> 
> This actually clears less then the code before. The code before
> started at ``mp->hghmsg + 1''. Now it starts hghoff which is
> greater or equal to hghmsg.
> 
> > Does that change the behavior?
> 
> Yes, like the pathes for seq_read it don't clears flags for msgs
> which it belive don't exists. But this is actually the reason
> I created this patch.

Alright. That makes sense.

> > Or, if your memset() clears only specific
> > parts, you probably should put a comment there to explain it, as
> > this might not be what the programmer usually expects from a
> > memset() after a (re)alloc().
> 
> To me this looks completly like this what I would expect.

Of course; you've written the code. ;-)

> But I'll think about a good comment.

Is what you actually clear only the newly allocated memory? If so
then you could write something like:

/* zero the newly allocated memory, i.e. no flags set */

Or:

/* clear the newly allocated msg flag space */


(I really need to look at the code, not only at the diffs and the
mails ...)


meillo



Re: [mmh] folder_read with scandir()

2020-03-21 Thread markus schnalke
Hoi.

[2020-03-21 17:48] Philipp Takacs 
> [2020-03-21 10:28] markus schnalke 
> >
> > Some comments:
> >
> > [2020-03-19 12:45] Philipp Takacs 
> > >
> > > Also noticed the other flag, not shoure if we need this realy. It's
> > > only used in folder and I don't see the point of knowing that there
> > > are other files in a folder.
> >
> > We should be a bit careful at that point to understand the
> > situation. Here are some aspects to think about:
> >
> > - Files starting with a comma. These were backups of deleted messages
> > (see the section on the trash folder in my master's thesis). Thus
> > they are considered MH files here but not further regarded. I'm
> > not sure if the code had been this way all the time, because there
> > used to the a `--with-hash-prefix' option (or similar) that
> > replaced the comma with a hash symbol as the prefix for deleted
> > messages backups. Anyways, we don't have such comma files anymore.
> > Mmh uses the trash folder for that case, i.e. we can get rid of
> > the comma prefix file handling ... possibly. Does anyone know the
> > situation in nmh: have they switched to a trash folder as well?
> > Because we're still (basically) mail storage compatible to nmh (I
> > think), we should not be fully ignorant of that situation. But we
> > could handle those backup files as `other' files. 
> 
> The files starting with a comma/dot aren't the problem they are totally
> ignored. The problem are files witch aren't totally ignored. I have a ugly
> workaround in the filter and I want to get rid of it.

How I understand the situation (without looking at the code): A
folder can contain:

- messages
- `.' and `..'
- `.mh_sequences'
- subfolders
- other files

Other files are everything else, no matter if they are named
`.foo' or `,foo' or just `foo'.

When we read a folder, only messages are of relevance. All the
rest is ignored.

Do we need to know about subfolders and other files? I'd say that
it would be nice, because there are mmh tools that use such
information.

Does it have to be part of this function? Not necessarily.

(Sorry for not looking at the code right now. I hope my thoughts
help nonetheless.)


> > - `Other' files seem to be files in the mail storage that have
> > nothing to do with MH. E.g. any random file someone created in the
> > mail storage without any connection to mmh. Normally mmh would
> > simply ignore them, i.e. act as if they wouldn't be there at all.
> 
> Not completely other files are also sub directories. folder uses this to
> check if it's required to search for sub directories with the recursive
> flag. This is an other ugly code path I found.

Like I layed out above, these are different classes of files that
could be in a folder. Merging subdirectories and other files (or
ignoring everything that starts with a dot) does not match the
problem structure. It might work and it might have been good
enough -- it might still be good enough -- but the goal should be
that the code reflects the problem domain.


> > There's one place, however, where `other' files are relevant:
> > rmf(1). We should not delete a mail folder that includes `other'
> > files. It seems this check is missing there. Hence, I would keep
> > the `other' files marker. It adds few complexity and can be
> > useful. Plus, we should use it in rmf(1).
> 
> rmf(1) just don't use folder_read(), therefor it's completely useless
> for it. It does check for other files on it's own.

Sounds a bit like code duplication. Surely both cases -- reading
message numbers from a folder and deleting a folder -- are quite
different, but the classification of files within a folder is the
same and maybe should be outfactored and reused (maybe with an
enum fileclass or such).


> So conclusion, currently I would suggest to let in in. But if I rewrite
> the recursive code path of folder(1), I'll bring this up again.

Would be okay for me.


> > > +static int others;
> > > +
> > > +static int
> > > +msgcmp(const struct dirent **d1, const struct dirent **d2)
> > > +{
> > > + size_t l1, l2;
> > > +
> > > + l1 = strlen((*d1)->d_name);
> > > + l2 = strlen((*d2)->d_name);
> > > + if (l1 < l2) {
> > > + return -1;
> > > + }
> > > + if (l1 > l2) {
> > > + return 1;
> > > + }
> > > + return strcmp((*d1)->d_name, (*d2)->d_name);
> > > +}
> >
> > Isn't that a strange comparison algorithm, for first compare
> > string lengths and only if eaqual their content? Here does mmh
> > sort by length of folder name?
> 
> This is only for msg-files and sorts by numeric order. So "9" is
> smaller the "10". But actually it's complete irrelevant, because the
> m_atoi() and the storage in the array will automatic get the right
> order. Therefor we could completly remove this function.

Even better. ;-)

Still I'd like to comment on this piece of code, only because we
all can learn from other peoples views on certain code snippets.

I would have been less irritated if the function would have been
named `msgnumcmp'. 

Re: [mmh] folder_add and msgstats

2020-03-21 Thread Philipp Takacs
Hi

[2020-03-21 10:05] markus schnalke 
> [2020-03-19 12:33] Philipp Takacs 
> > I have also removed the
> > clear_msg_flags loop from folder_read and folder_realloc. In most cases
> > we use calloc, so this isn't necesarry. At one place in folder_realloc
> > realloc is used. There I have add a memset.
> > 
> > comment?
>
> Why were (in the section down below) only the flags outside the
> current message range cleared? Now you clear all flags.

Not sure whats make you think this. But if I had to guess I would
say it's the "- lo". So a shourt exircourse to the msgstats. msgstats
is an array of the lenght ``mp->hghoff - mp->lowoff''. lowoff is the
starting offset, so if you want the stats of message 5 you access
``mp->msgstats[5 - mp->mp->lowoff]. hghoff is the end offset so you
can change flags for all messagestats in the range lowoff <= i <= hghoff.
folder_realloc in this code path increases the hghoff. Now I have to
clear all data after the old hghoff. by the way there was a off by one
in my path. I fixed it now it looks like this:

> > memset(mp->msgstats + mp->hghoff - lo + 1, 0, hi - mp->hghoff);

This actually clears less then the code before. The code before
started at ``mp->hghmsg + 1''. Now it starts hghoff which is
greater or equal to hghmsg.

> Does that change the behavior?

Yes, like the pathes for seq_read it don't clears flags for msgs
which it belive don't exists. But this is actually the reason
I created this patch.

> Or, if your memset() clears only specific
> parts, you probably should put a comment there to explain it, as
> this might not be what the programmer usually expects from a
> memset() after a (re)alloc().

To me this looks completly like this what I would expect. But I'll
think about a good comment.

Philipp

> Apart from that, your changes look and sound sensible.
>
>
> meillo
>
>
> > From 937c01f62952147a8ff853d8a103dbe8465cb90e Mon Sep 17 00:00:00 2001
> > From: Philipp Takacs 
> > Date: Wed, 18 Mar 2020 21:46:30 +0100
> > Subject: [PATCH 3/3] use memset to clear the msgstats in folder_realloc
> > 
> > ---
> >  sbr/folder_realloc.c | 16 +---
> >  1 file changed, 1 insertion(+), 15 deletions(-)
> > 
> > diff --git a/sbr/folder_realloc.c b/sbr/folder_realloc.c
> > index 64a66409..781e8bdc 100644
> > --- a/sbr/folder_realloc.c
> > +++ b/sbr/folder_realloc.c
> > @@ -46,6 +46,7 @@ folder_realloc(struct msgs *mp, int lo, int hi)
> > ** just realloc the message status array.
> > */
> > mp->msgstats = mh_xrealloc(mp->msgstats, MSGSTATSIZE(mp, lo, 
> > hi));
> > +   memset(mp->msgstats + mp->hghoff - lo, 0, hi - mp->hghoff);
> > } else {
> > /*
> > ** We are changing the offset of the message status
> > @@ -69,20 +70,5 @@ folder_realloc(struct msgs *mp, int lo, int hi)
> > mp->lowoff = lo;
> > mp->hghoff = hi;
> >  
> > -   /*
> > -   ** Clear all the flags for entries outside
> > -   ** the current message range for this folder.
> > -   */
> > -   if (mp->nummsg > 0) {
> > -   for (msgnum = mp->lowoff; msgnum < mp->lowmsg; msgnum++)
> > -   clear_msg_flags(mp, msgnum);
> > -   for (msgnum = mp->hghmsg + 1; msgnum <= mp->hghoff; msgnum++)
> > -   clear_msg_flags(mp, msgnum);
> > -   } else {
> > -   /* no messages, so clear entire range */
> > -   for (msgnum = mp->lowoff; msgnum <= mp->hghoff; msgnum++)
> > -   clear_msg_flags(mp, msgnum);
> > -   }
> > -
> > return mp;
> >  }
> > -- 
> > 2.20.1
> > 
>



Re: [mmh] folder_read with scandir()

2020-03-21 Thread Philipp Takacs
[2020-03-21 10:28] markus schnalke 
> [2020-03-19 12:45] Philipp Takacs 
> >
> > I have also looked at folder_read and found it't a bit ugly ;-)
>
> A good indicator for starting to rework. :-)
>
> > I have rewritten it with scandir. This saves about 40 loc and improves
> > the reading flow for this funktion.
>
> Sounds good!
>
>
> Some comments:
>
> > Also noticed the other flag, not shoure if we need this realy. It's
> > only used in folder and I don't see the point of knowing that there
> > are other files in a folder.
>
> We should be a bit careful at that point to understand the
> situation. Here are some aspects to think about:
>
> - Files starting with a comma. These were backups of deleted messages
> (see the section on the trash folder in my master's thesis). Thus
> they are considered MH files here but not further regarded. I'm
> not sure if the code had been this way all the time, because there
> used to the a `--with-hash-prefix' option (or similar) that
> replaced the comma with a hash symbol as the prefix for deleted
> messages backups. Anyways, we don't have such comma files anymore.
> Mmh uses the trash folder for that case, i.e. we can get rid of
> the comma prefix file handling ... possibly. Does anyone know the
> situation in nmh: have they switched to a trash folder as well?
> Because we're still (basically) mail storage compatible to nmh (I
> think), we should not be fully ignorant of that situation. But we
> could handle those backup files as `other' files. 

The files starting with a comma/dot aren't the problem they are totally
ignored. The problem are files witch aren't totally ignored. I have a ugly
workaround in the filter and I want to get rid of it.

> - `Other' files seem to be files in the mail storage that have
> nothing to do with MH. E.g. any random file someone created in the
> mail storage without any connection to mmh. Normally mmh would
> simply ignore them, i.e. act as if they wouldn't be there at all.

Not completely other files are also sub directories. folder uses this to
check if it's required to search for sub directories with the recursive
flag. This is an other ugly code path I found.

> There's one place, however, where `other' files are relevant:
> rmf(1). We should not delete a mail folder that includes `other'
> files. It seems this check is missing there. Hence, I would keep
> the `other' files marker. It adds few complexity and can be
> useful. Plus, we should use it in rmf(1).

rmf(1) just don't use folder_read(), therefor it's completely useless
for it. It does check for other files on it's own.

So conclusion, currently I would suggest to let in in. But if I rewrite
the recursive code path of folder(1), I'll bring this up again.

> > +static int others;
> > +
> > +static int
> > +msgcmp(const struct dirent **d1, const struct dirent **d2)
> > +{
> > +   size_t l1, l2;
> > +
> > +   l1 = strlen((*d1)->d_name);
> > +   l2 = strlen((*d2)->d_name);
> > +   if (l1 < l2) {
> > +   return -1;
> > +   }
> > +   if (l1 > l2) {
> > +   return 1;
> > +   }
> > +   return strcmp((*d1)->d_name, (*d2)->d_name);
> > +}
>
> Isn't that a strange comparison algorithm, for first compare
> string lengths and only if eaqual their content? Here does mmh
> sort by length of folder name?

This is only for msg-files and sorts by numeric order. So "9" is
smaller the "10". But actually it's complete irrelevant, because the
m_atoi() and the storage in the array will automatic get the right
order. Therefor we could completly remove this function.

> > +static int
> > +msgfilter(const struct dirent *e)
> > +{
> > +   int i;
> > +   if (e->d_name[0] == '.' || e->d_name[0] == ',') {
>
> About this comma: see above!
>
> > +   return 0;
> > +   }
> > +   for (i = 0; e->d_name[i]; i++) {
> > +   //if ((i > 1 && e->d_name[i] < '0') || e->d_name[i] < '1' || 
> > e->d_name[i] > '9') {
>
> C99-style comment ... looks to be not finished at this point.

Oh thanks, I have forgotten about this.

> > +   if (e->d_name[i] < '0' || e->d_name[i] > '9') {

I have changed this to:

+   if ((i == 0 && e->d_name[i] == '0') || e->d_name[i] < '0' || 
e->d_name[i] > '9') {

> > +   others = 1;
> > +   return 0;
> > +   }
> > +   }
> > +   return 1;
> > +}
>
>
> Wow, it feels so good to have some touch with the mmh development
> again! I'm really looking forward to dig into the code as well.
> Hoping I get that managed soon.

Thanks for your comments. I'm looking forward to hear from you.

Philipp



Re: [mmh] show sequences racecondition

2020-03-21 Thread Philipp Takacs
Hi

[2020-03-21 09:32] markus schnalke 
> [2020-03-19 12:14] Philipp Takacs 
> > [2019-12-15 17:55] Philipp Takacs 
> > >
> > > I have looked at the problem with open pager for show and change the
> > > sequences during the pager is open. This happen often, if you show a
> > > message and recieve a new message.
>
> This is an annoying inconvenience (actually a bug but not too
> severe) that we probably all suffer from. Improving the situation
> would be great.
>
> We can't get fully rid of the race condition, or? We have multiple
> processes writing the same file, thus we will inevitably have a
> conflict window. Correct? Could you explain quickly how we react
> in this case of conflict?

So currently and after my path last writer winns. Currenly a programm
reads the sequences and does it work. If the work includes changing
the sequences, it writes the sequences to disk.

My patches reads the sequences after show is finisched with the main
work. The changes to the sequences are then aplied to the new set of
sequences. This makes the conflict window smaler, but the race
condition is still there. My patch does this only for show, but I want
to add this to seq_save.

I found then, that seq_read and seq_list ignores mails witch it thinks
don't exist. This I have disabled this, to avoid a secound read of the 
of the whole directory.

Now the conflict window is small, but still there. The best way to
handle this is locking.

Philipp



Re: [mmh] folder_read with scandir()

2020-03-21 Thread markus schnalke
Hoi.

[2020-03-19 12:45] Philipp Takacs 
>
> I have also looked at folder_read and found it't a bit ugly ;-)

A good indicator for starting to rework. :-)

> I have rewritten it with scandir. This saves about 40 loc and improves
> the reading flow for this funktion.

Sounds good!


Some comments:

> Also noticed the other flag, not shoure if we need this realy. It's
> only used in folder and I don't see the point of knowing that there
> are other files in a folder.

We should be a bit careful at that point to understand the
situation. Here are some aspects to think about:

- Files starting with a comma. These were backups of deleted messages
(see the section on the trash folder in my master's thesis). Thus
they are considered MH files here but not further regarded. I'm
not sure if the code had been this way all the time, because there
used to the a `--with-hash-prefix' option (or similar) that
replaced the comma with a hash symbol as the prefix for deleted
messages backups. Anyways, we don't have such comma files anymore.
Mmh uses the trash folder for that case, i.e. we can get rid of
the comma prefix file handling ... possibly. Does anyone know the
situation in nmh: have they switched to a trash folder as well?
Because we're still (basically) mail storage compatible to nmh (I
think), we should not be fully ignorant of that situation. But we
could handle those backup files as `other' files. 

- `Other' files seem to be files in the mail storage that have
nothing to do with MH. E.g. any random file someone created in the
mail storage without any connection to mmh. Normally mmh would
simply ignore them, i.e. act as if they wouldn't be there at all.
There's one place, however, where `other' files are relevant:
rmf(1). We should not delete a mail folder that includes `other'
files. It seems this check is missing there. Hence, I would keep
the `other' files marker. It adds few complexity and can be
useful. Plus, we should use it in rmf(1).


> +static int others;
> +
> +static int
> +msgcmp(const struct dirent **d1, const struct dirent **d2)
> +{
> + size_t l1, l2;
> +
> + l1 = strlen((*d1)->d_name);
> + l2 = strlen((*d2)->d_name);
> + if (l1 < l2) {
> + return -1;
> + }
> + if (l1 > l2) {
> + return 1;
> + }
> + return strcmp((*d1)->d_name, (*d2)->d_name);
> +}

Isn't that a strange comparison algorithm, for first compare
string lengths and only if eaqual their content? Here does mmh
sort by length of folder name?


> +static int
> +msgfilter(const struct dirent *e)
> +{
> + int i;
> + if (e->d_name[0] == '.' || e->d_name[0] == ',') {

About this comma: see above!

> + return 0;
> + }
> + for (i = 0; e->d_name[i]; i++) {
> + //if ((i > 1 && e->d_name[i] < '0') || e->d_name[i] < '1' || 
> e->d_name[i] > '9') {

C99-style comment ... looks to be not finished at this point.

> + if (e->d_name[i] < '0' || e->d_name[i] > '9') {
> + others = 1;
> + return 0;
> + }
> + }
> + return 1;
> +}


Wow, it feels so good to have some touch with the mmh development
again! I'm really looking forward to dig into the code as well.
Hoping I get that managed soon.


meillo



Re: [mmh] folder_add and msgstats

2020-03-21 Thread markus schnalke
Hoi.

[2020-03-19 12:33] Philipp Takacs 
>
> I have also removed the
> clear_msg_flags loop from folder_read and folder_realloc. In most cases
> we use calloc, so this isn't necesarry. At one place in folder_realloc
> realloc is used. There I have add a memset.
> 
> comment?

Why were (in the section down below) only the flags outside the
current message range cleared? Now you clear all flags. Does that
change the behavior? Or, if your memset() clears only specific
parts, you probably should put a comment there to explain it, as
this might not be what the programmer usually expects from a
memset() after a (re)alloc().

Apart from that, your changes look and sound sensible.


meillo


> From 937c01f62952147a8ff853d8a103dbe8465cb90e Mon Sep 17 00:00:00 2001
> From: Philipp Takacs 
> Date: Wed, 18 Mar 2020 21:46:30 +0100
> Subject: [PATCH 3/3] use memset to clear the msgstats in folder_realloc
> 
> ---
>  sbr/folder_realloc.c | 16 +---
>  1 file changed, 1 insertion(+), 15 deletions(-)
> 
> diff --git a/sbr/folder_realloc.c b/sbr/folder_realloc.c
> index 64a66409..781e8bdc 100644
> --- a/sbr/folder_realloc.c
> +++ b/sbr/folder_realloc.c
> @@ -46,6 +46,7 @@ folder_realloc(struct msgs *mp, int lo, int hi)
>   ** just realloc the message status array.
>   */
>   mp->msgstats = mh_xrealloc(mp->msgstats, MSGSTATSIZE(mp, lo, 
> hi));
> + memset(mp->msgstats + mp->hghoff - lo, 0, hi - mp->hghoff);
>   } else {
>   /*
>   ** We are changing the offset of the message status
> @@ -69,20 +70,5 @@ folder_realloc(struct msgs *mp, int lo, int hi)
>   mp->lowoff = lo;
>   mp->hghoff = hi;
>  
> - /*
> - ** Clear all the flags for entries outside
> - ** the current message range for this folder.
> - */
> - if (mp->nummsg > 0) {
> - for (msgnum = mp->lowoff; msgnum < mp->lowmsg; msgnum++)
> - clear_msg_flags(mp, msgnum);
> - for (msgnum = mp->hghmsg + 1; msgnum <= mp->hghoff; msgnum++)
> - clear_msg_flags(mp, msgnum);
> - } else {
> - /* no messages, so clear entire range */
> - for (msgnum = mp->lowoff; msgnum <= mp->hghoff; msgnum++)
> - clear_msg_flags(mp, msgnum);
> - }
> -
>   return mp;
>  }
> -- 
> 2.20.1
> 



Re: [mmh] Previous-Sequence

2020-03-21 Thread markus schnalke
Hoi.

[2019-12-15 17:55] Philipp Takacs 
>
> During that I found the Previous-Sequence again. I don't see a reason
> why we need this feature. The usecases, I see, for the feature are better
> handled with mark or the backlog of your shell. But I don't use this
> feature.

Just for reference, the section from mh-sequence(7):

   The Previous Sequence
   Mmh provides the ability to remember the `msgs' or  `msg'
   argument last given to an mmh command.  The entry `Previ‐
   ous-Sequence' should be defined in the mmh  profile;  its
   value  should  be  a  sequence  name or multiple sequence
   names separated by spaces.  If  this  entry  is  defined,
   when   an  mmh  command  finishes,  it  will  define  the
   sequence(s) named in the value of this entry to be  those
   messages  that  were  specified to the command.  Hence, a
   profile entry of

Previous-Sequence: pseq

   directs any mmh command that accepts a  `msg'  or  `msgs'
   argument  to define the sequence `pseq' as those messages
   when it finishes.

   Note: there can be a performance  penalty  in  using  the
   `Previous-Sequence'  facility.   If  it  is used, all mmh
   programs have to write the sequence  information  to  the
   .mh_sequences file for the folder each time they run.  If
   the `Previous-Sequence' profile entry  is  not  included,
   only pick and mark will write to the .mh_sequences file.

I myself don't use the previous sequence neither, yet I was
reluctant to throw it away as I wondered if I just haven't
understood it fully enough to find use for it. Also, I have the
feeling that I haven't really understood its reason for existance
well enough.

To me, it seems that it might have been introduced in times before
the shell history feature and today, the shell history covers the
usecase up fully.

The usecase I have in mind is something like:

show 16 25-29 33 34 41-l
refile pseq +foo

However, with shell history and line editing facilities, I simply
do:
Escape k cw refile
to get the second command with out the previous sequence. The
argument set doesn't get lost if you have a shell history.

The shell history covers up the case of a wrong command you like
to undo, too. E.g. I have ``mark unread'' and ``mark read''
commands (which wrap around mark(1)).


And regarding the situation apart from interactive shells with
history, in scripts you'll have the msg arguments in variables
anyways, thus you have them remembered.


This brings me to the conclusion that I don't see a relevant need
for the previous sequence in mmh. I find all the usecaes I can
think of covered otherwise (and more convenient) and I haven't
used the previous sequence myself at all.

Besides: It does introduce a lot of write accesses to the sequence
files.


Before we remove it, I just would like to know why it was
introduced in MH the first time and what possible other usecases
there could be. This means asking on the nmh-workers mailing list.
Both, out of curiosity and as a double-check in case we've not
thought on something. Do you wanna write this message or should
I?


meillo



Re: [mmh] show sequences racecondition

2020-03-21 Thread markus schnalke
Hoi,

good to see you being active again, Philipp. I try to put my
share in as well, at least comment on your messages. So, let's
start here:


[2020-03-19 12:14] Philipp Takacs 
> [2019-12-15 17:55] Philipp Takacs 
> >
> > I have looked at the problem with open pager for show and change the
> > sequences during the pager is open. This happen often, if you show a
> > message and recieve a new message.

This is an annoying inconvenience (actually a bug but not too
severe) that we probably all suffer from. Improving the situation
would be great.

We can't get fully rid of the race condition, or? We have multiple
processes writing the same file, thus we will inevitably have a
conflict window. Correct? Could you explain quickly how we react
in this case of conflict?

(For once I sadly am not able to dig into the code at the moment,
but as well it often is valuable to explain the code to someone
else in order to check the approach taken.)


meillo