Re: some thought on m_getfld

2023-09-16 Thread David Levine
Philipp wrote:

> [2023-09-02 14:41] Ken Hornstein 
> >
> > I think you are thinking at the wrong scale.  The problem with m_getfld()
> > is not the interface, it is that it should not exist at all.
>
> Actually not quite, I think step by step. First get a good function
> parsing the fields. Then look how this function can be used to parse the
> complete header.

With a scanner/parser approach, I don't think it's necessary to consider
those separately.

I agree with Ken:

> > So this suggests to me that really in most cases the whole header should
> > just be read and then you get some accessor functions to get out the
> > headers you care about.

The nmh code for interfacing with GMail APIs (on the gmailapis branch) does
that.  It's in Python and needs some optimizing, but at least it's brief
and corresponds to the RFCs.

David



Re: some thought on m_getfld

2023-09-03 Thread Philipp
[2023-09-02 14:41] Ken Hornstein 
> >I'm not sure if it's a good idea to start this topic, but it bugs me a
> >bit. Also as a disclaimer my view on this is highly influenced by my own
> >m_getfld variant[0].
> >
> >Because of the missunderstanding of when LENERR is used I have looked
> >at m_getfld.c and I see some problems with the interface and the
> >implemetation.
> >[...]
>
> First, my apologies for not contributing more on the discussion regarding
> header folding; my life has been busy as of late.  But I have some thoughts
> on this.
>
> I think you are thinking at the wrong scale.  The problem with m_getfld()
> is not the interface, it is that it should not exist at all.

Actually not quite, I think step by step. First get a good function
parsing the fields. Then look how this function can be used to parse the
complete header.

> As you note, there's a ton of duplicated code when dealing with m_getfld().
> This has led me to believe that the number of programs that actually need
> a m_getfld()-style interface is close to zero.  I think programs tend to
> fall into two styles of header access:
>
> - One where they process each header field at a time (like the config file
>   parser).
> - One where they need to slurp in all header fields and then just extract
>   the ones they care about (I think this is the most common).

I would split up your second case in two:

 - One where all fields are read and processed, but some fields are
   processed in a special way (like post)
 - One where only some fields are needed (like pick)

> So this suggests to me that really in most cases the whole header should
> just be read and then you get some accessor functions to get out the
> headers you care about.
> [...]
> I realize that this is a larger project and if someone just wants to
> improve m_getfld() that is fine with me.

A function to parse the complete header would be nice, but as you
mention this is a larger project. The problem header parsing is required
everywhere in nmh. Implementing such a function direct would required to
either rewrite every m_getfld user at once or having two parser at the
same time.

So to separate this first improve the field parser and build a high
level API based on this parser. This way there would only one parse and
the single programs could be updated step by step.

> In terms of implementation, as you note header fields have a well-defined
> structure so honestly my preference would be to write a flex lexer to
> process header fields.

Because I have not that much experience with flex, I have handcrafted
my m_getfld. But using flex (and yacc) sounds like a good idea.

> But really I think there should be a higher level
> API to deal with header fields rather than have nmh code call m_getfld().

Some time ago I have started a getfields() function for this. This
function collect either all fields or the one specified by a filter.
When you are interested in it, I can share it.

Philipp

> --Ken
>



Re: some thought on m_getfld

2023-09-02 Thread Ken Hornstein
>I'm not sure if it's a good idea to start this topic, but it bugs me a
>bit. Also as a disclaimer my view on this is highly influenced by my own
>m_getfld variant[0].
>
>Because of the missunderstanding of when LENERR is used I have looked
>at m_getfld.c and I see some problems with the interface and the
>implemetation.
>[...]

First, my apologies for not contributing more on the discussion regarding
header folding; my life has been busy as of late.  But I have some thoughts
on this.

I think you are thinking at the wrong scale.  The problem with m_getfld()
is not the interface, it is that it should not exist at all.

As you note, there's a ton of duplicated code when dealing with m_getfld().
This has led me to believe that the number of programs that actually need
a m_getfld()-style interface is close to zero.  I think programs tend to
fall into two styles of header access:

- One where they process each header field at a time (like the config file
  parser).
- One where they need to slurp in all header fields and then just extract
  the ones they care about (I think this is the most common).

So this suggests to me that really in most cases the whole header should
just be read and then you get some accessor functions to get out the
headers you care about.

In terms of implementation, as you note header fields have a well-defined
structure so honestly my preference would be to write a flex lexer to
process header fields.  But really I think there should be a higher level
API to deal with header fields rather than have nmh code call m_getfld().

I realize that this is a larger project and if someone just wants to
improve m_getfld() that is fine with me.

--Ken



some thought on m_getfld

2023-09-02 Thread Philipp
Hi

I'm not sure if it's a good idea to start this topic, but it bugs me
a bit. Also as a disclaimer my view on this is highly influenced by my
own m_getfld variant[0].

Because of the missunderstanding of when LENERR is used I have looked at
m_getfld.c and I see some problems with the interface and the implemetation.

Let's start with the interface:

m_getfld takes a char *buf and an int *bufsz for the field body and
length. The bufsz is used as input and output parameter. If a field
exceeds the legth of the of the buffer FLDPLUS is returned and the
caller is responsible to call m_getfld again and concat the field body.
Most callers use fmt_appendcomp in a loop to do this.

This leads to a huge amount of duplicated code. I would say m_getfld
should dynamic allocate the buffer. This way m_getfld could returen
exact one field per call. For the body m_getfld should return one line
per call.

Next let look at LENERR and FMTERR. FMTERR is to indecate a format error
and LENERR indicates that the field name is to long. I don't see why to
distinguish these errors and most of the code handle them the same way[1].

Speaking of length, when the caller would like to know if a field
contains a line with more then 998 bytes he has to check it on it's own.
But for m_getfld this would be simple to check. I know this would be
ignored in most cases. But for mhbuild and post this could be usefull.

m_getfld also supports mbox and MMDF. There are only a few places where
this is necesary. With a field/line based API it would be quite easy to
implement this on top of m_getlfd.

Next lets look at the implementation:

For fields m_getlfd reads byte by byte from it's internal state and
checks how to handle it depending on the current situation. A header
is quite good structured so in most cases something like strchr could
be used.

There is this complex state struct which mimics some parts of buffered IO.
Is this realy necesary? I have avoided implementing buffered IO myself
by using getline(3). There might be other ways to implemet this without
a so complex state.

I don't want to tell you how you have to implement m_getfld. I just hope
my view on this helps you to improve m_getfld. 

Philipp

[0] 
http://git.marmaro.de/?p=mmh;a=blob;f=sbr/m_getfld2.c;h=b9a618d166e245a37854003610f5a9cb1576ef0c;hb=HEAD

[1] at the cases I have looked at there is not even an indication in the
error message which sate caused the error