Re: some thought on m_getfld
Philipp wrote: > [2023-09-02 14:41] Ken Hornstein > > > > I think you are thinking at the wrong scale. The problem with m_getfld() > > is not the interface, it is that it should not exist at all. > > Actually not quite, I think step by step. First get a good function > parsing the fields. Then look how this function can be used to parse the > complete header. With a scanner/parser approach, I don't think it's necessary to consider those separately. I agree with Ken: > > So this suggests to me that really in most cases the whole header should > > just be read and then you get some accessor functions to get out the > > headers you care about. The nmh code for interfacing with GMail APIs (on the gmailapis branch) does that. It's in Python and needs some optimizing, but at least it's brief and corresponds to the RFCs. David
Re: some thought on m_getfld
[2023-09-02 14:41] Ken Hornstein > >I'm not sure if it's a good idea to start this topic, but it bugs me a > >bit. Also as a disclaimer my view on this is highly influenced by my own > >m_getfld variant[0]. > > > >Because of the missunderstanding of when LENERR is used I have looked > >at m_getfld.c and I see some problems with the interface and the > >implemetation. > >[...] > > First, my apologies for not contributing more on the discussion regarding > header folding; my life has been busy as of late. But I have some thoughts > on this. > > I think you are thinking at the wrong scale. The problem with m_getfld() > is not the interface, it is that it should not exist at all. Actually not quite, I think step by step. First get a good function parsing the fields. Then look how this function can be used to parse the complete header. > As you note, there's a ton of duplicated code when dealing with m_getfld(). > This has led me to believe that the number of programs that actually need > a m_getfld()-style interface is close to zero. I think programs tend to > fall into two styles of header access: > > - One where they process each header field at a time (like the config file > parser). > - One where they need to slurp in all header fields and then just extract > the ones they care about (I think this is the most common). I would split up your second case in two: - One where all fields are read and processed, but some fields are processed in a special way (like post) - One where only some fields are needed (like pick) > So this suggests to me that really in most cases the whole header should > just be read and then you get some accessor functions to get out the > headers you care about. > [...] > I realize that this is a larger project and if someone just wants to > improve m_getfld() that is fine with me. A function to parse the complete header would be nice, but as you mention this is a larger project. The problem header parsing is required everywhere in nmh. Implementing such a function direct would required to either rewrite every m_getfld user at once or having two parser at the same time. So to separate this first improve the field parser and build a high level API based on this parser. This way there would only one parse and the single programs could be updated step by step. > In terms of implementation, as you note header fields have a well-defined > structure so honestly my preference would be to write a flex lexer to > process header fields. Because I have not that much experience with flex, I have handcrafted my m_getfld. But using flex (and yacc) sounds like a good idea. > But really I think there should be a higher level > API to deal with header fields rather than have nmh code call m_getfld(). Some time ago I have started a getfields() function for this. This function collect either all fields or the one specified by a filter. When you are interested in it, I can share it. Philipp > --Ken >
Re: some thought on m_getfld
>I'm not sure if it's a good idea to start this topic, but it bugs me a >bit. Also as a disclaimer my view on this is highly influenced by my own >m_getfld variant[0]. > >Because of the missunderstanding of when LENERR is used I have looked >at m_getfld.c and I see some problems with the interface and the >implemetation. >[...] First, my apologies for not contributing more on the discussion regarding header folding; my life has been busy as of late. But I have some thoughts on this. I think you are thinking at the wrong scale. The problem with m_getfld() is not the interface, it is that it should not exist at all. As you note, there's a ton of duplicated code when dealing with m_getfld(). This has led me to believe that the number of programs that actually need a m_getfld()-style interface is close to zero. I think programs tend to fall into two styles of header access: - One where they process each header field at a time (like the config file parser). - One where they need to slurp in all header fields and then just extract the ones they care about (I think this is the most common). So this suggests to me that really in most cases the whole header should just be read and then you get some accessor functions to get out the headers you care about. In terms of implementation, as you note header fields have a well-defined structure so honestly my preference would be to write a flex lexer to process header fields. But really I think there should be a higher level API to deal with header fields rather than have nmh code call m_getfld(). I realize that this is a larger project and if someone just wants to improve m_getfld() that is fine with me. --Ken
some thought on m_getfld
Hi I'm not sure if it's a good idea to start this topic, but it bugs me a bit. Also as a disclaimer my view on this is highly influenced by my own m_getfld variant[0]. Because of the missunderstanding of when LENERR is used I have looked at m_getfld.c and I see some problems with the interface and the implemetation. Let's start with the interface: m_getfld takes a char *buf and an int *bufsz for the field body and length. The bufsz is used as input and output parameter. If a field exceeds the legth of the of the buffer FLDPLUS is returned and the caller is responsible to call m_getfld again and concat the field body. Most callers use fmt_appendcomp in a loop to do this. This leads to a huge amount of duplicated code. I would say m_getfld should dynamic allocate the buffer. This way m_getfld could returen exact one field per call. For the body m_getfld should return one line per call. Next let look at LENERR and FMTERR. FMTERR is to indecate a format error and LENERR indicates that the field name is to long. I don't see why to distinguish these errors and most of the code handle them the same way[1]. Speaking of length, when the caller would like to know if a field contains a line with more then 998 bytes he has to check it on it's own. But for m_getfld this would be simple to check. I know this would be ignored in most cases. But for mhbuild and post this could be usefull. m_getfld also supports mbox and MMDF. There are only a few places where this is necesary. With a field/line based API it would be quite easy to implement this on top of m_getlfd. Next lets look at the implementation: For fields m_getlfd reads byte by byte from it's internal state and checks how to handle it depending on the current situation. A header is quite good structured so in most cases something like strchr could be used. There is this complex state struct which mimics some parts of buffered IO. Is this realy necesary? I have avoided implementing buffered IO myself by using getline(3). There might be other ways to implemet this without a so complex state. I don't want to tell you how you have to implement m_getfld. I just hope my view on this helps you to improve m_getfld. Philipp [0] http://git.marmaro.de/?p=mmh;a=blob;f=sbr/m_getfld2.c;h=b9a618d166e245a37854003610f5a9cb1576ef0c;hb=HEAD [1] at the cases I have looked at there is not even an indication in the error message which sate caused the error