Re: REWRITE: Parsing
I've pushed an attempt to define character classes, single character quoting, unqouted and quoted strings, STRING and TOKEN. Note that we need to go over the actual uses of STRING to see whether they should be using something else. That's an interesting point -- consider *any* command or option, and invariably you can have: COMMAND Option / 'Option' / `Option` Currently, the way we're documenting these options isn't catering for the quoting rules which the parser allows for---even if it's ignoring them (although in cases like PipeRead this does have semantic differences---although by the time it reaches the shell, we don't care how much of a mess someone might have landed themselves in). You are right. The current code mixes all sorts of things in an order that lacks proper definition: * Reading command lines from various sources (module input, file input, PipeRead), including parsing of continued lines. * Lazy tokenisation of command lines (i.e. the next token is parsed when it's needed). Tokenisation actually does not suit all commands. Some take the remaining part of the command line as a literal string (Echo, function definitions, ...), and some have more complex needs, e.g. commands that parse conditions containing sub-tokens separated by whitespace. Sometimes it is required that a token begins with a quote character. * De-quoting of tokens. This is done for the tokens before their contents are interpreted. So, the POSINT rule should really be a POSINT_TOKEN rule, but I wouldn't do that now because its a lot of work and the gain is doubtful. * Variable expansion does not fit into the parsing process at all. So we'll probably end up with several separate parsing steps (with their own Abnf each) and generation steps in between (expansion, de-quoting). However in documenting that minor detail for diffrent options would make the ABNF almost unreadable in my eyes, so I wonder how useful it is? Not at all for now. At the moment we have a strict separation of the general parsing which is done in a central place, and the command parsing which is done separately in each function. The most important work at the moment is to capture how the commands parse their arguments. I've started to even write down which parsing functions they use, but there are some really awful commands, e.g. windowshade: The first argument may be Last (which is only walid if the window has been unshaded before after being shaded with a direction argument), or it may be a direction (which may be an invalid string). Then, if both are missing or the one used was not valid, a bool argument is parsed instead, with the additional meaning that 2 is also interpreted as false. That's awfully difficult to describe in Abnf. 8-P Another example are function definitions: AddToFunc TeddyBear I Echo xteddy AddToFunc TeddyBear + C Exec exec xteddy + M Nop All of these are valid (AFAICT). So we're at a slight quandry here, because we've only documented the bare minimum, and your quoting definiions are fabulous, Dominik---the question I'm asking is how/if we apply them to every single option we've already documented, and whether that's a useful thing to do? No, the de-quoting step must be done outside the parser but I'm not sure how to write all this down. One other thing to note (which I don't think warrants a separate thread), is can you check the following for me (lines 1056-1063): MENUCONTEXT =/ Window / Interior Interior =/ Title Interior =/ (Button INT) Icon Interior =/ Item Interior =/ Context Interior =/ This Interior =/ (Rectangle MENUCONTEXTGEOMETRY) I'm not convinced the syntax is correct there, but I'm not sure precisely what it's getting at either? MENUCONTEXT = Window / Interior / Title / (Button INT) / Icon / ... can be written as MENUCONTEXT = Window MENUCONTEXT =/ Interior MENUCONTEXT =/ Title MENUCONTEXT =/ (Button INT) MENUCONTEXT =/ Icon MENUCONTEXT =/ ... See 3.3 in the rfc: http://tools.ietf.org/html/rfc5234 =/ is just a form of multi line continuation of a alternative rule: FOO = a / b / c or FOO = a / b / c is equivalent to FOO = a / b FOO =/ c Ciao Dominik ^_^ ^_^ -- Dominik Vogt
Re: REWRITE: Parsing
On Tue, Aug 26, 2014 at 08:39:57AM +0200, Dominik Vogt wrote: You are right. The current code mixes all sorts of things in an order that lacks proper definition: * Reading command lines from various sources (module input, file input, PipeRead), including parsing of continued lines. * Lazy tokenisation of command lines (i.e. the next token is parsed when it's needed). Tokenisation actually does not suit all commands. Some take the remaining part of the command line as a literal string (Echo, function definitions, ...), and some have more complex needs, e.g. commands that parse conditions containing sub-tokens separated by whitespace. Sometimes it is required that a token begins with a quote character. * De-quoting of tokens. This is done for the tokens before their contents are interpreted. So, the POSINT rule should really be a POSINT_TOKEN rule, but I wouldn't do that now because its a lot of work and the gain is doubtful. * Variable expansion does not fit into the parsing process at all. So we'll probably end up with several separate parsing steps (with their own Abnf each) and generation steps in between (expansion, de-quoting). Trouble is, you're in danger of silently documenting most of the intricacies of the current parser at the same time as you're documenting what the different commands have a syntaxes. As you say though, it's likely possible to link the two together after the fact, and since we know the above points, it's perhaps easier to do after we've documented everything else, which is more important. What I was checking is that we're not shooting ourselves in the foot by ignoring it now; so I'm not too worried. MENUCONTEXT =/ Window / Interior Interior =/ Title Interior =/ (Button INT) Icon Interior =/ Item Interior =/ Context Interior =/ This Interior =/ (Rectangle MENUCONTEXTGEOMETRY) I'm not convinced the syntax is correct there, but I'm not sure precisely what it's getting at either? MENUCONTEXT = Window / Interior / Title / (Button INT) / Icon / ... can be written as MENUCONTEXT = Window MENUCONTEXT =/ Interior MENUCONTEXT =/ Title MENUCONTEXT =/ (Button INT) MENUCONTEXT =/ Icon MENUCONTEXT =/ ... See 3.3 in the rfc: http://tools.ietf.org/html/rfc5234 =/ is just a form of multi line continuation of a alternative rule: FOO = a / b / c or FOO = a / b / c is equivalent to FOO = a / b FOO =/ c Sorry, I don't think I was too clear on this, Dominik. I understand the rules, what I'm not understanding in the above is why Interior is a continuation pattern repeated over again and again. It looks incorrect to me, as in I would have expected: MENUCONTEXT =/ Window / Interior MENUCONTEXT =/ Title ; etc... But that's not how it's ended up being in the document we're writing, and I'm just wondering if that's deliberate (i.e., I'm misreading it), or a typo. At this point I'm suggesting it's a typo. I need coffee and then I'm off to $DAYJOB. I'll continue this in my lunch hour. -- Thomas Adam -- Deep in my heart I wish I was wrong. But deep in my heart I know I am not. -- Morrissey (Girl Least Likely To -- off of Viva Hate.)
Re: REWRITE: Parsing
On Tue, Aug 26, 2014 at 10:33:39AM +0200, Dominik Vogt wrote: It looks incorrect to me, as in I would have expected: MENUCONTEXT =/ Window / Interior MENUCONTEXT =/ Title Yes. Note that the first line just has =, not =/. That was also broken. Both now fixed. Cheers, -- Thomas Adam
Re: REWRITE: Parsing
Another example are function definitions: AddToFunc TeddyBear I Echo xteddy AddToFunc TeddyBear + C Exec exec xteddy + M Nop This is actually a terrible piece of the fvwm syntax: Function and menu definitions are not atomic. If you write a function definition using FvwmConsole, the meaning of the '+' might change between lines: For example, if a function is triggered that uses PipeRead to generate a menu definition on the fly while you type into FvwmConsole, the next '+' might no longer add to the function but to the menu instead. And let's not talk about functions that generate functions that generate functions. Ciao Dominik ^_^ ^_^ -- Dominik Vogt
Historical Parsing (twm onwards) [WAS: Re: REWRITE: Parsing]
On Tue, Aug 26, 2014 at 07:32:17PM +0100, Dominik Vogt wrote: Another example are function definitions: AddToFunc TeddyBear I Echo xteddy AddToFunc TeddyBear + C Exec exec xteddy + M Nop This is actually a terrible piece of the fvwm syntax: Function and menu definitions are not atomic. If you write a function definition using FvwmConsole, the meaning of the '+' might change between lines: For example, if a function is triggered that uses PipeRead to generate a menu definition on the fly while you type into FvwmConsole, the next '+' might no longer add to the function but to the menu instead. And let's not talk about functions that generate functions that generate functions. Oh absolutely---I'm amazed we've not encountered bugs with regards to this---clearly over the years the developers have done something right. Or maybe it's that, and no one's used this in the weird and wonderful ways that would break things. Since we're on this subject, I'm curious about some of fvwm's history. fvwm (1.x) had what I'd call block level constructs, so for example: Function Resize-or-Raise Resize Motion Raise Motion Raise Click RaiseLower DoubleClick EndFunction This is at odds from twm which uses constructs like: Function move-or-iconify { f.move f.deltastop f.iconify } Although the intent is the same; and from what I can tell of fvwm 1.X's code having just quickly eye-balled it, nested function definitions wasn't possible. I wonder when the + command was introduced into fvwm, and did it always have a triple-meaning of menu/function/decor? Or did that get extended over time? -- Thomas Adam -- Deep in my heart I wish I was wrong. But deep in my heart I know I am not. -- Morrissey (Girl Least Likely To -- off of Viva Hate.)
Re: Historical Parsing (twm onwards) [WAS: Re: REWRITE: Parsing]
Thomas Adam tho...@fvwm.org writes: On Tue, Aug 26, 2014 at 07:32:17PM +0100, Dominik Vogt wrote: Another example are function definitions: AddToFunc TeddyBear I Echo xteddy AddToFunc TeddyBear + C Exec exec xteddy + M Nop This is actually a terrible piece of the fvwm syntax: Function and menu definitions are not atomic. If you write a function definition using FvwmConsole, the meaning of the '+' might change between lines: For example, if a function is triggered that uses PipeRead to generate a menu definition on the fly while you type into FvwmConsole, the next '+' might no longer add to the function but to the menu instead. And let's not talk about functions that generate functions that generate functions. Oh absolutely---I'm amazed we've not encountered bugs with regards to this---clearly over the years the developers have done something right. Or maybe it's that, and no one's used this in the weird and wonderful ways that would break things. Since we're on this subject, I'm curious about some of fvwm's history. fvwm (1.x) had what I'd call block level constructs, so for example: Function Resize-or-Raise Resize Motion Raise Motion Raise Click RaiseLower DoubleClick EndFunction This is at odds from twm which uses constructs like: Function move-or-iconify { f.move f.deltastop f.iconify } Although the intent is the same; and from what I can tell of fvwm 1.X's code having just quickly eye-balled it, nested function definitions wasn't possible. I wonder when the + command was introduced into fvwm, and did it always have a triple-meaning of menu/function/decor? Or did that get extended over time? I might take the blame for other mis-designed things, but as far as I remember, that goes way back. I think the issue was those pretty long commands AddToFunc, etc. But the + sign is just broken. On the other hand, I've never seen it cause a real problem. I think Fvwm just scoops up commands so fast that it's unlikely that there will be a conflict. A warning in the man page might be in order. It would be nice if Fvwm reported where it found an error (line 40 .fvwm/config) which would make the parser aware of where commands are coming from and provide a way to fix this. Of course sometimes it would be FvwmAnimate PID 1234, 20th command. -- Dan Espen
Re: Historical Parsing (twm onwards) [WAS: Re: REWRITE: Parsing]
On Tue, Aug 26, 2014 at 03:20:09PM -0400, Dan Espen wrote: Thomas Adam tho...@fvwm.org writes: I might take the blame for other mis-designed things, but as far as I remember, that goes way back. I think the issue was those pretty long commands AddToFunc, etc. But the + sign is just broken. On the other hand, I've never seen it cause a real problem. I think Fvwm just scoops up commands so fast that it's unlikely that there will be a conflict. Probably because nobody uses dynamic menus much. When fvwm reads a file or PipeRead input, it does not do anything in between, but input from modules cound trigger that. Anyway, it would be nice to have a clean scripting engine that can handle this correctly. You'd just have to store a separate '+' context for each source from which fvwm reads commands. It would be nice if Fvwm reported where it found an error (line 40 .fvwm/config) which would make the parser aware of where commands are coming from and provide a way to fix this. Of course sometimes it would be FvwmAnimate PID 1234, 20th command. Good idea. We should write that down somewhere. Ciao Dominik ^_^ ^_^ -- Dominik Vogt
REWRITE: New parser design
Pondering a new parser a bit, I think it would be good to split the existing command implementations: 1. A user interface function that takes the F_CMD_ARGS argument list as it does now. Its tasks are: a) To prepare a structure that contains the syntax tables for the command and any context information that is necessary to parse the commend syntax properly (example: The last option of the WindowShade command is not always valid but only under certain circumstances; the parser needs to know about that). b) To call the command syntax parser with the input structure prepared in (a). The parser returns an output structure describing the parse values and an error code. c) In case an error occurs, the user interface function generates an error message (if necessary). d) If parsing was successful, the user interface function calls the a command implementation function with the output structure from the parser and selected information from F_CMD_ARGS. To be specific, there is no information passed that would allow the command implementation function to do any parsing. The idea is that the interface function is the owner of all information in the output structure, and if the implementation function needs to store any information, it is responsible to make a copy in allocated storage, if necessary (or maybe just flag that it has taken posession of the data?) e) To call a cleanup function that frees all information in the input and output structures. Hopefully this can be automated. Note: For these compatibility functions we have that just translate the command line into a different command (e.g. WindowFont is nowadays translated into a style command), the interface function would pass execution to a different interface function and not call the implementation function. 2. A command implementation function that: * does the real work of the command, * is passed only the information it needs to do its work, * does no parsing itself * can be called from elswhere without having to go through the interface function 3. A cleanup function that frees any data allocated by the interface function or the parser. Rationale - The split into interface and implementation can be started at any time, independently of any new parser. It may be difficult for some badly defined command syntaxes. The benefit is that we can separate parsing from functionality and start writing a completely new parser in parallel to the existing one. The old parsing could be kept and be activated through a configure switch which would just exchange the implementation of the interface functions. Furthermore, the implementation function will have a well defined interface allowing to call them directly from any other place. In my eyes, splitting is an important step towards table driven parsing. Ciao Dominik ^_^ ^_^ -- Dominik Vogt
Re: REWRITE: New parser design
On Tue, Aug 26, 2014 at 10:42:03PM +0100, Dominik Vogt wrote: 1. A user interface function that takes the F_CMD_ARGS argument list as it does now. Its tasks are: a) To prepare a structure that contains the syntax tables for the command and any context information that is necessary to parse the commend syntax properly (example: The last option of the WindowShade command is not always valid but only under certain circumstances; the parser needs to know about that). b) To call the command syntax parser with the input structure prepared in (a). The parser returns an output structure describing the parse values and an error code. c) In case an error occurs, the user interface function generates an error message (if necessary). OK, so you have a single entry function which is responsible for dispatching commands for fvwm to try and process. I get that. I'm thinking now about the parsing mechanics. There's two approaches to my mind: * We have a big list of commands as we do now, but instead they're stored in a structure with callbacks. Each command has their own definition, and one of the callbacks is to prepare() and then exec() themselves. So for example: struct single_cmd { /* The actual command name */ const char *name; /* What to print on syntax error? */ const char *usage; enum cmd_state (*prepare)(const char *, struct tokens *); enum cmd_state (*exec)(F_CMD_ARGS); }; * Then we have a big list of these---let's say to replace the ones in mvwm/commands.h --- you can imagine something like: extern const struct cmd_single all_cmds[]; extern const struct cmd_single cmd_break; extern const struct cmd_single cmd_close; /* Repeat ad nauseum */ struct cmd_single *all_cmds[] = { cmd_break, cmd_close, NULL }; * Each CMD_*() as we have it now have one of these somewhere: /* Example definition of cmd_break */ struct single_cmd cmd_break = { Break, Break [levels], cmd_break_prepare, cmd_break_exec }; And the controlling interface function can just call into each of -prepare() and -exec() as it deems fit; keeping track of the error state, and throwing the command out as it does now. In terms of free()ing the structure, that should be fairly consistent to not necessarily need a callback---we could always introduce one and if it's NULL, don't call it. I think that summarises what you're referring to, Dominik, and that fits rather nicely with what I had in mind, What's nice about this split of having: interface function - prepare() - [maybe error] - exec() Is the interface function can do a lot of the heavy-lifting for us; we can concentrate on the prepare() side in isolation. Note that I've hand-waved the actual parsing details. I'll say a few things on those now: * I'm keen to have tokenised data as an output, perhaps in the form: struct tokens { const char *key; const char *value; struct tokens *next; }; The reason being is that if the commands tokenise the data in a meaningful way, we can pass a pointer about for common parsing routines and have the description for that data remain the same wherever they're used (for example in parsing move/resize commands). This would allow us in terms of transitioning to a new parser even easier because the data is always in a known format for common areas, reducing the need for additional parsing per command. * Tokenisation in the form key/value means that each command can lookup the data in the same way. For instance: if (has_token(level)) { int l = strtonum(get_token(level), 1, INT_MAX, NULL); if (l 10) fprintf(stderr, Sorry, break level too high!\n); } This says nothing about the type; conversion to other types can happen independently as needed (hence the strtonum() definition above). * Note that the semantics of a command don't change---there might be an upfront overhead in terms of structuring the tokenised data, but I think this flexibility really does allow for some interesting things. - OK, at the moment, that's an internal mechanism in terms of the key name, because we're not changing the semantics at this point, but when we do, this makes the transition easier. - The key could then be whatever new identifier per command(s) we want, including common idioms for establishing how we define move/resize arguments, etc. Note that the struct tokens example doesn't take into account function parsing semantics of multiple values for different functions, in specific orderings. I didn't really want to dive into the mechanics in quite this much detail, but I'm
Re: REWRITE: New parser design
Thomas Adam tho...@fvwm.org writes: * Tokenisation in the form key/value means that each command can lookup the data in the same way. For instance: if (has_token(level)) { int l = strtonum(get_token(level), 1, INT_MAX, NULL); if (l 10) fprintf(stderr, Sorry, break level too high!\n); } Comments: What you show above tells me we have 2 functions has_token and get_token that as a result of a parser can retrieve parts of the command just parsed using a token in the command. Doesn't imply some kind of hash table? I would expect a table driven parser to have a table representing the commands arguments. Maybe: Animated_Move = {int, x, int, y, keyword, WARP, warp_keyword}; Then the logic in the CMD_ANIMATED_MOVE would be getting at it's parsed values differently: this_am_x = x; this_am_y = y; this_am)_warp = FALSE; if (warp_keyword) { this_am_warp = TRUE; } -- Dan Espen