Re: REWRITE: Parsing

2014-08-26 Thread Dominik Vogt
  I've pushed an attempt to define character classes, single character
  quoting, unqouted and quoted strings, STRING and TOKEN.
  
  Note that we need to go over the actual uses of STRING to see
  whether they should be using something else.
 
 That's an interesting point -- consider *any* command or option, and
 invariably you can have:
 
 COMMAND Option / 'Option' / `Option`
 
 Currently, the way we're documenting these options isn't catering for
 the quoting rules which the parser allows for---even if it's ignoring
 them (although in cases like PipeRead this does have semantic
 differences---although by the time it reaches the shell, we don't care
 how much of a mess someone might have landed themselves in).

You are right.  The current code mixes all sorts of things in an order
that lacks proper definition:

  * Reading command lines from various sources (module input, file input,
PipeRead), including parsing of continued lines.
  * Lazy tokenisation of command lines (i.e. the next token is parsed
when it's needed).  Tokenisation actually does not suit all commands.
Some take the remaining part of the command line as a literal string
(Echo, function definitions, ...), and some have more complex needs,
e.g. commands that parse conditions containing sub-tokens separated
by whitespace.  Sometimes it is required that a token begins with a
quote character.
  * De-quoting of tokens.  This is done for the tokens before their
contents are interpreted.  So, the POSINT rule should really be a
POSINT_TOKEN rule, but I wouldn't do that now because its a lot of
work and the gain is doubtful.
  * Variable expansion does not fit into the parsing process at all.

So we'll probably end up with several separate parsing steps (with their
own Abnf each) and generation steps in between (expansion, de-quoting).

 However in documenting that minor detail for diffrent options would make
 the ABNF almost unreadable in my eyes, so I wonder how useful it is?

Not at all for now.  At the moment we have a strict separation of the
general parsing which is done in a central place, and the command parsing
which is done separately in each function.  The most important work at
the moment is to capture how the commands parse their arguments.  I've
started to even write down which parsing functions they use, but there are
some really awful commands, e.g. windowshade:  The first argument may be
Last (which is only walid if the window has been unshaded before after
being shaded with a direction argument), or it may be a direction (which
may be an invalid string).  Then, if both are missing or the one used was
not valid, a bool argument is parsed instead, with the additional meaning
that 2 is also interpreted as false.  That's awfully difficult to
describe in Abnf. 8-P

 Another example are function definitions:
 
 AddToFunc TeddyBear I Echo xteddy
 AddToFunc TeddyBear
 + C Exec exec xteddy
 + M Nop
 
 All of these are valid (AFAICT).  So we're at a slight quandry here,
 because we've only documented the bare minimum, and your quoting
 definiions are fabulous, Dominik---the question I'm asking is how/if we
 apply them to every single option we've already documented, and whether
 that's a useful thing to do?

No, the de-quoting step must be done outside the parser but I'm not sure
how to write all this down.

 One other thing to note (which I don't think warrants a separate
 thread), is can you check the following for me (lines 1056-1063):
 
 MENUCONTEXT =/ Window /
 Interior
 Interior =/ Title
 Interior =/ (Button INT) Icon
 Interior =/ Item
 Interior =/ Context
 Interior =/ This
 Interior =/ (Rectangle MENUCONTEXTGEOMETRY)
 
 I'm not convinced the syntax is correct there, but I'm not sure
 precisely what it's getting at either?

MENUCONTEXT = Window / Interior / Title / (Button INT) / Icon / ...

can be written as 

MENUCONTEXT = Window
MENUCONTEXT =/ Interior
MENUCONTEXT =/ Title
MENUCONTEXT =/ (Button INT)
MENUCONTEXT =/ Icon
MENUCONTEXT =/ ...

See 3.3 in the rfc:  http://tools.ietf.org/html/rfc5234
=/ is just a form of multi line continuation of a alternative rule:
  FOO = a / b / c
or
  FOO = a / b /
  c
is equivalent to
  FOO = a / b
  FOO =/ c

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt



Re: REWRITE: Parsing

2014-08-26 Thread Thomas Adam
On Tue, Aug 26, 2014 at 08:39:57AM +0200, Dominik Vogt wrote:
 You are right.  The current code mixes all sorts of things in an order
 that lacks proper definition:
 
   * Reading command lines from various sources (module input, file input,
 PipeRead), including parsing of continued lines.
   * Lazy tokenisation of command lines (i.e. the next token is parsed
 when it's needed).  Tokenisation actually does not suit all commands.
 Some take the remaining part of the command line as a literal string
 (Echo, function definitions, ...), and some have more complex needs,
 e.g. commands that parse conditions containing sub-tokens separated
 by whitespace.  Sometimes it is required that a token begins with a
 quote character.
   * De-quoting of tokens.  This is done for the tokens before their
 contents are interpreted.  So, the POSINT rule should really be a
 POSINT_TOKEN rule, but I wouldn't do that now because its a lot of
 work and the gain is doubtful.
   * Variable expansion does not fit into the parsing process at all.
 
 So we'll probably end up with several separate parsing steps (with their
 own Abnf each) and generation steps in between (expansion, de-quoting).

Trouble is, you're in danger of silently documenting most of the
intricacies of the current parser at the same time as you're documenting
what the different commands have a syntaxes.  As you say though, it's
likely possible to link the two together after the fact, and since we
know the above points, it's perhaps easier to do after we've documented
everything else, which is more important.  What I was checking is that
we're not shooting ourselves in the foot by ignoring it now; so I'm not
too worried.

  MENUCONTEXT =/ Window /
  Interior
  Interior =/ Title
  Interior =/ (Button INT) Icon
  Interior =/ Item
  Interior =/ Context
  Interior =/ This
  Interior =/ (Rectangle MENUCONTEXTGEOMETRY)
  
  I'm not convinced the syntax is correct there, but I'm not sure
  precisely what it's getting at either?
 
 MENUCONTEXT = Window / Interior / Title / (Button INT) / Icon / ...
 
 can be written as 
 
 MENUCONTEXT = Window
 MENUCONTEXT =/ Interior
 MENUCONTEXT =/ Title
 MENUCONTEXT =/ (Button INT)
 MENUCONTEXT =/ Icon
 MENUCONTEXT =/ ...
 
 See 3.3 in the rfc:  http://tools.ietf.org/html/rfc5234
 =/ is just a form of multi line continuation of a alternative rule:
   FOO = a / b / c
 or
   FOO = a / b /
   c
 is equivalent to
   FOO = a / b
   FOO =/ c

Sorry, I don't think I was too clear on this, Dominik.  I understand the
rules, what I'm not understanding in the above is why Interior is a
continuation pattern repeated over again and again.  It looks incorrect
to me, as in I would have expected:

MENUCONTEXT =/ Window / Interior
MENUCONTEXT =/ Title
; etc...

But that's not how it's ended up being in the document we're writing,
and I'm just wondering if that's deliberate (i.e., I'm misreading it),
or a typo.  At this point I'm suggesting it's a typo.

I need coffee and then I'm off to $DAYJOB.  I'll continue this in my
lunch hour.

-- Thomas Adam

-- 
Deep in my heart I wish I was wrong.  But deep in my heart I know I am
not. -- Morrissey (Girl Least Likely To -- off of Viva Hate.)



Re: REWRITE: Parsing

2014-08-26 Thread Thomas Adam
On Tue, Aug 26, 2014 at 10:33:39AM +0200, Dominik Vogt wrote:
  It looks incorrect
  to me, as in I would have expected:
  
  MENUCONTEXT =/ Window / Interior
  MENUCONTEXT =/ Title
 
 Yes.  Note that the first line just has =, not =/.

That was also broken.  Both now fixed.

Cheers,
-- Thomas Adam



Re: REWRITE: Parsing

2014-08-26 Thread Dominik Vogt
  Another example are function definitions:
  
  AddToFunc TeddyBear I Echo xteddy
  AddToFunc TeddyBear
  + C Exec exec xteddy
  + M Nop

This is actually a terrible piece of the fvwm syntax: Function and
menu definitions are not atomic.  If you write a function definition
using FvwmConsole, the meaning of the '+' might change between lines:
For example, if a function is triggered that uses PipeRead to
generate a menu definition on the fly while you type into FvwmConsole,
the next '+' might no longer add to the function but to the menu
instead.  And let's not talk about functions that generate functions
that generate functions.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt





Historical Parsing (twm onwards) [WAS: Re: REWRITE: Parsing]

2014-08-26 Thread Thomas Adam
On Tue, Aug 26, 2014 at 07:32:17PM +0100, Dominik Vogt wrote:
   Another example are function definitions:
   
   AddToFunc TeddyBear I Echo xteddy
   AddToFunc TeddyBear
   + C Exec exec xteddy
   + M Nop
 
 This is actually a terrible piece of the fvwm syntax: Function and
 menu definitions are not atomic.  If you write a function definition
 using FvwmConsole, the meaning of the '+' might change between lines:
 For example, if a function is triggered that uses PipeRead to
 generate a menu definition on the fly while you type into FvwmConsole,
 the next '+' might no longer add to the function but to the menu
 instead.  And let's not talk about functions that generate functions
 that generate functions.

Oh absolutely---I'm amazed we've not encountered bugs with regards to
this---clearly over the years the developers have done something right.
Or maybe it's that, and no one's used this in the weird and wonderful
ways that would break things.

Since we're on this subject, I'm curious about some of fvwm's history.
fvwm (1.x) had what I'd call block level constructs, so for
example:

Function Resize-or-Raise
Resize  Motion
Raise   Motion
Raise   Click
RaiseLower  DoubleClick
EndFunction

This is at odds from twm which uses constructs like:

Function move-or-iconify {
f.move f.deltastop f.iconify
}

Although the intent is the same; and from what I can tell of fvwm 1.X's
code having just quickly eye-balled it, nested function definitions
wasn't possible.

I wonder when the + command was introduced into fvwm, and did it
always have a triple-meaning of menu/function/decor?  Or did that get
extended over time?

-- Thomas Adam

-- 
Deep in my heart I wish I was wrong.  But deep in my heart I know I am
not. -- Morrissey (Girl Least Likely To -- off of Viva Hate.)



Re: Historical Parsing (twm onwards) [WAS: Re: REWRITE: Parsing]

2014-08-26 Thread Dan Espen
Thomas Adam tho...@fvwm.org writes:

 On Tue, Aug 26, 2014 at 07:32:17PM +0100, Dominik Vogt wrote:
   Another example are function definitions:
   
   AddToFunc TeddyBear I Echo xteddy
   AddToFunc TeddyBear
   + C Exec exec xteddy
   + M Nop
 
 This is actually a terrible piece of the fvwm syntax: Function and
 menu definitions are not atomic.  If you write a function definition
 using FvwmConsole, the meaning of the '+' might change between lines:
 For example, if a function is triggered that uses PipeRead to
 generate a menu definition on the fly while you type into FvwmConsole,
 the next '+' might no longer add to the function but to the menu
 instead.  And let's not talk about functions that generate functions
 that generate functions.

 Oh absolutely---I'm amazed we've not encountered bugs with regards to
 this---clearly over the years the developers have done something right.
 Or maybe it's that, and no one's used this in the weird and wonderful
 ways that would break things.

 Since we're on this subject, I'm curious about some of fvwm's history.
 fvwm (1.x) had what I'd call block level constructs, so for
 example:

 Function Resize-or-Raise
 Resize  Motion
 Raise   Motion
 Raise   Click
 RaiseLower  DoubleClick
 EndFunction

 This is at odds from twm which uses constructs like:

 Function move-or-iconify {
 f.move f.deltastop f.iconify
 }

 Although the intent is the same; and from what I can tell of fvwm 1.X's
 code having just quickly eye-balled it, nested function definitions
 wasn't possible.

 I wonder when the + command was introduced into fvwm, and did it
 always have a triple-meaning of menu/function/decor?  Or did that get
 extended over time?

I might take the blame for other mis-designed things, but
as far as I remember, that goes way back.  I think the issue was those
pretty long commands AddToFunc, etc.  But the + sign is just broken.
On the other hand, I've never seen it cause a real problem.
I think Fvwm just scoops up commands so fast that it's unlikely that
there will be a conflict.

A warning in the man page might be in order.

It would be nice if Fvwm reported where it found an error
(line 40 .fvwm/config) which would make the parser aware
of where commands are coming from and provide a way to fix
this.  Of course sometimes it would be FvwmAnimate PID 1234,
20th command.

-- 
Dan Espen



Re: Historical Parsing (twm onwards) [WAS: Re: REWRITE: Parsing]

2014-08-26 Thread Dominik Vogt
On Tue, Aug 26, 2014 at 03:20:09PM -0400, Dan Espen wrote:
 Thomas Adam tho...@fvwm.org writes:
 I might take the blame for other mis-designed things, but
 as far as I remember, that goes way back.  I think the issue was those
 pretty long commands AddToFunc, etc.  But the + sign is just broken.
 On the other hand, I've never seen it cause a real problem.
 I think Fvwm just scoops up commands so fast that it's unlikely that
 there will be a conflict.

Probably because nobody uses dynamic menus much.  When fvwm reads
a file or PipeRead input, it does not do anything in between, but
input from modules cound trigger that.  Anyway, it would be nice
to have a clean scripting engine that can handle this correctly.
You'd just have to store a separate '+' context for each source
from which fvwm reads commands.

 It would be nice if Fvwm reported where it found an error
 (line 40 .fvwm/config) which would make the parser aware
 of where commands are coming from and provide a way to fix
 this.  Of course sometimes it would be FvwmAnimate PID 1234,
 20th command.

Good idea.  We should write that down somewhere.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt



REWRITE: New parser design

2014-08-26 Thread Dominik Vogt
Pondering a new parser a bit, I think it would be good to split
the existing command implementations:

1. A user interface function that takes the F_CMD_ARGS argument
   list as it does now.  Its tasks are:

   a) To prepare a structure that contains the syntax tables for
  the command and any context information that is necessary
  to parse the commend syntax properly (example: The last
  option of the WindowShade command is not always valid but
  only under certain circumstances; the parser needs to know
  about that).

   b) To call the command syntax parser with the input structure
  prepared in (a).  The parser returns an output structure
  describing the parse values and an error code.

   c) In case an error occurs, the user interface function
  generates an error message (if necessary).

   d) If parsing was successful, the user interface function calls
  the a command implementation function with the output
  structure from the parser and selected information from
  F_CMD_ARGS.  To be specific, there is no information passed
  that would allow the command implementation function to do
  any parsing.  The idea is that the interface function is the
  owner of all information in the output structure, and if the
  implementation function needs to store any information, it is
  responsible to make a copy in allocated storage, if
  necessary (or maybe just flag that it has taken posession of
  the data?)

   e) To call a cleanup function that frees all information in the
  input and output structures.  Hopefully this can be
  automated.

   Note: For these compatibility functions we have that just
   translate the command line into a different command (e.g.
   WindowFont is nowadays translated into a style command), the
   interface function would pass execution to a different
   interface function and not call the implementation function.

2. A command implementation function that:

   * does the real work of the command,
   * is passed only the information it needs to do its work,
   * does no parsing itself
   * can be called from elswhere without having to go through the
 interface function

3. A cleanup function that frees any data allocated by the
   interface function or the parser.

Rationale
-

The split into interface and implementation can be started at any
time, independently of any new parser.  It may be difficult for
some badly defined command syntaxes.  The benefit is that we can
separate parsing from functionality and start writing a completely
new parser in parallel to the existing one.  The old parsing could
be kept and be activated through a configure switch which would
just exchange the implementation of the interface functions.

Furthermore, the implementation function will have a well defined
interface allowing to call them directly from any other place.

In my eyes, splitting is an important step towards table driven
parsing.

Ciao

Dominik ^_^  ^_^

-- 

Dominik Vogt



Re: REWRITE: New parser design

2014-08-26 Thread Thomas Adam
On Tue, Aug 26, 2014 at 10:42:03PM +0100, Dominik Vogt wrote:
 1. A user interface function that takes the F_CMD_ARGS argument
list as it does now.  Its tasks are:
 
a) To prepare a structure that contains the syntax tables for
   the command and any context information that is necessary
   to parse the commend syntax properly (example: The last
   option of the WindowShade command is not always valid but
   only under certain circumstances; the parser needs to know
   about that).

b) To call the command syntax parser with the input structure
   prepared in (a).  The parser returns an output structure
   describing the parse values and an error code.
 
c) In case an error occurs, the user interface function
   generates an error message (if necessary).

OK, so you have a single entry function which is responsible for
dispatching commands for fvwm to try and process.  I get that.  I'm
thinking now about the parsing mechanics.  There's two approaches to my
mind:

 * We have a big list of commands as we do now, but instead they're
   stored in a structure with callbacks.  Each command has their own
   definition, and one of the callbacks is to prepare() and then
   exec() themselves.  So for example:

struct single_cmd {
/* The actual command name */
const char *name;

/* What to print on syntax error? */
const char *usage;

enum cmd_state (*prepare)(const char *, struct tokens *);
enum cmd_state (*exec)(F_CMD_ARGS);
};

 * Then we have a big list of these---let's say to replace the ones in
   mvwm/commands.h --- you can imagine something like:

extern const struct cmd_single all_cmds[];
extern const struct cmd_single cmd_break;
extern const struct cmd_single cmd_close;
/* Repeat ad nauseum */

struct cmd_single *all_cmds[] = {
cmd_break,
cmd_close,
NULL
};

* Each CMD_*() as we have it now have one of these somewhere:

/* Example definition of cmd_break */
struct single_cmd cmd_break = {
Break,
Break [levels],
cmd_break_prepare,
cmd_break_exec
};

And the controlling interface function can just call into each of
-prepare() and -exec() as it deems fit; keeping track of the error
state, and throwing the command out as it does now.

In terms of free()ing the structure, that should be fairly consistent to
not necessarily need a callback---we could always introduce one and if
it's NULL, don't call it.

I think that summarises what you're referring to, Dominik, and that fits
rather nicely with what I had in mind,  What's nice about this split of
having:

interface function - prepare() - [maybe error] - exec()

Is the interface function can do a lot of the heavy-lifting for us; we
can concentrate on the prepare() side in isolation.

Note that I've hand-waved the actual parsing details.  I'll say a few
things on those now:

 * I'm keen to have tokenised data as an output, perhaps in the form:

struct tokens {
const char *key;
const char *value;

struct tokens *next;
};

 The reason being is that if the commands tokenise the data in a
 meaningful way, we can pass a pointer about for common parsing routines
 and have the description for that data remain the same wherever they're
 used (for example in parsing move/resize commands).  This would allow
 us in terms of transitioning to a new parser even easier because the
 data is always in a known format for common areas, reducing the need
 for additional parsing per command.

* Tokenisation in the form key/value means that each command can lookup
  the data in the same way.  For instance:

if (has_token(level)) {
int l = strtonum(get_token(level), 1, INT_MAX, NULL);
if (l  10)
fprintf(stderr, Sorry, break level too high!\n);
}

 This says nothing about the type; conversion to other types can happen
 independently as needed (hence the strtonum() definition above).

* Note that the semantics of a command don't change---there might be an
  upfront overhead in terms of structuring the tokenised data, but I
  think this flexibility really does allow for some interesting things.

- OK, at the moment, that's an internal mechanism in terms of the
  key name, because we're not changing the semantics at this point,
  but when we do, this makes the transition easier.
- The key could then be whatever new identifier per command(s) we
  want, including common idioms for establishing how we define
  move/resize arguments, etc.

Note that the struct tokens example doesn't take into account function
parsing semantics of multiple values for different functions, in
specific orderings.

I didn't really want to dive into the mechanics in quite this much
detail, but I'm 

Re: REWRITE: New parser design

2014-08-26 Thread Dan Espen
Thomas Adam tho...@fvwm.org writes:

 * Tokenisation in the form key/value means that each command can lookup
   the data in the same way.  For instance:

 if (has_token(level)) {
 int l = strtonum(get_token(level), 1, INT_MAX, NULL);
 if (l  10)
 fprintf(stderr, Sorry, break level too high!\n);
 }

Comments:

What you show above tells me we have 2 functions has_token and
get_token that as a result of a parser can retrieve parts of the
command just parsed using a token in the command.

Doesn't imply some kind of hash table?

I would  expect a table driven  parser to have a  table representing the
commands arguments.

Maybe:

Animated_Move = {int, x,
  int, y,
  keyword, WARP, warp_keyword};

Then the logic in the CMD_ANIMATED_MOVE would be
getting at it's parsed values differently:

this_am_x = x;
this_am_y = y;
this_am)_warp = FALSE;
if (warp_keyword) {
   this_am_warp = TRUE;
}

-- 
Dan Espen