Re: [Qemu-devel] [PATCH v2 01/11] qapi: add qapi2texi script

2016-11-03 Thread Marc-André Lureau
Hi

On Thu, Nov 3, 2016 at 5:46 PM Markus Armbruster  wrote:

> Marc-André Lureau  writes:
>
> > If we want to have a very strict grammar, there is even more work needed
> to
> > cleanup the documentation. I don't fancy doing that work, it gives only
> low
> > benefits.
>
> Since I have pretty specific ideas on how I want this part of your much
> larger work done, I think it's only fair if I give it a try myself:
> propose a grammar, create a parser to show it works, clean up existing
> comments to comply with it.
>
>
Thank you, I gladly accept the help :) However, could you wait until the
next iteration? I am trying to improve the parsing and simplifying a bit
the code (although probably not reaching your goal)

-- 
Marc-André Lureau


Re: [Qemu-devel] [PATCH v2 01/11] qapi: add qapi2texi script

2016-11-03 Thread Markus Armbruster
Marc-André Lureau  writes:

> Hi
>
> On Fri, Oct 28, 2016 at 7:45 PM Markus Armbruster  wrote:
>
>> Marc-André Lureau  writes:
>>
>> > As the name suggests, the qapi2texi script converts JSON QAPI
>> > description into a standalone texi file suitable for different target
>> > formats.
>> >
>> > It parses the following kind of blocks with some little variations:
>> >
>> >   ##
>> >   # = Section
>> >   # == Subsection
>> >   #
>> >   # Some text foo with *emphasis*
>> >   # 1. with a list
>> >   # 2. like that
>> >   #
>> >   # And some code:
>> >   # | $ echo foo
>> >   # | <- do this
>> >   # | -> get that
>> >   #
>> >   ##
>> >
>> >   ##
>> >   # @symbol
>> >   #
>> >   # Symbol body ditto ergo sum. Foo bar
>> >   # baz ding.
>> >   #
>> >   # @arg: foo
>> >   # @arg: #optional foo
>> >   #
>> >   # Returns: returns bla bla
>> >   #
>> >   #  Or bla blah
>> >   #
>> >   # Since: version
>> >   # Notes: notes, comments can have
>> >   #- itemized list
>> >   #- like this
>> >   #
>> >   #and continue...
>> >   #
>> >   # Example:
>> >   #
>> >   # <- { "execute": "quit" }
>> >   # -> { "return": {} }
>> >   #
>> >   ##
>> >
>> > Thanks to the json declaration, it's able to give extra information
>> > about the type of arguments and return value expected.
>> >
>> > Signed-off-by: Marc-André Lureau 
>> > ---
>> >  scripts/qapi.py| 100 +++-
>> >  scripts/qapi2texi.py   | 314
>> +
>> >  docs/qapi-code-gen.txt |  44 +--
>> >  3 files changed, 446 insertions(+), 12 deletions(-)
>> >  create mode 100755 scripts/qapi2texi.py
>>
>> My review will be easier to follow if you skip ahead qapi-code-gen.txt,
>> then go to class QAPISchemaParser, then come back here.
>>
>> > diff --git a/scripts/qapi.py b/scripts/qapi.py
>> > index 21bc32f..4efc7e7 100644
>> > --- a/scripts/qapi.py
>> > +++ b/scripts/qapi.py
>> > @@ -122,6 +122,79 @@ class QAPIExprError(Exception):
>> >  "%s:%d: %s" % (self.info['file'], self.info['line'],
>> self.msg)
>> >
>> >
>> > +class QAPIDoc:
>>
>> Old-style class.  Make this
>>
>>class QAPIDoc(object):
>>
>
> ok
>
>
>>
>> > +def __init__(self, comment):
>> > +self.symbol = None
>> > +self.comment = "" # the main symbol comment
>>
>> PEP8 wants at least two spaces before #, and I prefer the # in colum 32.
>>
>
> let's just remove that comment
>
>
>>
>> > +self.args = OrderedDict()
>> > +# meta is for Since:, Notes:, Examples:, Returns:...
>> > +self.meta = OrderedDict()
>> > +# the current section to populate, array of [dict, key, 
>> > comment...]
>> > +self.section = None
>> > +
>> > +for line in comment.split('\n'):
>>
>> Makes me wonder whether the caller should pass a list of lines instead
>> of a string that needs to be split into lines.
>>
>> > +# remove multiple spaces
>> > +sline = ' '.join(line.split())
>>
>> Actually, this doesn't "remove multiple spaces", it squeezes sequences
>> of whitespace into a single space character.  Rather inefficiently, I
>> suspect, but let's not worry about that now.
>>
>> > +# take the first word out
>> > +split = sline.split(' ', 1)
>> > +key = split[0]
>> > +
>> > +if key.startswith("@"):
>> > +key = key[1:].rstrip(':')
>>
>> Treats the colon as optional.  I understand you're trying to be
>> unintrusive, but since we're parsing comments already, we can make the
>> parser do double duty and enforce consistent comment formatting.  But
>> this isn't the place to discuss the format we want, review of
>> qapi-code-gen.txt is.
>>
>
> Indeed, I was being permissive: my initial plan was rather to have
> something quickly last year. It's simply comments, it does not have to be
> rigorous as code to me. Now that we are taking >1.5y to get it in, we may
> as well enforce more stuff from the start.

My aim is to avoid silent garbage-in, garbage out in the long run, where
"garbage" input looks innocent at least to casual reviewers, and to
assist developers in writing non-garbage input.

A quick solution that doesn't accomplish that may still get us closer.
Or it may be an avoidable detour.  Depends.

>Obviously that kind of feedback
> would have been nice earlier.

You're absolutely right.

>(needless to say my motivation is lower than
> when I started)

Who wouldn't be frustrated by such delays!

You're the victim of unlucky timing and perhaps questionable judgement
calls.  Your work arrived in the middle of the huge effort to merge
Eric's work.  For better or worse, we decided to stay focused on the
task that was in progress.  Maybe we should've paused at least for some
high level review, and to see whether we 

Re: [Qemu-devel] [PATCH v2 01/11] qapi: add qapi2texi script

2016-11-02 Thread Marc-André Lureau
Hi

On Fri, Oct 28, 2016 at 7:45 PM Markus Armbruster  wrote:

> Marc-André Lureau  writes:
>
> > As the name suggests, the qapi2texi script converts JSON QAPI
> > description into a standalone texi file suitable for different target
> > formats.
> >
> > It parses the following kind of blocks with some little variations:
> >
> >   ##
> >   # = Section
> >   # == Subsection
> >   #
> >   # Some text foo with *emphasis*
> >   # 1. with a list
> >   # 2. like that
> >   #
> >   # And some code:
> >   # | $ echo foo
> >   # | <- do this
> >   # | -> get that
> >   #
> >   ##
> >
> >   ##
> >   # @symbol
> >   #
> >   # Symbol body ditto ergo sum. Foo bar
> >   # baz ding.
> >   #
> >   # @arg: foo
> >   # @arg: #optional foo
> >   #
> >   # Returns: returns bla bla
> >   #
> >   #  Or bla blah
> >   #
> >   # Since: version
> >   # Notes: notes, comments can have
> >   #- itemized list
> >   #- like this
> >   #
> >   #and continue...
> >   #
> >   # Example:
> >   #
> >   # <- { "execute": "quit" }
> >   # -> { "return": {} }
> >   #
> >   ##
> >
> > Thanks to the json declaration, it's able to give extra information
> > about the type of arguments and return value expected.
> >
> > Signed-off-by: Marc-André Lureau 
> > ---
> >  scripts/qapi.py| 100 +++-
> >  scripts/qapi2texi.py   | 314
> +
> >  docs/qapi-code-gen.txt |  44 +--
> >  3 files changed, 446 insertions(+), 12 deletions(-)
> >  create mode 100755 scripts/qapi2texi.py
>
> My review will be easier to follow if you skip ahead qapi-code-gen.txt,
> then go to class QAPISchemaParser, then come back here.
>
> > diff --git a/scripts/qapi.py b/scripts/qapi.py
> > index 21bc32f..4efc7e7 100644
> > --- a/scripts/qapi.py
> > +++ b/scripts/qapi.py
> > @@ -122,6 +122,79 @@ class QAPIExprError(Exception):
> >  "%s:%d: %s" % (self.info['file'], self.info['line'],
> self.msg)
> >
> >
> > +class QAPIDoc:
>
> Old-style class.  Make this
>
>class QAPIDoc(object):
>

ok


>
> > +def __init__(self, comment):
> > +self.symbol = None
> > +self.comment = "" # the main symbol comment
>
> PEP8 wants at least two spaces before #, and I prefer the # in colum 32.
>

let's just remove that comment


>
> > +self.args = OrderedDict()
> > +# meta is for Since:, Notes:, Examples:, Returns:...
> > +self.meta = OrderedDict()
> > +# the current section to populate, array of [dict, key,
> comment...]
> > +self.section = None
> > +
> > +for line in comment.split('\n'):
>
> Makes me wonder whether the caller should pass a list of lines instead
> of a string that needs to be split into lines.
>
> > +# remove multiple spaces
> > +sline = ' '.join(line.split())
>
> Actually, this doesn't "remove multiple spaces", it squeezes sequences
> of whitespace into a single space character.  Rather inefficiently, I
> suspect, but let's not worry about that now.
>
> > +# take the first word out
> > +split = sline.split(' ', 1)
> > +key = split[0]
> > +
> > +if key.startswith("@"):
> > +key = key[1:].rstrip(':')
>
> Treats the colon as optional.  I understand you're trying to be
> unintrusive, but since we're parsing comments already, we can make the
> parser do double duty and enforce consistent comment formatting.  But
> this isn't the place to discuss the format we want, review of
> qapi-code-gen.txt is.
>

Indeed, I was being permissive: my initial plan was rather to have
something quickly last year. It's simply comments, it does not have to be
rigorous as code to me. Now that we are taking >1.5y to get it in, we may
as well enforce more stuff from the start. Obviously that kind of feedback
would have been nice earlier.  (needless to say my motivation is lower than
when I started)

>
> > +sline = split[1] if len(split) > 1 else ""
> > +if self.symbol is None:
> > +# the first is the section symbol
> > +self.symbol = key
>
> Let's not say "the section symbol".  It's the symbol this APIDoc object
> documents.  The APIDoc object has sections.
>

ok


>
> > +else:
> > +# else an arg
> > +self.start_section(self.args, key)
> > +elif self.symbol and \
> > +key in ("Returns:",
> > +# special case for Since often used without:
> > +"Since:", "Since",
> > +# those are often singular or plural
> > +"Note:", "Notes:",
> > +"Example:", "Examples:"):
> > +sline = split[1] if len(split) > 1 else ""
> > +line = None
> > +# 

Re: [Qemu-devel] [PATCH v2 01/11] qapi: add qapi2texi script

2016-10-28 Thread Eric Blake
On 10/28/2016 11:44 AM, Markus Armbruster wrote:
> Marc-André Lureau  writes:
> 
>> As the name suggests, the qapi2texi script converts JSON QAPI
>> description into a standalone texi file suitable for different target
>> formats.
>>
>> It parses the following kind of blocks with some little variations:
>>
>>   ##
>>   # = Section
>>   # == Subsection
>>   #
>>   # Some text foo with *emphasis*
>>   # 1. with a list
>>   # 2. like that
>>   #
>>   # And some code:
>>   # | $ echo foo
>>   # | <- do this
>>   # | -> get that

Backwards.  Pre-patch, qmp-commands.txt states:

> Also, the following notation is used to denote data flow:
> 
> -> data issued by the Client
> <- Server data response

so you want

-> do this
<- get that

>>   #
>>   ##

How much effort is it to get rid of the "little variations" and flag
documentation that does not match consistent patterns?  A tighter parser
that requires specific format, but also diagnoses where comments don't
meet that format, is probably going to be easier to maintain in the long
run (fewer special cases) even if it is more painful up front (more
tweaks to bring things into a consistent layout).


>>
>>   ##
>>   # @symbol
>>   #
>>   # Symbol body ditto ergo sum. Foo bar
>>   # baz ding.
>>   #
>>   # @arg: foo
>>   # @arg: #optional foo
>>   #
>>   # Returns: returns bla bla
>>   #
>>   #  Or bla blah
>>   #
>>   # Since: version
>>   # Notes: notes, comments can have
>>   #- itemized list
>>   #- like this
>>   #
>>   #and continue...
>>   #
>>   # Example:
>>   #
>>   # <- { "execute": "quit" }
>>   # -> { "return": {} }

Again, backwards.

>>   #
>>   ##
>>
>> Thanks to the json declaration, it's able to give extra information
>> about the type of arguments and return value expected.
>>
>> Signed-off-by: Marc-André Lureau 
>> ---
>>  scripts/qapi.py| 100 +++-
>>  scripts/qapi2texi.py   | 314 
>> +
>>  docs/qapi-code-gen.txt |  44 +--
>>  3 files changed, 446 insertions(+), 12 deletions(-)
>>  create mode 100755 scripts/qapi2texi.py
> 
> My review will be easier to follow if you skip ahead qapi-code-gen.txt,
> then go to class QAPISchemaParser, then come back here.

Commenting on just a few points:

>> +if key.startswith("@"):
>> +key = key[1:].rstrip(':')
> 
> Treats the colon as optional.  I understand you're trying to be
> unintrusive, but since we're parsing comments already, we can make the
> parser do double duty and enforce consistent comment formatting.  But
> this isn't the place to discuss the format we want, review of
> qapi-code-gen.txt is.

Makes sense - since we are writing a parser, we can make it be strict
and pick one approved format, rather than allowing two separate formats;
the parser will point out what needs to be fixed, and we actually make
those fixes then rearrange the series so the fixes come first.

> 
> Taken together:
> 
> * The QAPI parser is hacked to accumulate and exfiltrate some comments
>   to the QAPIDoc parser.

I haven't closely looked at the code yet, just the review comments at
this point.  During the QAPI parse of the .json file, are we passing the
entire ##...## comment (of which there should be at most one per
top-level element of the .json file), and then letting QAPIDoc further
parse that comment block?  Or are you trying to combine both levels of
parsing into the same parser object?


> General remarks:
> 
> * I feel the only way to keep the API documentation in order as we
>   develop is to have the parser reject malformed API comments.

Agreed. Forcing docs to be well-written up front will pay benefits in
ease of maintenance and uniformity of style (it's a lot easier to
copy-and-paste a working style if that is the only style to copy from),
once we have converted existing documentation to be well-formed.

> 
> * The only way to create a robust and maintainable parser is to specify
>   the language *first* and *rigorously*.
> 
> * Stacking parsers is not a good idea.  Figuring out one parser's
>   accepted language is hard enough.

We do have the nice aspect that there is (should be) at most one ##...##
comment body per QAPI (pseudo-JSON) object, so the QAPI parse just needs
to find that comment block and associate it with everything else it
parsed.  Further parsing the documentation comment into sections for
output, as well as double-checking sanity aspects (such as no missing or
extra parameters) may require more coordination between the parsed QAPI
object and the code that parses the comments.


>> +++ b/docs/qapi-code-gen.txt
>> @@ -45,16 +45,13 @@ QAPI parser does not).  At present, there is no place 
>> where a QAPI
>>  schema requires the use of JSON numbers or null.
>>  
>>  Comments are allowed; anything between an unquoted # and the following
>> -newline is ignored.  Although there is not yet a documentation
>> 

Re: [Qemu-devel] [PATCH v2 01/11] qapi: add qapi2texi script

2016-10-28 Thread Markus Armbruster
Marc-André Lureau  writes:

> As the name suggests, the qapi2texi script converts JSON QAPI
> description into a standalone texi file suitable for different target
> formats.
>
> It parses the following kind of blocks with some little variations:
>
>   ##
>   # = Section
>   # == Subsection
>   #
>   # Some text foo with *emphasis*
>   # 1. with a list
>   # 2. like that
>   #
>   # And some code:
>   # | $ echo foo
>   # | <- do this
>   # | -> get that
>   #
>   ##
>
>   ##
>   # @symbol
>   #
>   # Symbol body ditto ergo sum. Foo bar
>   # baz ding.
>   #
>   # @arg: foo
>   # @arg: #optional foo
>   #
>   # Returns: returns bla bla
>   #
>   #  Or bla blah
>   #
>   # Since: version
>   # Notes: notes, comments can have
>   #- itemized list
>   #- like this
>   #
>   #and continue...
>   #
>   # Example:
>   #
>   # <- { "execute": "quit" }
>   # -> { "return": {} }
>   #
>   ##
>
> Thanks to the json declaration, it's able to give extra information
> about the type of arguments and return value expected.
>
> Signed-off-by: Marc-André Lureau 
> ---
>  scripts/qapi.py| 100 +++-
>  scripts/qapi2texi.py   | 314 
> +
>  docs/qapi-code-gen.txt |  44 +--
>  3 files changed, 446 insertions(+), 12 deletions(-)
>  create mode 100755 scripts/qapi2texi.py

My review will be easier to follow if you skip ahead qapi-code-gen.txt,
then go to class QAPISchemaParser, then come back here.

> diff --git a/scripts/qapi.py b/scripts/qapi.py
> index 21bc32f..4efc7e7 100644
> --- a/scripts/qapi.py
> +++ b/scripts/qapi.py
> @@ -122,6 +122,79 @@ class QAPIExprError(Exception):
>  "%s:%d: %s" % (self.info['file'], self.info['line'], self.msg)
>  
>  
> +class QAPIDoc:

Old-style class.  Make this

   class QAPIDoc(object):

> +def __init__(self, comment):
> +self.symbol = None
> +self.comment = "" # the main symbol comment

PEP8 wants at least two spaces before #, and I prefer the # in colum 32.

> +self.args = OrderedDict()
> +# meta is for Since:, Notes:, Examples:, Returns:...
> +self.meta = OrderedDict()
> +# the current section to populate, array of [dict, key, comment...]
> +self.section = None
> +
> +for line in comment.split('\n'):

Makes me wonder whether the caller should pass a list of lines instead
of a string that needs to be split into lines.

> +# remove multiple spaces
> +sline = ' '.join(line.split())

Actually, this doesn't "remove multiple spaces", it squeezes sequences
of whitespace into a single space character.  Rather inefficiently, I
suspect, but let's not worry about that now.

> +# take the first word out
> +split = sline.split(' ', 1)
> +key = split[0]
> +
> +if key.startswith("@"):
> +key = key[1:].rstrip(':')

Treats the colon as optional.  I understand you're trying to be
unintrusive, but since we're parsing comments already, we can make the
parser do double duty and enforce consistent comment formatting.  But
this isn't the place to discuss the format we want, review of
qapi-code-gen.txt is.

> +sline = split[1] if len(split) > 1 else ""
> +if self.symbol is None:
> +# the first is the section symbol
> +self.symbol = key

Let's not say "the section symbol".  It's the symbol this APIDoc object
documents.  The APIDoc object has sections.

> +else:
> +# else an arg
> +self.start_section(self.args, key)
> +elif self.symbol and \
> +key in ("Returns:",
> +# special case for Since often used without:
> +"Since:", "Since",
> +# those are often singular or plural
> +"Note:", "Notes:",
> +"Example:", "Examples:"):
> +sline = split[1] if len(split) > 1 else ""
> +line = None
> +# new "meta" section
> +self.start_section(self.meta, key.rstrip(':'))
> +
> +if self.section and self.section[1].startswith("Example"):
> +# example is verbatim
> +self.append_comment(line)
> +else:
> +self.append_comment(sline)

Sections other than "Example" and "Examples" have whitespace squeezed.
I believe this is the true reason for the squeezing.  There are a few
other places that rely on it, but they could do just as well without.

> +
> +self.end_section()
> +
> +def append_comment(self, line):
> +"""Adds a comment to the current section, or the symbol comment"""
> +if line is None:
> +return
> +if self.section is not None:
> 

[Qemu-devel] [PATCH v2 01/11] qapi: add qapi2texi script

2016-09-25 Thread Marc-André Lureau
As the name suggests, the qapi2texi script converts JSON QAPI
description into a standalone texi file suitable for different target
formats.

It parses the following kind of blocks with some little variations:

  ##
  # = Section
  # == Subsection
  #
  # Some text foo with *emphasis*
  # 1. with a list
  # 2. like that
  #
  # And some code:
  # | $ echo foo
  # | <- do this
  # | -> get that
  #
  ##

  ##
  # @symbol
  #
  # Symbol body ditto ergo sum. Foo bar
  # baz ding.
  #
  # @arg: foo
  # @arg: #optional foo
  #
  # Returns: returns bla bla
  #
  #  Or bla blah
  #
  # Since: version
  # Notes: notes, comments can have
  #- itemized list
  #- like this
  #
  #and continue...
  #
  # Example:
  #
  # <- { "execute": "quit" }
  # -> { "return": {} }
  #
  ##

Thanks to the json declaration, it's able to give extra information
about the type of arguments and return value expected.

Signed-off-by: Marc-André Lureau 
---
 scripts/qapi.py| 100 +++-
 scripts/qapi2texi.py   | 314 +
 docs/qapi-code-gen.txt |  44 +--
 3 files changed, 446 insertions(+), 12 deletions(-)
 create mode 100755 scripts/qapi2texi.py

diff --git a/scripts/qapi.py b/scripts/qapi.py
index 21bc32f..4efc7e7 100644
--- a/scripts/qapi.py
+++ b/scripts/qapi.py
@@ -122,6 +122,79 @@ class QAPIExprError(Exception):
 "%s:%d: %s" % (self.info['file'], self.info['line'], self.msg)
 
 
+class QAPIDoc:
+def __init__(self, comment):
+self.symbol = None
+self.comment = "" # the main symbol comment
+self.args = OrderedDict()
+# meta is for Since:, Notes:, Examples:, Returns:...
+self.meta = OrderedDict()
+# the current section to populate, array of [dict, key, comment...]
+self.section = None
+
+for line in comment.split('\n'):
+# remove multiple spaces
+sline = ' '.join(line.split())
+# take the first word out
+split = sline.split(' ', 1)
+key = split[0]
+
+if key.startswith("@"):
+key = key[1:].rstrip(':')
+sline = split[1] if len(split) > 1 else ""
+if self.symbol is None:
+# the first is the section symbol
+self.symbol = key
+else:
+# else an arg
+self.start_section(self.args, key)
+elif self.symbol and \
+key in ("Returns:",
+# special case for Since often used without:
+"Since:", "Since",
+# those are often singular or plural
+"Note:", "Notes:",
+"Example:", "Examples:"):
+sline = split[1] if len(split) > 1 else ""
+line = None
+# new "meta" section
+self.start_section(self.meta, key.rstrip(':'))
+
+if self.section and self.section[1].startswith("Example"):
+# example is verbatim
+self.append_comment(line)
+else:
+self.append_comment(sline)
+
+self.end_section()
+
+def append_comment(self, line):
+"""Adds a comment to the current section, or the symbol comment"""
+if line is None:
+return
+if self.section is not None:
+if self.section[-1] == "" and line == "":
+self.end_section()
+else:
+self.section.append(line)
+elif self.comment == "":
+self.comment = line
+else:
+self.comment += "\n" + line
+
+def end_section(self):
+if self.section is not None:
+dic = self.section[0]
+key = self.section[1]
+doc = "\n".join(self.section[2:])
+dic[key] = doc
+self.section = None
+
+def start_section(self, dic, key):
+self.end_section()
+self.section = [dic, key]  # .. remaining elems will be the doc
+
+
 class QAPISchemaParser(object):
 
 def __init__(self, fp, previously_included=[], incl_info=None):
@@ -137,11 +210,14 @@ class QAPISchemaParser(object):
 self.line = 1
 self.line_pos = 0
 self.exprs = []
+self.comment = None
+self.apidoc = incl_info['doc'] if incl_info else []
 self.accept()
 
 while self.tok is not None:
 expr_info = {'file': fname, 'line': self.line,
- 'parent': self.incl_info}
+ 'parent': self.incl_info, 'doc': self.apidoc}
+self.apidoc = []
 expr = self.get_expr(False)
 if isinstance(expr, dict) and "include" in expr:
 if len(expr) != 1:
@@ -162,6 +238,8 @@ class QAPISchemaParser(object):