Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-25 Thread Markus Armbruster
Paolo Bonzini  writes:

> JSON is LL(1) and our parser indeed needs only 1 token lookahead.
> Saving the parser context is mostly unnecessary;

"Mit Kanonen auf Spatzen" (shooting cannons at sparrows).

>  we can replace it
> with peeking at the next token, or remove it altogether when the
> restore only happens on errors.  The token list is destroyed anyway
> on errors.
>
> The only interesting thing is that parse_keyword always eats
> a TOKEN_KEYWORD, even if it is invalid, so it must come last in
> parse_value (otherwise, NULL is returned, parse_literal is invoked
> and it tries to peek beyond end of input).  This is caught by
> /errors/unterminated/literal, which actually checks for an unterminated
> keyword. ಠ_ಠ
>
> Signed-off-by: Paolo Bonzini 
> ---
>  qobject/json-parser.c | 59 
> ++-
>  1 file changed, 21 insertions(+), 38 deletions(-)
>
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index ac991ba..7a287ea 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -296,23 +296,6 @@ static QObject 
> *parser_context_peek_token(JSONParserContext *ctxt)
>  return token;
>  }
>  
> -static JSONParserContext parser_context_save(JSONParserContext *ctxt)
> -{
> -JSONParserContext saved_ctxt = {0};
> -saved_ctxt.tokens.pos = ctxt->tokens.pos;
> -saved_ctxt.tokens.count = ctxt->tokens.count;
> -saved_ctxt.tokens.buf = ctxt->tokens.buf;
> -return saved_ctxt;
> -}
> -
> -static void parser_context_restore(JSONParserContext *ctxt,
> -   JSONParserContext saved_ctxt)
> -{
> -ctxt->tokens.pos = saved_ctxt.tokens.pos;
> -ctxt->tokens.count = saved_ctxt.tokens.count;
> -ctxt->tokens.buf = saved_ctxt.tokens.buf;
> -}
> -

This saves and restores tokens, which is an array @buf of @count tokens
with a cursor @pos.

@buf and count only ever change in parser_context_new().  Saving and
restoring them has always been pointless.

What this actually does is saving and restoring the cursor.

>  static void tokens_append_from_iter(QObject *obj, void *opaque)
>  {
>  JSONParserContext *ctxt = opaque;
> @@ -364,7 +347,6 @@ static void parser_context_free(JSONParserContext *ctxt)
>  static int parse_pair(JSONParserContext *ctxt, QDict *dict, va_list *ap)
>  {
>  QObject *key = NULL, *token = NULL, *value, *peek;
> -JSONParserContext saved_ctxt = parser_context_save(ctxt);
>  
>  peek = parser_context_peek_token(ctxt);
>  if (peek == NULL) {
> @@ -402,7 +384,6 @@ static int parse_pair(JSONParserContext *ctxt, QDict 
> *dict, va_list *ap)
>  return 0;
>  
>  out:
> -parser_context_restore(ctxt, saved_ctxt);
>  qobject_decref(key);
>  
>  return -1;

Before your patch: on error, we rewind the cursor to where it was on
entry.  Useful in a backtracking parser, but this shouldn't be one.

Now: we leave it wherever we error out.

Fine, because the sole caller parse_object() won't actually use it on
error.

> @@ -412,9 +393,8 @@ static QObject *parse_object(JSONParserContext *ctxt, 
> va_list *ap)
>  {
>  QDict *dict = NULL;
>  QObject *token, *peek;
> -JSONParserContext saved_ctxt = parser_context_save(ctxt);
>  
> -token = parser_context_pop_token(ctxt);
> +token = parser_context_peek_token(ctxt);
>  if (token == NULL) {
>  goto out;
>  }
> @@ -425,6 +405,7 @@ static QObject *parse_object(JSONParserContext *ctxt, 
> va_list *ap)
>  
>  dict = qdict_new();
>  
> +parser_context_pop_token(ctxt);
>  peek = parser_context_peek_token(ctxt);
>  if (peek == NULL) {
>  parse_error(ctxt, NULL, "premature EOI");
> @@ -465,7 +446,6 @@ static QObject *parse_object(JSONParserContext *ctxt, 
> va_list *ap)
>  return QOBJECT(dict);
>  
>  out:
> -parser_context_restore(ctxt, saved_ctxt);
>  QDECREF(dict);
>  return NULL;
>  }

I'm not 100% when exactly parser_context_peek_token() returns null, but
I think can see what your patch does anyway.

Before: if we can parse an object, advance cursor behind the object and
return it, else leave the cursor alone and return null.  The sole caller
parse_value() relies on this behavior to try alternatives until one
succeeds.

After: if we can parse an object, same as before, else advance the
cursor to right before the offending token and return null.

Consider QMP input { 1 [ 2 ] }.

json_parser_parse_err():
parse_value() @ pos=0:
triy parse_object() @ pos=0:
consume '{'
parse_pair() @ pos=1:
parse_value() @ pos=1:
try ..., consume 1, now pos=2
fail with "key is not a string in object"
propagate failure
try parse_array() @ pos=2:
consume [ 2 ], return the QList, now pos=5
return the QList

Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Gerd Hoffmann
  Hi,

> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
> anywhere in patches, except maybe the notes section?)

Sure.  We don't want limit people names (in signed-off etc) to us-ascii.

See "eb8934b vnc: fix memory corruption (CVE-2015-5225)" for a name
written in kanji.

> I'd recommend o_O.

Heh, it's 2015, not 1995 ...

cheers,
  Gerd





Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Laszlo Ersek
On 11/24/15 09:03, Gerd Hoffmann wrote:
>   Hi,
> 
>> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
>> anywhere in patches, except maybe the notes section?)
> 
> Sure.  We don't want limit people names (in signed-off etc) to us-ascii.
> 
> See "eb8934b vnc: fix memory corruption (CVE-2015-5225)" for a name
> written in kanji.

I'm very sorry, but this is something I expressly disagree with. (I
didn't want to bring this up on my own, but since you did...)

International engineering / science / research etc. are being done in
English. We use English because we expect people to learn one common
language (native English speakers have it easy, but that's a side
point), so that everyone not have to learn every possible language.

The same point applies *much more* to writing systems / alphabets. You
(the generic you) can't expect me (the generic me) to read Kanji,
Sanskrit, Thai script, Cyrillic script, and so on, even if your name is
written in that language natively. You come up with an approximation in
Latin script, and use that.

Is your purpose to feel pleased about the faithful representation of
your name in the commit message (that the international community is
unable to read, not even approximately), or is your goal to allow the
community to read your (approximate) name?

I bet everyone who is involved in international development, and travels
occasionally, has business cards in Latin script *too*. I bet whoever
does research and publishes papers in English puts their name (or at
least an official approximation of it) on the front page in  Latin
script *too*.

Specifically about the commit you mention, the email of the reporter is:

  zuozhi@alibaba-inc.com

I'm absolutely sure that "zuozhi" is the official Pinyin transliteration
of the reporter's name (or a part of it). Now, while Pinyin has its own
separate pronunciation rules, I *can* (and occasionally do) look up
those rules. So Pinyin allows me to *work* with the name with relative
safety, and it even gives me a fleeting chance at getting the
pronunciation right, should we meet.

My name is László Érsek. I've dropped the accents for the purpose of
international exchange in advance; I just write Laszlo Ersek, even when
I sign physical documents that are in English.

Even that way, I've seen the larger community abuse my name endlessly;
in particular I've seen all permutations (= reordering) and variations
(= missing characters) of the substring "szl" in "Laszlo".

If the larger community fails to get such a simple ASCII name right --
and yes I'm at fault too, I have occasionally left off the second "n" of
your last name --, then why am I (or anyone else) expected to struggle
with names written in non-Latin script? They are much harder, and have
exactly zero value, as far as international collaboration is concerned.

The development is being done in English, and the script of English is
Latin. Stick with it.

>> I'd recommend o_O.
> 
> Heh, it's 2015, not 1995 ...

Sure, and the Internet Standards are still being written in pure ASCII.

https://en.wikipedia.org/wiki/Request_for_Comments#Obtaining_RFCs

Laszlo




Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Gerd Hoffmann
  Hi,

> The same point applies *much more* to writing systems / alphabets. You
> (the generic you) can't expect me (the generic me) to read Kanji,
> Sanskrit, Thai script, Cyrillic script, and so on, even if your name is
> written in that language natively. You come up with an approximation in
> Latin script, and use that.

> Is your purpose to feel pleased about the faithful representation of
> your name in the commit message (that the international community is
> unable to read, not even approximately), or is your goal to allow the
> community to read your (approximate) name?

IMO that is for the people in question to decide.  Usually I just cut
+paste (or let stefans great patches script collect) what they are using
them-self and are apparently comfortable with.

Having Kanji only as in that specific case is unusual indeed, in most
cases there is latin transcript, either in place of or additionally to
the native version, like this: "latin (native) ".  The latter
looks a bit silly for latin-family of alphabets, for others
(kanji/cyrillic/...) it makes more sense and seems to be more common.

But even when they use native only I still think their name and not only
the email address should appear in the commit message when giving them
the credit they deserve.

cheers,
  Gerd





Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Paolo Bonzini


On 24/11/2015 11:50, Laszlo Ersek wrote:
> You
> (the generic you) can't expect me (the generic me) to read Kanji,
> Sanskrit, Thai script, Cyrillic script, and so on, even if your name is
> written in that language natively. You come up with an approximation in
> Latin script, and use that.
> 
> Is your purpose to feel pleased about the faithful representation of
> your name in the commit message (that the international community is
> unable to read, not even approximately), or is your goal to allow the
> community to read your (approximate) name?
> 
> Specifically about the commit you mention, the email of the reporter is:
> 
>   zuozhi@alibaba-inc.com
> 
> I'm absolutely sure that "zuozhi" is the official Pinyin transliteration
> of the reporter's name (or a part of it). Now, while Pinyin has its own
> separate pronunciation rules, I *can* (and occasionally do) look up
> those rules. So Pinyin allows me to *work* with the name with relative
> safety, and it even gives me a fleeting chance at getting the
> pronunciation right, should we meet.

I think this is getting into dangerous territory. :)

Chinese language and names are very different from ours and it's
possible that a person for some reason is very attached to the
particular hanzi (kanji is Japanese :)) that are used for their name.
(Fam, correct me if I'm wrong).

I've also seen people who have adopted an alternative English name and
still want to include the Chinese name somewhere, at which point it's
understandable that they write the latter with hanzi.

Certainly it's better if you get something like these:

   Signed-off-by: Gong Li (巩俐) 
   Signed-off-by: 巩俐 (Gong Li) 

but if the email is understandable I have no problem with

   Signed-off-by: 巩俐 

or in the case of an English name any of

   Signed-off-by: Jane Li (巩俐) 
   Signed-off-by: 巩俐 (Jane Li) 
   Signed-off-by: 巩俐 (Jane Li) 

where in the last case the email doesn't give the full Chinese name, but
there is an alternative for people who cannot read Chinese characters.

For some reason this almost never happens with Japanese developers, but
https://en.wikipedia.org/wiki/Japanese_name#Difficulty_of_reading_names
suggests that they may also appreciate using kanji for their names in
commit messages.

Thanks,

Paolo



Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Laszlo Ersek
On 11/24/15 11:50, Laszlo Ersek wrote:

> My name is László Érsek.

OMG, my native name is actually "Érsek László". 95% of my written
communication is in English, and this very discussion is occurring in
English. So in this context I forgot that in Hungarian the family name
goes first.

(Just goes on to show that I'm willing to mangle my name for
collaboration's sake. Others should be too.)




Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Paolo Bonzini


On 24/11/2015 13:44, Fam Zheng wrote:
> All in all, in my opinion, only something like
> 
>  Signed-off-by: 巩俐 
> 
> , from which you can't easily infer the pronunciation, is apparently
> inconsiderate in an English context. Otherwise I think we should tolerate the
> usage of non-latin characters even if it means we have to copy

FWIW, I totally agree.

Paolo



Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Fam Zheng
On Tue, 11/24 12:18, Paolo Bonzini wrote:
> 
> 
> On 24/11/2015 11:50, Laszlo Ersek wrote:
> > You
> > (the generic you) can't expect me (the generic me) to read Kanji,
> > Sanskrit, Thai script, Cyrillic script, and so on, even if your name is
> > written in that language natively. You come up with an approximation in
> > Latin script, and use that.
> > 
> > Is your purpose to feel pleased about the faithful representation of
> > your name in the commit message (that the international community is
> > unable to read, not even approximately), or is your goal to allow the
> > community to read your (approximate) name?
> > 
> > Specifically about the commit you mention, the email of the reporter is:
> > 
> >   zuozhi@alibaba-inc.com
> > 
> > I'm absolutely sure that "zuozhi" is the official Pinyin transliteration
> > of the reporter's name (or a part of it). Now, while Pinyin has its own

Yes, it's the Pinyin of 祚至, his given name. It's a peculiar and literary
choice of name. :)

> > separate pronunciation rules, I *can* (and occasionally do) look up
> > those rules. So Pinyin allows me to *work* with the name with relative
> > safety, and it even gives me a fleeting chance at getting the
> > pronunciation right, should we meet.
> 
> I think this is getting into dangerous territory. :)
> 
> Chinese language and names are very different from ours and it's
> possible that a person for some reason is very attached to the
> particular hanzi (kanji is Japanese :)) that are used for their name.
> (Fam, correct me if I'm wrong).
> 
> I've also seen people who have adopted an alternative English name and
> still want to include the Chinese name somewhere, at which point it's
> understandable that they write the latter with hanzi.
> 
> Certainly it's better if you get something like these:
> 
>Signed-off-by: Gong Li (巩俐) 
>Signed-off-by: 巩俐 (Gong Li) 
> 
> but if the email is understandable I have no problem with
> 
>Signed-off-by: 巩俐 
> 
> or in the case of an English name any of
> 
>Signed-off-by: Jane Li (巩俐) 
>Signed-off-by: 巩俐 (Jane Li) 
>Signed-off-by: 巩俐 (Jane Li) 
> 
> where in the last case the email doesn't give the full Chinese name, but
> there is an alternative for people who cannot read Chinese characters.
> 
> For some reason this almost never happens with Japanese developers, but
> https://en.wikipedia.org/wiki/Japanese_name#Difficulty_of_reading_names
> suggests that they may also appreciate using kanji for their names in
> commit messages.

The problem for using Pinyin is, even I as a Chinese can't tell which is the
last name out of a two-syllable name such as "Gong Li", because of obvious
reasons.

Having hanzi together with Pinyin helps a lot, in that it tells me how to call
it, should we meet. Not just in "which is the family name", but also which
tone to use. In Chinese, you're not calling one's name if not using the right
tone. For example "zuozhi" can be "左之" or "祚至" or one of other 100
combinations, which vary in tones. Personally I don't feel very comfortable
when a Mandarin speaker calls me with a wrong tone. And I know that will happen
from time to time if I stick to Pinyin rather than Fam.  Thinking from the
other way round, that may be a reason why they use hanzi.

All in all, in my opinion, only something like

 Signed-off-by: 巩俐 

, from which you can't easily infer the pronunciation, is apparently
inconsiderate in an English context. Otherwise I think we should tolerate the
usage of non-latin characters even if it means we have to copy

Fam



Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-24 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 24/11/2015 13:44, Fam Zheng wrote:
>> All in all, in my opinion, only something like
>> 
>>  Signed-off-by: 巩俐 
>> 
>> , from which you can't easily infer the pronunciation, is apparently
>> inconsiderate in an English context. Otherwise I think we should tolerate the
>> usage of non-latin characters even if it means we have to copy
>
> FWIW, I totally agree.

Me too.



Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-23 Thread Laszlo Ersek
On 11/23/15 19:09, Paolo Bonzini wrote:
> 
> 
> On 23/11/2015 18:59, Laszlo Ersek wrote:
 The only interesting thing is that parse_keyword always eats
 a TOKEN_KEYWORD, even if it is invalid, so it must come last in
 parse_value (otherwise, NULL is returned, parse_literal is invoked
 and it tries to peek beyond end of input).  This is caught by
 /errors/unterminated/literal, which actually checks for an unterminated
 keyword. ಠ_ಠ
>> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
>> anywhere in patches, except maybe the notes section?)
>>
>> I'd recommend o_O.
> 
> Perhaps not Kannada, but Unicode 0xA1 to 0x1FF is pretty common in
> commit messages.

That seems to be a subset of the union of the following three:

https://en.wikipedia.org/wiki/Latin-1_Supplement_%28Unicode_block%29
https://en.wikipedia.org/wiki/Latin_Extended-A
https://en.wikipedia.org/wiki/Latin_Extended-B

Thanks for the info,
Laszlo



Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-23 Thread Paolo Bonzini


On 23/11/2015 18:59, Laszlo Ersek wrote:
>> > The only interesting thing is that parse_keyword always eats
>> > a TOKEN_KEYWORD, even if it is invalid, so it must come last in
>> > parse_value (otherwise, NULL is returned, parse_literal is invoked
>> > and it tries to peek beyond end of input).  This is caught by
>> > /errors/unterminated/literal, which actually checks for an unterminated
>> > keyword. ಠ_ಠ
> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
> anywhere in patches, except maybe the notes section?)
> 
> I'd recommend o_O.

Perhaps not Kannada, but Unicode 0xA1 to 0x1FF is pretty common in
commit messages.

Paolo



Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-23 Thread Eric Blake
On 11/23/2015 10:59 AM, Laszlo Ersek wrote:
> On 11/23/15 18:44, Paolo Bonzini wrote:
>> JSON is LL(1) and our parser indeed needs only 1 token lookahead.
>> Saving the parser context is mostly unnecessary; we can replace it
>> with peeking at the next token, or remove it altogether when the
>> restore only happens on errors.  The token list is destroyed anyway
>> on errors.
>>
>> The only interesting thing is that parse_keyword always eats
>> a TOKEN_KEYWORD, even if it is invalid, so it must come last in
>> parse_value (otherwise, NULL is returned, parse_literal is invoked
>> and it tries to peek beyond end of input).  This is caught by
>> /errors/unterminated/literal, which actually checks for an unterminated
>> keyword. ಠ_ಠ
> 
> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
> anywhere in patches, except maybe the notes section?)
> 

Git handles UTF-8 just fine (and for any other encoding, properly
transmitted in the email, git transcodes to UTF-8 before writing it into
the repository).

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-23 Thread Laszlo Ersek
On 11/23/15 21:05, Eric Blake wrote:
> On 11/23/2015 10:59 AM, Laszlo Ersek wrote:
>> On 11/23/15 18:44, Paolo Bonzini wrote:
>>> JSON is LL(1) and our parser indeed needs only 1 token lookahead.
>>> Saving the parser context is mostly unnecessary; we can replace it
>>> with peeking at the next token, or remove it altogether when the
>>> restore only happens on errors.  The token list is destroyed anyway
>>> on errors.
>>>
>>> The only interesting thing is that parse_keyword always eats
>>> a TOKEN_KEYWORD, even if it is invalid, so it must come last in
>>> parse_value (otherwise, NULL is returned, parse_literal is invoked
>>> and it tries to peek beyond end of input).  This is caught by
>>> /errors/unterminated/literal, which actually checks for an unterminated
>>> keyword. ಠ_ಠ
>>
>> Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
>> anywhere in patches, except maybe the notes section?)
>>
> 
> Git handles UTF-8 just fine (and for any other encoding, properly
> transmitted in the email, git transcodes to UTF-8 before writing it into
> the repository).
> 

Yes, I know. I use latin2:

$ locale

LANG=
LC_CTYPE=hu_HU.ISO8859-2
LC_NUMERIC="POSIX"
LC_TIME="POSIX"
LC_COLLATE="POSIX"
LC_MONETARY="POSIX"
LC_MESSAGES="POSIX"
LC_PAPER="POSIX"
LC_NAME="POSIX"
LC_ADDRESS="POSIX"
LC_TELEPHONE="POSIX"
LC_MEASUREMENT="POSIX"
LC_IDENTIFICATION="POSIX"
LC_ALL=

and from my git config:

[i18n]
logOutputEncoding = latin2
commitencoding = latin2

This works very well -- as long as it doesn't choke on something outside
of latin2 --, both the glibc locale support and git are doing their jobs
perfectly fine; my question concerned any other users who decided to
stay with single-byte encodings (with an ASCII subset).

(I believe that RFCs stick with ASCII to this day, and I also think that
our source code and docs/ should stick with ASCII; but I know I can't
plausibly argue for the same in commit messages, assuming I'm alone with
that anyway.

BTW I should have written "non-ASCII Unicode code points" in my original
question, rather than "UTF-8".)

Thanks!
Laszlo



[Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-23 Thread Paolo Bonzini
JSON is LL(1) and our parser indeed needs only 1 token lookahead.
Saving the parser context is mostly unnecessary; we can replace it
with peeking at the next token, or remove it altogether when the
restore only happens on errors.  The token list is destroyed anyway
on errors.

The only interesting thing is that parse_keyword always eats
a TOKEN_KEYWORD, even if it is invalid, so it must come last in
parse_value (otherwise, NULL is returned, parse_literal is invoked
and it tries to peek beyond end of input).  This is caught by
/errors/unterminated/literal, which actually checks for an unterminated
keyword. ಠ_ಠ

Signed-off-by: Paolo Bonzini 
---
 qobject/json-parser.c | 59 ++-
 1 file changed, 21 insertions(+), 38 deletions(-)

diff --git a/qobject/json-parser.c b/qobject/json-parser.c
index ac991ba..7a287ea 100644
--- a/qobject/json-parser.c
+++ b/qobject/json-parser.c
@@ -296,23 +296,6 @@ static QObject 
*parser_context_peek_token(JSONParserContext *ctxt)
 return token;
 }
 
-static JSONParserContext parser_context_save(JSONParserContext *ctxt)
-{
-JSONParserContext saved_ctxt = {0};
-saved_ctxt.tokens.pos = ctxt->tokens.pos;
-saved_ctxt.tokens.count = ctxt->tokens.count;
-saved_ctxt.tokens.buf = ctxt->tokens.buf;
-return saved_ctxt;
-}
-
-static void parser_context_restore(JSONParserContext *ctxt,
-   JSONParserContext saved_ctxt)
-{
-ctxt->tokens.pos = saved_ctxt.tokens.pos;
-ctxt->tokens.count = saved_ctxt.tokens.count;
-ctxt->tokens.buf = saved_ctxt.tokens.buf;
-}
-
 static void tokens_append_from_iter(QObject *obj, void *opaque)
 {
 JSONParserContext *ctxt = opaque;
@@ -364,7 +347,6 @@ static void parser_context_free(JSONParserContext *ctxt)
 static int parse_pair(JSONParserContext *ctxt, QDict *dict, va_list *ap)
 {
 QObject *key = NULL, *token = NULL, *value, *peek;
-JSONParserContext saved_ctxt = parser_context_save(ctxt);
 
 peek = parser_context_peek_token(ctxt);
 if (peek == NULL) {
@@ -402,7 +384,6 @@ static int parse_pair(JSONParserContext *ctxt, QDict *dict, 
va_list *ap)
 return 0;
 
 out:
-parser_context_restore(ctxt, saved_ctxt);
 qobject_decref(key);
 
 return -1;
@@ -412,9 +393,8 @@ static QObject *parse_object(JSONParserContext *ctxt, 
va_list *ap)
 {
 QDict *dict = NULL;
 QObject *token, *peek;
-JSONParserContext saved_ctxt = parser_context_save(ctxt);
 
-token = parser_context_pop_token(ctxt);
+token = parser_context_peek_token(ctxt);
 if (token == NULL) {
 goto out;
 }
@@ -425,6 +405,7 @@ static QObject *parse_object(JSONParserContext *ctxt, 
va_list *ap)
 
 dict = qdict_new();
 
+parser_context_pop_token(ctxt);
 peek = parser_context_peek_token(ctxt);
 if (peek == NULL) {
 parse_error(ctxt, NULL, "premature EOI");
@@ -465,7 +446,6 @@ static QObject *parse_object(JSONParserContext *ctxt, 
va_list *ap)
 return QOBJECT(dict);
 
 out:
-parser_context_restore(ctxt, saved_ctxt);
 QDECREF(dict);
 return NULL;
 }
@@ -474,9 +454,8 @@ static QObject *parse_array(JSONParserContext *ctxt, 
va_list *ap)
 {
 QList *list = NULL;
 QObject *token, *peek;
-JSONParserContext saved_ctxt = parser_context_save(ctxt);
 
-token = parser_context_pop_token(ctxt);
+token = parser_context_peek_token(ctxt);
 if (token == NULL) {
 goto out;
 }
@@ -487,6 +466,7 @@ static QObject *parse_array(JSONParserContext *ctxt, 
va_list *ap)
 
 list = qlist_new();
 
+parser_context_pop_token(ctxt);
 peek = parser_context_peek_token(ctxt);
 if (peek == NULL) {
 parse_error(ctxt, NULL, "premature EOI");
@@ -537,7 +517,6 @@ static QObject *parse_array(JSONParserContext *ctxt, 
va_list *ap)
 return QOBJECT(list);
 
 out:
-parser_context_restore(ctxt, saved_ctxt);
 QDECREF(list);
 return NULL;
 }
@@ -545,9 +524,8 @@ out:
 static QObject *parse_keyword(JSONParserContext *ctxt)
 {
 QObject *token, *ret;
-JSONParserContext saved_ctxt = parser_context_save(ctxt);
 
-token = parser_context_pop_token(ctxt);
+token = parser_context_peek_token(ctxt);
 if (token == NULL) {
 goto out;
 }
@@ -556,6 +534,7 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
 goto out;
 }
 
+parser_context_pop_token(ctxt);
 if (token_is_keyword(token, "true")) {
 ret = QOBJECT(qbool_from_bool(true));
 } else if (token_is_keyword(token, "false")) {
@@ -570,7 +549,6 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
 return ret;
 
 out: 
-parser_context_restore(ctxt, saved_ctxt);
 
 return NULL;
 }
@@ -578,17 +556,21 @@ out:
 static QObject *parse_escape(JSONParserContext *ctxt, va_list *ap)
 {
 QObject *token = NULL, *obj;
-JSONParserContext saved_ctxt = parser_context_save(ctxt);
 
 if (ap == NULL) {
 goto out;
 }
 
-token = 

Re: [Qemu-devel] [PATCH v2 2/4] qjson: do not save/restore contexts

2015-11-23 Thread Laszlo Ersek
On 11/23/15 18:44, Paolo Bonzini wrote:
> JSON is LL(1) and our parser indeed needs only 1 token lookahead.
> Saving the parser context is mostly unnecessary; we can replace it
> with peeking at the next token, or remove it altogether when the
> restore only happens on errors.  The token list is destroyed anyway
> on errors.
> 
> The only interesting thing is that parse_keyword always eats
> a TOKEN_KEYWORD, even if it is invalid, so it must come last in
> parse_value (otherwise, NULL is returned, parse_literal is invoked
> and it tries to peek beyond end of input).  This is caught by
> /errors/unterminated/literal, which actually checks for an unterminated
> keyword. ಠ_ಠ

Is it accepted practice to put UTF-8 in commit messages? (Or, actually,
anywhere in patches, except maybe the notes section?)

I'd recommend o_O.

Thanks
Laszlo

> 
> Signed-off-by: Paolo Bonzini 
> ---
>  qobject/json-parser.c | 59 
> ++-
>  1 file changed, 21 insertions(+), 38 deletions(-)
> 
> diff --git a/qobject/json-parser.c b/qobject/json-parser.c
> index ac991ba..7a287ea 100644
> --- a/qobject/json-parser.c
> +++ b/qobject/json-parser.c
> @@ -296,23 +296,6 @@ static QObject 
> *parser_context_peek_token(JSONParserContext *ctxt)
>  return token;
>  }
>  
> -static JSONParserContext parser_context_save(JSONParserContext *ctxt)
> -{
> -JSONParserContext saved_ctxt = {0};
> -saved_ctxt.tokens.pos = ctxt->tokens.pos;
> -saved_ctxt.tokens.count = ctxt->tokens.count;
> -saved_ctxt.tokens.buf = ctxt->tokens.buf;
> -return saved_ctxt;
> -}
> -
> -static void parser_context_restore(JSONParserContext *ctxt,
> -   JSONParserContext saved_ctxt)
> -{
> -ctxt->tokens.pos = saved_ctxt.tokens.pos;
> -ctxt->tokens.count = saved_ctxt.tokens.count;
> -ctxt->tokens.buf = saved_ctxt.tokens.buf;
> -}
> -
>  static void tokens_append_from_iter(QObject *obj, void *opaque)
>  {
>  JSONParserContext *ctxt = opaque;
> @@ -364,7 +347,6 @@ static void parser_context_free(JSONParserContext *ctxt)
>  static int parse_pair(JSONParserContext *ctxt, QDict *dict, va_list *ap)
>  {
>  QObject *key = NULL, *token = NULL, *value, *peek;
> -JSONParserContext saved_ctxt = parser_context_save(ctxt);
>  
>  peek = parser_context_peek_token(ctxt);
>  if (peek == NULL) {
> @@ -402,7 +384,6 @@ static int parse_pair(JSONParserContext *ctxt, QDict 
> *dict, va_list *ap)
>  return 0;
>  
>  out:
> -parser_context_restore(ctxt, saved_ctxt);
>  qobject_decref(key);
>  
>  return -1;
> @@ -412,9 +393,8 @@ static QObject *parse_object(JSONParserContext *ctxt, 
> va_list *ap)
>  {
>  QDict *dict = NULL;
>  QObject *token, *peek;
> -JSONParserContext saved_ctxt = parser_context_save(ctxt);
>  
> -token = parser_context_pop_token(ctxt);
> +token = parser_context_peek_token(ctxt);
>  if (token == NULL) {
>  goto out;
>  }
> @@ -425,6 +405,7 @@ static QObject *parse_object(JSONParserContext *ctxt, 
> va_list *ap)
>  
>  dict = qdict_new();
>  
> +parser_context_pop_token(ctxt);
>  peek = parser_context_peek_token(ctxt);
>  if (peek == NULL) {
>  parse_error(ctxt, NULL, "premature EOI");
> @@ -465,7 +446,6 @@ static QObject *parse_object(JSONParserContext *ctxt, 
> va_list *ap)
>  return QOBJECT(dict);
>  
>  out:
> -parser_context_restore(ctxt, saved_ctxt);
>  QDECREF(dict);
>  return NULL;
>  }
> @@ -474,9 +454,8 @@ static QObject *parse_array(JSONParserContext *ctxt, 
> va_list *ap)
>  {
>  QList *list = NULL;
>  QObject *token, *peek;
> -JSONParserContext saved_ctxt = parser_context_save(ctxt);
>  
> -token = parser_context_pop_token(ctxt);
> +token = parser_context_peek_token(ctxt);
>  if (token == NULL) {
>  goto out;
>  }
> @@ -487,6 +466,7 @@ static QObject *parse_array(JSONParserContext *ctxt, 
> va_list *ap)
>  
>  list = qlist_new();
>  
> +parser_context_pop_token(ctxt);
>  peek = parser_context_peek_token(ctxt);
>  if (peek == NULL) {
>  parse_error(ctxt, NULL, "premature EOI");
> @@ -537,7 +517,6 @@ static QObject *parse_array(JSONParserContext *ctxt, 
> va_list *ap)
>  return QOBJECT(list);
>  
>  out:
> -parser_context_restore(ctxt, saved_ctxt);
>  QDECREF(list);
>  return NULL;
>  }
> @@ -545,9 +524,8 @@ out:
>  static QObject *parse_keyword(JSONParserContext *ctxt)
>  {
>  QObject *token, *ret;
> -JSONParserContext saved_ctxt = parser_context_save(ctxt);
>  
> -token = parser_context_pop_token(ctxt);
> +token = parser_context_peek_token(ctxt);
>  if (token == NULL) {
>  goto out;
>  }
> @@ -556,6 +534,7 @@ static QObject *parse_keyword(JSONParserContext *ctxt)
>  goto out;
>  }
>  
> +parser_context_pop_token(ctxt);
>  if (token_is_keyword(token, "true")) {
>  ret =