Re: [protobuf] Spec v2 int-lit snafu?

Josh Humphries Tue, 13 Nov 2018 06:58:26 -0800

On Mon, Nov 12, 2018 at 10:30 PM Michael Powell <mwpowell...@gmail.com>
wrote:


> On Mon, Nov 12, 2018 at 12:46 PM Michael Powell <mwpowell...@gmail.com>
> wrote:
> >
> > On Mon, Nov 12, 2018 at 10:06 AM Michael Powell <mwpowell...@gmail.com>
> wrote:
> > >
> > > Hello,
> > >
> > > Another question following up, how about the sign character for hex
> > > and oct integers? Is it necessary, should it be discarded?
> > >
> > > intLit     = decimalLit | octalLit | hexLit
> > > decimalLit = ( "1" … "9" ) { decimalDigit }
> > > octalLit   = "0" { octalDigit }
> > > hexLit     = "0" ( "x" | "X" ) hexDigit { hexDigit }
> > >
> > > constant = fullIdent | ( [ "-" | "+" ] intLit ) | ( [ "-" | "+" ] floatLit ) | strLit | boolLit
> > >
> > >
> > > https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#integer_literals
> > > https://developers.google.com/protocol-buffers/docs/reference/proto2-spec#constant
> > >
> > > For instance, I am fairly certain the sign character is encoded in a
> > > hex encoded integer. Not sure about octal, but I imagine that it is
> > > fairly consistent.
>
> Got it sorted out, I believe. The parser support Spirit provides is
> actually quite nice and aligns pretty much perfectly with the grammar
> specification. There's a bit of gymnastics involved in juggling whether
> the AST has a sign or not, and so forth, but other than that it flows
> well enough.
>

If you haven't already, take a look at descriptor.proto
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto>
-- FileDescriptorProto
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L61>
therein is basically like an AST for the proto language (and is what protoc
produces as it parses). And for parsing options and the literal values in
particular, take a look at UninterpretedOption
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L701>.
Options are first parsed into this structure, and then "interpreted" into
the attributes of *Options messages in a second pass. You'll see that the
approach there includes the negation in the literal integer value but
also distinguishes between the two
<https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/descriptor.proto#L716>
in the AST.
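
A minimal sketch of that split, in Python for brevity rather than Spirit:
the sign is consumed separately, the literal is read as an unsigned
magnitude, and the result is tagged with one of two labels. The labels
mirror the positive_int_value and negative_int_value fields of
UninterpretedOption; the function names are made up for illustration, and
64-bit range checks are left out.

```python
import re

# Token shapes taken from the intLit rules quoted in this thread.
_HEX = re.compile(r"0[xX][0-9A-Fa-f]+\Z")
_OCT = re.compile(r"0[0-7]*\Z")
_DEC = re.compile(r"[1-9][0-9]*\Z")


def parse_int_lit(text):
    """Return the unsigned magnitude of an intLit (hex, octal, or decimal)."""
    if _HEX.match(text):
        return int(text[2:], 16)
    if _OCT.match(text):
        return int(text, 8)  # a lone "0" lands here, with value zero
    if _DEC.match(text):
        return int(text, 10)
    raise ValueError(f"not an intLit: {text!r}")


def parse_int_constant(text):
    """Parse [ "-" | "+" ] intLit into a (label, value) pair.

    The labels are hypothetical stand-ins for the two integer fields of
    UninterpretedOption; this is not how protoc itself is implemented.
    """
    sign = ""
    if text[:1] in ("+", "-"):
        sign, text = text[0], text[1:]
    magnitude = parse_int_lit(text)
    if sign == "-":
        return ("negative_int_value", -magnitude)
    return ("positive_int_value", magnitude)
```

Because the sign is handled before the radix is known, "-0x1F" and "-037"
go through the same path as "-31", which sidesteps the question of whether
the underlying integer parser accepts signed hex or octal input.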


>
> > > Case in point, the value 107026150751750362 gets encoded as
> > > 0X17C3BB7913C48DA (upper-case). Whereas its negative counterpart,
> > > -107026150751750362, really does get encoded as 0xFE83C4486EC3B726.
> > > Sign included, if memory serves.
> > >
> > > In these cases, I think the sign bit falls in the "optional" category?
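
The two hex values quoted above are consistent with 64-bit two's
complement, which describes how a negative value is stored rather than how
it is written in a .proto file; in the grammar the sign is a separate
token in front of the literal. A small Python check of that relationship,
with helper names invented for illustration:

```python
MASK64 = (1 << 64) - 1  # keep only the low 64 bits


def to_twos_complement_hex(value):
    """Format a signed integer as its 64-bit two's complement bit pattern."""
    return f"0x{value & MASK64:X}"


def from_twos_complement(bits):
    """Interpret a 64-bit pattern as a signed integer."""
    return bits - (1 << 64) if bits & (1 << 63) else bits


positive = to_twos_complement_hex(107026150751750362)
negative = to_twos_complement_hex(-107026150751750362)
print(positive)  # 0x17C3BB7913C48DA
print(negative)  # 0xFE83C4486EC3B726
```

Read that way, 0xFE83C4486EC3B726 as a source literal is just a large
positive magnitude; a negative value in the text would be spelled with an
explicit "-" in front of the hex digits.
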
> >
> > So... As far as I can determine, there are a couple of ways to
> > interpret this, semantically speaking. But this potentially informs
> > whatever parsing stack you are using as well.
> >
> > I'm using Boost Spirit Qi, for instance, which supports radix-based
> > integer parsing well enough, but has its own set of issues when
> > dealing with signs. That being said...
> >
> > 1. Treat the value itself as positive one way or another, with an
> > optional sign attribute (i.e. '+' or '-'). This would potentially
> > work, especially when there is base 16 (hex) or base 8 (octal)
> > involved.
> >
> > 2. Otherwise, I'm open to suggestions, given Qi's constraints: as far
> > as I know, it fails to parse negative-signed hexadecimal/octal values.
> >
> > Again, kind of a symptom of an imprecise grammar specification. I can
> > get a sense for how to handle it, but does it truly capture "intent"?
> >
> > Thanks in advance for any light that can be shed.
> >
> > > Cheers, thanks,
> > >
> > > Michael
> > > On Sun, Nov 11, 2018 at 10:56 AM Josh Humphries <jh...@bluegosling.com>
> wrote:
> > > >
> > > > For the case of zero by itself, per the spec, it will be parsed as
> > > > an octal literal with value zero -- so functionally equivalent to a
> > > > decimal literal with value zero. And for values with multiple digits,
> > > > a leading zero means it is an octal literal. Decimal values will not
> > > > have a leading zero.
> > > >
> > > > ----
> > > > Josh Humphries
> > > > jh...@bluegosling.com
> > > >
> > > >
> > > > On Sat, Nov 10, 2018 at 10:16 PM Michael Powell <
> mwpowell...@gmail.com> wrote:
> > > >>
> > > >> Hello,
> > > >>
> > > >> I think 0 can be a decimal-lit, don't you think? However, the spec
> > > >> reads as follows:
> > > >>
> > > >> intLit     = decimalLit | octalLit | hexLit
> > > >> decimalLit = ( "1" … "9" ) { decimalDigit }
> > > >> octalLit   = "0" { octalDigit }
> > > >> hexLit     = "0" ( "x" | "X" ) hexDigit { hexDigit }
> > > >>
> > > >> Is there a reason, semantically speaking, why decimal must be greater
> > > >> than 0? And that's not including a plus/minus sign when you factor in
> > > >> constants.
> > > >>
> > > >> Of course, when parsing, order matters, much as with the escape
> > > >> character phrases in the string-literal:
> > > >>
> > > >> hex-lit | oct-lit | dec-lit
> > > >>
> > > >> And so on, since you have to rule out 0x\d+ for hex, followed by
> > > >> 0\d* ...
> > > >>
> > > >> Actually, now that I look at it "0" (really, "decimal" 0) is lurking
> > > >> in the oct-lit phrase.
> > > >>
> > > >> Kind of a grammatical nit-pick, I know, but I just wanted to be clear
> > > >> here. Seems like a possible source of confusion if you aren't paying
> > > >> careful attention.
> > > >>
> > > >> Thoughts?
> > > >>
> > > >> Best regards,
> > > >>
> > > >> Michael Powell
> > > >>
> > > >> --
> > > >> You received this message because you are subscribed to the Google
> Groups "Protocol Buffers" group.
> > > >> To unsubscribe from this group and stop receiving emails from it,
> send an email to protobuf+unsubscr...@googlegroups.com.
> > > >> To post to this group, send email to protobuf@googlegroups.com.
> > > >> Visit this group at https://groups.google.com/group/protobuf.
> > > >> For more options, visit https://groups.google.com/d/optout.
