Re: Dynamic token kinds

2018-12-23 Thread Akim Demaille



> On 23 Dec 2018, at 12:46, Frank Heckenbach wrote:
> 
> Akim Demaille wrote:
> 
>>> but
>>> unfortunately git is quite data hungry and I'm a bit short on data
>>> volume (no broadband in my new apartment yet, and mobile data is
>>> paid in gold here in Germany),
>> 
>> Ouch.  We're lucky in France, that's very cheap.
> 
> I'm lucky to have a relatively fast connection at all. There are
> areas where RFC 1149 is the preferred transport layer.

:) :) :)


>> Please find the tarball here:
>> 
>> https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.gz
>> https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.xz
> 
> Thanks. I've tried it and it works with my parsers.

Great!


>> The best way would be to provide tarballs directly from my
>> GitHub repo.  After all, it's already running travis for the
>> tests, it can easily build tarballs.  However, I don't know
>> how to expose them, I don't think there is a free
>> (as in beer) way to do that.  If you have an idea, I'll take it!
> 
> I don't know too much about GitHub. As I said, I had hoped "Download
> Zip" would do it, but that's not self-contained.

I guess it's only an archive of the current files in the
repo.  So it's probably carrying nothing about gnulib.

[gnulib]
> I never liked that
> design, and apparently it's now causing me actual problems by
> breaking the simple download and vastly increasing the clone size
> (AFAICS, most of it is downloading the gnulib history which I don't
> care a tiny bit about, especially not when just trying to build
> Bison).

I can understand this.  However, this is the only model I can
see that really allows the maintainer of a package to be able
to use the latest version of the library, in case it addresses
a recent portability issue.  It's not too inconvenient to use
as a user of git, and it's completely invisible for the end
user (i.e., users of the tarballs).


> Perhaps what I can do (as a slight kludge) for future patches is to
[...]

The easiest, by far, would be for me to provide you with a
viable tarball :(




Re: Dynamic token kinds

2018-12-23 Thread Akim Demaille
Hi Frank!

> On 22 Dec 2018, at 21:12, Frank Heckenbach wrote:
> 
> Sorry, now I seem to be the one to have trouble using your patch:
> 
> - I guess the right thing is to clone the repository,

Yes, it is.

> but
>  unfortunately git is quite data hungry and I'm a bit short on data
>  volume (no broadband in my new apartment yet, and mobile data is
>  paid in gold here in Germany),

Ouch.  We're lucky in France, that's very cheap.

> so I could only do it on a server
>  on which I have access (in the hope of running "make dist" or so
>  to get it to my machine then), but after downloading >100 MB (like
>  I feared), bootstrap failed with lots of warnings and errors (the
>  server is still running jessie, which I can't change, guess that's
>  too old to bootstrap bison).

I don't think so, but I don't actually know.

> I suppose I could ask you to send me a "dist" archive for testing
> now, or if only "data" files need changes compared to 3.2.2, post
> those, but for testing future changes, that's not a very nice
> workflow. Is there any better way?

Please find the tarball here:

https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.xz

The best way would be to provide tarballs directly from my
GitHub repo.  After all, it's already running travis for the
tests, it can easily build tarballs.  However, I don't know
how to expose them, I don't think there is a free
(as in beer) way to do that.  If you have an idea, I'll take it!





Re: Dynamic token kinds

2018-12-22 Thread Frank Heckenbach
Akim Demaille wrote:

> > To actually allow this, you could have the typed constructors all
> > accept the typeless tokens as well, but I don't consider that really
> > necessary. Unless you want to support that for backward (bugward?)
> > compatibility, I'll just change my code to make two separate
> > make_symbol calls.
> 
> Yes, I prefer it this way.  The whole point of my work on C++'s
> symbols so far is really to be type safe(r).

OK, I've changed my code accordingly.

> See below, I have a working draft that completely replaces
> make_symbol by "merging" the assert-based type checking into
> the symbol_type constructors.  Since that makes the ctors safe,
> I'm fine with exposing them.

Sorry, now I seem to be the one to have trouble using your patch:

- It doesn't seem to be against 3.2.2, so I can't directly use
  "patch".

- I tried to manually apply it (like I did yesterday), but failed
  (maybe I made some mistake this time, or it requires other
  changes, but I don't think debugging this is worthwhile).

- I tried to get just c++.m4 and variant.hh from the branch and copy
  them over my installation, but that also failed.

- Then I tried "Download ZIP" to get the whole branch, but then
  running ./bootstrap gave these errors which I don't quite get
  (I thought not requiring a git repository is the point of this
  download function):

  ./bootstrap: Bootstrapping from checked-out bison sources...
  ./bootstrap: getting gnulib files...
  fatal: Not a git repository (or any parent up to mount point /tmp)
  Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

- I guess the right thing is to clone the repository, but
  unfortunately git is quite data hungry and I'm a bit short on data
  volume (no broadband in my new apartment yet, and mobile data is
  paid in gold here in Germany), so I could only do it on a server
  on which I have access (in the hope of running "make dist" or so
  to get it to my machine then), but after downloading >100 MB (like
  I feared), bootstrap failed with lots of warnings and errors (the
  server is still running jessie, which I can't change, guess that's
  too old to bootstrap bison).

I suppose I could ask you to send me a "dist" archive for testing
now, or if only "data" files need changes compared to 3.2.2, post
those, but for testing future changes, that's not a very nice
workflow. Is there any better way?

Regards,
Frank



Re: Dynamic token kinds

2018-12-22 Thread Akim Demaille
Hi Frank!

> On 22 Dec 2018, at 01:14, Frank Heckenbach wrote:
> 
> Akim Demaille wrote:
> 
>> I like this idea.  I have a draft for it in my repo, as "make-symbol".
>> Please, try it and report about it.
> 
> Again, sorry for the delay (still busy), but now I tried it
> (removing the "b4_parse_assert_if", see below).

Thanks for spending time on this.  I really need feedback  :)


> It seems to work for me. The only issue I had was due to sloppiness
> on my side. I'm only mentioning it in case others do the same.
> Basically, I had stored tokens of one specific semantic type in a
> look-up table together with tokens without semantic type (storing a
> dummy value in the table for the latter), and constructed the tokens
> for both in the same branch, exploiting the only case where a
> mismatch is inconsequential, i.e. setting a value and not using it.
> This worked before, but the stricter checks now (correctly) caught
> it.

Great news!   It works!


> To actually allow this, you could have the typed constructors all
> accept the typeless tokens as well, but I don't consider that really
> necessary. Unless you want to support that for backward (bugward?)
> compatibility, I'll just change my code to make two separate
> make_symbol calls.

Yes, I prefer it this way.  The whole point of my work on C++'s
symbols so far is really to be type safe(r).


>> There are a few issues:
>> - make_symbol will collide if the user has a token named symbol
>>  Any idea of a better name?  
> 
> To avoid such collisions, I think we have to avoid the "make_"
> prefix entirely. Maybe "build_symbol"?
> 
>> Or simply make them actual constructors for `symbol_type`.
> 
> Yes, if they are (documented as) public. I think I'd prefer this as
> I wouldn't have to change my code from 3.2.2.

See below, I have a working draft that completely replaces
make_symbol by "merging" the assert-based type checking into
the symbol_type constructors.  Since that makes the ctors safe,
I'm fine with exposing them.

I wish it required fewer changes.  In particular,
it tears symbol_type and stack_symbol_type even further
apart.  My CRTP might no longer fully make sense; maybe I'll
get rid of it at some point.


>> - In the signature of make_symbol, I've had to use int for the
>>  token type, instead of the enum token_type, and then
>>  to convert the int into token_type.  I don't like that, but I
>>  probably don't have much of a choice.  (A template would be
>>  overkill IMHO).
> 
> Why, is it because of char tokens (like '+' in your example)?

Yes, exactly.  I don't like that we accept ASCII, but we have
to.
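
To make the constraint concrete, here is a hedged illustration based on
the constructors quoted in the commit message below (not actual generated
code):

  // A literal character token such as ':' has no enumerator in
  // token_type; its kind is just its character code, hence the int
  // parameter, converted back to token_type internally.
  symbol_type colon (':');            // ':' is an int (58)
  symbol_type id (token::ID, "foo");  // the enumerator converts to int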


Here is the patch, twice.  I want to keep the previous one
(with make_symbol) in the git history, so the second patch below
shows that actual commit, relative to make_symbol.  But I think
the patch against the state before make_symbol makes for better reading (it's
the first one below).


Now that I have done this, I think I can merge two different
types that currently exist in yy::parser: token and symbol_type.
The first one is used _only_ to define the enum of the various
token types, and the second one, well, implements the tokens.  And
'token' is actually a better name than 'symbol_type'.
Of course, I will leave a type alias for symbol_type.

WDYT?

Currently in https://github.com/akimd/bison/tree/make-symbol.

commit 5e8571708e34b3e69a8182a88057199e7bf63568
Author: Akim Demaille 
Date:   Wed Dec 19 17:51:10 2018 +0100

c++: exhibit a safe symbol_type

Instead of introducing make_symbol (whose name, btw, somewhat
infringes on the user's "name space", if she defines a token named
"symbol"), let's make the construction of symbol_type safer, using
assertions.

For instance with:

  %token ':' <std::string> ID <int> INT;

generate:

symbol_type (int token, const std::string&);
symbol_type (int token, const int&);
symbol_type (int token);

It does mean that the named token constructors (make_ID, make_INT,
etc.) now go through a useless assert, but I think we can ignore this: I
assume any decent compiler will inline the symbol_type ctor inside the
make_TOKEN functions, which will show that the assert is trivially
verified, hence I expect no code will be emitted for it.  And anyway,
that's an assert, NDEBUG controls it.
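
A rough sketch of the shape such an assert-guarded constructor could take
for the example grammar above (hedged: the base-class forwarding via
"super_type" and the exact names are guesses; the real generated code may
differ):

  symbol_type (int tok, const int& v)
    : super_type (token_type (tok), v)    // assumed basic_symbol base
  {
    // In this grammar only INT carries an int value.
    assert (token_type (tok) == token::INT);
  }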

* data/c++.m4 (symbol_type): Turn into a subclass of
basic_symbol.
Declare symbol constructors when variants are enabled.
* data/variant.hh (_b4_type_constructor_declare)
(_b4_type_constructor_define): Replace with...
(_b4_symbol_constructor_declare, _b4_symbol_constructor_def): these.
Generate symbol_type constructors.
* doc/bison.texi (Complete Symbols): Document.
* tests/types.at: Check.

diff --git a/NEWS b/NEWS
index 08d99f19..c67fb142 100644
--- a/NEWS
+++ b/NEWS
@@ -96,10 +96,36 @@ GNU Bison NEWS
   until it sees the '='.  So we notate the two possible reductions to
   indicate that each conflicts in 

Re: Dynamic token kinds

2018-12-21 Thread Frank Heckenbach
Akim Demaille wrote:

> Hi Frank, Hi all,
> 
> > On 16 Dec 2018, at 10:02, Frank Heckenbach wrote:
> > 
> > So to make it safe, we might need something like this:
> > 
> >  static inline
> >  symbol_type
> >  make_symbol (token_type type, b4_locations_if([const location_type& l, ])T&& v);
> > 
> > auto-generated for each semantic type T of any token (plus one
> > without the "v" parameter for untyped tokens) that checks (at
> > runtime) the "type" parameter against the (statically known) valid
> > token types for T.
> 
> I like this idea.  I have a draft for it in my repo, as "make-symbol".
> Please, try it and report about it.

Again, sorry for the delay (still busy), but now I tried it
(removing the "b4_parse_assert_if", see below).

It seems to work for me. The only issue I had was due to sloppiness
on my side. I'm only mentioning it in case others do the same.
Basically, I had stored tokens of one specific semantic type in a
look-up table together with tokens without semantic type (storing a
dummy value in the table for the latter), and constructed the tokens
for both in the same branch, exploiting the only case where a
mismatch is inconsequential, i.e. setting a value and not using it.
This worked before, but the stricter checks now (correctly) caught
it.

To actually allow this, you could have the typed constructors all
accept the typeless tokens as well, but I don't consider that really
necessary. Unless you want to support that for backward (bugward?)
compatibility, I'll just change my code to make two separate
make_symbol calls.
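
For illustration, a hedged sketch of that change (the `entry' fields and
`loc' are invented; the make_symbol signature is the one proposed earlier
in this thread, kind first, then location, then value):

  if (entry.has_value)
    return parser::make_symbol (entry.kind, loc, std::string (entry.text));
  else
    return parser::make_symbol (entry.kind, loc);  // no dummy value any more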

> There are a few issues:
> - make_symbol will collide if the user has a token named symbol
>   Any idea of a better name?  

To avoid such collisions, I think we have to avoid the "make_"
prefix entirely. Maybe "build_symbol"?

> Or simply make them actual constructors for `symbol_type`.

Yes, if they are (documented as) public. I think I'd prefer this as
I wouldn't have to change my code from 3.2.2.

> - In the signature of make_symbol, I've had to use int for the
>   token type, instead of the enum token_type, and then
>   to convert the int into token_type.  I don't like that, but I
>   probably don't have much of a choice.  (A template would be
>   overkill IMHO).

Why, is it because of char tokens (like '+' in your example)?
Well, it may not be too nice, but I don't really mind, especially
since the value is explicitly checked here.

> - I'm not sure I should emit the assert only when parse.assert
>   was set, maybe we should always do it.  After all, if the user
>   does not want the check, he can pass -DNDEBUG.  parse.assert
>   was made to check what Bison generates, more than to check what
>   the user does.

I agree, these are different things.

Regards,
Frank



Re: Dynamic token kinds

2018-12-20 Thread Akim Demaille
Any comment from someone?

> On 19 Dec 2018, at 09:13, Akim Demaille wrote:
> 
> 
> 
>> On 19 Dec 2018, at 07:34, Akim Demaille wrote:
>> 
>> Hi Frank, Hi all,
>> 
>>> On 16 Dec 2018, at 10:02, Frank Heckenbach wrote:
>>> 
>>> So to make it safe, we might need something like this:
>>> 
>>> static inline
>>> symbol_type
>>> make_symbol (token_type type, b4_locations_if([const location_type& l, ])T&& v);
>>> 
>>> auto-generated for each semantic type T of any token (plus one
>>> without the "v" parameter for untyped tokens) that checks (at
>>> runtime) the "type" parameter against the (statically known) valid
>>> token types for T.
>> 
>> I like this idea.  I have a draft for it in my repo, as "make-symbol".
>> Please, try it and report about it.
>> 
>> There are a few issues:
>> - make_symbol will collide if the user has a token named symbol
>> Any idea of a better name?  
> 
> Or simply make them actual constructors for `symbol_type`.




Re: Dynamic token kinds

2018-12-19 Thread Akim Demaille



> On 19 Dec 2018, at 07:34, Akim Demaille wrote:
> 
> Hi Frank, Hi all,
> 
>> On 16 Dec 2018, at 10:02, Frank Heckenbach wrote:
>> 
>> So to make it safe, we might need something like this:
>> 
>> static inline
>> symbol_type
>> make_symbol (token_type type, b4_locations_if([const location_type& l, ])T&& v);
>> 
>> auto-generated for each semantic type T of any token (plus one
>> without the "v" parameter for untyped tokens) that checks (at
>> runtime) the "type" parameter against the (statically known) valid
>> token types for T.
> 
> I like this idea.  I have a draft for it in my repo, as "make-symbol".
> Please, try it and report about it.
> 
> There are a few issues:
> - make_symbol will collide if the user has a token named symbol
>  Any idea of a better name?  

Or simply make them actual constructors for `symbol_type`.


Re: Dynamic token kinds

2018-12-17 Thread Hans Åberg


> On 17 Dec 2018, at 18:37, Frank Heckenbach  wrote:
> 
> Do you actually use Bison's variant? Otherwise, what you do is
> irrelevant here (sorry), as this is a proposal specifically about
> Bison's variant.

As I said, I do not use it now, but I wanted to know whether I could use it 
before actually attempting to convert to it, which may be irrelevant in your 
programming approach. I have used a typed C++ parser I wrote myself before Akim 
started to write one, but then it wasn't very useful.





Re: Dynamic token kinds

2018-12-17 Thread Hans Åberg


> On 17 Dec 2018, at 11:17, Frank Heckenbach  wrote:
> 
> Hans Åberg wrote:
> 
>>> On 17 Dec 2018, at 10:48, Frank Heckenbach  wrote:
>> 
>> Might Bison generate a function with a switch statement that produces the
>> right return value for the lexer to use?
> 
> Different semantic types need separate functions since C++ is
> strongly typed. Perhaps an example makes it clearer:
> 
> Say we have tokens V_FOO and V_BAR with no semantic type, I_BAZ and
> I_QUX with semantic type int and S_BLA with type string. (BTW, I'm
> no fan of Hungarian notation, just use it here for the sake of
> example.) So far Bison generates (roughly speaking):
> 
>  symbol_type make_V_FOO ();
>  symbol_type make_V_BAR ();
>  symbol_type make_I_BAZ (int &&);
>  symbol_type make_I_QUX (int &&);
>  symbol_type make_S_BLA (string &&);

I thought something like that from looking at the calculator example.

> What I suggest to add (without changing the above), is:
> 
>  symbol_type make_symbol (token_type type);
>  // checks at runtime that type is V_FOO or V_BAR
> 
>  symbol_type make_symbol (token_type type, int &&);
>  // checks at runtime that type is I_BAZ or I_QUX
> 
>  symbol_type make_symbol (token_type type, string &&);
>  // checks at runtime that type is S_BLA
> 
> These runtime checks might be implemented via a switch if that's
> easier to auto-generate (it might be in fact) or with a simple
> "if (... || ...)" statement, that's an implementation detail.

Actually, I pass the semantic value through a class object to which the lexer 
function belongs, so the extra arguments are not needed. So I must think more 
about this.

>>> It's not that bad actually. Again, my lexers work fine as is.
>>> I just brought this up because Akim proposed to call the function
>>> "unsafe_..." which I thought was too harsh and proposed
>>> "unchecked_..." -- but by adding the checks, it would be neither
>>> unsafe nor unchecked. :)
>> 
>> This worries me.
> 
> That's why I suggest to add the check. :)

There must be some guard against programming errors, I think.

>> But also having to use something more complex to be returned by the lexer
>> than a value from the lookup table.
> 
> The lexer returns a token which contains the token kind (an enum)
> and the semantic value (a union value). A mismatch is bad. The
> make_FOO functions avoid a mismatch and are suitable for statically
> known token kinds. The direct constructor call can be used for
> dynamic token kinds, but allows a mismatch. The functions I propose
> to generate instead could be used for dynamic token kinds and avoid
> a mismatch.
> 
> Everything clear now?

Yes, it is the requirement of returning the semantic value that causes the 
issue. Then this requirement is perhaps born out of the condition of not
storing the type in the Bison variant.





Re: Dynamic token kinds

2018-12-17 Thread Frank Heckenbach
Hans Åberg wrote:

> > On 17 Dec 2018, at 10:48, Frank Heckenbach  wrote:
> > 
> > I think we agree here, and that was actually my concern when I
> > started this thread. I don't want to have to write a separate case
> > for each token kind in my lexer. Of course, we need a separate case
> > for each semantic type because that involves a different type in the
> > constructor/builder call already, but these are relatively few,
> > compared to token kinds, in my lexers.
> 
> Might Bison generate a function with a switch statement that produces the
> right return value for the lexer to use?

Different semantic types need separate functions since C++ is
strongly typed. Perhaps an example makes it clearer:

Say we have tokens V_FOO and V_BAR with no semantic type, I_BAZ and
I_QUX with semantic type int and S_BLA with type string. (BTW, I'm
no fan of Hungarian notation, just use it here for the sake of
example.) So far Bison generates (roughly speaking):

  symbol_type make_V_FOO ();
  symbol_type make_V_BAR ();
  symbol_type make_I_BAZ (int &&);
  symbol_type make_I_QUX (int &&);
  symbol_type make_S_BLA (string &&);

What I suggest to add (without changing the above), is:

  symbol_type make_symbol (token_type type);
  // checks at runtime that type is V_FOO or V_BAR

  symbol_type make_symbol (token_type type, int &&);
  // checks at runtime that type is I_BAZ or I_QUX

  symbol_type make_symbol (token_type type, string &&);
  // checks at runtime that type is S_BLA

These runtime checks might be implemented via a switch if that's
easier to auto-generate (it might be in fact) or with a simple
"if (... || ...)" statement, that's an implementation detail.

> >> Maybe an option. Akim perhaps hasn't used this dynamic token
> >> lookup.
> > 
> > I guess he hasn't. But I don't think we need an option. These would
> > just be additional functions that one can use or not.
> 
> The point of an option would be that those who do not need this feature
> could use a more optimal variant.

According to my proposal everyone could use any function. In fact,
my lexers do, they use the "safe" make_FOO functions by default, and
the (so far) unchecked ones for the dynamically looked-up tokens.

> >> Those that do might prefer not to risk the program bombing.
> > 
> > It's not that bad actually. Again, my lexers work fine as is.
> > I just brought this up because Akim proposed to call the function
> > "unsafe_..." which I thought was too harsh and proposed
> > "unchecked_..." -- but by adding the checks, it would be neither
> > unsafe nor unchecked. :)
> 
> This worries me.

That's why I suggest to add the check. :)

> But also having to use something more complex to be returned by the lexer
> than a value from the lookup table.

The lexer returns a token which contains the token kind (an enum)
and the semantic value (a union value). A mismatch is bad. The
make_FOO functions avoid a mismatch and are suitable for statically
known token kinds. The direct constructor call can be used for
dynamic token kinds, but allows a mismatch. The functions I propose
to generate instead could be used for dynamic token kinds and avoid
a mismatch.

Everything clear now?

Regards,
Frank



Re: Dynamic token kinds

2018-12-17 Thread Hans Åberg


> On 17 Dec 2018, at 10:48, Frank Heckenbach  wrote:
> 
> Hans Åberg wrote:
> 
>>> On 16 Dec 2018, at 15:48, Frank Heckenbach  wrote:
>>> 
>>> Hans Åberg wrote:
>>> 
>>>> The idea would be to return, rather than the token value, something
>>>> that does not need translation. I don't know if that helps the static
>>>> approach, though.
>>> 
>>> Not sure what you mean here. Please elaborate.
>> 
>> I couldn't see the details when I looked at it. I don't use the typed 
>> parser, but might if it is safe.
> 
> The parser is type safe. This is only about an alternative way of
> creating tokens by the lexer, which is also type safe when used
> properly (as mine does). It's only about adding an additional safety
> net.

Right.

>>> I think Akim made it clear enough that he doesn't like the overhead.
>>> (I don't mind so much, as I used std::variant in my implementation,
>>> but I only have a few parsers to care about.)
>> 
>> In that case, my impression was that he thought he could do without it.
> 
> Well, he can. :)

In that case.

>>> One might validly say that preventing it is the job of the lexer
>>> (and my lexer does so), not Bison's, just like other kinds of
>>> undefined or wrong behaviour outside of the generated code, also
>>> dynamic token types are a somewhat special usage anyway, so Bison
>>> can just do nothing about it, that's fine.
>> 
>> I use the same thing, returning the token value found on a lookup
>> table, but I would not want to use the typed parser if I would
>> have to add translations for every possibility. The information
>> about it is in Bison, therefore it should not be pushed onto the
>> writer of the lexer.
> 
> I think we agree here, and that was actually my concern when I
> started this thread. I don't want to have to write a separate case
> for each token kind in my lexer. Of course, we need a separate case
> for each semantic type because that involves a different type in the
> constructor/builder call already, but these are relatively few,
> compared to token kinds, in my lexers.

Might Bison generate a function with a switch statement that produces the
right return value for the lexer to use?

>>> I also suggested an approach in my previous mail with a few more
>>> generated functions that help runtime checking. I'd prefer Bison to
>>> add them, and then we'd have runtime checking as good as we'd have
>>> with std::variant in this respect.
>> 
>> Maybe an option. Akim perhaps hasn't used this dynamic token
>> lookup.
> 
> I guess he hasn't. But I don't think we need an option. These would
> just be additional functions that one can use or not.

The point of an option would be that those who do not need this feature could
use a more optimal variant.

>> Those that do might prefer not to risk the program bombing.
> 
> It's not that bad actually. Again, my lexers work fine as is.
> I just brought this up because Akim proposed to call the function
> "unsafe_..." which I thought was too harsh and proposed
> "unchecked_..." -- but by adding the checks, it would be neither
> unsafe nor unchecked. :)

This worries me. But also having to use something more complex to be returned
by the lexer than a value from the lookup table.





Re: Dynamic token kinds

2018-12-17 Thread Frank Heckenbach
Hans Åberg wrote:

> > On 16 Dec 2018, at 15:48, Frank Heckenbach  wrote:
> > 
> > Hans Åberg wrote:
> > 
> >> The idea would be to return, rather than the token value, something
> >> that does not need translation. I don't know if that helps the static
> >> approach, though.
> > 
> > Not sure what you mean here. Please elaborate.
> 
> I couldn't see the details when I looked at it. I don't use the typed parser, 
> but might if it is safe.

The parser is type safe. This is only about an alternative way of
creating tokens by the lexer, which is also type safe when used
properly (as mine does). It's only about adding an additional safety
net.

> > I think Akim made it clear enough that he doesn't like the overhead.
> > (I don't mind so much, as I used std::variant in my implementation,
> > but I only have a few parsers to care about.)
> 
> In that case, my impression was that he thought he could do without it.

Well, he can. :)

> > One might validly say that preventing it is the job of the lexer
> > (and my lexer does so), not Bison's, just like other kinds of
> > undefined or wrong behaviour outside of the generated code, also
> > dynamic token types are a somewhat special usage anyway, so Bison
> > can just do nothing about it, that's fine.
> 
> I use the same thing, returning the token value found on a lookup
> table, but I would not want to use the typed parser if I would
> have to add translations for every possibility. The information
> about it is in Bison, therefore it should not be pushed onto the
> writer of the lexer.

I think we agree here, and that was actually my concern when I
started this thread. I don't want to have to write a separate case
for each token kind in my lexer. Of course, we need a separate case
for each semantic type because that involves a different type in the
constructor/builder call already, but these are relatively few,
compared to token kinds, in my lexers.

> > I also suggested an approach in my previous mail with a few more
> > generated functions that help runtime checking. I'd prefer Bison to
> > add them, and then we'd have runtime checking as good as we'd have
> > with std::variant in this respect.
> 
> Maybe an option. Akim perhaps hasn't used this dynamic token
> lookup.

I guess he hasn't. But I don't think we need an option. These would
just be additional functions that one can use or not.

> Those that do might prefer not to risk the program bombing.

It's not that bad actually. Again, my lexers work fine as is.
I just brought this up because Akim proposed to call the function
"unsafe_..." which I thought was too harsh and proposed
"unchecked_..." -- but by adding the checks, it would be neither
unsafe nor unchecked. :)

So to wrap this up, Akim, do you think adding the typed functions as
proposed in one of my last mails is viable?

Regards,
Frank



Re: Dynamic token kinds

2018-12-16 Thread Hans Åberg


> On 16 Dec 2018, at 15:48, Frank Heckenbach  wrote:
> 
> Hans Åberg wrote:
> 
>> The idea would be to return, rather than the token value, something
>> that does not need translation. I don't know if that helps the static
>> approach, though.
> 
> Not sure what you mean here. Please elaborate.

I couldn't see the details when I looked at it. I don't use the typed parser, 
but might if it is safe.

>>>> Personally, I am already at C++17, so I have no reason for using a
>>>> simpler variant. Having the type stored just adds a size_t, and
>>>> that is used a lot in other circumstances, so no overhead to worry
>>>> about.
>>> 
>>> Well, we had this discussion recently (as far as Bison is
>>> concerned).
>> 
>> Indeed, but that was where it seemed the type did not need to be
>> stored in the variant. This situation might be different in that
>> respect.
>> 
>> Here, not making sure the type is properly returned may bomb the
>> program, so preventing that seems more important than a rather small
>> overhead.
> 
> I think Akim made it clear enough that he doesn't like the overhead.
> (I don't mind so much, as I used std::variant in my implementation,
> but I only have a few parsers to care about.)

In that case, my impression was that he thought he could do without it.

> One might validly say that preventing it is the job of the lexer
> (and my lexer does so), not Bison's, just like other kinds of
> undefined or wrong behaviour outside of the generated code, also
> dynamic token types are a somewhat special usage anyway, so Bison
> can just do nothing about it, that's fine.

I use the same thing, returning the token value found on a lookup table, but I 
would not want to use the typed parser if I would have to add translations for 
every possibility. The information about it is in Bison, therefore it should
not be pushed onto the writer of the lexer.

> I also suggested an approach in my previous mail with a few more
> generated functions that help runtime checking. I'd prefer Bison to
> add them, and then we'd have runtime checking as good as we'd have
> with std::variant in this respect.

Maybe an option. Akim perhaps hasn't used this dynamic token lookup. Those
that do might prefer not to risk the program bombing.





Re: Dynamic token kinds

2018-12-16 Thread Frank Heckenbach
Hans Åberg wrote:

> > On 16 Dec 2018, at 11:13, Frank Heckenbach  wrote:
> > 
> > Hans Åberg wrote:
> > 
> >> Perhaps an entirely static approach can be achieved by the type
> >> being a part of the token_type. On the other hand, the use is for
> >> dynamic token lookup, so it may be too much of an effort for that.
> > 
> > Not sure what you mean with "part of", but with Bison's variant the
> > semantic type is determined by token_type, if that's what you mean.
> 
> The idea would be to return, rather than the token value, something
> that does not need translation. I don't know if that helps the static
> approach, though.

Not sure what you mean here. Please elaborate.

> >> Personally, I am already at C++17, so I have no reason for using a
> >> simpler variant. Having the type stored just adds a size_t, and
> >> that is used a lot in other circumstances, so no overhead to worry
> >> about.
> > 
> > Well, we had this discussion recently (as far as Bison is
> > concerned).
> 
> Indeed, but that was where it seemed the type did not need to be
> stored in the variant. This situation might be different in that
> respect.
> 
> Here, not making sure the type is properly returned may bomb the
> program, so preventing that seems more important than a rather small
> overhead.

I think Akim made it clear enough that he doesn't like the overhead.
(I don't mind so much, as I used std::variant in my implementation,
but I only have a few parsers to care about.)

One might validly say that preventing it is the job of the lexer
(and my lexer does so), not Bison's, just like other kinds of
undefined or wrong behaviour outside of the generated code, also
dynamic token types are a somewhat special usage anyway, so Bison
can just do nothing about it, that's fine.

I also suggested an approach in my previous mail with a few more
generated functions that help runtime checking. I'd prefer Bison to
add them, and then we'd have runtime checking as good as we'd have
with std::variant in this respect.

Regards,
Frank



Re: Dynamic token kinds

2018-12-16 Thread Hans Åberg


> On 16 Dec 2018, at 11:13, Frank Heckenbach  wrote:
> 
> Hans Åberg wrote:
> 
>> Perhaps an entirely static approach can be achieved by the type
>> being a part of the token_type. On the other hand, the use is for
>> dynamic token lookup, so it may be too much of an effort for that.
> 
> Not sure what you mean with "part of", but with Bison's variant the
> semantic type is determined by token_type, if that's what you mean.

The idea would be to return, rather than the token value, something that does
not need translation. I don't know if that helps the static approach, though.

>> Personally, I am already at C++17, so I have no reason for using a
>> simpler variant. Having the type stored just adds a size_t, and
>> that is used a lot in other circumstances, so no overhead to worry
>> about.
> 
> Well, we had this discussion recently (as far as Bison is
> concerned).

Indeed, but that was where it seemed the type did not need to be stored in the
variant. This situation might be different in that respect.

Here, not making sure the type is properly returned may bomb the program, so 
preventing that seems more important than a rather small overhead.





Re: Dynamic token kinds

2018-12-16 Thread Frank Heckenbach
Hans Åberg wrote:

> Perhaps an entirely static approach can be achieved by the type
> being a part of the token_type. On the other hand, the use is for
> dynamic token lookup, so it may be too much of an effort for that.

Not sure what you mean with "part of", but with Bison's variant the
semantic type is determined by token_type, if that's what you mean.

> Personally, I am already at C++17, so I have no reason for using a
> simpler variant. Having the type stored just adds a size_t, and
> that is used a lot in other circumstances, so no overhead to worry
> about.

Well, we had this discussion recently (as far as Bison is
concerned).

Regards,
Frank



Re: Dynamic token kinds

2018-12-16 Thread Hans Åberg



> On 16 Dec 2018, at 10:02, Frank Heckenbach  wrote:
> 
> Hans Åberg wrote:
> 
>>> On 30 Nov 2018, at 00:40, Frank Heckenbach  wrote:
>>> 
>>> Hans Åberg wrote:
>>> 
>>>> It seems pretty standard to have lookup tokens with different
>>>> syntactic behavior, for example when they are declared of
>>>> different type in a language. So it is worrisome that the typed
>>>> parser deems the use unsafe.
>>> 
>>> What is potentially unsafe is that the actual type may not match the
>>> declared type in the grammar. With std::variant, a mismatch would
>>> cause an exception to be thrown. With Bison's static variant, a
>>> mismatch might lead to UB.
>>> 
>>> So perhaps this function could actually do a type check (which
>>> probably requires another auto-generated switch) and also throw or
>>> (if this is not desired) call std::terminate or so on mismatch,
>>> Akim?
>> 
>> The C++17 std::variant stores the type as an index. So perhaps
>> there should be an additional table storing the type, with a
>> symbol constructor that constructs the right value from the token.
> 
> Unlike std::variant, Bison's variant does not store the type at all.
> So the type check I suggested isn't actually that easy. By the time
> the symbol_type constructor is called, the variant value has already
> been constructed and type information lost.
> 
> So to make it safe, we might need something like this:
> 
>  static inline
>  symbol_type
>  make_symbol (token_type type, b4_locations_if([const location_type& l, ])T&& v);
> 
> auto-generated for each semantic type T of any token (plus one
> without the "v" parameter for untyped tokens) that checks (at
> runtime) the "type" parameter against the (statically known) valid
> token types for T.

I felt that one should just return the token value, in part because otherwise 
the lookup data must be changed, and in part because the type is internal to 
the parser. Then the constructor must be able to add the type, and so I arrived
at the suggestion of an additional table, for use with std::variant.

Perhaps an entirely static approach can be achieved by the type being a part of 
the token_type. On the other hand, the use is for dynamic token lookup, so it 
may be too much of an effort for that.

Personally, I am already at C++17, so I have no reason for using a simpler 
variant. Having the type stored just adds a size_t, and that is used a lot in 
other circumstances, so no overhead to worry about.





Re: Dynamic token kinds

2018-12-16 Thread Frank Heckenbach
Hans Åberg wrote:

> > On 30 Nov 2018, at 00:40, Frank Heckenbach  wrote:
> > 
> > Hans Åberg wrote:
> > 
> >>> Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
> >>> "unchecked"? If you put in the next release, I'll change my code to
> >>> use it.
> >> 
> >> It seems pretty standard to have lookup tokens with different
> >> syntactic behavior, for example when they are declared of
> >> different type in a language. So it is worrisome that the typed
> >> parser deems the use unsafe.
> > 
> > What is potentially unsafe is that the actual type may not match the
> > declared type in the grammar. With std::variant, a mismatch would
> > cause an exception to be thrown. With Bison's static variant, a
> > mismatch might lead to UB.
> > 
> > So perhaps this function could actually do a type check (which
> > probably requires another auto-generated switch) and also throw or
> > (if this is not desired) call std::terminate or so on mismatch,
> > Akim?
> 
> The C++17 std::variant stores the type as an index. So perhaps
> there should be an additional table storing the type, with a
> symbol constructor that constructs the right value from the token.

Unlike std::variant, Bison's variant does not store the type at all.
So the type check I suggested isn't actually that easy. By the time
the symbol_type constructor is called, the variant value has already
been constructed and type information lost.
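
A standalone C++17 illustration of that difference (not Bison code):

  #include <iostream>
  #include <string>
  #include <variant>

  int main ()
  {
    std::variant<int, std::string> v = 42;
    try
      {
        std::cout << std::get<std::string> (v) << '\n';  // wrong alternative
      }
    catch (const std::bad_variant_access&)
      {
        std::cout << "mismatch caught at run time\n";    // std::variant knows
      }
    // Bison's variant stores no such discriminator, so the equivalent
    // mistake there is undefined behavior instead of an exception.
  }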

So to make it safe, we might need something like this:

  static inline
  symbol_type
  make_symbol (token_type type, b4_locations_if([const location_type& l, ])T&& v);

auto-generated for each semantic type T of any token (plus one
without the "v" parameter for untyped tokens) that checks (at
runtime) the "type" parameter against the (statically known) valid
token types for T.

Regards,
Frank



Re: Dynamic token kinds

2018-12-15 Thread Hans Åberg


> On 30 Nov 2018, at 00:40, Frank Heckenbach  wrote:
> 
> Hans Åberg wrote:
> 
>>> Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
>>> "unchecked"? If you put in the next release, I'll change my code to
>>> use it.
>> 
>> It seems pretty standard to have lookup tokens with different
>> syntactic behavior, for example when they are declared of
>> different type in a language. So it is worrisome that the typed
>> parser deems the use unsafe.
> 
> What is potentially unsafe is that the actual type may not match the
> declared type in the grammar. With std::variant, a mismatch would
> cause an exception to be thrown. With Bison's static variant, a
> mismatch might lead to UB.
> 
> So perhaps this function could actually do a type check (which
> probably requires another auto-generated switch) and also throw or
> (if this is not desired) call std::terminate or so on mismatch,
> Akim?

The C++17 std::variant stores the type as an index. So perhaps there should be 
an additional table storing the type, with a symbol constructor that
constructs the right value from the token.





Re: Dynamic token kinds

2018-11-29 Thread Hans Åberg


> On 29 Nov 2018, at 01:12, Frank Heckenbach  wrote:
> 
> Akim Demaille wrote:
> 
>> Wrt the symbol constructor, you are right to be worried: I don't
>> consider it (so far?) to be part of the public API.  I do understand
>> something like it is needed, but I don't like that it looks safe
>> to use.
>> 
>> Would you be ok with parser::unsafe_make_symbol, or something like
>> this?
> 
> Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
> "unchecked"? If you put in the next release, I'll change my code to
> use it.

It seems pretty standard to have lookup tokens with different syntactic 
behavior, for example when they are declared of different type in a language. 
So it is worrisome that the typed parser deems the use unsafe.





Re: Dynamic token kinds

2018-11-28 Thread Frank Heckenbach
Akim Demaille wrote:

> Wrt the symbol constructor, you are right to be worried: I don't
> consider it (so far?) to be part of the public API.  I do understand
> something like it is needed, but I don't like that it looks safe
> to use.
> 
> Would you be ok with parser::unsafe_make_symbol, or something like
> this?

Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
"unchecked"? If you put in the next release, I'll change my code to
use it.

Here's what I had as "make_token" in my patches, now renamed as
"unchecked_make_symbol", perhaps you can use it as is:

In b4_symbol_constructor_declare:

  static inline
  symbol_type
  unchecked_make_symbol (token_type type, b4_locations_if([const location_type& l, ])semantic_type&& v = { });

In b4_symbol_constructor_define:

  b4_parser_class_name::symbol_type
  b4_parser_class_name::unchecked_make_symbol (token_type type, b4_locations_if([const location_type& l, ])semantic_type&& v)
  {
return symbol_type (type, std::move (v)b4_locations_if([, l]));
  }
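
A hedged usage sketch (`kind' and `loc' are placeholders, and the parser
class name depends on the grammar's configuration): for a token kind
looked up at run time that carries no semantic value, the defaulted
semantic_type lets a lexer simply write

  return yy::parser::unchecked_make_symbol (kind, loc);

and keep the make_FOO functions for statically known tokens.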

Cheers,
Frank



Re: Dynamic token kinds

2018-11-27 Thread Akim Demaille
Hi Frank!

> On 25 Nov 2018, at 18:17, Frank Heckenbach wrote:
> 
> [...] Then my yylex function does this (slightly simplified from my actual
> code; GetToken, TokenText and Loc are functions of my RE-based
> lexer):
> 
>  switch (GetToken ())
>  {
>// ...
>case lt_Identifier:
>  if (auto i = Find (PredefinedStrings, TokenText ()))
>return TParser::symbol_type (TParser::token_type (i->second), Loc ());
>  return TParser::make_identifier (TokenText (), Loc ());
>  }
> 
> The only interesting thing here (as far as Bison is concerned) is
> that I determine the token kind dynamically, so I can't use the
> make_FOO functions, but call the TParser::symbol_type constructor
> manually. (In my experimental changes, I had added a make_token
> function for this purpose, but I hadn't mentioned it in my
> announcements, so obviously it's not present in Bison currently;
> I'm not sure if it was very useful anyway, since it was just a
> wrapper around this constructor).
> 
> So I need to make sure that this is officially supported, i.e. this
> constructor and enum yytokentype will remain in the public interface
> in the generated header, or if other alternatives are recommended
> instead.
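
(For context, the lookup in the quoted yylex fragment maps identifier
spellings to token numbers; a hedged sketch, with invented keyword names,
might be:

  // needs <map> and <string>
  static const std::map<std::string, int> PredefinedStrings =
    {
      { "if",    TParser::token::KW_IF },
      { "while", TParser::token::KW_WHILE },
    };

where Find returns something dereferenceable to such a pair, or a false
value when the identifier is not predefined.)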

yytokentype is already part of the documentation and there's no plan
to change it, so I think you are safe.

Wrt the symbol constructor, you are right to be worried: I don't
consider it (so far?) to be part of the public API.  I do understand
something like it is needed, but I don't like that it looks safe
to use.

Would you be ok with parser::unsafe_make_symbol, or something like
this?