Re: Dynamic token kinds
> Le 23 déc. 2018 à 12:46, Frank Heckenbach a écrit :
>
> Akim Demaille wrote:
>
>>> but unfortunately git is quite data hungry and I'm a bit short on data
>>> volume (no broadband in my new apartment yet, and mobile data is
>>> paid in gold here in Germany),
>>
>> Ouch. We're lucky in France, that's very cheap.
>
> I'm lucky to have a relatively fast connection at all. There are
> areas where RFC 1149 is the preferred transport layer. :)

:) :)

>> Please find the tarball here:
>>
>> https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.gz
>> https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.xz
>
> Thanks. I've tried it and it works with my parsers.

Great!

>> The best way would be to provide tarballs directly from my
>> GitHub repo. After all, it's already running travis for the
>> tests, it can easily build tarballs. However, I don't know
>> how to expose them, I don't think there is a free
>> (as in beer) way to do that. If you have an idea, I'll take it!
>
> I don't know too much about GitHub. As I said, I had hoped "Download
> Zip" would do it, but that's not self-contained.

I guess it's only an archive of the current files in the repo, so it's
probably carrying nothing about gnulib.

> I never liked that
> design, and apparently it's now causing me actual problems by
> breaking the simple download and vastly increasing the clone size
> (AFAICS, most of it is downloading the gnulib history which I don't
> care a tiny bit about, especially not when just trying to build
> Bison).

I can understand this. However, this is the only model I can see that
really allows the maintainer of a package to use the latest version of
the library, in case it addresses a recent portability issue. It's not
too inconvenient to use as a user of git, and it's completely invisible
for the end user (i.e., users of the tarballs).

> Perhaps what I can do (as a slight kludge) for future patches is to [...]
The easiest, by far, would be for me to provide you with a viable tarball :(
Re: Dynamic token kinds
Hi Frank!

> Le 22 déc. 2018 à 21:12, Frank Heckenbach a écrit :
>
> Sorry, now I seem to be the one to have trouble using your patch:
>
> - I guess the right thing is to clone the repository,

Yes, it is.

> but
> unfortunately git is quite data hungry and I'm a bit short on data
> volume (no broadband in my new apartment yet, and mobile data is
> paid in gold here in Germany),

Ouch. We're lucky in France, that's very cheap.

> so I could only do it on a server
> on which I have access (in the hope of running "make dist" or so
> to get it to my machine then), but after downloading >100 MB (like
> I feared), bootstrap failed with lots of warnings and errors (the
> server is still running jessie, which I can't change, guess that's
> too old to bootstrap bison).

I don't think so, but I don't actually know.

> I suppose I could ask you to send me a "dist" archive for testing
> now, or if only "data" files need changes compared to 3.2.2, post
> those, but for testing future changes, that's not a very nice
> workflow. Is there any better way?

Please find the tarball here:

https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.gz
https://www.lrde.epita.fr/~akim/private/bison/bison-3.2.1.127-777b.tar.xz

The best way would be to provide tarballs directly from my GitHub repo.
After all, it's already running travis for the tests, it can easily
build tarballs. However, I don't know how to expose them, I don't
think there is a free (as in beer) way to do that. If you have an
idea, I'll take it!
Re: Dynamic token kinds
Akim Demaille wrote:

>> To actually allow this, you could have the typed constructors all
>> accept the typeless tokens as well, but I don't consider that really
>> necessary. Unless you want to support that for backward (bugward?)
>> compatibility, I'll just change my code to make two separate
>> make_symbol calls.
>
> Yes, I prefer it this way. The whole point of my work on C++'s
> symbols so far is really to be type safe(r).

OK, I've changed my code accordingly.

> See below, I have a working draft that completely replaces
> make_symbol by "merging" the assert-based type checking into
> the symbol_type constructors. Since that makes the ctors safe,
> I'm fine with exposing them.

Sorry, now I seem to be the one to have trouble using your patch:

- It doesn't seem to be against 3.2.2, so I can't directly use "patch".

- I tried to manually apply it (like I did yesterday), but failed
  (maybe I made some mistake this time, or it requires other changes,
  but I don't think debugging this is worthwhile).

- I tried to get just c++.m4 and variant.hh from the branch and copy
  them over my installation, but that also failed.

- Then I tried "Download ZIP" to get the whole branch, but then
  running ./bootstrap gave these errors which I don't quite get
  (I thought not requiring a git repository is the point of this
  download function):

    ./bootstrap: Bootstrapping from checked-out bison sources...
    ./bootstrap: getting gnulib files...
    fatal: Not a git repository (or any parent up to mount point /tmp)
    Stopping at filesystem boundary (GIT_DISCOVERY_ACROSS_FILESYSTEM not set).

- I guess the right thing is to clone the repository, but
  unfortunately git is quite data hungry and I'm a bit short on data
  volume (no broadband in my new apartment yet, and mobile data is
  paid in gold here in Germany), so I could only do it on a server on
  which I have access (in the hope of running "make dist" or so to
  get it to my machine then), but after downloading >100 MB (like I
  feared), bootstrap failed with lots of warnings and errors (the
  server is still running jessie, which I can't change; I guess
  that's too old to bootstrap bison).

I suppose I could ask you to send me a "dist" archive for testing now,
or if only "data" files need changes compared to 3.2.2, post those,
but for testing future changes, that's not a very nice workflow. Is
there any better way?

Regards,
Frank
Re: Dynamic token kinds
Hi Frank!

> Le 22 déc. 2018 à 01:14, Frank Heckenbach a écrit :
>
> Akim Demaille wrote:
>
>> I like this idea. I have a draft for it in my repo, as "make-symbol".
>> Please, try it and report about it.
>
> Again, sorry for the delay (still busy), but now I tried it
> (removing the "b4_parse_assert_if", see below).

Thanks for spending time on this. I really need feedback :)

> It seems to work for me. The only issue I had was due to sloppiness
> on my side. I'm only mentioning it in case others do the same.
> Basically, I had stored tokens of one specific semantic type in a
> look-up table together with tokens without semantic type (storing a
> dummy value in the table for the latter), and constructed the tokens
> for both in the same branch, exploiting the only case where a
> mismatch is inconsequential, i.e. setting a value and not using it.
> This worked before, but the stricter checks now (correctly) caught
> it.

Great news! It works!

> To actually allow this, you could have the typed constructors all
> accept the typeless tokens as well, but I don't consider that really
> necessary. Unless you want to support that for backward (bugward?)
> compatibility, I'll just change my code to make two separate
> make_symbol calls.

Yes, I prefer it this way. The whole point of my work on C++'s
symbols so far is really to be type safe(r).

>> There are a few issues:
>> - make_symbol will collide if the user has a token named symbol
>>   Any idea of a better name?
>
> To avoid such collisions, I think we have to avoid the "make_"
> prefix entirely. Maybe "build_symbol"?
>
>> Or simply make them actual constructors for `symbol_type`.
>
> Yes, if they are (documented as) public. I think I'd prefer this as
> I wouldn't have to change my code from 3.2.2.

See below, I have a working draft that completely replaces
make_symbol by "merging" the assert-based type checking into
the symbol_type constructors. Since that makes the ctors safe,
I'm fine with exposing them.
I wish it required fewer changes. In particular, it tears symbol_type
and stack_symbol_type even further apart. My CRTP might no longer
fully make sense; maybe I'll get rid of it at some point.

>> - In the signature of make_symbol, I've had to use int for the
>>   token type, instead of the enum token_type, and then
>>   to convert the int into token_type. I don't like that, but I
>>   probably don't have much of a choice. (A template would be
>>   overkill IMHO).
>
> Why, is it because of char tokens (like '+' in your example)?

Yes, exactly. I don't like that we accept ASCII, but we have to.

Here is the patch, twice. I want to keep the previous one (with
make_symbol) in the git history, so the second patch below shows that
actual commit, relative to make_symbol. But I think the patch compared
to before make_symbol is a better reading (it's the first one below).

Now that I have done this, I think I can merge two different types
that currently exist in yy::parser: token and symbol_type. The first
one is used _only_ to define the enum of the various token types, and
the second one, well, implements the tokens. And 'token' is actually a
better name than 'symbol_type'. Of course, I will leave a type alias
for symbol_type. WDYT?

Currently in https://github.com/akimd/bison/tree/make-symbol.

commit 5e8571708e34b3e69a8182a88057199e7bf63568
Author: Akim Demaille
Date:   Wed Dec 19 17:51:10 2018 +0100

    c++: exhibit a safe symbol_type

    Instead of introducing make_symbol (whose name, btw, somewhat
    infringes on the user's "name space", if she defines a token
    named "symbol"), let's make the construction of symbol_type
    safer, using assertions. For instance with:

      %token ':' ID INT;

    generate:

      symbol_type (int token, const std::string&);
      symbol_type (int token, const int&);
      symbol_type (int token);

    It does mean that now named token constructors (make_ID,
    make_INT, etc.) go through a useless assert, but I think we can
    ignore this: I assume any decent compiler will inline the
    symbol_type ctor inside the make_TOKEN functions, which will show
    that the assert is trivially verified, hence I expect no code
    will be emitted for it. And anyway, that's an assert, NDEBUG
    controls it.

    * data/c++.m4 (symbol_type): Turn into a subclass of basic_symbol.
    Declare symbol constructors when variants are enabled.
    * data/variant.hh (_b4_type_constructor_declare)
    (_b4_type_constructor_define): Replace with...
    (_b4_symbol_constructor_declare, _b4_symbol_constructor_def): these.
    Generate symbol_type constructors.
    * doc/bison.texi (Complete Symbols): Document.
    * tests/types.at: Check.

diff --git a/NEWS b/NEWS
index 08d99f19..c67fb142 100644
--- a/NEWS
+++ b/NEWS
@@ -96,10 +96,36 @@ GNU Bison NEWS
 until it sees the '='. So we notate the two possible reductions to
 indicate that each conflicts in
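The scheme described in the commit message can be sketched as a
minimal stand-alone illustration (the `sketch` namespace, the field
name `kind`, and the enumerator values are invented here; the real
generated code lives in symbol_type/basic_symbol and is more
involved):

```cpp
#include <cassert>
#include <string>

namespace sketch
{
  // Hypothetical token kinds for `%token ':' ID INT;` -- named tokens
  // are numbered above the single-character range.
  enum token_kind { ID = 258, INT = 259 };

  struct symbol_type
  {
    int kind;

    // Untyped tokens: only ':' is declared without a semantic type.
    explicit symbol_type (int token)
      : kind (token)
    { assert (token == ':'); }

    // Tokens carrying a std::string: only ID.
    symbol_type (int token, const std::string&)
      : kind (token)
    { assert (token == ID); }

    // Tokens carrying an int: only INT.
    symbol_type (int token, const int&)
      : kind (token)
    { assert (token == INT); }
  };
}
```

With this shape, make_ID and friends reduce to one-line wrappers, and
a kind/value mismatch trips the assert instead of silently reading the
wrong union member.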
Re: Dynamic token kinds
Akim Demaille wrote:

> Hi Frank, Hi all,
>
>> Le 16 déc. 2018 à 10:02, Frank Heckenbach a écrit :
>>
>> So to make it safe, we might need something like this:
>>
>>   static inline
>>   symbol_type
>>   make_symbol (token_type type, b4_locations_if([const location_type& l,
>>   ])T&& v);
>>
>> auto-generated for each semantic type T of any token (plus one
>> without the "v" parameter for untyped tokens) that checks (at
>> runtime) the "type" parameter against the (statically known) valid
>> token types for T.
>
> I like this idea. I have a draft for it in my repo, as "make-symbol".
> Please, try it and report about it.

Again, sorry for the delay (still busy), but now I tried it
(removing the "b4_parse_assert_if", see below).

It seems to work for me. The only issue I had was due to sloppiness
on my side. I'm only mentioning it in case others do the same.
Basically, I had stored tokens of one specific semantic type in a
look-up table together with tokens without semantic type (storing a
dummy value in the table for the latter), and constructed the tokens
for both in the same branch, exploiting the only case where a
mismatch is inconsequential, i.e. setting a value and not using it.
This worked before, but the stricter checks now (correctly) caught
it.

To actually allow this, you could have the typed constructors all
accept the typeless tokens as well, but I don't consider that really
necessary. Unless you want to support that for backward (bugward?)
compatibility, I'll just change my code to make two separate
make_symbol calls.

> There are a few issues:
> - make_symbol will collide if the user has a token named symbol
>   Any idea of a better name?

To avoid such collisions, I think we have to avoid the "make_"
prefix entirely. Maybe "build_symbol"?

> Or simply make them actual constructors for `symbol_type`.

Yes, if they are (documented as) public. I think I'd prefer this as
I wouldn't have to change my code from 3.2.2.

> - In the signature of make_symbol, I've had to use int for the
>   token type, instead of the enum token_type, and then
>   to convert the int into token_type. I don't like that, but I
>   probably don't have much of a choice. (A template would be
>   overkill IMHO).

Why, is it because of char tokens (like '+' in your example)? Well,
it may not be too nice, but I don't really mind, especially since the
value is explicitly checked here.

> - I'm not sure I should emit the assert only when parse.assert
>   was set, maybe we should always do it. After all, if the user
>   does not want the check, he can pass -DNDEBUG. parse.assert
>   was made to check what Bison generates, more than to check what
>   the user does.

I agree, these are different things.

Regards,
Frank
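The char-token point can be illustrated with a tiny sketch (the enum
name and the `NUMBER` enumerator are invented): a literal token such
as '+' is just a char promoted to int, not an enumerator, so the
parameter has to be int, with the check happening before converting
back to the enum.

```cpp
#include <cassert>

// Hypothetical token enum: named tokens start above the
// single-character range, but a token like '+' has no enumerator.
enum token_kind { NUMBER = 258 };

inline token_kind
to_token_kind (int t)
{
  // Accept any single character or a known named token, then convert.
  assert ((0 <= t && t < 256) || t == NUMBER);
  return static_cast<token_kind> (t);
}
```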
Re: Dynamic token kinds
Any comment from someone?

> Le 19 déc. 2018 à 09:13, Akim Demaille a écrit :
>
>> Le 19 déc. 2018 à 07:34, Akim Demaille a écrit :
>>
>> Hi Frank, Hi all,
>>
>>> Le 16 déc. 2018 à 10:02, Frank Heckenbach a écrit :
>>>
>>> So to make it safe, we might need something like this:
>>>
>>>   static inline
>>>   symbol_type
>>>   make_symbol (token_type type, b4_locations_if([const location_type& l,
>>>   ])T&& v);
>>>
>>> auto-generated for each semantic type T of any token (plus one
>>> without the "v" parameter for untyped tokens) that checks (at
>>> runtime) the "type" parameter against the (statically known) valid
>>> token types for T.
>>
>> I like this idea. I have a draft for it in my repo, as "make-symbol".
>> Please, try it and report about it.
>>
>> There are a few issues:
>> - make_symbol will collide if the user has a token named symbol
>>   Any idea of a better name?
>
> Or simply make them actual constructors for `symbol_type`.
Re: Dynamic token kinds
> Le 19 déc. 2018 à 07:34, Akim Demaille a écrit :
>
> Hi Frank, Hi all,
>
>> Le 16 déc. 2018 à 10:02, Frank Heckenbach a écrit :
>>
>> So to make it safe, we might need something like this:
>>
>>   static inline
>>   symbol_type
>>   make_symbol (token_type type, b4_locations_if([const location_type& l,
>>   ])T&& v);
>>
>> auto-generated for each semantic type T of any token (plus one
>> without the "v" parameter for untyped tokens) that checks (at
>> runtime) the "type" parameter against the (statically known) valid
>> token types for T.
>
> I like this idea. I have a draft for it in my repo, as "make-symbol".
> Please, try it and report about it.
>
> There are a few issues:
> - make_symbol will collide if the user has a token named symbol
>   Any idea of a better name?

Or simply make them actual constructors for `symbol_type`.
Re: Dynamic token kinds
> On 17 Dec 2018, at 18:37, Frank Heckenbach wrote:
>
> Do you actually use Bison's variant? Otherwise, what you do is
> irrelevant here (sorry), as this is a proposal specifically about
> Bison's variant.

As I said, I do not use it now, but I wanted to know whether I could
use it before actually attempting to convert to it, which may be
irrelevant in your programming approach. I have used a typed C++
parser I wrote myself before Akim started to write one, but then it
wasn't very useful.
Re: Dynamic token kinds
> On 17 Dec 2018, at 11:17, Frank Heckenbach wrote:
>
> Hans Åberg wrote:
>
>>> On 17 Dec 2018, at 10:48, Frank Heckenbach wrote:
>>
>> Might Bison generate a function with a switch statement, generate
>> the right return for the lexer to use?
>
> Different semantic types need separate functions since C++ is
> strongly typed. Perhaps an example makes it clearer:
>
> Say we have tokens V_FOO and V_BAR with no semantic type, I_BAZ and
> I_QUX with semantic type int, and S_BLA with type string. (BTW, I'm
> no fan of Hungarian notation, just use it here for the sake of
> example.) So far Bison generates (roughly speaking):
>
>   symbol_type make_V_FOO ();
>   symbol_type make_V_BAR ();
>   symbol_type make_I_BAZ (int &&);
>   symbol_type make_I_QUX (int &&);
>   symbol_type make_S_BLA (string &&);

I thought something like that from looking at the calculator example.

> What I suggest to add (without changing the above), is:
>
>   symbol_type make_symbol (token_type type);
>   // checks at runtime that type is V_FOO or V_BAR
>
>   symbol_type make_symbol (token_type type, int &&);
>   // checks at runtime that type is I_BAZ or I_QUX
>
>   symbol_type make_symbol (token_type type, string &&);
>   // checks at runtime that type is S_BLA
>
> These runtime checks might be implemented via a switch if that's
> easier to auto-generate (it might be in fact) or with a simple
> "if (... || ...)" statement; that's an implementation detail.

Actually, I pass the semantic value through a class object to which
the lexer function belongs, so the extra arguments are not needed. So
I must think more about this.

>>> It's not that bad actually. Again, my lexers work fine as is.
>>> I just brought this up because Akim proposed to call the function
>>> "unsafe_..." which I thought was too harsh and proposed
>>> "unchecked_..." -- but by adding the checks, it would be neither
>>> unsafe nor unchecked. :)
>>
>> This worries me.
>
> That's why I suggest to add the check. :)

There must be some guard against programming errors, I think.

>> But also having to use something more complex to be returned by the
>> lexer than a value on the lookup table.
>
> The lexer returns a token which contains the token kind (an enum)
> and the semantic value (a union value). A mismatch is bad. The
> make_FOO functions avoid a mismatch and are suitable for statically
> known token kinds. The direct constructor call can be used for
> dynamic token kinds, but allows a mismatch. The functions I propose
> to generate instead could be used for dynamic token kinds and avoid
> a mismatch.
>
> Everything clear now?

Yes, it is the requirement of returning the semantic value that
causes the issue. Then this requirement is perhaps born out of the
condition of not storing the type in the Bison variant.
Re: Dynamic token kinds
Hans Åberg wrote:

> > On 17 Dec 2018, at 10:48, Frank Heckenbach wrote:
> >
> > I think we agree here, and that was actually my concern when I
> > started this thread. I don't want to have to write a separate case
> > for each token kind in my lexer. Of course, we need a separate
> > case for each semantic type because that involves a different type
> > in the constructor/builder call already, but these are relatively
> > few, compared to token kinds, in my lexers.
>
> Might Bison generate a function with a switch statement, generate
> the right return for the lexer to use?

Different semantic types need separate functions since C++ is
strongly typed. Perhaps an example makes it clearer:

Say we have tokens V_FOO and V_BAR with no semantic type, I_BAZ and
I_QUX with semantic type int, and S_BLA with type string. (BTW, I'm
no fan of Hungarian notation, just use it here for the sake of
example.) So far Bison generates (roughly speaking):

  symbol_type make_V_FOO ();
  symbol_type make_V_BAR ();
  symbol_type make_I_BAZ (int &&);
  symbol_type make_I_QUX (int &&);
  symbol_type make_S_BLA (string &&);

What I suggest to add (without changing the above), is:

  symbol_type make_symbol (token_type type);
  // checks at runtime that type is V_FOO or V_BAR

  symbol_type make_symbol (token_type type, int &&);
  // checks at runtime that type is I_BAZ or I_QUX

  symbol_type make_symbol (token_type type, string &&);
  // checks at runtime that type is S_BLA

These runtime checks might be implemented via a switch if that's
easier to auto-generate (it might be in fact) or with a simple
"if (... || ...)" statement; that's an implementation detail.

> >> Maybe an option. Akim perhaps hasn't used this dynamic token
> >> lookup.
> >
> > I guess he hasn't. But I don't think we need an option. These
> > would just be additional functions that one can use or not.
>
> The point with an option would be that those that do not need this
> feature could use a more optimal variant.

According to my proposal everyone could use any function. In fact, my
lexers do: they use the "safe" make_FOO functions by default, and the
(so far) unchecked ones for the dynamically looked-up tokens.

> >> Those that do might prefer not risking the program to bomb.
> >
> > It's not that bad actually. Again, my lexers work fine as is.
> > I just brought this up because Akim proposed to call the function
> > "unsafe_..." which I thought was too harsh and proposed
> > "unchecked_..." -- but by adding the checks, it would be neither
> > unsafe nor unchecked. :)
>
> This worries me.

That's why I suggest to add the check. :)

> But also having to use something more complex to be returned by the
> lexer than a value on the lookup table.

The lexer returns a token which contains the token kind (an enum) and
the semantic value (a union value). A mismatch is bad. The make_FOO
functions avoid a mismatch and are suitable for statically known
token kinds. The direct constructor call can be used for dynamic
token kinds, but allows a mismatch. The functions I propose to
generate instead could be used for dynamic token kinds and avoid a
mismatch.

Everything clear now?

Regards,
Frank
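Frank's proposed overloads can be sketched directly from his example
(the token names are his; the one-field `symbol_type` with a `kind`
member is my stand-in for the generated class, and the asserts stand
in for whatever check Bison would emit):

```cpp
#include <cassert>
#include <string>

enum token_type { V_FOO = 258, V_BAR, I_BAZ, I_QUX, S_BLA };

struct symbol_type { int kind; };  // stand-in for the generated type

// One checked builder per semantic type, each accepting only the
// token kinds declared with that type.
inline symbol_type
make_symbol (token_type type)                 // no semantic value
{
  assert (type == V_FOO || type == V_BAR);
  return {type};
}

inline symbol_type
make_symbol (token_type type, int&&)          // int-valued tokens
{
  assert (type == I_BAZ || type == I_QUX);
  return {type};
}

inline symbol_type
make_symbol (token_type type, std::string&&)  // string-valued tokens
{
  assert (type == S_BLA);
  return {type};
}
```

With these overloads a lexer can pass a token kind looked up at
runtime and still get the kind/value pairing verified.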
Re: Dynamic token kinds
> On 17 Dec 2018, at 10:48, Frank Heckenbach wrote:
>
> Hans Åberg wrote:
>
>>> On 16 Dec 2018, at 15:48, Frank Heckenbach wrote:
>>>
>>> Hans Åberg wrote:
>>>
>>>> The idea would be that rather than returning the token value,
>>>> something that does not need translation. I don't know if that
>>>> helps the static approach, though.
>>>
>>> Not sure what you mean here. Please elaborate.
>>
>> I couldn't see the details when I looked at it. I don't use the
>> typed parser, but might if it is safe.
>
> The parser is type safe. This is only about an alternative way of
> creating tokens by the lexer, which is also type safe when used
> properly (as mine does). It's only about adding an additional safety
> net.

Right.

>>> I think Akim made it clear enough that he doesn't like the overhead.
>>> (I don't mind so much, as I used std::variant in my implementation,
>>> but I only have a few parsers to care about.)
>>
>> In that case, my impression was that he thought he could do without it.
>
> Well, he can. :)

In that case.

>>> One might validly say that preventing it is the job of the lexer
>>> (and my lexer does so), not Bison's, just like other kinds of
>>> undefined or wrong behaviour outside of the generated code; also,
>>> dynamic token types are a somewhat special usage anyway, so Bison
>>> can just do nothing about it, that's fine.
>>
>> I use the same thing, returning the token value found on a lookup
>> table, but I would not want to use the typed parser if I would
>> have to add translations for every possibility. The information
>> about it is in Bison, therefore it should not be put on the
>> writing of the lexer.
>
> I think we agree here, and that was actually my concern when I
> started this thread. I don't want to have to write a separate case
> for each token kind in my lexer. Of course, we need a separate case
> for each semantic type because that involves a different type in the
> constructor/builder call already, but these are relatively few,
> compared to token kinds, in my lexers.

Might Bison generate a function with a switch statement, generate the
right return for the lexer to use?

>>> I also suggested an approach in my previous mail with a few more
>>> generated functions that help runtime checking. I'd prefer Bison to
>>> add them, and then we'd have runtime checking as good as we'd have
>>> with std::variant in this respect.
>>
>> Maybe an option. Akim perhaps hasn't used this dynamic token
>> lookup.
>
> I guess he hasn't. But I don't think we need an option. These would
> just be additional functions that one can use or not.

The point with an option would be that those that do not need this
feature could use a more optimal variant.

>> Those that do might prefer not risking the program to bomb.
>
> It's not that bad actually. Again, my lexers work fine as is.
> I just brought this up because Akim proposed to call the function
> "unsafe_..." which I thought was too harsh and proposed
> "unchecked_..." -- but by adding the checks, it would be neither
> unsafe nor unchecked. :)

This worries me. But also having to use something more complex to be
returned by the lexer than a value on the lookup table.
Re: Dynamic token kinds
Hans Åberg wrote:

> > On 16 Dec 2018, at 15:48, Frank Heckenbach wrote:
> >
> > Hans Åberg wrote:
> >
> >> The idea would be that rather than returning the token value,
> >> something that does not need translation. I don't know if that
> >> helps the static approach, though.
> >
> > Not sure what you mean here. Please elaborate.
>
> I couldn't see the details when I looked at it. I don't use the
> typed parser, but might if it is safe.

The parser is type safe. This is only about an alternative way of
creating tokens by the lexer, which is also type safe when used
properly (as mine does). It's only about adding an additional safety
net.

> > I think Akim made it clear enough that he doesn't like the overhead.
> > (I don't mind so much, as I used std::variant in my implementation,
> > but I only have a few parsers to care about.)
>
> In that case, my impression was that he thought he could do without it.

Well, he can. :)

> > One might validly say that preventing it is the job of the lexer
> > (and my lexer does so), not Bison's, just like other kinds of
> > undefined or wrong behaviour outside of the generated code; also,
> > dynamic token types are a somewhat special usage anyway, so Bison
> > can just do nothing about it, that's fine.
>
> I use the same thing, returning the token value found on a lookup
> table, but I would not want to use the typed parser if I would
> have to add translations for every possibility. The information
> about it is in Bison, therefore it should not be put on the
> writing of the lexer.

I think we agree here, and that was actually my concern when I
started this thread. I don't want to have to write a separate case
for each token kind in my lexer. Of course, we need a separate case
for each semantic type because that involves a different type in the
constructor/builder call already, but these are relatively few,
compared to token kinds, in my lexers.

> > I also suggested an approach in my previous mail with a few more
> > generated functions that help runtime checking. I'd prefer Bison to
> > add them, and then we'd have runtime checking as good as we'd have
> > with std::variant in this respect.
>
> Maybe an option. Akim perhaps hasn't used this dynamic token
> lookup.

I guess he hasn't. But I don't think we need an option. These would
just be additional functions that one can use or not.

> Those that do might prefer not risking the program to bomb.

It's not that bad actually. Again, my lexers work fine as is.
I just brought this up because Akim proposed to call the function
"unsafe_..." which I thought was too harsh and proposed
"unchecked_..." -- but by adding the checks, it would be neither
unsafe nor unchecked. :)

So to wrap this up, Akim, do you think adding the typed functions as
proposed in one of my last mails is viable?

Regards,
Frank
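The "dynamic token kinds" use case both lexers describe, a keyword
table consulted at runtime, might look like this hedged sketch (the
keyword names, the `lookup` helper, and the checked builder are all
invented for illustration, not taken from either lexer):

```cpp
#include <cassert>
#include <map>
#include <string>

enum token_type { KW_IF = 258, KW_WHILE, KW_RETURN, IDENT };

struct symbol_type { int kind; };  // stand-in for the generated type

// Checked builder for the untyped keyword tokens, as proposed.
inline symbol_type
make_symbol (token_type type)
{
  assert (type == KW_IF || type == KW_WHILE || type == KW_RETURN);
  return {type};
}

// The lexer's lookup table: one table entry per keyword instead of a
// separate make_KW_IF/make_KW_WHILE/... call site per token kind.
inline token_type
lookup (const std::string& word)
{
  static const std::map<std::string, token_type> keywords =
    {{"if", KW_IF}, {"while", KW_WHILE}, {"return", KW_RETURN}};
  const auto it = keywords.find (word);
  return it != keywords.end () ? it->second : IDENT;
}
```

The point of contention is exactly the `make_symbol (lookup (word))`
call: with a checked builder a bad table entry trips an assert instead
of corrupting the parser's value storage.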
Re: Dynamic token kinds
> On 16 Dec 2018, at 15:48, Frank Heckenbach wrote:
>
> Hans Åberg wrote:
>
>> The idea would be that rather than returning the token value,
>> something that does not need translation. I don't know if that
>> helps the static approach, though.
>
> Not sure what you mean here. Please elaborate.

I couldn't see the details when I looked at it. I don't use the typed
parser, but might if it is safe. Personally, I am already at C++17,
so I have no reason for using a simpler variant. Having the type
stored just adds a size_t, and that is used a lot in other
circumstances, so no overhead to worry about.

>>> Well, we had this discussion recently (as far as Bison is
>>> concerned).
>>
>> Indeed, but that was where it seemed not to require the type being
>> stored in the variant. This situation might be different in that
>> respect.
>>
>> Here, not making sure the type is properly returned may bomb the
>> program, so preventing that seems higher than a rather small
>> overhead.
>
> I think Akim made it clear enough that he doesn't like the overhead.
> (I don't mind so much, as I used std::variant in my implementation,
> but I only have a few parsers to care about.)

In that case, my impression was that he thought he could do without it.

> One might validly say that preventing it is the job of the lexer
> (and my lexer does so), not Bison's, just like other kinds of
> undefined or wrong behaviour outside of the generated code; also,
> dynamic token types are a somewhat special usage anyway, so Bison
> can just do nothing about it, that's fine.

I use the same thing, returning the token value found on a lookup
table, but I would not want to use the typed parser if I would have
to add translations for every possibility. The information about it
is in Bison, therefore it should not be put on the writing of the
lexer.

> I also suggested an approach in my previous mail with a few more
> generated functions that help runtime checking. I'd prefer Bison to
> add them, and then we'd have runtime checking as good as we'd have
> with std::variant in this respect.

Maybe an option. Akim perhaps hasn't used this dynamic token lookup.
Those that do might prefer not risking the program to bomb.
Re: Dynamic token kinds
Hans Åberg wrote:

> > On 16 Dec 2018, at 11:13, Frank Heckenbach wrote:
> >
> > Hans Åberg wrote:
> >
> >> Perhaps an entirely static approach can be achieved by the type
> >> being a part of the token_type. On the other hand, the use is for
> >> dynamic token lookup, so it may be too much of an effort for that.
> >
> > Not sure what you mean with "part of", but with Bison's variant the
> > semantic type is determined by token_type, if that's what you mean.
>
> The idea would be that rather than returning the token value,
> something that does not need translation. I don't know if that
> helps the static approach, though.

Not sure what you mean here. Please elaborate.

> >> Personally, I am already at C++17, so I have no reason for using a
> >> simpler variant. Having the type stored just adds a size_t, and
> >> that is used a lot in other circumstances, so no overhead to worry
> >> about.
> >
> > Well, we had this discussion recently (as far as Bison is
> > concerned).
>
> Indeed, but that was where it seemed not to require the type being
> stored in the variant. This situation might be different in that
> respect.
>
> Here, not making sure the type is properly returned may bomb the
> program, so preventing that seems higher than a rather small
> overhead.

I think Akim made it clear enough that he doesn't like the overhead.
(I don't mind so much, as I used std::variant in my implementation,
but I only have a few parsers to care about.)

One might validly say that preventing it is the job of the lexer (and
my lexer does so), not Bison's, just like other kinds of undefined or
wrong behaviour outside of the generated code; also, dynamic token
types are a somewhat special usage anyway, so Bison can just do
nothing about it, that's fine.

I also suggested an approach in my previous mail with a few more
generated functions that help runtime checking. I'd prefer Bison to
add them, and then we'd have runtime checking as good as we'd have
with std::variant in this respect.

Regards,
Frank
Re: Dynamic token kinds
> On 16 Dec 2018, at 11:13, Frank Heckenbach wrote:
>
> Hans Åberg wrote:
>
>> Perhaps an entirely static approach can be achieved by the type
>> being a part of the token_type. On the other hand, the use is for
>> dynamic token lookup, so it may be too much of an effort for that.
>
> Not sure what you mean with "part of", but with Bison's variant the
> semantic type is determined by token_type, if that's what you mean.

The idea would be to return, rather than the token value, something
that does not need translation. I don't know if that helps with the
static approach, though.

>> Personally, I am already at C++17, so I have no reason for using a
>> simpler variant. Having the type stored just adds a size_t, and
>> that is used a lot in other circumstances, so no overhead to worry
>> about.
>
> Well, we had this discussion recently (as far as Bison is
> concerned).

Indeed, but that was where it seemed the type did not need to be
stored in the variant. This situation might be different in that
respect.

Here, not making sure the right type is returned may bomb the
program, so preventing that seems worth a rather small overhead.
Re: Dynamic token kinds
Hans Åberg wrote:

> Perhaps an entirely static approach can be achieved by the type
> being a part of the token_type. On the other hand, the use is for
> dynamic token lookup, so it may be too much of an effort for that.

Not sure what you mean with "part of", but with Bison's variant the
semantic type is determined by token_type, if that's what you mean.

> Personally, I am already at C++17, so I have no reason for using a
> simpler variant. Having the type stored just adds a size_t, and
> that is used a lot in other circumstances, so no overhead to worry
> about.

Well, we had this discussion recently (as far as Bison is
concerned).

Regards, Frank
Re: Dynamic token kinds
> On 16 Dec 2018, at 10:02, Frank Heckenbach wrote:
>
> Hans Åberg wrote:
>
>>> On 30 Nov 2018, at 00:40, Frank Heckenbach wrote:
>>>
>>> Hans Åberg wrote:
>>>
>>>> It seems pretty standard to have lookup tokens with different
>>>> syntactic behavior, for example when they are declared of
>>>> different type in a language. So it is worrisome that the typed
>>>> parser deems the use unsafe.
>>>
>>> What is potentially unsafe is that the actual type may not match the
>>> declared type in the grammar. With std::variant, a mismatch would
>>> cause an exception to be thrown. With Bison's static variant, a
>>> mismatch might lead to UB.
>>>
>>> So perhaps this function could actually do a type check (which
>>> probably requires another auto-generated switch) and also throw or
>>> (if this is not desired) call std::terminate or so on mismatch,
>>> Akim?
>>
>> The C++17 std::variant stores the type as an index. So perhaps
>> there should be an additional table storing the type, with a
>> symbol constructor that constructs the right value from the token.
>
> Unlike std::variant, Bison's variant does not store the type at all.
> So the type check I suggested isn't actually that easy. By the time
> the symbol_type constructor is called, the variant value has already
> been constructed and type information lost.
>
> So to make it safe, we might need something like this:
>
>   static inline
>   symbol_type
>   make_symbol (token_type type,
>                b4_locations_if([const location_type& l, ])T&& v);
>
> auto-generated for each semantic type T of any token (plus one
> without the "v" parameter for untyped tokens) that checks (at
> runtime) the "type" parameter against the (statically known) valid
> token types for T.

I felt that one should just return the token value, in part because
otherwise the lookup data must be changed, and in part because the
type is internal to the parser.

Then the constructor must be able to add the type, and so I arrived
at the suggestion of an additional table, for use with std::variant.

Perhaps an entirely static approach can be achieved by the type
being a part of the token_type. On the other hand, the use is for
dynamic token lookup, so it may be too much of an effort for that.

Personally, I am already at C++17, so I have no reason for using a
simpler variant. Having the type stored just adds a size_t, and
that is used a lot in other circumstances, so no overhead to worry
about.
Re: Dynamic token kinds
Hans Åberg wrote:

> > On 30 Nov 2018, at 00:40, Frank Heckenbach wrote:
> >
> > Hans Åberg wrote:
> >
> >>> Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
> >>> "unchecked"? If you put it in the next release, I'll change my
> >>> code to use it.
> >>
> >> It seems pretty standard to have lookup tokens with different
> >> syntactic behavior, for example when they are declared of
> >> different type in a language. So it is worrisome that the typed
> >> parser deems the use unsafe.
> >
> > What is potentially unsafe is that the actual type may not match the
> > declared type in the grammar. With std::variant, a mismatch would
> > cause an exception to be thrown. With Bison's static variant, a
> > mismatch might lead to UB.
> >
> > So perhaps this function could actually do a type check (which
> > probably requires another auto-generated switch) and also throw or
> > (if this is not desired) call std::terminate or so on mismatch,
> > Akim?
>
> The C++17 std::variant stores the type as an index. So perhaps
> there should be an additional table storing the type, with a
> symbol constructor that constructs the right value from the token.

Unlike std::variant, Bison's variant does not store the type at all.
So the type check I suggested isn't actually that easy. By the time
the symbol_type constructor is called, the variant value has already
been constructed and type information lost.

So to make it safe, we might need something like this:

  static inline
  symbol_type
  make_symbol (token_type type,
               b4_locations_if([const location_type& l, ])T&& v);

auto-generated for each semantic type T of any token (plus one
without the "v" parameter for untyped tokens) that checks (at
runtime) the "type" parameter against the (statically known) valid
token types for T.

Regards, Frank
Re: Dynamic token kinds
> On 30 Nov 2018, at 00:40, Frank Heckenbach wrote:
>
> Hans Åberg wrote:
>
>>> Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
>>> "unchecked"? If you put it in the next release, I'll change my
>>> code to use it.
>>
>> It seems pretty standard to have lookup tokens with different
>> syntactic behavior, for example when they are declared of
>> different type in a language. So it is worrisome that the typed
>> parser deems the use unsafe.
>
> What is potentially unsafe is that the actual type may not match the
> declared type in the grammar. With std::variant, a mismatch would
> cause an exception to be thrown. With Bison's static variant, a
> mismatch might lead to UB.
>
> So perhaps this function could actually do a type check (which
> probably requires another auto-generated switch) and also throw or
> (if this is not desired) call std::terminate or so on mismatch,
> Akim?

The C++17 std::variant stores the type as an index. So perhaps
there should be an additional table storing the type, with a
symbol constructor that constructs the right value from the token.
Re: Dynamic token kinds
> On 29 Nov 2018, at 01:12, Frank Heckenbach wrote:
>
> Akim Demaille wrote:
>
>> Wrt the symbol constructor, you are right to be worried: I don't
>> consider it (so far?) to be part of the public API. I do understand
>> something like it is needed, but I don't like that it looks safe
>> to use.
>>
>> Would you be ok with parser::unsafe_make_symbol, or something like
>> this?
>
> Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
> "unchecked"? If you put it in the next release, I'll change my code
> to use it.

It seems pretty standard to have lookup tokens with different
syntactic behavior, for example when they are declared of
different type in a language. So it is worrisome that the typed
parser deems the use unsafe.
Re: Dynamic token kinds
Akim Demaille wrote:

> Wrt the symbol constructor, you are right to be worried: I don't
> consider it (so far?) to be part of the public API. I do understand
> something like it is needed, but I don't like that it looks safe
> to use.
>
> Would you be ok with parser::unsafe_make_symbol, or something like
> this?

Sure, though for my taste "unsafe" sounds a bit harsh, perhaps
"unchecked"? If you put it in the next release, I'll change my code
to use it.

Here's what I had as "make_token" in my patches, now renamed as
"unchecked_make_symbol"; perhaps you can use it as is.

In b4_symbol_constructor_declare:

  static inline
  symbol_type
  unchecked_make_symbol (token_type type,
    b4_locations_if([const location_type& l, ])semantic_type&& v = { });

In b4_symbol_constructor_define:

  b4_parser_class_name::symbol_type
  b4_parser_class_name::unchecked_make_symbol (token_type type,
    b4_locations_if([const location_type& l, ])semantic_type&& v)
  {
    return symbol_type (type, std::move (v)b4_locations_if([, l]));
  }

Cheers, Frank
Re: Dynamic token kinds
Hi Frank!

> Le 25 nov. 2018 à 18:17, Frank Heckenbach a écrit :
>
> [...] Then my yylex function does this (slightly simplified from my
> actual code; GetToken, TokenText and Loc are functions of my
> RE-based lexer):
>
>   switch (GetToken ())
>     {
>     // ...
>     case lt_Identifier:
>       if (auto i = Find (PredefinedStrings, TokenText ()))
>         return TParser::symbol_type (TParser::token_type (i->second),
>                                      Loc ());
>       return TParser::make_identifier (TokenText (), Loc ());
>     }
>
> The only interesting thing here (as far as Bison is concerned) is
> that I determine the token kind dynamically, so I can't use the
> make_FOO functions, but call the TParser::symbol_type constructor
> manually. (In my experimental changes, I had added a make_token
> function for this purpose, but I hadn't mentioned it in my
> announcements, so obviously it's not present in Bison currently;
> I'm not sure if it was very useful anyway, since it was just a
> wrapper around this constructor.)
>
> So I need to make sure that this is officially supported, i.e. that
> this constructor and enum yytokentype will remain in the public
> interface in the generated header, or whether other alternatives are
> recommended instead.

yytokentype is already part of the documentation; there's no plan to
change it, so I think you are safe.

Wrt the symbol constructor, you are right to be worried: I don't
consider it (so far?) to be part of the public API. I do understand
something like it is needed, but I don't like that it looks safe
to use.

Would you be ok with parser::unsafe_make_symbol, or something like
this?