Title: [233146] trunk/Tools
Revision
233146
Author
[email protected]
Date
2018-06-25 08:15:03 -0700 (Mon, 25 Jun 2018)

Log Message

[WSL] Start writing the Sphinx document
https://bugs.webkit.org/show_bug.cgi?id=186310

Rubberstamped by Filip Pizlo.

Very early work, just has the lexer and a few fragments of the parser so far.
Also fixing some minor mistake in the formal rules.

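Modified Paths

trunk/Tools/ChangeLog
trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.g4
trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.ott
trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/conf.py
trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/index.rst
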
Diff

Modified: trunk/Tools/ChangeLog (233145 => 233146)


--- trunk/Tools/ChangeLog	2018-06-25 15:01:35 UTC (rev 233145)
+++ trunk/Tools/ChangeLog	2018-06-25 15:15:03 UTC (rev 233146)
@@ -1,3 +1,18 @@
+2018-06-25  Robin Morisset  <[email protected]>
+
+        [WSL] Start writing the Sphinx document
+        https://bugs.webkit.org/show_bug.cgi?id=186310
+
+        Rubberstamped by Filip Pizlo.
+
+        Very early work, just has the lexer and a few fragments of the parser so far.
+        Also fixing some minor mistake in the formal rules.
+
+        * WebGPUShadingLanguageRI/SpecWork/WSL.g4:
+        * WebGPUShadingLanguageRI/SpecWork/WSL.ott:
+        * WebGPUShadingLanguageRI/SpecWork/source/conf.py:
+        * WebGPUShadingLanguageRI/SpecWork/source/index.rst:
+
 2018-06-23  Yusuke Suzuki  <[email protected]>
 
         [WTF] Add user-defined literal for ASCIILiteral

Modified: trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.g4 (233145 => 233146)


--- trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.g4	2018-06-25 15:01:35 UTC (rev 233145)
+++ trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.g4	2018-06-25 15:15:03 UTC (rev 233146)
@@ -4,6 +4,8 @@
  * Lexer
  */
 Whitespace: [ \t\r\n]+ -> skip ;
+LineComment: '//' ~[\r\n]* -> skip ;
+LongComment: '/*'.*?'*/' -> skip ;
 
 // Note: we forbid leading 0s in decimal integers. to bikeshed.
 fragment CoreDecimalIntLiteral: [1-9] [0-9]* ;
@@ -44,9 +46,9 @@
 RETURN: 'return';
 TRAP: 'trap';
 
-fragment NULL: 'null';
-fragment TRUE: 'true';
-fragment FALSE: 'false';
+NULL: 'null';
+TRUE: 'true';
+FALSE: 'false';
 // Note: We could make these three fully case sensitive or insensitive. to bikeshed.
 
 CONSTANT: 'constant';
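
Note on the two comment rules added above: the following is a minimal Python sketch of the intended skipping behaviour, not the reference implementation. The names `SKIPPED` and `strip_skipped` are hypothetical. In Python's `re`, `[^\r\n]*` means "any run of characters other than CR or LF"; ANTLR 4 writes the same negated set as `~[\r\n]*`.

```python
import re

# One or more consecutive skipped items: a line comment, a non-nested
# block comment (non-greedy, so the first "*/" closes it), or whitespace.
SKIPPED = re.compile(r"(?://[^\r\n]*|/\*.*?\*/|[ \t\r\n]+)+", re.DOTALL)


def strip_skipped(source: str) -> str:
    """Collapse every run of whitespace and comments into a single space."""
    return SKIPPED.sub(" ", source).strip()


if __name__ == "__main__":
    print(strip_skipped("a // trailing comment\nb"))
    print(strip_skipped("x /* outer /* inner */ c */ y"))
```

The second call illustrates that block comments do not nest: the first `*/` ends the comment, so the remaining `c */` is left in the token stream.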

Modified: trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.ott (233145 => 233146)


--- trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.ott	2018-06-25 15:01:35 UTC (rev 233145)
+++ trunk/Tools/WebGPUShadingLanguageRI/SpecWork/WSL.ott	2018-06-25 15:15:03 UTC (rev 233146)
@@ -27,7 +27,7 @@
     | trap ; :: :: trap
     | { blockAnnot s0 .. sn } :: :: block
     | e ; :: :: effectful_expr
-    | x : sid ; :: :: resolved_vdecl {{com post-monomorphisation variable declaration}}
+    | tval x : sid ; :: :: resolved_vdecl {{com post-monomorphisation variable declaration}}
     | Loop ( s , s' ) :: :: loop_construct {{com Special, only during execution}}
     | Cases ( s0 , .. , sn ) :: :: cases_construct {{com Special, only during execution}}
 
@@ -226,8 +226,9 @@
     | E = E' :: :: event_eq
     | R = R' :: :: exec_env_eq
     | s = s' :: :: stmt_eq
-    | rv not in sc0 .. scn :: :: rval_not_in_cases
+    | rv not in sc0 .. scn :: :: rval_not_in_cases % TODO: fix typesetting
     | s = { sblock } :: :: block_from_switch_block
+    | rv = Default ( tval ) :: :: default_value
 
 defns
 desugaring :: '' ::=
@@ -705,8 +706,9 @@
     Rout |- {R s s1..sn} -> s;
 
     R' = R[x -> LVal(sid)]
-    ------------------------------------------- :: block_vdecl
-    Rout |- {R x : sid; s1..sn} -> {R' s1..sn};
+    rv = Default(tval)
+    ----------------------------------------------------------- :: block_vdecl
+    Rout |- {R tval x : sid; s1..sn} -> {R' s1..sn} ; sid <- rv
 
     R |- e -> e' ; E
     ------------------ :: effectful_expr_reduce
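
Note on the revised block_vdecl rule above: read operationally, it binds x to the location sid in the environment and emits a store of the type's default value into sid. The Python sketch below illustrates that reading under stated assumptions. `VarDecl`, `step_block_vdecl`, the tuple encodings and the contents of `DEFAULTS` are all hypothetical; the diff names Default(tval) but does not define it.

```python
from dataclasses import dataclass

# Hypothetical default values, for illustration only.
DEFAULTS = {"int": 0, "uint": 0, "float": 0.0, "bool": False}


@dataclass(frozen=True)
class VarDecl:
    tval: str  # resolved type of the variable
    name: str  # the variable x
    sid: int   # store location assigned to x


def step_block_vdecl(env, stmts):
    """Apply one block_vdecl step to a block whose first statement is a VarDecl.

    Returns (env', remaining statements, store event).
    """
    decl, rest = stmts[0], stmts[1:]
    assert isinstance(decl, VarDecl)
    new_env = {**env, decl.name: ("LVal", decl.sid)}    # R' = R[x -> LVal(sid)]
    event = ("store", decl.sid, DEFAULTS[decl.tval])    # sid <- Default(tval)
    return new_env, rest, event
```

For example, stepping a block that starts with `VarDecl("int", "x", 7)` yields an environment mapping `x` to `("LVal", 7)` and the event `("store", 7, 0)`.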

Modified: trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/conf.py (233145 => 233146)


--- trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/conf.py	2018-06-25 15:01:35 UTC (rev 233145)
+++ trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/conf.py	2018-06-25 15:15:03 UTC (rev 233146)
@@ -77,7 +77,7 @@
 # The theme to use for HTML and HTML Help pages.  See the documentation for
 # a list of builtin themes.
 #
-html_theme = 'alabaster'
+html_theme = 'nature'
 
 # Theme options are theme-specific and customize the look and feel of a theme
 # further.  For a list of options available for each theme, see the
@@ -163,4 +163,4 @@
 # -- Options for todo extension ----------------------------------------------
 
 # If true, `todo` and `todoList` produce output, else they produce nothing.
-todo_include_todos = True
\ No newline at end of file
+todo_include_todos = True

Modified: trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/index.rst (233145 => 233146)


--- trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/index.rst	2018-06-25 15:01:35 UTC (rev 233145)
+++ trunk/Tools/WebGPUShadingLanguageRI/SpecWork/source/index.rst	2018-06-25 15:15:03 UTC (rev 233146)
@@ -12,18 +12,193 @@
 Lexical analysis
 ----------------
 
+Before parsing, the text of a WSL program is turned into a list of tokens, with comments and whitespace removed along the way.
+Tokens are built greedily: each token is as long as possible.
+If the program cannot be transformed into a list of tokens by following these rules, the program is invalid and must be rejected.
+
+A token is one of the following:
+
+- An integer literal
+- A float literal
+- Punctuation
+- A keyword
+- A normal identifier
+- An operator name
+
+Literals
+""""""""
+
+An integer literal is either decimal or hexadecimal, and either signed or unsigned, giving 4 possibilities.
+
+- A signed decimal integer literal starts with an optional ``-``, followed by a number without a leading 0.
+- An unsigned decimal integer literal starts with a number without a leading 0, followed by ``u``.
+- A signed hexadecimal integer literal starts with an optional ``-``, then the string ``0x``, then a non-empty sequence of characters in [0-9a-fA-F] (case-insensitive, leading 0s are allowed).
+- An unsigned hexadecimal integer literal starts with the string ``0x``, then a non-empty sequence of characters in [0-9a-fA-F] (case-insensitive, leading 0s are allowed), and finally the character ``u``.
+
+.. todo:: I chose rather arbitrarily to allow leading 0s in hexadecimal, but not in decimal integer literals. This can obviously be changed either way.
+
+A float literal is made of the following elements in sequence:
+
+- an optional ``-`` character
+- a sequence of 0 or more digits (in [0-9])
+- a ``.`` character
+- a sequence of 0 or more digits (in [0-9]). If the previous sequence of digits was empty, this sequence must have 1 or more elements.
+- optionally an ``f`` or ``d`` character
+
+In regexp form: '-'? ([0-9]+ '.' [0-9]* | [0-9]* '.' [0-9]+) [fd]?
+
+Keywords and punctuation
+""""""""""""""""""""""""
+
+The following strings are reserved keywords of the language:
+
++-------------------------------+---------------------------------------------------------------------------------+
+| Top level                     | struct typedef enum operator vertex fragment native restricted                  |
++-------------------------------+---------------------------------------------------------------------------------+
+| Control flow                  | if else switch case default while do for break continue fallthrough return trap |
++-------------------------------+---------------------------------------------------------------------------------+
+| Literals                      | null true false                                                                 |
++-------------------------------+---------------------------------------------------------------------------------+
+| Address space                 | constant device threadgroup thread                                              |
++-------------------------------+---------------------------------------------------------------------------------+
+| Reserved for future extension | protocol auto                                                                   |
++-------------------------------+---------------------------------------------------------------------------------+
+
+Similarly, the following elements of punctuation are valid tokens:
+
++----------------------+-----------------------------------------------------------------------------------------+
+| Relational operators | ``==`` ``!=`` ``<=`` ``>=`` ``<`` ``>``                                                 |
++----------------------+-----------------------------------------------------------------------------------------+
+| Assignment operators | ``++`` ``--`` ``+=`` ``-=`` ``*=`` ``/=`` ``%=`` ``^=`` ``&=``  ``|=`` ``>>=``  ``<<=`` |
++----------------------+-----------------------------------------------------------------------------------------+
+| Arithmetic operators | ``+``  ``-`` ``*`` ``/`` ``%``                                                          |
++----------------------+-----------------------------------------------------------------------------------------+
+| Logic operators      | ``&&`` ``||`` ``&``  ``|``  ``^`` ``>>`` ``<<`` ``!`` ``~``                             |
++----------------------+-----------------------------------------------------------------------------------------+
+| Memory operators     | ``->`` ``.`` ``&`` ``@``                                                                |
++----------------------+-----------------------------------------------------------------------------------------+
+| Other                | ``?`` ``:`` ``;`` ``,`` ``[`` ``]`` ``{`` ``}`` ``(`` ``)``                             |
++----------------------+-----------------------------------------------------------------------------------------+
+
+Identifiers and operator names
+""""""""""""""""""""""""""""""
+
+An identifier is any sequence of alphanumeric characters or underscores that does not start with a digit, is not a single underscore (the single underscore is reserved for future extension), and is not a reserved keyword.
+TODO: decide if we only accept [_a-zA-Z][_a-zA-Z0-9]*, or whether we also accept unicode characters.
+
+An operator name takes one of the following 4 forms:
+
+- the string ``operator``, followed immediately by one of the following strings: ``>>``, ``<<``, ``+``, ``-``, ``*``, ``/``, ``%``, ``&&``, ``||``, ``&``, ``|``, ``^``, ``>=``, ``<=``, ``>``, ``<``, ``++``, ``--``, ``!``, ``~``, ``[]``, ``[]=``, ``&[]``.
+- the string ``operator.`` followed immediately by what would be a valid identifier x. We call this token a 'getter for x'.
+- the string ``operator.`` followed immediately by what would be a valid identifier x, followed immediately by the character ``=``. We call this token a 'setter for x'.
+- the string ``operator&.`` followed immediately by what would be a valid identifier x. We call this token an 'address taker for x'.
+
+.. note:: Thanks to the rule that tokens are read greedily, the string "operator.foo" is a single token (a getter for foo), and not the keyword "operator" followed by the punctuation "." followed by the identifier "foo".
+
+Whitespace and comments
+"""""""""""""""""""""""
+
+The following characters are considered whitespace and are ignored after this phase: space, tabulation (``\t``), carriage return (``\r``), new line (``\n``).
+
+.. todo:: do we want to also allow other unicode whitespace characters?
+
+We also allow two kinds of comments, which are treated like whitespace (i.e. ignored during parsing).
+The first kind is a line comment, which starts with the string ``//`` and continues until the next end-of-line character.
+The second kind is a multi-line comment, which starts with the string ``/*`` and ends as soon as the string ``*/`` is read.
+
+.. note:: Multi-line comments cannot be nested, as the first ``*/`` closes the outermost ``/*``.
+
 Parsing
 -------
 
-Notations
-"""""""""
+.. todo:: add here a quick explanation of BNF syntax and our conventions.
 
 Top-level declarations
 """"""""""""""""""""""
 
+A valid file is made of a sequence of 0 or more top-level declarations, followed by the special End-Of-File token.
+
+.. productionlist::
+    topLevelDecl: ";" | `typedef` | `structDef` | `enumDef` | `funcDef`
+
+.. todo:: We may want to also allow variable declarations at the top-level if it can easily be supported by all of our targets.
+.. todo:: Decide whether we put native/restricted in the spec or not.
+
+.. productionlist::
+    typedef: "typedef" `Identifier` "=" `type` ";"
+
+.. productionlist::
+    structDef: "struct" `Identifier` "{" `structElement`* "}"
+    structElement: `type` `Identifier` ";"
+
+.. productionlist::
+    enumDef: "enum" `Identifier` (":" `type`)? "{" `enumElement` ("," `enumElement`)* "}"
+    enumElement: `Identifier` ("=" `constexpr`)?
+
+.. productionlist::
+    funcDef: `funcDecl` "{" `stmt`* "}"
+    funcDecl: `entryPointDecl` | `normalFuncDecl` | `castOperatorDecl`
+    entryPointDecl: ("vertex" | "fragment") `type` `Identifier` `parameters`
+    normalFuncDecl: `type` (`Identifier` | `OperatorName`) `parameters`
+    castOperatorDecl: "operator" `type` `parameters`
+    parameters: "(" ")" | "(" `parameter` ("," `parameter`)* ")"
+    parameter: `type` `Identifier`
+
+.. note:: The return type is put after the "operator" keyword when declaring a cast operator, mostly because it is also the name of the created function.
+
 Statements
 """"""""""
 
+.. productionlist::
+    stmt: "{" `stmt`* "}"
+        : | `compoundStmt` 
+        : | `terminatorStmt` ";" 
+        : | `variableDecls` ";" 
+        : | `maybeEffectfulExpr` ";"
+    compoundStmt: `ifStmt` | `ifElseStmt` | `whileStmt` | `doWhileStmt` | `forStmt` | `switchStmt`
+    terminatorStmt: "break" | "continue" | "fallthrough" | "return" `expr`? | "trap"
+
+.. productionlist::
+    ifStmt: "if" "(" `expr` ")" `stmt`
+    ifElseStmt: "if" "(" `expr` ")" `stmt` "else" `stmt`
+
+The first of these two productions is merely syntactic sugar for the second:
+
+.. math:: \textbf{if}(e) \,s \leadsto \textbf{if}(e) \,s\, \textbf{else} \,\{\}
+
+.. todo:: should I forbid assignments (without parentheses) inside the conditions of if/while to avoid the common mistake of writing "=" instead of "=="?
+
+.. productionlist::
+    whileStmt: "while" "(" `expr` ")" `stmt`
+    forStmt: "for" "(" (`maybeEffectfulExpr` | `variableDecls`) ";" `expr`? ";" `expr`? ")" `stmt`
+    doWhileStmt: "do" `stmt` "while" "(" `expr` ")" ";"
+
+Similarly, we first desugar for loops into while loops, and then all while loops into do-while loops.
+First, if the second element of the for is empty, we replace it by "true".
+Then, if the third element of the for is empty, we replace it by "null" (any expression without side effects would do).
+Finally, we apply the following two rules:
+
+.. math::
+    \textbf{for} (X_{pre} ; e_{cond} ; e_{iter}) \, s \leadsto \{ X_{pre} ; \textbf{while} (e_{cond}) \{ s \, e_{iter} ; \} \}
+
+.. math::
+    \textbf{while} (e)\, s \leadsto \textbf{if} (e) \textbf{do}\, s\, \textbf{while}(e)
+
+.. productionlist::
+    switchStmt: "switch" "(" `expr` ")" "{" `switchCase`* "}"
+    switchCase: ("case" `constexpr` | "default") ":" `stmt`*
+
+.. productionlist::
+    variableDecls: `type` `variableDecl` ("," `variableDecl`)*
+    variableDecl: `Identifier` ("=" `expr`)?
+
+Complex variable declarations are also mere syntactic sugar.
+Several variable declarations separated by commas are the same as separating them with semicolons and repeating the type for each one.
+And a variable declaration with an initializer is the same as an uninitialized declaration, followed by an assignment of the corresponding value to this variable.
+These two transformations can always be done because variable declarations are only allowed inside blocks (and for loops, but these get desugared into a block, see above).
+
+.. todo:: should I make the requirement that variableDecls only appear in blocks be part of the syntax, or should it just be part of the validation rules?
+
 Types
 """""
 
@@ -36,6 +211,8 @@
 Phase 1: Desugaring
 -------------------
 
+TODO: move everything from this phase straight into the parser.
+
 desugaring top level:
 
 - Adding <> in basically every possible place. (after a struct name, after a typedef name, after a function/operator name)
@@ -113,7 +290,7 @@
 Typing expressions
 """"""""""""""""""
 
- typing rules (this and everything that follows can be managed by just a pair of judgements that type stmts/exprs)
+- typing rules (this and everything that follows can be managed by just a pair of judgements that type stmts/exprs)
 - checking returns
 - check that every variable declaration is in a block or at the top-level
 - check that no variable declaration shadows another one at the same scope
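
Note on the literal rules added to index.rst above: the following is a minimal Python sketch of those rules as regular expressions, offered as one reading of the text rather than as the reference lexer. The names `FLOAT`, `SIGNED_INT`, `UNSIGNED_INT` and `classify_literal` are hypothetical. The decimal part follows the `CoreDecimalIntLiteral` fragment shown in the WSL.g4 context (`[1-9] [0-9]*`), which is an assumption about what "a number without leading 0" means.

```python
import re

# Float: optional '-', digits on at least one side of the '.', optional f/d suffix.
FLOAT = re.compile(r"-?(?:[0-9]+\.[0-9]*|[0-9]*\.[0-9]+)[fd]?")

# Integer body: decimal without a leading 0, or 0x followed by hex digits.
_INT_BODY = r"(?:[1-9][0-9]*|0x[0-9a-fA-F]+)"
SIGNED_INT = re.compile(r"-?" + _INT_BODY)      # optional '-', no suffix
UNSIGNED_INT = re.compile(_INT_BODY + r"u")     # no '-', mandatory 'u' suffix


def classify_literal(text):
    """Return the kind of literal that `text` is as a whole, or None."""
    if FLOAT.fullmatch(text):
        return "float"
    if UNSIGNED_INT.fullmatch(text):
        return "unsigned int"
    if SIGNED_INT.fullmatch(text):
        return "signed int"
    return None


if __name__ == "__main__":
    for sample in ["1.", ".5f", ".", "-0x1F", "0x1Fu", "012", "-3u"]:
        print(repr(sample), "->", classify_literal(sample))
```

Under this reading a lone `.` is not a float (it remains the punctuation token), `012` is rejected because of the leading 0, and `-3u` is rejected because unsigned literals take no sign.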