To make it more consumable.
--
- Founder - https://commandprompt.com/ - 24x7x365 Postgres since 1997
- Founder and Co-Chair - https://postgresconf.org/
- Founder - https://postgresql.us - United States PostgreSQL
- Public speaker, published author, postgresql expert, and people
believer.
- Host - More than a refresh
<https://commandprompt.com/about/more-than-a-refresh/>: A podcast about
data and the people who wrangle it.
diff --git a/doc/src/sgml/syntax.sgml b/doc/src/sgml/syntax.sgml
index 916189a7d68..900f8f8e441 100644
--- a/doc/src/sgml/syntax.sgml
+++ b/doc/src/sgml/syntax.sgml
@@ -15,10 +15,10 @@
</para>
<para>
- We also advise users who are already familiar with SQL to read this
- chapter carefully because it contains several rules and concepts that
- are implemented inconsistently among SQL databases or that are
- specific to <productname>PostgreSQL</productname>.
+ We advise users who to read this chapter carefully because it
+ contains several rules and concepts that are implemented
+ inconsistently among SQL databases or that are specific to
+ <productname>PostgreSQL</productname>.
</para>
<sect1 id="sql-syntax-lexical">
@@ -29,53 +29,48 @@
</indexterm>
<para>
- SQL input consists of a sequence of
+ SQL consists of a sequence of
<firstterm>commands</firstterm>. A command is composed of a
sequence of <firstterm>tokens</firstterm>, terminated by a
- semicolon (<quote>;</quote>). The end of the input stream also
- terminates a command. Which tokens are valid depends on the syntax
- of the particular command.
+ semicolon (<quote>;</quote>). Which tokens are valid depends on
+ the syntax of the particular command.
</para>
<para>
A token can be a <firstterm>key word</firstterm>, an
<firstterm>identifier</firstterm>, a <firstterm>quoted
identifier</firstterm>, a <firstterm>literal</firstterm> (or
- constant), or a special character symbol. Tokens are normally
- separated by whitespace (space, tab, newline), but need not be if
- there is no ambiguity (which is generally only the case if a
- special character is adjacent to some other token type).
+ constant), or a special character symbol.
</para>
<para>
- For example, the following is (syntactically) valid SQL input:
+ The following is (syntactically) valid SQL input:
<programlisting>
SELECT * FROM MY_TABLE;
UPDATE MY_TABLE SET A = 5;
INSERT INTO MY_TABLE VALUES (3, 'hi there');
</programlisting>
- This is a sequence of three commands, one per line (although this
- is not required; more than one command can be on a line, and
- commands can usefully be split across lines).
+ This is a sequence of three commands, one per line. More
+ than one command can be on a line, and commands can also be split
+ across lines.
</para>
<para>
- Additionally, <firstterm>comments</firstterm> can occur in SQL
- input. They are not tokens, they are effectively equivalent to
- whitespace.
+ Additionally, <firstterm>comments</firstterm> are not tokens and
+ can occur in SQL input.
</para>
<para>
- The SQL syntax is not very consistent regarding what tokens
- identify commands and which are operands or parameters. The first
- few tokens are generally the command name, so in the above example
- we would usually speak of a <quote>SELECT</quote>, an
- <quote>UPDATE</quote>, and an <quote>INSERT</quote> command. But
- for instance the <command>UPDATE</command> command always requires
- a <token>SET</token> token to appear in a certain position, and
- this particular variation of <command>INSERT</command> also
- requires a <token>VALUES</token> in order to be complete. The
- precise syntax rules for each command are described in <xref linkend="reference"/>.
+ SQL is not consistent regarding what tokens identify commands and
+ which are operands or parameters. The first few tokens are
+ generally the command name, so in the above example we would
+ speak of a <quote>SELECT</quote>, an <quote>UPDATE</quote>, and
+ an <quote>INSERT</quote> command. But for the
+ <command>UPDATE</command> command always requires a <token>SET</token>
+ token to appear in a certain position, and this particular
+ variation of <command>INSERT</command> also requires a
+ <token>VALUES</token> in order to be complete. The precise syntax
+ rules for each command are described in <xref linkend="reference"/>.
</para>
<sect2 id="sql-syntax-identifiers">
@@ -98,11 +93,10 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
<para>
Tokens such as <token>SELECT</token>, <token>UPDATE</token>, or
- <token>VALUES</token> in the example above are examples of
- <firstterm>key words</firstterm>, that is, words that have a fixed
- meaning in the SQL language. The tokens <token>MY_TABLE</token>
- and <token>A</token> are examples of
- <firstterm>identifiers</firstterm>. They identify names of
+ <token>VALUES</token> in the example above are
+ <firstterm>key words</firstterm> and they have a fixed meaning in
+ SQL. The tokens <token>MY_TABLE</token> and <token>A</token> are
+ examples of <firstterm>identifiers</firstterm>. They identify names of
tables, columns, or other database objects, depending on the
command they are used in. Therefore they are sometimes simply
called <quote>names</quote>. Key words and identifiers have the
@@ -119,24 +113,12 @@ INSERT INTO MY_TABLE VALUES (3, 'hi there');
(<literal>_</literal>). Subsequent characters in an identifier or
key word can be letters, underscores, digits
(<literal>0</literal>-<literal>9</literal>), or dollar signs
- (<literal>$</literal>). Note that dollar signs are not allowed in identifiers
- according to the letter of the SQL standard, so their use might render
- applications less portable.
- The SQL standard will not define a key word that contains
- digits or starts or ends with an underscore, so identifiers of this
- form are safe against possible conflict with future extensions of the
- standard.
- </para>
-
- <para>
- <indexterm><primary>identifier</primary><secondary>length</secondary></indexterm>
- The system uses no more than <symbol>NAMEDATALEN</symbol>-1
- bytes of an identifier; longer names can be written in
- commands, but they will be truncated. By default,
- <symbol>NAMEDATALEN</symbol> is 64 so the maximum identifier
- length is 63 bytes. If this limit is problematic, it can be raised by
- changing the <symbol>NAMEDATALEN</symbol> constant in
- <filename>src/include/pg_config_manual.h</filename>.
+ (<literal>$</literal>). Note that dollar signs are not allowed in
+ identifiers according to the letter of the SQL standard, so their use
+ might render applications less portable. The SQL standard will not
+ define a key word that contains digits or starts or ends with an
+ underscore, so identifiers of this form are safe against possible
+ conflict with future extensions of the standard.
</para>
<para>
@@ -152,7 +134,7 @@ UPDATE MY_TABLE SET A = 5;
<programlisting>
uPDaTE my_TabLE SeT a = 5;
</programlisting>
- A convention often used is to write key words in upper
+ The recommened convention is to write key words in upper
case and names in lower case, e.g.:
<programlisting>
UPDATE my_table SET a = 5;
@@ -183,27 +165,25 @@ UPDATE "my_table" SET "a" = 5;
<para>
Quoted identifiers can contain any character, except the character
- with code zero. (To include a double quote, write two double quotes.)
- This allows constructing table or column names that would
- otherwise not be possible, such as ones containing spaces or
- ampersands. The length limitation still applies.
+ with code zero. This allows constructing table or column names that
+ would otherwise not be possible, such as ones containing spaces or
+ ampersands.
</para>
<para>
- Quoting an identifier also makes it case-sensitive, whereas
- unquoted names are always folded to lower case. For example, the
- identifiers <literal>FOO</literal>, <literal>foo</literal>, and
+ Quoting an identifier makes it case-sensitive, whereas unquoted names
+ are always folded to lower case. For example, the identifiers
+ <literal>FOO</literal>, <literal>foo</literal>, and
<literal>"foo"</literal> are considered the same by
- <productname>PostgreSQL</productname>, but
- <literal>"Foo"</literal> and <literal>"FOO"</literal> are
- different from these three and each other. (The folding of
- unquoted names to lower case in <productname>PostgreSQL</productname> is
- incompatible with the SQL standard, which says that unquoted names
- should be folded to upper case. Thus, <literal>foo</literal>
- should be equivalent to <literal>"FOO"</literal> not
- <literal>"foo"</literal> according to the standard. If you want
- to write portable applications you are advised to always quote a
- particular name or never quote it.)
+ <productname>PostgreSQL</productname>, but <literal>"Foo"</literal>
+ and <literal>"FOO"</literal> are not considered the same.
+ </para>
+ <para>
+ The folding of unquoted names to lower case in is incompatible with the
+ SQL standard. Thus, <literal>foo</literal> should be equivalent to
+ <literal>"FOO"</literal> not <literal>"foo"</literal> according to the
+ standard. If you want to write portable applications you are advised to
+ always quote a particular name or never quote it.
</para>
<indexterm>
@@ -212,20 +192,16 @@ UPDATE "my_table" SET "a" = 5;
</indexterm>
<para>
- A variant of quoted
- identifiers allows including escaped Unicode characters identified
- by their code points. This variant starts
- with <literal>U&</literal> (upper or lower case U followed by
+ A variant of quoted identifiers allows including escaped Unicode
+ characters identified by their code points. This variant starts
+ with <literal>U&</literal> (upper or lower case U followed by
ampersand) immediately before the opening double quote, without
any spaces in between, for example <literal>U&"foo"</literal>.
- (Note that this creates an ambiguity with the
- operator <literal>&</literal>. Use spaces around the operator to
- avoid this problem.) Inside the quotes, Unicode characters can be
- specified in escaped form by writing a backslash followed by the
- four-digit hexadecimal code point number or alternatively a
- backslash followed by a plus sign followed by a six-digit
- hexadecimal code point number. For example, the
- identifier <literal>"data"</literal> could be written as
+ Inside the quotes, Unicode characters can be specified in escaped
+ form by writing a backslash followed by the four-digit hexadecimal
+ code point number or alternatively a backslash followed by a plus
+ sign followed by a six-digit hexadecimal code point number. For
+ example, the identifier <literal>"data"</literal> could be written as
<programlisting>
U&"d\0061t\+000061"
</programlisting>
@@ -330,9 +306,8 @@ SELECT 'foobar';
<programlisting>
SELECT 'foo' 'bar';
</programlisting>
- is not valid syntax. (This slightly bizarre behavior is specified
- by <acronym>SQL</acronym>; <productname>PostgreSQL</productname> is
- following the standard.)
+ is not valid syntax. This behavior is specified by
+ the <acronym>SQL</acronym> standard.
</para>
</sect3>
@@ -351,9 +326,9 @@ SELECT 'foo' 'bar';
string constants, which are an extension to the SQL standard.
An escape string constant is specified by writing the letter
<literal>E</literal> (upper or lower case) just before the opening single
- quote, e.g., <literal>E'foo'</literal>. (When continuing an escape string
+ quote, e.g., <literal>E'foo'</literal>. When continuing an escape string
constant across lines, write <literal>E</literal> only before the first opening
- quote.)
+ quote.
Within an escape string, a backslash character (<literal>\</literal>) begins a
C-like <firstterm>backslash escape</firstterm> sequence, in which the combination
of backslash and following character(s) represent a special byte
@@ -432,37 +407,9 @@ SELECT 'foo' 'bar';
It is your responsibility that the byte sequences you create,
especially when using the octal or hexadecimal escapes, compose
valid characters in the server character set encoding.
- A useful alternative is to use Unicode escapes or the
- alternative Unicode escape syntax, explained
- in <xref linkend="sql-syntax-strings-uescape"/>; then the server
+ If you can use Unicode escapes or the alternative Unicode escape syntax,
+ explained in <xref linkend="sql-syntax-strings-uescape"/>; then the server
will check that the character conversion is possible.
- </para>
-
- <caution>
- <para>
- If the configuration parameter
- <xref linkend="guc-standard-conforming-strings"/> is <literal>off</literal>,
- then <productname>PostgreSQL</productname> recognizes backslash escapes
- in both regular and escape string constants. However, as of
- <productname>PostgreSQL</productname> 9.1, the default is <literal>on</literal>, meaning
- that backslash escapes are recognized only in escape string constants.
- This behavior is more standards-compliant, but might break applications
- which rely on the historical behavior, where backslash escapes
- were always recognized. As a workaround, you can set this parameter
- to <literal>off</literal>, but it is better to migrate away from using backslash
- escapes. If you need to use a backslash escape to represent a special
- character, write the string constant with an <literal>E</literal>.
- </para>
-
- <para>
- In addition to <varname>standard_conforming_strings</varname>, the configuration
- parameters <xref linkend="guc-escape-string-warning"/> and
- <xref linkend="guc-backslash-quote"/> govern treatment of backslashes
- in string constants.
- </para>
- </caution>
-
- <para>
The character with the code zero cannot be in a string constant.
</para>
</sect3>
@@ -482,9 +429,9 @@ SELECT 'foo' 'bar';
constant starts with <literal>U&</literal> (upper or lower case
letter U followed by ampersand) immediately before the opening
quote, without any spaces in between, for
- example <literal>U&'foo'</literal>. (Note that this creates an
+ example <literal>U&'foo'</literal>. This creates an
ambiguity with the operator <literal>&</literal>. Use spaces
- around the operator to avoid this problem.) Inside the quotes,
+ around the operator to avoid this problem. Inside the quotes,
Unicode characters can be specified in escaped form by writing a
backslash followed by the four-digit hexadecimal code point
number or alternatively a backslash followed by a plus sign
@@ -493,7 +440,7 @@ SELECT 'foo' 'bar';
<programlisting>
U&'d\0061t\+000061'
</programlisting>
- The following less trivial example writes the Russian
+ The following example writes the Russian
word <quote>slon</quote> (elephant) in Cyrillic letters:
<programlisting>
U&'\0441\043B\043E\043D'
@@ -532,17 +479,6 @@ U&'d!0061t!+000061' UESCAPE '!'
by one of these escape sequences is converted to the actual server
encoding; an error is reported if that's not possible.
</para>
-
- <para>
- Also, the Unicode escape syntax for string constants only works
- when the configuration
- parameter <xref linkend="guc-standard-conforming-strings"/> is
- turned on. This is because otherwise this syntax could confuse
- clients that parse the SQL statements to the point that it could
- lead to SQL injections and similar security issues. If the
- parameter is set to off, this syntax will be rejected with an
- error message.
- </para>
</sect3>
<sect3 id="sql-syntax-dollar-quoting">
@@ -555,24 +491,22 @@ U&'d!0061t!+000061' UESCAPE '!'
<para>
While the standard syntax for specifying string constants is usually
convenient, it can be difficult to understand when the desired string
- contains many single quotes, since each of those must
- be doubled. To allow more readable queries in such situations,
- <productname>PostgreSQL</productname> provides another way, called
- <quote>dollar quoting</quote>, to write string constants.
- A dollar-quoted string constant
- consists of a dollar sign (<literal>$</literal>), an optional
- <quote>tag</quote> of zero or more characters, another dollar
- sign, an arbitrary sequence of characters that makes up the
- string content, a dollar sign, the same tag that began this
- dollar quote, and a dollar sign. For example, here are two
- different ways to specify the string <quote>Dianne's horse</quote>
- using dollar quoting:
+ contains many single quotes or backslashes. To allow more readable
+ queries in such situations, <productname>PostgreSQL</productname>
+ provides <quote>dollar quoting</quote> to write string constants.
+ A dollar-quoted string constant consists of a dollar sign
+ (<literal>$</literal>), an optional <quote>tag</quote> of zero or
+ more characters, another dollar sign, an arbitrary sequence of
+ characters that makes up the string content, a dollar sign, the same
+ tag that began this dollar quote, and a dollar sign. For example, here
+ are two different ways to specify the string
+ <quote>Dianne's horse</quote> using dollar quoting:
<programlisting>
$$Dianne's horse$$
$SomeTag$Dianne's horse$SomeTag$
</programlisting>
Notice that inside the dollar-quoted string, single quotes can be
- used without needing to be escaped. Indeed, no characters inside
+ used without needing to be escaped. No characters inside
a dollar-quoted string are ever escaped: the string content is always
written literally. Backslashes are not special, and neither are
dollar signs, unless they are part of a sequence matching the opening
@@ -590,13 +524,12 @@ BEGIN
END;
$function$
</programlisting>
- Here, the sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a
+ The sequence <literal>$q$[\t\r\n\v\\]$q$</literal> represents a
dollar-quoted literal string <literal>[\t\r\n\v\\]</literal>, which will
be recognized when the function body is executed by
- <productname>PostgreSQL</productname>. But since the sequence does not match
+ <productname>PostgreSQL</productname>. Since the sequence does not match
the outer dollar quoting delimiter <literal>$function$</literal>, it is
- just some more characters within the constant so far as the outer
- string is concerned.
+ just some more characters within the constant.
</para>
<para>
@@ -634,7 +567,7 @@ $function$
</indexterm>
<para>
- Bit-string constants look like regular string constants with a
+ Bit-string constants is a string constant with a
<literal>B</literal> (upper or lower case) immediately before the
opening quote (no intervening whitespace), e.g.,
<literal>B'1001'</literal>. The only characters allowed within
@@ -654,6 +587,7 @@ $function$
across lines in the same way as regular string constants.
Dollar quoting cannot be used in a bit-string constant.
</para>
+
</sect3>
<sect3 id="sql-syntax-constants-numeric">
@@ -665,7 +599,7 @@ $function$
</indexterm>
<para>
- Numeric constants are accepted in these general forms:
+ Numeric constants are accepted in these forms:
<synopsis>
<replaceable>digits</replaceable>
<replaceable>digits</replaceable>.<optional><replaceable>digits</replaceable></optional><optional>e<optional>+-</optional><replaceable>digits</replaceable></optional>
@@ -678,13 +612,12 @@ $function$
exponent marker (<literal>e</literal>), if one is present.
There cannot be any spaces or other characters embedded in the
constant, except for underscores, which can be used for visual grouping as
- described below. Note that any leading plus or minus sign is not actually
- considered part of the constant; it is an operator applied to the
- constant.
+ described below. Any leading plus or minus sign is not considered part of
+ the constant; it is an operator applied to the constant.
</para>
<para>
- These are some examples of valid numeric constants:
+ Examples of valid numeric constants:
<literallayout>
42
3.5
@@ -711,7 +644,7 @@ $function$
</para>
<para>
- These are some examples of valid non-decimal integer constants:
+ Examples of valid non-decimal integer constants:
<literallayout>
0b100101
0B10011001
@@ -724,7 +657,7 @@ $function$
<para>
For visual grouping, underscores can be inserted between digits. These
- have no further effect on the value of the constant. For example:
+ have no effect on the value of the constant. For example:
<literallayout>
1_500_000_000
0b10001000_00000000
@@ -743,7 +676,7 @@ $function$
<indexterm><primary>bigint</primary></indexterm>
<indexterm><primary>numeric</primary></indexterm>
A numeric constant that contains neither a decimal point nor an
- exponent is initially presumed to be type <type>integer</type> if its
+ exponent is presumed to be type <type>integer</type> if its
value fits in type <type>integer</type> (32 bits); otherwise it is
presumed to be type <type>bigint</type> if its
value fits in type <type>bigint</type> (64 bits); otherwise it is
@@ -767,8 +700,8 @@ REAL '1.23' -- string style
1.23::REAL -- PostgreSQL (historical) style
</programlisting>
- These are actually just special cases of the general casting
- notations discussed next.
+ These are special cases of the general casting notations discussed
+ next.
</para>
</sect3>
@@ -863,8 +796,9 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
<listitem>
<para>
- A multiple-character operator name cannot end in <literal>+</literal> or <literal>-</literal>,
- unless the name also contains at least one of these characters:
+ A multiple-character operator name cannot end in <literal>+</literal>
+ or <literal>-</literal>, unless the name also contains at least one
+ of these characters:
<literallayout>
~ ! @ # % ^ & | ` ?
</literallayout>
@@ -878,7 +812,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
</para>
<para>
- When working with non-SQL-standard operator names, you will usually
+ When working with non-SQL-standard operator names, you will
need to separate adjacent operators with spaces to avoid ambiguity.
For example, if you have defined a prefix operator named <literal>@</literal>,
you cannot write <literal>X*@Y</literal>; you must write
@@ -911,7 +845,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
<listitem>
<para>
- Parentheses (<literal>()</literal>) have their usual meaning to
+ Parentheses (<literal>()</literal>) are used to
group expressions and enforce precedence. In some cases
parentheses are required as part of the fixed syntax of a
particular SQL command.
@@ -1003,7 +937,7 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
<para>
A comment is removed from the input stream before further syntax
- analysis and is effectively replaced by whitespace.
+ analysis and is replaced by whitespace.
</para>
</sect2>
@@ -1143,12 +1077,11 @@ CAST ( '<replaceable>string</replaceable>' AS <replaceable>type</replaceable> )
</table>
<para>
- Note that the operator precedence rules also apply to user-defined
+ Operator precedence rules also apply to user-defined
operators that have the same names as the built-in operators
- mentioned above. For example, if you define a
- <quote>+</quote> operator for some custom data type it will have
- the same precedence as the built-in <quote>+</quote> operator, no
- matter what yours does.
+ mentioned above. If you define a <quote>+</quote> operator
+ for some custom data type it will have the same precedence as
+ the built-in <quote>+</quote> operator, no matter what yours does.
</para>
<para>
@@ -1163,23 +1096,6 @@ SELECT 3 OPERATOR(pg_catalog.+) 4;
which specific operator appears inside <literal>OPERATOR()</literal>.
</para>
- <note>
- <para>
- <productname>PostgreSQL</productname> versions before 9.5 used slightly different
- operator precedence rules. In particular, <token><=</token>
- <token>>=</token> and <token><></token> used to be treated as
- generic operators; <literal>IS</literal> tests used to have higher priority;
- and <literal>NOT BETWEEN</literal> and related constructs acted inconsistently,
- being taken in some cases as having the precedence of <literal>NOT</literal>
- rather than <literal>BETWEEN</literal>. These rules were changed for better
- compliance with the SQL standard and to reduce confusion from
- inconsistent treatment of logically equivalent constructs. In most
- cases, these changes will result in no behavioral change, or perhaps
- in <quote>no such operator</quote> failures which can be resolved by adding
- parentheses. However there are corner cases in which a query might
- change behavior without any parsing error being reported.
- </para>
- </note>
</sect2>
</sect1>