Re: Proposal to improve the content in subsection 8.16.6. "Composite Type Input and Output Syntax"

2020-04-06 Thread David G. Johnston
On Mon, Apr 6, 2020 at 12:46 AM Bryn Llewellyn  wrote:

> Tom Lane  wrote:
>
> ...
>
>
> It's already an example, so I don't see this as an improvement.
>
> ...
>
>
> The proposed additional text is flat-out wrong.
>
> What actually happens here is that text between quotes is considered
> quoted (so that, for example, commas within it are not field separators),
> but that does not exclude there being other unquoted text within the
> same field value.
>
> regards, tom lane
>
> Thanks, Tom. Your reply was very helpful. It prompted me to read, re-read,
> and then re-re-read 8.16.6 several times. I did notice that neither of
> “varchar” nor “char” are found in the whole of section 8.16
>



> But I suppose that it’s obvious to the reader that “text” is to be taken
> as a term of art, denoting “character data types” and not as the name of
> one of these data types.
>

The word "text" in Tom's paragraph isn't talking about a data type at all.
Its talking about symbols.

>
> Second: for this specified target:
> > hello "you" <
>
> I found that each of these six literals produced the same result:
> ( hello \"you\” )
> ("" hello \"you\" "")
> (" hello \"you\" ")
> (" hello ""you"" ")
> ( "hello \"you\"" )
> ( "hello ""you""" )
>
> The result was canonically rendered in psql thus:
> (" hello ""you"" ")
>
> This observation tells me that this:
> ""
> serves as the syntax to escape a single double quote within a surrounding
> double quote pair, as an alternative to this:
> \"
>
> I searched in the whole of section 8.16 for this:
> ""
> but I found its meaning defined only as the empty string when it occurs
> outside of a surrounding double quote pair. I assume that it’s obvious to
> most readers that it has a second meaning in the other context.
>

You need to expand your search to more than just the literal symbol written
"".

Quoting from the 8.16.6 docs.

"To put a double quote or backslash in a quoted composite field value,
precede it with a backslash. (Also, a [emp]pair of double quotes within a
double-quoted field value is taken to represent a double quote
character[emp], analogously to the rules for single quotes in SQL literal
strings.)"

"Double quotes and backslashes embedded in field values will be doubled."

I found that, with is extra one rule, I could explain all my observations.
>
> I observed, over all my tests, that there always exists a canonical form
> like this:
> ("...")
> where the ellipsis indicates a well-formed sequence of characters that
> produces my specified target.
>
> I’m going to elevate this to a hypothesis.
>
> And unless/until it’s disproved, I’m going to adopt a rule of practice
> (for myself) always to use this form.
>

The " " just means first treat the underlying content as being textual (but
untyped).  Since every type has an input function that accepts textual data
and converts it to the relevant type this works.

However, you may wish to explore other data types comprising your composite
and consider boundary cases (empty string or missing/null) variations
before solidifying your hypothesis.

For example at least consider why the following fails.

create type rtint as (a int);
select '("")'::rtint;

>
> *literal-for-array-of-records.sql*
> **
> This was designed to help me meet my ultimate goal: I need to write a
> client program (say, in python) that will process data from a file whose
> semantic content is sufficient to let me populate the “rt[]” field in one
> row of a table, given by “type rt as (n numeric, s text, t timestamp, b
> booleaan)”.
>
> I’m afraid I couldn’t find the rules in the Version 11 PG doc. This
> probably reflects my poor searching skills.
>

The immediately preceding chapter in the documentation discusses arrays at
the same level of detail and definitional authority as the chapter on
composite types.  Its left as an exercise to the reader to deal with the
nesting dynamic - in particular the escaping of special characters
especially those common to both syntaxes.

https://www.postgresql.org/docs/12/arrays.html

   n   |  s  |  t  | b
> ---+-+-+---
>  19.13 | "Hello", she\he\r  +| 2019-07-21 16:47:00 | f
>| (said | is saying). | |
>
> The “\r”, along with the “+” that I’m used to, seem to be artifacts of how
> psql displays what I created with chr(10). But only in this special
> context. (This "select 'a'||chr(10)||’b'” produces only the “+” but not
> the “\r”. Strange.)
>

Welcome to the fun world of line break (newline) character sequences.  Just
for fun:

On a typewriter if you want to begin a new line your machine has to
physically do two things:
1 - move the piece of paper up one row (line feed) - this command code was
assigned chr(10) and is abbreviated \n
2 - move the strike bar back to the starting position (carriage return) -
code chr(13) and is abbreviated \r

Windows faithfully reproduced the 

Re: Proposal to improve the content in subsection 8.16.6. "Composite Type Input and Output Syntax"

2020-04-06 Thread Bryn Llewellyn
Tom Lane  wrote:...It's already an example, so I don't see this as an improvementThe proposed additional text is flat-out wrong.What actually happens here is that text between quotes is consideredquoted (so that, for example, commas within it are not field separators),but that does not exclude there being other unquoted text within thesame field value.			regards, tom laneThanks, Tom. Your reply was very helpful. It prompted me to read, re-read, and then re-re-read 8.16.6 several times. I did notice that neither of “varchar” nor “char” are found in the whole of section 8.16. But I suppose that it’s obvious to the reader that “text” is to be taken as a term of art, denoting “character data types” and not as the name of one of these data types.I ran some fairly exhaustive empirical tests. Not to be mysterious, I created two “.sql” files. The zip is attached.record_literal.sql--This uses "type rt as (v text)” and "table t(k serial primary key, literal text, r rt)”.It inserts a series of text values into column “literal” as is and into column “r” typecast to “rt”.I aimed to end up with specified target values shown by “select '>'||(r).v||’<‘ from t”.Here are two example findings,First: for this specified target:>hello hello "you" 

Re: Proposal to improve the content in subsection 8.16.6. "Composite Type Input and Output Syntax"

PG Doc comments form  writes:
> 1. Replace this:
> "...the whitespace will be ignored if the field type is integer, but not if
> it is text."
> with this:
> "...the whitespace will be ignored if the field type is integer, but not if
> it is a character data type like text."

It's already an example, so I don't see this as an improvement.

> Insert this between the two sentences, i.e. after "...any individual field
> value." and before "You must do so if...":

The proposed additional text is flat-out wrong.

What actually happens here is that text between quotes is considered
quoted (so that, for example, commas within it are not field separators),
but that does not exclude there being other unquoted text within the
same field value.

regards, tom lane




Proposal to improve the content in subsection 8.16.6. "Composite Type Input and Output Syntax"

The following documentation comment has been logged on the website:

Page: https://www.postgresql.org/docs/11/rowtypes.html
Description:

1. Replace this:

"...the whitespace will be ignored if the field type is integer, but not if
it is text."

with this:

"...the whitespace will be ignored if the field type is integer, but not if
it is a character data type like text."

--

2. Find this:

"...you can write double quotes around any individual field value. You must
do so if the field value would otherwise confuse..."

Insert this between the two sentences, i.e. after "...any individual field
value." and before "You must do so if...":

<<
Notice that, as previously stated, the value for a field whose data type is
text, varchar, or char starts with the charater that immediately follows the
opening parenthesis or comma deliminator, and ends with the character that
immediately precedes the comma deliminator or closing parenthesis. This
means that when you choose to surround a character value with double quotes,
the opening double quote must immediately follow the starting deliminator
and the closing double quote must immediately precede the closing
deliminator, like this:

'("hello world")'

If you don't follow this rule and write, for example, this:

'(   "hello world"   )'

then you will not get an error. However, the rules for the outcome are
undefined and you should simply avoid doing this.
>>