2010/1/16 Robert Haas <robertmh...@gmail.com>: > On Thu, Jan 14, 2010 at 8:46 PM, Robert Haas <robertmh...@gmail.com> wrote: >> I have yet to fully review the code but on a quick glance it looks >> reasonable. > > On further review, it looks less reasonable. :-( > > The new PQescapeIdentConn function is basically a cut-up version of > PQescapeStringInternal, which seems like a reasonable foundation, but > it rips out a little too much - specifically: > > 1. the length argument, > 2. the size_t return value, > 3. the portion of the handling for incomplete multibyte characters > that prevents us from overrunning the output buffer on a maliciously > constructed (or unlucky) input string, and > 4. some relevant comments. >
Yes, I didn't repeat parameter's pattern from PQescapeStringConn, I would to simplify interface but I hasn't problem to modify it to same interface. I rewrote patch so now interface for PQescapeIdentConn is same as PQescapeStringConn @3. I though so the protection under incomplete multibyte chars are enought - missing bytes are replaced by space - like PQescapeStringConn does. But now - mechanism is exactly same, so this problem should be solved. Regards Pavel Stehule > I'm inclined to think we should put all of that stuff back, but > certainly #3 at a minimum. > > ...Robert >
*** ./doc/src/sgml/libpq.sgml.orig 2010-01-17 07:04:02.000000000 +0100 --- ./doc/src/sgml/libpq.sgml 2010-01-17 07:07:05.000000000 +0100 *************** *** 2926,3032 **** <title>Escaping Strings for Inclusion in SQL Commands</title> <indexterm zone="libpq-exec-escape-string"> - <primary>PQescapeStringConn</primary> - </indexterm> - <indexterm zone="libpq-exec-escape-string"> - <primary>PQescapeString</primary> - </indexterm> - <indexterm zone="libpq-exec-escape-string"> <primary>escaping strings</primary> <secondary>in libpq</secondary> </indexterm> ! <para> ! <function>PQescapeStringConn</function> escapes a string for use within an SQL ! command. This is useful when inserting data values as literal constants ! in SQL commands. Certain characters (such as quotes and backslashes) must ! be escaped to prevent them from being interpreted specially by the SQL parser. ! <function>PQescapeStringConn</> performs this operation. ! </para> ! <tip> ! <para> ! It is especially important to do proper escaping when handling strings that ! were received from an untrustworthy source. Otherwise there is a security ! risk: you are vulnerable to <quote>SQL injection</> attacks wherein unwanted ! SQL commands are fed to your database. ! </para> ! </tip> ! <para> ! Note that it is not necessary nor correct to do escaping when a data ! value is passed as a separate parameter in <function>PQexecParams</> or ! its sibling routines. ! <synopsis> size_t PQescapeStringConn (PGconn *conn, char *to, const char *from, size_t length, int *error); ! </synopsis> ! </para> ! <para> ! <function>PQescapeStringConn</> writes an escaped version of the ! <parameter>from</> string to the <parameter>to</> buffer, escaping ! special characters so that they cannot cause any harm, and adding a ! terminating zero byte. The single quotes that must surround ! <productname>PostgreSQL</> string literals are not included in the ! result string; they should be provided in the SQL command that the ! result is inserted into. The parameter <parameter>from</> points to ! the first character of the string that is to be escaped, and the ! <parameter>length</> parameter gives the number of bytes in this ! string. A terminating zero byte is not required, and should not be ! counted in <parameter>length</>. (If a terminating zero byte is found ! before <parameter>length</> bytes are processed, ! <function>PQescapeStringConn</> stops at the zero; the behavior is ! thus rather like <function>strncpy</>.) <parameter>to</> shall point ! to a buffer that is able to hold at least one more byte than twice ! the value of <parameter>length</>, otherwise the behavior is undefined. ! Behavior is likewise undefined if the <parameter>to</> and ! <parameter>from</> strings overlap. ! </para> ! <para> ! If the <parameter>error</> parameter is not NULL, then ! <literal>*error</> is set to zero on success, nonzero on error. ! Presently the only possible error conditions involve invalid multibyte ! encoding in the source string. The output string is still generated ! on error, but it can be expected that the server will reject it as ! malformed. On error, a suitable message is stored in the ! <parameter>conn</> object, whether or not <parameter>error</> is NULL. ! </para> ! <para> ! <function>PQescapeStringConn</> returns the number of bytes written ! to <parameter>to</>, not including the terminating zero byte. ! </para> ! <para> ! <synopsis> ! size_t PQescapeString (char *to, const char *from, size_t length); ! </synopsis> ! </para> ! <para> ! <function>PQescapeString</> is an older, deprecated version of ! <function>PQescapeStringConn</>; the difference is that it does ! not take <parameter>conn</> or <parameter>error</> parameters. ! Because of this, it cannot adjust its behavior depending on the ! connection properties (such as character encoding) and therefore ! <emphasis>it might give the wrong results</>. Also, it has no way ! to report error conditions. ! </para> ! <para> ! <function>PQescapeString</> can be used safely in single-threaded ! client programs that work with only one <productname>PostgreSQL</> ! connection at a time (in this case it can find out what it needs to ! know <quote>behind the scenes</>). In other contexts it is a security ! hazard and should be avoided in favor of ! <function>PQescapeStringConn</>. ! </para> ! </sect2> <sect2 id="libpq-exec-escape-bytea"> <title>Escaping Binary Strings for Inclusion in SQL Commands</title> --- 2926,3088 ---- <title>Escaping Strings for Inclusion in SQL Commands</title> <indexterm zone="libpq-exec-escape-string"> <primary>escaping strings</primary> <secondary>in libpq</secondary> </indexterm> ! <variablelist> ! <varlistentry> ! <term> ! <function>PQescapeStringConn</function> ! <indexterm> ! <primary>PQescapeStringConn</primary> ! </indexterm> ! </term> ! <listitem> ! <para> ! <function>PQescapeStringConn</function> escapes a string for use within an SQL ! command. This is useful when inserting data values as literal constants ! in SQL commands. Certain characters (such as quotes and backslashes) must ! be escaped to prevent them from being interpreted specially by the SQL parser. ! <function>PQescapeStringConn</> performs this operation. ! </para> ! <tip> ! <para> ! It is especially important to do proper escaping when handling strings that ! were received from an untrustworthy source. Otherwise there is a security ! risk: you are vulnerable to <quote>SQL injection</> attacks wherein unwanted ! SQL commands are fed to your database. ! </para> ! </tip> ! <para> ! Note that it is not necessary nor correct to do escaping when a data ! value is passed as a separate parameter in <function>PQexecParams</> or ! its sibling routines. ! ! <synopsis> size_t PQescapeStringConn (PGconn *conn, char *to, const char *from, size_t length, int *error); ! </synopsis> ! </para> ! <para> ! <function>PQescapeStringConn</> writes an escaped version of the ! <parameter>from</> string to the <parameter>to</> buffer, escaping ! special characters so that they cannot cause any harm, and adding a ! terminating zero byte. The single quotes that must surround ! <productname>PostgreSQL</> string literals are not included in the ! result string; they should be provided in the SQL command that the ! result is inserted into. The parameter <parameter>from</> points to ! the first character of the string that is to be escaped, and the ! <parameter>length</> parameter gives the number of bytes in this ! string. A terminating zero byte is not required, and should not be ! counted in <parameter>length</>. (If a terminating zero byte is found ! before <parameter>length</> bytes are processed, ! <function>PQescapeStringConn</> stops at the zero; the behavior is ! thus rather like <function>strncpy</>.) <parameter>to</> shall point ! to a buffer that is able to hold at least one more byte than twice ! the value of <parameter>length</>, otherwise the behavior is undefined. ! Behavior is likewise undefined if the <parameter>to</> and ! <parameter>from</> strings overlap. ! </para> ! <para> ! If the <parameter>error</> parameter is not NULL, then ! <literal>*error</> is set to zero on success, nonzero on error. ! Presently the only possible error conditions involve invalid multibyte ! encoding in the source string. The output string is still generated ! on error, but it can be expected that the server will reject it as ! malformed. On error, a suitable message is stored in the ! <parameter>conn</> object, whether or not <parameter>error</> is NULL. ! </para> ! <para> ! <function>PQescapeStringConn</> returns the number of bytes written ! to <parameter>to</>, not including the terminating zero byte. ! </para> ! </listitem> ! </varlistentry> ! <varlistentry> ! <term> ! <function>PQescapeString</function> ! <indexterm> ! <primary>PQescapeString</primary> ! </indexterm> ! </term> ! <listitem> ! <para> ! <synopsis> ! size_t PQescapeString (char *to, const char *from, size_t length); ! </synopsis> ! </para> ! <para> ! <function>PQescapeString</> is an older, deprecated version of ! <function>PQescapeStringConn</>; the difference is that it does ! not take <parameter>conn</> or <parameter>error</> parameters. ! Because of this, it cannot adjust its behavior depending on the ! connection properties (such as character encoding) and therefore ! <emphasis>it might give the wrong results</>. Also, it has no way ! to report error conditions. ! </para> ! ! <para> ! <function>PQescapeString</> can be used safely in single-threaded ! client programs that work with only one <productname>PostgreSQL</> ! connection at a time (in this case it can find out what it needs to ! know <quote>behind the scenes</>). In other contexts it is a security ! hazard and should be avoided in favor of ! <function>PQescapeStringConn</>. ! </para> ! </listitem> ! </varlistentry> + <varlistentry> + <term> + <function>PQescapeIdentConn</function> + <indexterm> + <primary>PQescapeIdentConn</primary> + </indexterm> + </term> + + <listitem> + <para> + <synopsis> + void PQescapeIdentConn (PGconn *con, + char *to, const char *from, + int *error); + </synopsis> + </para> + + <para> + <function>PQescapeIdentConn</> writes an escaped string of the + <parameter>from</> string to the <parameter>to</>. Result is + well formed SQL identifier. <parameter>to</> shall point to + a buffer that is able to hold three bytes more than twice of + length string <parameter>from</>, otherwise the behavior is undefined. + </para> + + <para> + If the <parameter>error</> parameter is not NULL, then + <literal>*error</> is set to zero on success, nonzero on error. + Presently the only possible error conditions involve invalid multibyte + encoding in the source string. The output string is still generated + on error, but it can be expected that the server will reject it as + malformed. On error, a suitable message is stored in the + <parameter>conn</> object, whether or not <parameter>error</> is NULL. + </para> + + </listitem> + </varlistentry> + </variablelist> + + </sect2> <sect2 id="libpq-exec-escape-bytea"> <title>Escaping Binary Strings for Inclusion in SQL Commands</title> *** ./doc/src/sgml/ref/psql-ref.sgml.orig 2010-01-17 07:04:02.000000000 +0100 --- ./doc/src/sgml/ref/psql-ref.sgml 2010-01-17 07:07:05.000000000 +0100 *************** *** 2335,2340 **** --- 2335,2361 ---- </note> <para> + <application>psql</application> provides two additional syntax for + retrieving the content of variable. This auxilary syntax ensure + necessary escaping when we would to use content as sql literal or + sql identifier. + <programlisting> + testdb=> <userinput>\set foo 'hello world'</userinput> + testdb=> <userinput>\echo :'foo'</userinput> + 'hello world' + + testdb=> <userinput>\echo :"foo"</userinput> + "hello world" + + testdb=> <userinput>SELECT :'foo' AS :"foo";</userinput> + hello world + ------------- + hello world + (1 row) + </programlisting> + </para> + + <para> If you call <command>\set</command> without a second argument, the variable is set, with an empty string as value. To unset (or delete) a variable, use the command <command>\unset</command>. *************** *** 2722,2728 **** the variable is copied literally, so it can even contain unbalanced quotes or backslash commands. You must make sure that it makes sense where you put it. Variable interpolation will not be performed into ! quoted <acronym>SQL</acronym> entities. </para> <para> --- 2743,2755 ---- the variable is copied literally, so it can even contain unbalanced quotes or backslash commands. You must make sure that it makes sense where you put it. Variable interpolation will not be performed into ! quoted <acronym>SQL</acronym> entities. Identifiers with special chars ! have to be inserted between double quotes. <application>psql</application> ! ensure necessary quoting with additional syntax: ! <programlisting> ! testdb=> <userinput>\set foo 'my tab'</userinput> ! testdb=> <userinput>SELECT * FROM :"foo";</userinput> ! </programlisting> </para> <para> *************** *** 2752,2757 **** --- 2779,2793 ---- at one point you thought it was great that all Unix commands use the same escape character.) </para> + + <para> + With alternative syntax for retrieving content of variables external + escaping are not necessary: + <programlisting> + testdb=> <userinput>\set content `cat my_file.txt`</userinput> + testdb=> <userinput>INSERT INTO my_table VALUES(:'content');</userinput> + </programlisting> + </para> <para> Since colons can legally appear in SQL commands, the following rule *** ./src/bin/psql/psqlscan.l.orig 2010-01-17 07:04:02.000000000 +0100 --- ./src/bin/psql/psqlscan.l 2010-01-17 19:31:49.000000000 +0100 *************** *** 47,52 **** --- 47,53 ---- #include "settings.h" #include "variables.h" + #include "dumputils.h" /* * We use a stack of flex buffers to handle substitution of psql variables. *************** *** 118,123 **** --- 119,128 ---- char **txtcopy); static void emit(const char *txt, int len); static bool is_utf16_surrogate_first(uint32 c); + static char *quote_identifier_conn(PGconn *conn, const char *ident); + + static void appendStringIdentConn(PQExpBuffer buf, const char *str, PGconn *conn); + #define ECHO emit(yytext, yyleng) *************** *** 707,712 **** --- 712,777 ---- } } + :'[A-Za-z0-9_]+' { + /* + * Possible psql variable substitution + * with literal quoting. + */ + const char *value; + + yytext[yyleng - 1] = '\0'; + value = GetVariable(pset.vars, yytext + 2); + + if (value) + { + /* It is a variable, perform substitution */ + PQExpBufferData buf; + + initPQExpBuffer(&buf); + appendStringLiteralConn(&buf, value, pset.db); + push_new_buffer(buf.data); + termPQExpBuffer(&buf); + /* yy_scan_string already made buffer active */ + } + else + { + /* + * if the variable doesn't exist we'll copy the + * string as is + */ + ECHO; + } + } + + :\"[A-Za-z0-9_]+\" { + /* Possible psql variable substitution with double quoting */ + const char *value; + + /* remove dquotes */ + yytext[yyleng - 1] = '\0'; + value = GetVariable(pset.vars, yytext + 2); + + if (value) + { + PQExpBufferData buf; + + initPQExpBuffer(&buf); + appendStringIdentConn(&buf, value, pset.db); + push_new_buffer(buf.data); + termPQExpBuffer(&buf); + /* yy_scan_string already made buffer active */ + } + else + { + /* + * if the variable doesn't exist we'll copy the + * string as is + */ + ECHO; + } + } + + /* * Back to backend-compatible rules. */ *************** *** 927,932 **** --- 992,1044 ---- return LEXRES_OK; } + :'[A-Za-z0-9_]*' { + /* Possible psql variable substitution */ + if (option_type == OT_VERBATIM) + ECHO; + else + { + const char *value; + + yytext[yyleng - 1] = '\0'; + value = GetVariable(pset.vars, yytext + 2); + + /* + * The variable value is just emitted without any + * further examination. This is consistent with the + * pre-8.0 code behavior, if not with the way that + * variables are handled outside backslash commands. + */ + if (value) + appendStringLiteralConn(output_buf, value, pset.db); + } + + *option_quote = ':'; + + return LEXRES_OK; + } + + :\"[A-Za-z0-9_]*\" { + /* Possible psql variable substitution */ + if (option_type == OT_VERBATIM) + ECHO; + else + { + const char *value; + + yytext[yyleng - 1] = '\0'; + value = GetVariable(pset.vars, yytext + 2); + + if (value) + appendStringIdentConn(output_buf, value, pset.db); + } + + *option_quote = ':'; + + return LEXRES_OK; + } + + "|" { ECHO; if (option_type == OT_FILEPIPE) *************** *** 1740,1742 **** --- 1852,1887 ---- { return (c >= 0xD800 && c <= 0xDBFF); } + + + /* + * Convert a string value to an SQL identifier and append it to + * the givem buffer. + */ + static void + appendStringIdentConn(PQExpBuffer buf, const char *str, PGconn *conn) + { + char *buffer; + size_t length; + size_t retlen; + int error; + + appendPQExpBufferChar(buf, '\"'); + + length = strlen(str); + buffer = pg_malloc(2 * length + 1); + + retlen = PQescapeIdentConn(conn, buffer, str, length, &error); + if (error) + { + const char *error_message = PQerrorMessage(pset.db); + + if (strlen(error_message)) + psql_error("%s", error_message); + } + + appendBinaryPQExpBuffer(buf, buffer, retlen); + appendPQExpBufferChar(buf, '\"'); + + free(buffer); + } *** ./src/interfaces/libpq/exports.txt.orig 2009-03-31 03:41:27.000000000 +0200 --- ./src/interfaces/libpq/exports.txt 2010-01-17 07:07:05.000000000 +0100 *************** *** 153,155 **** --- 153,156 ---- PQfireResultCreateEvents 151 PQconninfoParse 152 PQinitOpenSSL 153 + PQescapeIdentConn 154 *** ./src/interfaces/libpq/fe-exec.c.orig 2010-01-17 07:04:02.000000000 +0100 --- ./src/interfaces/libpq/fe-exec.c 2010-01-17 16:03:31.000000000 +0100 *************** *** 3345,3347 **** --- 3345,3445 ---- *retbuflen = buflen; return tmpbuf; } + + + /* + * PQescapeIdentConn - returns valid SQL identifier. + * + * Replace " by "" and insert strings to pair of double quotes. + * + * length is the length of the source string. (Note: if a terminating NUL + * is encountered sooner, PQescapeString stops short of "length"; the behavior + * is thus rather like strncpy.) + * + * For safety the buffer at "to" must be at least 2*length + 1 bytes long. + * A terminating NUL character is added to the output string, whether the + * input is NUL-terminated or not. + * + * Returns the actual length of the output (not counting the terminating NUL). + */ + extern size_t PQescapeIdentConn(PGconn *conn, + char *to, const char *from, size_t length, + int *error) + { + const char *source = from; + char *target = to; + size_t remaining = length; + + if (!conn) + { + /* force empty-string result */ + *to = '\0'; + if (*error) + *error = 1; + return 0; + } + + if (error) + *error = 0; + + while (remaining > 0 && *source != '\0') + { + char c = *source; + int len; + int i; + + /* Fast path for plain ASCII */ + if (!IS_HIGHBIT_SET(c)) + { + /* add double quotes if needed */ + if (c == '"') + *target++ = c; + /* copy the character */ + *target++ = c; + source++; + remaining--; + continue; + } + + /* Slow path for possible multibyte characters */ + len = pg_encoding_mblen(conn->client_encoding, source); + + /* Copy the character */ + for (i = 0; i < len; i++) + { + if (remaining == 0 || *source == '\0') + break; + *target++ = *source++; + remaining--; + } + + /* + * If we hit premature end of string (ie, incomplete multibyte + * character), try to pad out to the correct length with spaces. We + * may not be able to pad completely, but we will always be able to + * insert at least one pad space (since we'd not have quoted a + * multibyte character). This should be enough to make a string that + * the server will error out on. + */ + if (i < len) + { + if (error) + *error = 1; + + printfPQExpBuffer(&conn->errorMessage, + libpq_gettext("incomplete multibyte character\n")); + for (; i < len; i++) + { + if (((size_t) (target - to)) / 2 >= length) + break; + *target++ = ' '; + } + break; + } + } + + /* Write the terminating NUL character. */ + *target = '\0'; + + return target - to; + } *** ./src/interfaces/libpq/libpq-fe.h.orig 2010-01-17 07:04:02.000000000 +0100 --- ./src/interfaces/libpq/libpq-fe.h 2010-01-17 15:34:53.000000000 +0100 *************** *** 471,476 **** --- 471,479 ---- extern size_t PQescapeStringConn(PGconn *conn, char *to, const char *from, size_t length, int *error); + extern size_t PQescapeIdentConn(PGconn *conn, + char *to, const char *from, size_t length, + int *error); extern unsigned char *PQescapeByteaConn(PGconn *conn, const unsigned char *from, size_t from_length, size_t *to_length);
-- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers