Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-09-14 Thread Craig Francis
On Wed, 8 Sept 2021 at 09:32, Stephen Reay  wrote:

> I had my share of issues with Craig’s PR, but I think the original goal of
> it was a good and useful concept



Thanks Stephen,

Just to confirm to others on-list... your "share of issues" was when we
looked at allowing all Integers, and you "would have been happy with
original/final implementation with strings only"... also, I agree with "A
type hint would be a great addition too".

For those who missed the original discussion... unlike strings, we cannot
flag if an integer was defined by the developer (in the PHP script); and
because integers cannot cause Injection Vulnerabilities (the pure
definition), I decided to allow all integers (to help adoption). But, with
my naming mistake (trying to better reflect that behaviour), and people
finding it a messy concept, we ended up removing integer support.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-09-14 Thread Craig Francis
On Wed, 8 Sept 2021 at 07:33, Claude Pache  wrote:

> We all want to protect from injection vulnerability, but I think there are
> better way than is_literal.
>
> One way is to use templates, an area where PHP is ironically lagging
> behind. I suggest looking at JS tagged templates:
>

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals#tagged_templates


Hi Claude,

Posting on-list, as I've not had a reply (was confirming I've not missed
anything).

I have looked at JavaScript Tagged Templates before, and while they could
be made to work (ish), I don't believe they are better than the
`is_literal()` proposal to protect against Injection Vulnerabilities:

1) It would require developers and libraries to re-write all of their
existing code to use Tagged Templates.

2) If we copied JavaScript, the methods/functions can still be called
incorrectly:

function template(html, ...values) {
>   console.log(html, values);
> }
> template`Hi ${name}`;
> template([`Hi ${name}`]); // Wrong
> template(['Hi ', name, '']); // Wrong


PHP could provide a way for Libraries to check the developer has used a
Tagged Template, but that's basically what the `is_literal()` proposal
does. With JavaScript, this is why `isTemplateObject()` is being developed,
and Trusted Types might get `fromLiteral()`.

3) Libraries would not be able to use Tagged Templates and easily support
older versions of PHP.

4) The backtick character is already used for `shell_exec()` like
functionality.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-09-08 Thread Stephen Reay



> On 8 Sep 2021, at 13:33, Claude Pache  wrote:
> 
> 
> 
>> Le 7 sept. 2021 à 11:49, Craig Francis  a écrit :
>> 
>> 
>> Obviously I'd still like libraries to be able to protect everyone from
>> introducing Injection Vulnerabilities (as the majority of programmers don't
>> use static analysis), but that's for another day.
>> 
> 
> 
> Hi, 
> 
> We all want to protect from injection vulnerability, but I think there are 
> better way than is_literal.
> 
> One way is to use templates, an area where PHP is ironically lagging behind. 
> I suggest looking at JS tagged templates:
> 
> https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
>  
> 
> 
> For example:
> 
> 
> $qb->select('u')
>  ->from('User', 'u')
>  ->where('u.id = ' . $_GET['id']); // INSECURE
> could be written as
> 
> 
>  $qb->exec `
> SELECT u
> FROM User u
> WHERE u.id = %{ $_GET['id'] }
> `
> ?>
> 
> where the part between %{ ... } is transformed into an SQL literal string 
> (with delimiters "...", not just “escaping”) when it is a string; into the 
> SQL expression NULL when it is null; into an SQL subexpression if it is an 
> object (provided by the library) that represents a well-formed SQL 
> subexpression, etc.
> 
> —Claude
> 

Resending from on-list address because I’m an idiot. Apologies for the dupe 
Claude/Craig.


Hi Claude,

I had my share of issues with Craig’s PR, but I think the original goal of it 
was a good and useful concept - provide developers (mostly lib authors, but its 
not like it couldn’t be used by end developers too) a way to _know_ that a 
string came from something hard coded in a php file. 


A ‘tagged template’ like that doesn’t help solve the problem in any way that 
parameterised queries can’t already do, and if you want to make it more 
’templated’ like that, you could implement the same thing already by passing a 
printf-compatible template and the arguments to a function/method.

None of that helps solve what the `is_literal` function (or potential type 
hint) would help with: when the part of the query that needs to be substituted, 
is something that cannot be parameterised at the SQL level (i.e. a column name) 
you _really_ don’t want that to accept user input of any kind.



Cheers

Stephen
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-09-08 Thread Claude Pache


> Le 7 sept. 2021 à 11:49, Craig Francis  a écrit :
> 
> 
> Obviously I'd still like libraries to be able to protect everyone from
> introducing Injection Vulnerabilities (as the majority of programmers don't
> use static analysis), but that's for another day.
> 


Hi, 

We all want to protect from injection vulnerability, but I think there are 
better way than is_literal.

One way is to use templates, an area where PHP is ironically lagging behind. I 
suggest looking at JS tagged templates:

https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Template_literals
 


For example:


$qb->select('u')
   ->from('User', 'u')
   ->where('u.id = ' . $_GET['id']); // INSECURE
could be written as


exec `
SELECT u
FROM User u
WHERE u.id = %{ $_GET['id'] }
`
?>

where the part between %{ ... } is transformed into an SQL literal string (with 
delimiters "...", not just “escaping”) when it is a string; into the SQL 
expression NULL when it is null; into an SQL subexpression if it is an object 
(provided by the library) that represents a well-formed SQL subexpression, etc.

—Claude



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-09-07 Thread Craig Francis
On Mon, 19 Jul 2021 at 19:59, Craig Francis 
wrote:

> On Mon, 5 Jul 2021 at 19:14, Craig Francis 
> wrote:
>
>> I have opened voting on https://wiki.php.net/rfc/is_literal for the
>> is-literal function.
>>
>
> This RFC has been rejected; with 10 votes in favour, and 23 against.
> [...]
> And thank you to Matthew Brown for adding the 'literal-string' type to
> Psalm:
> https://github.com/vimeo/psalm/releases/tag/4.8.0
>



FYI: The 'literal-string' type has now been added to PHPStan, thanks
to Ondřej Mirtes:

https://github.com/phpstan/phpstan/releases/tag/0.12.97

Obviously I'd still like libraries to be able to protect everyone from
introducing Injection Vulnerabilities (as the majority of programmers don't
use static analysis), but that's for another day.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-21 Thread Craig Francis
> On 20 Jul 2021, at 01:23, tyson andre  wrote:
> 
>> As an aside, only 4 of 23 'no' voters provided any comment as to why they
>> voted that way on the mailing list, which I feel undermines the point of
>> the Request For Comment process, with an additional 5 responding personally
>> off-list after prompting. This makes it harder (or impossible) for points
>> to be discussed and addressed.
> 
> 1. My earlier comments about static analysis, and on behavior depending on 
> whether opcache is enabled


Thanks Tyson,

Just to check I've not missed anything, are you referring to your comments from 
March 2020?
https://news-web.php.net/php.internals/109192 


I mentioned your email in the RFC, as I agree that developers should use Static 
Analysis, but I think it's highly unlikely we will be able to get most PHP 
developers to use it:
https://wiki.php.net/rfc/is_literal#static_analysis 


As to the OPcache, Joe's implementation already works with its optimisations, 
so the results are consistent.
https://wiki.php.net/rfc/is_literal#compiler_optimisations 




> 2. This might prevent certain optimizations in the future. For example, 
> currently, 1-byte strings are all interned to save memory.
>If is_literal was part of php prior to proposing that optimization, then 
> that optimization may be rejected.


Like the OPcache optimisations, the 1-byte strings and other existing 
optimisations work with Joe's implementation. I cannot comment or predict any 
future optimisation work, but this argument could be made for any change to PHP.



> 3. PHP's `string` type is used both for (hopefully) valid unicode strings and 
> for low level operations on literal byte arrays (e.g. cryptogrophy).
>It seems really, really strange for a type system to track trustedness for 
> a low level primitive to track byte arrays. (the php 6 unicode string 
> refactoring failed)
> 
>Further, if this were to be extended in the future beyond the original 
> proposal (e.g. literal strings that are entirely digits are automatically 
> interned or marked as trusted),
>this might open previously safe code acting on byte arrays to side channel 
> issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)


We're just adding a flag to say if the string was written by the developer, 
which is key for identifying Injection Vulnerabilities (it's similar to the 
Interned flag).

With cryptography and timing attacks, the only part affected would be during 
string concatenation, which has already got several optimisations that 
introduce variability in timings (e.g. concatenating variables containing 'a' 
with 'b' is about twice as slow than 'a' with '').

We did have a discussion about integer support, but the way 
integers/floats/booleans are currently stored in memory would require a big 
change to note if those values were defined in the developers PHP source code. 
We also explored the idea of allowing all integers, and while they cannot lead 
to an Injection Vulnerability, the consensus was that a developer defined 
string is easier to understand:
https://wiki.php.net/rfc/is_literal#integer_values 




> 4. Internal functions and userland polyfills for those functions may 
> unintentionally differ significantly for the resulting taintedness,
>e.g. base64_decode in userland being built up byte by byte would end up 
> being possibly untainted?


All of these functions would return a non-literal string, as the values were 
not written by the developer (as in, directly defined in their source code).



> 5. The fact that 1-byte strings are almost always interned seems like a 
> noticeable inconsistency (though library authors can deal with it once 
> they're aware of it), though for it to become an issue a library may need to 
> take multiple strings as input
>(e.g. a contrived example`"echo -- " . $trustedPrefix . 
> shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and 
> $notTrusted of "; evaluate command"


1-byte strings have already been addressed in the implementation.

Libraries need to take user values separately from developer defined strings, 
because developers frequently make mistakes:
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php?ts=4
 


Especially when it comes to escaping values:
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/escaping.php?ts=4
 


In your command example, the library could copy how Parameterised Queries work 
in SQL.
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/cli-basic.php
 

Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread tyson andre
Hi Craig Francis,

> As an aside, only 4 of 23 'no' voters provided any comment as to why they
> voted that way on the mailing list, which I feel undermines the point of
> the Request For Comment process, with an additional 5 responding personally
> off-list after prompting. This makes it harder (or impossible) for points
> to be discussed and addressed.

1. My earlier comments about static analysis, and on behavior depending on 
whether opcache is enabled
2. This might prevent certain optimizations in the future. For example, 
currently, 1-byte strings are all interned to save memory.
If is_literal was part of php prior to proposing that optimization, then 
that optimization may be rejected.
3. PHP's `string` type is used both for (hopefully) valid unicode strings and 
for low level operations on literal byte arrays (e.g. cryptogrophy).
It seems really, really strange for a type system to track trustedness for 
a low level primitive to track byte arrays. (the php 6 unicode string 
refactoring failed)

Further, if this were to be extended in the future beyond the original 
proposal (e.g. literal strings that are entirely digits are automatically 
interned or marked as trusted),
this might open previously safe code acting on byte arrays to side channel 
issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)
4. Internal functions and userland polyfills for those functions may 
unintentionally differ significantly for the resulting taintedness,
e.g. base64_decode in userland being built up byte by byte would end up 
being possibly untainted?
5. The fact that 1-byte strings are almost always interned seems like a 
noticeable inconsistency (though library authors can deal with it once they're 
aware of it), though for it to become an issue a library may need to take 
multiple strings as input
(e.g. a contrived example`"echo -- " . $trustedPrefix . 
shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and $notTrusted 
of "; evaluate command"
6. Including it in core would make it harder to remove later if it interfered 
with opcache or jit work, or to migrate code to alternative interpreters for 
php if those were ever implemented (if frameworks were to extensively depend on 
is_literal)
7. Tracking whether a string is untrusted is a definition only suitable for a 
few (extremely common) formats for php. But for less common features, even 
stringified integers may be a problem (e.g. binary file formats, etc)

This is relatively minor given that php is typically used in a web 
programming context with json or html or js/css output

I'd think is_interned()/intern_string() is much closer to tracking 
something that corresponds with php's internals (e.g. and may allow saving 
memory in long-running processes which receive duplicate strings as input), 
though the 10 people who wanted fully featured trustedness checking would 
probably want is_literal instead
8. Serializing and unserializing data would lose information about trustedness 
of inputs, unpredictably (e.g. unserialize() in php 8.0 interns array keys).

There's no (efficient) way to change trusted strings to untrusted or vice 
versa, though there are inefficient workarounds (modifying a byte and restoring 
it to stop trusting it, imploding single characters to create a trusted string)

This may done implicitly in frameworks using APCu/memcached/redis as a cache

(I definitely don't think the serialization data format should track 
is_literal())

9. Future refactorings, optimizations or deoptimizations (or security fixes) to 
unserialize(), etc. may unexpectedly break code using is_literal that throw 
instead of warn (more bug reports, discourage users from upgrading, etc.)
10. This RFC adds an unknown amount of future work for php-src and PECLs to 
*intuitively* support mapping trusted inputs to trusted outputs or vice versa - 
less commonly used or unmaintained functions may not behave as expected for a 
while
11. https://pecl.php.net/package/taint is available already for a use case with 
some overlap for setups that need this


Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an 
interned string) would make sense to expose in a PECL as a regular `extension` 
(not a `zend_extension`) if is_interned also fails.
Unlike the zend_extension for 
https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume 
https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this. 
The latest release of the Taint pecl fixes XDebug compatibility)

- Other languages, such as Java, have exposed this for memory management 
purposes (rather than security) though it's rarely used directly or in 
frameworks, e.g. 

Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread Craig Francis
On Mon, 5 Jul 2021 at 19:14, Craig Francis  wrote:

> Hi Internals,
> I have opened voting on https://wiki.php.net/rfc/is_literal for the
> is-literal function.
>


This RFC has been rejected; with 10 votes in favour, and 23 against.

I'd like to thank everyone who has been involved in this project, in
particular Joe Watkins for creating the implementation and helping me
though this process, Máté Kocsis for the independent performance tests,
Rowan Francis for helping me with the RFC text and emails, Derick Rethans
for covering this on the PHP Internals News podcast, and for everyone who
has discussed this implementation and provided their input (especially
Larry Garfield, Scott Arciszewski, and Rowan Tommins).

And thank you to Matthew Brown for adding the 'literal-string' type to
Psalm:
https://github.com/vimeo/psalm/releases/tag/4.8.0

As an aside, only 4 of 23 'no' voters provided any comment as to why they
voted that way on the mailing list, which I feel undermines the point of
the Request For Comment process, with an additional 5 responding personally
off-list after prompting. This makes it harder (or impossible) for points
to be discussed and addressed.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread Craig Francis
On Mon, 19 Jul 2021 at 12:51, Guilliam Xavier 
wrote:

> there was *indeed* an SQL injection vulnerability in that code.



Yep, you're right, there was an issue in there as well.

esc_like() also needs to use esc_sql() for the value to be added directly
to the SQL string.

By changing to $wpdb->prepare(), assuming a literal string for the $query
argument, then esc_sql() would have been used automatically (technically
escape_by_ref).

Shall we just put it down as another example of escaping going wrong, why
Parameterised Queries work, and how Taint Checking isn't really a solution
(as the value was escaped, ish).

And if you want another fun one, not that anyone should be using inline
JavaScript, but esc_js() doesen't escape single quotes:

$variable = '\' onmouseover=alert(1) a=\'';

$html = "Link";

https://developer.wordpress.com/themes/escaping/#javascript

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread Guilliam Xavier
On Fri, Jul 16, 2021 at 2:47 AM Craig Francis 
wrote:

> Just another day, and another injection vulnerability (please patch):
>
> https://woocommerce.com/posts/critical-vulnerability-detected-july-2021/
>
> If only escaping wasn't being used, so user values did not get included in
> certain strings :-)
>
> diff -r
> woocommerce.5.5.0/includes/data-stores/class-wc-webhook-data-store.php
> woocommerce.5.5.1/includes/data-stores/class-wc-webhook-data-store.php
> 280c280
> < $search  = ! empty( $args['search'] ) ? "AND `name` LIKE '%" .
> $wpdb->esc_like( sanitize_text_field( $args['search'] ) ) . "%'" : '';
> ---
> > $search  = ! empty( $args['search'] ) ? $wpdb->prepare( "AND
> `name` LIKE %s", '%' . $wpdb->esc_like( sanitize_text_field(
> $args['search'] ) ) . '%' ) : '';
>

On Sat, Jul 17, 2021 at 3:45 AM Craig Francis 
wrote:

> On Fri, 16 Jul 2021 at 21:24, Hans Henrik Bergan 
> wrote:
>
> > short of a bug in esc_like(), i don't even see the vulnerability issue in
> > that code?
> >
>
>
> Sorry Hans, I copied the wrong diff.
>
> There were only 2 changes from woocommerce 5.5.0 to 5.5.1.
>
> Like you I was wondering what that diff was doing before posting - I'm
> fairly sure it's just to be consistent with the other lines (which all use
> $wpdb->prepare).
>

I don't think so.  Looking at
https://developer.wordpress.org/reference/functions/sanitize_text_field/ and
https://developer.wordpress.org/reference/classes/wpdb/esc_like/
you can see that they *don't* escape single quotes,
so there was *indeed* an SQL injection vulnerability in that code.

(Which is [one of the reasons] why I avoid WordPress [and especially
third-party themes / plugins] as much as possible.)

-- 
Guilliam Xavier


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread Jordan LeDoux
Thanks Nikita, that's good to know. I'm still familiarizing myself with the
source right now, so I apologize if this is something that commonly gets
spread as false information, I honestly was exploring the workings of
injection protection in the source after following this discussion, but
that's why I asked. My intention was not to spread FUD, I just thought I'd
found something that slipped through the cracks.

Cheers,
Jordan

On Mon, Jul 19, 2021 at 12:24 AM Nikita Popov  wrote:

> On Sun, Jul 18, 2021 at 4:42 AM Jordan LeDoux 
> wrote:
>
>> Related to the general topic of injection attacks, I was considering
>> submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
>> FALSE, since this mistakenly can lead people to believe that using
>> prepared
>> statements with PDO and MySQL protects against injection attacks. In fact,
>> this is only the case if they create the PDO object with the option
>> specified as false. I'm not aware however to reasoning for enabling
>> prepare
>> emulation by default for MySQL. I would assume it's a performance choice,
>> but how long ago was this choice made and is it worth revisiting? Would
>> this be something that requires its own RFC? It's a single line change.
>>
>> Jordan
>>
>
> Please do refrain from spreading this FUD. While there are certain
> tradeoffs between choosing emulated or native prepared statements, security
> considerations do not factor into it. There's a very narrow window where
> emulated prepared statements can lead to incorrect escaping (it involves
> picking an exotic non-ASCII-compatible charset, not specifying it in the
> connection DSN, and switching to it at runtime), but it's not something you
> can hit by accident.
>
> Regards,
> Nikita
>
> On Sat, Jul 17, 2021 at 9:48 AM Craig Francis 
>> wrote:
>>
>> > On Sat, 17 Jul 2021 at 4:05 pm, Marco Pivetta 
>> wrote:
>> >
>> > > my belief is that this is not a runtime problem, but rather a
>> type-level
>> > > issue with tainted/untainted input/output.
>> > >
>> >
>> >
>> > Thank you for the feedback Marco,
>> >
>> > As you appreciate, I don’t believe we can get every PHP developer to use
>> > Static Analysis. It’s an extra step that developers with less time,
>> energy,
>> > or care, will not setup and use.
>> >
>> > Putting something in the base language, means that libraries can just
>> use
>> > it, and people using the sites/systems of rushed or lazier developers
>> will
>> > have these checks helping keep their data secure. Data breeches can have
>> > life-changing consequences for people, Injection Vulnerabilities are
>> one of
>> > the biggest causes of them, and since we have the ability for libraries
>> to
>> > warn all developers about these mistake, we should.
>> >
>> > At the moment our house can catch on fire and we don’t even have a smoke
>> > alarm. This is the smoke alarm. And there are reasons why it’s builders
>> and
>> > landlords that have to install them, and we don’t rely on the tenants
>> going
>> > and sorting them out themselves. Because if they don’t, for the best or
>> the
>> > worse reasons, either way there are severe consequences to everybody.
>> >
>> > In regards to Taint Checking, it has a significant problem as it
>> creates a
>> > false sense of security, hence these examples in the RFC:
>> >
>> > $sql = 'SELECT * FROM users WHERE id = ' .
>> $db->real_escape_string($id); //
>> > INSECURE
>> >
>> > $html = ""; // INSECURE
>> >
>> > $html = "..."; // INSECURE
>> >
>> > Fortunately Psalm has just implemented the is_literal() concept, so
>> those
>> > developers who do use Psalm can protect themselves from these issues:
>> >
>> > https://github.com/vimeo/psalm/releases/tag/4.8.0
>> >
>> >
>> >
>> > In addition to that, a mechanism to un-taint values is missing,
>> > >
>> >
>> >
>> > That’s the main flaw with Taint Checking, because it’s not possible to
>> mark
>> > something as safe without knowing about the context. As in, developers
>> use
>> > an escaping function (to mark as untainted), think the value is now
>> “safe”,
>> > and incorrectly use that value in a way that causes a security
>> > vulnerability.
>> >
>> > is_literal() simplifies this problem considerably, by just identifying
>> > developer defined strings, and instead using libraries to handle user
>> > values.
>> >
>> > Craig
>> >
>>
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-19 Thread Nikita Popov
On Sun, Jul 18, 2021 at 4:42 AM Jordan LeDoux 
wrote:

> Related to the general topic of injection attacks, I was considering
> submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
> FALSE, since this mistakenly can lead people to believe that using prepared
> statements with PDO and MySQL protects against injection attacks. In fact,
> this is only the case if they create the PDO object with the option
> specified as false. I'm not aware however to reasoning for enabling prepare
> emulation by default for MySQL. I would assume it's a performance choice,
> but how long ago was this choice made and is it worth revisiting? Would
> this be something that requires its own RFC? It's a single line change.
>
> Jordan
>

Please do refrain from spreading this FUD. While there are certain
tradeoffs between choosing emulated or native prepared statements, security
considerations do not factor into it. There's a very narrow window where
emulated prepared statements can lead to incorrect escaping (it involves
picking an exotic non-ASCII-compatible charset, not specifying it in the
connection DSN, and switching to it at runtime), but it's not something you
can hit by accident.

Regards,
Nikita

On Sat, Jul 17, 2021 at 9:48 AM Craig Francis 
> wrote:
>
> > On Sat, 17 Jul 2021 at 4:05 pm, Marco Pivetta 
> wrote:
> >
> > > my belief is that this is not a runtime problem, but rather a
> type-level
> > > issue with tainted/untainted input/output.
> > >
> >
> >
> > Thank you for the feedback Marco,
> >
> > As you appreciate, I don’t believe we can get every PHP developer to use
> > Static Analysis. It’s an extra step that developers with less time,
> energy,
> > or care, will not setup and use.
> >
> > Putting something in the base language, means that libraries can just use
> > it, and people using the sites/systems of rushed or lazier developers
> will
> > have these checks helping keep their data secure. Data breeches can have
> > life-changing consequences for people, Injection Vulnerabilities are one
> of
> > the biggest causes of them, and since we have the ability for libraries
> to
> > warn all developers about these mistake, we should.
> >
> > At the moment our house can catch on fire and we don’t even have a smoke
> > alarm. This is the smoke alarm. And there are reasons why it’s builders
> and
> > landlords that have to install them, and we don’t rely on the tenants
> going
> > and sorting them out themselves. Because if they don’t, for the best or
> the
> > worse reasons, either way there are severe consequences to everybody.
> >
> > In regards to Taint Checking, it has a significant problem as it creates
> a
> > false sense of security, hence these examples in the RFC:
> >
> > $sql = 'SELECT * FROM users WHERE id = ' . $db->real_escape_string($id);
> //
> > INSECURE
> >
> > $html = ""; // INSECURE
> >
> > $html = "..."; // INSECURE
> >
> > Fortunately Psalm has just implemented the is_literal() concept, so those
> > developers who do use Psalm can protect themselves from these issues:
> >
> > https://github.com/vimeo/psalm/releases/tag/4.8.0
> >
> >
> >
> > In addition to that, a mechanism to un-taint values is missing,
> > >
> >
> >
> > That’s the main flaw with Taint Checking, because it’s not possible to
> mark
> > something as safe without knowing about the context. As in, developers
> use
> > an escaping function (to mark as untainted), think the value is now
> “safe”,
> > and incorrectly use that value in a way that causes a security
> > vulnerability.
> >
> > is_literal() simplifies this problem considerably, by just identifying
> > developer defined strings, and instead using libraries to handle user
> > values.
> >
> > Craig
> >
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-18 Thread Pierre Joye
Good morning,

On Mon, Jul 19, 2021 at 9:11 AM Jordan LeDoux  wrote:
>
> > Are there documented SQL injection opportunities when using emulated
> prepares? I'm not aware of any.
>
> This was from my reading of the actual source, which of course may be
> flawed. It appeared that if emulated prepares were used the values were
> escaped and then passed as strings as part of the query, the same as if it
> had been concatenated and wrapped in real_escape_string. I hadn't gone too
> far in actually debugging it yet to find out how it behaved under different
> circumstances as I was still trying to figure out how "small" of a change
> this was from the perspective of internals.

I also don't think there is any left over possible SQL injection in
any of the core DB extensions (PDO or 'native'). It will indeed do not
prevent inserting invalid data but if there were any actual SQL
injection left, I am very confident we would have known it by now. :)

-- 
Pierre

@pierrejoye | http://www.libgd.org

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-18 Thread Jordan LeDoux
> Are there documented SQL injection opportunities when using emulated
prepares? I'm not aware of any.

This was from my reading of the actual source, which of course may be
flawed. It appeared that if emulated prepares were used the values were
escaped and then passed as strings as part of the query, the same as if it
had been concatenated and wrapped in real_escape_string. I hadn't gone too
far in actually debugging it yet to find out how it behaved under different
circumstances as I was still trying to figure out how "small" of a change
this was from the perspective of internals.



On Sun, Jul 18, 2021 at 3:38 PM Benjamin Morel 
wrote:

> There's some BC-breaks to be aware of when switching emulated prepares.
>> One example I know of is that when using emulated prepares you can reuse
>> the same placeholder (as in the following example), but with emulated
>> prepares disabled this does not work.
>>
>> $sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
>> '%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
>> $params = [
>>  "searchValue" => "test",
>> ];
>>
>
> This, and do note that from a performance point of view, disabling
> emulated prepares is not innocuous : most of the time, you effectively send
> twice as many queries to the database : one prepare, one execute.
> There is a *small* performance improvement in using prepared statements
> if you're executing the same statement many times; but in all other cases,
> you're spending twice as much time waiting for the database.
>
> Are there documented SQL injection opportunities when using emulated
> prepares? I'm not aware of any.
>
> — Benjamin
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-18 Thread Benjamin Morel
>
> There's some BC-breaks to be aware of when switching emulated prepares.
> One example I know of is that when using emulated prepares you can reuse
> the same placeholder (as in the following example), but with emulated
> prepares disabled this does not work.
>
> $sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
> '%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
> $params = [
>  "searchValue" => "test",
> ];
>

This, and do note that from a performance point of view, disabling emulated
prepares is not innocuous : most of the time, you effectively send twice as
many queries to the database : one prepare, one execute.
There is a *small* performance improvement in using prepared statements if
you're executing the same statement many times; but in all other cases,
you're spending twice as much time waiting for the database.

Are there documented SQL injection opportunities when using emulated
prepares? I'm not aware of any.

— Benjamin


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-18 Thread Jordan LeDoux
That sounds like something that would require both a deprecation and an RFC
for the change then, even if the actual change in the source is small.

It still may be worth exploring, since this surely gives a large number of
people false confidence in protection against injection attacks, as nearly
every available tutorial on using PDO in PHP declares that simply using
prepared statements protects you from injection attacks.

On Sun, Jul 18, 2021 at 12:47 AM AllenJB  wrote:

>
> On 18/07/2021 03:41, Jordan LeDoux wrote:
> > Related to the general topic of injection attacks, I was considering
> > submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
> > FALSE, since this mistakenly can lead people to believe that using
> prepared
> > statements with PDO and MySQL protects against injection attacks. In
> fact,
> > this is only the case if they create the PDO object with the option
> > specified as false. I'm not aware however to reasoning for enabling
> prepare
> > emulation by default for MySQL. I would assume it's a performance choice,
> > but how long ago was this choice made and is it worth revisiting? Would
> > this be something that requires its own RFC? It's a single line change.
> >
> > Jordan
>
> There's some BC-breaks to be aware of when switching emulated prepares.
> One example I know of is that when using emulated prepares you can reuse
> the same placeholder (as in the following example), but with emulated
> prepares disabled this does not work.
>
> $sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
> '%') OR column2 LIKE CONCAT('%', :searchValue, '%');";
> $params = [
>  "searchValue" => "test",
> ];
>
> With emulated prepares disabled you have to use:
>
> $sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue,
> '%') OR column2 LIKE CONCAT('%', :searchValue2, '%');";
> $params = [
>  "searchValue" => "test",
>  "searchValue2" => "test",
> ];
>
>
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-18 Thread AllenJB



On 18/07/2021 03:41, Jordan LeDoux wrote:

Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using prepared
statements with PDO and MySQL protects against injection attacks. In fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.

Jordan


There's some BC-breaks to be aware of when switching emulated prepares. 
One example I know of is that when using emulated prepares you can reuse 
the same placeholder (as in the following example), but with emulated 
prepares disabled this does not work.


$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue, 
'%') OR column2 LIKE CONCAT('%', :searchValue, '%');";

$params = [
    "searchValue" => "test",
];

With emulated prepares disabled you have to use:

$sql = "SELECT * FROM table WHERE column1 LIKE CONCAT('%', :searchValue, 
'%') OR column2 LIKE CONCAT('%', :searchValue2, '%');";

$params = [
    "searchValue" => "test",
    "searchValue2" => "test",
];

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-17 Thread Jordan LeDoux
Related to the general topic of injection attacks, I was considering
submitting a PR to change the default of PDO::ATTR_EMULUATE_PREPARES to
FALSE, since this mistakenly can lead people to believe that using prepared
statements with PDO and MySQL protects against injection attacks. In fact,
this is only the case if they create the PDO object with the option
specified as false. I'm not aware however to reasoning for enabling prepare
emulation by default for MySQL. I would assume it's a performance choice,
but how long ago was this choice made and is it worth revisiting? Would
this be something that requires its own RFC? It's a single line change.

Jordan

On Sat, Jul 17, 2021 at 9:48 AM Craig Francis 
wrote:

> On Sat, 17 Jul 2021 at 4:05 pm, Marco Pivetta  wrote:
>
> > my belief is that this is not a runtime problem, but rather a type-level
> > issue with tainted/untainted input/output.
> >
>
>
> Thank you for the feedback Marco,
>
> As you appreciate, I don’t believe we can get every PHP developer to use
> Static Analysis. It’s an extra step that developers with less time, energy,
> or care, will not setup and use.
>
> Putting something in the base language, means that libraries can just use
> it, and people using the sites/systems of rushed or lazier developers will
> have these checks helping keep their data secure. Data breeches can have
> life-changing consequences for people, Injection Vulnerabilities are one of
> the biggest causes of them, and since we have the ability for libraries to
> warn all developers about these mistake, we should.
>
> At the moment our house can catch on fire and we don’t even have a smoke
> alarm. This is the smoke alarm. And there are reasons why it’s builders and
> landlords that have to install them, and we don’t rely on the tenants going
> and sorting them out themselves. Because if they don’t, for the best or the
> worse reasons, either way there are severe consequences to everybody.
>
> In regards to Taint Checking, it has a significant problem as it creates a
> false sense of security, hence these examples in the RFC:
>
> $sql = 'SELECT * FROM users WHERE id = ' . $db->real_escape_string($id); //
> INSECURE
>
> $html = ""; // INSECURE
>
> $html = "..."; // INSECURE
>
> Fortunately Psalm has just implemented the is_literal() concept, so those
> developers who do use Psalm can protect themselves from these issues:
>
> https://github.com/vimeo/psalm/releases/tag/4.8.0
>
>
>
> In addition to that, a mechanism to un-taint values is missing,
> >
>
>
> That’s the main flaw with Taint Checking, because it’s not possible to mark
> something as safe without knowing about the context. As in, developers use
> an escaping function (to mark as untainted), think the value is now “safe”,
> and incorrectly use that value in a way that causes a security
> vulnerability.
>
> is_literal() simplifies this problem considerably, by just identifying
> developer defined strings, and instead using libraries to handle user
> values.
>
> Craig
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-17 Thread Craig Francis
On Sat, 17 Jul 2021 at 4:05 pm, Marco Pivetta  wrote:

> my belief is that this is not a runtime problem, but rather a type-level
> issue with tainted/untainted input/output.
>


Thank you for the feedback Marco,

As you appreciate, I don’t believe we can get every PHP developer to use
Static Analysis. It’s an extra step that developers with less time, energy,
or care, will not setup and use.

Putting something in the base language, means that libraries can just use
it, and people using the sites/systems of rushed or lazier developers will
have these checks helping keep their data secure. Data breeches can have
life-changing consequences for people, Injection Vulnerabilities are one of
the biggest causes of them, and since we have the ability for libraries to
warn all developers about these mistake, we should.

At the moment our house can catch on fire and we don’t even have a smoke
alarm. This is the smoke alarm. And there are reasons why it’s builders and
landlords that have to install them, and we don’t rely on the tenants going
and sorting them out themselves. Because if they don’t, for the best or the
worse reasons, either way there are severe consequences to everybody.

In regards to Taint Checking, it has a significant problem as it creates a
false sense of security, hence these examples in the RFC:

$sql = 'SELECT * FROM users WHERE id = ' . $db->real_escape_string($id); //
INSECURE

$html = ""; // INSECURE

$html = "..."; // INSECURE

Fortunately Psalm has just implemented the is_literal() concept, so those
developers who do use Psalm can protect themselves from these issues:

https://github.com/vimeo/psalm/releases/tag/4.8.0



In addition to that, a mechanism to un-taint values is missing,
>


That’s the main flaw with Taint Checking, because it’s not possible to mark
something as safe without knowing about the context. As in, developers use
an escaping function (to mark as untainted), think the value is now “safe”,
and incorrectly use that value in a way that causes a security
vulnerability.

is_literal() simplifies this problem considerably, by just identifying
developer defined strings, and instead using libraries to handle user
values.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-17 Thread Marco Pivetta
Hey Craig,

On Mon, Jul 5, 2021 at 8:15 PM Craig Francis 
wrote:

> Hi Internals,
>
> I have opened voting on https://wiki.php.net/rfc/is_literal for the
> is-literal function.
>
> The vote closes 2021-07-19
>
> The proposal is to add the function is_literal(), a simple way to identify
> if a string was written by a developer, removing the risk of a variable
> containing an Injection Vulnerability.
>
> This implementation is for literal strings ONLY (after discussion over
> allowing integers) and, thanks to the amazing work of Joe Watkins, now
> works fully with compiler optimisations, interned strings etc.
>
> Craig
>

I ended up voting "no" on this.

Even though I used to work on Doctrine ORM for years, and we had a private
discussion around this, my belief is that this is not a runtime problem,
but rather a type-level issue with tainted/untainted input/output.

This has been brought up in the discussion phase by Tyson Andre, as far as
I remember.

There is good progress in taint analysis in Psalm, specifically:

 * https://psalm.dev/articles/detect-security-vulnerabilities-with-psalm
 * https://psalm.dev/docs/security_analysis/

There's also the issue, from my end, that I would like the ecosystem to
pick up static analysis more, rather than people adding more runtime
roadblocks to their code, leading to further complexity in error analysis.
This is my personal push for putting more problems into build-time rather
than runtime.

You could call it a political choice to push a tool like psalm, instead of
the language having a runtime construct to further "duck-type safety".

In addition to that, a mechanism to un-taint values is missing, but it is
not a concept symmetrical to `is_literal()`.

Greets,

Marco Pivetta

http://twitter.com/Ocramius

http://ocramius.github.com/


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-17 Thread Craig Francis
On Sat, 17 Jul 2021 at 08:59, Hans Henrik Bergan 
wrote:

> i can tell from only that diff that, at least as of 5.5.1,  woocommerce is
> not compatible with @@SQL_MODE=ANSI_QUOTES :p



Yep, and I did that years ago - I preferred to use single quotes for
strings in PHP (so variables stood out), and double quotes for SQL.

Just for fun, `NO_BACKSLASH_ESCAPES` is a good way to mess with people (I
believe the "real" escaping method can catch this, but I wouldn't trust it).

When it comes to escaping in general, my favourite mistake is:

$sql = 'WHERE id = ' . $mysqli->real_escape_string($_GET['id']);

Where the assumed to be number hasn't been quoted at all :-)

PDO::quote() does work a bit better (by adding the quotes itself), but I
still prefer its documentation - noting how the quoted string is only
"theoretically safe", and that "you are strongly recommended to use
PDO::prepare() to prepare SQL statements with bound parameters".

https://www.php.net/manual/en/pdo.quote.php

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-17 Thread Hans Henrik Bergan
oh thanks, now the vulnerability is clear. (i would still complain on that
as a pull request though, using double quotes for strings is just a
horrible idea, it's not compliant with ISO sql, and it depends on the MySQL
server it's running on *not* having @@SQL_MODE=ANSI_QUOTES enabled which
changes double-quotes to be ISO-compliant instead of mysql-borky (double
quotes for strings only work on MySQL because of a MySQL quirk, ISO SQL say
double quotes is meant for identifiers just like ` ), and it's not portable
across different SQL servers, and all of these problems go away if you just
use the sensible alternative: single quotes - seriously someone should
complain to whoever wrote that, so at least he doesn't do the same in the
future
i can tell from only that diff that, at least as of 5.5.1,  woocommerce is
not compatible with @@SQL_MODE=ANSI_QUOTES :p
)


On Sat, 17 Jul 2021 at 03:45, Craig Francis 
wrote:

> On Fri, 16 Jul 2021 at 21:24, Hans Henrik Bergan 
> wrote:
>
>> short of a bug in esc_like(), i don't even see the vulnerability issue in
>> that code?
>>
>
>
> Sorry Hans, I copied the wrong diff.
>
> There were only 2 changes from woocommerce 5.5.0 to 5.5.1.
>
> Like you I was wondering what that diff was doing before posting - I'm
> fairly sure it's just to be consistent with the other lines (which all use
> $wpdb->prepare).
>
> The diff I should have copied is:
>
> diff -r
> woocommerce.5.5.0/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
> woocommerce.5.5.1/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
> 86c86,92
> <  $attributes_to_count = array_map( 'wc_sanitize_taxonomy_name',
> $attributes );
> ---
> >  $attributes_to_count = array_map(
> >function( $attribute ) {
> >  $attribute = wc_sanitize_taxonomy_name( $attribute );
> >  return esc_sql( $attribute );
> >},
> >$attributes
> >  );
>
> In context `$attributes_to_count` simply goes to:
>
> $attributes_to_count_sql = 'AND term_taxonomy.taxonomy IN ("' . implode(
> '","', $attributes_to_count ) . '")';
>
> Where the the esc_sql() is basically a call
> to mysqli_real_escape_string(), which explains why it needs risky quotes
> in/around implode.
>
> Craig
>
>
>
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-16 Thread Craig Francis
On Fri, 16 Jul 2021 at 21:24, Hans Henrik Bergan 
wrote:

> short of a bug in esc_like(), i don't even see the vulnerability issue in
> that code?
>


Sorry Hans, I copied the wrong diff.

There were only 2 changes from woocommerce 5.5.0 to 5.5.1.

Like you I was wondering what that diff was doing before posting - I'm
fairly sure it's just to be consistent with the other lines (which all use
$wpdb->prepare).

The diff I should have copied is:

diff -r
woocommerce.5.5.0/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
woocommerce.5.5.1/packages/woocommerce-blocks/src/StoreApi/Utilities/ProductQueryFilters.php
86c86,92
<  $attributes_to_count = array_map( 'wc_sanitize_taxonomy_name',
$attributes );
---
>  $attributes_to_count = array_map(
>function( $attribute ) {
>  $attribute = wc_sanitize_taxonomy_name( $attribute );
>  return esc_sql( $attribute );
>},
>$attributes
>  );

In context `$attributes_to_count` simply goes to:

$attributes_to_count_sql = 'AND term_taxonomy.taxonomy IN ("' . implode(
'","', $attributes_to_count ) . '")';

Where the the esc_sql() is basically a call to mysqli_real_escape_string(),
which explains why it needs risky quotes in/around implode.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-16 Thread Hans Henrik Bergan
short of a bug in esc_like(), i don't even see the vulnerability issue in
that code?
that sanitize call looks like a data corruption issue and i bet it fails to
search for binary data, but i don't see the critical vulnerability?


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-16 Thread Craig Francis
On Fri, 16 Jul 2021 at 15:50, Dan Ackroyd  wrote:

> On Mon, 12 Jul 2021 at 19:57, Craig Francis 
> wrote:
>
> > the “go-safe-html” library authors decided that
> > "the ergonomics of trusting concatenated constants far outweighs the
> security concern".
>
> Go is a quite different programming language to PHP.
>


Go is different, it's limited to running the check at compile time. That's
why I was referencing the “go-safe-html” library.

PHP is more dynamic, so we don't need to have the same restrictions (we can
allow the developer to concatenate the string, which allows us to support
existing code, rather than relying on the library).



The current JavaScript equivalent ideas for string literals appear to be
> inactive or archived.



As noted in the RFC, it's this one:
https://github.com/tc39/proposal-array-is-template-object

Krzysztof has just confirmed that he’s working on it, and is currently
getting it through tc39 (specifically updates related to Realms, a way of
executing JavaScript within the context of a new global object, something
PHP does not need to worry about).


The other JavaScript approach for dealing with trusted types
> (https://auth0.com/blog/securing-spa-with-trusted-types/) is even more
> different than this proposal.



While the article shows some React/Angular code, the focus is on Trusted
Types, which works with this concept. It protects unsafe APIs (like
innerHTML), where you can create a policy with methods to check/filter
values (e.g. forcing the use of DOMPurify). The isTemplateObject method
(which checks that a template string was created by the developer) will
work with Trusted Types, so you don't need to rely on filtering
(unreliable/limited).

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-16 Thread Dan Ackroyd
On Mon, 12 Jul 2021 at 19:57, Craig Francis  wrote:

> the “go-safe-html” library authors decided that
> "the ergonomics of trusting concatenated constants far outweighs the security 
> concern".

Go is a quite different programming language to PHP.

When they say 'constants', they appear to be able to enforce that by
using a clever feature of that language
https://github.com/google/safehtml/blob/2057dd9c30f9e264f4d01c29d886d51f1b519302/template/template.go#L440-L452
:

```
// stringConstant is an unexported string type. Users of this package cannot
// create values of this type except by passing an untyped string constant to
// functions which expect a stringConstant. This type must be used only in
// function and method parameters.
type stringConstant string

func stringConstantsToStrings(strs []stringConstant) []string {
ret := make([]string, 0, len(strs))
for _, s := range strs {
ret = append(ret, string(s))
}
return ret
}

```

i.e. this code works, as the url template is an actual string constant
(not a variable):

```
safehtml.TrustedResourceURLFormatFromConstant(
  `//www.youtube.com/v/%{id}?hl=%{lang}`,
  map[string]string{
  "id":   "abc0def1",
  "lang": "en",
})
```

Attempting to use a string variable, fails to even compile:

```
format := `//www.youtube.com/v/%{id}?hl=%{lang}`

safehtml.TrustedResourceURLFormatFromConstant(
format,
map[string]string{
"id":   "abc0def1",
  "lang": "en",
})

cannot use format (type string) as type safehtml.stringConstant in
argument to safehtml.TrustedResourceURLFormatFromConstant
```

The current JavaScript equivalent ideas for string literals appear to
be inactive or archived:

* https://github.com/mikewest/tc39-proposal-literals - This repository
has been archived by the owner. It is now read-only.

* https://github.com/tc39/proposal-array-is-template-object - not
particularly active.

But my understanding is that like the Go implementation, they are
proposing to enforce literal-ness of function parameters through
special compilation rules for parameters to function calls, rather
than passing literal strings around to arbitrary places - below* or
https://tc39.es/proposal-array-is-template-object/#sec-gettemplateobject

The other JavaScript approach for dealing with trusted types
(https://auth0.com/blog/securing-spa-with-trusted-types/) is even more
different than this proposal.

It seems pretty inaccurate to claim that either the safehtml library
or the proposal for JavaScript support the choice made for the PHP
RFC. They don't appear to actually allow carrying literal-ness through
variables, only through compile-time constants that are placed inside
the parentheses or a function call. They also work in a different way,
and use features of those languages not available in PHP.

cheers
Dan
Ack


* this is the definition for the proposed JavaScript literal
implementation, I think.

```
Runtime Semantics: GetTemplateObject ( templateLiteral )

The abstract operation GetTemplateObject is called with a Parse Node,
templateLiteral, as an argument. It performs the following steps:

1. Let realm be the current Realm Record.
2. Let templateRegistry be realm.[[TemplateMap]].
3. For each element e of templateRegistry, do
  a. If e.[[Site]] is the same Parse Node as templateLiteral, then
i. Return e.[[Array]].
4. Let rawStrings be TemplateStrings of templateLiteral with argument true.
5. Let cookedStrings be TemplateStrings of templateLiteral with argument false.
6. Let count be the number of elements in the List cookedStrings.
7. Assert: count ≤ 232 - 1.
8. Let template be ! ArrayCreate(count).
9. Set template.[[TemplateObject]] to true.
10. Let rawObj be ! ArrayCreate(count).
11. Let index be 0.
12. ...
```

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-16 Thread Nikita Popov
On Fri, Jul 9, 2021 at 12:23 AM Máté Kocsis  wrote:

> Hi Nikita,
>
> I performed a few other benchmarks in order to provide a little bit more
> insights into the performance aspect of the RFC. My latest measurement is
> using the same setup as the previous
> one, although I made a few smaller changes in the process (e.g. changing
> the order of the tests). So at the end, I got the following results:
>
> - Laravel demo app: +0.1%
> - Symfony demo app: +0.43%
> - bench.php: +0.4%
> - custom concat bench: +3.89%
>
> The results are comparing the "literals" branch with the commit in master
> on which Joe's work is based. As far as I remember, the simpler variant of
> is_literals had a 0.1% slowdown
> in case of Symfony, and 2.09% in case of the custom concat benchmark.
> Personally, I think even the current numbers are acceptable, especially
> because PHP 8.1 is going to be
> roughly 30% faster than PHP 8.0 according to my benchmark in case of the
> Symfony demo app. Laravel's result is very similar, but I don't remember
> about the exact numbers.
>

Thanks Máté! While I'm not really happy with the technical implications,
your numbers clearly show that there is barely any impact in practice. I've
dropped my "no" vote for that reason.

Regards,
Nikita

>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-15 Thread Craig Francis
Just another day, and another injection vulnerability (please patch):

https://woocommerce.com/posts/critical-vulnerability-detected-july-2021/

If only escaping wasn't being used, so user values did not get included in
certain strings :-)

diff -r
woocommerce.5.5.0/includes/data-stores/class-wc-webhook-data-store.php
woocommerce.5.5.1/includes/data-stores/class-wc-webhook-data-store.php
280c280
< $search  = ! empty( $args['search'] ) ? "AND `name` LIKE '%" .
$wpdb->esc_like( sanitize_text_field( $args['search'] ) ) . "%'" : '';
---
> $search  = ! empty( $args['search'] ) ? $wpdb->prepare( "AND
`name` LIKE %s", '%' . $wpdb->esc_like( sanitize_text_field(
$args['search'] ) ) . '%' ) : '';


The vote for the is_literal RFC ends on Monday the 19th of July, 7:30pm UK
time and 6:30pm UTC, and needs your support.

https://wiki.php.net/rfc/is_literal


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-14 Thread Craig Francis
Injection Vulnerabilities remain at the top of the OWASP "Top 10 Web
Application Security Risks".

It’s important to remember that Injection Vulnerabilities don't just affect
the developer, but rather the data of potentially thousands of people using
the website/system.

These can even occur when using libraries. Take this example from CakePHP,
where the developer has dangerously included user data into the SQL:

  $users->find()->where(['age >= ' . $_GET['age']]);

By distinguishing strings from a trusted developer, from strings that may
be attacker controlled, libraries can ensure values that go directly into
the SQL, HTML, CLI, etc have not been "Injected" with user data.

PHP is now lagging behind other languages, where Java and Go can already
test for developer defined strings (it's also being implemented in
JavaScript).

is_literal() is a simple and minor change that simply utilises a currently
unused flag on strings to mark whether the string was written by the
developer. It requires no rewriting of code by the developer to work, no
grand visionary overhaul of the PHP language, with only a 0.43% difference
in speed that is too small to measure with normal internet/database
variability. It’s just a basic but effective way of being able to warn
about and locate Injection Vulnerabilities (and therefore providing a way
for libraries to directly educate developers).

The vote for this RFC ends on Monday the 19th of July, 7:30pm UK time and
6:30pm UTC, and needs your support.
https://wiki.php.net/rfc/is_literal

The following link provides more examples of these mistakes, based on code
I’ve found on production servers. They show how similar they are to the
examples found in the libraries official documentation, and how easy it is
for a developer to make a small tweak that ends up being very dangerous:
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/mistakes.php

I have created 3 example libraries you can experiment with, to see what
is_literal() can do:
https://github.com/craigfrancis/php-is-literal-rfc/tree/main/examples

I'm happy to take questions on and off list.

Vote ends on Monday the 19th of July, 7:30pm UK time and 6:30pm UTC.
https://wiki.php.net/rfc/is_literal

Thanks,
Craig

>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-12 Thread Craig Francis
On Mon, 12 Jul 2021 at 14:17, Dan Ackroyd  wrote:

> On Mon, 12 Jul 2021 at 02:09, Craig Francis 
> wrote:
> >
> > you can choose not to use string concatenation ... allowing anyone to
> customise it to their needs
>
> Please can you explain how:
>
> i) An individual programmer can enforce that they don't accidentally
> use string concatenation for stuff that is used in a security
> sensitive context.
>


i) Usually if developers want to add specific restrictions for themselves
or their team that the vast majority of a userbase would not need or use,
they would use a separate tool that fills that niche to enforce their
chosen coding style (like how they might want to enforce "tabs vs spaces"
in their own codebase).



> ii) A team of 50 programmers can enforce that they don't accidentally
> use string concatenation for stuff that is used in a security
> sensitive context.
>


ii) Same answer as i) as it scales.


iii) A library can enforce that string concatenation isn't used in the
> params passed to it.
>


iii) A library shouldn't care if the developer used concatenation, it
should only care if user data has been included incorrectly (i.e. checking
for an Injection Vulnerability). A library’s purpose is to ensure whether
code is safe or not, not about enforcing personal coding styles.


> and doesn’t improve security in any way,
>
> It [disallowing concatenation] prevents issues from being able to make it
> through to production.
> But the main reason would be to reduce the cost of using this feature
> long-term.
>


Disallowing concatenation doesn’t guarantee that issues can’t get through
to production though; for example, something that can sometimes be a
non-literal, but during development defaults to a literal.

>From a security point of view, it doesn’t matter whether the error is
caught at the point of concatenation, or when it’s checked as the library
receives it, because the injection vulnerability gets caught either way.

I take it the "cost" that you’re referring to is just the debugging time?
Ultimately any extra debug time that might occur, is magnitudes less than
the time it would take almost every developer to check and rewrite most of
the projects that exist today, which is what the idea of not supporting
concatenation requires.

For a developer who really finds debugging too onerous, there are other
ways of debugging - using tools better suited for it such as XDebug or
Static Analysis tools like Psalm (and as above, if you were wanting to
force yourself/your team to not concatenate, you would be using a Static
Analysis tool already (i/ii)).


> you can choose not to use string concatenation (I haven't needed to).
>
> Waitwhat?
>
> Is your position both that preserving literal-ness across string
> concatenation is required, otherwise this feature is too hard to use,
> and at the same time, you've not needed that in your own applications.
>
> Is that right? Because if preserving literal-ness across strings
> wasn't required for you...why would it be required for every other
> project?
>
> And to be clear, I don't think it's required.
>


We are trying to improve the language for the majority of developers.

I’m an experienced PHP developer who is genuinely passionate about
security. I find it fun, curating my code to be as secure as possible is
practically a hobby. That makes me an Outlier. Like a lot of us here. My
focus is on writing an RFC that works for as many developers as possible,
so whether it’s ‘necessary’ in my own personal projects is irrelevant to
whether a simple safety feature should exist for the community.

We need to make things easy and safe for the people who are /not/ just
high-level programmers, but for people who don’t know everything we do and
need things to be simple and functional as possible.

My projects do use a lot of string concatenation. Not an erroneously high
amount, probably about normal. So let’s say we do break string
concatenation: For some numbers, I've found roughly 1,300 instances of SQL
and HTML concatenation in my projects. And even if I would be willing to go
through such a big task to replace them, the real problem is actually
finding them, because if you’re trying to use a search, well, who would
have thought looking for "." would return so many results?



> I think by listening to feedback from people who aren't sure it's a
> good idea, who said "this is a good idea but only if it's really easy
> to start using it" that this RFC has been watered down from the most
> useful proposal. At the core, there is a good idea behind this RFC,
> but the set of trade-offs chosen just aren't the right ones, and
> aren't the "proven" trade-offs made at Google.
>


The proposal hasn’t changed. This is keeping to the original concept, and
while you wanted to remove string concatenation support, that does not mean
that anything so strict was our intention. The proposal was always meant
for the greatest and easiest adoption possible, but that was your 

Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-12 Thread Rowan Tommins

On 11/07/2021 18:00, Dan Ackroyd wrote:

What it won't do, is tell you when you've forgotten to use it,

Which is a huge difference.



You've edited out the crucial last part of that sentence:

> but nor would a built-in implementation


I don't know how to make this point more emphatically: Neither a 
built-in literal_concat function, nor setting the internal flag to false 
when you use the "." operator, can prevent you using plain concatenation 
in the wrong place.



Regardless of what the . operator does to the internal literal flag, the 
following code will give an error in exactly the same place:


// What this line does to the literal flag is irrelevant:
$blah = 'a' . 'b';
// This line will set the literal flag to false, but will not raise any 
errors:

$foo = $blah . $_GET['foo'];
// $foo is not literal, so $bar is not literal; but there are still no 
errors:

$bar = $foo . '123';
// Finally, an error will be raised when trying to use $bar in 
literal_concat, regardless of whether it's built-in or user-implemented:

$baz = literal_concat($bar, '456');


The only way to put that error in a different place would be if the . 
operator itself was able to raise errors, and I can't think of any way 
that would be possible.


Regards,

--
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-12 Thread Dan Ackroyd
On Mon, 12 Jul 2021 at 02:09, Craig Francis  wrote:
>
> you can choose not to use string concatenation ... allowing anyone to 
> customise it to their needs

Please can you explain how:

i) An individual programmer can enforce that they don't accidentally
use string concatenation for stuff that is used in a security
sensitive context.

ii) A team of 50 programmers can enforce that they don't accidentally
use string concatenation for stuff that is used in a security
sensitive context.

iii) A library can enforce that string concatenation isn't used in the
params passed to it.


> and doesn’t improve security in any way,

It prevents issues from being able to make it through to production.
But the main reason would be to reduce the cost of using this feature
long-term.

> you can choose not to use string concatenation (I haven't needed to).

Waitwhat?

Is your position both that preserving literal-ness across string
concatenation is required, otherwise this feature is too hard to use,
and at the same time, you've not needed that in your own applications.

Is that right? Because if preserving literal-ness across strings
wasn't required for you...why would it be required for every other
project?

And to be clear, I don't think it's required.

I think by listening to feedback from people who aren't sure it's a
good idea, who said "this is a good idea but only if it's really easy
to start using it" that this RFC has been watered down from the most
useful proposal. At the core, there is a good idea behind this RFC,
but the set of trade-offs chosen just aren't the right ones, and
aren't the "proven" trade-offs made at Google.

cheers
Dan
Ack

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-11 Thread Craig Francis
On Sun, 11 Jul 2021 at 18:09, Dan Ackroyd  wrote:

> As I said in my reply to Rowan, making it easy to track down issues where
> they occur, minimises the cost of using this feature over the years it
> would be used.
>


This implementation allows you to do all of that.

If you find debugging is a problem, you can choose not to use string
concatenation (I haven't needed to).

The implementation allows you to use a $query array, exactly as you
describe:
https://3v4l.org/aCFIT/rfc#vrfc.literals

And the RFC itself provides you with a userland version of the
`literal_concat()` function as you proposed, allowing anyone to customise
it to their needs (e.g. only throwing an exception during development).

But forcing every single project in existence to replace every instance of
concatenation, especially when it is not necessary, and doesn’t improve
security in any way, I would argue makes it too strict, and would harm
adoption.

Also, thanks for mentioning in your email to Rowan the problems with static
analysis.

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-11 Thread Dan Ackroyd
On Tue, 6 Jul 2021 at 10:32, Craig Francis  wrote:
> which will result in too many invalid errors, requiring them to make
> substantial/unnecessary changes

My understanding is that the majority of PHP developers use one of the
many frameworks available. All of those provide libraries that people
use rather than accessing the DB directly.

At the other end of the scale, you have companies with large-ish
engineering teams (e.g. 30+) where access to the DB is done through an
in-house library.

The only people who don't already go through an access layer of code
to reach the DB are people who don't depend on frameworks, and
maintain their own applications with a small (or singular) engineering
team.

This type of person may be over-represented in internals, but are
probably relatively few in number compared to the majority of PHP
users.

> requiring them to make substantial/unnecessary changes

My very strong suspicion is that if/when people start using this, the
majority of changes that would be required would be actual security
problems that ought to be fixed.

> (i.e. replacing every string concat with
> `literal_concat()`, or using a special Query Builder for their
> SQL/HTML/CLI/etc).

Most people should be using a library for building HTML and CLI
escaping, because remembering how to do the escaping properly is
really difficult. I certainly never remember the difference between
the correct escaping for CSS and JS.

But the changes required for supporting DB queries just don't seem
that big to me. I've put a code example of an implementation below.

I'm pretty sure that the changes needed to fix the actual security
problems that exist in their code would be bigger, and also that
tracking down a single leaked non-literal would take longer than
migrating code to use the new feature.

As I said in my reply to Rowan, making it easy to track down issues
where they occur, minimises the cost of using this feature over the
years it would be used.

cheers
Dan
Ack


# Before

$query = "select * from preferences user_id = " . $_GET['user_id'];

db_exec($query);

# After

$query[] = "select * from preferences user_id = ";
$query[] = $_GET['user_id'];

db_exec($query);

And the function db_exec would be changed to something like:

function db_exec(string|array $query_parts)
{
if (is_string($query_parts) && !is_literal($query_parts)) {
throw new \Exception("Cannot use non literal string as query.
Please pass the parts in as an array");
} else {
foreach ($query_parts as $query_part) {
if (is_string($query_part) && !is_literal($query_part) {
throw new \Exception("non-literal string found [$query_part]");
}
else if (is_int($query_part)) {
// todo - decide if you want to allow this or not.
// I personally wouldn't.
}
else {
// todo - support other types
}
}

$query = implode("", $query_parts);
}

// rest of db_exec here...
}

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-11 Thread Dan Ackroyd
Rowan Tommins  wrote:

> I still don't follow this reasoning, for the reasons I outlined here:
> ...
> Whether "$foo . $bar" is always non-literal, or non-literal only if one
> of its operands is, you're going to get an error about a non-literal
> string somewhere else in the program, and have to trace back to find
> where the "bad" concatenation happened.


There are two differences. The first is "even if there is a problem,
you at least can find the relevant code".

With string concatenation preserving literal flags, when a problem is
detected, the situation is: "There is a non-literal somewhere in this
chunk of HTML. Good luck finding it in the string, and then figuring
out where it comes from in the code."

With having to jump through the hoop of using a specific function to
preserve the literal flag, the situation is: "This line of code is
wrong. It needs to be changed to use escaping."

To be precise, you wouldn't _have_ to trace back to find where the
non-literal value comes from. At the point where the error is you
would probably just change it to use the appropriate escaping. e.g. if
this line of fails, because the color suddenly becomes a non-literal:

literal_concat("");

you don't need to figure out where the color is coming from, you can
just change it to use a function that does the appropriate escaping:

html("", $up->getColor());

In general, any use of literal_concat() where the correctness of it
isn't obvious from reading the code, should probably start a
conversation of "have you considered defaulting to using
escaping/prepared query here"?

# It prevents the problem reaching an end-user.

The solution at Google prevents any security problem from reaching the
end-user by refusing to compile code.

One of the problems with PHP is that it is 'normal' for logs to be
filled with warnings...which means that most people just ignore them.

By allowing security problems to get through, rather than defaulting
to being an error means that they will reach end-users, and the
warning messages will end up in log files, and the process for finding
and fixing them will take more time than people will like.

So instead of defaulting to safe, it will default to being ignorable.
And the whole point (in my understanding) of Chris Kern's talk, is
that you need to make software safe by default.


> What it won't do, is tell you when you've forgotten to use it,

Which is a huge difference.

If you're aware that you're touching a security sensitive bit of code,
you're likely to do the right thing anyway, and so won't need the
crutch of is_literal.

If you're unaware that you're touching a security sensitive bit of
code, you're more likely to accidentally include user input where it
isn't meant to be used.

That means the feature would be annoying for the people who need it most.

Allowing errors to get through to users, for them to be allegedly
fixed at a later date is referred to as "Normalization of deviance"
and isn't a good choice for this RFC, imo.

cheers
Dan
Ack


* problems with static analysis as a solution

** Not everyone runs it, and it would be nicer to allow wordpress et
al to provide this security threat detection rather than hoping users
decided to use static analysis.

** It might require libraries also using the same static analysis annotations.

** It might get quite tricky to implement correctly, in particular for
functions where the literal-ness of the output depends on the
literal-ness of the input.

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-09 Thread Craig Francis
On Fri, 9 Jul 2021 at 11:49, Lauri Kenttä  wrote:

> is there a compelling reason why the internal names couldn't be marked
> literal
> (even if it's technically "wrong")?



Hi Lauri,

Just again quickly noting we’re only talking about ~0.43% difference,
nothing major in any way.

But we wanted to ensure developers didn't wonder why something was seen as
a literal in one case, but not in another (we wanted to ensure
consistency), because most people don’t know much about how the internals
of PHP work - especially optimisations which are supposed to be kind of
‘invisible’ to developers. This change also helped when a couple of other
compiler optimisations happen:

https://wiki.php.net/rfc/is_literal#compiler_optimisations

Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-09 Thread Lauri Kenttä

On 2021-07-07 12:23, Nikita Popov wrote:

permanent interned strings, which will be predominantly (or entirely?)
non-literal. At least it doesn't look to me like names of internal
functions/classes/etc are considered literal. I think this may be
problematic, in that now the permanent non-literal interned string in 
the

function/class tables will be different from the per-request literal
interned string used by user code to reference it.


Just wondering, if this is a major performance problem, is there a 
compelling reason why the internal names couldn't be marked literal 
(even if it's technically "wrong")? It seems unlikely to get a function 
or class name accidentally in a SQL query and even less likely that user 
input was involved.


--
Lauri Kenttä

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-08 Thread Máté Kocsis
Hi Nikita,

I performed a few other benchmarks in order to provide a little bit more
insights into the performance aspect of the RFC. My latest measurement is
using the same setup as the previous
one, although I made a few smaller changes in the process (e.g. changing
the order of the tests). So at the end, I got the following results:

- Laravel demo app: +0.1%
- Symfony demo app: +0.43%
- bench.php: +0.4%
- custom concat bench: +3.89%

The results are comparing the "literals" branch with the commit in master
on which Joe's work is based. As far as I remember, the simpler variant of
is_literals had a 0.1% slowdown
in case of Symfony, and 2.09% in case of the custom concat benchmark.
Personally, I think even the current numbers are acceptable, especially
because PHP 8.1 is going to be
roughly 30% faster than PHP 8.0 according to my benchmark in case of the
Symfony demo app. Laravel's result is very similar, but I don't remember
about the exact numbers.

Regards:
Máté


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-07 Thread Joe Watkins
The perf numbers are for current implementation.

Predictability seems more important than simplicity, considering what the
feature is for.

The only way to make the simpler implementation predictable in the same
kind of way was found to be unacceptable during discussion.

Cheers
Joe

On Wednesday, 7 July 2021, Nikita Popov  wrote:

> On Mon, Jul 5, 2021 at 8:15 PM Craig Francis 
> wrote:
>
> > Hi Internals,
> >
> > I have opened voting on https://wiki.php.net/rfc/is_literal for the
> > is-literal function.
> >
> > The vote closes 2021-07-19
> >
> > The proposal is to add the function is_literal(), a simple way to
> identify
> > if a string was written by a developer, removing the risk of a variable
> > containing an Injection Vulnerability.
> >
> > This implementation is for literal strings ONLY (after discussion over
> > allowing integers) and, thanks to the amazing work of Joe Watkins, now
> > works fully with compiler optimisations, interned strings etc.
> >
>
> The new implementation, while being very predictable, also comes at a cost.
> The RFC doesn't really explain how this works, so let me explain it here
> (based on a cursory look through the patch):
>
> Strings have a new flag that indicates whether they are "literal". The
> problem is that PHP performs string interning, which means that certain
> strings, mostly those seen during compilation of PHP code, are
> deduplicated. If you write "foo" two times in a piece of PHP code, there
> will only be one interned string "foo". Now, what happens if there is both
> a literal string "foo" and a non-literal string "foo" and we want to intern
> both?
>
> I believe what the original implementation did is to just assume that all
> interned strings are literal strings (this is an educated guess, I don't
> know where to find the old implementation). This makes things technically
> very simple, and is a pretty sensible assumption in practice. The reason is
> that interned strings are almost always derived from literal strings in the
> program. The main exception to that are a) single character strings, where
> a lot of code will prefer fetching an interned string rather than
> allocating a new one and b) the result of optimizations in conjunction with
> opcache, which will render strings like $a=1; $b="Foo".$a; interned as a
> result of constant propagation.
>
> What the new implementation does is to actually allow two separate interned
> strings for "foo"(literal) and "foo"(not literal). Part of the hashtable
> implementation needs to be duplicated, because it now needs to distinguish
> strings that are nominally equal.
>
> The complexity involved here doesn't look terrible, but I'm unsure about
> the performance and memory usage impact. Where per-request strings are
> concerned, I expect duplication to be low, because most per-request
> interned strings are going to be literal. However, I don't think this holds
> for permanent interned strings, which will be predominantly (or entirely?)
> non-literal. At least it doesn't look to me like names of internal
> functions/classes/etc are considered literal. I think this may be
> problematic, in that now the permanent non-literal interned string in the
> function/class tables will be different from the per-request literal
> interned string used by user code to reference it. This means that all of
> these are going to be duplicated, and will no longer benefit from efficient
> identity comparison, and will have to be compared based on string contents
> instead.
>
> I'm not sure what the actual impact on performance would be. The RFC
> indicates that the impact is minor, but I believe those measurements were
> made with the original version of the RFC, which did not try to distinguish
> literal and non-literal interned strings.
>
> Overall, this is a no vote from my side. While I was not entirely convinced
> by the RFC, I was willing to accept the original approach based on sheer
> simplicity -- tracking the "probably literal" flag didn't really cost us
> anything in terms of either technical complexity or performance. With the
> new implementation this isn't the case anymore. I could be convinced that
> the technical implications here are unproblematic based on further
> analysis, but the current RFC doesn't really discuss this point at all.
>
> Regards,
> Nikita
>


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-07 Thread Craig Francis
On Wed, 7 Jul 2021 at 10:23 am, Nikita Popov  wrote:

> The RFC indicates that the impact is minor, but I believe those
> measurements were made with the original version of the RFC, which did not
> try to distinguish literal and non-literal interned strings



Hi Nikita,

The performance test results in the RFC are based on the latest version,
updated on Monday (before the vote started), Máté Kocsis did these tests
independently to ensure there was no bias from me (my own tests were
roughly the same).

Joe Watkins would be better placed to explain the details of the
implementation.

We have always tried to keep the implementation as simple as possible, but
there were concerns that certain optimisations by the compiler would make
certain strings appear as literals, not in an unsafe way, but in a way that
might be confusing to developers, with the examples being noted in the RFC:

https://wiki.php.net/rfc/is_literal#compiler_optimisations

Thanks,
Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-07 Thread Nikita Popov
On Mon, Jul 5, 2021 at 8:15 PM Craig Francis 
wrote:

> Hi Internals,
>
> I have opened voting on https://wiki.php.net/rfc/is_literal for the
> is-literal function.
>
> The vote closes 2021-07-19
>
> The proposal is to add the function is_literal(), a simple way to identify
> if a string was written by a developer, removing the risk of a variable
> containing an Injection Vulnerability.
>
> This implementation is for literal strings ONLY (after discussion over
> allowing integers) and, thanks to the amazing work of Joe Watkins, now
> works fully with compiler optimisations, interned strings etc.
>

The new implementation, while being very predictable, also comes at a cost.
The RFC doesn't really explain how this works, so let me explain it here
(based on a cursory look through the patch):

Strings have a new flag that indicates whether they are "literal". The
problem is that PHP performs string interning, which means that certain
strings, mostly those seen during compilation of PHP code, are
deduplicated. If you write "foo" two times in a piece of PHP code, there
will only be one interned string "foo". Now, what happens if there is both
a literal string "foo" and a non-literal string "foo" and we want to intern
both?

I believe what the original implementation did is to just assume that all
interned strings are literal strings (this is an educated guess, I don't
know where to find the old implementation). This makes things technically
very simple, and is a pretty sensible assumption in practice. The reason is
that interned strings are almost always derived from literal strings in the
program. The main exception to that are a) single character strings, where
a lot of code will prefer fetching an interned string rather than
allocating a new one and b) the result of optimizations in conjunction with
opcache, which will render strings like $a=1; $b="Foo".$a; interned as a
result of constant propagation.

What the new implementation does is to actually allow two separate interned
strings for "foo"(literal) and "foo"(not literal). Part of the hashtable
implementation needs to be duplicated, because it now needs to distinguish
strings that are nominally equal.

The complexity involved here doesn't look terrible, but I'm unsure about
the performance and memory usage impact. Where per-request strings are
concerned, I expect duplication to be low, because most per-request
interned strings are going to be literal. However, I don't think this holds
for permanent interned strings, which will be predominantly (or entirely?)
non-literal. At least it doesn't look to me like names of internal
functions/classes/etc are considered literal. I think this may be
problematic, in that now the permanent non-literal interned string in the
function/class tables will be different from the per-request literal
interned string used by user code to reference it. This means that all of
these are going to be duplicated, and will no longer benefit from efficient
identity comparison, and will have to be compared based on string contents
instead.

I'm not sure what the actual impact on performance would be. The RFC
indicates that the impact is minor, but I believe those measurements were
made with the original version of the RFC, which did not try to distinguish
literal and non-literal interned strings.

Overall, this is a no vote from my side. While I was not entirely convinced
by the RFC, I was willing to accept the original approach based on sheer
simplicity -- tracking the "probably literal" flag didn't really cost us
anything in terms of either technical complexity or performance. With the
new implementation this isn't the case anymore. I could be convinced that
the technical implications here are unproblematic based on further
analysis, but the current RFC doesn't really discuss this point at all.

Regards,
Nikita


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-06 Thread Dik Takken
On 05-07-2021 20:14, Craig Francis wrote:
> Hi Internals,
> 
> I have opened voting on https://wiki.php.net/rfc/is_literal for the
> is-literal function.

I am glad to see that the RFC eventually turned out as originally
proposed. It is simple and provides clear and strong guarantees about
the origins of string data.

While it has its limitations I do see the potential it has as a building
block for building strong security guarantees in applications.

Success of this feature probably depends on it being integrated in
frameworks. If I had a vote I would vote along with framework authors.

Regards,
Dik Takken

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-06 Thread Craig Francis
On Tue, 6 Jul 2021 at 7:38 am, G. P. B.  wrote:

> Although I think the idea of the feature is useful,
> I'm not so sure about the implementation.
> [...]
> Whereas using a function like concat_literal() which checks that the
> inputs are indeed literals provides immediate feedback that the type
> constraint is not being violated.




Hi George,

Thank you for your message.

We have provided a userland `literal_concat()` function in the RFC to do
exactly what you’re suggesting, while allowing developers to choose to do
things like raise exceptions during development/testing, and ignore/log
issues when running in production. So you absolutely can use it like that
if you want.
https://wiki.php.net/rfc/is_literal#support_functions

We also agree that a dedicated type would be useful, but as noted by Joe
and someniatko, that should come in 8.2 once the function is established
(allowing us to potentially build on Intersection Types, and will involve a
separate discussion). This is noted under "Future Scope".
https://externals.io/message/114835#114847

The only difference is that we decided to allow string concatenation of
literals, as we want to provide something that’s usable for everyone
immediately, provides the same level of security, and doesn’t require a
mass rewriting of existing code. (Which is basically a death sentence for
most security-based improvements, as a lot of people won’t have the
time/energy to do that. The more automatic security can be, the better,
which is why something libraries can implement is ideal).
https://wiki.php.net/rfc/is_literal#string_concatenation

Excluding concatenation would almost certainly prevent libraries from using
this check, simply because developers do use concatenation, which will
result in too many invalid errors, requiring them to make
substantial/unnecessary changes (i.e. replacing every string concat with
`literal_concat()`, or using a special Query Builder for their
SQL/HTML/CLI/etc).

It’s also worth noting that developers who want to use `strict_types` are
probably using static analysis already, where Psalm has just added support
for this (thank you Matthew):
https://github.com/vimeo/psalm/releases/tag/4.8.0

Thanks,
Craig


Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-06 Thread Rowan Tommins

On 06/07/2021 07:38, G. P. B. wrote:

This is I think the main issue with the current shape of the proposal.
This implementation will detect certain security issues, but finding the
root cause for them is going to be rather complicated, as the concatenation
operation is basically kicking the can down the road about the
responsibility of checking whether or not the result is a literal.
Whereas using a function like concat_literal() which checks that the inputs
are indeed literals provides immediate feedback that the type constraint is
not being violated.



I still don't follow this reasoning, for the reasons I outlined here: 
https://externals.io/message/114835#114868 and again later: 
https://externals.io/message/115037#115156


You can write your own concat_literal() function with the current 
implementation:


function concat_literal(string $a, string $b): string {
   if( ! is_literal($a) || ! is_literal($b) ) {
    throw new TypeError;
   }
   return $a . $b;
}

This does everything a native version would, and can be used in all the 
same places.


What it won't do, is tell you when you've forgotten to use it, and used 
the normal string concatenation operator - but nor would a built-in 
implementation.


Whether "$foo . $bar" is always non-literal, or non-literal only if one 
of its operands is, you're going to get an error about a non-literal 
string somewhere else in the program, and have to trace back to find 
where the "bad" concatenation happened.


Regards,

--
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [VOTE] is_literal

2021-07-06 Thread G. P. B.
On Mon, 5 Jul 2021 at 20:15, Craig Francis  wrote:

> Hi Internals,
>
> I have opened voting on https://wiki.php.net/rfc/is_literal for the
> is-literal function.
>
> The vote closes 2021-07-19
>
> The proposal is to add the function is_literal(), a simple way to identify
> if a string was written by a developer, removing the risk of a variable
> containing an Injection Vulnerability.
>
> This implementation is for literal strings ONLY (after discussion over
> allowing integers) and, thanks to the amazing work of Joe Watkins, now
> works fully with compiler optimisations, interned strings etc.
>
> Craig
>

Hi Craig,

Although I think the idea of the feature is useful,
I'm not so sure about the implementation.

I watched the talk you referenced a couple of times at different points in
time (the first being a couple of years back), and I fail to see how this
RFC is a similar implementation to it. As how they do it at Google is to
have it part of the type systems (arguably in a weird way but nonaless),
and due to the language being compiled the compiler will just flat out
refuse to produce an executable if the types mismatch.

>From my understanding, the RFC's implementation is similar to what Google
does, which is to "annotate" the string, but without having the guarantees
of a compiler to back it up. This approach is totally reasonable for static
analysis, as running it is akin to the compilation step in checking the
validity.
However, having this approach built into the language itself seems rather
problematic to me.
Ideally we would want to assign a variable to be of 'literal' type to
ensure none of the actions applied to it demote it from being a literal,
and when such a demotion would occur, for it to TypeError.
Due to PHP's nature we cannot do this (yet?), therefore overloading the
concatenation operation seems rather unwise.
The case where concatenation between a literal and a non-literal happens,
without error, is very similar to passing around a nullable type until one
function/method/property doesn't accept null where it blows up into your
face, and you need to track down where on earth did the null value came
from, which might be multiple calls prior.
And there has been a hell of a lot of talks/articles/etc. about *not* using
nullable types due to this issue.

This is I think the main issue with the current shape of the proposal.
This implementation will detect certain security issues, but finding the
root cause for them is going to be rather complicated, as the concatenation
operation is basically kicking the can down the road about the
responsibility of checking whether or not the result is a literal.
Whereas using a function like concat_literal() which checks that the inputs
are indeed literals provides immediate feedback that the type constraint is
not being violated.

Due to this reason, I'm voting against this proposal.

Best regards,

George P. Banyard