Re: [PHP-DEV] Proposal: clamp

2021-06-22 Thread Marco Pivetta
Hey Kim

On Wed, Jun 23, 2021, 02:25 Kim Hallberg  wrote:

> This function is very easy to implement, has no side effects or backward
> compatibility issues, unless a userland implementation already has the
> function declared,
>

I like the function, the name, and its simplicity.

The problem is exactly the fact that it is trivial to implement in
userland: why not do it there instead?

>


[PHP-DEV] Proposal: clamp

2021-06-22 Thread Kim Hallberg
Greetings internals,

I present to you a proposal for a new basic math function: clamp.

The main goal of this function is to contain a number inside a given bound.
And return the number if it is inside of that bound, and if not, and the 
number is outside of the given bound, the nearest bound is returned.

Now, does this need an RFC? The behaviour can be implemented in userland 
in a number of ways, one of which is to compare the values in a if statement.
Another is to use the min and max math functions.

max($min, min($max, $num));

I don’t have any schpeel as to why this should be in core, only that I thought 
it already was.
Considering it is part of other languages standard math library, and it is a 
useful function.

Can see this being useful in image manipulation, server-side rendering and math 
just to name few.

The proposed signature follows the convention of other math related functions.

function clamp(int|float $num, int|float $min, int|float $max): int|float {}

This function is very easy to implement, has no side effects or backward
compatibility issues, unless a userland implementation already has the function 
declared,
in which case PHP will throw a fatal error since the function cannot be 
redeclared.

I've implemented this feature a few months ago already.
- Link: https://github.com/thinkverse/php-src/pull/1 


What are your opinions on this function?

Regards,
Kim Hallberg.


Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Moritz Friedrich
> Am 22.06.2021 um 20:38 schrieb Stephen Reay :
> 
> It took me about a minute to think of this:
> 
> "select * from customer_purchases where {$column} = :value”. 
> 
> The developer inadvertently passes the same “trusted value” in as the 
> `$column` substitute and the value parameter. It must be safe because we ran 
> it through `is_trusted`!
> 
> The query now executes as:
> 
> "select * from customer_purchases where 12345 = 12345”
> 
> 
> You cannot magically make all dynamically generated queries safe - they tried 
> that about a quarter of a century ago. Hint: it did not end well - and 
> explicitly allowing some user input is just mind boggling given the stated 
> goals.

There is a difference between “safe”, as in “safe to insert anywhere in a 
query”, and “trusted”, as in “trusted to not originate from user input”. 

Untrusted input is well understood and documented as a concept. PHP provides 
enough foot guns to choose from if you like taking function names too 
literally. 

This discussion focuses way too much on possible abuse by application 
developers, while it is actually a low-level feature for library authors which 
I trust very well to use it appropriately, and with opt-in configuration.

Regards
Moritz


-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] Deprecate boolean to string coercion

2021-06-22 Thread tyson andre
Hi George P. Banyard,

> With Ilija we are proposing a short RFC to deprecate coercion from bool to
> string:
> https://wiki.php.net/rfc/deprecate-boolean-string-coercion
> 
> As this is the final day for any RFC to be even able to land in PHP 8.1
> the voting is expected to start in two weeks on the 6th of July.
> 
> The implementation is yet to be done but is expected to be rather
> straightforward and finished within the week.

I'd agree any casts from booleans to strings are usually a bug in the 
application

Something I'd like to see in the rfc: What's the intended behavior (notices) of 
`sprintf('%s', false);` (functions internally casting to strings)
What about `echo true; print(false);`, etc.

Thanks,
Tyson
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] [RFC] Deprecate boolean to string coercion

2021-06-22 Thread G. P. B.
Hello Internals,

With Ilija we are proposing a short RFC to deprecate coercion from bool to
string:
https://wiki.php.net/rfc/deprecate-boolean-string-coercion

As this is the final day for any RFC to be even able to land in PHP 8.1
the voting is expected to start in two weeks on the 6th of July.

The implementation is yet to be done but is expected to be rather
straightforward and finished within the week.

Best regards,

George P. Banyard


[PHP-DEV] Re: Add mysqi_stmt->query? (Was Sql Object Model Parser & Sanitizer)

2021-06-22 Thread Kamil Tekiela
  Sounds good to me. Looking forward to your PoC.


[PHP-DEV] Add mysqi_stmt->query? (Was Sql Object Model Parser & Sanitizer)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 12:54 PM, Kamil Tekiela  wrote:
> While it's true that a lot of the internet is using mysqli due to WordPress, 
> this doesn't change the fact that PHP already offers a solution to the 
> problem. Both PDO and mysqli support server-side prepared statements. 
> 

Hi Kamil,

Clearly for whatever reason your arguments and Larry's arguments are 
misunderstanding and thus misrepresenting what I was proposing.  But given how 
long our respective emails are getting I do not have confidence we will get to 
clarification if we continue, so I am going to suggest we bring the original 
point of this thread to a close.  

I will just have to implement a proof-of-concept so that what I have been 
proposing will not be so hard to misunderstand.

-Mike
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Stephen Reay


Sent from my iPhone

> On 23 Jun 2021, at 03:08, Lauri Kenttä  wrote:
> 
> On 2021-06-22 21:38, Stephen Reay wrote:
 On 22 Jun 2021, at 21:38, Craig Francis  wrote:
>>> If you can point me to an example where including integers in this has
>>> introduced a security vulnerability then please do, and I mean it, that’s
>>> what this process is for, I genuinely want people to come forward with them
>>> so we can refine this.
>> It took me about a minute to think of this:
>> "select * from customer_purchases where {$column} = :value”.
>> The developer inadvertently passes the same “trusted value” in as the
>> `$column` substitute and the value parameter. It must be safe because
>> we ran it through `is_trusted`!
>> The query now executes as:
>> "select * from customer_purchases where 12345 = 12345”
>> You cannot magically make all dynamically generated queries safe -
>> they tried that about a quarter of a century ago. Hint: it did not end
>> well - and explicitly allowing some user input is just mind boggling
>> given the stated goals.
> 
> 
> Your example is interesting and kind of valid. Looks like there is a bug in 
> the code: $column is sometimes an user-defined integer and sometimes a valid 
> literal column name, clearly this should not happen. (If it was supposed to 
> be integer, you would pass it as a parameter just like :value, right?)
> 
> Is this RFC about preventing bugs (accidentally used wrong variable) or 
> preventing bad code (user input intentional but used the wrong way)? I 
> thought it was more the about bad code. Bugs can live even in literal queries 
> as well as outside queries, where is_literal/is_trusted can't reach them.
> 
> Another line of thought: One possible approach would be to accept only 
> explicit integer casts (sprintf %d and %u, and a new function like 
> implode_ints() and/or strval_int()) and otherwise only accept strings. This 
> would avoid the accidental case where $x is supposed to be a trusted string 
> but is an untrusted integer instead, like the given example.
> 
> -- 
> Lauri Kenttä
> 

Yes it very deliberately shows a case where developer error can lead to 
incorrect behaviour, because of a bug. I don’t think it’s a stretch to say that 
not validating the dynamic elements used to generate a query would also be 
considered a bug caused by developer error.

Given that this is meant to be a security feature, an approach of “well this is 
probably fine” seems like a pretty cavalier approach to start from, especially 
given that there are existing solutions to prevent injections of integer values.

Rather than people keep repeating “integers aren’t injections”, perhaps someone 
can show just one example why you’d need to use a literal integer as an 
identifier in sql (ie a table name etc) and thus can’t use parameterisation.

Re: [PHP-DEV] [RFC] Name issue - is_literal/is_trusted

2021-06-22 Thread Kamil Tekiela
FWIW, I would prefer is_literal, but without integers in scope. Modern code
is often type hinted. I would expect that a lot of libraries would accept
only strings and then check whether it's a literal string. I don't think
accepting integers in scope increases security or improves ease of use.
However, I am more concerned about other things that have been mentioned in
this thread and the overall disagreement. I'd rather not rush this change.
Also, I am concerned about the performance of applications that are heavy
on string processing. Although I haven't tested this yet myself.

--Kamil


Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Lauri Kenttä

On 2021-06-22 21:38, Stephen Reay wrote:
On 22 Jun 2021, at 21:38, Craig Francis  
wrote:


If you can point me to an example where including integers in this has
introduced a security vulnerability then please do, and I mean it, 
that’s
what this process is for, I genuinely want people to come forward with 
them

so we can refine this.



It took me about a minute to think of this:

 "select * from customer_purchases where {$column} = :value”.

The developer inadvertently passes the same “trusted value” in as the
`$column` substitute and the value parameter. It must be safe because
we ran it through `is_trusted`!

 The query now executes as:

 "select * from customer_purchases where 12345 = 12345”


You cannot magically make all dynamically generated queries safe -
they tried that about a quarter of a century ago. Hint: it did not end
well - and explicitly allowing some user input is just mind boggling
given the stated goals.



Your example is interesting and kind of valid. Looks like there is a bug 
in the code: $column is sometimes an user-defined integer and sometimes 
a valid literal column name, clearly this should not happen. (If it was 
supposed to be integer, you would pass it as a parameter just like 
:value, right?)


Is this RFC about preventing bugs (accidentally used wrong variable) or 
preventing bad code (user input intentional but used the wrong way)? I 
thought it was more the about bad code. Bugs can live even in literal 
queries as well as outside queries, where is_literal/is_trusted can't 
reach them.


Another line of thought: One possible approach would be to accept only 
explicit integer casts (sprintf %d and %u, and a new function like 
implode_ints() and/or strval_int()) and otherwise only accept strings. 
This would avoid the accidental case where $x is supposed to be a 
trusted string but is an untrusted integer instead, like the given 
example.


--
Lauri Kenttä

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] Name issue - is_literal/is_trusted

2021-06-22 Thread Derick Rethans
On 22 June 2021 19:11:05 BST, Craig Francis  wrote:
>Hi Internals,
>
>As the name `is_trusted()` seems to be causing contention, I think more
>than the alternative option would. Since we want to get this right, and
>we
>still have time before the feature freeze, this might be worth
>re-looking
>at. Particularly if you are one of those unsure about it, read on.

Just changing the name to something not misleading isn't going to change my 
opinion about voting for this. It's grown into a concoction of different things 
than the original is_literal, that just checks a single string for its 
literalness. 

cheers, 
Derick 

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] Name issue - is_literal/is_trusted

2021-06-22 Thread Craig Francis
On Tue, 22 Jun 2021 at 20:38, Mel Dafert  wrote:

> Another idea - `is_internal()`, since it is not external code, and
> internal would be the
> opposite of external.



Unfortunately, because we cannot record internal vs external integers (too
big of a change to how integers are stored), we are currently allowing all
integers, as that helps adoption, without affecting security.

Craig


Re: [PHP-DEV] [RFC] Name issue - is_literal/is_trusted

2021-06-22 Thread Mel Dafert
>- `is_trusted()` - suggested by Moritz and separately by Mike, was the
>second option in the original vote, and was based on the idea that what is
>returned can be 'trusted' to be free from external code.

Another idea - `is_internal()`, since it is not external code, and internal 
would be the
opposite of external.
Not quite sure how helpful/realistic this one is, but I wanted to bring it up 
either
way.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Stephen Reay


> On 22 Jun 2021, at 21:38, Craig Francis  wrote:
> 
> If you can point me to an example where including integers in this has
> introduced a security vulnerability then please do, and I mean it, that’s
> what this process is for, I genuinely want people to come forward with them
> so we can refine this.


It took me about a minute to think of this:

 "select * from customer_purchases where {$column} = :value”. 

The developer inadvertently passes the same “trusted value” in as the `$column` 
substitute and the value parameter. It must be safe because we ran it through 
`is_trusted`!

 The query now executes as:

 "select * from customer_purchases where 12345 = 12345”


You cannot magically make all dynamically generated queries safe - they tried 
that about a quarter of a century ago. Hint: it did not end well - and 
explicitly allowing some user input is just mind boggling given the stated 
goals.





[PHP-DEV] [RFC] Name issue - is_literal/is_trusted

2021-06-22 Thread Craig Francis
Hi Internals,

As the name `is_trusted()` seems to be causing contention, I think more
than the alternative option would. Since we want to get this right, and we
still have time before the feature freeze, this might be worth re-looking
at. Particularly if you are one of those unsure about it, read on.



The name `is_trusted()` was chosen by a community vote over several days.
While I’m of a similar opinion that "trusted" might be misleading for some
in the strength of its word, I do not want to simply override 18 of the 21
people, who I assume read the RFC, asked questions to clarify on the
mailing list, understood how it works, and have chosen that name.

However, clearly some people missed the vote and its discussion time, and
some voted but then perhaps wanted clarifying on what the RFC was fully
about later. If we say that's about five people, then assuming there is a
larger audience who reads but does not post (as the voting numbers
indicated) then I'm inclined to guesstimate that maybe that means 3x the
number of people share those feelings. And with that number it starts to
feel like maybe we should double-check here.

While a one-word name is always going to be misunderstood by some people,
we want to be as clear as possible.

The Function:
- Is a security-based function that prevents Injection Vulnerabilities in
PHP.
- Flags strings written by the developer, including when concatenated.
- Also accepts integer values, which as purely numerical cannot contain
code/dangerous characters. (Due to technical limitations within PHP, it's
not possible for these to be flagged as user or developer in the codebase
itself without performance issues).

(RFC for full breakdown: https://wiki.php.net/rfc/is_literal)

Ideally we want a one-word name that suggests this as best we can - one
word to be consistent with other `is_*()` functions.

- `is_literal()` was my original placeholder name. However, given that it's
not just literal strings, but also supports concatenation and integers I
felt it may be misleading with the definition of 'literal' stretched so far
it might get confusing, and is why I didn't include my original name for it
in the poll. However, if you feel it would be more accurate my mind isn't
fixed on it.

- `is_known()` - suggested by Joe, who created the implementation, was one
of two options in the original vote, and was based on the principle that
the value be 'known' to the developer to be free from external code and be
within a 'known' understanding of the values that should be going in it.

- `is_trusted()` - suggested by Moritz and separately by Mike, was the
second option in the original vote, and was based on the idea that what is
returned can be 'trusted' to be free from external code.

I suggest that people who are serious in their feelings about this, offer
the name that they would prefer (including potentially making one
themselves that fits the RFC content and style mentioned above) so we can
assess whether the current name needs a second look.

Thanks,
Craig


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Kamil Tekiela
This is open to SQL injection:

$queryBuilder
->select(...$_GET['columns'])
->from($_GET['table'])
->where($_GET['where'])
;

All below statements produce 42. This is valid SQL:

SELECT `42 FROM TABLE`() FROM dual;
SELECT `⠀` FROM `⠀`;
SELECT * FROM "42"; -- With ANSI_QUOTES
SELECT * FROM "";

This is valid in MySQL:

VALUES ROW(42)

This is valid in MariaDB:

VALUES (42);

This is not a valid SQL:

SELECT * FROM "\"\"";

There are also windows functions, CTE, Stored procedures, and a bunch of
new features.


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Larry Garfield
On Tue, Jun 22, 2021, at 11:46 AM, Mike Schinkel wrote:

> Maybe this will help. There are a billion XML schemas, but DomDocument 
> and its related classes can process them all. A SqlObjectModel would be 
> similar; it would know how to process text queries where the dialect 
> interface implementors would be the equivalent of the XML schema.  
> 
> It is not a perfect analogy, but it is close enough.

It's a very helpful analogy, actually.  SQL is 100x more complex of a language 
than XML is.  XML is trivial by comparison.  That's exactly why what you 
describe is vastly harder than you think it is, even if you limit yourself to 
the rudimentary ancient SQL standards.

And a tool that breaks down and becomes unusable as soon as you start using 
anything even vaguely modern is a tool I would rather *not* exist, quite 
frankly, as it only further discourages anyone learning SQL beyond MySQL 3.

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Kamil Tekiela
While it's true that a lot of the internet is using mysqli due to
WordPress, this doesn't change the fact that PHP already offers a solution
to the problem. Both PDO and mysqli support server-side prepared
statements.

We don't talk about WordPress. They should not hold back PHP. That project
is so behind that they still offer mysql_* API in their abstraction
library. Nonetheless, whatever solution we offer, they're unlikely to pick
it up. They haven't used prepared statements for whatever reason, and they
created their own client-side prepared statements. What makes you think
that if we write "Parser & Sanitizer", they're going to start using it? The
proposal for `is_literal` is going to be completely useless to WordPress as
they have no place to use it in their codebase. It's a pity we can't help
them, but it's true.

To clarify the proposal from Craig, I would like to point out that
`is_literal()` will not help with the data being passed to SQL. User input
cannot be trusted in any sense, no matter where it comes from. The proposal
for `is_literal()` helps libraries that try to build the SQL dynamically,
without the user input. For example, it would help to write a secure method
like this:

function select(mysqli $mysqli, string $table) {
if(!is_literal($table)) {
throw new RuntimeException('The table name has to be a hardcoded string');
}

return $mysqli->query("SELECT * FROM {$table}")->fetch_all();
}

This code would prevent the developer from misusing this function by
passing user input to it.

>That is why the title of this email said "Parse and Sanitizer" not
"Builder."

Then we definitely don't need that. Both PDO and mysqli can escape string
literals in a context aware manner (to some extent at least). PDO tried to
implement a parser, but it proved challenging and to this day doesn't work
correctly. As Larry explained, SQL is not a standardised language anymore.
The flavours diverged and created a widely inconsistent language.

To build a context-aware parser for SQL, you would need to know:

- The SQL flavour
- The database type (MySQL or MariaDB)
- The database version
- The settings maintained on that database server

For example, in MySQL, you need to know which character is used to escape
quotes in string literals, and you need to know the SQL_MODE. Then you need
to know which syntax is allowed and understood by that specific MySQL
version. The SQL syntax differs a lot between MySQL and MariaDB. Building,
parsing and compiling the SQL in PHP would involve rebuilding the same
parser that MySQL server is using. And you'd have to do it for every single
possible database vendor/version. On top of that, you need to have direct
access to the database/table/column/SP/functions names, so that your parser
can apply the business logic. Not to mention escaping the data if you don't
want to use prepared statements.

>OTOH, what about when a developer needs to parameterize field and table
names in PDO

You don't parameterize field and table names. Period. You hardcode these
names in your business logic. Here's an example from DBAL:

$queryBuilder
->select('id', 'name') // <-- Hardcoded
->from('users') // <-- Hardcoded
->where('email = ?') // <-- Hardcoded
->setParameter(0, $userInputEmail) // <-- Parameterized
;

Irrelevant of which parameterization technique is used here, the idea is
that you never parameterize SQL code. You parameterize the data. WordPress
does exactly the same with their prepare() method. If you need to have
dynamic field identifiers, then it's your job as a developer to write the
code that can safely handle such requirement.

A native PHP "SQL Object Model with parser and sanitizer functionality"
wouldn't prevent SQL injection in the same way that userland DB abstraction
libraries don't. You can still create SQL injection with wordpress $wpdp
class, you can do it with DBAL, and you can do it with PDO.

To summarize, we are not looking for a solution to prevent SQL injection or
to help Wordpress. We are looking for a solution that would prevent the
misuse of libraries like DBAL, so that methods like select(), from(), and
where() could generate a warning when used with non-hardcoded data.

Regards,
Kamil

P.S. By "sanitization" I understand removing unwanted information from a
piece of data. SQL injection cannot be prevented by removing data. It can
only be prevented by strict separation of SQL and data, or by careful
formatting of the data when included in the SQL string itself. This
includes quoting, type casting and escaping of quotes. Sanitization cannot
prevent SQL injection!


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel


> On Jun 22, 2021, at 11:41 AM, Larry Garfield  wrote:
> 
> On Tue, Jun 22, 2021, at 8:39 AM, Mike Schinkel wrote:
>>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
>>> 
>>> Hi Mike,
>>> 
>>> Please don't do this. We already have PDO with prepared statements. The 
>>> data must be bound. This is the secure way of writing SQL queries. 
>> 
>> The problem is that over 40% of the web currently runs on PHP code that 
>> using mysqli.  That CMS does not support PDO nor prepared statements, 
>> and is unlikely to switch to it anytime some in the foreseeable future. 
> 
> WordPress is not going to leverage anything we do here until and unless there 
> is a major change of leadership and culture at that project.  Please don't 
> waste any mental effort on it; they clearly waste no mental effort on what 
> the rest of the PHP community considers good, secure practices.  Anything 
> involving them is tilting at windmills.

You misunderstand.  What I am (likely) "wasting" my mental effort is to discuss 
features that I would be able to use WITHOUT WordPress having to make ANY 
changes.

HOWEVER, these would be beneficial beyond WordPress, since parameterized 
queries cannot parameterize table names nor field names. 

This is all trying to address the concerns that Craig Francis brought up off 
list when he said that you cannot escape or sanitize without knowing context 
when I was asking his to provide a make_literal() function, add support for a 
IsLiteral attribute, or support an IsLiteralInterface so that people don't 
latch on to using is_literal() and make certain edge cases impossible. 

When I am trying to address a problem, I come at it from many angles until I 
find a solution.  One potential solution then is to have a class built into PHP 
that can ensure the SQL returned is indeed safe. 

> Mike, speaking as someone who has written an SQL abstraction layer and query 
> builder with significant usage (Drupal 7-9), you are *grossly* 
> under-estimating the complexity of what you describe.  It might be possible 
> to hack together for SQL92, aka "what most PHP devs actually use because they 
> haven't noticed that it's not 1992 anymore", but that's already been done.  
> We have DBTNG in Drupal, we have Doctrine, problem solved.

I think what is happening here is you are making an assumption I am proposing a 
much larger scope than I am.

Think of the scope I am proposing being on par with $mysqli->prepare(), but a 
bit more to be able to handle more than just values.

> Modern SQL, though, is a stupidly complex and stupidly inconsistent beast.  
> Most of the syntax beyond the basics is different on every damned database.  
> The official spec *is not even publicly available*, and costs a lot of money 
> to access.  And no DB engine actually supports all of it; they all support 
> different subsets with their own different extensions that may or may not be 
> comparable.

Both Modern SQL and legacy SQL are both still text-based query languages  and 
they all have grammars that can be represented by BNF rules. 

https://github.com/ronsavage/SQL

Those rules could be abstracted into a form accessible via a "dialect" 
interface and that is how these would literally any version of SQL could be 
supported.

Could we finish all of a given dialect at once?  No.  Iteration based on what 
is found to be supported is how this could be approached.  Remember, these 
dialects could be implemented in userland. By any PHP developer. 

Could we ever get them to be perfect?  Probably not.  But they would be good at 
the start and then very good and there would like to be many people providing 
PRs to fix new edge cases for the dialects that get the most use.

> Building a tool that parses an arbitrary string to an AST for a spec that is 
> inconsistent, inaccessible, and not implemented correctly by anyone is a 
> fool's errand, and that's just the first part of it.  

That is not what I am proposing.  I am proposing to build a tool that knows how 
to parse based on a simplified grammar and then sanitize based on those rules.

There really are only a few contexts in a SQL query to be concerned about.  
Keywords, field names, table name, table aliases, and typed values (plus maybe 
a few other things I missed?) The grammars themselves can evolve independently 
as userland implementations of interfaces. This is not rocket science.

> That's not even getting into designing an API for people to modify it,

No API per se, an interface.  (Yes that is also an API, but not one that you 
call; instead it calls you.)

> or questions of performance,

Parsing a SQL query should not take nearly as long as executing the query 
itself, especially if the SqlObjectModel were written in C.

> or compiling the AST back into a DB-specific string

Sounds like you are envisioning something I am not.

> AND then doing parameter binding which varies from one database to another.

Again, that would be handled by the dialect class.

> You're 

Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel


> On Jun 22, 2021, at 11:56 AM, Pierre  wrote:
> 
> Le 22/06/2021 à 17:35, Mike Schinkel a écrit :
>> https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2050
> Sorry for the discussion pollution here but, but ouch, plugins are still 
> using this unsafe form ? Reminds when I was a student, I learnt to 
> parametrize queries there, it was 20 years ago. I never understood people not 
> doing that in the first place.
>> But also because of much legacy code exists in the form of plugins and 
>> themes that do not support parameterized queries.
> Yes I agree, legacy is legacy, you have to deal with it. But all legacy code 
> cannot be fixed, and doing a highly complex SQL parsing / escaping / 
> vulnerability detection code that explicitly targets legacy code and not 
> modern code seems weird to me.
>> HOWEVER, whether mysqli supports parameterised queries or not is all a moot 
>> point because parameterised queries do not allow for parameterizing field 
>> names or table names.  And the point of this thread was to discuss how to 
>> mark SQL that has been composed at run-time to be "safe."  Without being 
>> able to parameterize field names and table names parameterised queries are 
>> not a sufficiently complete solution.
> 
> Not being able to parametrize table or field names is not only a problem for 
> mysqli, but it is for PDO and pgsql too. That's a place where userland 
> query-builders and others DBALs, even the most basic ones do shine, and 
> brings a real added-value.
> 
> But having anyone, writing SQL with user-given table names or column names, 
> and executing it using something like WP's _do_query() method seems like they 
> *WANT* to be hacked.

Are you not familiar with PHPMyAdmin[1] and/or Adminer[2]?  

Do they, or anyone else who has a similar use-case want to be hacked?

> I'm not sure how you will succeed in plugging the is_trusted() / is_literal() 
> / is_wathever() method correctly in an SQL Model Parser & Sanitizer anyway, 
> knowing that at this point, all you'll receive is a huge string issued by 
> some plugin API which has already done crazy dark (and probably bugguy as 
> well) magic.
> 
> I don't see how adding magic in PHP core will avoid the need to fix all those 
> legacy plugins, they probably would need themselves to use this new shiny API 
> to benefit from it ? In the opposite, if something alters the behavior of 
> mysqli implicitly for everyone in order to make it safe, it sounds like there 
> will be a lot of BC ? In both case, it seems that it will not do any shiny 
> magic to me.

It would not affect *any* of those plugins by itself.  They would be left to 
fend for their own.

What it *would* do is allow those developers writing new sites and/or plugins 
or maintaining old ones who proactively choose to sanitize their SQL to be able 
to do so before they pass their SQL to $wpdb->query(), especially if the RFC 
is_trusted() and/or is_literal() passes.

> But, I might be wrong, this thread becomes harder and harder to read, and I 
> may have missed a few points.

It just started!  Note I created a new thread from the 
is_literal()/is_trusted() thread to talk about SQL.

-Mike

[1] https://www.phpmyadmin.net/
[2] https://www.adminer.org/




Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Pierre

Le 22/06/2021 à 17:35, Mike Schinkel a écrit :

https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2050
Sorry for the discussion pollution here but, but ouch, plugins are still 
using this unsafe form ? Reminds when I was a student, I learnt to 
parametrize queries there, it was 20 years ago. I never understood 
people not doing that in the first place.

But also because of much legacy code exists in the form of plugins and themes 
that do not support parameterized queries.
Yes I agree, legacy is legacy, you have to deal with it. But all legacy 
code cannot be fixed, and doing a highly complex SQL parsing / escaping 
/ vulnerability detection code that explicitly targets legacy code and 
not modern code seems weird to me.

HOWEVER, whether mysqli supports parameterised queries or not is all a moot point because 
parameterised queries do not allow for parameterizing field names or table names.  And 
the point of this thread was to discuss how to mark SQL that has been composed at 
run-time to be "safe."  Without being able to parameterize field names and 
table names parameterised queries are not a sufficiently complete solution.


Not being able to parametrize table or field names is not only a problem 
for mysqli, but it is for PDO and pgsql too. That's a place where 
userland query-builders and others DBALs, even the most basic ones do 
shine, and brings a real added-value.


But having anyone, writing SQL with user-given table names or column 
names, and executing it using something like WP's _do_query() method 
seems like they *WANT* to be hacked. I'm not sure how you will succeed 
in plugging the is_trusted() / is_literal() / is_wathever() method 
correctly in an SQL Model Parser & Sanitizer anyway, knowing that at 
this point, all you'll receive is a huge string issued by some plugin 
API which has already done crazy dark (and probably bugguy as well) magic.


I don't see how adding magic in PHP core will avoid the need to fix all 
those legacy plugins, they probably would need themselves to use this 
new shiny API to benefit from it ? In the opposite, if something alters 
the behavior of mysqli implicitly for everyone in order to make it safe, 
it sounds like there will be a lot of BC ? In both case, it seems that 
it will not do any shiny magic to me.


But, I might be wrong, this thread becomes harder and harder to read, 
and I may have missed a few points.


Regards,

--

Pierre



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Larry Garfield
On Tue, Jun 22, 2021, at 8:39 AM, Mike Schinkel wrote:
> > On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
> > 
> > Hi Mike,
> > 
> > Please don't do this. We already have PDO with prepared statements. The 
> > data must be bound. This is the secure way of writing SQL queries. 
> 
> The problem is that over 40% of the web currently runs on PHP code that 
> using mysqli.  That CMS does not support PDO nor prepared statements, 
> and is unlikely to switch to it anytime some in the foreseeable future. 

WordPress is not going to leverage anything we do here until and unless there 
is a major change of leadership and culture at that project.  Please don't 
waste any mental effort on it; they clearly waste no mental effort on what the 
rest of the PHP community considers good, secure practices.  Anything involving 
them is tilting at windmills.

Mike, speaking as someone who has written an SQL abstraction layer and query 
builder with significant usage (Drupal 7-9), you are *grossly* under-estimating 
the complexity of what you describe.  It might be possible to hack together for 
SQL92, aka "what most PHP devs actually use because they haven't noticed that 
it's not 1992 anymore", but that's already been done.  We have DBTNG in Drupal, 
we have Doctrine, problem solved.

Modern SQL, though, is a stupidly complex and stupidly inconsistent beast.  
Most of the syntax beyond the basics is different on every damned database.  
The official spec *is not even publicly available*, and costs a lot of money to 
access.  And no DB engine actually supports all of it; they all support 
different subsets with their own different extensions that may or may not be 
comparable.

Building a tool that parses an arbitrary string to an AST for a spec that is 
inconsistent, inaccessible, and not implemented correctly by anyone is a fool's 
errand, and that's just the first part of it.  That's not even getting into 
designing an API for people to modify it, or questions of performance, or 
compiling the AST back into a DB-specific string, AND then doing parameter 
binding which varies from one database to another.

You're talking about reimplementing major portions of MySQL, PostgreSQL, 
Oracle, etc. themselves in PHP, all at the same time.  Well, good luck, you're 
going to need it.

Personally I've long since concluded that database portability is no longer an 
achievable or even desirable feature.  SQL is just too fragmented a language, 
leaving you with a least common denominator that is grossly under-whelming for 
modern needs.  If you want more than SQL92, it's not really viable anymore.

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 10:35 AM, Stephen Reay  wrote:
> 
>> On 22 Jun 2021, at 20:39, Mike Schinkel  wrote:
>> 
>>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
>>> 
>>> Hi Mike,
>>> 
>>> Please don't do this. We already have PDO with prepared statements. The 
>>> data must be bound. This is the secure way of writing SQL queries. 
>> 
>> The problem is that over 40% of the web currently runs on PHP code that 
>> using mysqli.  That CMS does not support PDO nor prepared statements, and is 
>> unlikely to switch to it anytime some in the foreseeable future.  
>> 
>> A SQL object model parser and sanitizer could more easily be used 
>> incrementally by that CMS since PDO does not share connections with mysqli 
>> (AFAIK, anyway.)
>> 
> 
> (Resending from on-list address)
> 
> Apparently you didn't know mysqli supports parameterised queries?
> Wordpress could have adopted parameterised queries when they grudgingly 
> switched to mysqli, years after both it and PDO were introduced.

Well, yes and no.  

I admit I had forgotten it because frankly the WordPress wp-db object wrapper 
used by 99.7%[1] of WordPress developers does not support it and given I had 
long ago realized I could not use them in WP I guess my memory blocked out the 
existence of prepared statements in mysqli.

But it is not relevant in WordPress. Here is where (almost?) all SQL queries go 
to run in WordPress:

https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2050 

Note there are no bind_param() or execute() calls anywhere in that file.  Their 
is a prepare(), but WordPress rolled their own:

https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L1302 

So saying that parameterized query is a solution leaves out 40% of the web.  
Because some other guys are the intransigent ones here. 

But also because of much legacy code exists in the form of plugins and themes 
that do not support parameterized queries.

HOWEVER, whether mysqli supports parameterised queries or not is all a moot 
point because parameterised queries do not allow for parameterizing field names 
or table names.  And the point of this thread was to discuss how to mark SQL 
that has been composed at run-time to be "safe."  Without being able to 
parameterize field names and table names parameterised queries are not a 
sufficiently complete solution.  

At least not from an is_literal()/is_trusted() perspective.

> There’s zero reason to believe they would adopt this unless forced to.

Exactly! WordPress is not going to change, so that leaves it up to PHP 
developers using WordPress to protect themselves. 

Since I am *assuming* that everyone on this list wants to minimize the number 
of security issues in PHP applications across the web by whatever reasonable 
means necessary, I also assume that pointing at WordPress and saying "It's 
their fault not ours" does not help achieve those security goals?  So I am also 
assuming that finding a pragmatic solution that would allow PHP to be able to 
empower the developer in a way that WordPress won't might be worth considering.
 
But tell me if I am assuming wrongly here.

-Mike

[1] Yes, I made that up out of whole cloth. But honestly in 10+ years working 
with WP I have almost never seen anyone use anything but the $wpdb provided by 
WP — except for one single plugin — and $wpdb does not support parameterized 
queries.

Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Craig Francis
On Tue, 22 Jun 2021 at 3:05 pm, Stephen Reay 
wrote:

>
> On 22 Jun 2021, at 20:13, Craig Francis  wrote:
>
> On Tue, 22 Jun 2021 at 09:59, Stephen Reay 
> wrote:
>
>> So I just want to make sure I understand the progression on this so far.
>> It started out with people wanting a way to check that a string was a
>> literal string, in code somewhere, and does not come from user input. Ok
>> makes sense. The name makes sense too.
>>
>
>
>
> The primary reason was never just to define literal strings, the intention
> has always been to create a practical, implementable solution to address
> the issue of Injection Vulnerabilities (SQl/HTML/CLI/etc).
>
>
> Preventing injection vulnerabilities may be your *goal* but I’m talking
> about the *intended behaviour of this one function*.
>



I understand this is not the one true perfect solution, but that solution
does not exist without tanking performance and never being accepted solely
on that basis alone.

If you can point me to an example where including integers in this has
introduced a security vulnerability then please do, and I mean it, that’s
what this process is for, I genuinely want people to come forward with them
so we can refine this.

The only thing that you could suggest as an alternative is to not include
integers at all. That is not realistic for the reasons I’ve stated not just
here, but in replies to others, and the RFC itself. So if we can’t do that,
and we can’t mark developer-integers for reasons mentioned before, what
else is there? That we just don’t do it??

Torpedoing the idea because it is not 100% idealistically perfect, and
ignoring the fact that it, (as it stands right now with integers and all),
**Prevents Injection Vulnerabilities** - the most common security
vulnerability that dominates the top of the leaderboards again and again -
and does that without adding in any new security vulnerabilities, is just
nonsensical.

Craig


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Stephen Reay



> On 22 Jun 2021, at 20:39, Mike Schinkel  wrote:
> 
>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
>> 
>> Hi Mike,
>> 
>> Please don't do this. We already have PDO with prepared statements. The data 
>> must be bound. This is the secure way of writing SQL queries. 
> 
> The problem is that over 40% of the web currently runs on PHP code that using 
> mysqli.  That CMS does not support PDO nor prepared statements, and is 
> unlikely to switch to it anytime some in the foreseeable future.  
> 
> A SQL object model parser and sanitizer could more easily be used 
> incrementally by that CMS since PDO does not share connections with mysqli 
> (AFAIK, anyway.)
> 

(Resending from on-list address)

Apparently you didn't know mysqli supports parameterised queries?
Wordpress could have adopted parameterised queries when they grudgingly 
switched to mysqli, years after both it and PDO were introduced.
There’s zero reason to believe they would adopt this unless forced to.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Stephen Reay

> On 22 Jun 2021, at 20:13, Craig Francis  wrote:
> 
> On Tue, 22 Jun 2021 at 09:59, Stephen Reay  > wrote:
> So I just want to make sure I understand the progression on this so far. It 
> started out with people wanting a way to check that a string was a literal 
> string, in code somewhere, and does not come from user input. Ok makes sense. 
> The name makes sense too.
> 
> 
> 
> The primary reason was never just to define literal strings, the intention 
> has always been to create a practical, implementable solution to address the 
> issue of Injection Vulnerabilities (SQl/HTML/CLI/etc).
> 

Preventing injection vulnerabilities may be your goal but I’m talking about the 
intended behaviour of this one function. Your original email says this:

>> Distinguishing strings from a trusted developer from strings that may be 
>> attacker controlled



If you feel that somehow doesn’t mean the same as "check that a string was a 
literal string, in code somewhere, and does not come from user input”, then we 
need to crack open a dictionary and work out which words one of us doesn’t know 
the meaning of.




> The name `is_literal()` has always just been a placeholder, it came up when I 
> first started looking at this problem because that was the most obvious thing 
> I knew we could anchor around. (Unfortunately I think it was easy to make 
> assumptions based solely on that name, rather than focussing on the issue it 
> is meant to address).
> 
> So, we cannot look for literals only - while it was part of the solution, it 
> was very much incomplete. Bearing in mind, there is considerable amount of 
> existing code and tutorials out there which include integers in their 
> SQL/HTML/CLI/etc, and they are perfectly safe in doing so. Making a solution 
> which does not support integers is not going to be adopted/used because the 
> task of rewriting and changing everything, for no benefit, will not be 
> considered by developers.
> 

There is a considerable amount of existing code that includes strings in SQL, 
HTML without danger too. Plenty of string values are fine, and plenty of 
integer values are fine. That doesn’t mean we should just blindly trust a value 
that came from the user, just because it’s a number.
The saying goes “never trust user input” not “never trust user input unless 
it’s a number”. 


> Likewise, a lot of code already builds SQL/HTML/CLI/etc strings via 
> concatenation and sprintf(), and forcing everyone to use a query builder is 
> likely to cause most people to not even consider using this.
> 

If they won’t adopt an existing solution to the problem why would they adopt 
this?
You’ve said very recently that this is not intended to be used directly by most 
developers, and instead used within libraries and frameworks. It seems a little 
weird to then make concessions that will defeat the stated goal, in the name of 
adoption. 


> It's all well thinking of one thing that might theoretically/idealistically 
> solve the issue, but it also needs to have a plan on how this will be 
> practically implemented and used by developers (which this has done).
>  


Having a plan for how to implement something doesn’t help much when the thing 
you’re implementing deliberately ignores a specific type of ‘untrusted’ input..






Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
> 
> Hi Mike,
> 
> Please don't do this. We already have PDO with prepared statements. The data 
> must be bound. This is the secure way of writing SQL queries. 

The problem is that over 40% of the web currently runs on PHP code that using 
mysqli.  That CMS does not support PDO nor prepared statements, and is unlikely 
to switch to it anytime some in the foreseeable future.  

A SQL object model parser and sanitizer could more easily be used incrementally 
by that CMS since PDO does not share connections with mysqli (AFAIK, anyway.)

> The idea behind SQL builder is to generate SQL, not to allow the data to be 
> sanitized. 

That is why the title of this email said "Parse and Sanitizer" not "Builder."

> Every time I hear the word sanitize I get goose bumps. You can't remove any 
> characters from a string to make it safe. If you want to use escaping, then 
> you need to do it context aware and properly formatted. Don't sanitize 
> anything. Format the SQL properly instead. 

I believe you are latching onto the word sanitizing as you understand it and 
ignoring the context of the discussion.

What I believe I am proposing is to implement an object that would be 
SQL-syntax-aware and — to use your words — "format the SQL properly."  

But maybe I am wrong?  Can you explain how what I am proposing would not be:

- "Removing the characters" needed to make it safe?  
- Context aware?

Note that I did not specifically talk about removing characters, and I did talk 
about context. I talked about parsing the SQL so as to ensure it is safe.

> On a general note. Implementing SQL builder in PHP would be an enormous task, 
> which is not feasible.

Is it possible you just scanned the first email in this thread and did not 
fully read it?  

For the email to which you replied I literally wrote:

I think we are talking two different things here.  And I will admit I am 
probably the one that is not using exactly the right term.  I had envisioned 
something, and then Dan said "SQL Builder" I used that term (wrongly.)Let me 
clarify by giving what I am referring to a different name:  A SQL Object Model 
with parser and sanitizer functionality.

What I was discussing would not be the enormous task you are referring to.  I 
was not proposing creating anything at all like these first two Google results 
for "php sql query builder: [1] and [2].  I was proposing a SQL object model, 
parser and "safe SQL generator" (I changed the last word since you triggered on 
it.)

> There are so many dialects, so many options, and even then it won't ever be 
> accurate as you don't have the full context in PHP.

My email addressed how dialects and options would be handled.  Using dependency 
injection of objects that define each dialect and provide access to the schema.

> SQL is a very powerful language, and building a parser for it in PHP would 
> mean that we either limit it to a subset of valid SQL commands, or we try to 
> create a super tool that is more powerful than MySQL, Oracle, PostgreSQL, 
> etc. combined. 

That is a false binary. It would not be a huge undertaking to create a generic 
query parser that used an object implementing an interface to provide it with 
the information needed to correctly parse a query _*enough*_ to sanitize it.  

And classes defining any given SQL subset could, if not in PHP core, be written 
in userland. And the DB vendor is a likely candidate to write them too.

> There's absolutely nothing wrong with writing SQL in PHP and preparing it on 
> the server. For database servers that don't support prepared statements we 
> already have PDO which is an abstraction library that tries to escape and 
> format data within SQL. It works 99% of the time. 

"Nothing _wrong_ with"  (per se), is true.  

But for applications that have 15 years of legacy code using mysql(i), PDO is a 
non-starter.

> The example you suggested already has a simple syntax in PHP. 
> 
> $conn = mysqli_connect(...);
> $stmt = $conn->prepare($sql);
> $stmt->execute([$_GET['openings'], $_GET['limit']]);

If PHP were to add functionality to core to use prepared statements with the 
mysqli connection, this discussion _might) be different.

But, as I said, for the CMS that controls >40% of the web PDO is a non-starter.

OTOH, what about when a developer needs to parameterize field and table names 
in PDO[3]?  PDO does not address that aspect of assembling a known-safe SQL 
string and requires manual sanitization. What I am proposing _*would*_ address 
this.  

IOW, PDO is a tool for a different albeit overlapping use-case, and could even 
leverage PDO which as applicable to achieve its objectives.

-Mike

[1] https://github.com/nilportugues/php-sql-query-builder
[2] https://github.com/ClanCats/Hydrahon
[3] https://stackoverflow.com/a/182353/102699



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Pierre

Le 22/06/2021 à 15:00, Kamil Tekiela a écrit :

Hi Mike,

Please don't do this. We already have PDO with prepared statements. The
data must be bound. This is the secure way of writing SQL queries. The idea
behind SQL builder is to generate SQL, not to allow the data to be
sanitized.
Every time I hear the word sanitize I get goose bumps. You can't remove any
characters from a string to make it safe. If you want to use escaping, then
you need to do it context aware and properly formatted. Don't sanitize
anything. Format the SQL properly instead.

On a general note. Implementing SQL builder in PHP would be an enormous
task, which is not feasible. There are so many dialects, so many options,
and even then it won't ever be accurate as you don't have the full context
in PHP. SQL is a very powerful language, and building a parser for it in
PHP would mean that we either limit it to a subset of valid SQL commands,
or we try to create a super tool that is more powerful than MySQL, Oracle,
PostgreSQL, etc. combined.
There's absolutely nothing wrong with writing SQL in PHP and preparing it
on the server. For database servers that don't support prepared statements
we already have PDO which is an abstraction library that tries to escape
and format data within SQL. It works 99% of the time.

The example you suggested already has a simple syntax in PHP.

$conn = mysqli_connect(...);
$stmt = $conn->prepare($sql);
$stmt->execute([$_GET['openings'], $_GET['limit']]);

Regards,
Kamil

I fully agree with you. Any attempt to be smart in doing this will 
eventually end-up in no-so-smart corner-case bugs that will be very, 
very hard to deal with. It's almost an impossible mission. And if by any 
luck you'd succeed in making this work, it probably would be a 
maintenance nightmare.


Regards,

--

Pierre

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Craig Francis
On Tue, 22 Jun 2021 at 09:59, Stephen Reay  wrote:

> So I just want to make sure I understand the progression on this so far.
> It started out with people wanting a way to check that a string was a
> literal string, in code somewhere, and does not come from user input. Ok
> makes sense. The name makes sense too.
>



The primary reason was never just to define literal strings, the intention
has always been to create a practical, implementable solution to address
the issue of Injection Vulnerabilities (SQl/HTML/CLI/etc).

The name `is_literal()` has always just been a placeholder, it came up when
I first started looking at this problem because that was the most obvious
thing I knew we could anchor around. (Unfortunately I think it was easy to
make assumptions based solely on that name, rather than focussing on the
issue it is meant to address).

So, we cannot look for literals only - while it was part of the solution,
it was very much incomplete. Bearing in mind, there is considerable amount
of existing code and tutorials out there which include integers in their
SQL/HTML/CLI/etc, and they are perfectly safe in doing so. Making a
solution which does not support integers is not going to be adopted/used
because the task of rewriting and changing everything, for no benefit, will
not be considered by developers.

Likewise, a lot of code already builds SQL/HTML/CLI/etc strings via
concatenation and sprintf(), and forcing everyone to use a query builder is
likely to cause most people to not even consider using this.

It's all well thinking of one thing that might theoretically/idealistically
solve the issue, but it also needs to have a plan on how this will be
practically implemented and used by developers (which this has done).

Craig


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Kamil Tekiela
Hi Mike,

Please don't do this. We already have PDO with prepared statements. The
data must be bound. This is the secure way of writing SQL queries. The idea
behind SQL builder is to generate SQL, not to allow the data to be
sanitized.
Every time I hear the word sanitize I get goose bumps. You can't remove any
characters from a string to make it safe. If you want to use escaping, then
you need to do it context aware and properly formatted. Don't sanitize
anything. Format the SQL properly instead.

On a general note. Implementing SQL builder in PHP would be an enormous
task, which is not feasible. There are so many dialects, so many options,
and even then it won't ever be accurate as you don't have the full context
in PHP. SQL is a very powerful language, and building a parser for it in
PHP would mean that we either limit it to a subset of valid SQL commands,
or we try to create a super tool that is more powerful than MySQL, Oracle,
PostgreSQL, etc. combined.
There's absolutely nothing wrong with writing SQL in PHP and preparing it
on the server. For database servers that don't support prepared statements
we already have PDO which is an abstraction library that tries to escape
and format data within SQL. It works 99% of the time.

The example you suggested already has a simple syntax in PHP.

$conn = mysqli_connect(...);
$stmt = $conn->prepare($sql);
$stmt->execute([$_GET['openings'], $_GET['limit']]);

Regards,
Kamil


[PHP-DEV] Re: Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Pierre

Le 22/06/2021 à 14:45, Mike Schinkel a écrit :


I think we are talking two different things here.  And I will
admit I am probably the one that is not using exactly the right
term.  I had envisioned something, and then Dan said "SQL Builder"
I used that term.

Let me clarify by giving what I am referring to a different name:
 A SQL Object Model with parser and sanitizer functionality.  One
with a simple interface and one that would need not be tied to or
limited by any specific dialect of SQL.

To illustrate what I mean I created the following straw man
example.  It assumes a hypothetical SqlObjectModel classes that
needs to be instantiated with an objects that implement
hypothetical `SqlDialectInterface` and `SqlSchemaInterface`
instances passed to its constructor.  The former defines the rules
for the MySql dialect, and the latter accesses the schema to
validate data and data types.

Some "dialects" and "validators" for very well-known and widely
used database servers such as MySQL could be implemented in PHP
core, while others could be implemented in userland (the database
vendors would be good candidate to implement them, actually.)  I
don't yet know what those interfaces should look like but I am
pretty sure that could be figured out.

So here is the example (I omitted error handling logic for brevity
and clarity):

$sql = < AS company_id,
    c.company_name,
    jo.job_title,
    COUNT(*) AS job_openings
FROM companies c
    JOIN jobs_openings jo ON c.id =jo.company_id
WHERE 1=1
    AND city_name = 'Nantes'
    AND country_name='France'
    AND job_openings > {openings}
GROUP BY
    c.company_name
LIMIT
    {limit}
SQL;

$conn = mysqli_connect(...);
$som = new SqlObjectModel(new MySqlDialect(), new MySqlSchema($conn));
$som->parse($sql);
$som->set( "limit", $_GET['limit'] );
$som->set( "openings", $_GET['openings'] );
$result = mysqli_query( $conn, $som->sql() );


The SQLObjectModel *could* potentially have additional features
like the ability to get and set the table name, list of fields,
joins, where clauses, etc. but the core value-add is not in
building SQL but rather parsing and sanitizing with properly
escaped and quoted arguments so that known-safe SQL could be
generated.

Existing or new SQL builders could use this to take
SqlObjectModel() and pass its generated SQL through to ensure
their SQL they will output is safe, which by the way the string
could now be tagged as "trusted" before the builder returns the
SQL to its developer, assuming an is_trusted() RFC were to pass.

Similarly, any ORM could ensure its SQL is fully sanitized and
then pass on to whatever functions are needed to execute the SQL.

So, absolutely nothing here would keep people from using "modern
SQL," and/or cause people to lower the expectation to normalize
SQL-92.  This could be a generic query parser and sanitizer that
could be configured by the "dialect" and "schema" objects passed
to it and its scope it small, simple and straightforward.

I hope that clarifies what I was thinking?

-Mike


Actually, it makes more sense.

It's even funnier if you consider it, it's more or less what I'm doing 
in my own SQL builder for find and replace placeholders, after the query 
builder actually built the query. I'm not replacing placeholders with 
values, but I'm replacing my own API arbitrary placeholders with those 
expected by the underlaying dialect or low-level API (I'm not naive 
enough to concatenate user strings in SQL, I'm leaving the hard escaping 
work to the server or lower-level db connector API using prepared 
queries when the low level driver doesn't handle it properly).


But that said, what you describe already intersects with most SQL 
builders own features, and since they all have very different 
architecture, I'm not sure this kind of helper would be that much 
beneficial. But that something that could make sense to be discussed.


In all cases, thanks for answering.

Regards,

--

Pierre




[PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 6:45 AM, Pierre  wrote:
> 
> Le 22/06/2021 à 11:28, Dan Ackroyd a écrit :
>> On Tue, 22 Jun 2021 at 10:25, Mike Schinkel  wrote:
>>> Should(n't?) PHP add a basic SQL builder class that can be extended for 
>>> special cases, e.g. different flavors of SQL?
>>> 
>> No. Or at least not yet.
>> 
>> This type of thing is much better done in userland, where the api can
>> evolve at a fast rate, rather than being limited by the fixed release
>> schedule of PHP.
> 
> Agreed, PHP is probably not the right place for an SQL builder, there's too 
> many dialects, too many standard or non-standard features, in the end SQL 
> builders almost always end-up being opinionated by its designer's original 
> need, philosophy, or SQL usage habits, and tailored for users which use 
> certain paradigms.
> 
> An SQL query builder is already business domain tied, in some way. Of course 
> most are very generic, but often don't handle properly SQL advanced or modern 
> features, whereas some other where built for ORMs and such and may a reduced 
> advanced SQL features that fits the original need.
> 
> I don't wish to see an SQL query builder in PHP core, instead of favoring 
> usage of modern SQL, it'll in my opinion (of course, that's subjective) lower 
> the expectation of people and probably normalize SQL-92 being the only SQL 
> spoken by people writing PHP (just exaggerating a bit).


I think we are talking two different things here.  And I will admit I am 
probably the one that is not using exactly the right term.  I had envisioned 
something, and then Dan said "SQL Builder" I used that term.

Let me clarify by giving what I am referring to a different name:  A SQL Object 
Model with parser and sanitizer functionality.  One with a simple interface and 
one that would need not be tied to or limited by any specific dialect of SQL.

To illustrate what I mean I created the following straw man example.  It 
assumes a hypothetical SqlObjectModel classes that needs to be instantiated 
with an objects that implement hypothetical `SqlDialectInterface` and 
`SqlSchemaInterface` instances passed to its constructor.  The former defines 
the rules for the MySql dialect, and the latter accesses the schema to validate 
data and data types.  

Some "dialects" and "validators" for very well-known and widely used database 
servers such as MySQL could be implemented in PHP core, while others could be 
implemented in userland (the database vendors would be good candidate to 
implement them, actually.)  I don't yet know what those interfaces should look 
like but I am pretty sure that could be figured out.

So here is the example (I omitted error handling logic for brevity and clarity):

$sql = << {openings}
GROUP BY
c.company_name
LIMIT 
{limit}
SQL;

$conn = mysqli_connect(...);
$som = new SqlObjectModel(new MySqlDialect(), new MySqlSchema($conn));
$som->parse($sql);
$som->set( "limit", $_GET['limit'] );
$som->set( "openings", $_GET['openings'] );
$result = mysqli_query( $conn, $som->sql() );

The SQLObjectModel *could* potentially have additional features like the 
ability to get and set the table name, list of fields, joins, where clauses, 
etc. but the core value-add is not in building SQL but rather parsing and 
sanitizing with properly escaped and quoted arguments so that known-safe SQL 
could be generated.

Existing or new SQL builders could use this to take SqlObjectModel() and pass 
its generated SQL through to ensure their SQL they will output is safe, which 
by the way the string could now be tagged as "trusted" before the builder 
returns the SQL to its developer, assuming an is_trusted() RFC were to pass.

Similarly, any ORM could ensure its SQL is fully sanitized and then pass on to 
whatever functions are needed to execute the SQL.

So, absolutely nothing here would keep people from using "modern SQL," and/or 
cause people to lower the expectation to normalize SQL-92.  This could be a 
generic query parser and sanitizer that could be configured by the "dialect" 
and "schema" objects passed to it and its scope it small, simple and 
straightforward.

I hope that clarifies what I was thinking?

-Mike

Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Lauri Kenttä

On 2021-06-15 10:19, Joe Watkins wrote:
It's just that existing implementation details - RETURN_CHAR using 
interned
strings, and literal constant strings being interned by the compiler - 
lead

to a literal string being "re-used".

This is just a natural consequence of literal strings retaining their
literalness in the way we want them too, nothing dangerous is going on.


This happens with substr as well, even with user input.
With just one character, the possibilities are rather limited, but still 
I'm sure someone finds a way to turn this into a vulnerability.


Such as:
$c = substr($input, 0, 1); # $c is now literal.
$sql = "SELECT * FROM t WHERE c LIKE '$c%'";

Hopefully this "retaining literalness" does not extend to longer strings 
after any operations.


What about Opcache? A few years back I debugged some interning-related 
bugs which only manifested with Opcache, so I'm just thinking maybe 
there's some way to exploit Opcache to make some string interned under 
special circumstances, and just maybe that string can then be 
accidentally reused in a SQL query. I'm not in the mood to dive into 
Opcache internals to check this, so I hope someone with prior knowledge 
can tell what kind of strings get interned by Opcache. Array keys? 
Object properties?


--
Lauri Kenttä

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Craig Francis
On Tue, 22 Jun 2021 at 11:31 am, Hans Henrik Bergan 
wrote:

> How is one supposed to use this? like
>
> if(!is_trusted($val)){
> $val = htmlentities($str, ENT_QUOTES | ENT_HTML401 | ENT_SUBSTITUTE |
> ENT_DISALLOWED, 'UTF-8', true);
> }
> echo "$val";




No, if anything that’s the opposite, and almost Taint Checking.

While this is covered in the RFC (https://wiki.php.net/rfc/is_literal) and
will be best read in context, in summary:

The developer does not use this function, instead you rely on libraries to
do that work for you. In this case you would use a HTML Templating Library
(which knows about all the complexities of HTML encoding), and you simply
provide the trusted string ‘?‘ and the values separately.

The Libraries will then use is_trusted(), with something like this:

https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/example.php?ts=4

Craig


Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Pierre

Le 22/06/2021 à 11:28, Dan Ackroyd a écrit :

On Tue, 22 Jun 2021 at 10:25, Mike Schinkel  wrote:

Should(n't?) PHP add a basic SQL builder class that can be extended for special 
cases, e.g. different flavors of SQL?


No. Or at least not yet.

This type of thing is much better done in userland, where the api can
evolve at a fast rate, rather than being limited by the fixed release
schedule of PHP.


Agreed, PHP is probably not the right place for an SQL builder, there's 
too many dialects, too many standard or non-standard features, in the 
end SQL builders almost always end-up being opinionated by its 
designer's original need, philosophy, or SQL usage habits, and tailored 
for users which use certain paradigms.


An SQL query builder is already business domain tied, in some way. Of 
course most are very generic, but often don't handle properly SQL 
advanced or modern features, whereas some other where built for ORMs and 
such and may a reduced advanced SQL features that fits the original need.


I don't wish to see an SQL query builder in PHP core, instead of 
favoring usage of modern SQL, it'll in my opinion (of course, that's 
subjective) lower the expectation of people and probably normalize 
SQL-92 being the only SQL spoken by people writing PHP (just 
exaggerating a bit).


Regards

--

Pierre

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Hans Henrik Bergan
How is one supposed to use this? like

if(!is_trusted($val)){
$val = htmlentities($str, ENT_QUOTES | ENT_HTML401 | ENT_SUBSTITUTE |
ENT_DISALLOWED, 'UTF-8', true);
}
echo "$val";
(...)
if(!is_trusted($val)){
$val = $mysqli->real_escape_string($val);
}
$mysqli->query("INSERT INTO tbl VALUES('$val');");


like that? (my first impression is that this sounds stupid, but i haven't
given it enough thought to be sure)



On Tue, 22 Jun 2021 at 11:48, Lauri Kenttä  wrote:

> On 2021-06-21 23:25, Craig Francis wrote:
> >
> > - Integers are now included, which will help adoption:
> >
> > https://wiki.php.net/rfc/is_literal
> >
>
> Thanks for the great improvements!
>
> sprintf seems to have some issues, though.
>
> Currently you're checking the parameter types, not the formats.
> The parameter type matters only when coercing to a string (%s).
> Otherwise sprintf should consider the format, not the parameter.
>
> Example:
>  function test($s) { var_dump($s, is_trusted($s)); }
> setlocale(LC_ALL, "de_DE.UTF-8");
> test(sprintf("SET c = %c, f = %f, e = %e", 0x27, 1234, 1234));
> test(sprintf("SET d = %d, x = %x, b = %b", 1e2, 1e2, 1e2));
> test(sprintf("SET weird_d = %''*d", 4, 1));
> test(sprintf("SET s = '%s', int to string should be ok", 123));
> ?>
>
> Currently:
> string(43) "SET c = ', f = 1234,00, e = 1.234000e+3"
> bool(true)
> string(32) "SET d = 100, x = 64, b = 1100100"
> bool(false)
> string(18) "SET weird_d = '''1"
> bool(true)
> string(41) "SET s = '123', int to string should be ok"
> bool(true)
>
> Obviously the results with ints and floats should be the opposite.
>
> If you really want to allow %c, so be it, but I'd disallow it on the
> grounds that (1) it's probably not used in secure strings (usage data,
> anyone?), and thus (2) it could easily be a misspelled %d (for example,
> '%c' instead of '%d' could silently produce an empty result in a query),
> and (3) you're allowing a simple workaround with %s and chr() which
> makes the intent more obvious.
>
> In general, as this is supposed to be a security feature, allowing
> multiple ways for uninformed people to produce "trusted" but actually
> very unsafe values doesn't look like a good idea. I'm not sure if
> allowing trusted single characters to be created through chr or %c
> serves any useful purpose, but I can imagine people using either one
> without realizing that they can create any character, including \0 or '
> or " or non-UTF-8. Better to leave only chr(), one less thing to worry
> about.
>
> Custom padding is a weird edge case, maybe just disallow that too?
>
> As you said yourself, it's not easy to prove anything safe. ;)
>
> --
> Lauri Kenttä
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php
>
>


Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Lauri Kenttä

On 2021-06-22 12:15, Dan Ackroyd wrote:

Dan Ackroyd wrote:
If people aren't going to make using a non-literal where a literal is
expected, be an error, the only alternative I can see is logging it.
Please correct me if you think people should be doing something other
than those two things.


Maybe I'm mistaken, but I'd imagine people use custom error handlers
which report errors (warnings, notices) to ticket system or email or
some other convenient place, so they don't need to read through logs.

It should be well known that unexpected things happen in production.
You can already have e.g. undefined variables or bad array offsets
in a non-tested code path. Non-literal string is just one more
kind of bug, it can be reported just like all the rest.

--
Lauri Kenttä

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Lauri Kenttä

On 2021-06-21 23:25, Craig Francis wrote:


- Integers are now included, which will help adoption:

https://wiki.php.net/rfc/is_literal



Thanks for the great improvements!

sprintf seems to have some issues, though.

Currently you're checking the parameter types, not the formats.
The parameter type matters only when coercing to a string (%s).
Otherwise sprintf should consider the format, not the parameter.

Example:


Currently:
string(43) "SET c = ', f = 1234,00, e = 1.234000e+3"
bool(true)
string(32) "SET d = 100, x = 64, b = 1100100"
bool(false)
string(18) "SET weird_d = '''1"
bool(true)
string(41) "SET s = '123', int to string should be ok"
bool(true)

Obviously the results with ints and floats should be the opposite.

If you really want to allow %c, so be it, but I'd disallow it on the 
grounds that (1) it's probably not used in secure strings (usage data, 
anyone?), and thus (2) it could easily be a misspelled %d (for example, 
'%c' instead of '%d' could silently produce an empty result in a query), 
and (3) you're allowing a simple workaround with %s and chr() which 
makes the intent more obvious.


In general, as this is supposed to be a security feature, allowing 
multiple ways for uninformed people to produce "trusted" but actually 
very unsafe values doesn't look like a good idea. I'm not sure if 
allowing trusted single characters to be created through chr or %c 
serves any useful purpose, but I can imagine people using either one 
without realizing that they can create any character, including \0 or ' 
or " or non-UTF-8. Better to leave only chr(), one less thing to worry 
about.


Custom padding is a weird edge case, maybe just disallow that too?

As you said yourself, it's not easy to prove anything safe. ;)

--
Lauri Kenttä

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 5:28 AM, Dan Ackroyd  wrote:
> 
> On Tue, 22 Jun 2021 at 10:25, Mike Schinkel  wrote:
>> 
>> Should(n't?) PHP add a basic SQL builder class that can be extended for 
>> special cases, e.g. different flavors of SQL?
>> 
> 
> No. Or at least not yet.
> 
> This type of thing is much better done in userland, where the api can
> evolve at a fast rate, rather than being limited by the fixed release
> schedule of PHP.
> 
> If, after people had settled on a good api, someone could make a
> strong argument for adding it to core, it could be evaluated then.


Sure.  But that is not inconsistent with what I was asking/proposing.  

There would be a benefit to collectively working towards to defect-standard 
solution rather than leave it to the chaos of userland to come up many 
different approache, even if initially in userland.

Even a PSR might be appropriate, but I tend to think the interface-centric 
approach only works well things there the benefit it interoperability and not 
when the benefit is implementation.

-Mike
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Dan Ackroyd
On Tue, 22 Jun 2021 at 10:25, Mike Schinkel  wrote:
>
> Should(n't?) PHP add a basic SQL builder class that can be extended for 
> special cases, e.g. different flavors of SQL?
>

No. Or at least not yet.

This type of thing is much better done in userland, where the api can
evolve at a fast rate, rather than being limited by the fixed release
schedule of PHP.

If, after people had settled on a good api, someone could make a
strong argument for adding it to core, it could be evaluated then.

cheers
Dan
Ack

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 4:53 AM, Dan Ackroyd  wrote:
> 
> The whole point of the idea of literal strings is to make it be easier
> to write some code that:
> 
> i) is safe.
> ii) can be reasoned about at scale.
> 
> Passing bare strings around is not so great for either of those
> things. You have to manually remember "this is a string that is
> intended to be used in a particular way. And doing that in a large
> code base, when your engineering department is large enough to work in
> separate teams is particularly difficult.
> 
> Instead of using bare strings, using a more specific type e.g.
> HtmlAttribute::fromString('#fff'); and then passing that around would
> be much easier to reason about.

THIS.

> Similarly for the thoughts about concatenating numbers into strings.
> Yeah, people might think they want that, but in practice using an SQL
> builder that allows you to put the number in the right place like:
> 
> $sqlBuilder->add('SELECT * FROM foo LIMIT %d', FIXED_LIMIT);
> 
> 
> And then predictable when the request comes in to allow users to set
> their own limit, changing it to:
> 
> $sqlBuilder->add('SELECT * FROM foo LIMIT %d', $_GET['limit']);
> 
> Doesn't take any time to refactor the code, because it's already using
> an appropriate library that handles variables correctly.

So, IMO this begs the question: 

Should(n't?) PHP add a basic SQL builder class that can be extended for special 
cases, e.g. different flavors of SQL?  

The problem with depending on userland to do this is that there will never be a 
single standard adopted.

If built into PHP it could:

1. Legitimize it across all PHP developers, 
2. To make sure its always available in the most basic PHP install, 
3. To allow implementing a performant SQL parser (which is not reasonably done 
in PHP),
4. To make creating a target for polyfills for prior versions possible, 
5. And to create a standard that all PHP projects that use SQL can adopt?

-Mike
P.S. PHP could also implement a sanitizing templating language for SQL as 
possibly a different or even additional approach.
--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Dan Ackroyd
Dan Ackroyd wrote:
>> The developer who made that change would run the tests,
>> see the tests pass and then push their code live. At which point, it
>> would cause an error in production, as the string returned would not
>> pass the is_literal check.

Craig Francis wrote:

> No, the only way fatal errors can occur is if you choose to make them fatal 
> (in userland code).

Please can you go into some detail about what you think people are
meant to do when they detect a non-literal used where a literal is
expected?

This seems like a core flaw in the proposed RFC and I think you aren't
addressing it.

> that's going to make adoption very difficult.

Why are you prioritising speed of adoption, over reducing the cost of
using this feature for projects over say the next 5 or 10 years?

It kind of sounds like the idea behind the RFC has been a compromise
between the idea in Mr Kern's talk, and people who are not sure this
feature is a good idea or not.

This feature should be of most use to larger teams, where reasoning
about the application is difficult to begin with. For them an extra
few hours migrating a large code base to use the appropriate
techniques is a fine trade-off over increased difficulty maintaining
their code.

> No, the only way fatal errors can occur is if you choose to make them fatal 
> (in userland code).

If people aren't going to make using a non-literal where a literal is
expected, be an error, the only alternative I can see is logging it.
Please correct me if you think people should be doing something other
than those two things.

>From my experience, any process that involves looking in log files is
a pretty bad process.

For is_literal checks, what's going to happen is:

* programmer makes a mistake, that isn't caught by a test.
* programmer deploys site, and entries about non-literal strings start
being written to log files.
* someone then needs to remember to check the log files, and pull out
the appropriate entries, and make a ticket for them to be
investigated.
* someone then needs to pick up that ticket, investigate the cause of
the non-literal string being used, and trace it back to which
module/library is causing it, then update the ticket for someone on
that team to fix it.
* someone on that team needs to have that ticket prioritised before
they start work on it.

Apologising for expressing myself poorly before, but this is a really,
really, really, really, really, really result to have. It makes using
this feature either have a high maintenance burden, and allows
security flaws to exist until someone gets around to fix them, or it
results in things failing in production.

I know it's slightly annoying to require any combining of strings,
where you want to preserve 'literal-ness', to have to jump through the
hoop of using a specific function, but it's just a few seconds work,
that  would save hours of debugging.

The whole reason why the solution worked for Google was that it made
it cost less for programmers to do the right thing, rather than having
to remember fix problems after they are found.*

I don't think there would be an easy way to fix this if the RFC was
passed in its current form. Changing from string concatenation
carrying the literal flag around to not, would be a huge BC break,
that would require multiple versions to fix. If it was the other way
round so that we don't support carrying the literal flag around, and
(much to my embarrassment) it does mean that people can't actually use
the feature, because it's too difficult, it would be much easier to
move to carrying the flag around.

> However requiring developers to rewrite all of their code to use 
> literal_concat() and literal_implode()

No-one is forcing developers to rewrite their code. And I don't think
many people would actually use those functions.

The vast majority of people are going to continue to use the functions
provided by Wordpress, Doctrine etc, and any checks on literal, or
combining of literals would be internal to those functions.

As I wrote to Thomas, I think the majority of people are more likely
to shift to using type specific wrapping types, rather than copying
bare strings around.

cheers
Dan
Ack

*For anyone wondering, unfortunately we can't use the same
implementation as Google, due to PHP not being statically compiled,
and also not being able to set 'type flags' on strings.

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Mike Schinkel
> On Jun 21, 2021, at 7:18 PM, Benjamin Morel  wrote:
> 
> On Tue, 22 Jun 2021 at 01:06, Derick Rethans  wrote:
> 
>> On 21 June 2021 23:37:56 BST, Yasuo Ohgaki  wrote:
>>> 
>>> The name "is_trusted" is misleading.
>>> Literal is nothing but literal.
>> 
>> I agree with this. The name is_trusted is going to be the same naming
>> mistake as "safe mode" was. Developers will put their trust in it that it
>> is 100% guaranteed safe.
> 
> 
> FWIW, agreed, too. Trusted is vague and may imply some false sense of
> security. Literal is literally what it says on the tin.


I proposed the name change originally.

For my inspiration take a look at Trusted Types API in Javascript:

https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API 


And also Trusted Types API from the W3C

https://w3c.github.io/webappsec-trusted-types/dist/spec/ 


The reason is_trusted() would be better than is_literal() is so that the 
function could in the future handle other trusted types that are definitely not 
literal and allow most code that using is_trusted() to continue to work.  

So, for example, in the future we could add a new TrustedInterface and if an 
object implemented the trusted interface and a __ToString() method it could be 
considered trusted by code that uses the is_trusted() to guard against 
untrusted code.

Alternately if in the future PHP added a TrustedAttribute we could use on the 
__ToString() method to declare an object that could be used when a trusted 
string is required.

-

Conversely, I am *strongly* opposed to is_literal() as proposed.  The current 
RFC proposes half a solution without first envisioning what the other half of 
the solution will look like.

If an is_literal() RFC were passed I would envision the following things 
happening:

1. Bloggers would start blogging about how all PHP users should use 
is_literal() when accepting SQL and HTML code, but won't provide any of the 
nuances about special-cases because many won't understand them, they will just 
see a new PHP feature to hype.

2. PHP listing tools will jump on is_literal() and champion its use.  Just look 
at Psalm.

3. Coding standards teams in large organizations will see that the listing 
tools are flagging non-literals and wil start requiring all SQL and HTML etc. 
code to be literals. And since they already bar the use of eval() there will be 
no way to work around the requirement when you actually know what you are 
doing. 

And because they are in large organizations, they will be divorced from the 
developers in the trenches who 95% of the time will have no problem with the 
requirement, but 5% of the time will simply not be able to achieve their 
development needs. Unfortunately these trench developers will not have enough 
power in the organization to get the coding standards teams to even list to 
their issues, especially if they are an agency or contractor.

4. Library developers will similarly adopt is_literal() as a requirement to use 
their library.  While the most popular ones will likely be enlightened enough 
they will address edge cases. many in the long tail will not.  So lots of the 
long-tail PHP libraries out there — and especially the ones only needed by a 
small audience so there won't be alternatives — will not address exceptions. 
And for any of many reasons, many of those library developers will not be 
responsive to those who submit PRs to provide workarounds.

THIS is the future I do not want to see, and if is_trusted() is renamed back to 
is_literal() this is the future I predict. Maybe it won't happen, but I 
personally do not want to risk it.

The biggest problem with the RFC is it does not address how to bypass someone 
else's requirement for use of is_literal() when you *know* what you are doing 
and you absolutely need to.  Craig and I have had several long discussions 
about this, and he said he did not want to allow that because someone might 
misuse it (Craig, if I am paraphrasing incorrectly, please do correct me here.) 
 

And I completely understand Craig's misgivings, but then he would not address 
mine.  His only solution to get around the special-case needs that *will* exist 
is to use eval(). which is a lot like saying: "So you are in pain?  Why not try 
Meth?"  

Craig shared this talk from Google which inspired him:

https://www.youtube.com/watch?v=ccfEu-Jj0as 


I would highly recommend watching this before voting on Craig's RFC.

The speaker talks about how they used "CompileTimeConstants" to reduce security 
bugs, BUT:

1. They used a @CompileTimeConstant attribute and a static checker to guard 
against unsafe strings instead of a function baked into Java. So convention and 
process over language enforcement.

https://errorprone.info/bugpattern/CompileTimeConstant 

Re: [PHP-DEV] [PROPOSAL] Bare name array literals (again)

2021-06-22 Thread Rowan Tommins

On 22/06/2021 01:03, tyson andre wrote:

If linters and automatic fixers were set up for a project to enforce it (e.g. 
phpcbf),
there would only be one way that literals would be used in a project either way.



The only way to be consistent across a whole project would be to never 
use the new syntax, since it can't be used for all arrays/keys. 
Otherwise, you'd have to encourage either mixing the styles within one 
array, or switching arrays back and forth if keys are added and removed 
which meet/don't meet the requirements.




Also, with no common interface between those anonymous classes,
using just anonymous classes would be writing functions that accept any object 
or return any object,
which would be error prone and hard to analyze.

Those anonymous classes wouldn't have any ancestors in common for `instanceof` 
checks.



Yes, the simplest anonymous class, with only untyped public properties, 
is equivalent to stdClass, which in turn is equivalent to a plain array. 
However, you can then add natively checked types, implement interfaces, 
extend base classes, use traits, and so on. And then you can switch to a 
named class with minimal impact on surrounding code.


Unfortunately, a neat syntax like object(foo: $foo) loses most of those 
advantages; what we need is something that lets us keep the class body, 
but "capture" variables from the declaring scope somehow, e.g.


$foo = new class(public bool success: true, public array data: $data, 
public CursorInterface cursor: $cursor) implements Bar { use BarTrait; }


or

$foo = new class implements Bar { public bool $success = true; public 
array $data = lexical $data; public CursorInterface $cursor = lexical 
$cursor; use BarTrait; }





Also, if there were optional parameters, that can be represented in 
non-standard JSDoc supported by static analyzers
(`success: bool, data?: array, cursor?: CursorInterface, errorMessage?: 
string`),
but that wouldn't be represented in an anonymous class
(setting a property to null is not the same as omitting it).



Certainly, if what you have are dynamic keys rather than fixed 
properties, then a key-value array makes sense, and I'm not suggesting 
we remove that from the language. But I do think we should be adding 
richer choices beside arrays, rather than just more ways to write the 
same thing.


Regards,

--
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Stephen Reay


> On 22 Jun 2021, at 15:58, Stephen Reay  wrote:
> 
> 
> 
>> On 22 Jun 2021, at 06:28, Craig Francis > > wrote:
>> 
>> On Tue, 22 Jun 2021 at 12:18 am, Benjamin Morel >  > >>
>> wrote:
>> 
>>> On Tue, 22 Jun 2021 at 01:06, Derick Rethans  wrote:
>>> 
 On 21 June 2021 23:37:56 BST, Yasuo Ohgaki  wrote:
> 
> The name "is_trusted" is misleading.
> Literal is nothing but literal.
 
 I agree with this. The name is_trusted is going to be the same naming
 mistake as "safe mode" was. Developers will put their trust in it that it
 is 100% guaranteed safe.
>>> 
>>> 
>>> FWIW, agreed, too. Trusted is vague and may imply some false sense of
>>> security. Literal is literally what it says on the tin.
>>> 
>> 
>> 
>> I can follow up properly tomorrow, but by popular request we do support
>> integers as well (could be seen as stretching the definition of “literal” a
>> bit).
>> 
>> And we did ask for suggestions last week, which ended up with a vote (as I
>> couldn’t decide).
>> 
>> That said, I’m really glad that the only issue we seem to have is the name.
>> 
>> Craig
> 
> So I just want to make sure I understand the progression on this so far.
> 
> 
> It started out with people wanting a way to check that a string was a literal 
> string, in code somewhere, and does not come from user input. Ok makes sense. 
> The name makes sense too.
> 
> Then someone said they wanted to check if an integer was a literal too - but 
> because of technical limitations, it now allows any integer, regardless of 
> where it came from, to be treated as a literal.
> 
> Then because it’s not actually checking for literals, people thought the name 
> “trusted” made more sense?
> 
> 
> That nobody thinks “any user supplied integer must be surely safe” is kind of 
> hilarious, and sad at the same time.
> 
> Knowing that a string is literal would be very helpful. Knowing that the 
> string potentially still contains user input, in spite of the one thing it 
> claims to do, is not just unhelpful, it makes the entire thing useless.
> 
> 
> I can’t vote, but this whole thing would be a No from me unless it was the 
> original scope - a variable is a literal defined in code somewhere. If there 
> are technical limitations with some types, then leave them off the list of 
> what it will check.

s/nobody/anybody/

I blame a lack of caffeine.

Re: [PHP-DEV] [RFC] is_trusted - was is_literal

2021-06-22 Thread Stephen Reay


> On 22 Jun 2021, at 06:28, Craig Francis  wrote:
> 
> On Tue, 22 Jun 2021 at 12:18 am, Benjamin Morel  >
> wrote:
> 
>> On Tue, 22 Jun 2021 at 01:06, Derick Rethans  wrote:
>> 
>>> On 21 June 2021 23:37:56 BST, Yasuo Ohgaki  wrote:
 
 The name "is_trusted" is misleading.
 Literal is nothing but literal.
>>> 
>>> I agree with this. The name is_trusted is going to be the same naming
>>> mistake as "safe mode" was. Developers will put their trust in it that it
>>> is 100% guaranteed safe.
>> 
>> 
>> FWIW, agreed, too. Trusted is vague and may imply some false sense of
>> security. Literal is literally what it says on the tin.
>> 
> 
> 
> I can follow up properly tomorrow, but by popular request we do support
> integers as well (could be seen as stretching the definition of “literal” a
> bit).
> 
> And we did ask for suggestions last week, which ended up with a vote (as I
> couldn’t decide).
> 
> That said, I’m really glad that the only issue we seem to have is the name.
> 
> Craig

So I just want to make sure I understand the progression on this so far.


It started out with people wanting a way to check that a string was a literal 
string, in code somewhere, and does not come from user input. Ok makes sense. 
The name makes sense too.

Then someone said they wanted to check if an integer was a literal too - but 
because of technical limitations, it now allows any integer, regardless of 
where it came from, to be treated as a literal.

Then because it’s not actually checking for literals, people thought the name 
“trusted” made more sense?


That nobody thinks “any user supplied integer must be surely safe” is kind of 
hilarious, and sad at the same time.

Knowing that a string is literal would be very helpful. Knowing that the string 
potentially still contains user input, in spite of the one thing it claims to 
do, is not just unhelpful, it makes the entire thing useless.


I can’t vote, but this whole thing would be a No from me unless it was the 
original scope - a variable is a literal defined in code somewhere. If there 
are technical limitations with some types, then leave them off the list of what 
it will check.









Re: [PHP-DEV] [RFC] is_literal

2021-06-22 Thread Dan Ackroyd
On Mon, 14 Jun 2021 at 14:27, Thomas Nunninger  wrote:
>

Thomas Nunninger wrote:

Thomas Nunninger wrote:
> Only some (infrastructure or
> end-to-end) test that covers the business logic plus the corresponding
> infrastructure by accident would uncover an error.


I think testing will be much less of an issue than you might expect,
but partly only if string concatenation doesn't carry the flag around.

The whole point of the idea of literal strings is to make it be easier
to write some code that:

i) is safe.
ii) can be reasoned about at scale.

Passing bare strings around is not so great for either of those
things. You have to manually remember "this is a string that is
intended to be used in a particular way. And doing that in a large
code base, when your engineering department is large enough to work in
separate teams is particularly difficult.

Instead of using bare strings, using a more specific type e.g.
HtmlAttribute::fromString('#fff'); and then passing that around would
be much easier to reason about.

Similarly for the thoughts about concatenating numbers into strings.
Yeah, people might think they want that, but in practice using an SQL
builder that allows you to put the number in the right place like:

$sqlBuilder->add('SELECT * FROM foo LIMIT %d', FIXED_LIMIT);


And then predictable when the request comes in to allow users to set
their own limit, changing it to:

$sqlBuilder->add('SELECT * FROM foo LIMIT %d', $_GET['limit']);

Doesn't take any time to refactor the code, because it's already using
an appropriate library that handles variables correctly.

cheers
Dan
Ack

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [PROPOSAL] Bare name array literals (again)

2021-06-22 Thread Mike Schinkel
> On Jun 21, 2021, at 1:34 PM, Rowan Tommins  wrote:
> 
> On 21/06/2021 17:54, tyson andre wrote:
>> In every place where `key` is a valid php identifier
>> (e.g. can be used in PHP 8.0's named parameters),
>> I propose to allow `[key: expr]` to be used instead of `['key' => expr]`.
> 
> This is an immediate "no" from me: it multiplies the ways to write the same 
> thing from 2 to 4, in order to save a few bytes, in a few instances.
> 
> I think this is something that Douglas Crockford got absolutely right when he 
> simplified JavaScript object syntax to formalise JSON: every valid key can be 
> represented as a quoted string, so if the quotes are always there,p ...

That is one way to look at it.  

Another way to look at it was Crockford's regrettable decision to optimize for 
the uncommon case by imposing tedium on developers and making reading of JSON 
files even harder for end-users than developers has created an unfortunate 
legacy that we can likely never correct given the ubiquity of JSON at this 
point. That decision was probably the initial impetus for JSON5, even, which 
unfortunately can't get widespread adoption because of JSON classic's ubiquity.

But I digress.  And the reason I digress is to contrast — as Tyson also did — 
the JSON the serialization form with the programming language form.  Javascript 
itself supports object initialization without having quotes around the 
identifiers.  And unless I remember wrongly, serializing the JSON object was 
the spark that turned into JSON.

Meanwhile back at PHP, one of the most annoying and poor developer experience 
aspects of working with PHP — because arrays are so ubiquitous in code in the 
wild — is the requirement for those damn quotes around array indexes, and 
especially that array indexes are not implemented as symbols so that we can get 
some "compile-time" checking of indexes in our IDEs and linters and even PHP 
itself.

> you don't need to remember a list of rules about reserved words, allowed 
> characters, etc.

Having to remember the same rules for symbol identifiers hardly seems to be a 
burden for a developer to me.

Also, you used etc. but I cannot think of anything else. Is there something you 
did not include?

> 
>> This is useful for shortening long arrays where the keys are known literals,
>> e.g.
>> 
>> ```php
>> return [success: true, data: $data, cursor: $cursor];
>> // is equivalent to the following, but shorter:
>> return ['success' => true, 'data' => $data, 'cursor' => $cursor];
>> ```
> 
> Although common, this is not a good use of arrays; if your keys are "known 
> literals", they should be fields of some object:
> 
> return new Result(success: true, data: $data, cursor: $cursor);

Except that arrays in PHP can do things objects cannot do in PHP.  So your 
assertion as an absolute is not valid.

To start with, and most important for the immutable-envious developer world is 
that arrays are passed by value.  Unless and until we have an object equivalent 
for passing by value you cannot authoritatively say "your fields should always 
be in an object."

Also, as noted by Kami and acknowledged by Larry, there are many things you can 
do with arrays you cannot do with objects such as destructuring.


> If you don't want to declare a class (yet), you can use an anonymous object. 
> Rather than yet another way to write arrays, it would be great to have some 
> more powerful syntax for those; currently you'd have something like this:
> 
> return new class(success: true, data: $data, cursor: $cursor) { public 
> function __construct(public bool $success, public array $data, public 
> CursorInterface $cursor) {} };

> 
> Brainstorming, we could perhaps extend property promotion into the "new 
> class" clause, so that you could write this:
> 
> return new class(public bool success: true, public array data: $data, public 
> CursorInterface cursor: $cursor) {};

Let's compare your suggestion with the following from a *readability* and also 
developer quality-of-life perspective:

return [ success: true, data: $data, cursor: $cursor ];

YES, there are benefits you get with the approach you mention, but readability 
and developer experience are definitely not in that set of benefits. 



NOW, all that said, I am not sure I will support this change to arrays proposed 
by Tyson.  

I might, but I would like to understand Tyson's use-cases first where he would 
prefer to need an array instead of an object if objects instantiation were made 
easier, and more importantly, why could objects not be used?


-Mike

P.S. While I wanted to close with a request for Tyson to explain, I also want 
to mention I would likely be a lot more interested in is innovation in the area 
of object initialization, implicit typing and implicit class declaration.

Consider if the following could create an instance of an anonymous class with 
an implicitly-typed public boolean property `success`, and array property 
`data` and a CursorInterface property