Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Kamil Tekiela
This is open to SQL injection:

$queryBuilder
->select(...$_GET['columns'])
->from($_GET['table'])
->where($_GET['where'])
;

All below statements produce 42. This is valid SQL:

SELECT `42 FROM TABLE`() FROM dual;
SELECT `⠀` FROM `⠀`;
SELECT * FROM "42"; -- With ANSI_QUOTES
SELECT * FROM "";

This is valid in MySQL:

VALUES ROW(42)

This is valid in MariaDB:

VALUES (42);

This is not a valid SQL:

SELECT * FROM "\"\"";

There are also windows functions, CTE, Stored procedures, and a bunch of
new features.


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Larry Garfield
On Tue, Jun 22, 2021, at 11:46 AM, Mike Schinkel wrote:

> Maybe this will help. There are a billion XML schemas, but DomDocument 
> and its related classes can process them all. A SqlObjectModel would be 
> similar; it would know how to process text queries where the dialect 
> interface implementors would be the equivalent of the XML schema.  
> 
> It is not a perfect analogy, but it is close enough.

It's a very helpful analogy, actually.  SQL is 100x more complex of a language 
than XML is.  XML is trivial by comparison.  That's exactly why what you 
describe is vastly harder than you think it is, even if you limit yourself to 
the rudimentary ancient SQL standards.

And a tool that breaks down and becomes unusable as soon as you start using 
anything even vaguely modern is a tool I would rather *not* exist, quite 
frankly, as it only further discourages anyone learning SQL beyond MySQL 3.

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Kamil Tekiela
While it's true that a lot of the internet is using mysqli due to
WordPress, this doesn't change the fact that PHP already offers a solution
to the problem. Both PDO and mysqli support server-side prepared
statements.

We don't talk about WordPress. They should not hold back PHP. That project
is so behind that they still offer mysql_* API in their abstraction
library. Nonetheless, whatever solution we offer, they're unlikely to pick
it up. They haven't used prepared statements for whatever reason, and they
created their own client-side prepared statements. What makes you think
that if we write "Parser & Sanitizer", they're going to start using it? The
proposal for `is_literal` is going to be completely useless to WordPress as
they have no place to use it in their codebase. It's a pity we can't help
them, but it's true.

To clarify the proposal from Craig, I would like to point out that
`is_literal()` will not help with the data being passed to SQL. User input
cannot be trusted in any sense, no matter where it comes from. The proposal
for `is_literal()` helps libraries that try to build the SQL dynamically,
without the user input. For example, it would help to write a secure method
like this:

function select(mysqli $mysqli, string $table) {
if(!is_literal($table)) {
throw new RuntimeException('The table name has to be a hardcoded string');
}

return $mysqli->query("SELECT * FROM {$table}")->fetch_all();
}

This code would prevent the developer from misusing this function by
passing user input to it.

>That is why the title of this email said "Parse and Sanitizer" not
"Builder."

Then we definitely don't need that. Both PDO and mysqli can escape string
literals in a context aware manner (to some extent at least). PDO tried to
implement a parser, but it proved challenging and to this day doesn't work
correctly. As Larry explained, SQL is not a standardised language anymore.
The flavours diverged and created a widely inconsistent language.

To build a context-aware parser for SQL, you would need to know:

- The SQL flavour
- The database type (MySQL or MariaDB)
- The database version
- The settings maintained on that database server

For example, in MySQL, you need to know which character is used to escape
quotes in string literals, and you need to know the SQL_MODE. Then you need
to know which syntax is allowed and understood by that specific MySQL
version. The SQL syntax differs a lot between MySQL and MariaDB. Building,
parsing and compiling the SQL in PHP would involve rebuilding the same
parser that MySQL server is using. And you'd have to do it for every single
possible database vendor/version. On top of that, you need to have direct
access to the database/table/column/SP/functions names, so that your parser
can apply the business logic. Not to mention escaping the data if you don't
want to use prepared statements.

>OTOH, what about when a developer needs to parameterize field and table
names in PDO

You don't parameterize field and table names. Period. You hardcode these
names in your business logic. Here's an example from DBAL:

$queryBuilder
->select('id', 'name') // <-- Hardcoded
->from('users') // <-- Hardcoded
->where('email = ?') // <-- Hardcoded
->setParameter(0, $userInputEmail) // <-- Parameterized
;

Irrelevant of which parameterization technique is used here, the idea is
that you never parameterize SQL code. You parameterize the data. WordPress
does exactly the same with their prepare() method. If you need to have
dynamic field identifiers, then it's your job as a developer to write the
code that can safely handle such requirement.

A native PHP "SQL Object Model with parser and sanitizer functionality"
wouldn't prevent SQL injection in the same way that userland DB abstraction
libraries don't. You can still create SQL injection with wordpress $wpdp
class, you can do it with DBAL, and you can do it with PDO.

To summarize, we are not looking for a solution to prevent SQL injection or
to help Wordpress. We are looking for a solution that would prevent the
misuse of libraries like DBAL, so that methods like select(), from(), and
where() could generate a warning when used with non-hardcoded data.

Regards,
Kamil

P.S. By "sanitization" I understand removing unwanted information from a
piece of data. SQL injection cannot be prevented by removing data. It can
only be prevented by strict separation of SQL and data, or by careful
formatting of the data when included in the SQL string itself. This
includes quoting, type casting and escaping of quotes. Sanitization cannot
prevent SQL injection!


Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel


> On Jun 22, 2021, at 11:41 AM, Larry Garfield  wrote:
> 
> On Tue, Jun 22, 2021, at 8:39 AM, Mike Schinkel wrote:
>>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
>>> 
>>> Hi Mike,
>>> 
>>> Please don't do this. We already have PDO with prepared statements. The 
>>> data must be bound. This is the secure way of writing SQL queries. 
>> 
>> The problem is that over 40% of the web currently runs on PHP code that 
>> using mysqli.  That CMS does not support PDO nor prepared statements, 
>> and is unlikely to switch to it anytime some in the foreseeable future. 
> 
> WordPress is not going to leverage anything we do here until and unless there 
> is a major change of leadership and culture at that project.  Please don't 
> waste any mental effort on it; they clearly waste no mental effort on what 
> the rest of the PHP community considers good, secure practices.  Anything 
> involving them is tilting at windmills.

You misunderstand.  What I am (likely) "wasting" my mental effort is to discuss 
features that I would be able to use WITHOUT WordPress having to make ANY 
changes.

HOWEVER, these would be beneficial beyond WordPress, since parameterized 
queries cannot parameterize table names nor field names. 

This is all trying to address the concerns that Craig Francis brought up off 
list when he said that you cannot escape or sanitize without knowing context 
when I was asking his to provide a make_literal() function, add support for a 
IsLiteral attribute, or support an IsLiteralInterface so that people don't 
latch on to using is_literal() and make certain edge cases impossible. 

When I am trying to address a problem, I come at it from many angles until I 
find a solution.  One potential solution then is to have a class built into PHP 
that can ensure the SQL returned is indeed safe. 

> Mike, speaking as someone who has written an SQL abstraction layer and query 
> builder with significant usage (Drupal 7-9), you are *grossly* 
> under-estimating the complexity of what you describe.  It might be possible 
> to hack together for SQL92, aka "what most PHP devs actually use because they 
> haven't noticed that it's not 1992 anymore", but that's already been done.  
> We have DBTNG in Drupal, we have Doctrine, problem solved.

I think what is happening here is you are making an assumption I am proposing a 
much larger scope than I am.

Think of the scope I am proposing being on par with $mysqli->prepare(), but a 
bit more to be able to handle more than just values.

> Modern SQL, though, is a stupidly complex and stupidly inconsistent beast.  
> Most of the syntax beyond the basics is different on every damned database.  
> The official spec *is not even publicly available*, and costs a lot of money 
> to access.  And no DB engine actually supports all of it; they all support 
> different subsets with their own different extensions that may or may not be 
> comparable.

Both Modern SQL and legacy SQL are both still text-based query languages  and 
they all have grammars that can be represented by BNF rules. 

https://github.com/ronsavage/SQL

Those rules could be abstracted into a form accessible via a "dialect" 
interface and that is how these would literally any version of SQL could be 
supported.

Could we finish all of a given dialect at once?  No.  Iteration based on what 
is found to be supported is how this could be approached.  Remember, these 
dialects could be implemented in userland. By any PHP developer. 

Could we ever get them to be perfect?  Probably not.  But they would be good at 
the start and then very good and there would like to be many people providing 
PRs to fix new edge cases for the dialects that get the most use.

> Building a tool that parses an arbitrary string to an AST for a spec that is 
> inconsistent, inaccessible, and not implemented correctly by anyone is a 
> fool's errand, and that's just the first part of it.  

That is not what I am proposing.  I am proposing to build a tool that knows how 
to parse based on a simplified grammar and then sanitize based on those rules.

There really are only a few contexts in a SQL query to be concerned about.  
Keywords, field names, table name, table aliases, and typed values (plus maybe 
a few other things I missed?) The grammars themselves can evolve independently 
as userland implementations of interfaces. This is not rocket science.

> That's not even getting into designing an API for people to modify it,

No API per se, an interface.  (Yes that is also an API, but not one that you 
call; instead it calls you.)

> or questions of performance,

Parsing a SQL query should not take nearly as long as executing the query 
itself, especially if the SqlObjectModel were written in C.

> or compiling the AST back into a DB-specific string

Sounds like you are envisioning something I am not.

> AND then doing parameter binding which varies from one database to another.

Again, that would be handled by the dialect class.

> You're 

Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel


> On Jun 22, 2021, at 11:56 AM, Pierre  wrote:
> 
> Le 22/06/2021 à 17:35, Mike Schinkel a écrit :
>> https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2050
> Sorry for the discussion pollution here but, but ouch, plugins are still 
> using this unsafe form ? Reminds when I was a student, I learnt to 
> parametrize queries there, it was 20 years ago. I never understood people not 
> doing that in the first place.
>> But also because of much legacy code exists in the form of plugins and 
>> themes that do not support parameterized queries.
> Yes I agree, legacy is legacy, you have to deal with it. But all legacy code 
> cannot be fixed, and doing a highly complex SQL parsing / escaping / 
> vulnerability detection code that explicitly targets legacy code and not 
> modern code seems weird to me.
>> HOWEVER, whether mysqli supports parameterised queries or not is all a moot 
>> point because parameterised queries do not allow for parameterizing field 
>> names or table names.  And the point of this thread was to discuss how to 
>> mark SQL that has been composed at run-time to be "safe."  Without being 
>> able to parameterize field names and table names parameterised queries are 
>> not a sufficiently complete solution.
> 
> Not being able to parametrize table or field names is not only a problem for 
> mysqli, but it is for PDO and pgsql too. That's a place where userland 
> query-builders and others DBALs, even the most basic ones do shine, and 
> brings a real added-value.
> 
> But having anyone, writing SQL with user-given table names or column names, 
> and executing it using something like WP's _do_query() method seems like they 
> *WANT* to be hacked.

Are you not familiar with PHPMyAdmin[1] and/or Adminer[2]?  

Do they, or anyone else who has a similar use-case want to be hacked?

> I'm not sure how you will succeed in plugging the is_trusted() / is_literal() 
> / is_wathever() method correctly in an SQL Model Parser & Sanitizer anyway, 
> knowing that at this point, all you'll receive is a huge string issued by 
> some plugin API which has already done crazy dark (and probably bugguy as 
> well) magic.
> 
> I don't see how adding magic in PHP core will avoid the need to fix all those 
> legacy plugins, they probably would need themselves to use this new shiny API 
> to benefit from it ? In the opposite, if something alters the behavior of 
> mysqli implicitly for everyone in order to make it safe, it sounds like there 
> will be a lot of BC ? In both case, it seems that it will not do any shiny 
> magic to me.

It would not affect *any* of those plugins by itself.  They would be left to 
fend for their own.

What it *would* do is allow those developers writing new sites and/or plugins 
or maintaining old ones who proactively choose to sanitize their SQL to be able 
to do so before they pass their SQL to $wpdb->query(), especially if the RFC 
is_trusted() and/or is_literal() passes.

> But, I might be wrong, this thread becomes harder and harder to read, and I 
> may have missed a few points.

It just started!  Note I created a new thread from the 
is_literal()/is_trusted() thread to talk about SQL.

-Mike

[1] https://www.phpmyadmin.net/
[2] https://www.adminer.org/




Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Pierre

Le 22/06/2021 à 17:35, Mike Schinkel a écrit :

https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2050
Sorry for the discussion pollution here but, but ouch, plugins are still 
using this unsafe form ? Reminds when I was a student, I learnt to 
parametrize queries there, it was 20 years ago. I never understood 
people not doing that in the first place.

But also because of much legacy code exists in the form of plugins and themes 
that do not support parameterized queries.
Yes I agree, legacy is legacy, you have to deal with it. But all legacy 
code cannot be fixed, and doing a highly complex SQL parsing / escaping 
/ vulnerability detection code that explicitly targets legacy code and 
not modern code seems weird to me.

HOWEVER, whether mysqli supports parameterised queries or not is all a moot point because 
parameterised queries do not allow for parameterizing field names or table names.  And 
the point of this thread was to discuss how to mark SQL that has been composed at 
run-time to be "safe."  Without being able to parameterize field names and 
table names parameterised queries are not a sufficiently complete solution.


Not being able to parametrize table or field names is not only a problem 
for mysqli, but it is for PDO and pgsql too. That's a place where 
userland query-builders and others DBALs, even the most basic ones do 
shine, and brings a real added-value.


But having anyone, writing SQL with user-given table names or column 
names, and executing it using something like WP's _do_query() method 
seems like they *WANT* to be hacked. I'm not sure how you will succeed 
in plugging the is_trusted() / is_literal() / is_wathever() method 
correctly in an SQL Model Parser & Sanitizer anyway, knowing that at 
this point, all you'll receive is a huge string issued by some plugin 
API which has already done crazy dark (and probably bugguy as well) magic.


I don't see how adding magic in PHP core will avoid the need to fix all 
those legacy plugins, they probably would need themselves to use this 
new shiny API to benefit from it ? In the opposite, if something alters 
the behavior of mysqli implicitly for everyone in order to make it safe, 
it sounds like there will be a lot of BC ? In both case, it seems that 
it will not do any shiny magic to me.


But, I might be wrong, this thread becomes harder and harder to read, 
and I may have missed a few points.


Regards,

--

Pierre



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Larry Garfield
On Tue, Jun 22, 2021, at 8:39 AM, Mike Schinkel wrote:
> > On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
> > 
> > Hi Mike,
> > 
> > Please don't do this. We already have PDO with prepared statements. The 
> > data must be bound. This is the secure way of writing SQL queries. 
> 
> The problem is that over 40% of the web currently runs on PHP code that 
> using mysqli.  That CMS does not support PDO nor prepared statements, 
> and is unlikely to switch to it anytime some in the foreseeable future. 

WordPress is not going to leverage anything we do here until and unless there 
is a major change of leadership and culture at that project.  Please don't 
waste any mental effort on it; they clearly waste no mental effort on what the 
rest of the PHP community considers good, secure practices.  Anything involving 
them is tilting at windmills.

Mike, speaking as someone who has written an SQL abstraction layer and query 
builder with significant usage (Drupal 7-9), you are *grossly* under-estimating 
the complexity of what you describe.  It might be possible to hack together for 
SQL92, aka "what most PHP devs actually use because they haven't noticed that 
it's not 1992 anymore", but that's already been done.  We have DBTNG in Drupal, 
we have Doctrine, problem solved.

Modern SQL, though, is a stupidly complex and stupidly inconsistent beast.  
Most of the syntax beyond the basics is different on every damned database.  
The official spec *is not even publicly available*, and costs a lot of money to 
access.  And no DB engine actually supports all of it; they all support 
different subsets with their own different extensions that may or may not be 
comparable.

Building a tool that parses an arbitrary string to an AST for a spec that is 
inconsistent, inaccessible, and not implemented correctly by anyone is a fool's 
errand, and that's just the first part of it.  That's not even getting into 
designing an API for people to modify it, or questions of performance, or 
compiling the AST back into a DB-specific string, AND then doing parameter 
binding which varies from one database to another.

You're talking about reimplementing major portions of MySQL, PostgreSQL, 
Oracle, etc. themselves in PHP, all at the same time.  Well, good luck, you're 
going to need it.

Personally I've long since concluded that database portability is no longer an 
achievable or even desirable feature.  SQL is just too fragmented a language, 
leaving you with a least common denominator that is grossly under-whelming for 
modern needs.  If you want more than SQL92, it's not really viable anymore.

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 10:35 AM, Stephen Reay  wrote:
> 
>> On 22 Jun 2021, at 20:39, Mike Schinkel  wrote:
>> 
>>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
>>> 
>>> Hi Mike,
>>> 
>>> Please don't do this. We already have PDO with prepared statements. The 
>>> data must be bound. This is the secure way of writing SQL queries. 
>> 
>> The problem is that over 40% of the web currently runs on PHP code that 
>> using mysqli.  That CMS does not support PDO nor prepared statements, and is 
>> unlikely to switch to it anytime some in the foreseeable future.  
>> 
>> A SQL object model parser and sanitizer could more easily be used 
>> incrementally by that CMS since PDO does not share connections with mysqli 
>> (AFAIK, anyway.)
>> 
> 
> (Resending from on-list address)
> 
> Apparently you didn't know mysqli supports parameterised queries?
> Wordpress could have adopted parameterised queries when they grudgingly 
> switched to mysqli, years after both it and PDO were introduced.

Well, yes and no.  

I admit I had forgotten it because frankly the WordPress wp-db object wrapper 
used by 99.7%[1] of WordPress developers does not support it and given I had 
long ago realized I could not use them in WP I guess my memory blocked out the 
existence of prepared statements in mysqli.

But it is not relevant in WordPress. Here is where (almost?) all SQL queries go 
to run in WordPress:

https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L2050 

Note there are no bind_param() or execute() calls anywhere in that file.  Their 
is a prepare(), but WordPress rolled their own:

https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L1302 

So saying that parameterized query is a solution leaves out 40% of the web.  
Because some other guys are the intransigent ones here. 

But also because of much legacy code exists in the form of plugins and themes 
that do not support parameterized queries.

HOWEVER, whether mysqli supports parameterised queries or not is all a moot 
point because parameterised queries do not allow for parameterizing field names 
or table names.  And the point of this thread was to discuss how to mark SQL 
that has been composed at run-time to be "safe."  Without being able to 
parameterize field names and table names parameterised queries are not a 
sufficiently complete solution.  

At least not from an is_literal()/is_trusted() perspective.

> There’s zero reason to believe they would adopt this unless forced to.

Exactly! WordPress is not going to change, so that leaves it up to PHP 
developers using WordPress to protect themselves. 

Since I am *assuming* that everyone on this list wants to minimize the number 
of security issues in PHP applications across the web by whatever reasonable 
means necessary, I also assume that pointing at WordPress and saying "It's 
their fault not ours" does not help achieve those security goals?  So I am also 
assuming that finding a pragmatic solution that would allow PHP to be able to 
empower the developer in a way that WordPress won't might be worth considering.
 
But tell me if I am assuming wrongly here.

-Mike

[1] Yes, I made that up out of whole cloth. But honestly in 10+ years working 
with WP I have almost never seen anyone use anything but the $wpdb provided by 
WP — except for one single plugin — and $wpdb does not support parameterized 
queries.

Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Stephen Reay



> On 22 Jun 2021, at 20:39, Mike Schinkel  wrote:
> 
>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
>> 
>> Hi Mike,
>> 
>> Please don't do this. We already have PDO with prepared statements. The data 
>> must be bound. This is the secure way of writing SQL queries. 
> 
> The problem is that over 40% of the web currently runs on PHP code that using 
> mysqli.  That CMS does not support PDO nor prepared statements, and is 
> unlikely to switch to it anytime some in the foreseeable future.  
> 
> A SQL object model parser and sanitizer could more easily be used 
> incrementally by that CMS since PDO does not share connections with mysqli 
> (AFAIK, anyway.)
> 

(Resending from on-list address)

Apparently you didn't know mysqli supports parameterised queries?
Wordpress could have adopted parameterised queries when they grudgingly 
switched to mysqli, years after both it and PDO were introduced.
There’s zero reason to believe they would adopt this unless forced to.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela  wrote:
> 
> Hi Mike,
> 
> Please don't do this. We already have PDO with prepared statements. The data 
> must be bound. This is the secure way of writing SQL queries. 

The problem is that over 40% of the web currently runs on PHP code that using 
mysqli.  That CMS does not support PDO nor prepared statements, and is unlikely 
to switch to it anytime some in the foreseeable future.  

A SQL object model parser and sanitizer could more easily be used incrementally 
by that CMS since PDO does not share connections with mysqli (AFAIK, anyway.)

> The idea behind SQL builder is to generate SQL, not to allow the data to be 
> sanitized. 

That is why the title of this email said "Parse and Sanitizer" not "Builder."

> Every time I hear the word sanitize I get goose bumps. You can't remove any 
> characters from a string to make it safe. If you want to use escaping, then 
> you need to do it context aware and properly formatted. Don't sanitize 
> anything. Format the SQL properly instead. 

I believe you are latching onto the word sanitizing as you understand it and 
ignoring the context of the discussion.

What I believe I am proposing is to implement an object that would be 
SQL-syntax-aware and — to use your words — "format the SQL properly."  

But maybe I am wrong?  Can you explain how what I am proposing would not be:

- "Removing the characters" needed to make it safe?  
- Context aware?

Note that I did not specifically talk about removing characters, and I did talk 
about context. I talked about parsing the SQL so as to ensure it is safe.

> On a general note. Implementing SQL builder in PHP would be an enormous task, 
> which is not feasible.

Is it possible you just scanned the first email in this thread and did not 
fully read it?  

For the email to which you replied I literally wrote:

I think we are talking two different things here.  And I will admit I am 
probably the one that is not using exactly the right term.  I had envisioned 
something, and then Dan said "SQL Builder" I used that term (wrongly.)Let me 
clarify by giving what I am referring to a different name:  A SQL Object Model 
with parser and sanitizer functionality.

What I was discussing would not be the enormous task you are referring to.  I 
was not proposing creating anything at all like these first two Google results 
for "php sql query builder: [1] and [2].  I was proposing a SQL object model, 
parser and "safe SQL generator" (I changed the last word since you triggered on 
it.)

> There are so many dialects, so many options, and even then it won't ever be 
> accurate as you don't have the full context in PHP.

My email addressed how dialects and options would be handled.  Using dependency 
injection of objects that define each dialect and provide access to the schema.

> SQL is a very powerful language, and building a parser for it in PHP would 
> mean that we either limit it to a subset of valid SQL commands, or we try to 
> create a super tool that is more powerful than MySQL, Oracle, PostgreSQL, 
> etc. combined. 

That is a false binary. It would not be a huge undertaking to create a generic 
query parser that used an object implementing an interface to provide it with 
the information needed to correctly parse a query _*enough*_ to sanitize it.  

And classes defining any given SQL subset could, if not in PHP core, be written 
in userland. And the DB vendor is a likely candidate to write them too.

> There's absolutely nothing wrong with writing SQL in PHP and preparing it on 
> the server. For database servers that don't support prepared statements we 
> already have PDO which is an abstraction library that tries to escape and 
> format data within SQL. It works 99% of the time. 

"Nothing _wrong_ with"  (per se), is true.  

But for applications that have 15 years of legacy code using mysql(i), PDO is a 
non-starter.

> The example you suggested already has a simple syntax in PHP. 
> 
> $conn = mysqli_connect(...);
> $stmt = $conn->prepare($sql);
> $stmt->execute([$_GET['openings'], $_GET['limit']]);

If PHP were to add functionality to core to use prepared statements with the 
mysqli connection, this discussion _might) be different.

But, as I said, for the CMS that controls >40% of the web PDO is a non-starter.

OTOH, what about when a developer needs to parameterize field and table names 
in PDO[3]?  PDO does not address that aspect of assembling a known-safe SQL 
string and requires manual sanitization. What I am proposing _*would*_ address 
this.  

IOW, PDO is a tool for a different albeit overlapping use-case, and could even 
leverage PDO which as applicable to achieve its objectives.

-Mike

[1] https://github.com/nilportugues/php-sql-query-builder
[2] https://github.com/ClanCats/Hydrahon
[3] https://stackoverflow.com/a/182353/102699



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Pierre

Le 22/06/2021 à 15:00, Kamil Tekiela a écrit :

Hi Mike,

Please don't do this. We already have PDO with prepared statements. The
data must be bound. This is the secure way of writing SQL queries. The idea
behind SQL builder is to generate SQL, not to allow the data to be
sanitized.
Every time I hear the word sanitize I get goose bumps. You can't remove any
characters from a string to make it safe. If you want to use escaping, then
you need to do it context aware and properly formatted. Don't sanitize
anything. Format the SQL properly instead.

On a general note. Implementing SQL builder in PHP would be an enormous
task, which is not feasible. There are so many dialects, so many options,
and even then it won't ever be accurate as you don't have the full context
in PHP. SQL is a very powerful language, and building a parser for it in
PHP would mean that we either limit it to a subset of valid SQL commands,
or we try to create a super tool that is more powerful than MySQL, Oracle,
PostgreSQL, etc. combined.
There's absolutely nothing wrong with writing SQL in PHP and preparing it
on the server. For database servers that don't support prepared statements
we already have PDO which is an abstraction library that tries to escape
and format data within SQL. It works 99% of the time.

The example you suggested already has a simple syntax in PHP.

$conn = mysqli_connect(...);
$stmt = $conn->prepare($sql);
$stmt->execute([$_GET['openings'], $_GET['limit']]);

Regards,
Kamil

I fully agree with you. Any attempt to be smart in doing this will 
eventually end-up in no-so-smart corner-case bugs that will be very, 
very hard to deal with. It's almost an impossible mission. And if by any 
luck you'd succeed in making this work, it probably would be a 
maintenance nightmare.


Regards,

--

Pierre

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Kamil Tekiela
Hi Mike,

Please don't do this. We already have PDO with prepared statements. The
data must be bound. This is the secure way of writing SQL queries. The idea
behind SQL builder is to generate SQL, not to allow the data to be
sanitized.
Every time I hear the word sanitize I get goose bumps. You can't remove any
characters from a string to make it safe. If you want to use escaping, then
you need to do it context aware and properly formatted. Don't sanitize
anything. Format the SQL properly instead.

On a general note. Implementing SQL builder in PHP would be an enormous
task, which is not feasible. There are so many dialects, so many options,
and even then it won't ever be accurate as you don't have the full context
in PHP. SQL is a very powerful language, and building a parser for it in
PHP would mean that we either limit it to a subset of valid SQL commands,
or we try to create a super tool that is more powerful than MySQL, Oracle,
PostgreSQL, etc. combined.
There's absolutely nothing wrong with writing SQL in PHP and preparing it
on the server. For database servers that don't support prepared statements
we already have PDO which is an abstraction library that tries to escape
and format data within SQL. It works 99% of the time.

The example you suggested already has a simple syntax in PHP.

$conn = mysqli_connect(...);
$stmt = $conn->prepare($sql);
$stmt->execute([$_GET['openings'], $_GET['limit']]);

Regards,
Kamil


[PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal)

2021-06-22 Thread Mike Schinkel
> On Jun 22, 2021, at 6:45 AM, Pierre  wrote:
> 
> Le 22/06/2021 à 11:28, Dan Ackroyd a écrit :
>> On Tue, 22 Jun 2021 at 10:25, Mike Schinkel  wrote:
>>> Should(n't?) PHP add a basic SQL builder class that can be extended for 
>>> special cases, e.g. different flavors of SQL?
>>> 
>> No. Or at least not yet.
>> 
>> This type of thing is much better done in userland, where the api can
>> evolve at a fast rate, rather than being limited by the fixed release
>> schedule of PHP.
> 
> Agreed, PHP is probably not the right place for an SQL builder, there's too 
> many dialects, too many standard or non-standard features, in the end SQL 
> builders almost always end-up being opinionated by its designer's original 
> need, philosophy, or SQL usage habits, and tailored for users which use 
> certain paradigms.
> 
> An SQL query builder is already business domain tied, in some way. Of course 
> most are very generic, but often don't handle properly SQL advanced or modern 
> features, whereas some other where built for ORMs and such and may a reduced 
> advanced SQL features that fits the original need.
> 
> I don't wish to see an SQL query builder in PHP core, instead of favoring 
> usage of modern SQL, it'll in my opinion (of course, that's subjective) lower 
> the expectation of people and probably normalize SQL-92 being the only SQL 
> spoken by people writing PHP (just exaggerating a bit).


I think we are talking two different things here.  And I will admit I am 
probably the one that is not using exactly the right term.  I had envisioned 
something, and then Dan said "SQL Builder" I used that term.

Let me clarify by giving what I am referring to a different name:  A SQL Object 
Model with parser and sanitizer functionality.  One with a simple interface and 
one that would need not be tied to or limited by any specific dialect of SQL.

To illustrate what I mean I created the following straw man example.  It 
assumes a hypothetical SqlObjectModel classes that needs to be instantiated 
with an objects that implement hypothetical `SqlDialectInterface` and 
`SqlSchemaInterface` instances passed to its constructor.  The former defines 
the rules for the MySql dialect, and the latter accesses the schema to validate 
data and data types.  

Some "dialects" and "validators" for very well-known and widely used database 
servers such as MySQL could be implemented in PHP core, while others could be 
implemented in userland (the database vendors would be good candidate to 
implement them, actually.)  I don't yet know what those interfaces should look 
like but I am pretty sure that could be figured out.

So here is the example (I omitted error handling logic for brevity and clarity):

$sql = << {openings}
GROUP BY
c.company_name
LIMIT 
{limit}
SQL;

$conn = mysqli_connect(...);
$som = new SqlObjectModel(new MySqlDialect(), new MySqlSchema($conn));
$som->parse($sql);
$som->set( "limit", $_GET['limit'] );
$som->set( "openings", $_GET['openings'] );
$result = mysqli_query( $conn, $som->sql() );

The SQLObjectModel *could* potentially have additional features like the 
ability to get and set the table name, list of fields, joins, where clauses, 
etc. but the core value-add is not in building SQL but rather parsing and 
sanitizing with properly escaped and quoted arguments so that known-safe SQL 
could be generated.

Existing or new SQL builders could use this to take SqlObjectModel() and pass 
its generated SQL through to ensure their SQL they will output is safe, which 
by the way the string could now be tagged as "trusted" before the builder 
returns the SQL to its developer, assuming an is_trusted() RFC were to pass.

Similarly, any ORM could ensure its SQL is fully sanitized and then pass on to 
whatever functions are needed to execute the SQL.

So, absolutely nothing here would keep people from using "modern SQL," and/or 
cause people to lower the expectation to normalize SQL-92.  This could be a 
generic query parser and sanitizer that could be configured by the "dialect" 
and "schema" objects passed to it and its scope it small, simple and 
straightforward.

I hope that clarifies what I was thinking?

-Mike