Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-07-01 Thread Tim Düsterhus

Hi

On 7/1/22 12:07, Rowan Tommins wrote:

My concern is rather the opposite: if it is not obvious *to someone writing
PHP code* how the API should be used, there is a higher chance of them
using it incorrectly, and introducing errors.

A good example of this mindset is the password_* functions: they do not
expose the options of underlying implementations, they present an
easy-to-use *opinionated* set of behaviours, that solves a common use case
for their audience. They allow users to "fall into the pit of success",
because the easy thing is also the correct thing.


The difference to me is that "curl" is a well-established library with 
an existing API. By faithfully representing the original API (e.g. the 
naming of the options/flags), I can leverage the existing documentation 
or Stack Overflow Q to learn what exactly a specific option does.


This would be different for an "url" or "http" or "uri" extension. That 
one might use cURL currently and might use 
https://github.com/uriparser/uriparser in a future version if there are 
good arguments to change the underlying library. It would just guarantee 
you that it can process URIs correctly according to whatever API one 
creates.


The password_* family uses such a generic term: "password". It does not 
pretend to be "libargon2".


In other words: If it says "curl" on the box, then I expect to have 
"curl" within the box.



However, to me at least, it's not entirely clear who the target audience
for this API is, what tasks it should be making easy, and what correct
usage looks like. Just providing a bunch of functions, in whatever form,
doesn't provide security unless users can understand how to use them
securely.


Best regards
Tim Düsterhus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-07-01 Thread Rowan Tommins
On Thu, 30 Jun 2022 at 23:21, Levi Morrison via internals <
internals@lists.php.net> wrote:

> On Thu, Jun 30, 2022 at 10:48 AM Pierrick Charron 
> wrote:
> > One of the goal of this API is to tighten a problematic vulnerable area
> > for applications where the URL parser library would believe one thing
> > and libcurl another. This could and has sometimes led to security
> > problems.
>
> Designing another API on top of what libcurl provides _could make the
> problem worse_.



My concern is rather the opposite: if it is not obvious *to someone writing
PHP code* how the API should be used, there is a higher chance of them
using it incorrectly, and introducing errors.

A good example of this mindset is the password_* functions: they do not
expose the options of underlying implementations, they present an
easy-to-use *opinionated* set of behaviours, that solves a common use case
for their audience. They allow users to "fall into the pit of success",
because the easy thing is also the correct thing.

However, to me at least, it's not entirely clear who the target audience
for this API is, what tasks it should be making easy, and what correct
usage looks like. Just providing a bunch of functions, in whatever form,
doesn't provide security unless users can understand how to use them
securely.

Regards,
-- 
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-30 Thread Levi Morrison via internals
On Thu, Jun 30, 2022 at 10:48 AM Pierrick Charron  wrote:
>
> Hi all,
>
>
> > - The new CurlUrl class should probably be immutable from the start. It
> > was my biggest mistake not to do that with DateTime.
> >
> >
> After thinking about it and some discussions, I followed Derick's
> recommendation and therefore changed the RFC to make the CurlUrl class
> immutable. All the setters were replaced by new `with*` methods.
> For example setHost is now withHost and will return a new object with the
> host modified. This will prevent confusing behavior where the CurlUrl
> object would be unintentionally modified after being attached to a
> CurlHandle.
>
> Pierrick

It's clear people do not agree on how we should be designing the APIs
for 3rd party extensions. However, let me redraw attention to the
introduction of the RFC:

> One of the goal of this API is to tighten a problematic vulnerable area
> for applications where the URL parser library would believe one thing
> and libcurl another. This could and has sometimes led to security
> problems.

Designing another API on top of what libcurl provides _could make the
problem worse_. I am fine with these kinds of adjustments:

 1. Using exceptions instead of return codes.
 2. Using enums instead of constants if it makes sense (it may not if
they are bitwise-or'd together, which is pretty common for C libs).
 3. Renaming things that have keyword or reserved word conflicts.

I am not fine with designing an immutable, with* style API that
doesn't mirror the underlying library at all. At least, not by itself;
I'd be okay with having both, where the nicer API is built on top of
the lower level, but what is nicer is subjective. As this thread
shows, designing a nicer API will have quite a bit more discussion and
disagreement than "exposing" or "porting" libcurl's API.

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-30 Thread Pierrick Charron
Hi all,


> - The new CurlUrl class should probably be immutable from the start. It
> was my biggest mistake not to do that with DateTime.
>
>
After thinking about it and some discussions, I followed Derick's
recommendation and therefore changed the RFC to make the CurlUrl class
immutable. All the setters were replaced by new `with*` methods.
For example setHost is now withHost and will return a new object with the
host modified. This will prevent confusing behavior where the CurlUrl
object would be unintentionally modified after being attached to a
CurlHandle.

Pierrick


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-28 Thread Rowan Tommins
On Mon, 27 Jun 2022 at 22:32, Pierrick Charron  wrote:

> That's just an example with an old version of PHP, but let's say you have
> some code that makes requests but only to a specific list of servers, so
> you want to analyze the URL and check if the host is in a whitelist. If the
> provided URL is "http://127.0.0.1:11211#@google.com:80/; and that you
> used PHP <= 7.0.13 your parse_url function would tell you that the domain
> you're trying to request is google.com so everything is fine, but in fact
> when the call to curl is made, curl would call 127.0.0.1. This one was
> fixed but the problem could still occur if the parser is not the same as
> the one used in the requester.
>

...


> You can use CurlUrl within your implementation of UriInterface but for the
> same reason if you're using another request engine than curl, you may have
> the same security problem where curl will not parse the same data. If you
> want to make sure that your CurlUrl object represents the same thing as
> your UriInterface you could build the CurlUrl object part by part using
> your UriInterface. When you assign your CurlUrl to your CurlHandle with the
> CURLOPT_CURLU option, curl will use the parts directly instead of parsing
> the URL again, so you're sure that the host will be the one you set with
> `CurlUrl::setHost()` and so on.
>


This is actually a lot trickier than it sounds.

Imagine this code, with the bug you gave as an example still present in one
of the libraries we're interacting with:

$url = new MyUrlObject("http://127.0.0.1:11211#@google.com:80/;);
var_dump($url->getHostName()); // ???

This won't work, because we don't know which parser to call; it needs to be
something like this:

var_dump($url->getHostName(UrlContext::CURL)); // '127.0.0.1'
var_dump($url->getHostName(UrlContext::BROKEN_PHP)); // 'google.com'

Then we call this:

$url->setHostName("duckduckgo.com");
var_dump($url->getFullUrl(); // ???

Again, the result depends on context:

var_dump($url->getFullUrl( UrlContext::CURL )); // "
http://duckduckgo.com#@google.com:80/;
var_dump($url->getFullUrl( UrlContext::BROKEN_PHP )); // "
http://127.0.0.1:11211#@duckduckgo.com:80/;

But that means we can't actually apply the setHostName change until the
call to getFullUrl(), because we don't know how the original URL will be
parsed. Instead, the object has to internally store a queue of applied
modifications, and then reproduce them, in order, on the underlying
implementation.

Alternatively, we can build our implementation to assume it will always be
used in the context of curl, or for debugging curl, so can just have
getHostName() and setHostName() directly use curl's implementation. Which
leads us back to where we started, of using CurlUrl directly or via a thin
wrapper, as either a URL parser+builder, or as a value object in its own
right.

I don't really know how to make all this nuance clear to users, but that
makes me a bit wary of adding the object to PHP in its current design.

Regards,
-- 
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-27 Thread Pierrick Charron
Hi Rowan


> If I've got a URL, which is already a string, what code would I write to
> "do some checks" on it, outside of a unit test?
>

That's just an example with an old version of PHP, but let's say you have
some code that makes requests but only to a specific list of servers, so
you want to analyze the URL and check if the host is in a whitelist. If the
provided URL is "http://127.0.0.1:11211#@google.com:80/; and that you used
PHP <= 7.0.13 your parse_url function would tell you that the domain you're
trying to request is google.com so everything is fine, but in fact when the
call to curl is made, curl would call 127.0.0.1. This one was fixed but the
problem could still occur if the parser is not the same as the one used in
the requester.


>
> If I'm using CurlUrl to "add/delete/overwrite some parts" how is that
> not "using it alone as a representation of an URL"?
>
>
What I meant here was that if you're not using curl, you have no advantage
of using this class alone to parse since the requester you're using could
handle the URL differently.


> If I'm writing a PSR-7 object, am I only supposed to use CurlUrl when
> interfacing with curl, and generate the string myself for other
> purposes? If the implementation I come up with differs from curl's, how
> does the user know which is the "real" URL?
>
>
You can use CurlUrl within your implementation of UriInterface but for the
same reason if you're using another request engine than curl, you may have
the same security problem where curl will not parse the same data. If you
want to make sure that your CurlUrl object represents the same thing as
your UriInterface you could build the CurlUrl object part by part using
your UriInterface. When you assign your CurlUrl to your CurlHandle with the
CURLOPT_CURLU option, curl will use the parts directly instead of parsing
the URL again, so you're sure that the host will be the one you set with
`CurlUrl::setHost()` and so on.

Pierrick

[1]
https://www.blackhat.com/docs/us-17/thursday/us-17-Tsai-A-New-Era-Of-SSRF-Exploiting-URL-Parser-In-Trending-Programming-Languages.pdf


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-27 Thread Rowan Tommins

On 27/06/2022 14:10, Pierrick Charron wrote:

The purpose of the CurlUrl class is to be able to build/see/modify an URL
as it is seen by libcurl before/after passing it to your CurlHandle. It is
not meant to be used alone as a representation of an URL.

For example, you may want to do some checks to make sure that the URL you
gave to your CurlHandle is well formatted and that the host or any other
parts are what you want them to be, and that it was not something else
because of some differences on how the URL was parsed. Or you may want to
add/delete/overwrite some parts to sanitize your URL. That's why you have
those setters/getters. This class is definitively not there to replace
PSR-7 UriInterface. I can imagine some Guzzle or other implementation to
build a CurlUrl handle from a UriInterface before giving it to curl.



This leaves me a bit baffled, frankly.

If I've got a URL, which is already a string, what code would I write to 
"do some checks" on it, outside of a unit test?


If I'm using CurlUrl to "add/delete/overwrite some parts" how is that 
not "using it alone as a representation of an URL"?


If I'm writing a PSR-7 object, am I only supposed to use CurlUrl when 
interfacing with curl, and generate the string myself for other 
purposes? If the implementation I come up with differs from curl's, how 
does the user know which is the "real" URL?



Regards,

--
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-27 Thread Pierrick Charron
Hi Rowan

>
> Rather than a *representation* of a URL, think of the class as a
> *builder* for URLs. There are multiple methods because you might want to
> build the URL in different orders ("start with this URL but replace the
> port", "start with this domain and I'll add the path later", etc). All
> the flags are related to how the input should be interpreted, and the
> output manipulated, in order to build a correctly formatted URL string.
>

Sorry if it was really not clear in the RFC since I didn't even talk about
the CURLOPT_CURLU option, but this class is not only there to parse/build
strings for Curl but to give this specific object to Curl instead of a
string representation of an URL.


>
> Maybe it should even be called CurlUrlBuilder? That also fits with the
> design of having mutable setters; as Derick pointed out, mutable value
> objects are generally a bad idea, so it would make sense to encourage
> users to think of this as a way to get one or more strings, rather than
> as a result in itself.
>

Since you can give this object to curl instead of an URL string, I would
not call it CurlUrlBuilder.

Regards,
Pierrick


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-27 Thread Pierrick Charron
Hi Michał,

Thanks for your comments. You made me realise that the RFC missed
information on a crucial part which is the new CURLOPT_CURLU option that
tells curl to use the CurlUrl object instead of the "standard" CURLOPT_URL
option. I just added some information on it in the RFC.

The purpose of the CurlUrl class is to be able to build/see/modify an URL
as it is seen by libcurl before/after passing it to your CurlHandle. It is
not meant to be used alone as a representation of an URL.

For example, you may want to do some checks to make sure that the URL you
gave to your CurlHandle is well formatted and that the host or any other
parts are what you want them to be, and that it was not something else
because of some differences on how the URL was parsed. Or you may want to
add/delete/overwrite some parts to sanitize your URL. That's why you have
those setters/getters. This class is definitively not there to replace
PSR-7 UriInterface. I can imagine some Guzzle or other implementation to
build a CurlUrl handle from a UriInterface before giving it to curl.

So this RFC is really targeted to users of curl_* functions.

Le lun. 27 juin 2022, à 02 h 29, Michał Marcin Brzuchalski <
michal.brzuchal...@gmail.com> a écrit :

> Hi Pierrick
>
> śr., 22 cze 2022 o 06:38 Pierrick Charron  napisał(a):
>
>> Hi,
>>
>> Following our discussions we had on the subject of the new Curl URL API,
>> and other curl improvements. I decided to only focus on adding the new
>> Curl
>> URL API and put aside all other improvements. Here is the RFC that
>> reflects
>> our current conversations.
>>
>> https://wiki.php.net/rfc/curl-url-api
>
>
> TBH I have a problem with the proposed Curl* classes.
> IMO they present a mix of responsibilities that I don't like.
>
> CurlUrl is for me is a mix of Url/Uri object properties well known from
> other userland implementations like the PSR one
> with Uri specific like the host, scheme, port, path, query, fragment, etc.
> but on the other hand, we have flags and options which
> purpose is to pass Curl specific things but in IMO wrong place instead
> (for instance Guzzle use separate argument in all methods for options and
> flags.
> Secondly, it has some default logic like `setScheme` allows to put a
> scheme but requires the supported one, with an exception that you can
> ignore support checking by a flag - IMO this logic belongs to its consumer
> logic and is not part of URL class logic and the `setPort` like above.
> Next thing is that it again has all the getters and setters and is not
> immutable. Why can't it be a simple constructor with read-only properties,
> no getters, no setters? Maybe I miss some discussion about why here.
>
>
> The same goes for CurlException which looks like is an exception for
> handling some encoding of Url and not exactly to the Url properties itself.
>
> But with all that said, if the target audience is only a user of low lever
> `curl_*` functions rather than users of higher abstraction libraries like
> Guzzle or Httplug then, I guess I don't care that much.
> I'm only afraid that such a mix of responsibilities can lead to bad habits
> when it gets to the separation of concerns by developers who get used to it
> and won' see the problem as I do.
>
> Cheers,
> Michał Marcin Brzuchalski
>


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-27 Thread Rowan Tommins

On 27/06/2022 07:29, Michał Marcin Brzuchalski wrote:

CurlUrl is for me is a mix of Url/Uri object properties well known from
other userland implementations like the PSR one
with Uri specific like the host, scheme, port, path, query, fragment, etc.
but on the other hand, we have flags and options which
purpose is to pass Curl specific things but in IMO wrong place



Rather than a *representation* of a URL, think of the class as a 
*builder* for URLs. There are multiple methods because you might want to 
build the URL in different orders ("start with this URL but replace the 
port", "start with this domain and I'll add the path later", etc). All 
the flags are related to how the input should be interpreted, and the 
output manipulated, in order to build a correctly formatted URL string.


Maybe it should even be called CurlUrlBuilder? That also fits with the 
design of having mutable setters; as Derick pointed out, mutable value 
objects are generally a bad idea, so it would make sense to encourage 
users to think of this as a way to get one or more strings, rather than 
as a result in itself.


Regards,

--
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-27 Thread Michał Marcin Brzuchalski
Hi Pierrick

śr., 22 cze 2022 o 06:38 Pierrick Charron  napisał(a):

> Hi,
>
> Following our discussions we had on the subject of the new Curl URL API,
> and other curl improvements. I decided to only focus on adding the new Curl
> URL API and put aside all other improvements. Here is the RFC that reflects
> our current conversations.
>
> https://wiki.php.net/rfc/curl-url-api


TBH I have a problem with the proposed Curl* classes.
IMO they present a mix of responsibilities that I don't like.

CurlUrl is for me is a mix of Url/Uri object properties well known from
other userland implementations like the PSR one
with Uri specific like the host, scheme, port, path, query, fragment, etc.
but on the other hand, we have flags and options which
purpose is to pass Curl specific things but in IMO wrong place instead (for
instance Guzzle use separate argument in all methods for options and flags.
Secondly, it has some default logic like `setScheme` allows to put a scheme
but requires the supported one, with an exception that you can ignore
support checking by a flag - IMO this logic belongs to its consumer logic
and is not part of URL class logic and the `setPort` like above.
Next thing is that it again has all the getters and setters and is not
immutable. Why can't it be a simple constructor with read-only properties,
no getters, no setters? Maybe I miss some discussion about why here.


The same goes for CurlException which looks like is an exception for
handling some encoding of Url and not exactly to the Url properties itself.

But with all that said, if the target audience is only a user of low lever
`curl_*` functions rather than users of higher abstraction libraries like
Guzzle or Httplug then, I guess I don't care that much.
I'm only afraid that such a mix of responsibilities can lead to bad habits
when it gets to the separation of concerns by developers who get used to it
and won' see the problem as I do.

Cheers,
Michał Marcin Brzuchalski


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-24 Thread Rowan Tommins
On Fri, 24 Jun 2022 at 15:05, Jeffrey Dafoe  wrote:

> A thin wrapper would be the most flexible. Someone can always write a
> "friendlier" class using your thin wrapper, as you mentioned, but one
> cannot easily go the other direction.
>


I think this argument has some validity when talking about the entire API
of a library like curl. But we're not, we're talking about a tiny bounded
context.

The underlying implementation consists of 6 functions, 4 of which are
trivial:

* An argument-less "constructor" https://curl.se/libcurl/c/curl_url.html
* An equivalent to PHP "clone" https://curl.se/libcurl/c/curl_url_dup.html
* A cleanup method, which would be automatic in any PHP implementation
https://curl.se/libcurl/c/curl_url_cleanup.html
* A function for looking up error strings from codes
https://curl.se/libcurl/c/curl_url_strerror.html

That leaves the main getter and setter functions
https://curl.se/libcurl/c/curl_url_get.html and
https://curl.se/libcurl/c/curl_url_set.html which take the same arguments:

* A resource handle
* The URL part to handle
* The new value for set, or an output parameter for get
* A set of flags

If we really believe the goal is to expose the C API in PHP, the get
function would look like this:

/**
 * @return int Error code
 */
function curl_url_get(resource $url, int $what, string &$part, int $flags):
int

That would be ... awful, and I hope nobody's really suggesting that. So we
have to map each part to some concept in PHP, which is what the proposal
does:

* The resource handle becomes $this
* The $what argument becomes the method name
* The output parameter becomes the return value
* The error code becomes an exception

There's no "flexibility" which has been lost, and no "future changes" which
are being prevented.


The only other change is a few renamed constants, which I suggested on the
PR. I can see an argument for making them match the library exactly; but
it's the *values* that actually matter, so I don't see why we shouldn't
choose our own names if they're descriptive of the functionality.


Regards,
-- 
Rowan Tommins
[IMSoP]


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-24 Thread Larry Garfield
On Fri, Jun 24, 2022, at 8:49 AM, Pierrick Charron wrote:
> Hi all, and thanks Levi for your feedback,
>
> If you look at the first thread (Discussion about new Curl URL API and
> ext/curl improvements) you'll see that this was my first approach. I even
> proposed to "OOPify" the actual CurlHandle & co objects with "simple"
> methods like $ch->setOpt(). Anyone who reads the actual API can see this is
> the actual "philosophy" of the extension. It was designed (on purpose or
> not) as a thin wrapper over curl. But lets focus on URL API only !
>
> With previous thread answers, I was under the impression that I really was
> the only one liking the approach of following the cURL api as much as
> possible and leaving the more high level API to things like Guzzle & co
> (for example by adding an Implementation of PSR-7 UriInterface using this
> API) since all the others parts of the API are done this way. I think it's
> important to have this new Curl URL API in 8.2 since it fixes security
> issues and that's of course not ideal but I would rather have an
> implementation that would not be my first choice, than not having it at all.
>
> What do you think ? I would love more people to give feedback on what they
> are expecting, if they don't care if they prefer one approach or the other
> and of course why ? I was thinking about doing a vote on this, but I'm not
> sure it's a good idea. What do you all think ?
>
> Regards,
> Pierrick

The root question, I think, is who the consumer is for this RFC?

Is the consumer PHP developers?  Then it should be a good PHP API.

Is the consumer the Guzzle team and everyone else just uses Guzzle and ignores 
Curl/CurlUrl?  Then it should probably adhere as closely as feasible to Curl's 
API, awful as it is, and be documented out the wazoo.

There likely isn't time for 8.2 to do the first option, but probably is for the 
second.

--Larry Garfield

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-24 Thread Kamil Tekiela
Please do not add yet another thin wrapper of an underlying C API. PHP is
not a drop-in replacement of C. PHP is a much higher-level language.
Developers should not have to understand how the underlying library works
to be able to use PHP. We need to move away from thin C wrappers as a
language. PHP should abstract more, not less, of C code.

The new API doesn't have to have exactly the same names as the C library.
Please follow PHP naming conventions and implement OOP-based API.


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-24 Thread Tim Düsterhus

Hi

On 6/24/22 15:49, Pierrick Charron wrote:

If you look at the first thread (Discussion about new Curl URL API and
ext/curl improvements) you'll see that this was my first approach. I even
proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is
the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !

With previous thread answers, I was under the impression that I really was
the only one liking the approach of following the cURL api as much as
possible and leaving the more high level API to things like Guzzle & co
(for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's
important to have this new Curl URL API in 8.2 since it fixes security
issues and that's of course not ideal but I would rather have an
implementation that would not be my first choice, than not having it at all.

What do you think ? I would love more people to give feedback on what they
are expecting, if they don't care if they prefer one approach or the other
and of course why ? I was thinking about doing a vote on this, but I'm not
sure it's a good idea. What do you all think ?


I agree with Levi. The curl extension should be a thin wrapper over libcurl.

A high level API that is more convenient to use may be provided by PHP, 
but it should not be called curl. It should have a generic name with the 
fact that curl does the heavy lifting only being an implementation detail.


Best regards
Tim Düsterhus

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



RE: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-24 Thread Jeffrey Dafoe
Hello Pierrick,

A thin wrapper would be the most flexible. Someone can always write a 
"friendlier" class using your thin wrapper, as you mentioned, but one cannot 
easily go the other direction.

-Jeff


-Original Message-
From: Pierrick Charron  
Sent: Friday, June 24, 2022 9:49 AM
To: Levi Morrison 
Cc: PHP internals 
Subject: Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

Hi all, and thanks Levi for your feedback,

If you look at the first thread (Discussion about new Curl URL API and ext/curl 
improvements) you'll see that this was my first approach. I even proposed to 
"OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is the 
actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !

With previous thread answers, I was under the impression that I really was the 
only one liking the approach of following the cURL api as much as possible and 
leaving the more high level API to things like Guzzle & co (for example by 
adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's 
important to have this new Curl URL API in 8.2 since it fixes security issues 
and that's of course not ideal but I would rather have an implementation that 
would not be my first choice, than not having it at all.

What do you think ? I would love more people to give feedback on what they are 
expecting, if they don't care if they prefer one approach or the other and of 
course why ? I was thinking about doing a vote on this, but I'm not sure it's a 
good idea. What do you all think ?

Regards,
Pierrick







Le jeu. 23 juin 2022, à 12 h 49, Levi Morrison  a 
écrit :

> On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron 
> wrote:
> >
> > Hi,
> >
> > Following our discussions we had on the subject of the new Curl URL 
> > API, and other curl improvements. I decided to only focus on adding 
> > the new
> Curl
> > URL API and put aside all other improvements. Here is the RFC that
> reflects
> > our current conversations.
> >
> > https://wiki.php.net/rfc/curl-url-api
> >
> > Feel free to give any feedback, concern or support :-)
> >
> > Regards,
> > Pierrick
>
> IMO, this should mirror the low-level curl url API very directly. The 
> basis of my opinion is that we do not own libcurl; we are merely 
> adapting it for use in PHP. We cannot anticipate changes in their 
> design, nor do we have authority to do so if we feel something should 
> change. Touching it as little as possible makes it easier to track 
> upstream changes, etc.
>
> Based on that, I think the naming should be closer to libcurl.:
>   - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
>   - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
>
> And so on, only differing if necessary because something is a reserved 
> word. The API should be as exact as possible to what libcurl provides.
> The "helpers" getHost, getPassword, etc should be removed and should 
> expose `curl_url_get` more directly.
>
> Of course, it should be object based instead of resource based, but 
> that's it.
>
> A nicer API can be built on top of it, but I don't think that's the 
> role this particular API should play.
>

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-24 Thread Pierrick Charron
Hi all, and thanks Levi for your feedback,

If you look at the first thread (Discussion about new Curl URL API and
ext/curl improvements) you'll see that this was my first approach. I even
proposed to "OOPify" the actual CurlHandle & co objects with "simple"
methods like $ch->setOpt(). Anyone who reads the actual API can see this is
the actual "philosophy" of the extension. It was designed (on purpose or
not) as a thin wrapper over curl. But lets focus on URL API only !

With previous thread answers, I was under the impression that I really was
the only one liking the approach of following the cURL api as much as
possible and leaving the more high level API to things like Guzzle & co
(for example by adding an Implementation of PSR-7 UriInterface using this
API) since all the others parts of the API are done this way. I think it's
important to have this new Curl URL API in 8.2 since it fixes security
issues and that's of course not ideal but I would rather have an
implementation that would not be my first choice, than not having it at all.

What do you think ? I would love more people to give feedback on what they
are expecting, if they don't care if they prefer one approach or the other
and of course why ? I was thinking about doing a vote on this, but I'm not
sure it's a good idea. What do you all think ?

Regards,
Pierrick







Le jeu. 23 juin 2022, à 12 h 49, Levi Morrison 
a écrit :

> On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron 
> wrote:
> >
> > Hi,
> >
> > Following our discussions we had on the subject of the new Curl URL API,
> > and other curl improvements. I decided to only focus on adding the new
> Curl
> > URL API and put aside all other improvements. Here is the RFC that
> reflects
> > our current conversations.
> >
> > https://wiki.php.net/rfc/curl-url-api
> >
> > Feel free to give any feedback, concern or support :-)
> >
> > Regards,
> > Pierrick
>
> IMO, this should mirror the low-level curl url API very directly. The
> basis of my opinion is that we do not own libcurl; we are merely
> adapting it for use in PHP. We cannot anticipate changes in their
> design, nor do we have authority to do so if we feel something should
> change. Touching it as little as possible makes it easier to track
> upstream changes, etc.
>
> Based on that, I think the naming should be closer to libcurl.:
>   - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
>   - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
>
> And so on, only differing if necessary because something is a reserved
> word. The API should be as exact as possible to what libcurl provides.
> The "helpers" getHost, getPassword, etc should be removed and should
> expose `curl_url_get` more directly.
>
> Of course, it should be object based instead of resource based, but that's
> it.
>
> A nicer API can be built on top of it, but I don't think that's the
> role this particular API should play.
>


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-23 Thread Rowan Tommins
On 23 June 2022 17:48:57 BST, Levi Morrison via internals 
 wrote:
>IMO, this should mirror the low-level curl url API very directly. The
>basis of my opinion is that we do not own libcurl; we are merely
>adapting it for use in PHP. We cannot anticipate changes in their
>design, nor do we have authority to do so if we feel something should
>change. Touching it as little as possible makes it easier to track
>upstream changes, etc.
>
>Based on that, I think the naming should be closer to libcurl.:
>  - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
>  - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE
>
>And so on, only differing if necessary because something is a reserved
>word. The API should be as exact as possible to what libcurl provides.
>The "helpers" getHost, getPassword, etc should be removed and should
>expose `curl_url_get` more directly.
>
>Of course, it should be object based instead of resource based, but that's it.
>
>A nicer API can be built on top of it, but I don't think that's the
>role this particular API should play.


For the record, I disagree with literally everything you've said here.

PHP indeed does not own libcurl, and nor does libcurl own PHP. We are targeting 
a different audience, and have a different set of facilities available to us. 
We have our own documentation, so there is no reason a user of PHP should know 
or care what the underlying implementation looks like, any more than they 
should know how the memory allocation works.

If the underlying library adds a feature, it will be as easy to add a new 
method as a new constant. If the underlying library changes behaviour, we will 
want to make our own decision on whether that meets our compatibility policy, 
and whether to emulate the older behaviour (or indeed emulate the newer 
behaviour on systems with an older library).

Twenty years ago, maybe PHP programmers were used to it being a veneer over C. 
I don't think that is or should be the expectation today, unless you're using 
FFI.

Regards,

-- 
Rowan Tommins
[IMSoP]

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-23 Thread Levi Morrison via internals
On Tue, Jun 21, 2022 at 10:38 PM Pierrick Charron  wrote:
>
> Hi,
>
> Following our discussions we had on the subject of the new Curl URL API,
> and other curl improvements. I decided to only focus on adding the new Curl
> URL API and put aside all other improvements. Here is the RFC that reflects
> our current conversations.
>
> https://wiki.php.net/rfc/curl-url-api
>
> Feel free to give any feedback, concern or support :-)
>
> Regards,
> Pierrick

IMO, this should mirror the low-level curl url API very directly. The
basis of my opinion is that we do not own libcurl; we are merely
adapting it for use in PHP. We cannot anticipate changes in their
design, nor do we have authority to do so if we feel something should
change. Touching it as little as possible makes it easier to track
upstream changes, etc.

Based on that, I think the naming should be closer to libcurl.:
  - CurlUrl::URL_ENCODE should be CurlUrl::URLENCODE
  - CurlUrl::URL_DECODE should be CurlUrl::URLDECODE

And so on, only differing if necessary because something is a reserved
word. The API should be as exact as possible to what libcurl provides.
The "helpers" getHost, getPassword, etc should be removed and should
expose `curl_url_get` more directly.

Of course, it should be object based instead of resource based, but that's it.

A nicer API can be built on top of it, but I don't think that's the
role this particular API should play.

-- 
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Pierrick Charron
HI Hans,

any particular reason CurlUrl::getPort() defaults to 0 rather than one of
> the valid options? (that being CurlUrl::DEFAULT_PORT
> and CurlUrl::NO_DEFAULT_PORT )
>

This is because the default is none of those 2 behaviours, If the  port
wasn't set it will return null, but if the port is the default port for the
scheme it will still return it.


makes it sound like these would return null: http://localhost:80/
> https://localhost:443/ ftps://localhost:22/
>
> Is that correct? I would imagine it returns null if the port isn't
> specified, rather than null if the port when specified matches the default
> port?
>
>
That's correct, if you use  CurlUrl::NO_DEFAULT_PORT. The behaviour you
were expecting is the default one ($flags = 0)


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Hans Henrik Bergan
(dammit, mixed sftp:// with ftps:// there, ignore that, i meant sftp:// )

On Wed, 22 Jun 2022 at 16:53, Hans Henrik Bergan 
wrote:

> nitpicking but I kind-of doubt the description
> for CurlUrl::NO_DEFAULT_PORT is correct, quote:
> > Instructs the method to return null if the port matches the default port
> for the scheme.
>
> makes it sound like these would return null: http://localhost:80/
> https://localhost:443/ ftps://localhost:22/
>
> Is that correct? I would imagine it returns null if the port isn't
> specified, rather than null if the port when specified matches the default
> port?
>
> On Wed, 22 Jun 2022 at 16:46, Hans Henrik Bergan 
> wrote:
>
>> any particular reason CurlUrl::getPort() defaults to 0 rather than one of
>> the valid options? (that being CurlUrl::DEFAULT_PORT
>> and CurlUrl::NO_DEFAULT_PORT )
>>
>> On Wed, 22 Jun 2022 at 16:23, Pierrick Charron  wrote:
>>
>>> Hi Derick,
>>>
>>>
>>> >
>>> > - The new CurlUrl class should probably be immutable from the start. It
>>> > was my biggest mistake not to do that with DateTime.
>>> >
>>> >
>>> Thanks for sharing your lessons learned. But I still see some use cases
>>> where mutable objects are easier to use. From the experience you had with
>>> DateTime, do you think that having `CurlUrl` being immutable and
>>> providing
>>> a `MutableCurlUrl` would make sense ? I see some cases where you
>>> "navigate"
>>> on a website using the same CurlHandle and just want to manipulate the
>>> MutableCurlUrl handle to change urls.
>>>
>>>
>>> > - What happens if the curl library available on the system does not
>>> have
>>> > the features and functions that this new class relies on? I would
>>> expect
>>> > the class to not be available either, but the RFC does not mention
>>> that.
>>> >
>>>
>>> Good point. As you expected, if the functions are not available in
>>> libcurl,
>>> the class will not be available. Same thing for each constant/feature.
>>> The
>>> extension will "adapt" to the curl version. I will add this to the RFC.
>>>
>>> Pierrick
>>>
>>


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Hans Henrik Bergan
nitpicking but I kind-of doubt the description for CurlUrl::NO_DEFAULT_PORT
is correct, quote:
> Instructs the method to return null if the port matches the default port
for the scheme.

makes it sound like these would return null: http://localhost:80/
https://localhost:443/ ftps://localhost:22/

Is that correct? I would imagine it returns null if the port isn't
specified, rather than null if the port when specified matches the default
port?

On Wed, 22 Jun 2022 at 16:46, Hans Henrik Bergan 
wrote:

> any particular reason CurlUrl::getPort() defaults to 0 rather than one of
> the valid options? (that being CurlUrl::DEFAULT_PORT
> and CurlUrl::NO_DEFAULT_PORT )
>
> On Wed, 22 Jun 2022 at 16:23, Pierrick Charron  wrote:
>
>> Hi Derick,
>>
>>
>> >
>> > - The new CurlUrl class should probably be immutable from the start. It
>> > was my biggest mistake not to do that with DateTime.
>> >
>> >
>> Thanks for sharing your lessons learned. But I still see some use cases
>> where mutable objects are easier to use. From the experience you had with
>> DateTime, do you think that having `CurlUrl` being immutable and providing
>> a `MutableCurlUrl` would make sense ? I see some cases where you
>> "navigate"
>> on a website using the same CurlHandle and just want to manipulate the
>> MutableCurlUrl handle to change urls.
>>
>>
>> > - What happens if the curl library available on the system does not have
>> > the features and functions that this new class relies on? I would expect
>> > the class to not be available either, but the RFC does not mention that.
>> >
>>
>> Good point. As you expected, if the functions are not available in
>> libcurl,
>> the class will not be available. Same thing for each constant/feature. The
>> extension will "adapt" to the curl version. I will add this to the RFC.
>>
>> Pierrick
>>
>


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Hans Henrik Bergan
any particular reason CurlUrl::getPort() defaults to 0 rather than one of
the valid options? (that being CurlUrl::DEFAULT_PORT
and CurlUrl::NO_DEFAULT_PORT )

On Wed, 22 Jun 2022 at 16:23, Pierrick Charron  wrote:

> Hi Derick,
>
>
> >
> > - The new CurlUrl class should probably be immutable from the start. It
> > was my biggest mistake not to do that with DateTime.
> >
> >
> Thanks for sharing your lessons learned. But I still see some use cases
> where mutable objects are easier to use. From the experience you had with
> DateTime, do you think that having `CurlUrl` being immutable and providing
> a `MutableCurlUrl` would make sense ? I see some cases where you "navigate"
> on a website using the same CurlHandle and just want to manipulate the
> MutableCurlUrl handle to change urls.
>
>
> > - What happens if the curl library available on the system does not have
> > the features and functions that this new class relies on? I would expect
> > the class to not be available either, but the RFC does not mention that.
> >
>
> Good point. As you expected, if the functions are not available in libcurl,
> the class will not be available. Same thing for each constant/feature. The
> extension will "adapt" to the curl version. I will add this to the RFC.
>
> Pierrick
>


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Pierrick Charron
Hi Derick,


>
> - The new CurlUrl class should probably be immutable from the start. It
> was my biggest mistake not to do that with DateTime.
>
>
Thanks for sharing your lessons learned. But I still see some use cases
where mutable objects are easier to use. From the experience you had with
DateTime, do you think that having `CurlUrl` being immutable and providing
a `MutableCurlUrl` would make sense ? I see some cases where you "navigate"
on a website using the same CurlHandle and just want to manipulate the
MutableCurlUrl handle to change urls.


> - What happens if the curl library available on the system does not have
> the features and functions that this new class relies on? I would expect
> the class to not be available either, but the RFC does not mention that.
>

Good point. As you expected, if the functions are not available in libcurl,
the class will not be available. Same thing for each constant/feature. The
extension will "adapt" to the curl version. I will add this to the RFC.

Pierrick


Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Derick Rethans
On 22 June 2022 05:38:13 BST, Pierrick Charron  wrote:
>Hi,
>
>Following our discussions we had on the subject of the new Curl URL API,
>and other curl improvements. I decided to only focus on adding the new Curl
>URL API and put aside all other improvements. Here is the RFC that reflects
>our current conversations.
>
>https://wiki.php.net/rfc/curl-url-api
>
>Feel free to give any feedback, concern or support :-)

I've two things:

- The new CurlUrl class should probably be immutable from the start. It was my 
biggest mistake not to do that with DateTime.

- What happens if the curl library available on the system does not have the 
features and functions that this new class relies on? I would expect the class 
to not be available either, but the RFC does not mention that.

cheers
Derick

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



Re: [PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-22 Thread Craig Francis
On 22 Jun 2022, at 05:38, Pierrick Charron  wrote:
> Here is the RFC that reflects our current conversations.
> 
> https://wiki.php.net/rfc/curl-url-api
> 
> Feel free to give any feedback, concern or support :-)


Thanks Pierrick,

I think this is a good approach to add the URL functionality to PHP 8.2 with a 
fairly simple CurlUrl object, just like CurlFile. This would allow developers 
to avoid the parsing vulnerability issues (as in, using two different libraries 
that can parse a URL in different ways)... then the completely new, and fully 
OOP version, would be done at a later date, after a lot more discussion / 
planning.

Craig

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: https://www.php.net/unsub.php



[PHP-DEV] [RFC] [Under Discussion] New Curl URL API

2022-06-21 Thread Pierrick Charron
Hi,

Following our discussions we had on the subject of the new Curl URL API,
and other curl improvements. I decided to only focus on adding the new Curl
URL API and put aside all other improvements. Here is the RFC that reflects
our current conversations.

https://wiki.php.net/rfc/curl-url-api

Feel free to give any feedback, concern or support :-)

Regards,
Pierrick