Re: a URL API ?

2018-09-09 Thread Daniel Stenberg via curl-library

On Thu, 2 Aug 2018, Daniel Stenberg wrote:


 https://github.com/curl/curl/wiki/URL-API


FYI: this has now landed in git. Take it for a spin and let us know how it 
works out for you!


--

 / daniel.haxx.se
---
Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library
Etiquette:   https://curl.haxx.se/mail/etiquette.html

Re: a URL API ?

2018-08-14 Thread Dan Fandrich via curl-library
On Tue, Aug 14, 2018 at 11:17:08AM +0200, Daniel Stenberg wrote:
> Aha... well even if this is so, the effects of this will at least be
> mitigated by the fact that libcurl will still canonicalize them even if it
> wouldn't be perfect.
> 
> I mean a user who wants to compare two URLs should make sure to canonicalize
> *both* of them before the comparison. Then subtle details such as the
> one mentioned above will actually not matter since the end results from both
> those URLs should be the same. Even if another library with more specific
> domain knowledge possibly would end up with a slightly different output.
> 
> Or am I wrong?

You're right in the case of comparing URLs, but if an app is canonicalizing
them for the purpose of displaying them to the user in a nice format, then it
wouldn't be optimum, although it would still work fine.

Re: a URL API ?

2018-08-14 Thread Daniel Stenberg via curl-library

On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote:

I'm not sure I see the difference between these two approaches. Can you 
show them with some example URLs?


For example, + and ! are reserved characters in RFC 3986 but unreserved in 
RFC 2326 (RTSP), so a generic canonicalization might return 
rtsp://example.com/me%2byou%21 whereas an RTSP-specific canonicalization 
would return rtsp://example.com/me+you!  At least, that's my interpretation 
after a quick reading of the RFCs.


Aha... well even if this is so, the effects of this will at least be mitigated 
by the fact that libcurl will still canonicalize them even if it wouldn't be 
perfect.


I mean a user who wants to compare two URLs should make sure to canonicalize 
*both* of them before the comparison. Then subtle details such as the one
mentioned above will actually not matter since the end results from both those 
URLs should be the same. Even if another library with more specific domain 
knowledge possibly would end up with a slightly different output.


Or am I wrong?

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-13 Thread Dan Fandrich via curl-library
On Mon, Aug 13, 2018 at 09:44:53AM +0200, Daniel Stenberg wrote:
> On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote:
> >I think there should be a new option for this kind of encoding so the
> >canonical form stays canonical for every URI scheme, but programs that
> >would prefer merely a fairly consistent human-readable form using an
> >encoding set optimized for the scheme in use could use the other
> >CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option instead.
> 
> I'm not sure I see the difference between these two approaches. Can you show
> them with some example URLs?

For example, + and ! are reserved characters in RFC 3986 but unreserved in RFC
2326 (RTSP), so a generic canonicalization might return
rtsp://example.com/me%2byou%21 whereas an RTSP-specific canonicalization would
return rtsp://example.com/me+you!  At least, that's my interpretation after a
quick reading of the RFCs.
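
The difference between the two outputs can be sketched with a small encoder that takes the scheme's extra "safe" characters as a parameter. This is only an illustration: encode_path() and its extra_safe argument are made-up names, not libcurl functions, and whether RTSP really treats '+' and '!' as unreserved is the interpretation given above rather than something this sketch verifies.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of libcurl: percent-encode `src` into `dst`.
 * RFC 3986 unreserved characters and '/' (kept as the path delimiter) are
 * copied as-is, and so is any character listed in `extra_safe`. Everything
 * else becomes %xx with lowercase hex digits, matching the example above.
 * Returns 0 on success, -1 if `dst` is too small. */
static int encode_path(const char *src, const char *extra_safe,
                       char *dst, size_t dstlen)
{
  static const char hex[] = "0123456789abcdef";
  size_t o = 0;

  assert(src && extra_safe && dst);
  for(; *src; src++) {
    unsigned char c = (unsigned char)*src;
    int keep = isalnum(c) || strchr("-._~/", c) != NULL ||
               strchr(extra_safe, c) != NULL;
    if(keep) {
      if(o + 1 >= dstlen)
        return -1;
      dst[o++] = (char)c;
    }
    else {
      if(o + 3 >= dstlen)
        return -1;
      dst[o++] = '%';
      dst[o++] = hex[c >> 4];
      dst[o++] = hex[c & 15];
    }
  }
  if(o >= dstlen)
    return -1;
  dst[o] = '\0';
  return 0;
}
```

Called with an empty extra_safe string on "/me+you!" this produces the generic form "/me%2byou%21"; called with "+!" as extra_safe it leaves the path as "/me+you!".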

Re: a URL API ?

2018-08-13 Thread Daniel Stenberg via curl-library

On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote:

I think you're right, it should work. Documenting 
(CURLU_URLDECODE|CURLU_URLENCODE) as performing canonicalization is probably 
all you'd need, besides ensuring decode and encode happen in the correct 
order.


We could perhaps even make it a separate flag to make it more obvious to the 
user: CURLU_CANONICAL. Only recognized when getting the URL.


Actually, does CURLU_URLDECODE do anything on the curl_url_get call? It 
sounds like something that should only do something on the curl_url_set 
call.


The code keeps the strings URL encoded in the struct, pretty much as they were 
in the original URL so if you want the "raw" version of them you ask for URL 
decoding on *get().


On *set() you're expected to pass in the URL encoded version or ask libcurl to 
encode it for you.


This means that the preferred form of a URI differs depending on the scheme. 
Do we want to build in knowledge of the preferred encoding sets for all the 
different URI schemes out there today, or even just the ones curl supports?


The URI syntax is or can be subtly different depending on scheme already even 
without canonicalization (like the options part of the authority section). My 
approach so far is to only recognize libcurl-supported schemes by default, 
allowing that to be overridden with a flag. For unsupported schemes, it will 
of course just become a "best effort" and a generic handling.


I *suspect* libcurl users will most likely often only care for schemes that 
libcurl supports.


I think there should be a new option for this kind of encoding so the 
canonical form stays canonical for every URI scheme, but programs that would 
prefer merely a fairly consistent human-readable form using an encoding set 
optimized for the scheme in use could use the other 
CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option instead.


I'm not sure I see the difference between these two approaches. Can you show 
them with some example URLs?


--

 / daniel.haxx.se

Re: a URL API ?

2018-08-12 Thread Dan Fandrich via curl-library
On Sun, Aug 12, 2018 at 06:45:27PM +0200, Daniel Stenberg wrote:
> The current code for the API doesn't offer URL decoding at all when you ask
> for the full URL - since a returned URL is still supposed to be a URL so it
> can't really be "decoded" then. We can of course document that bit to mean
> "canonicalization" when used in combination with getting the URL.
> 
> Canonicalization can probably be done by always URL decoding *and* URL
> encoding each individual part before they're put together to the end
> result... Wouldn't that work?

I think you're right, it should work. Documenting
(CURLU_URLDECODE|CURLU_URLENCODE) as performing canonicalization is probably
all you'd need, besides ensuring decode and encode happen in the correct order.
Actually, does CURLU_URLDECODE do anything on the curl_url_get call? It sounds
like something that should only do something on the curl_url_set call.

I'm a bit concerned by this paragraph of RFC 3986, though, with respect to
canonicalization in the curl API:

   URI producing applications should percent-encode data octets that
   correspond to characters in the reserved set unless these characters
   are specifically allowed by the URI scheme to represent data in that
   component.  If a reserved character is found in a URI component and
   no delimiting role is known for that character, then it must be
   interpreted as representing the data octet corresponding to that
   character's encoding in US-ASCII.

This means that the preferred form of a URI differs depending on the scheme. Do
we want to build in knowledge of the preferred encoding sets for all the
different URI schemes out there today, or even just the ones curl supports?
This implies that the canonical form could change if curl adds support for a
new scheme in the future.  If so, then I think there should be a new option for
this kind of encoding so the canonical form stays canonical for every URI
scheme, but programs that would prefer merely a fairly consistent
human-readable form using an encoding set optimized for the scheme in use could
use the other CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option
instead.

Re: a URL API ?

2018-08-12 Thread Daniel Stenberg via curl-library

On Sun, 12 Aug 2018, Ray Satiro via curl-library wrote:

I think you overdid it (I know, I'm a little behind on this discussion [1]), 
a struct would be simpler.


curl_url_parse(url, &parts);

curl_url_build(parts, &url);

Is there really a need for a more expansive API?


I don't think exposing such a struct in the API and expecting users to handle 
it correctly is a good idea, as I think it's a bit too error-prone.


That API wouldn't support URL encoding/decoding of the parts - which I 
believe users will want.


It would also miss some of the features libcurl itself uses (like path-as-is,
disallow user) - which would make it harder for us to switch to this API
internally.


--

 / daniel.haxx.se

Re: a URL API ?

2018-08-12 Thread Daniel Stenberg via curl-library

On Sun, 12 Aug 2018, Dan Fandrich via curl-library wrote:

I'm curious whether the API can be used to canonicalize a URL, i.e., URL 
decode parts that can be decoded without semantic difference but 
canonicalize those parts that the specs say must be encoded.


I think it should be possible.

The idea would be that the output of canonicalization would be the same for 
every version of a URL passed in. I'm guessing that curl_url_get(h,
CURLUPART_URL, , CURLU_URLDECODE); would get you half way there, except 
that it might not encode parts that have to be encoded.  Is there another 
way I'm missing?


The current code for the API doesn't offer URL decoding at all when you ask 
for the full URL - since a returned URL is still supposed to be a URL so it 
can't really be "decoded" then. We can of course document that bit to mean 
"canonicalization" when used in combination with getting the URL.


Canonicalization can probably be done by always URL decoding *and* URL 
encoding each individual part before they're put together to the end result... 
Wouldn't that work?
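
A minimal sketch of that decode-then-encode idea for one component, assuming the generic RFC 3986 unreserved set and uppercase hex digits. canonicalize_part() and hexval() are hypothetical names and this is not libcurl's code. Since it also encodes '/', it is meant for a single path segment or query value, not a whole path.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of a hex digit, or -1 if `c` is not one. */
static int hexval(unsigned char c)
{
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical sketch: normalize one URL component by decoding every valid
 * %XX triplet and then re-encoding each octet that is not an RFC 3986
 * unreserved character, always as uppercase %XX.
 * Returns 0 on success, -1 if `dst` is too small. */
static int canonicalize_part(const char *src, char *dst, size_t dstlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;

  assert(src && dst);
  while(*src) {
    unsigned char c = (unsigned char)*src++;
    /* decode step: turn a valid %XX triplet into the octet it stands for */
    if(c == '%' && hexval((unsigned char)src[0]) >= 0 &&
       hexval((unsigned char)src[1]) >= 0) {
      c = (unsigned char)(hexval((unsigned char)src[0]) * 16 +
                          hexval((unsigned char)src[1]));
      src += 2;
    }
    /* encode step: emit the octet in its one canonical spelling */
    if(isalnum(c) || c == '-' || c == '.' || c == '_' || c == '~') {
      if(o + 1 >= dstlen)
        return -1;
      dst[o++] = (char)c;
    }
    else {
      if(o + 3 >= dstlen)
        return -1;
      dst[o++] = '%';
      dst[o++] = hex[c >> 4];
      dst[o++] = hex[c & 15];
    }
  }
  if(o >= dstlen)
    return -1;
  dst[o] = '\0';
  return 0;
}
```

With this, "%7Euser" and "~user" both come out as "~user", and "a%2fb c" becomes "a%2Fb%20c", so two spellings of the same component compare equal after the round trip.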


--

 / daniel.haxx.se

Re: a URL API ?

2018-08-12 Thread Ray Satiro via curl-library
On 8/1/2018 6:17 PM, Daniel Stenberg wrote:
> In the 2018 user survey, more than 40% of the 395 users who answered
> the question said they'd use a "URL handling" API in libcurl if one
> existed.
>
> I gave it some thoughts the other day and I've now jotted down my
> initial suggestion on how it could be made to work. An API that can
> parse a URL, extract the individual pieces, allow the user to set
> individual parts and finally to get the full URL out from there again.
>
> Here's my thoughts:
>
>   https://github.com/curl/curl/wiki/URL-API
>
> Good or bad? What would your application need and would this work for
> that? If not, how should we change it to make it better?
>
> (There's no promise that this will ever actually get implemented, but
> if we can come up with a proposal we believe in, I don't think there
> needs to be anything stopping it from happening...) 

I think you overdid it (I know, I'm a little behind on this discussion
[1]), a struct would be simpler.

curl_url_parse(url, &parts);

curl_url_build(parts, &url);

Is there really a need for a more expansive API?


[1]: https://www.youtube.com/watch?v=gqQ99s4Ywnw




Re: a URL API ?

2018-08-11 Thread Dan Fandrich via curl-library
On Thu, Aug 09, 2018 at 10:48:32AM +0200, Daniel Stenberg via curl-library wrote:
> Returning to this, as I've polished the API a bit over the last few days.
> The wiki page has been updated to reflect the changes I've done.

I'm curious whether the API can be used to canonicalize a URL, i.e., URL decode
parts that can be decoded without semantic difference but canonicalize those
parts that the specs say must be encoded.  The idea would be that the output of
canonicalization would be the same for every version of a URL passed in. I'm
guessing that curl_url_get(h, CURLUPART_URL, , CURLU_URLDECODE); would get
you half way there, except that it might not encode parts that have to be
encoded.  Is there another way I'm missing?

Re: a URL API ?

2018-08-09 Thread Daniel Jeliński via curl-library
2018-08-09 14:15 GMT+02:00 Daniel Stenberg via curl-library:
> ... or should it perhaps just skip the *first* '=' ?

I don't think any URL parsing library cares about = beyond the first
one. Which is why = in name may pose a problem, but in value probably
won't. I'd skip all.
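
The point about parsers only caring about the first '=' can be sketched as follows. split_pair() is a made-up helper for illustration, not a function from libcurl or any particular parsing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split a "name=value" query item at the first '='.
 * `*name_len` receives the length of the name and `*value` points at the
 * text after the first '=' (NULL when the item contains no '=' at all). */
static void split_pair(const char *item, size_t *name_len, const char **value)
{
  const char *eq;

  assert(item && name_len && value);
  eq = strchr(item, '=');
  *name_len = eq ? (size_t)(eq - item) : strlen(item);
  *value = eq ? eq + 1 : NULL;
}
```

For "a=b=c" this yields the name "a" and the value "b=c": a second '=' ends up inside the value, whereas a literal '=' in the name would cut the name short.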

Re: a URL API ?

2018-08-09 Thread Daniel Stenberg via curl-library

On Thu, 9 Aug 2018, Daniel Stenberg via curl-library wrote:

(replying to myself...)


 /* append to query, ask for encoding */
 curl_url_set(h, CURLUPART_QUERY, "company=AT&T", CURLU_APPENDQUERY|
  CURLU_URLENCODE);


- CURLU_URLENCODE with CURLU_APPENDQUERY set, will skip the '=' letter when
 doing the encoding


... or should it perhaps just skip the *first* '=' ?

If we consider a user who wants to add a "name=contents" pair, I figure the
first assignment is the one that shouldn't be encoded but subsequent ones then
presumably should?


--

 / daniel.haxx.se

Re: a URL API ?

2018-08-09 Thread Daniel Stenberg via curl-library

On Thu, 9 Aug 2018, Daniel Jeliński via curl-library wrote:


 char *append = "=44";


Well assuming we want to use the API to build URL based on HTML form with 
GET action, curl_url_query_append suggested by Geoff would be much nicer.


Yes, you're right. I've taken a more generic approach that isn't at all aware 
of HTML forms.



In particular, I would expect the API to:
- figure out if it needs to add & or ?
- figure out if it needs to URLEncode the parameter or value (eg. when
setting "company"="AT&T", we need to escape the ampersand)
- do the appending / memory allocation part on its own
What do you think?


I hear you! How about...

A dedicated feature bit to append the string to the query?

  /* append to query, ask for encoding */
  curl_url_set(h, CURLUPART_QUERY, "company=AT&T", CURLU_APPENDQUERY|
   CURLU_URLENCODE);

  /* append to query, already encoded */
  curl_url_set(h, CURLUPART_QUERY, "company=AT%26T", CURLU_APPENDQUERY);


- CURLU_APPENDQUERY makes it also add a '&' before the string if there's
  already contents in the query.
- CURLU_URLENCODE with CURLU_APPENDQUERY set, will skip the '=' letter when
  doing the encoding
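
The two rules in that list can be sketched in plain C. This is not libcurl's implementation: query_append() is a hypothetical helper, and it takes the "leave only the first '=' alone" variant that the follow-up in this thread asks about.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of the append behaviour described above. Returns a
 * newly malloc'ed query made of `query` (may be NULL or empty) plus `item`,
 * joined by '&' when `query` already has content. When `encode` is non-zero,
 * `item` is percent-encoded except for unreserved characters and its first
 * '='. The caller frees the result. */
static char *query_append(const char *query, const char *item, int encode)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t qlen = query ? strlen(query) : 0;
  char *out, *p;
  int seen_eq = 0;

  assert(item);
  /* worst case: every byte of item becomes %XX, plus '&' and the NUL */
  out = malloc(qlen + 3 * strlen(item) + 2);
  if(!out)
    return NULL;
  p = out;
  if(qlen) {
    memcpy(p, query, qlen);
    p += qlen;
    *p++ = '&';
  }
  for(; *item; item++) {
    unsigned char c = (unsigned char)*item;
    int first_eq = (c == '=' && !seen_eq);
    if(first_eq)
      seen_eq = 1;
    if(!encode || first_eq || isalnum(c) ||
       c == '-' || c == '.' || c == '_' || c == '~')
      *p++ = (char)c;
    else {
      *p++ = '%';
      *p++ = hex[c >> 4];
      *p++ = hex[c & 15];
    }
  }
  *p = '\0';
  return out;
}
```

Appending "company=AT&T" with encoding to the query "askforthis" gives "askforthis&company=AT%26T", while appending the already encoded "company=AT%26T" without encoding to an empty query returns it unchanged.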

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-09 Thread Daniel Jeliński via curl-library
2018-08-09 10:48 GMT+02:00 Daniel Stenberg via curl-library:
> Say we want to append this to the query:
>
>  char *append = "=44";

Well assuming we want to use the API to build URL based on HTML form
with GET action, curl_url_query_append suggested by Geoff would be
much nicer. In particular, I would expect the API to:
- figure out if it needs to add & or ?
- figure out if it needs to URLEncode the parameter or value (eg. when
setting "company"="AT&T", we need to escape the ampersand)
- do the appending / memory allocation part on its own
What do you think?

Re: a URL API ?

2018-08-09 Thread Daniel Stenberg via curl-library

On Thu, 2 Aug 2018, Geoff Beier wrote:

The setters would be important to us. I might be bikeshedding here, but the 
ability to add to the query would be very nice. So something like 
curl_url_query_append(urlp, "numitems", 3)


Returning to this, as I've polished the API a bit over the last few days. The 
wiki page has been updated to reflect the changes I've done.


As the curl URL API works now, this is how you append a string to the query 
of a URL.


First, create a handle and pass it a full URL:

 CURLU *h = curl_url();
 curl_url_set(h, CURLUPART_URL, "https://example.com/foo?askforthis", 0);

Say we want to append this to the query:

 char *append = "=44";

We extract the query part

 char *q;
 curl_url_get(h, CURLUPART_QUERY, &q, 0);

Make space for the new enlarged query doing regular memory management and 
create the updated query there. The 'q' pointer points to memory managed by
libcurl so it can't be realloc'ed.


 char *newptr = malloc(strlen(q) + strlen(append) + 1);
 strcpy(newptr, q);
 strcat(newptr, append);

Then replace the former query part in the URL by setting this new one:

 curl_url_set(h, CURLUPART_QUERY, newptr, 0);

Free the data

 curl_free(q);
 free(newptr);

... and now we can extract the full URL again and it will have the updated 
query part:


 char *url;
 curl_url_get(h, CURLUPART_URL, &url, 0);

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-06 Thread Daniel Stenberg

On Mon, 6 Aug 2018, Daniel Stenberg wrote:


- implement the missing pieces of the API:


I've pushed a first PR of the URL API work to make sure all tests and builds 
are happy:


  https://github.com/curl/curl/pull/2842

I've listed outstanding work in there.

The API works pretty well already, so if someone is interested in being a
test pilot, building and testing it could be fun, give me some valuable
input, and be an opportunity for you to affect its functionality.


--

 / daniel.haxx.se

Re: a URL API ?

2018-08-06 Thread Daniel Stenberg

On Thu, 2 Aug 2018, Daniel Stenberg wrote:


Good or bad? What would your application need and would this work for that?


I've written some initial code for this API [1] now. As I proceed further, I 
intend to remove the wiki page as I suspect there will be a lot of details 
that have changed and I'll instead get the details right and documented in the 
coming man pages for the new stuff.


curl_url() mostly works and there's a bunch of tests for the fundamentals.

curl_url_get() works to extract all the parsed parts except CURLUPART_URL.

curl_url_cleanup() works.

The initial work exists in a separate branch [2] - and as long as I'm the only 
developer in this branch I intend to rebase and squash it regularly. Just 
beware.


The current api header can also be browsed online at [3].

Next up, I intend to:

- implement the missing pieces of the API:
  curl_url_get's CURLUPART_URL option
  curl_url_set()
  curl_url_dup()

- add more test cases to make sure the functions work, also to make it easy to
  add more tests later
- write documentation for the new functions and options
- make use of the new parser for the existing libcurl functionality to reduce
  code duplication and make sure the URL API and the libcurl transfer engine
  treat URLs identically
- add a curl_easy_setopt() option to take a CURLURL pointer instead of a URL
  string
- consider command line tool access to the URL API for parsing URLs

There's not yet any schedule for when this can land. I intend to mark this API
as "experimental" for the first few releases it appears in, to signal to users
and us all what to expect from it. Maybe this is 7.63.0 material?


[1] = https://github.com/curl/curl/wiki/URL-API

[2] = https://github.com/curl/curl/tree/URL-API

[3] = https://github.com/curl/curl/blob/URL-API/include/curl/urlapi.h

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-03 Thread Daniel Stenberg

On Fri, 3 Aug 2018, Samuel Hurst wrote:

For example, being able to build new 
URLs from relative ones. I can't quite tell from the examples provided 
whether curl_url would do relative transformation if the urlhandle is already 
valid. I can see a use case where I'd want to do the following:


CURLURL *url_handle = NULL;
curl_url("https://example.org/hello", url_handle, 0);
...
curl_url("/image.png", url_handle, 0);


Ah, yes. I like this suggestion - even if "/image.png" could've been handled 
with just setting a new path. I suppose "../image.png" or something would be a 
better example.


As for the specific API to deal with a relative URL, I think I prefer to have 
it not overload curl_url() for that. As I prefer the "alternative B" API, I 
think we can just add a dedicated CURLUPart for a relative URL. Then it would 
be used like this:


  curl_url("https://example.org/path/to/hello", &url_handle, 0);

  curl_url_set(url_handle, CURLUPART_RELURL, "../image.png", 0);
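
What such a relative transformation has to do to the path can be sketched with the merge and dot-segment removal steps of RFC 3986 section 5.2. resolve_path() is a hypothetical helper, not the proposed libcurl API. It assumes an absolute base path and a non-empty relative path, and it ignores scheme, host, query and fragment.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: resolve the relative path `rel` against the absolute
 * path `base`. Returns 0 on success, -1 if a buffer is too small. */
static int resolve_path(const char *base, const char *rel,
                        char *out, size_t outlen)
{
  char buf[512]; /* scratch space for the merged path */
  const char *p = buf;
  size_t n = 0, o = 0;
  int dir = 0;

  assert(base && rel && out && base[0] == '/');

  /* merge: keep the base up to and including its last '/', unless `rel` is
     itself absolute */
  if(rel[0] != '/')
    n = (size_t)(strrchr(base, '/') - base) + 1;
  if(n + strlen(rel) >= sizeof(buf))
    return -1;
  memcpy(buf, base, n);
  strcpy(buf + n, rel);

  /* walk the merged path one segment at a time */
  while(*p) {
    size_t len;
    p++; /* skip the '/' that starts this segment */
    len = strcspn(p, "/");
    dir = 0;
    if(len == 2 && p[0] == '.' && p[1] == '.') {
      /* ".." drops the previous output segment, if there is one */
      while(o > 0 && out[--o] != '/')
        ;
      dir = 1;
    }
    else if(len == 1 && p[0] == '.')
      dir = 1; /* "." adds nothing */
    else {
      if(o + len + 2 > outlen)
        return -1;
      out[o++] = '/';
      memcpy(out + o, p, len);
      o += len;
    }
    p += len;
  }
  /* a path that ended in "." or ".." names a directory: keep a trailing '/' */
  if(dir) {
    if(o + 2 > outlen)
      return -1;
    out[o++] = '/';
  }
  out[o] = '\0';
  return 0;
}
```

Resolving "../image.png" against "/path/to/hello" gives "/path/image.png", and resolving "/image.png" against "/hello" simply replaces the path.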

Some form of handle copy might be useful here too, if you're having to do a 
lot of relative transformations from a single base URL.


Yes, something like this:

  CURLURL *handle = curl_url_dup(inhandle);

In addition, our own URL classes support returning the path as an array of 
strings corresponding to each individual path segment. We've certainly
found use for this in the past, and others may also find this useful.


Mm, maybe. Is there anything else to that than splitting the path on every 
forward slash?
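
For reference, plain splitting on '/' could look like the sketch below; next_segment() is a made-up name. One detail beyond the split itself is ordering: the path should be split while still URL encoded, since decoding first would turn an encoded slash (%2F) inside a segment into an extra separator.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: visit each segment of a URL path without copying.
 * `*pos` must point into the path; the function skips one leading '/',
 * stores the segment length in `*len`, advances `*pos` past the segment and
 * returns a pointer to the segment start, or NULL when the path is
 * exhausted. A trailing '/' produces a final empty segment. */
static const char *next_segment(const char **pos, size_t *len)
{
  const char *seg;

  assert(pos && *pos && len);
  if(**pos == '\0')
    return NULL;
  if(**pos == '/')
    (*pos)++;
  seg = *pos;
  *len = strcspn(seg, "/");
  *pos = seg + *len;
  return seg;
}
```

Calling it repeatedly on "/path/to/hello" yields "path", "to" and "hello" (as pointer plus length) and then NULL.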


--

 / daniel.haxx.se

Re: a URL API ?

2018-08-03 Thread Samuel Hurst

On 01/08/18 23:17, Daniel Stenberg wrote:

Hi,

In the 2018 user survey, more than 40% of the 395 users who answered the 
question said they'd use a "URL handling" API in libcurl if one existed.


I gave it some thoughts the other day and I've now jotted down my 
initial suggestion on how it could be made to work. An API that can 
parse a URL, extract the individual pieces, allow the user to set 
individual parts and finally to get the full URL out from there again.


Here's my thoughts:

   https://github.com/curl/curl/wiki/URL-API

Good or bad? What would your application need and would this work for 
that? If not, how should we change it to make it better?


I think the current proposal is fairly sound as a first step. I think 
having something built-in to cURL makes a lot of sense, given the 
project's focus on "doing stuff with URLs", and not having to add 
another library dependency is a definite plus point.


However, I can think of a couple of worthwhile improvements:

Currently, we have our own URL parsing classes, and there's a couple of 
important features that we find useful. For example, being able to build 
new URLs from relative ones. I can't quite tell from the examples 
provided whether curl_url would do relative transformation if the 
urlhandle is already valid. I can see a use case where I'd want to do 
the following:


CURLURL *url_handle = NULL;
curl_url("https://example.org/hello", url_handle, 0);
...
curl_url("/image.png", url_handle, 0);

I'd then expect a call to curl_url_get() on url_handle to return 
something like "https://example.org/image.png"


Some form of handle copy might be useful here too, if you're having to 
do a lot of relative transformations from a single base URL.


In addition, our own URL classes support returning the path as an array 
of strings corresponding to each individual path segment. We've
certainly found use for this in the past, and others may also find this 
useful.


That's just my two pennies worth anyway.

-Sam

Re: a URL API ?

2018-08-02 Thread Daniel Stenberg

On Thu, 2 Aug 2018, Geoff Beier wrote:

The setters would be important to us. I might be bikeshedding here, but the 
ability to add to the query would be very nice. So something like 
curl_url_query_append(urlp, "numitems", 3)


That ("numitems", 3) approach is very specific for adding a "[name]=[number]" 
snippet though. Shouldn't it rather be a function for appending a generic
string?


  curl_url_query_append(urp, "numitems=3");

... or using the alternative B API 
(https://github.com/curl/curl/wiki/URL-API#url-api-alternative-b) approach:


  curl_url_append(urp, CURLUPART_QUERY, "numitems=3");


An unsigned type feels more appropriate for port than int does.


Yes, or as a full blown string to make it a generic part in alternative B.

I assume this is obvious, but if this API gets added to curl, a setopt that 
takes a CURLURL would be wanted :)


Yes!

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-02 Thread Geoff Beier
We use the uriparser library Radu Hociung mentioned primarily because there
was nothing analogous in curl.

The setters would be important to us. I might be bikeshedding here, but the
ability to add to the query would be very nice. So something like
curl_url_query_append(urlp, "numitems", 3)

An unsigned type feels more appropriate for port than int does.

While I can't say that we'd move our working code from uriparser to this
API, I'm certain this API would have prevented us from reaching for
uriparser in the first place, particularly because all of our URL usage is
for consumption by libcurl anyway.

I assume this is obvious, but if this API gets added to curl, a setopt that
takes a CURLURL would be wanted :)



On Thu, Aug 2, 2018 at 8:41 AM, Daniel Stenberg  wrote:

> On Wed, 1 Aug 2018, Radu Hociung wrote:
>
> a "URL handling" API in libcurl
>>>
>>
>> This wheel has already been invented. [1]
>>
>> [1] https://uriparser.github.io/
>>
>
> Are you suggesting this uses an API that would be worth getting
> inspiration from? Personally I don't see a lot I like there...
>
> (Dan already presented some reasons why having our own makes sense.)
>
> --
>
>  / daniel.haxx.se
>

Re: a URL API ?

2018-08-02 Thread Daniel Stenberg

On Wed, 1 Aug 2018, Radu Hociung wrote:


a "URL handling" API in libcurl


This wheel has already been invented. [1]

[1] https://uriparser.github.io/


Are you suggesting this uses an API that would be worth getting inspiration 
from? Personally I don't see a lot I like there...


(Dan already presented some reasons why having our own makes sense.)

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-02 Thread Daniel Stenberg

On Thu, 2 Aug 2018, Dan Fandrich wrote:


My first thought is that it's pretty long-winded. I can't think of many
situations where you might want only one part of a URL, like the port, but
don't want e.g. the host, or the query without the path. What was in the back
of my mind for such an API was something simple like:

 CURLUCode curl_url_parse(
 const char *url,
 char **scheme,
 char **host,
 char **port,
 char **user,
 char **password,
 char **path,
 char **query,
 char **fragment);


I find that API a bit awkward due to the large amount of arguments. That style 
could also be supported with:


  CURLUCode curl_url_parse(const char *url, struct curlurl *store);

... and have 'struct curlurl' be a public struct that gets filled in with
those eight char pointers. But I'm not too fond of doing public structs like 
that as they're so hard to change in the future.


I also contemplated having the API work like this:

 curl_url_get(CURLURL *handle, CURLPIECE what, char **piece, int flags);
 curl_url_set(CURLURL *handle, CURLPIECE what, char *piece, int flags);

... and have the second argument specify what part to extract or set instead 
of doing one function for each. In fact, I think I now prefer this way since 
it reduces the amount of function calls needed and makes the get/set calls 
very uniform.
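
A toy model of that design, to show why one get/set pair keyed by a "what" argument stays uniform: adding a part means adding an enum value, not a new function or a change to a public struct. Every name below (enum part, struct handle, handle_set, handle_get) is invented for the illustration and is not the proposed libcurl API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: these names are made up and are not libcurl's. */
enum part { PART_SCHEME, PART_HOST, PART_PORT, PART_PATH, PART_QUERY,
            PART_FRAGMENT, PART_LAST };

struct handle {
  char *parts[PART_LAST]; /* NULL means "not set" */
};

/* Duplicate a string with malloc. */
static char *dupstr(const char *s)
{
  size_t n = strlen(s) + 1;
  char *d = malloc(n);
  if(d)
    memcpy(d, s, n);
  return d;
}

/* Store a copy of `value` in the slot selected by `what`; NULL clears it. */
static int handle_set(struct handle *h, enum part what, const char *value)
{
  char *copy = NULL;

  assert(h && what < PART_LAST);
  if(value) {
    copy = dupstr(value);
    if(!copy)
      return -1;
  }
  free(h->parts[what]);
  h->parts[what] = copy;
  return 0;
}

/* Give the caller its own copy of the selected part, to release with free().
 * Returns -1 when the part is not set or memory runs out. */
static int handle_get(const struct handle *h, enum part what, char **out)
{
  assert(h && out && what < PART_LAST);
  if(!h->parts[what])
    return -1;
  *out = dupstr(h->parts[what]);
  return *out ? 0 : -1;
}
```

Because callers only ever see the handle and the enum, the storage behind it can change later without breaking them, which is the flexibility argument made for handle-based APIs.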


Your proposal makes me realize that a bitmap options argument would also be 
good to allow some configuration of that process.


It's either that or taking a firm stance of what the "correct" response should 
be in those cases, but I think it might be hard to have one single correct 
behavior everywhere...


The handle-based approach makes it easier to manage URLs; if all parts of 
the URL aren't needed at the same time, the application doesn't need to keep 
8 variables lying around to hold them all.


I've grown to like handle-based APIs since they keep things flexible for the 
future...



A couple of things missing from your proposal are handling of the fragment
part of the URL


Oops!

and support for unescaping and possibly un-plussing query parts (changing 
plusses to spaces), i.e., turning a URL into something useful as-is in an 
application.


Yeah, I sort of left that out as a start as I wasn't really sure if that 
should be part of this API, but I think you're right that users will probably 
appreciate exactly that ability.
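
What "unescaping and un-plussing" a query part could amount to is sketched below. query_decode() and hexval() are hypothetical helpers, not libcurl functions, and whether plus handling should be on by default is left open here, so it is an explicit argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of a hex digit, or -1 if `c` is not one. */
static int hexval(unsigned char c)
{
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decode a query value in place. When `plus` is
 * non-zero each '+' becomes a space (the HTML form convention); each valid
 * %XX triplet becomes the octet it encodes. Returns the decoded length,
 * which matters if a %00 produced an embedded NUL; the buffer is always
 * NUL-terminated. */
static size_t query_decode(char *s, int plus)
{
  char *r = s, *w = s;

  assert(s);
  while(*r) {
    if(plus && *r == '+') {
      *w++ = ' ';
      r++;
    }
    else if(*r == '%' && hexval((unsigned char)r[1]) >= 0 &&
            hexval((unsigned char)r[2]) >= 0) {
      *w++ = (char)(hexval((unsigned char)r[1]) * 16 +
                    hexval((unsigned char)r[2]));
      r += 3;
    }
    else
      *w++ = *r++;
  }
  *w = '\0';
  return (size_t)(w - s);
}
```

Decoding "AT%26T+rocks%21" with plus handling enabled gives "AT&T rocks!"; with it disabled, a literal '+' is left untouched.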


I'd also move the "set" part of the name before what it's setting, so it's 
more like curl_easy_setopt.


Fair enough!

--

 / daniel.haxx.se

Re: a URL API ?

2018-08-02 Thread Dan Fandrich
On Wed, Aug 01, 2018 at 11:54:56PM -0500, Radu Hociung wrote:
> On 01/08/2018 5:17 PM, Daniel Stenberg wrote:
> > a "URL handling" API in libcurl
> 
> This wheel has already been invented. [1]

The main point of having a libcurl API has to do with reducing security issues
due to differences in URL parsing between libcurl and applications. Pointing
people at another parser won't solve that problem, unless libcurl were
rewritten to use that parser itself. That may actually be a solution, but we'd
have to be careful that using an external parser doesn't introduce regressions.
What worries me is the statement that this particular library is "strictly RFC
3986 compliant" which means that the various quirks added to libcurl's parser
to conform to WhatWG and browser implementation will disappear and introduce
such regressions.  An external library would also hamper libcurl's ability to
fix problems or introduce new URL schemes (standardized or not) in the future.

Re: a URL API ?

2018-08-02 Thread Radu Hociung


On 01/08/2018 5:17 PM, Daniel Stenberg wrote:
> a "URL handling" API in libcurl

This wheel has already been invented. [1]

Cheers.
Radu.

[1] https://uriparser.github.io/

Re: a URL API ?

2018-08-02 Thread James Fuller
yes, I think passing around a struct with this information (and being able
to get the uri from its parts) would be more useful than just direct url
construction ... that way one could represent a common
base uri and modify it by just setting new path (in that respect we
may want ability to manage segments, query or fragments of a uri). +1
for unescape/escape.

Jim

On 2 August 2018 at 07:36, Dan Fandrich  wrote:
> On Thu, Aug 02, 2018 at 12:17:26AM +0200, Daniel Stenberg wrote:
>> Here's my thoughts:
>>
>>   https://github.com/curl/curl/wiki/URL-API
>>
>> Good or bad? What would your application need and would this work for that?
>> If not, how should we change it to make it better?
>
> My first thought is that it's pretty long-winded. I can't think of many
> situations where you might want only one part of a URL, like the port, but
> don't want e.g. the host, or the query without the path. What was in the back
> of my mind for such an API was something simple like:
>
>   CURLUCode curl_url_parse(
>       const char *url,
>       char **scheme,
>       char **host,
>       char **port,
>       char **user,
>       char **password,
>       char **path,
>       char **query,
>       char **fragment);
>   CURLUCode curl_url_build(
>       char **url,
>       const char *scheme,
>       const char *host,
>       const char *port,
>       const char *user,
>       const char *password,
>       const char *path,
>       const char *query,
>       const char *fragment);
>
> which simply splits a URL into its constituent parts and puts it back
> together again. Your proposal makes me realize that a bitmap options
> argument would also be good to allow some configuration of that process.
>
> But maybe this is too simplistic. Using a handle-based approach as you do lets
> you cleanly add additional functions to, e.g., iterate over individual
> parameters in the query part, or over parts of the path (although that could
> be done with my simple proposal with another function that operates on the
> query part directly). Additional functions for handling specific URL schemes
> more easily might also be welcome and could be added later, like one to
> directly return an IMAP mailbox name from an IMAP URL. The handle-based
> approach makes it easier to manage URLs; if all parts of the URL aren't
> needed at the same time, the application doesn't need to keep 8 variables
> lying around to hold them all.
>
> A couple of things missing from your proposal are handling of the fragment
> part of the URL, and support for unescaping and possibly un-plussing query
> parts (changing plusses to spaces), i.e., turning a URL into something useful
> as-is in an application. I'd also move the "set" part of the name before what
> it's setting, so it's more like curl_easy_setopt.

Re: a URL API ?

2018-08-01 Thread Dan Fandrich
On Thu, Aug 02, 2018 at 12:17:26AM +0200, Daniel Stenberg wrote:
> Here's my thoughts:
> 
>   https://github.com/curl/curl/wiki/URL-API
> 
> Good or bad? What would your application need and would this work for that?
> If not, how should we change it to make it better?

My first thought is that it's pretty long-winded. I can't think of many
situations where you might want only one part of a URL, like the port, but
don't want e.g. the host, or the query without the path. What was in the back
of my mind for such an API was something simple like:

  CURLUCode curl_url_parse(
      const char *url,
      char **scheme,
      char **host,
      char **port,
      char **user,
      char **password,
      char **path,
      char **query,
      char **fragment);
  CURLUCode curl_url_build(
      char **url,
      const char *scheme,
      const char *host,
      const char *port,
      const char *user,
      const char *password,
      const char *path,
      const char *query,
      const char *fragment);

which simply splits a URL into its constituent parts and puts it back together
again. Your proposal makes me realize that a bitmap options argument would also
be good to allow some configuration of that process.
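
To make the "split into parts" half concrete, here is a rough sketch under
stated assumptions. It is not libcurl code: sketch_url_split, struct slice and
struct url_parts are hypothetical names, only the
scheme://[user[:password]@]host[:port][/path][?query][#fragment] shape is
handled, and nothing is unescaped or validated. Instead of allocating eight
strings, it returns pointer and length pairs into the input.

```c
#include <stddef.h>
#include <string.h>

/* A piece of the input string; ptr is NULL when the part is absent. */
struct slice { const char *ptr; size_t len; };

struct url_parts {
  struct slice scheme, user, password, host, port, path, query, fragment;
};

static struct slice between(const char *from, const char *to)
{
  struct slice s;
  s.ptr = from;
  s.len = (size_t)(to - from);
  return s;
}

/* Hypothetical splitter. Returns 0 on success, -1 if there is no "://". */
int sketch_url_split(const char *url, struct url_parts *out)
{
  static const struct url_parts none; /* every part absent */
  const char *p, *q, *end, *auth_end, *at = NULL, *colon = NULL;

  *out = none;
  p = strstr(url, "://");
  if(!p || p == url)
    return -1;
  out->scheme = between(url, p);
  p += 3;
  end = url + strlen(url);

  /* Cut the fragment off first, then the query in front of it. */
  q = strchr(p, '#');
  if(q) { out->fragment = between(q + 1, end); end = q; }
  q = memchr(p, '?', (size_t)(end - p));
  if(q) { out->query = between(q + 1, end); end = q; }

  /* The authority ends at the first '/'; the rest is the path, which may
     be empty (len 0) but is never reported as absent. */
  auth_end = memchr(p, '/', (size_t)(end - p));
  if(!auth_end)
    auth_end = end;
  out->path = between(auth_end, end);

  /* Credentials end at the last '@' inside the authority. */
  for(q = p; q < auth_end; q++)
    if(*q == '@')
      at = q;
  if(at) {
    q = memchr(p, ':', (size_t)(at - p));
    out->user = between(p, q ? q : at);
    if(q)
      out->password = between(q + 1, at);
    p = at + 1;
  }

  /* The port follows the last ':' that is not inside an IPv6 "[...]". */
  for(q = p; q < auth_end; q++) {
    if(*q == ':')
      colon = q;
    else if(*q == ']')
      colon = NULL;
  }
  out->host = between(p, colon ? colon : auth_end);
  if(colon)
    out->port = between(colon + 1, auth_end);
  return 0;
}
```

A caller passes the URL and a struct url_parts, then reads only the slices it
cares about. The bitmap options argument mentioned above is left out of this
sketch.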

But maybe this is too simplistic. Using a handle-based approach as you do lets
you cleanly add additional functions to, e.g., iterate over individual
parameters in the query part, or over parts of the path (although that could be
done with my simple proposal with another function that operates on the query
part directly). Additional functions for handling specific URL schemes more
easily might also be welcome and could be added later, like one to directly
return an IMAP mailbox name from an IMAP URL. The handle-based approach makes
it easier to manage URLs; if all parts of the URL aren't needed at the same
time, the application doesn't need to keep 8 variables lying around to hold
them all.

A couple of things missing from your proposal are handling of the fragment
part of the URL, and support for unescaping and possibly un-plussing query
parts (changing plusses to spaces), i.e., turning a URL into something useful
as-is in an application. I'd also move the "set" part of the name before what
it's setting, so it's more like curl_easy_setopt.
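
As an illustration of the unescaping and un-plussing meant here, a small
sketch follows. The function name query_decode and its plus_as_space flag are
made up for this example and are not part of any libcurl API.

```c
#include <ctype.h>
#include <stddef.h>

/* Value of one hex digit; the caller has already checked isxdigit(). */
static int hex_value(unsigned char c)
{
  if(isdigit(c))
    return c - '0';
  return tolower(c) - 'a' + 10;
}

/*
 * Hypothetical helper: decode a query component in place. "%XX" becomes the
 * byte it encodes and, when plus_as_space is non-zero, '+' becomes ' '.
 * Malformed escapes are copied through unchanged. Returns the decoded
 * length, which matters if the data contains an encoded zero byte ("%00").
 */
size_t query_decode(char *s, int plus_as_space)
{
  const char *in = s;
  char *out = s;

  while(*in) {
    if(in[0] == '%' && isxdigit((unsigned char)in[1]) &&
       isxdigit((unsigned char)in[2])) {
      *out++ = (char)(hex_value((unsigned char)in[1]) * 16 +
                      hex_value((unsigned char)in[2]));
      in += 3;
    }
    else if(*in == '+' && plus_as_space) {
      *out++ = ' ';
      in++;
    }
    else
      *out++ = *in++;
  }
  *out = '\0';
  return (size_t)(out - s);
}
```

Decoding in place works because the output is never longer than the input.
Making the plus handling a flag reflects that '+' only means a space in
form-style query strings, not in URL paths.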

Re: a URL API ?

2018-08-01 Thread bch
On Wed, Aug 1, 2018 at 3:19 PM Daniel Stenberg  wrote:

> Hi,
>
> In the 2018 user survey, more than 40% of the 395 users who answered the
> question said they'd use a "URL handling" API in libcurl if one existed.
>
> I gave it some thoughts the other day and I've now jotted down my initial
> suggestion on how it could be made to work. An API that can parse a URL,
> extract the individual pieces, allow the user to set individual parts and
> finally to get the full URL out from there again.


In case this is interesting, I’ve used
https://github.com/nodejs/http-parser in a similar capacity in the past.
It’s been a while since I’ve used it, and I haven’t done a comparison
between your proposal and it, but figured “before i forget... I’d better
send this”. Obviously not fully integrated w cURL, but if there’s good
reference material...

-bch



>
> Here's my thoughts:
>
>   https://github.com/curl/curl/wiki/URL-API
>
> Good or bad? What would your application need and would this work for that?
> If not, how should we change it to make it better?
>
> (There's no promise that this will ever actually get implemented, but if we
> can come up with a proposal we believe in, I don't think there needs to be
> anything stopping it from happening...)
>
> --
>
>   / daniel.haxx.se