Re: a URL API ?
On Thu, 2 Aug 2018, Daniel Stenberg wrote: https://github.com/curl/curl/wiki/URL-API FYI: this has now landed in git. Take it for a spin and let us know how it works out for you! -- / daniel.haxx.se --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On Tue, Aug 14, 2018 at 11:17:08AM +0200, Daniel Stenberg wrote:
> Aha... well even if this is so, the effects of this will at least be
> mitigated by the fact that libcurl will still canonicalize them even if it
> wouldn't be perfect.
>
> I mean a user who wants to compare two URLs should make sure to canonicalize
> *both* of them before the comparison. Then subtle details such as the
> one mentioned above will actually not matter since the end results from both
> those URLs should be the same. Even if another library with more specific
> domain knowledge possibly would end up with a slightly different output.
>
> Or am I wrong?

You're right in the case of comparing URLs, but if an app is canonicalizing them for the purpose of displaying them to the user in a nice format, then the result wouldn't be optimal, although it would still work fine.
Re: a URL API ?
On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote:

>> I'm not sure I see the difference between these two approaches. Can you show
>> them with some example URLs?
>
> For example, + and ! are reserved characters in RFC 3986 but unreserved in
> RFC 2326 (RTSP), so a generic canonicalization might return
> rtsp://example.com/me%2byou%21 whereas an RTSP-specific canonicalization
> would return rtsp://example.com/me+you! At least, that's my interpretation
> after a quick reading of the RFCs.

Aha... well even if this is so, the effects of this will at least be mitigated by the fact that libcurl will still canonicalize them even if it wouldn't be perfect.

I mean a user who wants to compare two URLs should make sure to canonicalize *both* of them before the comparison. Then subtle details such as the one mentioned above will actually not matter since the end results from both those URLs should be the same. Even if another library with more specific domain knowledge possibly would end up with a slightly different output.

Or am I wrong?

--
 / daniel.haxx.se
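The "canonicalize both, then compare" idea can be sketched in plain C without libcurl. This is only an illustration under stated assumptions: `normalize_escapes()` and `same_url()` are hypothetical helpers, not libcurl functions, and they apply only the scheme-independent percent-encoding normalization from RFC 3986 section 6.2.2 (uppercase hex digits, decode escapes of unreserved characters). They do not lowercase the scheme or host.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: copy 'in' to 'out', decoding escapes of unreserved
   characters and uppercasing the hex digits of all other escapes.
   The output is never longer than the input. */
static void normalize_escapes(const char *in, char *out)
{
  static const char hex[] = "0123456789ABCDEF";
  while(*in) {
    int hi = -1, lo = -1;
    if(*in == '%' && (hi = hex_value(in[1])) >= 0 &&
       (lo = hex_value(in[2])) >= 0) {
      unsigned char c = (unsigned char)(hi * 16 + lo);
      if(is_unreserved(c))
        *out++ = (char)c;
      else {
        *out++ = '%';
        *out++ = hex[hi];
        *out++ = hex[lo];
      }
      in += 3;
    }
    else
      *out++ = *in++;
  }
  *out = '\0';
}

/* Returns 1 if both URLs are identical after normalization, 0 if they
   differ, -1 if either is too long for the fixed buffers of this sketch. */
static int same_url(const char *a, const char *b)
{
  char na[512], nb[512];
  if(strlen(a) >= sizeof(na) || strlen(b) >= sizeof(nb))
    return -1;
  normalize_escapes(a, na);
  normalize_escapes(b, nb);
  return strcmp(na, nb) == 0;
}
```

With this generic rule, `me%2byou` and `me%2Byou` compare equal, while `me+you` and `me%2Byou` do not: treating those two as the same URL needs the scheme-specific knowledge discussed in this message.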
Re: a URL API ?
On Mon, Aug 13, 2018 at 09:44:53AM +0200, Daniel Stenberg wrote: > On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote: > >I think there should be a new option for this kind of encoding so the > >canonical form stays canonical for every URI scheme, but programs that > >would prefer merely a fairly consistent human-readable form using an > >encoding set optimized for the scheme in use could use the other > >CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option instead. > > I'm not sure I see the difference between these two approaches. Can you show > them with some example URLs? For example, + and ! are reserved characters in RFC 3986 but unreserved in RFC 2326 (RTSP), so a generic canonicalization might return rtsp://example.com/me%2byou%21 whereas an RTSP-specific canonicalization would return rtsp://example.com/me+you! At least, that's my interpretation after a quick reading of the RFCs. --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
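The difference between a generic and a scheme-specific canonical form can be sketched as "decode, then re-encode with a configurable safe set". The helper below is hypothetical (`canon_part()` and its `keep` parameter are assumptions for illustration, not part of libcurl), and the RTSP safe set simply follows the reading of RFC 2326 given in the message above. It emits uppercase hex digits, so its output differs in case from the `%2b` in the example.

```c
#include <stddef.h>
#include <string.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: percent-decode 'in', then re-encode every byte that
   is neither unreserved nor listed in 'keep' (the extra characters the
   scheme allows as-is). Returns 0 on success, -1 if 'out' is too small.
   Caveat: a real canonicalizer must not decode an escaped delimiter such
   as %2F when '/' is in 'keep', since that changes the URL's meaning. */
static int canon_part(const char *in, const char *keep,
                      char *out, size_t outlen)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t o = 0;

  if(outlen == 0)
    return -1;
  while(*in) {
    unsigned char c = (unsigned char)*in;
    int hi = -1, lo = -1;
    int plain;

    /* step 1: turn a well-formed %XX escape into the byte it stands for;
       a stray '%' is left alone here and encoded as %25 in step 2 */
    if(c == '%' && (hi = hex_value(in[1])) >= 0 &&
       (lo = hex_value(in[2])) >= 0) {
      c = (unsigned char)(hi * 16 + lo);
      in += 3;
    }
    else
      in++;

    /* step 2: re-encode unless the byte is safe for this scheme */
    plain = is_unreserved(c) || (c != 0 && strchr(keep, c) != NULL);
    if(o + (plain ? 1 : 3) >= outlen)
      return -1;
    if(plain)
      out[o++] = (char)c;
    else {
      out[o++] = '%';
      out[o++] = hex[c >> 4];
      out[o++] = hex[c & 15];
    }
  }
  out[o] = '\0';
  return 0;
}
```

Calling it on the path `/me+you!` with `keep` set to `"/"` yields `/me%2Byou%21`, while passing `"/+!"` turns `/me%2byou%21` back into `/me+you!`.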
Re: a URL API ?
On Mon, 13 Aug 2018, Dan Fandrich via curl-library wrote: I think you're right, it should work. Documenting (CURLU_URLDECODE|CURLU_URLENCODE) as performing canonicalization is probably all you'd need, besides ensuring decode and encode happen in the correct order. We could perhaps even make it a separate flag to make it more obvious to the user: CURLU_CANONICAL. Only recognized when getting the URL. Actually, does CURLU_URLDECODE do anything on the curl_url_get call? It sounds like something that should only do something on the curl_url_set call. The code keeps the strings URL encoded in the struct, pretty much as they were in the original URL so if you want the "raw" version of them you ask for URL decoding on *get(). On *set() you're expected to pass in the URL encoded version or ask libcurl to encode it for you. This means that the preferred form of a URI differs depending on the scheme. Do we want to build in knowledge of the preferred encoding sets for all the different URI schemes out there today, or even just the ones curl supports? The URI syntax is or can be subtly different depending on scheme already even without canonicalization (like the options part of the authority section). My approach so far is to only recognize libcurl-supported schemes by default, allowing that to be overridden with a flag. For unsupported schemes, it will of course just become a "best effort" and a generic handling. I *suspect* libcurl users will most likely often only care for schemes that libcurl supports. I think there should be a new option for this kind of encoding so the canonical form stays canonical for every URI scheme, but programs that would prefer merely a fairly consistent human-readable form using an encoding set optimized for the scheme in use could use the other CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option instead. I'm not sure I see the difference between these two approaches. Can you show them with some example URLs? 
-- / daniel.haxx.se --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On Sun, Aug 12, 2018 at 06:45:27PM +0200, Daniel Stenberg wrote: > The current code for the API doesn't offer URL decoding at all when you ask > for the full URL - since a returned URL is still supposed to be a URL so it > can't really be "decoded" then. We can of course document that bit to mean > "canonicalization" when used in combination with getting the URL. > > Canonicalization can probably be done by always URL decoding *and* URL > encoding each individual part before they're put together to the end > result... Wouldn't that work? I think you're right, it should work. Documenting (CURLU_URLDECODE|CURLU_URLENCODE) as performing canonicalization is probably all you'd need, besides ensuring decode and encode happen in the correct order. Actually, does CURLU_URLDECODE do anything on the curl_url_get call? It sounds like something that should only do something on the curl_url_set call. I'm a bit concerned by this paragraph of RFC 3986, though, with respect to canonicalization in the curl API: URI producing applications should percent-encode data octets that correspond to characters in the reserved set unless these characters are specifically allowed by the URI scheme to represent data in that component. If a reserved character is found in a URI component and no delimiting role is known for that character, then it must be interpreted as representing the data octet corresponding to that character's encoding in US-ASCII. This means that the preferred form of a URI differs depending on the scheme. Do we want to build in knowledge of the preferred encoding sets for all the different URI schemes out there today, or even just the ones curl supports? This implies that the canonical form could change if curl adds support for a new scheme in the future. 
If so, then I think there should be a new option for this kind of encoding so the canonical form stays canonical for every URI scheme, but programs that would prefer merely a fairly consistent human-readable form using an encoding set optimized for the scheme in use could use the other CURLU_URLENCODE_OPTIMIZED (or whatever it's called) option instead. --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On Sun, 12 Aug 2018, Ray Satiro via curl-library wrote:

> I think you overdid it (I know, I'm a little behind on this discussion [1]),
> a struct would be simpler.
>
>   curl_url_parse(url, &parts);
>   curl_url_build(parts, &url);
>
> Is there really a need for a more expansive API?

I don't think exposing such a struct in the API and expecting users to handle it correctly is a good idea, as I think it's a bit too error-prone.

That API wouldn't support URL encoding/decoding of the parts, which I believe users will want. It would also miss some of the features libcurl itself uses (like path-as-is and disallow user), which would make it harder for us to switch to this API internally.

--
 / daniel.haxx.se
Re: a URL API ?
On Sun, 12 Aug 2018, Dan Fandrich via curl-library wrote:

> I'm curious whether the API can be used to canonicalize a URL, i.e., URL
> decode parts that can be decoded without semantic difference but
> canonicalize those parts that the specs say must be encoded.

I think it should be possible.

> The idea would be that the output of canonicalization would be the same for
> every version of a URL passed in. I'm guessing that
>
>   curl_url_get(h, CURLUPART_URL, &url, CURLU_URLDECODE);
>
> would get you half way there, except that it might not encode parts that
> have to be encoded. Is there another way I'm missing?

The current code for the API doesn't offer URL decoding at all when you ask for the full URL, since a returned URL is still supposed to be a URL so it can't really be "decoded" then. We can of course document that bit to mean "canonicalization" when used in combination with getting the URL.

Canonicalization can probably be done by always URL decoding *and* URL encoding each individual part before they're put together to the end result... Wouldn't that work?

--
 / daniel.haxx.se
Re: a URL API ?
On 8/1/2018 6:17 PM, Daniel Stenberg wrote:
> In the 2018 user survey, more than 40% of the 395 users who answered
> the question said they'd use a "URL handling" API in libcurl if one
> existed.
>
> I gave it some thoughts the other day and I've now jotted down my
> initial suggestion on how it could be made to work. An API that can
> parse a URL, extract the individual pieces, allow the user to set
> individual parts and finally to get the full URL out from there again.
>
> Here's my thoughts:
>
> https://github.com/curl/curl/wiki/URL-API
>
> Good or bad? What would your application need and would this work for
> that? If not, how should we change it to make it better?
>
> (There's no promise that this will ever actually get implemented, but
> if we can come up with a proposal we believe in, I don't think there
> needs to be anything stopping it from happening...)

I think you overdid it (I know, I'm a little behind on this discussion [1]), a struct would be simpler.

  curl_url_parse(url, &parts);
  curl_url_build(parts, &url);

Is there really a need for a more expansive API?

[1]: https://www.youtube.com/watch?v=gqQ99s4Ywnw
Re: a URL API ?
On Thu, Aug 09, 2018 at 10:48:32AM +0200, Daniel Stenberg via curl-library wrote:
> Returning to this, as I've polished the API a bit over the last few days.
> The wiki page has been updated to reflect the changes I've done.

I'm curious whether the API can be used to canonicalize a URL, i.e., URL decode parts that can be decoded without semantic difference but canonicalize those parts that the specs say must be encoded. The idea would be that the output of canonicalization would be the same for every version of a URL passed in. I'm guessing that

  curl_url_get(h, CURLUPART_URL, &url, CURLU_URLDECODE);

would get you half way there, except that it might not encode parts that have to be encoded. Is there another way I'm missing?
Re: a URL API ?
2018-08-09 14:15 GMT+02:00 Daniel Stenberg via curl-library : > ... or should it perhaps just skip the *first* '=' ? I don't think any URL parsing library cares about = beyond the first one. Which is why = in name may pose a problem, but in value probably won't. I'd skip all. --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On Thu, 9 Aug 2018, Daniel Stenberg via curl-library wrote:

(replying to myself...)

>   /* append to query, ask for encoding */
>   curl_url_set(h, CURLUPART_QUERY, "company=AT&T", CURLU_APPENDQUERY|
>                CURLU_URLENCODE);
>
> - CURLU_URLENCODE with CURLU_APPENDQUERY set, will skip the '=' character
>   when doing the encoding

... or should it perhaps just skip the *first* '=' ? If we ponder a user wants to add a "name=contents" pair, I figure the first '=' is the one that shouldn't be encoded but subsequent ones then presumably should?

--
 / daniel.haxx.se
Re: a URL API ?
On Thu, 9 Aug 2018, Daniel Jeliński via curl-library wrote:

>> char *append = "=44";
>
> Well assuming we want to use the API to build URL based on HTML form with
> GET action, curl_url_query_append suggested by Geoff would be much nicer.

Yes, you're right. I've taken a more generic approach that isn't at all aware of HTML forms.

> In particular, I would expect the API to:
> - figure out if it needs to add & or ?
> - figure out if it needs to URLEncode the parameter or value (eg. when
>   setting "company"="AT&T", we need to escape the ampersand)
> - do the appending / memory allocation part on its own
>
> What do you think?

I hear you! How about a dedicated feature bit to append the string to the query?

  /* append to query, ask for encoding */
  curl_url_set(h, CURLUPART_QUERY, "company=AT&T",
               CURLU_APPENDQUERY|CURLU_URLENCODE);

  /* append to query, already encoded */
  curl_url_set(h, CURLUPART_QUERY, "company=AT%26T", CURLU_APPENDQUERY);

- CURLU_APPENDQUERY makes it also add a '&' before the string if there's already contents in the query.
- CURLU_URLENCODE with CURLU_APPENDQUERY set will skip the '=' character when doing the encoding.

--
 / daniel.haxx.se
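The append behaviour described in this message (add a '&' when the query already has contents, encode everything except the separator) can be sketched in plain C. `query_append()` is a hypothetical helper written for illustration, not a libcurl function. It assumes the "skip only the first '='" variant raised in the follow-up, and it encodes every byte outside the RFC 3986 unreserved set.

```c
#include <stddef.h>
#include <string.h>

/* RFC 3986 unreserved set: ALPHA / DIGIT / "-" / "." / "_" / "~" */
static int is_unreserved(unsigned char c)
{
  return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
         (c >= '0' && c <= '9') ||
         c == '-' || c == '.' || c == '_' || c == '~';
}

/* Hypothetical helper: append 'pair' ("name=value") to the NUL-terminated
   'query' held in a buffer of 'cap' bytes. A '&' is inserted first if the
   query is not empty. Every byte of 'pair' is percent-encoded unless it is
   unreserved or the first '='. Returns 0 on success, or -1 if the result
   does not fit, in which case 'query' is left unchanged. */
static int query_append(char *query, size_t cap, const char *pair)
{
  static const char hex[] = "0123456789ABCDEF";
  size_t start = strlen(query);
  size_t o = start;
  int seen_equals = 0;

  if(o > 0) {
    if(o + 1 >= cap)
      return -1;
    query[o++] = '&';
  }
  for(; *pair; pair++) {
    unsigned char c = (unsigned char)*pair;
    int plain = is_unreserved(c) || (c == '=' && !seen_equals);
    size_t need = plain ? 1 : 3;

    if(c == '=')
      seen_equals = 1;
    if(o + need >= cap) {
      query[start] = '\0'; /* undo the partial append */
      return -1;
    }
    if(plain)
      query[o++] = (char)c;
    else {
      query[o++] = '%';
      query[o++] = hex[c >> 4];
      query[o++] = hex[c & 15];
    }
  }
  query[o] = '\0';
  return 0;
}
```

Starting from an empty buffer, appending `company=AT&T` gives `company=AT%26T`; appending `x=a=b` after that gives `company=AT%26T&x=a%3Db`, because only the first '=' of each pair is left as-is.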
Re: a URL API ?
2018-08-09 10:48 GMT+02:00 Daniel Stenberg via curl-library :
> Say we want to append this to the query:
>
> char *append = "=44";

Well, assuming we want to use the API to build a URL based on an HTML form with a GET action, curl_url_query_append suggested by Geoff would be much nicer. In particular, I would expect the API to:

- figure out if it needs to add & or ?
- figure out if it needs to URLEncode the parameter or value (e.g. when setting "company"="AT&T", we need to escape the ampersand)
- do the appending / memory allocation part on its own

What do you think?
Re: a URL API ?
On Thu, 2 Aug 2018, Geoff Beier wrote:

> The setters would be important to us. I might be bikeshedding here, but the
> ability to add to the query would be very nice. So something like
>
>   curl_url_query_append(urlp, "numitems", 3)

Returning to this, as I've polished the API a bit over the last few days. The wiki page has been updated to reflect the changes I've done.

As the curl URL API works now, this is how you append a string to the query of a URL.

First, create a handle and pass it a full URL:

  CURLU *h = curl_url();
  curl_url_set(h, CURLUPART_URL, "https://example.com/foo?askforthis", 0);

Say we want to append this to the query:

  char *append = "=44";

We extract the query part:

  char *q;
  curl_url_get(h, CURLUPART_QUERY, &q, 0);

Make space for the new enlarged query doing regular memory management and create the updated query there. The 'q' pointer points to memory managed by libcurl so it can't be realloc'ed.

  char *newptr = malloc(strlen(q) + strlen(append) + 1);
  strcpy(newptr, q);
  strcat(newptr, append);

Then replace the former query part in the URL by setting this new one:

  curl_url_set(h, CURLUPART_QUERY, newptr, 0);

Free the data:

  curl_free(q);
  free(newptr);

... and now we can extract the full URL again and it will have the updated query part:

  char *url;
  curl_url_get(h, CURLUPART_URL, &url, 0);

--
 / daniel.haxx.se
Re: a URL API ?
On Mon, 6 Aug 2018, Daniel Stenberg wrote:

> - implement the missing pieces of the API:

I've pushed a first PR of the URL API work to make sure all tests and builds are happy:

  https://github.com/curl/curl/pull/2842

I've listed outstanding work in there. The API works pretty well already, so if someone is interested in being a test pilot, building and testing it could be fun, would give me valuable input, and is an opportunity for you to affect its functionality.

--
 / daniel.haxx.se
Re: a URL API ?
On Thu, 2 Aug 2018, Daniel Stenberg wrote:

> Good or bad? What would your application need and would this work for that?

I've written some initial code for this API [1] now. As I proceed further, I intend to remove the wiki page as I suspect there will be a lot of details that have changed and I'll instead get the details right and documented in the coming man pages for the new stuff.

curl_url() mostly works and there's a bunch of tests for the fundamentals. curl_url_get() works to extract all the parsed parts except CURLUPART_URL. curl_url_cleanup() works.

The initial work exists in a separate branch [2] - and as long as I'm the only developer in this branch I intend to rebase and squash it regularly. Just beware. The current API header can also be browsed online at [3].

Next up, I intend to:

- implement the missing pieces of the API:
    curl_url_get's CURLUPART_URL option
    curl_url_set()
    curl_url_dup()
- add more test cases to make sure the functions work, also to make it easy to add more tests later
- write documentation for the new functions and options
- make use of the new parser for the existing libcurl functionality to reduce code duplication and make sure the URL API and the libcurl transfer engine treat URLs identically
- add a curl_easy_setopt() option to take a CURLURL pointer instead of a URL string
- consider command line tool access to the URL API for parsing URLs

There's not yet any schedule for when this can land. I intend to mark this API as "experimental" for the first few releases it appears in, to signal to users and us all what to expect from it. Maybe this is 7.63.0 material?

[1] = https://github.com/curl/curl/wiki/URL-API
[2] = https://github.com/curl/curl/tree/URL-API
[3] = https://github.com/curl/curl/blob/URL-API/include/curl/urlapi.h

--
 / daniel.haxx.se
Re: a URL API ?
On Fri, 3 Aug 2018, Samuel Hurst wrote:

> For example, being able to build new URLs from relative ones. I can't quite
> tell from the examples provided whether curl_url would do relative
> transformation if the urlhandle is already valid. I can see a use case
> where I'd want to do the following:
>
>   CURLURL *url_handle = NULL;
>   curl_url("https://example.org/hello", &url_handle, 0);
>   ...
>   curl_url("/image.png", &url_handle, 0);

Ah, yes. I like this suggestion, even if "/image.png" could've been handled with just setting a new path. I suppose "../image.png" or something would be a better example.

As for the specific API to deal with a relative URL, I think I prefer to have it not overload curl_url() for that. As I prefer the "alternative B" API, I think we can just add a dedicated CURLUPart for a relative URL. Then it would be used like this:

  curl_url("https://example.org/path/to/hello", &url_handle, 0);
  curl_url_set(url_handle, CURLUPART_RELURL, "../image.png", 0);

> Some form of handle copy might be useful here too, if you're having to do a
> lot of relative transformations from a single base URL.

Yes, something like this:

  CURLURL *handle = curl_url_dup(inhandle);

> In addition, our own URL classes support returning the path as an array of
> strings corresponding to each individual path segment. We've certainly
> found use for this in the past, and others may also find this useful.

Mm, maybe. Is there anything else to that than splitting the path on every forward slash?

--
 / daniel.haxx.se
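The "../image.png" case comes down to the merge and dot-segment removal steps of RFC 3986 section 5.2. Below is a sketch in plain C that works on the path component only. `resolve_path()` is a hypothetical helper, not the proposed CURLUPART_RELURL implementation, and it assumes an absolute base path and a non-empty relative path with no scheme, authority, query or fragment.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: resolve the relative path 'rel' against the absolute
   path 'base' and write the result to 'out'. Returns 0 on success, -1 if a
   buffer is too small or 'base' is not absolute. */
static int resolve_path(const char *base, const char *rel,
                        char *out, size_t outlen)
{
  char tmp[1024];
  size_t n = 0; /* number of bytes of 'base' kept by the merge step */
  size_t o = 0;
  const char *p;

  if(outlen == 0)
    return -1;

  /* merge: a relative 'rel' replaces everything after the last '/' of
     'base'; an absolute 'rel' replaces the base path completely */
  if(rel[0] != '/') {
    const char *slash = strrchr(base, '/');
    if(slash)
      n = (size_t)(slash - base) + 1;
  }
  if(n + strlen(rel) >= sizeof(tmp))
    return -1;
  memcpy(tmp, base, n);
  strcpy(tmp + n, rel);
  if(tmp[0] != '/')
    return -1;

  /* remove dot segments: walk the merged path one "/segment" at a time */
  p = tmp;
  while(*p == '/') {
    const char *seg = p + 1;
    size_t len = strcspn(seg, "/");
    int last = (seg[len] == '\0');
    int dot = (len == 1 && seg[0] == '.');
    int dotdot = (len == 2 && seg[0] == '.' && seg[1] == '.');

    if(dotdot) {
      /* drop the most recently written segment and its leading '/' */
      while(o > 0 && out[--o] != '/')
        ;
    }
    if(dot || dotdot) {
      /* a trailing "." or ".." names a directory, so keep a final '/' */
      if(last) {
        if(o + 1 >= outlen)
          return -1;
        out[o++] = '/';
      }
    }
    else {
      if(o + 1 + len >= outlen)
        return -1;
      out[o++] = '/';
      memcpy(out + o, seg, len);
      o += len;
    }
    p = seg + len;
  }
  out[o] = '\0';
  return 0;
}
```

Resolving `../image.png` against `/path/to/hello` gives `/path/image.png`, and resolving `/image.png` against `/hello` gives `/image.png`, matching the two examples in this message.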
Re: a URL API ?
On 01/08/18 23:17, Daniel Stenberg wrote:
> Hi,
>
> In the 2018 user survey, more than 40% of the 395 users who answered the
> question said they'd use a "URL handling" API in libcurl if one existed.
>
> I gave it some thoughts the other day and I've now jotted down my initial
> suggestion on how it could be made to work. An API that can parse a URL,
> extract the individual pieces, allow the user to set individual parts and
> finally to get the full URL out from there again.
>
> Here's my thoughts:
>
> https://github.com/curl/curl/wiki/URL-API
>
> Good or bad? What would your application need and would this work for that?
> If not, how should we change it to make it better?

I think the current proposal is fairly sound as a first step. I think having something built-in to cURL makes a lot of sense, given the project's focus on "doing stuff with URLs", and not having to add another library dependency is a definite plus point. However, I can think of a couple of worthwhile improvements:

Currently, we have our own URL parsing classes, and there's a couple of important features that we find useful. For example, being able to build new URLs from relative ones. I can't quite tell from the examples provided whether curl_url would do relative transformation if the urlhandle is already valid. I can see a use case where I'd want to do the following:

  CURLURL *url_handle = NULL;
  curl_url("https://example.org/hello", &url_handle, 0);
  ...
  curl_url("/image.png", &url_handle, 0);

I'd then expect a call to curl_url_get() on url_handle to return something like "https://example.org/image.png"

Some form of handle copy might be useful here too, if you're having to do a lot of relative transformations from a single base URL.

In addition, our own URL classes support returning the path as an array of strings corresponding to each individual path segment. We've certainly found use for this in the past, and others may also find this useful.

That's just my two pennies' worth anyway.

-Sam
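Returning the path as an array of segments can be sketched as an in-place split on '/'. `split_path()` is a hypothetical helper, not a proposed libcurl function. One detail beyond plain splitting is worth keeping in mind: the segments should be split while still percent-encoded and decoded afterwards, so that an escaped slash (%2F) inside a segment is not mistaken for a separator.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: split 'path' in place by overwriting each '/' with a
   NUL byte and storing a pointer to the start of every segment in 'segs'.
   A single leading '/' is skipped; a trailing '/' yields an empty final
   segment. Segments are left percent-encoded. Returns the number of
   segments, or -1 if there are more than 'max'. */
static int split_path(char *path, char **segs, int max)
{
  int n = 0;
  char *p = path;

  if(*p == '/')
    p++;
  for(;;) {
    if(n == max)
      return -1;
    segs[n++] = p;
    p = strchr(p, '/');
    if(!p)
      break;
    *p++ = '\0';
  }
  return n;
}
```

For a writable copy of `/path/to/hello` this stores the three segments `path`, `to` and `hello`, while `/a%2Fb/c` yields the two segments `a%2Fb` and `c`.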
Re: a URL API ?
On Thu, 2 Aug 2018, Geoff Beier wrote:

> The setters would be important to us. I might be bikeshedding here, but the
> ability to add to the query would be very nice. So something like
>
>   curl_url_query_append(urlp, "numitems", 3)

That ("numitems", 3) approach is very specific for adding a "[name]=[number]" snippet though. Shouldn't it rather be a function for appending a generic string?

  curl_url_query_append(urlp, "numitems=3");

... or using the alternative B API (https://github.com/curl/curl/wiki/URL-API#url-api-alternative-b) approach:

  curl_url_append(urlp, CURLUPART_QUERY, "numitems=3");

> An unsigned type feels more appropriate for port than int does.

Yes, or as a full blown string to make it a generic part in alternative B.

> I assume this is obvious, but if this API gets added to curl, a setopt that
> takes a CURLURL would be wanted :)

Yes!

--
 / daniel.haxx.se
Re: a URL API ?
We use the uriparser library Radu Hociung mentioned primarily because there was nothing analogous in curl. The setters would be important to us. I might be bikeshedding here, but the ability to add to the query would be very nice. So something like curl_url_query_append(urlp, "numitems", 3) An unsigned type feels more appropriate for port than int does. While I can't say that we'd move our working code from uriparser to this API, I'm certain this API would have prevented us from reaching for uriparser in the first place, particularly because all of our URL usage is for consumption by libcurl anyway. I assume this is obvious, but if this API gets added to curl, a setopt that takes a CURLURL would be wanted :) On Thu, Aug 2, 2018 at 8:41 AM, Daniel Stenberg wrote: > On Wed, 1 Aug 2018, Radu Hociung wrote: > > a "URL handling" API in libcurl >>> >> >> This wheel has already been invented. [1] >> >> [1] https://uriparser.github.io/ >> > > Are you suggesting this uses an API that would be worth getting > inspiration from? Personally I don't see a lot I like there... > > (Dan already presented some reasons why having our own makes sense.) > > -- > > / daniel.haxx.se > > --- > Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library > Etiquette: https://curl.haxx.se/mail/etiquette.html > --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On Wed, 1 Aug 2018, Radu Hociung wrote: a "URL handling" API in libcurl This wheel has already been invented. [1] [1] https://uriparser.github.io/ Are you suggesting this uses an API that would be worth getting inspiration from? Personally I don't see a lot I like there... (Dan already presented some reasons why having our own makes sense.) -- / daniel.haxx.se --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On Thu, 2 Aug 2018, Dan Fandrich wrote:

> My first thought is that it's pretty long-winded. I can't think of many
> situations where you might want only one part of a URL, like the port, but
> don't want e.g. the host, or the query without the path. What was in the
> back of my mind for such an API was something simple like:
>
>   CURLUCode curl_url_parse(
>     const char *url,
>     char **scheme,
>     char **host,
>     char **port,
>     char **user,
>     char **password,
>     char **path,
>     char **query,
>     char **fragment);

I find that API a bit awkward due to the large number of arguments. That style could also be supported with:

  CURLUCode curl_url_parse(const char *url, struct curlurl *store);

... and make 'struct curlurl' be a public struct that gets filled in with those eight char pointers. But I'm not too fond of doing public structs like that as they're so hard to change in the future.

I also contemplated having the API work like this:

  curl_url_get(CURLURL *handle, CURLPIECE what, char **piece, int flags);
  curl_url_set(CURLURL *handle, CURLPIECE what, char *piece, int flags);

... and have the second argument specify what part to extract or set instead of doing one function for each. In fact, I think I now prefer this way since it reduces the number of functions needed and makes the get/set calls very uniform.

> Your proposal makes me realize that a bitmap options argument would also be
> good to allow some configuration of that process.

It's either that or taking a firm stance of what the "correct" response should be in those cases, but I think it might be hard to have one single correct behavior everywhere...

> The handle-based approach makes it easier to manage URLs; if all parts of
> the URL aren't needed at the same time, the application doesn't need to
> keep 8 variables lying around to hold them all.

I've grown to like handle-based APIs since they keep things flexible for the future...

> A couple of things missing from your proposal are handling of the fragment
> part of the URL

Oops!

> and support for unescaping and possibly un-plussing query parts (changing
> plusses to spaces), i.e., turning a URL into something useful as-is in an
> application.

Yeah, I sort of left that out as a start as I wasn't really sure if that should be part of this API, but I think you're right that users will probably appreciate exactly that ability.

> I'd also move the "set" part of the name before what it's setting, so it's
> more like curl_easy_setopt.

Fair enough!

--
 / daniel.haxx.se
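"Unescaping and un-plussing" a query value can be sketched in a few lines of plain C. `form_decode()` is a hypothetical helper, not a libcurl function. Treating '+' as a space is the HTML form (application/x-www-form-urlencoded) convention rather than an RFC 3986 rule, so it only makes sense for query strings produced that way.

```c
#include <stddef.h>

/* Value of one hex digit, or -1 if c is not a hex digit. */
static int hex_value(char c)
{
  if(c >= '0' && c <= '9') return c - '0';
  if(c >= 'a' && c <= 'f') return c - 'a' + 10;
  if(c >= 'A' && c <= 'F') return c - 'A' + 10;
  return -1;
}

/* Hypothetical helper: decode a form-encoded query value. Every '+' becomes
   a space and every well-formed %XX escape becomes the byte it encodes; a
   malformed '%' is copied through unchanged. 'out' must have room for
   strlen(in) + 1 bytes. Returns the decoded length, which matters if the
   input contained %00. */
static size_t form_decode(const char *in, char *out)
{
  size_t o = 0;

  while(*in) {
    int hi = -1, lo = -1;
    if(*in == '+') {
      out[o++] = ' ';
      in++;
    }
    else if(*in == '%' && (hi = hex_value(in[1])) >= 0 &&
            (lo = hex_value(in[2])) >= 0) {
      out[o++] = (char)(unsigned char)(hi * 16 + lo);
      in += 3;
    }
    else
      out[o++] = *in++;
  }
  out[o] = '\0';
  return o;
}
```

Decoding `AT%26T+Inc` this way produces the eight-byte string `AT&T Inc`.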
Re: a URL API ?
On Wed, Aug 01, 2018 at 11:54:56PM -0500, Radu Hociung wrote: > On 01/08/2018 5:17 PM, Daniel Stenberg wrote: > > a "URL handling" API in libcurl > > This wheel has already been invented. [1] The main point of having a libcurl API has to do with reducing security issues due to differences in URL parsing between libcurl and applications. Pointing people at another parser won't solve that problem, unless libcurl were rewritten to use that parser itself. That may actually be a solution, but we'd have to be careful that using an external parser doesn't introduce regressions. What worries me is the statement that this particular library is "strictly RFC 3986 compliant" which means that the various quirks added to libcurl's parser to conform to WhatWG and browser implementation will disappear and introduce such regressions. An external library would also hamper libcurl's ability to fix problems or introduce new URL schemes (standardized or not) in the future. --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
On 01/08/2018 5:17 PM, Daniel Stenberg wrote: > a "URL handling" API in libcurl This wheel has already been invented. [1] Cheers. Radu. [1] https://uriparser.github.io/ --- Unsubscribe: https://cool.haxx.se/list/listinfo/curl-library Etiquette: https://curl.haxx.se/mail/etiquette.html
Re: a URL API ?
Yes, I think passing around a struct with this information (and being able to get the URI back from its parts) would be more useful than just direct URL construction. That way one could represent a common base URI and modify it by just setting a new path (in that respect we may want the ability to manage segments, query or fragments of a URI).

+1 for unescape/escape.

Jim

On 2 August 2018 at 07:36, Dan Fandrich wrote:
> On Thu, Aug 02, 2018 at 12:17:26AM +0200, Daniel Stenberg wrote:
>> Here's my thoughts:
>>
>> https://github.com/curl/curl/wiki/URL-API
>>
>> Good or bad? What would your application need and would this work for that?
>> If not, how should we change it to make it better?
>
> My first thought is that it's pretty long-winded. I can't think of many
> situations where you might want only one part of a URL, like the port, but
> don't want e.g. the host, or the query without the path. What was in the back
> of my mind for such an API was something simple like:
>
> CURLUCode curl_url_parse(
>   const char *url,
>   char **scheme,
>   char **host,
>   char **port,
>   char **user,
>   char **password,
>   char **path,
>   char **query,
>   char **fragment);
> CURLUCode curl_url_build(
>   char **url,
>   const char *scheme,
>   const char *host,
>   const char *port,
>   const char *user,
>   const char *password,
>   const char *path,
>   const char *query,
>   const char *fragment);
>
> which simply splits a URL into its constituent parts and puts it back
> together again. Your proposal makes me realize that a bitmap options argument
> would also be good to allow some configuration of that process.
>
> But maybe this is too simplistic. Using a handle-based approach as you do lets
> you cleanly add additional functions to, e.g., iterate over individual
> parameters in the query part, or over parts of the path (although that could
> be done with my simple proposal with another function that operates on the
> query part directly). Additional functions for handling specific URL schemes
> more easily might also be welcome and could be added later, like one to
> directly return an IMAP mailbox name from an IMAP URL. The handle-based
> approach makes it easier to manage URLs; if all parts of the URL aren't needed
> at the same time, the application doesn't need to keep 8 variables lying
> around to hold them all.
>
> A couple of things missing from your proposal are handling of the fragment
> part of the URL, and support for unescaping and possibly un-plussing query
> parts (changing plusses to spaces), i.e., turning a URL into something useful
> as-is in an application. I'd also move the "set" part of the name before what
> it's setting, so it's more like curl_easy_setopt.
Re: a URL API ?
On Thu, Aug 02, 2018 at 12:17:26AM +0200, Daniel Stenberg wrote:
> Here's my thoughts:
>
> https://github.com/curl/curl/wiki/URL-API
>
> Good or bad? What would your application need and would this work for that?
> If not, how should we change it to make it better?

My first thought is that it's pretty long-winded. I can't think of many situations where you might want only one part of a URL, like the port, but don't want e.g. the host, or the query without the path. What was in the back of my mind for such an API was something simple like:

  CURLUCode curl_url_parse(
    const char *url,
    char **scheme,
    char **host,
    char **port,
    char **user,
    char **password,
    char **path,
    char **query,
    char **fragment);
  CURLUCode curl_url_build(
    char **url,
    const char *scheme,
    const char *host,
    const char *port,
    const char *user,
    const char *password,
    const char *path,
    const char *query,
    const char *fragment);

which simply splits a URL into its constituent parts and puts it back together again. Your proposal makes me realize that a bitmap options argument would also be good to allow some configuration of that process.

But maybe this is too simplistic. Using a handle-based approach as you do lets you cleanly add additional functions to, e.g., iterate over individual parameters in the query part, or over parts of the path (although that could be done with my simple proposal with another function that operates on the query part directly). Additional functions for handling specific URL schemes more easily might also be welcome and could be added later, like one to directly return an IMAP mailbox name from an IMAP URL. The handle-based approach makes it easier to manage URLs; if all parts of the URL aren't needed at the same time, the application doesn't need to keep 8 variables lying around to hold them all.

A couple of things missing from your proposal are handling of the fragment part of the URL, and support for unescaping and possibly un-plussing query parts (changing plusses to spaces), i.e., turning a URL into something useful as-is in an application. I'd also move the "set" part of the name before what it's setting, so it's more like curl_easy_setopt.
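As an illustration of the unescaping and un-plussing mentioned above, the following is a minimal sketch in C that decodes one query component: each `+` becomes a space and each `%XX` becomes the corresponding byte. The function names `hex_value` and `query_decode` are hypothetical, chosen for this example only, and are not part of the proposal or of libcurl. The sketch assumes the caller supplies the output buffer and that malformed escapes should be copied through unchanged; a real API might well reject them instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: numeric value of one hex digit, or -1 if not hex. */
static int hex_value(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical decoder for a single query component.
 * Writes the decoded text into dst (dst_size bytes including the NUL).
 * '+' is turned into a space and "%XX" into the byte 0xXX; a '%' that is
 * not followed by two hex digits is copied literally.
 * Returns the number of bytes written (excluding the NUL), or -1 if dst
 * is too small. Note that "%00" yields an embedded NUL byte.
 */
long query_decode(const char *src, char *dst, size_t dst_size)
{
    size_t out = 0;

    assert(src != NULL && dst != NULL);
    while (*src) {
        char c = *src;

        if (out + 1 >= dst_size)
            return -1; /* no room for this byte plus the terminator */

        if (c == '+') {
            c = ' ';
            src++;
        } else if (c == '%' && hex_value(src[1]) >= 0 && hex_value(src[2]) >= 0) {
            c = (char)(hex_value(src[1]) * 16 + hex_value(src[2]));
            src += 3;
        } else {
            src++;
        }
        dst[out++] = c;
    }
    if (out >= dst_size)
        return -1; /* only reachable when dst_size is 0 */
    dst[out] = '\0';
    return (long)out;
}
```

Decoding has to happen after the query has been split on `&` and `=`, since an escaped `%26` or `%3D` inside a value would otherwise be mistaken for a separator.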
Re: a URL API ?
On Wed, Aug 1, 2018 at 3:19 PM Daniel Stenberg wrote:
> Hi,
>
> In the 2018 user survey, more than 40% of the 395 users who answered the
> question said they'd use a "URL handling" API in libcurl if one existed.
>
> I gave it some thoughts the other day and I've now jotted down my initial
> suggestion on how it could be made to work. An API that can parse a URL,
> extract the individual pieces, allow the user to set individual parts and
> finally to get the full URL out from there again.

In case this is interesting, I’ve used https://github.com/nodejs/http-parser in a similar capacity in the past. It’s been a while since I’ve used it, and I haven’t done a comparison between your proposal and it, but figured “before I forget... I’d better send this”. Obviously it's not fully integrated with curl, but it may be good reference material...

-bch

> Here's my thoughts:
>
> https://github.com/curl/curl/wiki/URL-API
>
> Good or bad? What would your application need and would this work for that?
> If not, how should we change it to make it better?
>
> (There's no promise that this will ever actually get implemented, but if we
> can come up with a proposal we believe in, I don't think there needs to be
> anything stopping it from happening...)
>
> --
>
> / daniel.haxx.se