RE: A/R matching against query strings
Micah Cowan wrote:

> Would "hash" really be useful, ever?

Probably not, as long as we strip off the hash before we do the comparison.

Tony
Re: A/R matching against query strings
Tony Lewis wrote:
> Micah Cowan wrote:
>
>> On expanding current URI acc/rej matches to allow matching against query
>> strings, I've been considering how we might enable/disable this
>> functionality, with an eye toward backwards compatibility.
>
> What about something like --match-type=TYPE (with accepted values of all,
> hash, path, search)?
>
> For the URL http://www.domain.com/path/to/name.html?a=true#content
>
> all would match against the entire string
> hash would match against "content"
> path would match against "path/to/name.html"
> search would match against "a=true"
>
> For backward compatibility the default should be --match-type=path.
>
> I thought about having "host" as an option, but that duplicates another
> option.

As does path (up to the final /).

Would "hash" really be useful, ever? It's never part of the request to the
server, so it's really more "context" to the URL than a real part of the
URL, as far as requests go. Perhaps that sort of thing could best wait for
when we allow custom URL-parsers/filters.

Also, I don't like the name "search" overly much, as that's a very limited
description of the much more general use of query strings.

But differentiating between three or more different match types tilts me
much more strongly toward some sort of shorthand, like the explicit need
for \?; with three types, perhaps we'd just use some special prefix for
patterns to indicate which sort of match we want (":q:" for query strings,
":a:" for all, or whatever), to save on prefixing each different type of
match with --match-type (or just using "all" for everything).

OTOH, regex support is easy enough to add to Wget, now that we're using
gnulib; we could just leave wildcards the way they are, and introduce
regexes that match everything.
Then query strings are '\?.*foo=bar' (or, for the really pedantic,
'\?([^?]*&)?foo=bar(&[^?]*)?$').

That last one, though, highlights how cumbersome it is to do proper
matching against typical HTML form-generated query strings (it's not
really even possible with wildcards). Perhaps a more appropriate
pattern-matcher specifically for query strings would be a good idea. It's
probably enough to do something like --query-='action=Edit', where there's
an implied '\?([^?]*&)?' before, and '(&[^?]*)?$' after.

-- 
Micah J. Cowan
Programmer, musician, typesetting enthusiast, gamer.
GNU Maintainer: wget, screen, teseq
http://micah.cowan.name/
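
To make the implied prefix and suffix concrete, here is a minimal sketch in Python (not Wget code) of how a query rule such as 'action=Edit' could be wrapped in the two regex fragments quoted above. The function names are hypothetical, and stripping the fragment before matching is an assumption taken from Tony's reply elsewhere in this thread.

```python
import re


def query_rule_to_regex(rule):
    """Wrap a literal key=value rule in the implied prefix and suffix.

    The prefix and suffix are the ones given in the message above;
    the rule itself is treated as a literal string (an assumption).
    """
    return re.compile(r'\?([^?]*&)?' + re.escape(rule) + r'(&[^?]*)?$')


def matches_query(url, rule):
    """Return True if `rule` appears as a whole parameter in the query string."""
    # Assumption: the "#fragment" is dropped before any comparison.
    url = url.split('#', 1)[0]
    return query_rule_to_regex(rule).search(url) is not None


if __name__ == "__main__":
    base = "http://example.com/index.php"
    print(matches_query(base + "?action=Edit", "action=Edit"))
    print(matches_query(base + "?title=Foo&action=Edit&x=1", "action=Edit"))
    print(matches_query(base + "?subaction=Edit", "action=Edit"))
    print(matches_query(base + "?action=Editor", "action=Edit"))
```

The anchoring on '?' or '&' before the rule, and on '&' or end-of-string after it, is what keeps "subaction=Edit" and "action=Editor" from matching, which a plain wildcard such as *action=Edit* cannot express.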
RE: A/R matching against query strings
Micah Cowan wrote:

> On expanding current URI acc/rej matches to allow matching against query
> strings, I've been considering how we might enable/disable this
> functionality, with an eye toward backwards compatibility.

What about something like --match-type=TYPE (with accepted values of all,
hash, path, search)?

For the URL http://www.domain.com/path/to/name.html?a=true#content

all would match against the entire string
hash would match against "content"
path would match against "path/to/name.html"
search would match against "a=true"

For backward compatibility the default should be --match-type=path.

I thought about having "host" as an option, but that duplicates another
option.

Tony
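
As an illustration of the four proposed values, the following Python sketch (not Wget code) picks out the string each match type would be compared against. --match-type is only a proposal in this thread, and the function name match_target is hypothetical; the leading "/" is dropped from the path so that the result agrees with the example above.

```python
from urllib.parse import urlsplit


def match_target(url, match_type="path"):
    """Return the part of `url` that a rule would be matched against.

    `match_type` mirrors the proposed (hypothetical) --match-type values:
    all, hash, path, search. The default is "path", as suggested above.
    """
    parts = urlsplit(url)
    if match_type == "all":
        return url
    if match_type == "hash":
        return parts.fragment
    if match_type == "path":
        # Drop the leading "/" to agree with the example in the message.
        return parts.path.lstrip("/")
    if match_type == "search":
        return parts.query
    raise ValueError("unknown match type: %r" % (match_type,))


if __name__ == "__main__":
    url = "http://www.domain.com/path/to/name.html?a=true#content"
    for kind in ("all", "hash", "path", "search"):
        print(kind, "->", match_target(url, kind))
```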
Re: A/R matching against query strings
I sent the following last month but didn't get any feedback. I'm trying
one more time. :)

-M

Micah Cowan wrote:
> On expanding current URI acc/rej matches to allow matching against query
> strings, I've been considering how we might enable/disable this
> functionality, with an eye toward backwards compatibility.
>
> It seems to me that one usable approach would be to require the "?"
> query string to be an explicit part of the rule, if it's expected to be
> matched against query strings. So "-A .htm,.gif,*Action=edit*" would all
> result in matches against the filename portion only, but
> "-A '\?*Action=edit*'" would look for "Action=edit" within the
> query-string portion. (The '\?' is necessary because otherwise '?' is a
> wildcard character; [?] would also work.)
>
> The disadvantage of that technique is that it's harder to specify that a
> given string should be checked _anywhere_, regardless of whether it
> falls in the filename or query-string portion; but I can't think offhand
> of any realistic cases where that's actually useful. We could also
> supply a --match-queries option to turn on matching of wildcard rules
> for anywhere (non-wildcard suffix rules should still match only at the
> end of the filename portion).
>
> Another option is to use a separate "-A"-like option that does what -A
> does for filenames, but matches against query strings. I like this idea
> somewhat less.
>
> Thoughts?
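
The explicit-"?" idea can be sketched as follows, in Python rather than Wget's C. Several things here are assumptions, not part of the proposal: the function name accepts, the choice to match query rules against "?" plus the query string, and the rewrite of '\?' to '[?]' (needed only because Python's fnmatch has no backslash escape).

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def accepts(url, rule):
    """Sketch of one accept rule under the explicit-"?" proposal."""
    parts = urlsplit(url)
    filename = parts.path.rsplit("/", 1)[-1]

    # Python's fnmatch does not understand "\?", so spell the literal
    # question mark as "[?]", which the message notes is equivalent.
    pattern = rule.replace("\\?", "[?]")

    if "[?]" in pattern:
        # Rule names the "?" explicitly: match against the query string.
        # Assumption: the subject is "?" followed by the query.
        if not parts.query:
            return False
        return fnmatchcase("?" + parts.query, pattern)

    if any(ch in pattern for ch in "*?["):
        # Wildcard rule without an explicit "?": filename portion only.
        return fnmatchcase(filename, pattern)

    # Non-wildcard rule: plain suffix match on the filename portion.
    return filename.endswith(rule)


if __name__ == "__main__":
    url = "http://example.com/index.php?title=Foo&Action=edit"
    print(accepts(url, r"\?*Action=edit*"))  # query-string rule
    print(accepts(url, "*Action=edit*"))     # filename-only rule
    print(accepts("http://example.com/a/page.htm?x=1", ".htm"))
```

Because only rules that spell out the "?" ever look at the query string, existing rules such as .htm or *Action=edit* keep their current filename-only behaviour, which is the backwards-compatibility point made above.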