Hello everyone
I've got a decode-url function from somewhere. I searched to find out where it came
from, but didn't succeed; I also searched the eScribe site, with no luck. (Did I write
it myself?)
Here's the code:
decode-url: func [to-decode /local hex] [
    hex: charset "0123456789ABCDEFabcdef"
    parse/all to-decode [some [
        copy entity insert-point: ["%" 2 hex] (
            insert-point: remove/part insert-point 3
            insert insert-point to-char to-integer to-issue next entity
        )
        | skip
    ]]
    to-decode
]
Now I've discovered that the code has a problem: when it finds an entity, it replaces
three characters with one. When two entities are adjacent, only the first gets
replaced, because after the replacement parse suddenly finds itself in the middle of
the next one:
>> decode-url "http%3A%2F%2Fwww.rebol.com%2F"
== "http:%2F/www.rebol.com/"
I looked at different parse tutorials, including yours, Brett, to find out how to
manipulate parse's index. But look at this:
decode-url: func [to-decode /local hex] [
    hex: charset "0123456789ABCDEFabcdef"
    parse/all to-decode [some [
        copy entity insert-point: ["%" 2 hex] (
            insert-point: remove/part insert-point 3
            insert insert-point to-char to-integer to-issue next entity
            print join "entity: " entity
            print join "instert-point after replace: " insert-point
        )
        | (print join "not %: " insert-point) skip
    ]]
    to-decode
]
>> print decode-url "http%3A%2F%2Fwww.rebol.com%2F"
not %: http%3A%2F%2Fwww.rebol.com%2F
not %: ttp%3A%2F%2Fwww.rebol.com%2F
not %: tp%3A%2F%2Fwww.rebol.com%2F
not %: p%3A%2F%2Fwww.rebol.com%2F
entity: %3A
instert-point after replace: :%2F%2Fwww.rebol.com%2F
not %: F%2Fwww.rebol.com%2F
entity: %2F
instert-point after replace: /www.rebol.com%2F
not %: w.rebol.com%2F
not %: .rebol.com%2F
not %: rebol.com%2F
not %: ebol.com%2F
not %: bol.com%2F
not %: ol.com%2F
not %: l.com%2F
not %: .com%2F
not %: com%2F
not %: om%2F
not %: m%2F
entity: %2F
instert-point after replace: /
not %: /
not %:
http:%2F/www.rebol.com/
So insert-point is perfectly well placed to continue, but it seems that once an
entity has been evaluated and replaced, parse continues at the index where it left off
*in*the*original*string*, which in the shortened string lands two characters too far
along. I suppose this is only natural and as it should be, but I haven't had enough
coffee to find a workaround this morning. (except this:
replace/all the_url "%3A" ":"
replace/all the_url "%2F" "/"
replace/all the_url "\" "/"
but I'd prefer my decode-url method to work).
Do I have to rewrite the rule to look only for "%", so that the next two characters
are untouched?
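
One possible workaround, sketched here on the assumption that this is REBOL 2's parse
dialect: record the position with a set-word before the match, edit the string inside
the paren, then hand the adjusted position back to parse with a get-word. The name
mark is only illustrative (it plays the role of insert-point above), and entity and
mark are declared /local here, which the original function does not do.

```rebol
REBOL [Title: "decode-url sketch"]

decode-url: func [
    "Replace %XX escapes with the characters they encode (modifies the string)."
    to-decode [string!]
    /local hex entity mark
] [
    hex: charset "0123456789ABCDEFabcdef"
    parse/all to-decode [
        any [
            mark: copy entity ["%" 2 hex] (
                ; drop the three-character escape ...
                remove/part mark 3
                ; ... and insert the decoded character; insert returns the
                ; position just past it, so a decoded "%" is not decoded again
                mark: insert mark to-char to-integer to-issue next entity
            )
            :mark   ; get-word: tell parse to continue from the adjusted position
            | skip
        ]
    ]
    to-decode
]

print decode-url "http%3A%2F%2Fwww.rebol.com%2F"
; prints http://www.rebol.com/
```

The rule itself still matches "%" followed by two hex digits; only the position parse
resumes from changes. REBOL 2 also has a native called dehex that decodes %XX escapes,
which may be worth a look if writing the rule by hand is not the point.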
~H
Præterea censeo Carthaginem esse delendam