[REBOL] Parse's current index

Hallvard Ystad Thu, 17 Jan 2002 01:50:59 -0800

Hello everyone

I have a decode-url function from somewhere. I searched to find out where it came 
from, without success, and searched the escribe site as well, with no luck. (Did I 
write it myself?)


Here's the code:
decode-url: func [to-decode /local hex] [
  hex: charset "0123456789ABCDEFabcdef"

  parse/all to-decode [some [copy entity insert-point: ["%" 2 hex] (
                             insert-point: remove/part insert-point 3
                             insert insert-point to-char to-integer to-issue next entity) |
                             skip ]]
  to-decode
]

Now I have discovered that the code has a problem: once it finds an entity, it 
replaces three characters with one. As the parse continues, only the first of two 
adjacent entities is replaced, because after the replacement parse finds itself in 
the middle of the next one:
>> decode-url "http%3A%2F%2Fwww.rebol.com%2F"
== "http:%2F/www.rebol.com/"

I looked at different parse tutorials, including yours, Brett, to learn how to 
manipulate parse's index. But look at this:

decode-url: func [to-decode /local hex] [
  hex: charset "0123456789ABCDEFabcdef"

  parse/all to-decode [some [copy entity insert-point: ["%" 2 hex] (
                             insert-point: remove/part insert-point 3
                             insert insert-point to-char to-integer to-issue next entity
                             print join "entity: " entity
                             print join "instert-point after replace: " insert-point
                             ) |
                             (print join "not %: " insert-point ) skip ]]
  to-decode
]
>> print decode-url "http%3A%2F%2Fwww.rebol.com%2F"
not %: http%3A%2F%2Fwww.rebol.com%2F
not %: ttp%3A%2F%2Fwww.rebol.com%2F
not %: tp%3A%2F%2Fwww.rebol.com%2F
not %: p%3A%2F%2Fwww.rebol.com%2F
entity: %3A
instert-point after replace: :%2F%2Fwww.rebol.com%2F
not %: F%2Fwww.rebol.com%2F
entity: %2F
instert-point after replace: /www.rebol.com%2F
not %: w.rebol.com%2F
not %: .rebol.com%2F
not %: rebol.com%2F
not %: ebol.com%2F
not %: bol.com%2F
not %: ol.com%2F
not %: l.com%2F
not %: .com%2F
not %: com%2F
not %: om%2F
not %: m%2F
entity: %2F
instert-point after replace: /
not %: /
not %:
http:%2F/www.rebol.com/

So insert-point is perfectly well placed to continue, but it seems that once an 
entity has been evaluated and replaced, 'parse continues at the index where it left 
off *in*the*original*string*. I suppose this is only natural and as it should be, but 
I haven't had enough coffee to find a workaround this morning (except this:
replace/all the_url "%3A" ":"
replace/all the_url "%2F" "/"
replace/all the_url "\" "/"
but I'd prefer my decode-url method to work).
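
The index behaviour described above can apparently be reproduced outside of parse. 
This is only a minimal sketch, assuming that a series reference keeps its numeric 
index when the underlying string is modified; the words s, pos and ins are made up 
for the illustration and are not part of decode-url:

```rebol
; Hypothetical illustration, not part of decode-url.
s: copy "http%3A%2F%2F"

pos: at s 8                 ; where parse stands right after matching "%3A"
ins: at s 5                 ; start of the entity, playing the role of insert-point

ins: remove/part ins 3      ; drop "%3A"
insert ins #":"             ; put the decoded character in its place

print s                     ; the string is now http:%2F%2F
print pos                   ; pos is still at index 8, so this should print F%2F
```

If that assumption holds, pos ends up in the middle of the following entity, which 
matches the "not %: F%2F..." line in the trace above.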

Do I have to rewrite the rule to look only for "%", so that the next two characters 
are untouched?

~H

Præterea censeo Carthaginem esse delendam

