Hello,

I'm running a (simple) web scraper on a page written in iso-8859-1 (declared in the source using a meta tag). The page contains links like this:

    "http://www.ufrj.br/editais.php?tp=Acad%EAmicos&no=Cursos&idtp=4"

At one point the code calls:

    (combine-url/relative current-url resource)

where current-url is a Racket URL and resource is the aforementioned string. I then get the error:

    bytes->string/utf-8: string is not a well-formed UTF-8 encoding: #"Acad\352micos"

This seems to be a problem with uri-decode:

    (uri-decode resource)
    bytes->string/utf-8: string is not a well-formed UTF-8 encoding: #"http://www.ufrj.br/editais.php?tp=Acad\352micos&no=Cursos&idtp=4"

I looked at the source code of uri-decode and saw that, after decoding the percent-encoded string, it calls bytes->string/utf-8, which expects the resulting bytes to be UTF-8 encoded. There is no way to tell uri-decode to use a different encoding.

I copied the relevant portion of code from uri-codec-unit.rkt in collects/net and verified that changing bytes->string/utf-8 to bytes->string/latin-1 makes it work, but that feels like cheating :)

AFAICT Chrome and Firefox handle the URL "http://www.ufrj.br/editais.php?tp=Acad%EAmicos&no=Cursos&idtp=4" as well as its UTF-8 percent-encoded equivalent "http://www.ufrj.br/editais.php?tp=Acad%C3%AAmicos&no=Cursos&idtp=4", with the difference that the second is displayed as "http://www.ufrj.br/editais.php?tp=Acadêmicos&no=Cursos&idtp=4" (but when copied and pasted it is still %C3%AA instead of ê).

How could I make uri-decode understand an encoding other than UTF-8?

Thanks,
Rodolfo Carvalho
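P.S. For reference, here is a minimal sketch of the kind of workaround described above: decode the percent-escapes to raw bytes, then interpret those bytes as Latin-1 instead of UTF-8. The name uri-decode/latin-1 is hypothetical and is not part of net/uri-codec. The sketch only handles %XX escapes and assumes the input string contains nothing outside the Latin-1 range.

```racket
#lang racket/base

;; Hypothetical helper (not part of net/uri-codec): percent-decode a
;; string and read the resulting bytes as Latin-1 rather than UTF-8.
;; Assumes every character in str is in the Latin-1 range.
(define (uri-decode/latin-1 str)
  (bytes->string/latin-1
   (regexp-replace* #rx#"%([0-9a-fA-F][0-9a-fA-F])"
                    (string->bytes/latin-1 str)
                    ;; Replace each %XX escape with the single byte it names.
                    (lambda (whole hex)
                      (bytes (string->number (bytes->string/latin-1 hex) 16))))))

(uri-decode/latin-1
 "http://www.ufrj.br/editais.php?tp=Acad%EAmicos&no=Cursos&idtp=4")
```

This covers only the direct call to uri-decode. The call to combine-url/relative goes through the library's own decoding, so it would presumably still fail on the original string.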

