At Sat, 25 Dec 2010 10:23:54 -0500, Neil Van Dyke wrote:
> When doing a regexp on a character input port, what's the best way to
> get string results out instead of bytes results?

Decode the results of `regexp-match' using `bytes->string/utf-8'.

> For example, this is documented behavior, but not actually what I want,
> because I don't want to have to re-encode the bytes as a string (plus, I
> would have to query the input port to find out what its character
> encoding, if I don't know it a priori):

A string regexp on an input port matches via UTF-8 encoding by
definition, so you can always use UTF-8. If some layer of the input has
a different encoding, it's handled by conversion to a UTF-8 encoding at
the port level.

> do "regexp-match-peek-positions" as a peek and then use "read-string"

That doesn't work, because you don't know how many characters to read
given the positions in bytes.

> Is there a better way using regexp operations on input ports?

No. Decoding bytes to a string using UTF-8 has to happen at some level,
so there are not really any efficiency or generality issues in
performing the decoding on the result of `regexp-match'.
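
As a rough illustration of the two points above, here is a minimal
sketch. The helper name `regexp-match/utf-8' is hypothetical (it is not
part of Racket), and the sample input is made up. The first part decodes
each byte-string result as UTF-8; the second shows that the positions
reported by a peek are byte offsets, which is why they cannot be passed
directly to `read-string'.

```racket
#lang racket/base

;; Hypothetical helper (not a Racket built-in): run `regexp-match' on an
;; input port and decode each byte-string result as UTF-8.
;; Groups that did not match stay #f.
(define (regexp-match/utf-8 pattern in)
  (define m (regexp-match pattern in))
  (and m
       (for/list ([b (in-list m)])
         (and b (bytes->string/utf-8 b)))))

;; A string regexp on a port matches against the UTF-8 encoding, so `.'
;; consumes the whole two-byte encoding of the accented character.
(define decoded
  (regexp-match/utf-8 #rx"caf(.)" (open-input-string "un café noir")))

;; Peeked positions count bytes, not characters: "noir" starts at
;; character 8 but at byte 9, because the accented character takes
;; two bytes in UTF-8.
(define positions
  (regexp-match-peek-positions #rx"noir" (open-input-string "un café noir")))
```

Here `decoded' should be a list of strings rather than byte strings, and
`positions' differs from the character offsets by one for everything
after the accented character.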