I'm grateful to have seen this post so that I had some idea what was going
on when read-xml just gave me the exception:
>
> read-xml: parse-error: expected root element - received "\uFEFF"

on a file that had been validated by libxml.

I confess that this is a bit low-level for my expertise, but I wonder if
perhaps higher-level reading functions like read-xml should take care of
this detail automatically.
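
In the meantime, here is a minimal sketch of a workaround, assuming the
`discard-bom` helper from the quoted message below and the standard `xml`
library. The wrapper name `read-xml/skip-bom` is hypothetical, not part of
any library:

```racket
#lang racket/base
(require xml)

;; Consume a UTF-8 byte order mark if the port starts with one;
;; otherwise leave the port untouched.
(define (discard-bom p)
  (void (regexp-try-match #rx"^\uFEFF" p)))

;; Hypothetical wrapper: skip a leading BOM, then parse as usual.
(define (read-xml/skip-bom in)
  (discard-bom in)
  (read-xml in))

;; Usage with an in-memory port whose content begins with the
;; UTF-8 encoding of U+FEFF (bytes EF BB BF).
(define in
  (open-input-bytes
   (bytes-append (bytes #xEF #xBB #xBF) #"<doc>hi</doc>")))
(define doc (read-xml/skip-bom in))
(define root-name (element-name (document-element doc)))
```

Because `regexp-try-match` only consumes input when the pattern matches,
the wrapper should behave like plain `read-xml` on files without a BOM.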

-Philip

On Thu, Jun 22, 2017 at 5:33 PM, Matthew Flatt <mfl...@cs.utah.edu> wrote:

> At Thu, 22 Jun 2017 14:59:35 -0700 (PDT), kay wrote:
> > There are files that start with a BOM. When reading from those files,
> > it seems that none of Racket's I/O APIs know to strip the BOM, making
> > them extremely difficult to work with.
>
> I think that's the normal choice for UTF-8 readers. (Just to make sure,
> I typed "UTF-8 BOM <language>" for a few <language>s and got the same
> answer each time.) Apparently, there's some question of what the
> standard recommends for readers, although it clearly recommends against
> a useless BOM for UTF-8 writers.
>
> Some languages/libraries provide an encoding to strip a BOM from UTF-8,
> and selecting that encoding is analogous to using `reencode-input-port`
> in Racket. There's not a convenient encoding for that purpose in
> `iconv`, though, which is what `reencode-input-port` uses.
>
> So, instead of changing the port's encoding, I recommend just
> discarding a BOM match at the start of the port:
>
>  (define (discard-bom p)
>    (void (regexp-try-match #rx"^\uFEFF" p)))
>
> Used like this:
>
>   (define port (open-input-file #:mode 'text "test.txt"))
>   (discard-bom port)
>   (define line (read-line port))
>
> --
> You received this message because you are subscribed to the Google Groups
> "Racket Users" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to racket-users+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
