Re: [go-nuts] how is xml.Decoder.CharsetReader supposed to be held?

2022-05-06 Thread 'Dan Kortschak' via golang-nuts
On Fri, 2022-05-06 at 15:55 -0700, Ian Lance Taylor wrote:
> On Fri, May 6, 2022 at 3:07 AM 'Dan Kortschak' via golang-nuts wrote:
> >
> > On Fri, 2022-05-06 at 11:22 +0200, Diego Joss wrote:
> > > Does this work for you?
> > >
> > > https://go.dev/play/p/xLRawVhcRtF
> > >
> >
> > Thanks. No, the documents are in UTF-16, and the procinst will be too.
> > So it looks more like this https://go.dev/play/p/4IcXNI3yd2M. If I pull
> > the proc inst out of the UTF-16, then I can get it to work;
> > https://go.dev/play/p/kHwkVWtxbNO. But this leads to the issue where at
> > that point I could just decode the whole message and pass it through.
> > So I don't really see the point of using CharsetReader (at least not
> > with UTF-16).
>
> Yeah, that's not the kind of thing that CharsetReader can help with.
> You'll need a plain io.Reader that converts from UTF-16 to UTF-8.
>
> CharsetReader only works if the character set name is available in
> plain ASCII in the first XML definitions, but the data doesn't use
> UTF-8.  It can be used with the kinds of encodings found in the
> subdirectories of https://pkg.go.dev/golang.org/x/text/encoding.
>
> Ian
>

Thanks, Ian.

It might be moot, because it looks like the encoding declaration in the
XML that I have is lying. But in general the solution would need to
sniff the first line and then try to find the encoding declaration.
I suspect that this is what other languages do in this situation.
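
A rough sketch of that kind of sniffing, using only the standard library.
The function names (sniffEncoding, declaredEncoding, detectEncoding) are
made up for illustration, and the byte patterns only cover UTF-8 and UTF-16;
UTF-32 and EBCDIC are ignored. Because the byte-level evidence is checked
first, a declaration that disagrees with the actual bytes is overruled.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strings"
)

// sniffEncoding (hypothetical helper) guesses the encoding from the first
// bytes of a document: a byte order mark, or the byte pattern of "<?" in
// UTF-16. It returns "" when the bytes give no hint.
func sniffEncoding(b []byte) string {
	switch {
	case bytes.HasPrefix(b, []byte{0xEF, 0xBB, 0xBF}):
		return "utf-8"
	case bytes.HasPrefix(b, []byte{0xFF, 0xFE}):
		return "utf-16le"
	case bytes.HasPrefix(b, []byte{0xFE, 0xFF}):
		return "utf-16be"
	case bytes.HasPrefix(b, []byte{'<', 0, '?', 0}):
		return "utf-16le"
	case bytes.HasPrefix(b, []byte{0, '<', 0, '?'}):
		return "utf-16be"
	}
	return ""
}

// declRE matches the encoding pseudo-attribute of a leading XML declaration.
// It only works when the declaration itself is readable as ASCII.
var declRE = regexp.MustCompile(`^<\?xml[^>]*\bencoding\s*=\s*["']([^"']+)["']`)

// declaredEncoding (hypothetical helper) returns the lower-cased encoding
// named in the XML declaration, or "" if there is none.
func declaredEncoding(b []byte) string {
	m := declRE.FindSubmatch(b)
	if m == nil {
		return ""
	}
	return strings.ToLower(string(m[1]))
}

// detectEncoding trusts the byte-level sniff first, then the declaration,
// and finally falls back to the XML default of UTF-8.
func detectEncoding(b []byte) string {
	if enc := sniffEncoding(b); enc != "" {
		return enc
	}
	if enc := declaredEncoding(b); enc != "" {
		return enc
	}
	return "utf-8"
}

func main() {
	fmt.Println(detectEncoding([]byte{0xFF, 0xFE, '<', 0, '?', 0}))
	fmt.Println(detectEncoding([]byte(`<?xml version="1.0" encoding="ISO-8859-1"?><a/>`)))
	fmt.Println(detectEncoding([]byte(`<a/>`)))
}
```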

Dan


-- 
You received this message because you are subscribed to the Google Groups 
"golang-nuts" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to golang-nuts+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/3277068af53ec326a6b3163e4f6d0242b96aa81a.camel%40kortschak.io.


Re: [go-nuts] how is xml.Decoder.CharsetReader supposed to be held?

2022-05-06 Thread Ian Lance Taylor
On Fri, May 6, 2022 at 3:07 AM 'Dan Kortschak' via golang-nuts wrote:
>
> On Fri, 2022-05-06 at 11:22 +0200, Diego Joss wrote:
> > Does this work for you?
> >
> > https://go.dev/play/p/xLRawVhcRtF
> >
>
> Thanks. No, the documents are in UTF-16, and the procinst will be too.
> So it looks more like this https://go.dev/play/p/4IcXNI3yd2M. If I pull
> the proc inst out of the UTF-16, then I can get it to work;
> https://go.dev/play/p/kHwkVWtxbNO. But this leads to the issue where at
> that point I could just decode the whole message and pass it through.
> So I don't really see the point of using CharsetReader (at least not
> with UTF-16).

Yeah, that's not the kind of thing that CharsetReader can help with.
You'll need a plain io.Reader that converts from UTF-16 to UTF-8.

CharsetReader only works if the character set name is available in
plain ASCII in the first XML definitions, but the data doesn't use
UTF-8.  It can be used with the kinds of encodings found in the
subdirectories of https://pkg.go.dev/golang.org/x/text/encoding.
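
For illustration, here is a minimal sketch of such a converting reader,
using only the standard library. The helper names (utf16ToUTF8,
encodeUTF16LE, decodeUTF16) are invented for this example, the whole input
is read into memory rather than streamed, and big-endian is assumed when
there is no byte order mark. Note that the converted text still declares
encoding="UTF-16", so the decoder will still call CharsetReader; the sketch
hands the input back unchanged in that case.

```go
package main

import (
	"bytes"
	"encoding/binary"
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
	"unicode/utf16"
)

// utf16ToUTF8 (hypothetical helper) reads a whole UTF-16 document from r,
// uses a leading byte order mark to pick the byte order, and returns a
// reader over the same text as UTF-8.
func utf16ToUTF8(r io.Reader) (io.Reader, error) {
	raw, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	if len(raw)%2 != 0 {
		return nil, errors.New("odd number of bytes in UTF-16 input")
	}
	var order binary.ByteOrder = binary.BigEndian // assumed default without a BOM
	if len(raw) >= 2 {
		switch {
		case raw[0] == 0xFF && raw[1] == 0xFE:
			order, raw = binary.LittleEndian, raw[2:]
		case raw[0] == 0xFE && raw[1] == 0xFF:
			raw = raw[2:]
		}
	}
	units := make([]uint16, len(raw)/2)
	for i := range units {
		units[i] = order.Uint16(raw[2*i:])
	}
	return strings.NewReader(string(utf16.Decode(units))), nil
}

// encodeUTF16LE builds sample input: s as little-endian UTF-16 with a BOM.
func encodeUTF16LE(s string) []byte {
	out := []byte{0xFF, 0xFE}
	for _, u := range utf16.Encode([]rune(s)) {
		out = append(out, byte(u), byte(u>>8))
	}
	return out
}

type doc struct {
	Name string `xml:"name"`
}

// decodeUTF16 converts the document to UTF-8 up front and then decodes it.
func decodeUTF16(data []byte) (string, error) {
	r, err := utf16ToUTF8(bytes.NewReader(data))
	if err != nil {
		return "", err
	}
	dec := xml.NewDecoder(r)
	// The text is already UTF-8 here, so pass it through untouched when the
	// declaration says UTF-16.
	dec.CharsetReader = func(label string, input io.Reader) (io.Reader, error) {
		if strings.EqualFold(label, "utf-16") {
			return input, nil
		}
		return nil, fmt.Errorf("unsupported charset %q", label)
	}
	var d doc
	if err := dec.Decode(&d); err != nil {
		return "", err
	}
	return d.Name, nil
}

func main() {
	in := encodeUTF16LE(`<?xml version="1.0" encoding="UTF-16"?><doc><name>naïve</name></doc>`)
	name, err := decodeUTF16(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```

In real code the conversion would more likely come from
golang.org/x/text/encoding/unicode wrapped in a transform reader, which
streams instead of buffering the whole document.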

Ian

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAOyqgcWUsmnRpC_a_HausGTAasgcT1862F%3Dcae04cs%3DTP2uRVA%40mail.gmail.com.


Re: [go-nuts] how is xml.Decoder.CharsetReader supposed to be held?

2022-05-06 Thread 'Dan Kortschak' via golang-nuts
On Fri, 2022-05-06 at 11:22 +0200, Diego Joss wrote:
> Does this work for you?
>
> https://go.dev/play/p/xLRawVhcRtF
>

Thanks. No, the documents are in UTF-16, and the procinst will be too.
So it looks more like this https://go.dev/play/p/4IcXNI3yd2M. If I pull
the proc inst out of the UTF-16, then I can get it to work;
https://go.dev/play/p/kHwkVWtxbNO. But this leads to the issue where at
that point I could just decode the whole message and pass it through.
So I don't really see the point of using CharsetReader (at least not
with UTF-16).

Dan


-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/b3554811bb056f6741c4b2f484abd2fb7e801f6f.camel%40kortschak.io.


Re: [go-nuts] how is xml.Decoder.CharsetReader supposed to be held?

2022-05-06 Thread Diego Joss
Does this work for you?

https://go.dev/play/p/xLRawVhcRtF

-- Diego

-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/CAGjxhKmCJWczpUhtgyWsn0dJ1_E9PcTUBOZW112bGsVQyMZq0g%40mail.gmail.com.


[go-nuts] how is xml.Decoder.CharsetReader supposed to be held?

2022-05-05 Thread 'Dan Kortschak' via golang-nuts
I'm in the situation of needing to provide cross-platform xml decoding.
So I thought that xml.Decoder.CharsetReader would be the right approach
in conjunction with golang.org/x/text/encoding. However, the xml decoder
needs to be able to understand the text in order to be able to read the
proc inst to get the charset out to hand to CharsetReader.

So it seems that we need to get the proc inst out from the io.Reader
input, deduce the charset and convert it to UTF-8 and then reinject it
into the io.Reader so that the charset can then be passed to
CharsetReader. This can't be the right way to do things.

I'm wondering what is the use of CharsetReader if it can't be used to
determine the charset without already having determined the charset.
How should it be used?
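
For reference, a minimal sketch of the case where CharsetReader does work:
an ASCII-compatible encoding, so the decoder can read the declaration before
any conversion happens. ISO-8859-1 is used here only because it can be
converted with the standard library alone; latin1ToUTF8, charsetReader and
decodeName are names invented for this example.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// latin1ToUTF8 (hypothetical helper) reads the remaining ISO-8859-1 bytes
// from r and returns a reader over the UTF-8 equivalent. Each Latin-1 byte
// maps to the Unicode code point with the same value. It buffers everything
// in memory, so it is not suitable for large inputs.
func latin1ToUTF8(r io.Reader) (io.Reader, error) {
	raw, err := io.ReadAll(r)
	if err != nil {
		return nil, err
	}
	runes := make([]rune, len(raw))
	for i, b := range raw {
		runes[i] = rune(b)
	}
	return strings.NewReader(string(runes)), nil
}

// charsetReader has the signature expected by xml.Decoder.CharsetReader.
// The decoder calls it with the label from the XML declaration and a reader
// positioned just after that declaration.
func charsetReader(label string, input io.Reader) (io.Reader, error) {
	switch strings.ToLower(label) {
	case "iso-8859-1", "latin1":
		return latin1ToUTF8(input)
	}
	return nil, fmt.Errorf("unsupported charset %q", label)
}

type doc struct {
	Name string `xml:"name"`
}

// decodeName decodes a document and returns the text of its <name> element.
func decodeName(data []byte) (string, error) {
	dec := xml.NewDecoder(bytes.NewReader(data))
	dec.CharsetReader = charsetReader
	var d doc
	if err := dec.Decode(&d); err != nil {
		return "", err
	}
	return d.Name, nil
}

func main() {
	// \xe9 is "é" in ISO-8859-1; on its own it is not valid UTF-8.
	in := []byte("<?xml version=\"1.0\" encoding=\"ISO-8859-1\"?><doc><name>caf\xe9</name></doc>")
	name, err := decodeName(in)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(name)
}
```

With UTF-16 input the decoder never gets as far as calling charsetReader,
because it cannot parse the declaration in the first place.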

Dan


-- 
To view this discussion on the web visit 
https://groups.google.com/d/msgid/golang-nuts/b682b574be51ed8cf1f4c1f02b4170e0560b2f6b.camel%40kortschak.io.