Hello Michael,

Thank you for your feedback.
In WebKit1 we used to do it this way:
> frame = webkit_web_view_get_main_frame (webview);
> source = webkit_web_frame_get_data_source (frame);
> encoding = webkit_web_data_source_get_encoding (source);

'encoding' would be something like "UTF8", so we could handle the string in the relevant encoding.

For WebKit2, I cannot find a way to get the encoding. I only get the bytes:
guchar * gu_data = webkit_web_resource_get_data_finish(...)

How did WebKit1's 'webkit_web_data_source_get_encoding()' function retrieve the encoding, and is there a way to do the same in WebKit2?

Thank you.

On Thu, May 31, 2018 at 8:03 PM, Michael Catanzaro <mcatanz...@igalia.com> wrote:

> On Thu, May 31, 2018 at 5:05 PM, Leo Ufimtsev <leoni...@redhat.com> wrote:
>
>> Hello guys,
>>
>> The following function:
>> guchar * webkit_web_resource_get_data_finish(..)
>>
>> Sometimes returns utf8 and sometimes utf16. Is there a way to tell them
>> apart?
>>
>> Thank you.
>>
>
> Hm, good question. I don't know the answer, but here are some thoughts
> anyway:
>
> We use guchar instead of gchar to indicate that it's a byte array, not a
> string, so it's not expected to be UTF-8. In fact, it could be any
> arbitrary encoding, not just UTF-16. I've seen more esoteric encodings
> before, particularly for CJKV websites. Of course, it might not be an HTML
> resource at all, it could be an image or an executable file or anything.
>
> Assuming you know it is an HTML doc, then I think you want to parse the
> charset from the meta tag. Of course, that's a bit difficult because you do
> not know the encoding you should be using to parse it until after you have
> somehow successfully parsed it. I don't know how you would do it, but
> clearly WebKit knows how, somewhere. In Epiphany, our use is limited to
> saving resources on disk, which then get parsed by other applications when
> you open them, which is why we've never needed to deal with this problem.
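
On the chicken-and-egg problem with the meta tag: in any ASCII-compatible encoding the bytes that spell out "charset=..." are plain ASCII, so it should be possible to look for the label in the raw bytes before decoding anything. Below is a minimal sketch in plain C, assuming the resource is HTML and the declaration appears within the first 1024 bytes. The name find_meta_charset and the 1024-byte limit are my own choices for illustration, not WebKit or GLib API, and I am not claiming this is how WebKit does it. It will also match "charset=" outside a meta tag, and it cannot work for UTF-16 data, where every ASCII character is two bytes wide.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper, not part of WebKit or GLib.
 * Scans at most the first 1024 bytes of `data` for an ASCII "charset="
 * declaration and copies the label that follows into `out`.
 * Returns 1 if a non-empty label was found, 0 otherwise. */
int find_meta_charset(const unsigned char *data, size_t len,
                      char *out, size_t out_size)
{
    static const char key[] = "charset";
    const size_t key_len = sizeof key - 1;
    const size_t limit = len < 1024 ? len : 1024;

    if (out_size == 0)
        return 0;

    for (size_t i = 0; i + key_len <= limit; i++) {
        /* Case-insensitive match of "charset" at position i. */
        size_t k = 0;
        while (k < key_len && tolower(data[i + k]) == key[k])
            k++;
        if (k < key_len)
            continue;

        /* Expect optional whitespace, '=', optional whitespace, optional quote. */
        size_t p = i + key_len;
        while (p < limit && isspace(data[p]))
            p++;
        if (p >= limit || data[p] != '=')
            continue;
        p++;
        while (p < limit && isspace(data[p]))
            p++;
        if (p < limit && (data[p] == '"' || data[p] == '\''))
            p++;

        /* Copy the label: letters, digits, '-' and '_' only. */
        size_t n = 0;
        while (p < limit && n + 1 < out_size &&
               (isalnum(data[p]) || data[p] == '-' || data[p] == '_'))
            out[n++] = (char)data[p++];
        out[n] = '\0';
        if (n > 0)
            return 1;
    }
    return 0;
}
```

The idea would be to call this on the guchar buffer and its length, and hand the resulting label to whatever converter is in use. If it returns 0, the encoding has to come from somewhere else.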
>
> For a website loaded via HTTP, the encoding could also have been set by an
> HTTP header. There's really nothing you can do in that case, as you don't
> have access to that.
>
> I think Firefox uses an encoding detector. WebKit does not, but it's one
> option. ICU can do this, as can uchardet. Problem is, they are
> probabilistic and do not work well for some important encodings (e.g.
> GB18030). But that might work well enough for your needs.
>
> Michael
>

--
Leo Ufimtsev, Software Engineer, Red Hat
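
P.S. For the narrower utf8-versus-utf16 question, one cheap first check is the byte order mark at the start of the buffer. Here is a rough sketch in plain C. The name sniff_bom is made up for illustration, and I have not verified whether webkit_web_resource_get_data_finish() hands back the BOM untouched, so treat that as an assumption. A missing BOM proves nothing, in which case a detector such as ICU or uchardet would still be needed.

```c
#include <stddef.h>

/* Hypothetical helper, not WebKit API.
 * Returns the encoding name implied by a leading byte order mark,
 * or NULL if the buffer does not start with one of the BOMs below.
 * Note: a UTF-32LE BOM (FF FE 00 00) is not distinguished here and
 * would be reported as UTF-16LE. */
const char *sniff_bom(const unsigned char *data, size_t len)
{
    if (len >= 3 && data[0] == 0xEF && data[1] == 0xBB && data[2] == 0xBF)
        return "UTF-8";
    if (len >= 2 && data[0] == 0xFE && data[1] == 0xFF)
        return "UTF-16BE";
    if (len >= 2 && data[0] == 0xFF && data[1] == 0xFE)
        return "UTF-16LE";
    return NULL; /* no BOM: encoding still unknown */
}
```

This would be called with the guchar buffer and the length reported by the finish call. A NULL result means falling back to the meta tag or to a detector.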
_______________________________________________
webkit-gtk mailing list
webkit-gtk@lists.webkit.org
https://lists.webkit.org/mailman/listinfo/webkit-gtk