On 27/11/2021 14:41, Stewart C. Russell via talk wrote:
> I have been running a WordPress blog hosted on a Linux-based shared host
> since WordPress became a thing. It has worked quite well from about 2004
> up until a few weeks ago.
> <snip>
> So the phonetic character U+0252 has been mangled into U+00C9 + U+2019.
> Every UTF-8 character seems to be affected this way.
> I wasn't expecting to wake up to a UTF-8 encoding problem this decade.
> There are a raft of "how to fix WP encoding issues" pages that show up
> in web searches, but the newest of them is from 2008 or so.
> I'm pretty much resigned to going through 16+ years of posts fixing
> this, but can mangled UTF-8 be recovered without rekeying?
Probably. If you've been running it for 10+ years, there is almost
certainly some latin1 data hanging around that has since been converted
to UTF-8, or UTF-8 that has been double-encoded somewhere along the line.
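
The U+0252 to U+00C9 + U+2019 mangling reported above is what you get when
the two UTF-8 bytes of U+0252 (C9 92) are each read as a Windows-1252
character. A minimal Python sketch of that round trip, outside the
database, is below. The function names mangle and repair are made up for
illustration, and it assumes exactly one round of mis-decoding; it is not
a drop-in fix for a live WordPress database.

```python
def mangle(text: str) -> str:
    """Simulate the damage: UTF-8 bytes misread as Windows-1252.

    Note: Python's cp1252 codec leaves a few bytes (e.g. 0x81) undefined,
    so this raises UnicodeDecodeError for some inputs.
    """
    return text.encode("utf-8").decode("cp1252")


def repair(text: str) -> str:
    """Undo one round of that damage.

    If the text does not fit the pattern (it cannot be encoded as
    cp1252, or the resulting bytes are not valid UTF-8), return it
    unchanged rather than guessing.
    """
    try:
        return text.encode("cp1252").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


original = "\u0252"
mangled = mangle(original)
print([f"U+{ord(c):04X}" for c in mangled])  # ['U+00C9', 'U+2019']
print(repair(mangled) == original)           # True
```

If a sample of your mangled posts survives repair() intact, the data is
most likely recoverable without rekeying. Work on a backup either way.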
This page has a section on the case where the data and the table
charset don't match, along with a fix:
https://codex.wordpress.org/Converting_Database_Character_Sets#Variant:_3-step_conversion_when_data_and_table_charset_already_don.27t_match
The rest of the page has a lot of useful information as well that might
apply to your situation.
Another thing to try is calling mysqli_set_charset($link, "utf8");
somewhere in your site's code, where $link is the mysqli connection.
Substitute in different character sets until you find the correct one,
and then you'll be able to figure out a way to migrate your tables to
whatever WordPress wants.
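
The same trial-and-error idea can be tried offline on a string copied
out of a mangled post, before touching the site's code. The sketch below
is only an illustration: the candidate list and the find_reversals name
are my own assumptions, and it does not talk to MySQL or call
mysqli_set_charset. It re-encodes the text with each candidate charset
and keeps the ones whose bytes decode cleanly as UTF-8.

```python
# Hypothetical candidate list; extend it with whatever charsets your
# host or old database dumps might have used.
CANDIDATES = ("cp1252", "latin-1", "cp1250")


def find_reversals(mangled: str, candidates=CANDIDATES) -> dict:
    """Return {codec: recovered_text} for every candidate that works.

    A candidate "works" if the mangled text can be encoded with it and
    the resulting bytes are valid UTF-8.
    """
    results = {}
    for codec in candidates:
        try:
            results[codec] = mangled.encode(codec).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue  # this charset cannot explain the damage
    return results


sample = "\u00c9\u2019"  # the mangled pair from the original report
for codec, text in find_reversals(sample).items():
    print(codec, repr(text))
```

More than one charset can succeed on a short sample, so check a longer
piece of text before deciding which one the database connection was
really using.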
Cheers, Jamon