On 27/11/2021 14:41, Stewart C. Russell via talk wrote:
I have been running a WordPress blog hosted on a Linux-based shared host since WordPress became a thing. It has worked quite well from about 2004 up until a few weeks ago.
<snip>
So the phonetic character U+0252 has been mangled into U+00C9 + U+2019. Every UTF-8 character seems to be affected this way.

I wasn't expecting to wake up to a UTF-8 encoding problem this decade. There are a raft of "how to fix WP encoding issues" pages that show up in web searches, but the newest of them is from 2008 or so.

I'm pretty much resigned to going through 16+ years of posts fixing this, but can mangled UTF-8 be recovered without rekeying?

Probably. If you've been running it for 10+ years, there is almost certainly some latin1 data hanging around that has since been converted to UTF-8, or UTF-8 that has been double-encoded somewhere along the line.
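
To illustrate why recovery is usually possible, here is a small Python sketch (not anything WordPress or MySQL runs) that reproduces the mangling from the post and then reverses it. It assumes the UTF-8 bytes were misread as Windows-1252, which is what MySQL calls "latin1"; the variable names are made up for the example.

```python
# Sketch: reproduce the mangling described in the post, then undo it.
# Assumption: UTF-8 bytes were misread as Windows-1252 (MySQL's "latin1").
original = "\u0252"  # the phonetic character from the post

# UTF-8 encodes U+0252 as the two bytes C9 92; read as cp1252 they
# become two separate characters.
mangled = original.encode("utf-8").decode("cp1252")

# Reversing the misreading: back to the raw bytes, then decode as UTF-8.
repaired = mangled.encode("cp1252").decode("utf-8")

print([f"U+{ord(c):04X}" for c in mangled])  # ['U+00C9', 'U+2019']
print(repaired == original)                  # True
```

One caveat: Python's cp1252 codec leaves bytes 0x81, 0x8D, 0x8F, 0x90 and 0x9D undefined, so characters whose UTF-8 form contains one of those bytes may not survive this exact round trip and would need separate handling.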

This page has a section on the likely charset mismatch and a fix: https://codex.wordpress.org/Converting_Database_Character_Sets#Variant:_3-step_conversion_when_data_and_table_charset_already_don.27t_match

The rest of the page has a lot of useful information as well that might apply to your situation.

Another thing to try is calling mysqli_set_charset($link, "utf8"); somewhere in your site's code, where $link is your mysqli connection. Substitute in different character sets until you find the one that makes the text display correctly, and then you'll be able to figure out a way to migrate your tables to whatever WordPress wants.
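
The same trial-and-error idea can be tried offline on a sample of mangled text before touching the live site. The following is only a rough Python sketch of that, not the mysqli approach itself; the function name guess_repairs and the list of candidate charsets are assumptions picked for illustration.

```python
def guess_repairs(mangled, candidates=("cp1252", "latin-1", "cp1250", "mac_roman")):
    """Return {charset: repaired_text} for each candidate that round-trips.

    For every candidate charset, turn the mangled text back into bytes
    and try to read those bytes as UTF-8. Candidates that cannot encode
    the text, or that yield invalid UTF-8, are skipped.
    """
    results = {}
    for charset in candidates:
        try:
            results[charset] = mangled.encode(charset).decode("utf-8")
        except (UnicodeEncodeError, UnicodeDecodeError):
            continue
    return results


# The mangled pair from the original post: U+00C9 followed by U+2019.
sample = "\u00c9\u2019"
for charset, text in guess_repairs(sample).items():
    print(charset, repr(text))
```

More than one charset may appear to work on a short sample, so it is worth checking a few longer posts by eye before settling on one.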

Cheers, Jamon

---
Post to this mailing list talk@gtalug.org
Unsubscribe from this mailing list https://gtalug.org/mailman/listinfo/talk
