FYI, none of these diacritics look like this on my screen (Firefox 12.0, Mac OS 10.7.4). I also looked at the page in Opera and Safari, and both looked OK. It may be something peculiar to your system or to your Firefox settings.
kc

On 5/18/12 2:17 PM, Ben Companjen wrote:
> Hi Tom,
>
> On 18 May 2012 19:24, Tom Morris <[email protected]> wrote:
>> On Fri, May 18, 2012 at 6:51 AM, Ben Companjen <[email protected]>
>> wrote:
>>
>>> I have noticed a couple of times that accents in names seem to be
>>> disconnected from the letters. It may depend on the font and / or
>>> rendering whether you see it, but when I look at
>>> <http://openlibrary.org/authors/OL5264776A/Barcynska_He%CC%81le%CC%80ne_Countess.>,
>>> the accents seem to float over the letters, a little to the right. (I
>>> see the escaped URI does take the letter and the accent apart...)
>>>
>>> Compare: Hélène (copied from OL) and Hélène (typed myself)
>>>
>>> Are these imported from a 'bad' source? This example was imported from
>>> Talis, but the specific record
>>> <http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:1446177421:664>
>>> shows the correct symbols. Or does/did ImportBot create these separate
>>> accents?
>>
>> Both those strings render the same for me, so the rendering issue
>> sounds like a problem with whatever rendering system you use.
>
> That's interesting. I was looking at the strings in Firefox 12 on
> Windows 7. Whereas these show minor differences in distance to the
> letter and shape, some other times the accents are almost completely
> over the character to the right... Attached is how I see this example
> on the Open Library website and in the source.
> I may file a report with Firefox on this, if that is the source of the
> problems. Notepad++ and LibreOffice render the combined characters
> correctly.
>
>>
>> Having said that, there are multiple ways to create accented
>> characters in Unicode. There are single code points which have the
>> base letter and accent pre-composed and there are separate accent code
>> points that can be combined with the base letter from a different code
>> point to create the character.
>>
>> Although both are valid, I think not normalizing is an invitation for
>> confusion. If it's different from the source, perhaps the import bot
>> was normalizing at one point, but was using Normalization Form D
>> (NFD). I think Normalization Form C (NFC) is more natural for most
>> people (and processing systems) and have recommended Freebase adopt
>> it. I'd recommend OpenLibrary do the same.
>> http://unicode.org/reports/tr15/#Norm_Forms
>>
>> Actually, looking at the raw record more closely,
>> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:1446177421:664
>> its character encoding is MARC8 (space character in position 9 of leader)
>> http://www.loc.gov/marc/bibliographic/bdleader.html
>> so something is converting it to Unicode for the web rendering and
>> it's obviously doing it differently than the importer did.
>>
>> MARC8 uses combining diacritics,
>> http://www.loc.gov/marc/specifications/speccharmarc8.html#combine
>> so it's not too surprising that a direct translation would yield
>> the same in Unicode, but I'd suggest that it's better to combine them
>> into their NFC form.
>
> I hope the staff agree :)
>
> Thanks for clearing that up.
>
> Ben
>>
>> Tom
>> _______________________________________________
>> Ol-tech mailing list
>> [email protected]
>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>> To unsubscribe from this mailing list, send email to
>> [email protected]

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet
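To make Tom's NFC suggestion concrete, here is a minimal Python sketch. It assumes Python 3.8+ and uses only the standard `unicodedata` and `urllib.parse` modules. The two sample strings are reconstructed from the escaped author URL in Ben's message. `to_nfc` and `normalize_record` are hypothetical helpers for illustration, not part of any Open Library or ImportBot code.

```python
import unicodedata
from urllib.parse import quote


def to_nfc(text: str) -> str:
    """Return text in Unicode Normalization Form C (precomposed where possible)."""
    return unicodedata.normalize("NFC", text)


def normalize_record(record: dict) -> dict:
    """Hypothetical helper: NFC-normalize every string value in a flat record."""
    return {k: to_nfc(v) if isinstance(v, str) else v for k, v in record.items()}


# The same name built two ways.
decomposed = "He\u0301le\u0300ne"  # base letters followed by combining acute/grave (NFD)
precomposed = "H\u00e9l\u00e8ne"   # single code points U+00E9 and U+00E8 (NFC)

# They can render alike but are different code point sequences.
print(decomposed == precomposed)           # False
print(len(decomposed), len(precomposed))   # 8 6

# Percent-encoding exposes the difference, as in the Open Library author URL.
print(quote(decomposed))    # He%CC%81le%CC%80ne
print(quote(precomposed))   # H%C3%A9l%C3%A8ne

# NFC normalization makes them compare equal.
print(unicodedata.is_normalized("NFC", decomposed))  # False
print(to_nfc(decomposed) == precomposed)             # True
```

Normalizing once, at import time, would mean that stored names, URLs, and search keys all use the same representation regardless of how the source (for example a MARC8 record converted to Unicode) encoded the accents.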
