FYI, none of these diacritics look like this on my screen. (Firefox 
12.0, Mac OS 10.7.4). I also looked at it in Opera and Safari, and both 
looked ok. It may be something peculiar either to your system or your 
Firefox settings.

kc

On 5/18/12 2:17 PM, Ben Companjen wrote:
> Hi Tom,
>
> On 18 May 2012 19:24, Tom Morris<[email protected]>  wrote:
>> On Fri, May 18, 2012 at 6:51 AM, Ben Companjen<[email protected]> wrote:
>>
>>> I have noticed a couple of times that accents in names seem to be
>>> disconnected from the letters. It may depend on the font and / or
>>> rendering whether you see it, but when I look at
>>> <http://openlibrary.org/authors/OL5264776A/Barcynska_He%CC%81le%CC%80ne_Countess.>,
>>> the accents seem to float over the letters, a little to the right. (I
>>> see the escaped URI does take the letter and the accent apart...)
>>>
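
For anyone who wants to check what the escaped URI actually contains, here is a
minimal sketch. It only decodes the last path segment of the URL quoted above
and lists the combining marks it finds. The variable names are mine, not
anything from the Open Library code.

```python
from urllib.parse import unquote
import unicodedata

# Last path segment of the author URL quoted above.
slug = "Barcynska_He%CC%81le%CC%80ne_Countess."

# unquote() decodes percent-escapes as UTF-8 by default.
decoded = unquote(slug)

# Collect every combining mark (non-zero canonical combining class).
combining = [
    f"U+{ord(ch):04X} {unicodedata.name(ch)}"
    for ch in decoded
    if unicodedata.combining(ch)
]

for entry in combining:
    print(entry)
```

The accents are stored as separate code points after the plain "e", which
matches the description of the letter and the accent being taken apart.
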
>>> Compare: Hélène (copied from OL) and Hélène (typed myself)
>>>
>>> Are these imported from a 'bad' source? This example was imported from
>>> Talis, but the specific record
>>> <http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:1446177421:664>
>>> shows the correct symbols. Or does/did ImportBot create these separate
>>> accents?
>>
>> Both those strings render the same for me, so the rendering issue
>> sounds like a problem with whatever rendering system you use.
>
> That's interesting. I was looking at the strings in Firefox 12 on
> Windows 7. Whereas these show minor differences in distance to the
> letter and shape, some other times the accents are almost completely
> over the character to the right... Attached is how I see this example
> on the Open Library website and in the source.
> I may file a report with Firefox on this, if that is the source of the
> problems. Notepad++ and LibreOffice render the combined characters
> correctly.
>
>>
>> Having said that, there are multiple ways to create accented
>> characters in Unicode.  There are single code points which have the
>> base letter and accent pre-composed and there are separate accent code
>> points that can be combined with the base letter from a different code
>> point to create the character.
>>
>> Although both are valid, I think not normalizing is an invitation for
>> confusion.  If it's different from the source, perhaps the import bot
>> was normalizing at one point, but was using Normalization Form D
>> (NFD).  I think Normalization Form C (NFC) is more natural for most
>> people (and processing systems) and have recommended Freebase adopt
>> it.  I'd recommend OpenLibrary do the same.
>> http://unicode.org/reports/tr15/#Norm_Forms
>>
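
To make the NFC/NFD difference concrete, here is a small sketch using Python's
standard unicodedata module. The two spellings of "Hélène" and the helper
names to_nfc/to_nfd are my own illustration; I am not claiming this is how
ImportBot or Freebase handle it.

```python
import unicodedata

# "Hélène" written two ways: precomposed code points (U+00E9, U+00E8)
# versus base letters followed by combining accents (U+0301, U+0300).
precomposed = "H\u00e9l\u00e8ne"
decomposed = "He\u0301le\u0300ne"


def to_nfc(text: str) -> str:
    """Compose base letter + combining mark into single code points."""
    return unicodedata.normalize("NFC", text)


def to_nfd(text: str) -> str:
    """Decompose precomposed characters into base letter + combining mark."""
    return unicodedata.normalize("NFD", text)


# The raw strings differ, but they agree once both are normalized.
print(precomposed == decomposed)          # False
print(to_nfc(decomposed) == precomposed)  # True
print(len(precomposed), len(decomposed))  # 6 8
```

Normalizing on import (or at least before comparing or indexing names) would
mean both spellings end up as the same stored string.
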
>> Actually, looking at the raw record more closely,
>> http://openlibrary.org/show-records/talis_openlibrary_contribution/talis-openlibrary-contribution.mrc:1446177421:664
>> its character encoding is MARC8 (space character in position 9 of leader)
>> http://www.loc.gov/marc/bibliographic/bdleader.html
>> so something is converting it to Unicode for the web rendering and
>> it's obviously doing it differently than the importer did.
>>
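
The leader check described above can be sketched as follows. The function name
and the two sample leaders are made up for illustration (they are not taken
from the Talis record); the only thing the code relies on is that the leader
is 24 characters long and that position 9 is blank for MARC-8 and "a" for
UCS/Unicode, per the LoC leader page linked above.

```python
def marc_encoding(leader: str) -> str:
    """Return the character coding scheme declared in a MARC 21 leader."""
    if len(leader) != 24:
        raise ValueError("a MARC 21 leader is exactly 24 characters long")
    flag = leader[9]  # leader position 09: character coding scheme
    if flag == " ":
        return "MARC-8"
    if flag == "a":
        return "UCS/Unicode"
    return "unknown"


# Hypothetical leaders, built piece by piece so the blanks are easy to count.
marc8_leader = "00664" + "nam" + " " + " " + "22" + "00205" + "   " + "4500"
unicode_leader = "00664" + "nam" + " " + "a" + "22" + "00205" + "   " + "4500"

print(marc_encoding(marc8_leader))    # MARC-8
print(marc_encoding(unicode_leader))  # UCS/Unicode
```
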
>> MARC8 uses combining diacritics,
>> http://www.loc.gov/marc/specifications/speccharmarc8.html#combine
>> so it's not too surprising that a direct translation would yield
>> the same in Unicode, but I'd suggest that it's better to combine them
>> into their NFC form.
>
> I hope the staff agree :)
>
> Thanks for clearing that up.
>
> Ben
>>
>> Tom
>> _______________________________________________
>> Ol-tech mailing list
>> [email protected]
>> http://mail.archive.org/cgi-bin/mailman/listinfo/ol-tech
>> To unsubscribe from this mailing list, send email to 
>> [email protected]

-- 
Karen Coyle
[email protected] http://kcoyle.net
ph: 1-510-540-7596
m: 1-510-435-8234
skype: kcoylenet