① Lack of Unicode education in University Computer Science departments. The 
basic functions of programming (storage, sorting, transmission, regex, etc.) 
are taught using ASCII text only. Unicode programming, for example, is 
therefore not something Graduate Computer Science students even think about. 
The vast majority of Computer Science students that I have contact with have 
no knowledge of Unicode. So I consider it no surprise that there are so many 
systems that can only handle ASCII text. If Unicode education were core to the 
Computer Science curriculum, I think the situation would be very different and 
there would be many more Unicode-friendly systems.
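
A minimal sketch of the regex point, assuming Python 3. The names ASCII_NAME, 
UNICODE_NAME and accepts are hypothetical and purely illustrative; they are 
not taken from any real webform or system. The first pattern is the kind of 
ASCII-only check that is typically taught, the second is one possible 
Unicode-aware alternative.

```python
import re
import unicodedata

# Hypothetical ASCII-only name check: a letter A-Z/a-z, then letters,
# apostrophes, spaces or hyphens.
ASCII_NAME = re.compile(r"[A-Za-z][A-Za-z' -]*")

# Hypothetical Unicode-aware check. In Python 3, \w is Unicode-aware for str
# patterns, so [^\W\d_] means "a word character that is not a digit or an
# underscore", which is a letter in any script.
UNICODE_NAME = re.compile(r"[^\W\d_](?:[^\W\d_]|[' -])*")


def accepts(pattern, name):
    """Return True if the whole name matches the given pattern."""
    # Normalise to NFC so that a base letter followed by a combining accent
    # is compared as a single precomposed letter where one exists.
    name = unicodedata.normalize("NFC", name)
    return pattern.fullmatch(name) is not None


for name in ["Andre", "André", "Seán", "Bríd-Áine"]:
    print(name, accepts(ASCII_NAME, name), accepts(UNICODE_NAME, name))
```

With these two patterns, the ASCII-only check accepts only "Andre" from the 
list, while the Unicode-aware check accepts all four names.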

② Lack of interest/initiative/willingness/curiosity, or plain laziness? 
Actually, this one I find a lot more difficult to explain. In digital 
communication, the majority of people write my name as Andre instead of André. 
Why? They see me write my name as André. Does the diacritic not register with 
them? With my students, at the beginning of the academic year, I ask them to 
write my name correctly and add that it is not difficult to do so. Following 
that message, the majority write my name correctly. Prior to that message they 
write my name as Andre. So this general default, a lack of interest in writing 
names correctly, will be reflected in webforms.

André Schappo
________________________________
From: Unicode <unicode-boun...@corp.unicode.org> on behalf of Bríd-Áine Parnell 
via Unicode <unicode@corp.unicode.org>
Sent: 29 January 2025 15:39
To: unicode@corp.unicode.org <unicode@corp.unicode.org>
Subject: Why do webforms often refuse non-ASCII characters?


Hi everyone,

I'm hoping someone can help me out with some information. I'm doing some 
research into the refusal of accents in names (and other multicultural naming 
conventions) in online webforms. For example, in Ireland, there was a campaign 
recently to get the government to mandate acceptance of the fada in Irish 
language names (Seán instead of Sean). The campaign was successful, and the law 
changed in 2022, but it is only a requirement for public bodies; companies do 
not have to comply.

During the campaign, reports about some of the companies, including Bank of 
Ireland and Aer Lingus, were made to the Data Protection Commissioner under 
the right to rectification. They defended themselves by saying that their 
systems couldn't accept fadas in names.

I'm assuming that it's systems on the back end, such as database systems, that 
can't accept the so-called special characters. My question is, why would this 
be, given that Unicode would seem to solve this, and modern databases can use 
Unicode? Does anyone understand what the value is in continuing to retain 
legacy systems that only accept ASCII or some ISO variants? Or is there a 
different problem happening?

Appreciate any information that might shed light on this.

Thanks,

Bríd-Áine Parnell

Doctoral Researcher | Designing Responsible Natural Language Processing

School of Informatics | Edinburgh Futures Institute
