On Tue, Jul 21, 2015 at 2:55 PM Dreiheller, Albrecht wrote:
> My concern is not about the Ogham space, but about the free usage of non-Ascii in programming languages in general.
> Just imagine, when you decide to open a door for public traffic in a busy city with a security check point, you wouldn't consider only how to check a single person; instead, you have to consider how you would check thousands of people within one hour, if you don’t plan to close the door again.

There is no way to check thousands of people in an hour through a door that's a security check point. That's why few places have security check points. That's comparable; it's very hard to check any significant body of code at any speed, so it's a rare issue.

> Therefore, consider a huge software system developed in, let's say, Serbia or Russia using Cyrillic names throughout for classes and variables.
> int ци́фра = чита́ть(пе́речень); return ци́фра;

Then do what you need to do. Transliterate the Serbian characters and see if it works any differently. The language (in any character set) is going to be a large barrier for a lot of audiences, but that's what it is.

> Looking for a deliberate attempt to confuse within this code would be like looking for a needle in a haystack, since every line has non-Ascii in it.

Looking for a deliberate attempt to confuse in code is like looking for a needle in a haystack. If those two lines shown in my last post had been hidden in a million-line kernel, they would have been rather hard to find, particularly if the kernel wasn't warning-clean.

> I used a term "exclusion rules", meaning a ruleset based on the confusables list.

The first step is probably to implement it as a lint-type program. Then discuss it with the compiler writers of the languages you're worried about. As I've said above, I don't see this as a huge concern for most real-life programs, since the attack surface is huge.

> With "black-listed" I meant "known to be unsafe" in some way.

I.e. Javascript. C. C++.

A huge amount of existing and still-in-use code is written in C, whose buffer overruns are a notorious source of security holes. It seems like a much better candidate to be black-listed, if anyone was capable of such.

> The fathers of ALGOL and other early languages racked their brains to avoid ambiguous semantics caused by poor syntax rules.

Published examples of ALGOL 60 are unreadable, and very hard to verify for correctness; a modern reader will generally have to start by reformatting the code, and then replacing GOTOs with loops and ifs, and finding better variable names, if they want to know what's going on. We've increased code clarity hugely, but reading large amounts of code is still hard, hard enough that I see stressing about deliberate deception as a narrow market. This is not something that really needs language support; it can be done in compilers and editors and lint-type programs without that support.
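
As a rough illustration of the lint-type first step mentioned above, here is a minimal Python sketch. It does not use the Unicode confusables list itself. It swaps in a simpler heuristic: flag any identifier that mixes scripts, guessing the script from the first word of each character's Unicode name. The names `IDENTIFIER`, `script_of`, and `mixed_script_identifiers` are hypothetical, and the tokenizer is an assumption that knows nothing about any particular language's grammar (it also splits identifiers at combining marks).

```python
import re
import unicodedata

# Assumed tokenizer: a letter or underscore followed by word characters.
# This is a simplification, not any real language's identifier grammar.
IDENTIFIER = re.compile(r"[^\W\d]\w*")


def script_of(ch):
    """Rough script guess: the first word of the Unicode character name."""
    if ch == "_" or ch.isdigit():
        return None  # treat underscores and digits as script-neutral
    name = unicodedata.name(ch, "")
    return name.split(" ")[0] if name else "UNKNOWN"


def mixed_script_identifiers(source):
    """Yield (line_number, identifier, scripts) for identifiers mixing scripts."""
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ident in IDENTIFIER.findall(line):
            scripts = {s for s in map(script_of, ident) if s is not None}
            if len(scripts) > 1:
                yield lineno, ident, sorted(scripts)


if __name__ == "__main__":
    # The second "total" hides a Cyrillic letter o (U+043E).
    sample = "int total = read(list);\nint t\u043etal = 0;\n"
    for lineno, ident, scripts in mixed_script_identifiers(sample):
        print(f"line {lineno}: {ident!r} mixes {', '.join(scripts)}")
```

An identifier written entirely in Cyrillic passes this check untouched; only the mixed case is reported, which keeps the noise low on code bases that use non-ASCII names throughout.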

