On Fri, 25 May 2007, Josiah Carlson wrote:
> Apples and oranges to be sure, but there are no other statistics that
> anyone else is able to offer about use of non-ascii identifiers in Java,
> Javascript, C#, etc.

Let's see what we can find.  I made several attempts to search for
non-ASCII identifiers using google.com/codesearch and here's what I got.


Java or JavaScript (total: about 1480000 files found with "lang:java .")
------------------------------------------------------------------------

1. lang:java ^[^"]*[^\s!-~].*=           (assignment to non-ASCII name)

    2 files with a UTF-8 BOM at the beginning;
    1 file with non-ASCII in comments;
    5 files with non-ASCII in strings;
    2 files with non-ASCII elsewhere in source code:

    1. moin-1.5.8/wiki/htdocs/applets/moinFCKplugins/.../lang/en.js
       UTF-8 BOM in middle of file.

    2. SMSkyline.wdgt/fr.lproj/localizedStrings.js
       UTF-16 BOM at beginning of a UTF-8 file. (!)

2. lang:java ^[^"]*[^\s!-~]\w*\.         (method call on non-ASCII name)

    2 files with a UTF-8 BOM at the beginning;
    13 files with non-ASCII in comments;
    5 files with non-ASCII in strings;
    3 files with non-ASCII elsewhere in source code:

    1. struts-2.0.6/src/core/src/.../Editor2Plugin/FindReplaceDialog.js
       UTF-8 BOM in middle of file.

    2. moin-1.5.8/wiki/htdocs/applets/moinFCKplugins/.../lang/en.js
       UTF-8 BOM in middle of file.

    3. chickenfoot/chickenscratch/tests/findTest.js
       Non-breaking spaces embedded in indentation.

3. lang:java ^\s*class.*[^\s!-~]         (class declaration)

    2 files with non-ASCII in strings;
    no other hits.

4. lang:javascript ^\s*function.*[^\s!-~]   (function declaration)

    1 non-JavaScript file;
    9 files with non-ASCII in comments;
    1 file with non-ASCII in strings;
    1 file with non-ASCII elsewhere in source code:

    1. google_hacks_3E_code/hack_61/zoom-google.user.js
       Thin spaces (U+2009) embedded in code.


C# (total: about 266000 files found with "lang:c# .")
-----------------------------------------------------

5. lang:c# ^[^"]*[^\s!-~].*=             (assignment to non-ASCII name)

    5 non-C# files;
    6 files with a UTF-8 BOM at the beginning;
    9 files with non-ASCII in comments;
    7 files with non-ASCII elsewhere in source code:

    1. blam-1.8.4pre2/src/PreferencesDialog.cs
       Non-breaking spaces in the middle of the line.

    2. BildschirmTennis2/BildschirmTennis2/Program1.cs
       Identifier containing non-ASCII.

    3. Ukazkova reseni CS - Prakticke priklady/.../Exp_2_03/Class2.cs
       Identifier containing non-ASCII.

    4. Rule.cs
       Identifier containing non-ASCII.

    5. SharpIntroduction/ComplexExample/Zv?????tko.cs
       Identifier containing non-ASCII.

    6. WitherwynWebDist/Witherwyn/Map.cs
       "Times" character in expression, probably a typo.

    7. PDFsharp/XGraphicsLab/MainForm.cs
       Identifier containing non-ASCII.

6. lang:c# ^[^"]*[^\s!-~]\w*\(           (function call on non-ASCII name)

    4 files with non-ASCII in comments;
    6 files with non-ASCII elsewhere in source code:

    1. BildschirmTennis2/BildschirmTennis2/Program1.cs
       Identifier containing non-ASCII.

    2. SharpIntroduction/ComplexExample/Program.cs
       Identifier containing non-ASCII.

    3. Ukazkova reseni CS - Prakticke priklady/.../Exp_2_03/Class1.cs
       Identifier containing non-ASCII.

    4. ActiveRecord/Generator/.../RelationshipBuilderTestCase.cs
       Identifier containing non-ASCII, almost certainly a typo.

    5. Sample1/Sample1/Program.cs
       Identifier containing non-ASCII.

    6. Kap11/03/TEXT.CS
       Identifier containing non-ASCII.

7. lang:c# ^\s*class.*[^\s!-~]           (class declaration)

    1 hit:

    1. Kap06/03/Kalen.cs
       Identifier containing non-ASCII.


In summary, that means out of around 1.7 million Java, JavaScript, and
C# files that are indexed by Google Code Search, the only use of
non-ASCII identifiers I could find was in 12 C# files, and one of those
12 occurrences is almost certainly a mistake.


-- ?!ng
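
For anyone who wants to repeat this kind of check on a local source tree, here is a rough sketch of query 1 (assignment to a non-ASCII name) using Python's re module. It is only an approximation: Code Search's regex engine is not Python's, and the names NON_ASCII_ASSIGNMENT and find_suspect_lines are made up for illustration. The re.ASCII flag is an assumption chosen so that \s covers only ASCII whitespace, which makes non-breaking and thin spaces show up as hits, as they did in the results above.

```python
import re

# "!" (0x21) through "~" (0x7E) is the printable ASCII range, so
# [^\s!-~] matches any character that is neither ASCII whitespace
# nor printable ASCII. The leading [^"]* keeps the match in front of
# the first double quote, which skips most string literals.
# Hypothetical name; re.ASCII is an assumption (see the note above).
NON_ASCII_ASSIGNMENT = re.compile(r'^[^"]*[^\s!-~].*=', re.ASCII)


def find_suspect_lines(source_text):
    """Return (line_number, line) pairs that match the assignment pattern."""
    hits = []
    for number, line in enumerate(source_text.split("\n"), start=1):
        if NON_ASCII_ASSIGNMENT.search(line):
            hits.append((number, line))
    return hits


if __name__ == "__main__":
    sample = (
        "gr\u00f6\u00dfe = 3\n"           # non-ASCII identifier: reported
        "size = 4\n"                      # plain ASCII: ignored
        'name = "gr\u00f6\u00dfe"\n'      # non-ASCII only inside a string: ignored
        "x\u00a0= 1\n"                    # non-breaking space before "=": reported
    )
    for number, line in find_suspect_lines(sample):
        print(number, repr(line))
```

As with the searches above, every hit still has to be inspected by hand: the pattern cannot tell an identifier from a comment, a stray BOM, or an odd space character.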