More on IDN homograph attacks, continued from the previous <http://lookout.net/2008/11/29/unicode-attacks-and-test-cases-visual-spoofing-and-the-confusables-part-1/> post.
The Confusables

These types of visual attacks are attributed to what’s known as ‘the confusables’ and have been documented in Unicode Technical Report 36 <http://www.unicode.org/reports/tr36/> and TR39 <http://www.unicode.org/reports/tr39/>. ‘The confusables’ is the name given to characters, within one script or across scripts, that look essentially alike. In the confusables, you have three types of variations possible:

1. Single-script
2. Mixed-script
3. Whole-script

Disclaimer: The following statements are my interpretation of the Unicode Technical Reports, and may be inaccurate. I’ve tried to understand them as best my tiny mind can.

Quick definitions of each follow, with a note. Because I’m simplifying things here, I may not be accurate in my use of the terms script, alphabet, letter, and so on. Linguistics people get it better than I do, but for the rest of us, the term ‘script’ <http://www.unicode.org/glossary/> refers to:

    A collection of letters and other written signs used to represent textual information in one or more writing systems. For example, Russian is written with a subset of the Cyrillic script; Ukrainian is written with a different subset. The Japanese writing system uses several scripts.

Single-script confusables

These occur when letters from the same alphabet, or script, are used to give the same visual appearance. For example, the following two combinations of Latin letters look identical:

* so̷s
* søs

If you take these apart, there’s a big difference. While the letter ‘s’ is the same in each, the ‘o̷’ and ‘ø’ are different. The first uses the Basic Latin ‘o’ followed by a combining diacritical mark named COMBINING SHORT SOLIDUS OVERLAY. To put it a different way, we have two atomic Unicode code points here, which together give the effect of a single character or letter. The second uses the atomic character LATIN SMALL LETTER O WITH STROKE. Let’s take these apart and look at the Unicode code point values for each.
* so̷s == \u0073\u006F\u0337\u0073
* søs == \u0073\u00F8\u0073

As you can see, the first ‘o̷’ is formed from two Unicode code points, \u006F and \u0337. If you copy and paste that word into a text editor that supports Unicode (e.g. Notepad) and press Backspace, you’ll see that the first backspace removes the combining diacritical mark, and the second removes the ‘o’. Continuing with the example, the second ‘ø’ is a single Unicode code point, \u00F8, part of the Latin-1 Supplement Unicode block. At a lower level, because we’re using different code points and bytes to achieve the same visual effect, we have a case of the confusables.

Let’s take a closer look at what qualifies as a single-script confusable for the Latin letter ‘A’, taken from the confusables table at http://unicode.org/reports/tr39/data/confusables.txt.

FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A # {nfkc:65314}
1D400 ; 0041 ; SA # (
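
The so̷s / søs comparison and the table layout quoted above can be checked with a few lines of Python. This is only a rough sketch and not part of the original post: the helper names `code_points` and `parse_confusable_line` are made up for illustration, and the parser assumes the three-field ‘source ; target ; type # comment’ layout shown in the quoted entry.

```python
import unicodedata

# The two spellings from the example, written with explicit escapes so the
# difference survives copy and paste.
COMBINING = "so\u0337s"    # 'o' followed by COMBINING SHORT SOLIDUS OVERLAY
PRECOMPOSED = "s\u00f8s"   # LATIN SMALL LETTER O WITH STROKE


def code_points(text):
    """Return (U+XXXX, character name) pairs for every code point in text."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>")) for ch in text]


def hex_field_to_text(field):
    """Turn a field such as 'FF21' or '0073 0337' into the string it encodes."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_confusable_line(line):
    """Split one table entry into (source, target, type, comment).

    Hypothetical helper: assumes the 'source ; target ; type # comment'
    layout of the entry quoted in the post.
    """
    data, _, comment = line.partition("#")
    source, target, kind = (field.strip() for field in data.split(";"))
    return hex_field_to_text(source), hex_field_to_text(target), kind, comment.strip()


if __name__ == "__main__":
    for word in (COMBINING, PRECOMPOSED):
        print(word, len(word), code_points(word))

    # NFC normalization does not merge the two spellings.
    print(unicodedata.normalize("NFC", COMBINING) == PRECOMPOSED)

    entry = "FF21 ; 0041 ; SA # FULLWIDTH LATIN CAPITAL LETTER A"
    print(parse_confusable_line(entry))
```

Note that NFC normalization leaves the two spellings distinct, because U+00F8 has no canonical decomposition. That is why a lookalike table such as confusables.txt is needed in addition to normalization. The fullwidth ‘Ａ’ (U+FF21), by contrast, does fold to a plain ‘A’ under NFKC.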

