[Ph4nt0m] [zz]Unicode IDN homograph attacks and test cases - Visual Spoofing and the Single Script Confusables part 2

大风 Wed, 03 Dec 2008 18:23:25 -0800

More on IDN homograph attacks, continued from the previous post 
<http://lookout.net/2008/11/29/unicode-attacks-and-test-cases-visual-spoofing-and-the-confusables-part-1/>.


The Confusables 

These types of visual attacks are attributed to what’s known as ‘the 
confusables’ and have been documented in Unicode’s Technical Report 36 
<http://www.unicode.org/reports/tr36/> and TR39 
<http://www.unicode.org/reports/tr39/>. ‘The confusables’ is the name given to 
characters, or sequences of characters, that essentially look alike. Within the 
confusables, three types of variation are possible:

1.      Single-script
2.      Mixed-script
3.      Whole-script

Disclaimer: The following statements are my interpretation of the Unicode 
Technical Reports, and may be inaccurate. I’ve tried to understand them as best 
my tiny mind can.

Quick definitions of each, with a note. Because I’m simplifying things here, I 
may be imprecise in my use of the terms script, alphabet, letter, and so on. 
Linguists understand this better than I do, but for the rest of us, the term 
‘script’ <http://www.unicode.org/glossary/> refers to:

A collection of letters and other written signs used to represent textual 
information in one or more writing systems. For example, Russian is written 
with a subset of the Cyrillic script; Ukrainian is written with a different 
subset. The Japanese writing system uses several scripts.

Single-script confusables

These occur when letters from the same alphabet, or script, are combined to 
produce the same visual appearance as other letters from that script. For 
example, the following two sequences of Latin letters look identical:

*       so̷s
*       søs

If you take these apart, there’s a big difference. While the letter ‘s’ is the 
same in each, the ‘o̷’ and ‘ø’ are different. The first uses the Basic Latin 
‘o’ followed by a combining diacritical mark named COMBINING SHORT SOLIDUS 
OVERLAY. To put it a different way, we have two separate Unicode code points 
here, which together give the effect of a single character or letter. The 
second uses the single precomposed character LATIN SMALL LETTER O WITH STROKE. 
Let’s take these apart and look at the Unicode code point values for each.

*       so̷s == \u0073\u006F\u0337\u0073
*       søs == \u0073\u00F8\u0073
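The same breakdown can be reproduced with Python’s standard `unicodedata` module. This is a small sketch assuming Python 3; `describe` is a hypothetical helper name, not part of any library.

```python
import unicodedata

# The two look-alike words from the example above.
combining = "so\u0337s"   # s, o, COMBINING SHORT SOLIDUS OVERLAY, s
precomposed = "s\u00f8s"  # s, LATIN SMALL LETTER O WITH STROKE, s


def describe(text):
    """Return (U+XXXX, Unicode character name) pairs for each code point."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in text]


for word in (combining, precomposed):
    print(len(word), "code points")
    for code, name in describe(word):
        print(" ", code, name)
```

Running it lists four code points for the first word and three for the second, with the character names quoted in the text.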

As you can see, the first ‘o̷’ is formed from two Unicode code points, \u006F 
and \u0337. If you copy and paste that word into a text editor that supports 
Unicode (e.g. Notepad), place the cursor right after the ‘o̷’ and press 
Backspace, you’ll see the first Backspace removes the combining diacritical 
mark, and the second removes the ‘o’. Continuing with the example, the ‘ø’ in 
the second word is a single Unicode code point, \u00F8, from the Latin-1 
Supplement block. At a lower level, because we’re using different code points 
and bytes to achieve the same visual effect, we have a case of the confusables.
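The byte-level difference, and the fact that Unicode normalization alone does not merge the two spellings, can be checked with a short sketch. It assumes Python 3.8 or later (for `bytes.hex` with a separator) and uses only the standard library; the variable names are illustrative.

```python
import unicodedata

combining = "so\u0337s"   # 'o' followed by U+0337 COMBINING SHORT SOLIDUS OVERLAY
precomposed = "s\u00f8s"  # U+00F8 LATIN SMALL LETTER O WITH STROKE

# Different code points mean different bytes once the text is encoded.
utf8_combining = combining.encode("utf-8")
utf8_precomposed = precomposed.encode("utf-8")
print(utf8_combining.hex(" "))    # 73 6f cc b7 73
print(utf8_precomposed.hex(" "))  # 73 c3 b8 73

# U+00F8 has no decomposition and o + U+0337 has no precomposed form,
# so no normalization form maps one spelling onto the other.
same_after = {
    form: unicodedata.normalize(form, combining)
    == unicodedata.normalize(form, precomposed)
    for form in ("NFC", "NFD", "NFKC", "NFKD")
}
print(same_after)  # {'NFC': False, 'NFD': False, 'NFKC': False, 'NFKD': False}
```

Because a plain NFC or NFKC comparison does not flag this pair, detecting it needs separate look-alike data, which is the role of the confusables tables in TR39.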

Let’s take a closer look at what qualifies as a single-script confusable for 
the Latin letter ‘A’, taken from the confusables table at 
http://unicode.org/reports/tr39/data/confusables.txt.

FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → LATIN CAPITAL LETTER A # {nfkc:65314}
1D400 ; 0041 ; SA # (
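Each entry above has the shape `source ; target ; type # comment`, with code points written in hex. TR39 describes using this kind of data for detection by computing a ‘skeleton’ for each string (convert to NFD, replace each character with its target, convert to NFD again) and treating two strings as confusable when their skeletons match. The Python sketch below is a simplified illustration of that idea, not Unicode’s reference code: it assumes the three-field layout shown in this excerpt (type codes such as SA may differ in other versions of the file), loads only the two sample entries rather than the full confusables.txt, and its helper names (`hex_to_text`, `parse_line`, `build_table`, `skeleton`) are hypothetical.

```python
import unicodedata
from typing import Dict, Iterable, NamedTuple, Optional


class Confusable(NamedTuple):
    source: str  # the look-alike character(s)
    target: str  # the prototype it maps to
    kind: str    # type code from the third field, e.g. "SA" in the excerpt


def hex_to_text(field: str) -> str:
    """Turn space-separated hex code points such as '0041' into a string."""
    return "".join(chr(int(cp, 16)) for cp in field.split())


def parse_line(line: str) -> Optional[Confusable]:
    """Parse a 'SOURCE ; TARGET ; TYPE # comment' line; None if no data."""
    data = line.split("#", 1)[0].strip()
    if not data:
        return None
    fields = [f.strip() for f in data.split(";")]
    if len(fields) < 3:
        raise ValueError(f"unexpected line: {line!r}")
    return Confusable(hex_to_text(fields[0]), hex_to_text(fields[1]), fields[2])


def build_table(lines: Iterable[str]) -> Dict[str, str]:
    """Map each source character to its target (prototype) string."""
    table = {}
    for line in lines:
        entry = parse_line(line)
        if entry is not None:
            table[entry.source] = entry.target
    return table


def skeleton(text: str, table: Dict[str, str]) -> str:
    """NFD, replace each character by its prototype, then NFD again."""
    nfd = unicodedata.normalize("NFD", text)
    mapped = "".join(table.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


# The two entries quoted above; the second is truncated in this excerpt,
# but its data fields (everything before '#') are complete.
sample = [
    "FF21 ; 0041 ; SA # ( Ａ → A ) FULLWIDTH LATIN CAPITAL LETTER A → "
    "LATIN CAPITAL LETTER A # {nfkc:65314}",
    "1D400 ; 0041 ; SA # (",
]

table = build_table(sample)
print(skeleton("\uff21BC", table) == skeleton("ABC", table))  # True
```

The skeleton is only a comparison key; the strings shown to the user are left unchanged.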

 

 


 

 

