Re: Textual analysis

2003-12-16 Thread coderman
Adam Shostack wrote:

...
| It's not obvious to me how you'd change your writing style to defeat these 
| textual analysis schemes--would it really be as simple as changing the 
| average length of sentences and getting rid of the big words, or would 
| there still be ways to determine your identity from that text?

So, the question boils down to economics.  There's how much you need
to communicate, how much someone is willing to spend to tag you, and
how good their proof needs to be.  I suspect that for most purposes,
proof does not need to be very strong in relation to your need to
communicate.

An interesting ad hoc test subject might be Eleusis/ZWITTERION from
alt.drugs.chemistry (a.d.c.). I've wanted to see someone apply these
techniques to his writing ever since following his posts and being
amused/surprised myself.
http://groups.google.com/groups?safe=off&q=Eleusis+group%3Aalt.drugs.chemistry
http://groups.google.com/groups?safe=off&q=ZWITTERION+group%3Aalt.drugs.chemistry
Strangely enough, the powers that be showed little interest in his
electronic trail ...
[ http://www.rhodium.ws/chemistry/eleusis/memoirs.html ]



Re: Textual analysis

2003-12-15 Thread Adam Shostack
On Sun, Dec 14, 2003 at 10:36:02AM -0500, John Kelsey wrote:
| Textual analysis correctly identified the author of _Primary Colors_, 
| though that was from a pretty small field of people with the right level of 
| inside knowledge.  Does anyone know whether there have been real randomized 
| trials of any of the textual analysis software or techniques?  E.g., is 

Not as far as I know, and I spent a bit of time reading through both
Author Unknown, by Don Foster (who named Joe Klein), and Analysing for
Authorship, by Jill Farringdon.

Foster is an English professor. He reads the work under analysis, then
works by the potential authors. His technique is best described as
intuitive, but the human brain is very good at making linkages.

Analysing for Authorship, from the University of Wales Press, didn't
strike me as any better. It uses a technique called CUSUM (cumulative
sum), but the methodology and graphs (as I recall) vary from text to
text. Neither I nor Alice, who read the book for ZKS (Zero-Knowledge
Systems) to see whether we could build this stuff into a product, was
very impressed by it.
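For readers who haven't seen CUSUM charts, here is a minimal sketch of
the general idea as I understand it, not Farringdon's exact procedure.
Assumptions: sentences are split naively on ., ! and ?; the "habit" is
counted as two- and three-letter words per sentence (one commonly cited
habit test); the habit chart is rescaled to the sentence-length chart
by the ratio of standard deviations; and divergence() is a made-up
single-number score. The claim behind the method is that the two charts
track each other for a single author and separate where another hand
has written.

```python
import re
from statistics import mean, pstdev


def split_sentences(text):
    # Naive splitter: break after ., ! or ? followed by whitespace.
    return [p for p in re.split(r"(?<=[.!?])\s+", text.strip()) if p]


def words(sentence):
    return re.findall(r"[A-Za-z']+", sentence)


def is_habit_word(w):
    # Assumed habit-word test: two- and three-letter words.
    return 2 <= len(w) <= 3


def cusum(values):
    # Running total of each value's deviation from the series mean.
    # By construction the final entry is (approximately) zero.
    m = mean(values)
    total, out = 0.0, []
    for v in values:
        total += v - m
        out.append(total)
    return out


def qsum_charts(text):
    sents = split_sentences(text)
    lengths = [len(words(s)) for s in sents]
    habits = [sum(is_habit_word(w) for w in words(s)) for s in sents]
    sl, hw = cusum(lengths), cusum(habits)
    # Assumed scaling rule: ratio of population standard deviations,
    # so the two charts can be overlaid on one axis.
    s_hw = pstdev(habits)
    scale = pstdev(lengths) / s_hw if s_hw else 0.0
    return sl, [x * scale for x in hw]


def divergence(text):
    # Hypothetical score: largest gap between the overlaid charts.
    sl, hw = qsum_charts(text)
    return max(abs(a - b) for a, b in zip(sl, hw))
```

Calling qsum_charts() on a passage gives the two series you would plot;
the complaint above is that deciding what counts as "tracking" is left
to eyeballing the graphs.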

| It's not obvious to me how you'd change your writing style to defeat these 
| textual analysis schemes--would it really be as simple as changing the 
| average length of sentences and getting rid of the big words, or would 
| there still be ways to determine your identity from that text?  I'm 
| thinking especially of long discussions of technical topics--if I wrote a 
| five page essay on what to look at when trying to cryptanalyze a new block 
| cipher, I think it would be hard to keep readers who knew me from having a 
| pretty good guess about the author, even if I tried changing terms, being 
| more mathematical and less conversational, etc.  (Though this is more of a 
| problem with humans familiar with my writing style, rather than with 
| automated analysis.)
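To make the question concrete, here is a rough sketch, under stated
assumptions, of the difference between the surface features John
mentions and the function-word rates many attribution methods lean on.
The FUNCTION_WORDS list, the 7-letter "big word" cutoff, and the
nearest-profile attribute() helper are all hypothetical choices for
illustration. Note that distance() ignores sentence length and word
length entirely, so shortening sentences and dropping big words would
not by themselves move a text's profile.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately short function-word list for illustration.
FUNCTION_WORDS = ["the", "of", "and", "a", "to", "in", "is", "that", "it",
                  "for", "as", "with", "but", "on", "not", "this", "which", "so"]


def tokens(text):
    return re.findall(r"[a-z']+", text.lower())


def profile(text):
    toks = tokens(text)
    sents = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    n = len(toks) or 1
    counts = Counter(toks)
    return {
        # The easy-to-change surface features:
        "avg_sentence_len": len(toks) / (len(sents) or 1),
        "long_word_ratio": sum(len(t) >= 7 for t in toks) / n,  # assumed cutoff
        # Per-token rates of each function word:
        "function_words": [counts[w] / n for w in FUNCTION_WORDS],
    }


def distance(p, q):
    # Euclidean distance over function-word rates only.
    return math.dist(p["function_words"], q["function_words"])


def attribute(unknown, candidates):
    # Closed-set attribution: pick the candidate whose sample is nearest.
    target = profile(unknown)
    return min(candidates,
               key=lambda name: distance(target, profile(candidates[name])))
```

A closed candidate set, as with Primary Colors, is what makes the
nearest-profile step meaningful; with an open field there is no
guarantee the true author is among the candidates at all.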

So, the question boils down to economics.  There's how much you need
to communicate, how much someone is willing to spend to tag you, and
how good their proof needs to be.  I suspect that for most purposes,
proof does not need to be very strong in relation to your need to
communicate. Think of Tricky Dick deciding you're Deep Throat, or
Saddam deciding you're the guy who betrayed him: neither will wait
for a rigorous attribution.

Adam



-- 
It is seldom that liberty of any kind is lost all at once.
   -Hume