Re:Textual analysis

2003-12-16 Thread Major Variola (ret)
At 10:36 AM 12/14/03 -0500, John Kelsey wrote:
 It's not obvious to me how you'd change your writing style to defeat these
 textual analysis schemes--would it really be as simple as changing the
 average length of sentences and getting rid of the big words, or would
 there still be ways to determine your identity from that text?

It's like steganalysis.  It's an arms race between measuring your own
signatures vs. what the Adversary can measure.  If sentence length
is a metric known to you, you can write filters that warn you.
Similarly for the Adversary.  You end up in an arms race
over metrics: who has the more sensitive ones that the other
does not control for?
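A rough illustration of the kind of self-monitoring filter meant here, as a minimal Python sketch. It assumes just two metrics (average words per sentence and the share of "big" words), a crude regex sentence splitter, and arbitrary thresholds (`LONG_WORD_LEN`, `tolerance`); all names are hypothetical, and real stylometry tracks many more features than this.

```python
import re
import statistics

# Assumed cutoff (in letters) for what counts as a "big word"; tune to taste.
LONG_WORD_LEN = 9


def sentences(text):
    # Crude splitter: break wherever ., ! or ? is followed by whitespace.
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def words(text):
    return re.findall(r"[A-Za-z']+", text)


def metrics(text):
    """Return the two style metrics this sketch tracks for a piece of text."""
    lengths = [len(words(s)) for s in sentences(text)]
    ws = words(text)
    return {
        "avg_sentence_len": statistics.mean(lengths) if lengths else 0.0,
        "long_word_ratio": (
            sum(len(w) >= LONG_WORD_LEN for w in ws) / len(ws) if ws else 0.0
        ),
    }


def warn(draft, baseline, tolerance=0.25):
    """List the metrics of `draft` that are still within `tolerance`
    (relative difference) of your own `baseline`, i.e. that still look like you.
    A zero baseline is compared on an absolute scale instead."""
    current = metrics(draft)
    flagged = []
    for name, base in baseline.items():
        scale = abs(base) or 1.0
        if abs(current[name] - base) / scale <= tolerance:
            flagged.append(name)
    return flagged
```

Usage would be something like `baseline = metrics(old_posts)` over a pile of your own archived messages, then `warn(draft, baseline)` before sending; anything it returns is a knob you have not turned yet. The Adversary can run the same numbers, plus ones you are not tracking.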



Re:Textual analysis

2003-12-16 Thread Morlock Elloi
 It's like steganalysis.  It's an arms race between measuring your own
 signatures vs. what the Adversary can measure.  If sentence length
 is a metric known to you, you can write filters that warn you.
 Similarly for the Adversary.  You end up in an arms race
 over metrics: who has the more sensitive ones that the other
 does not control for?

But unlike stego, where the issue is faking the noise, personal fingerprints
can be removed from the message more reliably. You just need the right gloves.

One way is to use automated translators. They all have an internal language
and modules that translate to and from it. The internal language is far more
restricted than the natural one, so it doesn't leak many aspects of the
linguistic fingerprint. Going to the internal form is lossy compression.
There is no way to recreate the original.

The simplest method is an English-to-English translator. A better method, with
thicker gloves, is to go through several from/to modules for different
languages. In commercial engines the meaning starts to suffer after 3-4 steps,
but just before that happens the word ordering and usage become completely
skewed.
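The English-to-English trick can be sketched with a toy pivot. This is not how any commercial engine works internally; the `CONCEPTS` table, `to_internal`, `from_internal` and `chain` are all hypothetical stand-ins. The only point is that once many surface words collapse to one internal token, differently styled inputs come out identical and the original wording cannot be recovered. `chain` is where real from/to modules for several languages would be plugged in.

```python
# Toy, hypothetical "internal language": each concept has one canonical
# English rendering, and every listed synonym collapses to it.
CONCEPTS = {
    "big": {"big", "large", "huge", "enormous", "sizable"},
    "use": {"use", "utilize", "employ", "leverage"},
    "show": {"show", "demonstrate", "illustrate", "indicate"},
}
TO_CONCEPT = {w: c for c, syns in CONCEPTS.items() for w in syns}


def to_internal(text):
    # Lossy step: case, punctuation and synonym choice are all discarded.
    tokens = [w.strip(".,;:!?\"'()").lower() for w in text.split()]
    return [TO_CONCEPT.get(t, t) for t in tokens if t]


def from_internal(concepts):
    # Render the internal form back out in one fixed, canonical style.
    return " ".join(concepts).capitalize() + "."


def english_to_english(text):
    return from_internal(to_internal(text))


def chain(text, hops):
    """Pass text through a sequence of translation steps (e.g. en->de,
    de->fr, fr->en). Each hop is a hypothetical callable str -> str."""
    for hop in hops:
        text = hop(text)
    return text
```

Two drafts with different word choices ("utilize enormous ... demonstrate" vs. "employ huge ... indicate") both come out as "We use big tables as the data show." Punctuation and capitalization are thrown away too, which is part of the lossiness.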

Of course, you have to buy the translator rather than use the online
Google/Babelfish services. It's the small things that get you ...






Re:Textual analysis

2003-12-15 Thread John Kelsey
At 09:44 AM 12/13/03 -0600, Harmon Seaver wrote:
..
  And what are my supposed three-space paragraph lead-ins? The concept of
textual analysis to prove ID has always amused me. A competent writer can
easily change writing styles from moment to moment. I well recall a university
English lit prof almost accusing me of plagiarism when I wrote a piece
mimicking Faulkner and doing so well enough that the prof actually started
looking thru his works trying to find it.

Textual analysis correctly identified the author of _Primary Colors_, 
though that was from a pretty small field of people with the right level of 
inside knowledge.  Does anyone know whether there have been real randomized 
trials of any of the textual analysis software or techniques?  E.g., is 
this an identification technique like DNA, or is it an identification 
technique like retrieving repressed memories under hypnosis (or, 
equivalently, consulting a Ouija board)?

It's not obvious to me how you'd change your writing style to defeat these 
textual analysis schemes--would it really be as simple as changing the 
average length of sentences and getting rid of the big words, or would 
there still be ways to determine your identity from that text?  I'm 
thinking especially of long discussions of technical topics--if I wrote a 
five-page essay on what to look at when trying to cryptanalyze a new block 
cipher, I think it would be hard to keep readers who knew me from having a 
pretty good guess about the author, even if I tried changing terms, being 
more mathematical and less conversational, etc.  (Though this is more of a 
problem with humans familiar with my writing style, rather than with 
automated analysis.)

Harmon Seaver
CyberShamanix
http://www.cybershamanix.com
--John Kelsey, [EMAIL PROTECTED]
PGP: FA48 3237 9AD5 30AC EEDD  BBC8 2A80 6948 4CAA F259