On Feb 29, 2012 1:58 PM, "Sai" <[email protected]> wrote:

>  For a 6 word sentence, with 8 (3b) templates, we need ~12b (4k word)
>  dictionaries for each word category.

1. You need 2^8 = 256 templates, not just 8, to reach 6*12 + 8 = 80 bits;
8 templates only contribute 3 bits.
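The arithmetic above can be sketched quickly (parameter names are mine, not
from the original proposal):

```python
# Entropy budget for the passphrase scheme discussed above:
# 6 word slots drawn from 4096-word dictionaries, target 80 bits total.
import math

words_per_sentence = 6
dictionary_size = 4096                          # 2**12 -> 12 bits per word
bits_per_word = math.log2(dictionary_size)      # 12.0

target_bits = 80
word_bits = words_per_sentence * bits_per_word  # 72 bits from the words
template_bits = target_bits - word_bits         # 8 bits left for the template
templates_needed = 2 ** template_bits           # 256 templates, not 8

print(word_bits, template_bits, templates_needed)  # 72.0 8.0 256.0
```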

2. Having toyed with this idea in the past, let me warn that forming a
4096-word dictionary of memorable, non-colliding words for each word category
is going to be very difficult.  Too many words are semantically similar,
phonetically similar, or just unfamiliar.  You might find Google Ngrams a
good resource for common words; I provide a complete sorted list here:

http://kenta.blogspot.com/2012/02/lefoezyy-some-notes-on-google-books.html

Another way to go about it might be to first catalogue semantic categories
(colors, animals, etc.), then list the most common (yet dissimilar) members
of each category.  An attempt at 64 words is here:

http://kenta.blogspot.com/2011/10/xpmqawkv-common-words.html
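One rough way to mechanize the first approach: walk a frequency-sorted word
list (such as the Ngrams list above) and greedily keep words that aren't too
close to anything already kept.  This sketch only checks orthographic
similarity via difflib; real phonetic filtering (Soundex, Metaphone) would
need more work.  The candidate words and thresholds are made-up illustrations:

```python
# Greedy low-collision dictionary builder; assumed input is sorted by
# descending frequency.  Thresholds here are arbitrary examples.
import difflib

def build_dictionary(candidates, size, min_len=4, max_len=8, similarity=0.75):
    """Keep frequent words that aren't too similar to ones kept so far."""
    chosen = []
    for word in candidates:
        if not (min_len <= len(word) <= max_len):
            continue
        # Reject words orthographically close to an already-chosen word.
        if any(difflib.SequenceMatcher(None, word, w).ratio() >= similarity
               for w in chosen):
            continue
        chosen.append(word)
        if len(chosen) == size:
            break
    return chosen

# Toy usage: "cable" collides with "table" and gets dropped.
print(build_dictionary(["table", "cable", "river", "mountain", "tablet"], 3))
# -> ['table', 'river', 'mountain']
```

The inner `any` makes this quadratic in dictionary size, which is tolerable
for 4096 words but worth remembering for larger lists.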

I'd propose that the "right" way to do this is not just sentences, but
entire semantically consistent stories, written in rhyming verse, with
entropy of perhaps only a few bits per sentence.  (Prehistoric oral
tradition does prove we can memorize such poems.)  However, synthesizing
these seems extremely difficult, essentially an AI problem.

3. I presume people are familiar with Bubble Babble?  It doesn't solve all
the problems, but does make bit strings seem less "dense".

Ken
_______________________________________________
tor-dev mailing list
[email protected]
https://lists.torproject.org/cgi-bin/mailman/listinfo/tor-dev
