I am remembering a bit more about the algorithm I used many years ago to
emulate Shakespeare's writing style.

One can get text files of many of Shakespeare's works from the web. The
Natural Language Toolkit has several of Shakespeare's works in text form,
for experimentation.

One can produce nonsensical text in the style of Shakespeare in a very
simple way, using J.

First, catenate all of Shakespeare's works into a single file. Read that
text file into a text noun (variable) in J, which we will call 'orgtxt'.

Now use J to pick a word or contiguous series of N words (including
punctuation) randomly from the text in 'orgtxt', and store that string in
a new variable that we will call 'newtxt'. Often this first selection will
be the start of some sentence in the text.
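This seeding step is easy to sketch. I no longer have the J, so here is a
rough Python equivalent; the sample text is just a stand-in for the
concatenated works, and all the names are made up for illustration:

```python
import random

# Tiny stand-in for the concatenated works; in practice 'orgtxt' would be
# the whole combined text file read into one string.
orgtxt = ("To be, or not to be, that is the question: "
          "Whether 'tis nobler in the mind to suffer "
          "the slings and arrows of outrageous fortune,")

words = orgtxt.split()           # split on whitespace; punctuation stays attached
N = 4                            # length of the selected string, in words

start = random.randrange(len(words) - N + 1)
newtxt = words[start:start + N]  # a random contiguous N-word seed
print(" ".join(newtxt))
```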

Now find all the N-word-length strings in the text that start with the last
M words in 'newtxt'. Pick one of these strings at random, discard the
duplicated M words, and append the remainder to the end of the string in
the 'newtxt' variable. (M < N, by the way.)

Then repeat: again find all the N-word-length strings in the text that
start with the (new) last M words in 'newtxt', pick one at random, and
append it, minus the overlap, to 'newtxt'.

Keep doing this until you have enough text to read. The larger M and N are,
the closer the new text will be to Shakespeare's writing style, the more
sense the text will make, and the closer newtxt will be to the original
text.
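The whole loop fits in a few lines. Again a Python sketch rather than the
original J, with a toy text and made-up parameter values (N = 3, M = 2):

```python
import random

random.seed(0)                   # reproducible for this sketch

orgtxt = ("to be or not to be that is the question "
          "whether tis nobler in the mind to suffer "
          "the slings and arrows of outrageous fortune "
          "or to take arms against a sea of troubles")

words = orgtxt.split()
N, M = 3, 2                      # window length and overlap, with M < N

# seed: a random N-word string from the text
start = random.randrange(len(words) - N + 1)
newtxt = words[start:start + N]

for _ in range(20):              # "keep doing this until you have enough text"
    tail = newtxt[-M:]
    # all N-word windows in the text that start with the last M words of newtxt
    matches = [words[i:i + N]
               for i in range(len(words) - N + 1)
               if words[i:i + M] == tail]
    if not matches:
        break                    # tail only occurs at the very end of the text
    pick = random.choice(matches)
    newtxt.extend(pick[M:])      # discard the duplicated M words, append the rest

print(" ".join(newtxt))
```

Because each extension comes from an actual window of the text, every
N-word run in the output also occurs somewhere in the source.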

I am probably forgetting or mis-remembering some key step, but it was
pretty close to this.

You can go to a lot of extra effort encoding words as integers, computing
n-gram probabilities, building Markov chains, etc., but the end result will
still be the same as this simple approach. One of the beauties of J is that
this kind of string searching, matching, and random selection is easy to
do, so more complex approaches are not required.
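The successor-list idea from the discussion below is the N = 2, M = 1 case
of the same thing. A Python sketch (toy text, invented names): keeping
duplicates in each successor list means a uniform random choice reproduces
the transition probabilities without ever computing a matrix.

```python
import random
from collections import defaultdict

random.seed(1)

text = "the cat sat on the mat the cat ran"
words = text.split()

# successor list: for each word, every word that follows it in the text;
# duplicates are kept, so random.choice matches the transition frequencies
successors = defaultdict(list)
for a, b in zip(words, words[1:]):
    successors[a].append(b)

out = [random.choice(words)]
for _ in range(10):
    nxt = successors.get(out[-1])
    if not nxt:
        break                    # the last word never appears mid-text
    out.append(random.choice(nxt))

print(" ".join(out))
```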

Skip




On Mon, Nov 14, 2011 at 12:20 AM, Daniel Lyons <[email protected]> wrote:

>
> On Nov 12, 2011, at 5:12 PM, Kip Murray wrote:
>
> > I see from Skip's discussion and rereading Daniel's post that Daniel may
> > be thinking, choose a word at random from a population list, next choose
> > a word at random from the first word's successor list, next choose a
> > word at random from the second word's successor list, and so on,
> > bypassing finding the probabilities I had put into a transition matrix.
>
> This is correct, actually.
>
> > My discussion assumed someone had indexed the n distinct words in the
> > population with integers 0 to n-1 and found the n^2 probabilities in my
> > n by n transition matrix.  To generate a random sentence: when index k
> > has been chosen, use the probabilities in row k to choose the next
> > index; afterwards convert the chosen indices into words.  Too much
> > trouble?  You decide.  When you know the transition matrix you can
> > calculate other interesting matrices, see my discussion.
>
> I actually found the responses from you and Skip very helpful and
> informative. I made progress on that front which I'll share when I achieve
> something closer to being finished. Everything is just so very different
> here. I'm very grateful for the list and the ample documentation!
>
> Thanks!
>
> —
> Daniel Lyons
>
> ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
