I see from Skip's discussion and from rereading Daniel's post that Daniel may have this in mind: choose a word at random from a population list, then choose a word at random from the first word's successor list, then one from the second word's successor list, and so on. That bypasses finding the probabilities I had put into a transition matrix.
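As a rough illustration of that successor-list walk, here is a minimal sketch in Python rather than J. The names build_successors, generate and transition_row are made up for this example, and it makes two assumptions not stated in the thread: the population list is the set of distinct words, and each successor list keeps repeats, so a uniform choice from the list reproduces the observed frequencies. The last function shows that a successor list already implies the corresponding row of transition probabilities.

```python
import random
from collections import defaultdict

def build_successors(words):
    """Map each word to the list of words that follow it (repeats kept)."""
    successors = defaultdict(list)
    for current, following in zip(words, words[1:]):
        successors[current].append(following)
    return dict(successors)

def generate(population, successors, length, rng=random):
    """Pick a start word from the population, then walk the successor lists."""
    word = rng.choice(population)
    sentence = [word]
    # Stop early if the current word was never followed by anything.
    while len(sentence) < length and word in successors:
        word = rng.choice(successors[word])
        sentence.append(word)
    return sentence

def transition_row(word, successors, vocabulary):
    """Transition probabilities out of `word`, as implied by its successor list."""
    followers = successors.get(word, [])
    if not followers:
        return [0.0] * len(vocabulary)
    return [followers.count(v) / len(followers) for v in vocabulary]

text = "the cat sat on the mat and the cat ran".split()
successors = build_successors(text)
vocabulary = sorted(set(text))  # assumed population: the distinct words

print(" ".join(generate(vocabulary, successors, 6)))
print(dict(zip(vocabulary, transition_row("the", successors, vocabulary))))
```

Choosing uniformly from a successor list with repeats gives the same distribution as choosing an index with the probabilities in that word's matrix row, so the two approaches should generate the same kind of sentences; the matrix is needed only if you want to calculate with it.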
My discussion assumed someone had indexed the n distinct words in the population with integers 0 to n-1 and found the n^2 probabilities in my n by n transition matrix. To generate a random sentence: when index k has been chosen, use the probabilities in row k to choose the next index; afterwards convert the chosen indices into words. Too much trouble? You decide. When you know the transition matrix you can calculate other interesting matrices; see my discussion.

Kip Murray

On 11/12/2011 4:49 PM, Skip Cave wrote:
> Take a look at Roger Hui's responses in my discussion thread entitled:
> "The Travel Itinerary Problem"
> <http://www.jsoftware.com/pipermail/general/2006-March/026640.html>
> This thread was posted in the General forum March 26, 2006. My problem
> was couched as calculating the probabilities of tourists taking
> specific tour routes through various cities given lots of collected
> trip data. However, the same calculations could be used to calculate
> the probabilities of word n-grams in a text. Those probabilities can
> then be used to generate texts which, while mostly nonsensical, can
> seem to emulate a particular author's writing style.
>
> Scientific American had an article a few years back about a program to
> analyze the word-frequencies and n-gram probabilities of most of
> Shakespeare's works. Then those probabilities were used to generate new
> texts that, while total gibberish, still had the flavor of Shakespeare.
> The Natural Language Toolkit (http://www.nltk.org/) has tools for that
> purpose, and actually provides several of Shakespeare's texts to test
> them on.
>
> I also wrote an APL program to do that same thing back then, but I'm
> afraid I lost that program years ago. Roger's functions in the
> previously-mentioned thread do the bulk of the work, however.
>
> Skip
>
> On Sat, Nov 12, 2011 at 12:43 AM, Daniel Lyons <[email protected]> wrote:
>
>> Hi,
>>
>> I'm playing with J, just trying to get into the J mindset. A small
>> program I wrote recently in another language was a small Markov chain
>> sentence generator. This program had essentially two parts: parsing
>> some input into some kind of internal representation, and generating
>> sentences randomly using that. I'm trying to figure out how one would
>> go about doing this in J.
>>
>> My first thought is to make a bunch of nested boxed arrays, so given a
>> sample input string it would produce a pair of arrays, one with the
>> unique set of words, and another with boxed arrays of successor words.
>> But stumbling onto the lab that discusses a dice game it seems like it
>> might be more natural to write this in terms of some kind of
>> transition table.
>>
>> I am not looking for a concrete solution so much as clues as to how a
>> J programmer would decompose this problem, and what techniques would
>> be involved in solving it.
>>
>> Thanks,
>>
>> --
>> Daniel Lyons
>>
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm
>>
>
>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
