Hello, I'm doing this exercise and would appreciate any comments. I want to create a machine that scans a text and splits it into elements, storing them in a hash table. Then we connect these hash keys probabilistically, so that starting from one word we can jump to other words at random according to those probabilities. This way we can generate a sentence that is sufficiently independent of our own bias. The point is, if these probabilities make sense (for example, if they are real statistics from real high-quality text, i.e. from famous writers), I hope the machine can generate one or two sentences that are entertaining.
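To make the "real statistics" idea concrete, here is a rough sketch of how such numbers could be collected: count how often each word is followed by each other word, then pick a successor in proportion to those counts. The names successor-counts and pick-successor and the toy sentence are made up for illustration, and this is only one possible representation, not the one my code uses.

```racket
#lang racket
;; Hypothetical helper: for every word, count how often each other word
;; directly follows it. Result shape: word -> (hash next-word count).
(define (successor-counts words)
  (for/fold ([table (hash)])
            ([w (in-list words)]
             [nxt (in-list (if (empty? words) '() (rest words)))])
    (hash-update table w
                 (lambda (inner) (hash-update inner nxt add1 0))
                 (hash))))

;; Hypothetical helper: choose a successor with probability
;; proportional to its count. Assumes `inner` is not empty.
(define (pick-successor inner)
  (define total (apply + (hash-values inner)))
  (let loop ([r (random total)] [pairs (hash->list inner)])
    (define count (cdr (first pairs)))
    (if (< r count)
        (car (first pairs))
        (loop (- r count) (rest pairs)))))

;; Toy input, not the real sample text.
(define counts
  (successor-counts (string-split "the cat saw the dog and the cat ran")))
```

With this toy input, "the" is followed by "cat" twice and by "dog" once, so (pick-successor (hash-ref counts "the")) should return "cat" about two times out of three. The last word of a text may have no entry at all, which a real generator would need to handle.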
    (require 2htdp/batch-io)
    (require racket/hash)
    (define input (read-file "sample.txt"))
    (define data (remove-duplicates (string-split input)))

To keep it easy at first, I don't use the frequency of the elements (mostly words) just yet. I just make a hash table that takes each element as a key; the associated value is a dispatching rule. To begin with, the dispatching rule is simple: the current key is connected to two other keys, each with a probability. For example, the element "run" is connected to "instead." with probability .48 and to "helmets" with probability .08:

    (hash "run" (hash 0.48 "instead." 0.08 "helmets"))

    (struct state (word dispatch) #:transparent)

    (define (random-member lst)
      (list-ref lst (random (length lst))))

    (define (make-sample-machine lst)
      (define l (length lst))
      (define (make-transition)
        (hash (round-2 (random)) (random-member lst)
              (round-2 (random)) (random-member lst)))
      (foldl (lambda (word h) (hash-union h (hash word (make-transition))))
             (hash)
             lst))

    (define (round-n x n)
      (/ (round (* x (expt 10 n))) (expt 10 n)))

    (define (round-2 x) (round-n x 2))

    (define m (make-sample-machine data))

The data of the machine looks like this:

    '#hash(("spread" . #hash((0.9 . "right") (0.21 . "deep")))
           ("instead" . #hash((0.64 . "then") (0.19 . "dark")))
           ("through" . #hash((0.3 . "meadow") (0.95 . "white,")))
           ("their" . #hash((0.56 . "instead") (0.98 . "valley,")))

Now I try to generate a sentence of 10 words. I guess it is some kind of loop, but when I write the function, it is super slow. The idea is that we pick the first word at random; this first word has an associated dispatching rule, and we use the probabilities in this rule to pick the next word at random. Is it because I use too much randomisation that the function is super slow?

    (define (accumulate lst)
      (define total (apply + lst))
      (let absolute->relative ([elements lst] [so-far #i0.0])
        (cond
          [(empty? elements) '()]
          [else
           (define nxt (+ so-far (round-2 (/ (first elements) total))))
           (cons nxt (absolute->relative (rest elements) nxt))])))

    (define (randomise accumulated-lst)
      (define r (random))
      (for/last ([p (in-naturals)]
                 [% (in-list accumulated-lst)]
                 #:final (< r %))
        p))

    (define (generate-text m n)
      (define l (hash-count m))
      (define r (random l))
      (match-define (cons first-word dispatch) (hash-iterate-pair m r))
      (cons first-word
            (let generate ([count-down n] [next-batch dispatch])
              (cond
                [(zero? count-down) '()]
                [else
                 (define proba (hash-keys dispatch))
                 (define next-word (hash-iterate-key m (randomise (accumulate proba))))
                 (define next-dispatch (hash-ref m next-word))
                 (cons next-word (generate (- n 1) next-dispatch))]))))

This exercise is at the beginner level, so I guess someone must have done it before. Does anyone have experience with this? For instance, is a hash table a good way to represent the data? And how should I handle it when the sample text (and so the hash table) becomes very large?

Here is the sample text:

They had marched more than thirty kilometres since dawn, along the white, hot road where occasional thickets of trees threw a moment of shade, then out into the glare again. On either hand, the valley, wide and shallow, glittered with heat; dark green patches of rye, pale young corn, fallow and meadow and black pine woods spread in a dull, hot diagram under a glistening sky. But right in front the mountains ranged across, pale blue and very still, snow gleaming gently out of the deep atmosphere. And towards the mountains, on and on, the regiment marched between the rye fields and the meadows, between the scraggy fruit trees set regularly on either side the high road. The burnished, dark green rye threw off a suffocating heat, the mountains drew gradually nearer and more distinct.
While the feet of the soldiers grew hotter, sweat ran through their hair under their helmets, and their knapsacks could burn no more in contact with their shoulders, but seemed instead to give off a cold, prickly sensation.

Thank you, and have a good day (if you read until this point).
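P.S. One thing that stands out on rereading generate-text: the recursive call passes (- n 1) rather than (- count-down 1), so for n greater than 1 the counter never reaches zero and the loop never ends, which may well be what looks like slowness. It also reads dispatch where next-batch was probably meant, and it uses the random index to look into m rather than into the current word's dispatch table. Below is a minimal sketch that avoids those three things. It runs on a made-up three-word machine; toy-machine, pick-next and generate-words are names invented for this sketch, and the stored numbers are treated as relative weights because they need not sum to 1.

```racket
#lang racket
;; Made-up machine in the same shape as mine: word -> (hash weight word).
;; Every successor is itself a key, so the walk can always continue.
(define toy-machine
  (hash "run"     (hash 0.48 "instead" 0.08 "helmets")
        "instead" (hash 0.64 "run"     0.19 "helmets")
        "helmets" (hash 0.3  "instead" 0.95 "run")))

;; Pick a successor from one dispatch table, treating the keys as weights.
(define (pick-next dispatch)
  (define weights (hash-keys dispatch))
  (define r (* (random) (apply + weights)))
  (let loop ([ws weights] [so-far 0.0])
    (define upto (+ so-far (first ws)))
    ;; Fall back to the last entry to guard against rounding error.
    (if (or (< r upto) (empty? (rest ws)))
        (hash-ref dispatch (first ws))
        (loop (rest ws) upto))))

;; Random start word, then n steps; the counter is the loop variable
;; itself, so it does reach zero.
(define (generate-words machine n)
  (define start (list-ref (hash-keys machine) (random (hash-count machine))))
  (let generate ([word start] [count-down n])
    (if (zero? count-down)
        (list word)
        (cons word
              (generate (pick-next (hash-ref machine word))
                        (- count-down 1))))))
```

(generate-words toy-machine 10) returns a list of 11 words: the start word plus 10 steps. Note that using the probability as the hash key means two successors with the same rounded probability would collide, so keying on the successor word instead might be safer.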