As a meta point, I wrote a version closer to the actual requirements: it lowercases everything and processes an external input file line by line to allow for arbitrary input size. The result is about 8-10x slower than most other languages -- not as bad as I feared, not as good as I hoped. Here's the code for that version:

on mouseUp
   answer file "choose input:"
   if it is empty then exit mouseUp
   put it into F
   lock screen
   put the long seconds into T
   open file F for read
   repeat
      read from file F for 1 line
      repeat for each word w in toLower(it)
         add 1 to R[w]
      end repeat
      if the result is not empty then exit repeat
   end repeat
   combine R using cr and tab
   sort R numeric descending by word 2 of each
   put the long seconds - T into T1
   put R into fld "output"
   put the long seconds - T into T2
   put T1 && T2
   close file F
end mouseUp

On Sun, Jul 24, 2022 at 11:01 PM Geoff Canyon <gcan...@gmail.com> wrote:

> On this Hacker News thread <https://news.ycombinator.com/item?id=32214419>,
> I read this programming interview question
> <https://benhoyt.com/writings/count-words/>. Roughly, the challenge is to
> count the frequency of words in input, and return a list with counts,
> sorted from most to least frequent. So input like this:
>
> The foo the foo the
> defenestration the
>
> would produce output like this:
>
> the 4
> foo 2
> defenestration 1
>
> Of course I smiled because LC is literally built for this problem. I took
> well under two minutes to write this function:
>
> function wordCount X
>    repeat for each word w in X
>       add 1 to R[w]
>    end repeat
>    combine R using cr and tab
>    sort R numeric descending by word 2 of each
>    return R
> end wordCount
>
> There are quibbles -- the examples given in the article work line by line,
> so input size isn't an issue, and of course quotes would cause an issue,
> and LC is case insensitive, so it works, but the output would look like
> this:
>
> The 4
> foo 2
> defenestration 1
>
> But generally, it works, and is super-easy to code.
> But for the sake of argument, consider this Python solution given:
>
> counts = collections.Counter()
> for line in sys.stdin:
>     words = line.lower().split()
>     counts.update(words)
>
> for word, count in counts.most_common():
>     print(word, count)
>
> That requires a library, but it's also super-easy to code and understand,
> and it requires just the same number of lines. So, daydreaming extensions
> to LC syntax, this comes to mind:
>
> function wordCount X
>    add 1 to R[w] for each word w in X
>    return R combined using cr and tab and sorted numeric descending by word 2 of each
> end wordCount
>
> or if you prefer:
>
> function wordCount X
>    for each word w in X add 1 to R[w]
>    return (R combined using cr and tab) sorted numeric descending by word 2 of each
> end wordCount
>
> Or to really apply ourselves:
>
> function wordCount X
>    return the count of each word in X using cr and tab sorted numeric descending by word 2 of each
> end wordCount
>
> So: the xTalk syntax is over thirty years old; when was the last
> significant syntax update?
>
> (I'm not at all core to the process, so feel free to tell me how much I've
> missed lately!)
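
For anyone who wants to run the quoted Python approach on its own, here is a minimal self-contained sketch. The function name word_count and the hard-coded sample lines are my own illustration (the quoted snippet reads from sys.stdin instead), so treat it as a sketch rather than the article's exact code.

```python
import collections


def word_count(lines):
    """Count lowercased, whitespace-separated words across an iterable of lines."""
    counts = collections.Counter()
    for line in lines:
        # Lowercase first so "The" and "the" are counted together.
        counts.update(line.lower().split())
    # most_common() returns (word, count) pairs, highest count first.
    return counts.most_common()


if __name__ == "__main__":
    # Hypothetical in-memory input: the sample from the quoted message.
    sample = ["The foo the foo the", "defenestration the"]
    for word, count in word_count(sample):
        print(word, count)
```

Passing sys.stdin or an open file object as lines keeps the line-by-line processing, so input size should not be an issue.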