Re: problem with counting words

larry Mon, 13 Oct 2014 10:09:28 -0700

Hi Richard,
in a word...
"I really enjoyed reading your post and I learned a lot!"


Larry

----- Original Message -----
From: "Richard Gaskin" <ambassa...@fourthworld.com>

To: <use-livecode@lists.runrev.com>
Sent: Monday, October 13, 2014 9:03 AM
Subject: Re: problem with counting words

Good post, Kay. Each of the examples you provided is among the reasons I like xTalk.
But even though they demonstrate useful features of the language, neither is dependent on xTalk's trait of counting quoted text as a single word when using the word chunk type.
Perhaps I should preface this by noting that I very much enjoy xTalk in general and LiveCode in particular, a love that's only grown in my 27 years with this family of languages.
But all programming languages have historical anomalies, and xTalk is not the world's only exception to this. Programming languages are, by nature, somewhat funky, attempting to communicate the richness of human thought to a machine too stupid to count past 1. All of them require trade-offs.
In the first example you provided, the list of names, none of them includes quoted text. And even with the broader support of treating words as white-space delimited (breaking from the English rule of usually not including punctuation), as you noted at least one of the examples there will fail (sorry, Mr. Van Damme).
Many other languages also provide means of dealing with multi-character white space (sed, awk, and Python come to mind), and none of them, not even xTalk, will reliably sort by last name unless we separate the first and last more explicitly, such as in separate fields or with a tab character, as is commonly done in any language where a last-name sort is important, even in LiveCode.
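To make that concrete, here is a rough sketch in Python, one of the languages mentioned above. The three names come from Kay's list; the helper names sort_by_last_token and sort_by_last_field are made up for illustration, and the first one only approximates what "word -1 of each" does when no quoted text is involved.

```python
# Illustrative sketch only; the helper names are hypothetical.

def sort_by_last_token(lines):
    """Sort by the last whitespace-delimited token of each line."""
    # str.split() with no argument treats any run of spaces or tabs
    # as a single separator, so irregular spacing does not matter.
    return sorted(lines, key=lambda line: line.split()[-1])


def sort_by_last_field(lines, sep="\t"):
    """Sort by the last field when given names and surname are
    separated explicitly (here by a tab)."""
    return sorted(lines, key=lambda line: line.split(sep)[-1])


names = [
    "Peter    O'Toole",
    "Jean-Claude Van Damme",
    "Daniel  Day-Lewis",
]
# "Van Damme" is keyed on "Damme", so it lands ahead of "Day-Lewis".
print(sort_by_last_token(names))

tabbed = [
    "Peter\tO'Toole",
    "Jean-Claude\tVan Damme",
    "Daniel\tDay-Lewis",
]
# With an explicit tab the key is the whole surname, "Van Damme",
# so that entry now sorts last.
print(sort_by_last_field(tabbed))
```

The second helper works only because the data carries the separation explicitly; no amount of cleverness in the tokenizer can recover it from free-form spacing.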
In the second example in which a multi-word value is used as an object identifier, once again we're not asked to parse that using xTalk's "word" chunk type, but instead get to rely on the engine's expression evaluator, which works very much like JavaScript's and others' in which literal strings can be used as object identifiers. Useful as it is, it's neither unique to xTalk nor necessarily dependent on how we use the "word" chunk type.
Object identifiers *can* become dependent on the word chunk type if you need to parse them yourself, as others have noted along with many other good examples to justify the HyperTalk team's implementation (though we might ask why we need to do this so often, such as why we don't have objectType or ownerStack functions).
No matter how useful the current implementation is, the choice still requires justification. Even if that justification is sound, favoring a certain utility, it's still a trade-off, the downside being a redefinition of the word "word" from its more common definition in natural language.
Larry's initial confusion is far from rare. xTalk's reliance on a unique definition of "word" that differs from its use in natural language is something we all had to learn. We may accept it, we may like it, we may even prefer it, but it's by no means intuitive to the native English speaker.
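As a rough illustration of the two definitions, the Python sketch below counts the same text both ways. Both functions are hypothetical approximations: xtalk_style_words assumes a quote only opens a word at a word boundary and that an unterminated quote runs to the end of the text, and natural_words is a crude ASCII stand-in for the everyday notion of a word, not the engine's own logic.

```python
import re


def xtalk_style_words(text):
    """Approximate the classic "word" chunk: whitespace-delimited,
    except that a double-quoted run counts as one word.
    Assumption: simplified rules, not the engine's actual parser."""
    return re.findall(r'"[^"]*"?|\S+', text)


def natural_words(text):
    """Crude everyday notion of a word: runs of ASCII letters,
    digits and apostrophes, with other punctuation ignored."""
    return re.findall(r"[A-Za-z0-9']+", text)


sample = 'He said "I learned a lot!" today'

print(xtalk_style_words(sample))
# ['He', 'said', '"I learned a lot!"', 'today']

print(len(xtalk_style_words(sample)), len(natural_words(sample)))
# 4 7
```

The same text gives two different counts, which is exactly the surprise a newcomer runs into.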
xTalk was born more than a decade before Unicode was invented, so it couldn't have taken advantage of the vast pool of collective knowledge embodied in the Unicode spec, nor was there the luxury of having the computational horsepower needed to use such a spec efficiently.
Today the LiveCode team has at last corrected this with the introduction of the "trueWord" token type, though I have to shrug my shoulders with an acknowledging chuckle in sharing Larry's initial observation that if xTalk were being designed today, with its ostensible emphasis on "English-like" syntax, the order is backwards:
If we didn't have 27 years of code dependent on xTalk's unique redefinition of "word", to support the claim of "English-like" it might be more intuitive to have "word" act as "trueWord" does, and have some other token do what "word" currently does in xTalk's unique redefinition.
But that's not the world we live in. Like every other language, LiveCode is a product of its unique history. Useful as its conventions are, they will from time to time require us to learn new ways of doing things.
This is just one of many reasons I generally don't use the phrase "English-like" when giving talks on LiveCode. Our favorite language brings to the world's programming choices a uniquely valuable blend of features, but while it's certainly more readable than most it isn't particularly "English-like", nor does it really even try all that hard to be.
And that's a good thing.
Natural language is really tough stuff to parse, full of its own even longer and more nuanced history, and intended for a very different audience (the cognitive complexity of the human mind rather than the logical simplicity of computers).
I think most of us (except Geoff Canyon who has a rare mind for this sort of stuff <g>) would agree that we're all glad this isn't a valid statement in xTalk, even though it's a perfectly valid sentence in English:
"Buffalo buffalo Buffalo buffalo buffalo buffalo Buffalo buffalo."

<http://en.wikipedia.org/wiki/Buffalo_buffalo_Buffalo_buffalo_buffalo_buffalo_Buffalo_buffalo>

:)



Kay C Lan wrote:
On Mon, Oct 13, 2014 at 7:45 AM, Richard Gaskin wrote:
I hear ya', but like so many other oddities in the language this one came
from Apple,
Sheer brilliance! One of the first analogies of HyperCard was that it was
an electronic rolodex. Here is a list of names:

Abu Musab    Al-Zarqawi
Camilla Parker-Bowles
Catherine    Zeta-Jones
Claude Levi-Strauss
D'Arcy    Corrigan
Daniel  Day-Lewis
David    Ben-Gurion
Dodi Al-Fayed
Florence    Griffith-Joyner
Gilbert  O'Sullivan
Gloria    Macapagal-Arroyo
Jean-Claude Van Damme
Jimmy    O'Dea
Justine  Henin-Hardenne
Kareem    Abdul-Jabbar
Karim Abdul-Jabbar
Kristin    Scott-Thomas
Maddox  Jolie-Pitt
Michael    O'Leary
Olivia Newton-John
Peter    O'Toole
Sinéad O'Connor
Tim    Brooke-Taylor
Ralph Twistleton-Wykham-Fiennes
So let's say you want to sort these by surname - a kind of rolodex thing to
do.

sort lines of myListOfNames by word -1 of each

will result in only one mistake

sort lines of myListOfNames by trueWord -1 of each -- if you are on LC 7.0

will result in basically the same messed up result most other programming
languages will give you. Put it in any word processor and see how you go.
Please feel free to try and write your own function that is more successful
and more efficient than the beautiful one liner Bill Atkinson gave us. Even
if you had wordDel it wouldn't help much. I can't imagine the amount of
hours that have been wasted, especially on genealogical websites, trying to
fathom why double barrelled names never sort correctly. This is also
compounded by the certain fact that some people will put a space between
the last given name and the Surname, some a tab, and some will 'format' the
data by placing multiple spaces in between names so that things 'line up
nicely' - and are then confused as to why it only looks that way on their
screen and not on someone else's. One of the reasons double barrelled names
have picked up the '-' is to help computers recognise them as a single word.
Also;

put myVariable into fld Not A Variable

doesn't work

put myVariable into fld "Not A Variable"

does. The ability to recognise words in quotes as a single entity is
extremely important. Yes, we don't typically think of such as a single
word, but when we understand that computers don't think like us, and we do
understand why things are the way they are, such oddities can be
manipulated in many powerful ways to our own advantage. It is also helpful
when we understand such things that we don't go around replacing one
character willy nilly with another character. ~ [tilde] for instance is one
character I'd never use as it has a special meaning in many computer
languages; as does / \ < > . * and many others. If we had some text that
contained both straight and curly quotes and replaced the straight quotes
with curly quotes so we could get a word count, and then changed the curly
quotes back to straight quotes, the final text is not the same as it started
- and this could cause problems. Today your function might work perfectly
for today's problem, but next month, or next year, when you start expanding
your LC skills and try working with SQL databases, or Servers and network
connections, every now and then someone will report a bug that your app
does something strange. You may never be able to track it down because it
just happens that once every million DB calls a random user happens to use
data that contains a character that you never use yourself and thought no
one else would. I have a particular liking to numToChar(127) myself.
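The round-trip problem described above is easy to reproduce. The sketch below uses Python rather than LiveCode, with a made-up sample string, and assumes the naive approach of swapping every straight quote for a right curly quote and then swapping back.

```python
# Sample text (invented for illustration) containing both straight
# quotes and curly quotes (U+201C and U+201D).
original = 'She wrote "two words" and \u201cmore words\u201d here'

# Step 1: hide the straight quotes so a whitespace word count
# is not affected by quoting.
working = original.replace('"', '\u201d')
count = len(working.split())

# Step 2: naive "undo" that turns every right curly quote back
# into a straight quote.
restored = working.replace('\u201d', '"')

print(count)                 # 8
print(restored == original)  # False: a quote that was curly is now straight
```

Doing the replacement on a copy, as with the working variable here, and leaving the original untouched avoids the problem entirely.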

Yep, no other programming language might define a word like LC defines a
word, but I for one am EXTREMELY thankful for that.
--
 Richard Gaskin
 Fourth World Systems
 Software Design and Development for the Desktop, Mobile, and the Web
 ____________________________________________________________________
 ambassa...@fourthworld.com                http://www.FourthWorld.com

_______________________________________________
use-livecode mailing list
use-livecode@lists.runrev.com
Please visit this url to subscribe, unsubscribe and manage your subscription preferences:
http://lists.runrev.com/mailman/listinfo/use-livecode



