On Aug 8, 2015, at 12:42 PM, Richmond wrote:

> Jane Austen [amongst others] uses an interesting type of grammatical
> construction of this sort:
>
> After breakfast, the girls walked to Meryton to inquire if Mr. Wickham
> _were returned_, and to lament over his absence from the Netherfield ball.
>
> Pride and Prejudice.
>
> I would like to analyse a million word corpus that I have been granted access
> to for this type of construction.
>
> However, I don't want to find examples of only 'were returned', but all
> examples of
>
> were + infinitive / preterite / past participle
>
> and, presumably for that I shall have to use wildcards . . .
>
> OR ???

I'll leave it to those who speak Regex to suggest a wildcard solution. Here's another approach (not tested) that will catch past participles ending in "ed". I'm not sure how well it will scale with large texts:

function findWere pText
   -- returns a comma-delimited list of all the word offsets matching "were *ed"
   -- note: LC words include adjacent punctuation, so "returned," will not match
   put wordOffsets("were", pText, true) into offList
   if offList = 0 then return empty -- wordOffsets returns 0 when "were" is not found
   repeat for each item w in offList
      put word w+1 of pText into testWord
      if testWord ends with "ed" then put w & comma after outList
   end repeat
   return item 1 to -1 of outList -- delete trailing comma
end findWere

function wordOffsets str, pContainer, matchWhole
   -- returns a comma-delimited list of all the wordOffsets of str in pContainer
   -- if matchWhole = true then only whole words are located
   --    else will find word matches everywhere str is part of a word in pContainer
   -- note that in LC words will include adjacent punctuation,
   --    so using matchWhole = true may exclude too many "words"
   -- duplicates are stripped out
   --    eg wordOffsets("co","the common coconut") = 2,3 not 2,3,3
   -- note: to get the last wordOffset of a string in a container (often useful)
   --    use "item -1 of wordOffsets(...)"
   -- by Peter M. Brigham, pmb...@gmail.com — freeware
   -- requires offsets()

   if matchWhole = empty then put false into matchWhole
   put offsets(str,pContainer) into offList
   if offList = 0 then return 0
   repeat for each item i in offList
      put the number of words of (char 1 to i of pContainer) into wdNbr
      if matchWhole then
         if word wdNbr of pContainer <> str then next repeat
      end if
      put 1 into A[wdNbr] -- using an array avoids duplicates
   end repeat
   put the keys of A into wordList
   sort lines of wordList ascending numeric
   replace cr with comma in wordList
   return wordList
end wordOffsets

function offsets str, pContainer
   -- returns a comma-delimited list of all the offsets of str in pContainer
   -- returns 0 if not found
   -- note: offsets("xx","xxxxxx") returns "1,3,5" not "1,2,3,4,5"
   --    ie, overlapping offsets are not counted
   -- note: to get the last occurrence of a string in a container (often useful)
   --    use "item -1 of offsets(...)"
   -- by Peter M. Brigham, pmb...@gmail.com — freeware

   if str is not in pContainer then return 0
   put 0 into startPoint
   repeat
      put offset(str,pContainer,startPoint) into thisOffset
      if thisOffset = 0 then exit repeat
      add thisOffset to startPoint
      put startPoint & comma after offsetList
      add length(str)-1 to startPoint
   end repeat
   return item 1 to -1 of offsetList -- delete trailing comma
end offsets

P.S. I love Jane Austen. One of my favorite books of all time is "Pride and Prejudice." It's so beautifully constructed.

-- Peter

Peter M. Brigham
pmb...@gmail.com
http://home.comcast.net/~pmbrig