Re: [users] Finding two-character strings [was: How do I un-hyperlink in bulk?]

Brian Barker Fri, 09 Nov 2007 18:25:32 -0800

At 15:08 09/11/2007 -0800, Allen Schaaf wrote:

The alternate function, which drops the mailto: tag, got me thinking about another problem I have that is not directly related to OOo but uses OOo Writer as one of the steps.
I'm an analyst, not a programmer, so I've been a bit stumped trying to solve this problem.
I want to take large amounts (3+ MB) of text from a variety of sources I have compiled and parse them like this. Say the first words are "Mary had a little lamb...". What I need is a moving window, two characters wide and advancing one character at a time, to grab the letters, spaces and punctuation, add a delimiter and then count the number of each digraph in the set.
So the result would look like this using ~~ as the separator:

Ma~~
ar~~
ry~~
y ~~
 h~~
ha~~
ad~~
d ~~
 a~~
a ~~
 l~~ <
li~~
it~~
tt~~
tl~~
le~~
e ~~
 l~~ <
la~~
am~~
mb~~
b.~~
..~~
..~~

etc.
I have a program that will sort and count the duplicates, like the ones marked with <.
It occurred to me that what the hyperlink program does when it drops the mailto: tag is almost the same. So my question is: can anyone modify it to do what I need, or point me to a tutorial where I could learn enough to do it myself? What would be best, in my view, would be some kind soul patient enough to help me understand how it works and how to modify it through an exchange of e-mails. That way I would learn something useful and not just hack at it until it sort of works.
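For comparison, the moving window and count described in the message above can also be done outside Writer. Below is a minimal Python sketch, not taken from this thread. The function names are hypothetical, the whole text is assumed to fit in memory (3 MB is no problem), and every character, including spaces and punctuation, is treated as part of a pair.

```python
from collections import Counter


def digraph_lines(text: str, sep: str = "~~") -> list[str]:
    """Return every overlapping two-character window of text, each followed by sep."""
    # The window is two characters wide and moves one character at a time,
    # so "Mary" gives "Ma", "ar", "ry". Spaces and punctuation count as characters.
    return [text[i:i + 2] + sep for i in range(len(text) - 1)]


def count_digraphs(text: str) -> Counter:
    """Count how often each two-character window occurs (the sort-and-count step)."""
    # zip pairs each character with the one that follows it.
    return Counter(a + b for a, b in zip(text, text[1:]))


if __name__ == "__main__":
    sample = "Mary had a little lamb..."
    for line in digraph_lines(sample)[:4]:
        print(line)
    counts = count_digraphs(sample)
    print(counts[" l"], counts[".."])  # 2 2
```

To use this on a real corpus, read the file into a string first, for example with open(path, encoding="utf-8").read(). Any line breaks in the file then become part of some pairs, so you may prefer to replace them with spaces beforehand.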

You can do this fairly easily in Writer, using Find & Replace, though I don't know whether it is a particularly efficient way of doing so for large datasets.


o  Copy your text in Writer and paste a second copy after the first.

o  Delete the first character of one copy - so that copy in your example would start "ary had ...".

o  Open the Find & Replace dialogue, click on More Options and then tick "Regular expressions".

o  Search for .{2} and replace with &~~\n - using Replace All.
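If you want to preview what these steps produce before running them on a large document, you can imitate them with Python's re module. This only approximates Writer and does not reproduce its actual search engine. It assumes the two copies sit in separate paragraphs, modelled here as a "\n" between them, and it relies on the fact that, as in Writer, "." in Python does not match a line break by default. The function name is hypothetical.

```python
import re


def writer_recipe(text: str) -> str:
    """Imitate the Find & Replace steps on one paragraph of text."""
    # First two steps: a second copy with its first character deleted,
    # assumed to be a separate paragraph (hence the newline between copies).
    doc = text + "\n" + text[1:]
    # Last step: search for .{2} and replace with &~~\n, all occurrences.
    # \g<0> is Python's spelling of Writer's "&" (the whole match).
    return re.sub(r".{2}", r"\g<0>~~\n", doc)


if __name__ == "__main__":
    print(writer_recipe("Mary had a little lamb..."))
```

The 25-character sample has an odd length, so the unshifted copy leaves its final "." on a line by itself with no "~~". Any odd-length paragraph leaves one such stray character, and you can ignore it when counting.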

The dot in the search pattern represents any character (except a line or paragraph break), and the 2 in braces makes the pattern match exactly two such characters. The ampersand in the replace pattern copies the matched string, and the rest of the pattern appends two swung dashes (tildes) and, with \n, a paragraph break, which is what you seem to need. Once this has matched the first two characters, it then matches the third and fourth, and so on, so it catches only alternate pairs. That's why you need the second copy of the text: because it starts one character later, the same search finds the other set of alternate pairs in that copy. The results are in a different order from your list, but if you just need to sort and count them, this should not matter.
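The claim that the two copies together give every pair can be checked mechanically. The unshifted copy supplies the pairs starting at the 1st, 3rd, 5th... characters, and the shifted copy supplies the pairs starting at the 2nd, 4th, 6th... characters. Below is a small Python illustration. It assumes a single paragraph with no line breaks inside it, and the helper names are made up for this purpose.

```python
import re
from collections import Counter


def alternate_pairs(s: str) -> list[str]:
    """Non-overlapping pairs, as Replace All with .{2} finds them (an odd last character is left out)."""
    return re.findall(r".{2}", s)


def all_pairs(s: str) -> list[str]:
    """Every overlapping pair, i.e. the moving window originally asked for."""
    return [s[i:i + 2] for i in range(len(s) - 1)]


def shifted_copy_covers_window(text: str) -> bool:
    """True if both copies together yield the same pairs, with the same counts, as the window."""
    combined = alternate_pairs(text) + alternate_pairs(text[1:])
    # Comparing Counters ignores order, which is why the different output order does not matter.
    return Counter(combined) == Counter(all_pairs(text))


if __name__ == "__main__":
    print(shifted_copy_covers_window("Mary had a little lamb..."))  # True
```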


I trust this helps.

Brian Barker

