At 15:08 09/11/2007 -0800, Allen Schaaf wrote:
The alternate function which drops the mailto: tag got me to
thinking about another problem I have that is not directly related
to OOo but uses OOo Writer as one of the steps.
I'm an analyst, not a programmer, so I've been a bit stumped trying
to solve this problem.
I want to take a large body (3+ megs) of words from a variety of
sources I have compiled and parse it like this: Say the first words are
"Mary had a little lamb..." What I need is a moving window to grab
each pair of adjacent characters (letters, spaces and punctuation), add
a delimiter, and then count the number of each digraph in the set.
So the result would look like this using ~~ as the separator:
Ma~~
ar~~
ry~~
y ~~
 h~~
ha~~
ad~~
d ~~
 a~~
a ~~
 l~~ <
li~~
it~~
tt~~
tl~~
le~~
e ~~
 l~~ <
la~~
am~~
mb~~
b.~~
..~~
..~~
etc.
I have a program that will sort and count the duplicates like the
ones marked with the <s.
It occurred to me that what the hyperlink program does when it drops
the mailto: tag is almost the same. So my question is: can anyone
modify it to do what I need, or point me to a tutorial where I could
learn enough to do it myself? What would be best, in my view, would be
some kind soul patient enough to help me understand how it works and
how to modify it through an exchange of e-mails. That way I would learn
something useful and not just hack at it until it sort of works.
You can do this fairly easily in Writer, using Find & Replace -
though I don't know whether it is a particularly efficient way of
doing so for large datasets.
o Copy your text in Writer and paste a second copy after the first.
o Delete the first character of one copy - so that copy in your
example would start "ary had ...".
o Open the Find & Replace dialogue, click on More Options and then
tick "Regular expressions".
o Search for .{2} and replace with &~~\n - using Replace All.
The dot in the search pattern represents any character (except a line
or paragraph break), and the {2} limits each match to exactly two such
characters. The ampersand in the replace pattern copies the string
matched, and the rest appends two tildes and a paragraph break - which
is what you seem to need. Once this has matched the first two
characters, it then matches the third and fourth, and so on - so it
catches only alternate pairs. That's why you need the second copy of
the text, with its first character deleted, so that this technique
will find the other set of alternate pairs of characters in that copy.
The results are in a different order from your list, but if you need
just to sort and count them, this should not matter.
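
If you would rather script this than use the dialogue, here is a minimal
Python sketch of the same idea. It is only an illustration, not part of
Writer: the function names pairs_via_replace and pairs_via_window are
made up for this example, and it assumes the text is a single paragraph
that does not itself contain the ~~ separator. Python's \g<0> stands in
for the & of Writer's Replace box.

```python
import re
from collections import Counter

SEP = "~~"  # the separator from the question


def pairs_via_replace(text):
    """Mimic the Find & Replace steps: one pass over the text and one
    pass over a copy with its first character deleted."""
    pairs = []
    for copy in (text, text[1:]):
        # Same pattern as in Writer: any two characters except a line break.
        marked = re.sub(r".{2}", r"\g<0>" + SEP + "\n", copy)
        # Keep only the lines that received a separator; an odd character
        # left over at the end is skipped, just as in Writer.
        pairs.extend(line[:-len(SEP)] for line in marked.split("\n")
                     if line.endswith(SEP))
    return pairs


def pairs_via_window(text):
    """The 'moving window' described in the question, done directly."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


sample = "Mary had a little lamb..."
counts = Counter(pairs_via_replace(sample))
# Both routes give the same digraphs, only in a different order.
same = counts == Counter(pairs_via_window(sample))
```

Here counts maps each digraph to the number of times it occurs, so it
also does the sorting and counting step; counts.most_common() lists the
digraphs from most to least frequent.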
I trust this helps.
Brian Barker