Jonathon Blake wrote:
All:
I have 250 pages (13,750 lines) of abbreviations, of which at least a
third are duplicates.
[This is _after_ doing "cat textfile | sort | uniq > uniq_line_listings" ]
a) Has anybody written a macro that will detect and remove duplicate
listings in either a text file or a spreadsheet?
b) Does anybody have any other suggestions/recommendations for
eliminating duplicate listings?
If it makes any difference, the file contains text in at least 20
different languages, using ten different writing systems.
xan
jonathon
--
Does your Office Suite conform to ISO Standards?
Hi Jonathon,
Is your data sorted? Do you have trailing spaces?
If the data is sorted, then this is a rather easy problem, well mostly
anyway...
REM This assumes a sorted list, so duplicates are always adjacent.
Sub RemoveDuplicateLines
  Dim oCurs1, oCurs2
  REM The first cursor will start by selecting the first paragraph
  oCurs1 = ThisComponent.getText().createTextCursor()
  oCurs1.gotoStart(False)
  oCurs1.gotoEndOfParagraph(True)
  REM The second cursor will start by selecting the second paragraph
  oCurs2 = ThisComponent.getText().createTextCursor()
  oCurs2.gotoStart(False)
  REM Nothing to compare if the document has only one paragraph
  If NOT oCurs2.gotoNextParagraph(False) Then Exit Sub
  oCurs2.gotoEndOfParagraph(True)
  Dim s1$
  Dim s2$
  REM Trim removes leading and trailing spaces before comparing
  s1 = Trim(oCurs1.getString())
  Do
    s2 = Trim(oCurs2.getString())
    If s1 = s2 Then
      REM Duplicate: extend the selection over the paragraph break
      REM and delete the whole paragraph
      oCurs2.gotoNextParagraph(True)
      oCurs2.setString("")
    Else
      REM Different: the paragraph under oCurs2 becomes the new reference
      oCurs1.gotoNextParagraph(False)
      oCurs1.gotoEndOfParagraph(True)
      s1 = Trim(oCurs1.getString())
      If NOT oCurs2.gotoNextParagraph(False) Then Exit Do
    End If
  Loop Until NOT oCurs2.gotoEndOfParagraph(True)
End Sub
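
If trailing spaces are what let duplicates slip past "sort | uniq", the same cleanup can also be done outside the office suite. Below is a minimal Python sketch, not a tested solution for your file. The names dedupe_lines and dedupe_file are made up for illustration, and it assumes the file is UTF-8. It also applies Unicode NFC normalization, on the guess (only a guess) that some entries in a multi-script list differ only in composed versus decomposed characters. Unlike the macro, it does not need sorted input.

```python
import unicodedata


def dedupe_lines(lines):
    """Return the lines with duplicates removed, keeping first occurrences in order.

    Lines are compared after stripping surrounding whitespace and
    normalizing to Unicode NFC (the NFC step is an assumption about
    why visually identical entries might differ).
    """
    seen = set()
    result = []
    for line in lines:
        key = unicodedata.normalize("NFC", line).strip()
        if key in seen:
            continue  # already emitted an equivalent line
        seen.add(key)
        result.append(key)
    return result


def dedupe_file(src, dst, encoding="utf-8"):
    """Read src, write the de-duplicated lines to dst (hypothetical helper)."""
    with open(src, encoding=encoding) as infile:
        unique = dedupe_lines(infile)
    with open(dst, "w", encoding=encoding) as outfile:
        for line in unique:
            outfile.write(line + "\n")
```

Calling dedupe_file("textfile", "uniq_line_listings") would write the cleaned list to a new file and leave the original untouched.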
--
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
My Book: http://www.hentzenwerke.com/catalog/oome.htm
Info: http://www.pitonyak.org/oo.php
See Also: http://documentation.openoffice.org/HOW_TO/index.html