Jonathon Blake wrote:

All:

I have 250 pages (13,750 lines) of abbreviations, of which at least a
third are duplicates.
[This is _after_ doing "cat textfile | sort | uniq > uniq_line_listings".]

a) Has anybody written a macro that will detect and remove duplicate
listings in either a text file or a spreadsheet?

b) Does anybody have any other suggestions/recommendations for
eliminating duplicate listings?

If it makes any difference, the file contains text in at least 20
different languages, using ten different writing systems.


xan

jonathon
--
Does your Office Suite conform to ISO Standards?

Hi Jonathon,

Is your data sorted? Do you have trailing spaces?

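If the data is sorted, then this is a rather easy problem (well, mostly
anyway): duplicates end up in adjacent paragraphs, so each paragraph only
needs to be compared with the last one that was kept. The macro below
does this for a text document.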

REM This assumes a sorted list with one entry per paragraph.
REM Leading and trailing spaces are ignored when comparing.
Sub RemoveDuplicateLines
 Dim oCurs1, oCurs2

 REM The first cursor will start by selecting the first paragraph
 oCurs1 = ThisComponent.getText.createTextCursor()
 oCurs1.gotoStart(False)
 oCurs1.gotoEndOfParagraph(True)

 REM The second cursor will start by selecting the second paragraph
 oCurs2 = ThisComponent.getText.createTextCursor()
 oCurs2.gotoStart(False)
 oCurs2.gotoNextParagraph(False)
 oCurs2.gotoEndOfParagraph(True)

 Dim s1$
 Dim s2$
 s1 = Trim(oCurs1.getString())
 Do
   s2 = Trim(oCurs2.getString())
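   If s1 = s2 Then
     REM Duplicate: extend the selection over the paragraph break and delete it
     oCurs2.gotoNextParagraph(True)
     oCurs2.setString("")
   Else
     REM New entry: move the first cursor onto it and remember its text
     oCurs1.gotoNextParagraph(False)
     oCurs1.gotoEndOfParagraph(True)
     s1 = Trim(oCurs1.getString())
     REM Stop when there is no further paragraph to compare
     If NOT oCurs2.gotoNextParagraph(False) Then Exit Do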
   End If
 Loop Until NOT oCurs2.gotoEndOfParagraph(True)
End Sub
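
If the list is not sorted, or if sort and uniq still leave duplicates
behind, the cause may be entries that differ only in surrounding
whitespace or in Unicode composition (quite possible with that many
writing systems). As a rough sketch that works outside the office suite
on a plain-text export, something like the following Python might help.
It is not part of the macro above: the function names are made up for
illustration, and it assumes one entry per line and that comparing
NFC-normalized, stripped text is the comparison you actually want.

```python
import unicodedata


def normalize_line(line):
    """Return a comparison key: NFC-normalized text without surrounding whitespace."""
    return unicodedata.normalize("NFC", line).strip()


def remove_duplicate_lines(lines):
    """Keep the first occurrence of each entry, comparing by normalized key.

    The input does not need to be sorted, and the original order is preserved.
    The returned entries are the normalized keys, not the raw input lines.
    """
    seen = set()
    result = []
    for line in lines:
        key = normalize_line(line)
        if key in seen:
            # Same entry already kept (possibly with different spacing or composition)
            continue
        seen.add(key)
        result.append(key)
    return result
```

To use it on a file, read the lines as UTF-8, pass them to
remove_duplicate_lines, and write the returned list back out one entry
per line. Unlike the macro, this does not require sorted input, at the
cost of holding every distinct entry in memory, which should be no
problem for 13,750 lines.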

--
Andrew Pitonyak
My Macro Document: http://www.pitonyak.org/AndrewMacro.odt
My Book: http://www.hentzenwerke.com/catalog/oome.htm
Info:  http://www.pitonyak.org/oo.php
See Also: http://documentation.openoffice.org/HOW_TO/index.html

