Are you just wanting to remove lines that are duplicated within a given file,
or lines that are duplicated across files (the same line occurring in more than one file)?
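In case it's the per-file case, here is a minimal sketch. It is untested and assumes ooRexx 4.2 with the RexxUtil functions available, as in your snippet. It streams one file at a time instead of holding all ~3600 files in stems. The `.dedup` output name and the `seen.` stem are hypothetical. It also closes each stream after use. Your reading loop never closes the input files, so it may be worth checking whether open stream handles, rather than memory, are what make lineout start returning 1.

```rexx
/* Sketch only: drops duplicate lines within each file, one file at a  */
/* time, so at most one file's lines are remembered in the seen. stem. */
rc = SysFileTree('C:\Program Files (x86)\Windower4\logs\*.log', 'files.', 'FO')
if rc <> 0 then exit 1000 + rc

do f = 1 to files.0
  infile  = files.f
  outfile = infile'.dedup'          /* hypothetical output file name   */
  seen. = 0                         /* reset: forget the previous file */
  do while lines(infile) > 0
    line = linein(infile)
    if seen.line then iterate       /* already written: skip duplicate */
    seen.line = 1
    if lineout(outfile, line) <> 0 then say 'lineout failed for' outfile
  end
  call stream infile,  'C', 'CLOSE' /* release both handles before the */
  call stream outfile, 'C', 'CLOSE' /* next file is opened             */
end
```

If you mean duplicates across all files, move `seen. = 0` above the outer loop so the stem persists between files. Note that this keeps every distinct line in memory at once.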

Bruce
> On Sep 24, 2016, at 7:04 AM, Les Koehler <vmr...@tampabay.rr.com> wrote:
> 
> That's an interesting problem to solve. What's your approach?
> 
> Les
> 
> On 9/24/2016 7:31 AM, Bertram Moshier wrote:
>> Hello,
>> 
>> I'm writing a program to remove duplicate lines from multiple files:
>> about 3600 files (971 or so have duplicate lines), totaling about 512MB.
>> 
>> After reading in all the files, I've discovered that ooRexx will no longer
>> write to any hard drive on the system.  I suspect this is a memory issue,
>> as memory usage is above 3GB and can exceed 4GB.  YES, I AM using the
>> 64-bit version.
>> 
>> The version number:  REXX-ooRexx_4.2.0(MT)_64-bit 6.04 22 Feb 2014
>> 
>> The program runs fine even when memory usage exceeds 8GB, except for I/O
>> (specifically, output).
>> 
>> Below you'll find some of the code.  PLEASE note:  the !!Say_Directed
>> subroutine is like the UNIX tee command.  It can output to both the
>> console and a file.  It was my first indication of a problem, as any
>> output would go to the console BUT not to the file!
>> 
>> The !!EOJ subroutine generates a stop and timing information message.
>> This is where the FAILURE shows up: the line does not get written to disk
>> (lineout returns 1).  The line does get written to disk if !!EOJ is called
>> BEFORE the last DO loop (e.g. right after the SysFileTree).  The memory
>> usage of that last DO loop is what takes ooRexx to 3-4GB and higher.
>> 
>> NOTE:  The system is running Windows 10 Professional AND has 48GB of
>> physical RAM.
>> 
>> 
>> Here is a piece of the code:
>> 
>> files_with_duplicate_lines. = ''
>> files_with_duplicate_lines.0 = 0
>> 
>> GOODMARK = 'GOOD: '
>> 
>> OFN = 0
>> output_files. = ''
>> input_files. = ''
>> 
>> 
>> rc = SysFileTree('C:\Program Files (x86)\Windower4\logs\*.log','files','O')
>> if rc <> 0 then do
>>  call !!EOJ 1000 + rc
>>  end
>> 
>> do filenum = 1 to files.0
>>  do lines = 1 while lines(files.filenum) > 0
>>    files.filenum.lines = linein(files.filenum)
>>    end
>>  files.filenum.0 = lines - 1
>>  end
>> 
>> call !!EOJ 0
>> 
>> ------------------------------------------------------------------------------
>> 
>> 
>> 
>> _______________________________________________
>> Oorexx-users mailing list
>> Oorexx-users@lists.sourceforge.net
>> https://lists.sourceforge.net/lists/listinfo/oorexx-users
>> 
> 
