Dear developers,

I have had the memory bloating problem again. This time the process reached 48 GB (the maximum for one CPU in my machine) and only ended after some 13 CPU hours, at 100% CPU the whole time.
From the logging info I could confirm that the program was stuck in the steps below most of the time. Here are the rough steps:

  Language pairs detected in C routine -> External call, no memory bloating
  Data processing finished after 2107 Seconds 00:58:12
  Splitting finished after 49487 Seconds 14:42:59 -> Routine Split_data
  Sorting finished after 16527 Seconds 19:18:27 -> Routine Sort_data
  Processing of Data file finished after 68123 Seconds
  Writing the Logfile TR_DE-EN-eu_logfile.txt 26 Jun 2017 19:18:28

I have enclosed the routines in question. In my Dropbox I have stored the complete program with some test data to replicate the processing; the problem is reproducible. Just put the folder somewhere, change into it and run the command indicated.

I run Open Object Rexx Version 5.0.0, Build date: May 20 2017, Addressing mode: 64
Hardware: Mac Pro with dual-CPU Xeon processors running Mac OS Sierra 10.12.5

PS: As I was making the screenshot the process finished nicely, with no crash or anything, and the memory was released. So maybe it is just bad programming, but at least you can confirm that then :-)
/* -------------------------------------------------------------------- */
/* Sort the data in the word files                                      */
/* Use a barrel shifter and keep the top 5 words in each combo          */
/* Todo                                                                 */
/* To speed up sorting maybe a shorter list or better sorting           */
/* algorithm must be used                                               */
/* -------------------------------------------------------------------- */
Sort_data: Procedure Expose CountMB. StemMB.
trace o

Top01MB = .mutablebuffer~new
Top02MB = .mutablebuffer~new
Top03MB = .mutablebuffer~new
Top04MB = .mutablebuffer~new
Top05MB = .mutablebuffer~new
Word01MB = .mutablebuffer~new
Word02MB = .mutablebuffer~new
Word03MB = .mutablebuffer~new
Word04MB = .mutablebuffer~new
Word05MB = .mutablebuffer~new
a = .array~new

DO i=1 TO CountMB.0
  /* Reset this list once for each word 2-tuple */
  Top01MB = 0
  Top02MB = 0
  Top03MB = 0
  Top04MB = 0
  Top05MB = 0
  Word01MB = 'NIL-NIL'
  Word02MB = 'NIL-NIL'
  Word03MB = 'NIL-NIL'
  Word04MB = 'NIL-NIL'
  Word05MB = 'NIL-NIL'

  DO j= 1 TO CountMB.i~Words
    --say 'CountMB.'i'~Word('j')' CountMB.i~Word(j)
    /* Store here only the Top-5 of the Iceberg */
    SELECT
      WHEN CountMB.i~Word(j) > Top01MB THEN DO
        Top05MB = Top04MB
        Top04MB = Top03MB
        Top03MB = Top02MB
        Top02MB = Top01MB
        Top01MB = CountMB.i~Word(j)
        Word05MB = Word04MB
        Word04MB = Word03MB
        Word03MB = Word02MB
        Word02MB = Word01MB
        Word01MB = StemMB.i~Word(j)
      END
      WHEN CountMB.i~Word(j) > Top02MB THEN DO
        Top05MB = Top04MB
        Top04MB = Top03MB
        Top03MB = Top02MB
        Top02MB = CountMB.i~Word(j)
        Word05MB = Word04MB
        Word04MB = Word03MB
        Word03MB = Word02MB
        Word02MB = StemMB.i~Word(j)
      END
      WHEN CountMB.i~Word(j) > Top03MB THEN DO
        Top05MB = Top04MB
        Top04MB = Top03MB
        Top03MB = CountMB.i~Word(j)
        Word05MB = Word04MB
        Word04MB = Word03MB
        Word03MB = StemMB.i~Word(j)
      END
      WHEN CountMB.i~Word(j) > Top04MB THEN DO
        Top05MB = Top04MB
        Top04MB = CountMB.i~Word(j)
        Word05MB = Word04MB
        Word04MB = StemMB.i~Word(j)
      END
      WHEN CountMB.i~Word(j) > Top05MB THEN DO
        Top05MB = CountMB.i~Word(j)
        Word05MB = StemMB.i~Word(j)
      END
      OTHERWISE
        Iterate /* ignore lower counts */
    END /* SELECT */
  END j

  /* One tuple is sorted, store back to original stem items */
  /* If less than 5 stem will end with 'NIL's and 0s */
  -- StemMB.i = Word01MB Word02MB Word03MB Word04MB Word05MB
  -- CountMB.i = Top01MB Top02MB Top03MB Top04MB Top05MB
  a[i] = Word01MB Word02MB Word03MB Word04MB Word05MB Top01MB Top02MB Top03MB Top04MB Top05MB
END i

/* The stem items are now sorted internally */
/* Now sort all items globally */
/* This is not efficient or elegant programming, Q&D but it works */
a = a~StableSort
i=0
DO item over a
  i = i+1
  StemMB.i = item~Word(1) item~Word(2) item~Word(3) item~Word(4) item~Word(5)
  CountMB.i = item~Word(6) item~Word(7) item~Word(8) item~Word(9) item~Word(10)
END
StemMB.0 = i
CountMB.0 = i

Drop i j a,
     Top01MB Top02MB Top03MB Top04MB Top05MB,
     Word01MB Word02MB Word03MB Word04MB Word05MB
Return .nil

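
The five WHEN branches in Sort_data all perform the same step: insert one count into a sorted list of five slots and shift the lower slots down. A minimal sketch of that step written as a loop, for illustration only. The stems Top. and Word. and the sample data are hypothetical and not part of the program; the comparison is the same strict ">" as in the routine.

```rexx
/* Hypothetical sketch: keep the five highest counts of one tuple list */
counts = '45 38 41 1107 103 36'          /* sample counts (assumed)     */
words  = 'w-a w-b w-c w-d w-e w-f'       /* sample tuples (assumed)     */

Top.  = 0                                /* default for unused slots    */
Word. = 'NIL-NIL'

DO j = 1 TO counts~words
  c = counts~word(j)
  DO k = 1 TO 5
    IF c > Top.k THEN DO
      /* shift slots k..4 down by one, dropping the old slot 5 */
      DO m = 5 TO k + 1 BY -1
        n = m - 1
        Top.m  = Top.n
        Word.m = Word.n
      END
      Top.k  = c
      Word.k = words~word(j)
      LEAVE k                            /* inserted, go to next count  */
    END
  END k
END j

DO k = 1 TO 5
  SAY k':' Word.k Top.k
END
```

This only restates the existing logic in fewer lines; it is not expected to change the run time or the memory use by itself.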
/* ---------------------------------------------------------------------*/
/* Split the data in word stems, one word per stem                      */
/* The input file contains a long list of word-tuples with counts       */
/*   accelerated-accélération 45                                        */
/*   accelerated-lutte 38                                               */
/*   accelerated-contre 41                                              */
/*   accordance-conseil 1107                                            */
/*   accordance-mai 103                                                 */
/*   accordance-établissant 36                                          */
/* This long list is read in and split into separate stem entities      */
/*   accelerated-accélération accelerated-lutte accelerated-contre      */
/*   accordance-conseil accordance-mai accordance-établissant           */
/* With corresponding count stem items                                  */
/*   45 38 41                                                           */
/*   1107 103 36                                                        */
/* ---------------------------------------------------------------------*/
Split_data: Procedure Expose CountMB. StemMB.
trace o
USE ARG _FileStem.

tempMB      = .mutablebuffer~new('')
LeftWordMB  = .mutablebuffer~new('')   /* 1 word in input lang */
LeftWordsMB = .mutablebuffer~new('')   /* all different left words */
StemMB.     = .mutablebuffer~new('')   /* One entry per left word */
StemMB.0    = 0
CountMB.    = .mutablebuffer~new('')   /* The count for each left word */
CountMB.0   = 0

IF SysFileExists(_FileStem.OutputFile1) THEN
  qfileIn = .stream~new(_FileStem.OutputFile1)
ELSE DO
  say 'FATAL File does not exist:' _FileStem.OutputFile1
  Exit
END

/* Loop over entire output file and extract data for sorting */
DO WHILE qfileIn~lines <> 0
  tempMB = qfileIn~linein
  --say
  --say 'New Line' i 'in' tempMB
  /* This will only happen if there is a write conflict with 2 tr.rex */
  /* trying to write to the same file = should not happen in prod.    */
  IF tempMB~Pos('-') > 0 THEN DO
    LeftWordMB = tempMB~Word(1)~Left(tempMB~Pos('-')-1)
    Position = LeftWordsMB~WordPos(LeftWordMB)
  END
  ELSE DO
    say 'Corrupted Line :' || tempMB || ':'
    Iterate
  END
  --say 'LeftWordMB' LeftWordMB
  --say 'Position' Position
  IF Position > 0 THEN DO
    /* We already have this word, add the next translation */
    StemMB.Position = StemMB.Position tempMB~Word(1)
    CountMB.Position = CountMB.Position tempMB~Word(2)
    --say 'Already found'
    --say 'StemMB.Position' StemMB.Position
    --say 'CountMB.Position' CountMB.Position
  END
  ELSE DO
    LeftWordsMB = LeftWordsMB LeftWordMB
    Position = LeftWordsMB~WordPos(LeftWordMB)
    StemMB.Position = StemMB.Position tempMB~Word(1)
    CountMB.Position = CountMB.Position tempMB~Word(2)
    StemMB.0 = Position
    CountMB.0 = Position
    --say 'New Word pair'
    --say 'LeftWordsMB' LeftWordsMB
    --say 'StemMB.' || Position StemMB.Position
    --say 'CountMB.' || Position CountMB.Position
  END
END
qfileIn~Close

/* Get rid of temporary data file here */
res = SysFileDelete(_FileStem.OutputFile1)
--say 'Words found' LeftWordsMB
--say 'StemMB.0' StemMB.0
--say 'CountMB.0' CountMB.0
Return .nil

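
Split_data looks up each left word with WordPos in LeftWordsMB, a string that grows with every new word, and rebuilds that string by concatenation. A minimal sketch of the same grouping with a .Directory as the position index, as one possible variant to compare against. The names posOf, Stem. and Count. and the sample lines are hypothetical; whether this changes the memory behaviour reported above is untested.

```rexx
/* Hypothetical sketch: group word tuples by left word via a directory */
lines = .array~of('accelerated-lutte 38',  -
                  'accordance-mai 103',    -
                  'accelerated-contre 41')   /* sample input (assumed) */

posOf  = .directory~new            /* left word -> stem position        */
Stem.  = ''
Count. = ''
Stem.0 = 0

DO line OVER lines
  PARSE VAR line tuple cnt .       /* "left-right" and its count        */
  PARSE VAR tuple leftWord '-' .   /* text before the first hyphen      */
  IF \posOf~hasIndex(leftWord) THEN DO
    Stem.0 = Stem.0 + 1            /* first time we see this left word  */
    posOf[leftWord] = Stem.0
  END
  p = posOf[leftWord]
  Stem.p  = Stem.p tuple           /* append, as Split_data does        */
  Count.p = Count.p cnt
END
Count.0 = Stem.0

DO p = 1 TO Stem.0
  SAY p':' Stem.p~strip '->' Count.p~strip
END
```

The `[]` and `hasIndex` methods are used instead of `setEntry`/`entry` so that the lookup stays case-sensitive, matching what WordPos does in the routine.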
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel