[Oorexx-devel] High memory usage in ooRexx

P.O. Jonsson Mon, 26 Jun 2017 22:56:58 -0700

Dear developers,

The memory bloat problem has occurred again. This time the process reached 48 GB (the maximum for one CPU in my machine) and only ended after some 13 CPU hours, at 100% CPU the whole time.

From the logging info I could confirm that the program spent most of its time in the steps below. Here is the rough sequence:

Language pairs detected in C routine -> External call, no memory bloating

Data processing finished after 2107 Seconds 00:58:12

Splitting finished after 49487 Seconds 14:42:59 -> Routine Split_data

Sorting finished after 16527 Seconds 19:18:27 -> Routine Sort_data

Processing of Data file finished after 68123 Seconds

Writing the Logfile TR_DE-EN-eu_logfile.txt 26 Jun 2017 19:18:28

I have enclosed the routines in question below.

My Dropbox folder contains the complete program with some test data to replicate the processing; the problem is reproducible. Just put the folder somewhere, change into it, and run the command indicated.

https://www.dropbox.com/sh/vettlcb4f8ae3cw/AACWIQivo_F2KhhytJ6izkbFa?dl=0

I run Open Object Rexx Version 5.0.0, Build date: May 20 2017, Addressing mode: 64

Hardware: Mac Pro with dual Xeon CPUs, running macOS Sierra 10.12.5

PS: As I was making the screenshot, the process finished cleanly, with no crash, and the memory was released. So maybe it is just bad programming, but at least you can confirm that then :-)

/* -------------------------------------------------------------------- */
/* Sort the data in the word files                                      */
/* Use a barrel shifter and keep the top 5 words in each combo          */
/* Todo                                                                 */
/* To speed up sorting maybe a shorter list or better sorting           */
/* algorithm must be used                                               */
/* -------------------------------------------------------------------- */
Sort_data: Procedure Expose CountMB. StemMB.


  trace o

  Top01MB = .mutablebuffer~new
  Top02MB = .mutablebuffer~new
  Top03MB = .mutablebuffer~new
  Top04MB = .mutablebuffer~new
  Top05MB = .mutablebuffer~new

  Word01MB = .mutablebuffer~new
  Word02MB = .mutablebuffer~new
  Word03MB = .mutablebuffer~new
  Word04MB = .mutablebuffer~new
  Word05MB = .mutablebuffer~new

  a = .array~new

  DO i=1 TO CountMB.0

/* Reset this list once for each word 2-tuple                           */
      Top01MB = 0
      Top02MB = 0
      Top03MB = 0
      Top04MB = 0
      Top05MB = 0

      Word01MB = 'NIL-NIL'
      Word02MB = 'NIL-NIL'
      Word03MB = 'NIL-NIL'
      Word04MB = 'NIL-NIL'
      Word05MB = 'NIL-NIL'

    DO j= 1 TO CountMB.i~Words

--say 'CountMB.'i'~Word('j')' CountMB.i~Word(j)

/* Store here only the Top-5 of the Iceberg             */
      SELECT
        WHEN CountMB.i~Word(j) > Top01MB THEN
        DO
          Top05MB = Top04MB
          Top04MB = Top03MB
          Top03MB = Top02MB
          Top02MB = Top01MB
          Top01MB = CountMB.i~Word(j)

          Word05MB = Word04MB
          Word04MB = Word03MB
          Word03MB = Word02MB
          Word02MB = Word01MB
          Word01MB = StemMB.i~Word(j)
        END

        WHEN CountMB.i~Word(j) > Top02MB THEN
        DO
          Top05MB = Top04MB
          Top04MB = Top03MB
          Top03MB = Top02MB
          Top02MB = CountMB.i~Word(j)

          Word05MB = Word04MB
          Word04MB = Word03MB
          Word03MB = Word02MB
          Word02MB = StemMB.i~Word(j)
        END

        WHEN CountMB.i~Word(j) > Top03MB THEN
        DO
          Top05MB = Top04MB
          Top04MB = Top03MB
          Top03MB = CountMB.i~Word(j)

          Word05MB = Word04MB
          Word04MB = Word03MB
          Word03MB = StemMB.i~Word(j)
        END

        WHEN CountMB.i~Word(j) > Top04MB THEN
        DO
          Top05MB = Top04MB
          Top04MB = CountMB.i~Word(j)

          Word05MB = Word04MB
          Word04MB = StemMB.i~Word(j)
        END

        WHEN CountMB.i~Word(j) > Top05MB THEN
        DO
          Top05MB = CountMB.i~Word(j)

          Word05MB = StemMB.i~Word(j)
        END
        OTHERWISE Iterate       /* ignore lower counts */
      END /* SELECT */
    END j

/* One tuple is sorted, store back to original stem items               */
/* If less than 5 stem will end with 'NIL's and 0s                      */
--    StemMB.i  = Word01MB Word02MB Word03MB Word04MB Word05MB
--    CountMB.i = Top01MB Top02MB Top03MB Top04MB Top05MB

      a[i] = Word01MB Word02MB Word03MB Word04MB Word05MB Top01MB Top02MB Top03MB Top04MB Top05MB

  END i

/* The stem items are now sorted internally                             */
/* Now sort all items globally                                          */
/* This is not efficient or elegant programming, Q&D but it works       */
  a = a~StableSort

  i=0
  DO item over a
    i = i+1
    StemMB.i = item~Word(1) item~Word(2) item~Word(3) item~Word(4) item~Word(5)
    CountMB.i = item~Word(6) item~Word(7) item~Word(8) item~Word(9) item~Word(10)
  END

  StemMB.0 = i
  CountMB.0 = i

  Drop i j a, 
       Top01MB Top02MB Top03MB Top04MB Top05MB,
       Word01MB Word02MB Word03MB Word04MB Word05MB

Return .nil
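
A minimal sketch only, not part of the original program: the inner loop of Sort_data calls Word(j) several times per iteration, and each call presumably has to scan the string from its start. One possible alternative is to split each list once with the String method subWords and keep the top 5 in small arrays. The sample data and the names counts, words, topCount and topWord are made up for illustration.

```rexx
/* Hypothetical sample data, not taken from the real input file */
counts = '45 38 41 1107 103 36'~subWords       /* array of counts     */
words  = 'a-u a-v a-w a-x a-y a-z'~subWords    /* array of word pairs */

topCount = .array~of(0, 0, 0, 0, 0)
topWord  = .array~of('NIL-NIL', 'NIL-NIL', 'NIL-NIL', 'NIL-NIL', 'NIL-NIL')

DO j = 1 TO counts~items
  c = counts[j]
  DO k = 1 TO 5
    IF c > topCount[k] THEN
    DO
      /* Shift the lower entries down one slot, dropping the old slot 5 */
      DO m = 5 TO k + 1 BY -1
        topCount[m] = topCount[m - 1]
        topWord[m]  = topWord[m - 1]
      END
      topCount[k] = c
      topWord[k]  = words[j]
      LEAVE k                       /* this count has been placed */
    END
  END k
END j

/* Show the five slots, highest count first */
DO k = 1 TO 5
  SAY topWord[k] topCount[k]
END
```

The ranking rule is the same as in the SELECT above (strictly greater counts move up, ties keep the earlier word); only the bookkeeping differs.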

/* ---------------------------------------------------------------------*/
/* Split the data in word stems, one word per stem                      */
/* The input file contains a long list of word-tuples with counts       */
/* accelerated-accélération 45                                          */
/* accelerated-lutte 38                                                 */
/* accelerated-contre 41                                                */
/* accordance-conseil 1107                                              */
/* accordance-mai 103                                                   */
/* accordance-établissant 36                                            */
/* This long list is read in and split into separate stem entities      */
/* accelerated-accélération accelerated-lutte accelerated-contre        */
/* accordance-conseil accordance-mai accordance-établissant             */
/* With corresponding count stem items                                  */
/* 45 38 41                                                             */
/* 1107 103 36                                                          */
/* ---------------------------------------------------------------------*/
Split_data: Procedure Expose CountMB. StemMB.
  trace o

  USE ARG _FileStem.

  tempMB      = .mutablebuffer~new('')
  LeftWordMB  = .mutablebuffer~new('')          /* 1 word in input lang */
  LeftWordsMB = .mutablebuffer~new('')          /* all different left words */

  StemMB.   = .mutablebuffer~new('')            /* One entry per left word */
  StemMB.0  = 0
  CountMB.  = .mutablebuffer~new('')            /* The count for each left word */
  CountMB.0 = 0

  IF SysFileExists(_FileStem.OutputFile1) THEN qfileIn = .stream~new(_FileStem.OutputFile1)
  ELSE
  DO
    say 'FATAL File does not exist:' _FileStem.OutputFile1
    Exit
  END

/* Loop over entire output file and extract data for sorting            */
  DO WHILE qfileIn~lines <> 0

    tempMB = qfileIn~linein

--say
--say 'New Line' i 'in' tempMB

/*
   This will only happen if there is a write conflict with 2 tr.rex
   trying to write to the same file = should not happen in prod.
*/
    IF tempMB~Pos('-') > 0 THEN
    DO
      LeftWordMB = tempMB~Word(1)~Left(tempMB~Pos('-')-1)
      Position   = LeftWordsMB~WordPos(LeftWordMB)
    END
    ELSE
    DO
      say 'Corrupted Line :' || tempMB || ':'
      Iterate
    END

--say 'LeftWordMB' LeftWordMB
--say 'Position' Position

    IF Position > 0 THEN
    DO
/* We already have this word, add the next translation                  */
      StemMB.Position = StemMB.Position tempMB~Word(1)
      CountMB.Position = CountMB.Position tempMB~Word(2)
--say 'Already found'
--say 'StemMB.Position' StemMB.Position
--say 'CountMB.Position' CountMB.Position
    END
    ELSE
    DO
      LeftWordsMB = LeftWordsMB LeftWordMB
      Position = LeftWordsMB~WordPos(LeftWordMB)
      StemMB.Position = StemMB.Position tempMB~Word(1)
      CountMB.Position = CountMB.Position tempMB~Word(2)
      StemMB.0  = Position
      CountMB.0 = Position
--say 'New Word pair'
--say 'LeftWordsMB' LeftWordsMB
--say 'StemMB.' || Position StemMB.Position
--say 'CountMB.' || Position CountMB.Position
    END
  END

  qfileIn~Close

/* Get rid of temporary data file here                  */
 res = SysFileDelete(_FileStem.OutputFile1)

--say 'Words found' LeftWordsMB
--say 'StemMB.0' StemMB.0
--say 'CountMB.0' CountMB.0


Return .nil
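
A minimal sketch only, not part of the original program: Split_data looks up every left word with WordPos in an ever-growing string and rebuilds the StemMB. and CountMB. strings by concatenation on each line, which might contribute to the long run time and memory churn. One possible alternative, assuming a .Directory used as a case-sensitive index and one MutableBuffer per stem item extended with append, could look like this. The sample lines and the names lines, index, Stem. and Count. are hypothetical.

```rexx
/* Hypothetical sample input in the "pair count" format described above */
lines = .array~of('accelerated-lutte 38', 'accordance-mai 103', 'accelerated-contre 41')

index  = .directory~new             /* left word -> stem position */
Stem.  = ''
Count. = ''
Stem.0 = 0

DO line OVER lines
  PARSE VAR line pair count .
  PARSE VAR pair leftWord '-' .     /* text before the first '-' */

  IF \index~hasIndex(leftWord) THEN
  DO
    /* First time this left word is seen: open a new stem item */
    n = Stem.0 + 1
    Stem.0 = n
    index[leftWord] = n
    Stem.n  = .mutablebuffer~new
    Count.n = .mutablebuffer~new
  END

  n = index[leftWord]
  Stem.n~append(pair || ' ')        /* extend in place, no new string */
  Count.n~append(count || ' ')
END

DO n = 1 TO Stem.0
  SAY Stem.n~string~strip '->' Count.n~string~strip
END
```

The hasIndex and [] methods are used here rather than hasEntry and setEntry because the latter uppercase the name, which would merge words that differ only in case.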

Hälsningar/Regards/Grüsse,

P.O. Jonsson

oor...@jonases.se
