Hi P.O.,
> executing other peoples code .. please give it a try
I can run code if you provide it as a platform-independent test case. I
have no Mac, and the Mac binaries you provide won't run on Ubuntu or
Windows, the platforms I can test on.
> tr.rex gets stuck in the routine split_data

The code in tr.rex is inefficient when applied to a large data set. This
is the likely cause of the very long run-time and high memory consumption
you experience. Let me give you an example of the gains that may be
achieved by coding things differently. Using string append and wordPos(),
this code will take a minute to execute for n = 100000:
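    n = 100000
    call time "r"                 -- reset the elapsed-time clock
    call random , , 42            -- fixed seed, for repeatable runs
    rs = ""                       -- blank-delimited list of values seen so far
    do n
      r = random(1, n % 2)
      if rs~wordPos(r) = 0 then do
        rs = rs r                 -- the string grows with every new value
        stem.r = r
      end
      else
        stem.r = stem.r r
    end
    say time("e")~format(, 2) "sec"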
Achieving something very similar using StringTable and Array will run in a
tenth of a second for the same n:
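    n = 100000                    -- same n as in the first example
    call time "r"                 -- reset the elapsed-time clock
    call random , , 42            -- fixed seed, for repeatable runs
    table = .StringTable~new(n)
    do n
      r = random(1, n % 2)
      if \table~hasIndex(r) then
        table[r] = .Array~of(r)   -- first occurrence: start a new Array
      else
        table[r]~append(r)        -- later occurrences: append to it
    end
    say time("e")~format(, 2) "sec"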
A change like this gives a 600-fold improvement, and the numbers you are
working with are much larger than 100000.
I also noted that the DE-EN-Cleaned.txt you provide contains more than
56% duplicate lines. Cleaning this might also bring some improvement.
The gains are getting
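
For the duplicate lines mentioned above, here is a minimal sketch of one way
to drop repeats while keeping the first occurrence of each line, using a Set
for the membership test. The sample lines are made up and stand in for the
contents of the real file; I have not run this against DE-EN-Cleaned.txt.

```rexx
-- hypothetical sample data standing in for the lines of the input file
lines = .Array~of("Haus house", "Baum tree", "Haus house", "Hund dog", "Baum tree")

seen = .Set~new          -- lines encountered so far
unique = .Array~new      -- first occurrence of each line, in input order
do line over lines
  if \seen~hasIndex(line) then do
    seen~put(line)
    unique~append(line)
  end
end
say unique~items "of" lines~items "lines kept"
```

The Set lookup does not slow down the way a wordPos() scan over a growing
string does, which is the same effect as in the StringTable example.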
On Wed, Jun 28, 2017 at 11:46 PM, P.O. Jonsson <oor...@jonases.se> wrote:
>
> Hälsningar/Regards/Grüsse,
> P.O. Jonsson
> oor...@jonases.se
>
> Hello again Erich,
>
> I know executing other people's code can be a p.i.t.a. but please give it a
> try. See it as a golden opportunity to stress test ooRexx :-)
>
> On 28.06.2017 at 21:21, Erich Steinböck <erich.steinbo...@gmail.com> wrote:
>
> Please download the complete test set and let it run and
>>
> I neither have a Mac nor do I have 50 GB of memory
>
>
> I can share my machine over remote logon if that would help or we can try
> to look at it using a shared screen. You do not need much memory to run the
> program, 5 GB is more than sufficient for ONE instance of the program, and
> that is enough to simulate the problem.
>
>
> I had a REPRODUCIBLE scenario where this problem occurs
>> Out of 1200 or so runs it was only this single run that produced memory
>> bloating
>>
> see if you can reproduce the memory problem
>>
> Can you explain the problem in more detail? What exactly happens when you
> run which command with what arguments? What are you expecting to happen
> instead and why?
>
>
> The problem is that the program tr.rex gets stuck in the routine
> split_data (in the main loop when I break it) or in sort_data (on the
> ~Stablesort, presumably) for 1000 times longer for certain intermediate
> data (read below) than for others. It is not so much more data compared to
> other runs that I would expect this memory load. While being in one of
> these routines the memory allocation for the rexx process goes up and up
> and up until you have no more memory (and start to swap). At the beginning
> the memory allocated to the rexx process is negligible so you can try it
> with any memory that runs.
>
>
>
> it finished in 7 hours 1200 individual ooRexx processes
>
> What does "1200 individual ooRexx processes" mean? Are you starting your
> program with 1200 different sets of arguments? Sequentially or in parallel?
> Which one of the programs shows the issue? Is it always the same one?
>
>
> In order to use all cores/threads on my machine I use a bash shell script
> to launch/spawn up to at most 24 instances of the same program in parallel,
> running on the *same* data but with different parameters, producing
> *different* intermediate data files (_RAW files) that are read and
> processed in Split_data and handed over to Sort_data. When one chunk of
> data is processed that process finishes (tr.rex exits) and another one is
> started to do the same over and over again up to around 1200 individual
> runs for one batch. There is only one rexx program and the problem only
> arises for specific parameters in combination with specific input data. I
> have provided you two examples, one that runs like a charm and another one
> that never finishes.
>
> why is the interpreter not warning me when I overwrite an object with a
>> string?
>>
> You're not overwriting an object with a string, you're changing a variable
> from referring to one object to referring to another one. That's totally
> normal .. similar to coding a = 1; a = 2;
>
>
> I don't think this is normal but never mind, I never liked objects anyway
> :-) When I started using Rexx the credo was „Everything is a string“. And
> I am still in the habit of programming like that, hence the code you see
> before you.
>
> In the past (4.1, 4.2?), if I did a *say myMutableBuffer* it reported „A
> Mutable Buffer“ or something; nowadays I get the value stored in the MB. Is
> there a way to check what kind of object you are referring to? A
> ~whatAreYou method would be useful when you look for mistakes in your code (I
> occasionally write imperfect code, unfortunately).
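
On the question of a ~whatAreYou method: every object answers the class
message, and isA tests against a specific class. A small sketch (the variable
names are only for illustration):

```rexx
mb = .MutableBuffer~new("abc")
s  = "abc"

say mb~class~id              -- name of the class mb belongs to
say s~class~id               -- name of the class s belongs to
say mb~isA(.MutableBuffer)   -- 1, mb is a MutableBuffer
say s~isA(.MutableBuffer)    -- 0, s is a plain String
```

A check like this right after a suspicious assignment shows whether a
variable still refers to the kind of object you expect.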
>
>
> On Tue, Jun 27, 2017 at 10:26 PM, P.O. Jonsson <oor...@jonases.se> wrote:
>
>> "maybe it is just bad programming“
>>
>> I guess I had it coming…
>>
>> Thanks Erich for your advice, I will consider it all, but my intention
>> with this report was another one; for the first time I had a REPRODUCIBLE
>> scenario where this problem occurs. Out of 1200 or so runs it was only this
>> single run that produced memory bloating so my assumption was that it was
>> not ONLY :-) bad programming.
>>
>> Please download the complete test set and let it run and see if you can
>> reproduce the memory problem I have. If so it is easy for you to just
>> improve the code and see where the problem goes away. I have a feeling I am
>> stuck at
>>
>> a = a~StableSort
>>
>> For quite some time, maybe because of unfavorable data. But I can't tell
>> for sure.
>>
>> PS I had the program run again overnight, it finished in 7 hours 1200
>> individual ooRexx processes with no problem. In another run I am now at 53
>> GB in a single process running at 100% CPU for 10 hours.
>>
>> Question on Mutable Buffers (there is a lot of *NEW* there): I understand
>> I need to ~append or ~insert for the MB but why is the interpreter not
>> warning me when I overwrite an object with a string? Why is that not an
>> error? Is there a reason why it should be allowed to destroy an object like
>> I did?
>>
>> Hälsningar/Regards/Grüsse,
>> P.O. Jonsson
>> oor...@jonases.se
>>
>>
>>
>>
>> On 27.06.2017 at 17:15, Erich Steinböck <erich.steinbo...@gmail.com> wrote:
>>
>> maybe it is just bad programming
>>
>> Hi P.O.,
>> I had a look at Split_data and as far as I can see there are a lot of
>> things which can be improved.
>>
>> 1)
>>
>> You may want to re-read how to work with a MutableBuffer. E. g.
>>
>> tempMB = .mutablebuffer~new('')
>> do while ..
>> tempMB = qfileIn~linein
>>
>> Initializing a variable with a MutableBuffer instance, and afterwards
>> assigning it a String (linein() returns a String) doesn't make sense.
>>
>> I can see quite a few instances of this, e. g.
>>
>> TranslatedMB = .mutablebuffer~new('')
>> do while ..
>> DO i=1 TO i_End
>> DO j=1 TO j_End
>>
>> TranslatedMB = TranslatedMB TranslateWordMB
>>
>> Again, the final TranslatedMB assignment is not what the ..MB ending of
>> the variables suggests.
>>
>> 2)
>>
>> You might move invariant stuff (here: LeftWordsMB~Word(i) || '-') in an
>> inner loop outside the loop, e.g.
>>
>> DO j=1 TO j_End
>> TranslateWordMB = LeftWordsMB~Word(i) || '-' ||
>> RightWordsMB~Word(j)
>>
>>
>> 3)
>>
>> Consider using a single startsWith() instead of the code between
>> lines 448 and 485
>>
>> 4)
>>
>> IF TranslatedMB~WordPos(TranslateWordMB) > 0 THEN
>> ..
>> ELSE
>> DO
>> TranslatedMB = TranslatedMB TranslateWordMB
>>
>> Instead of building a long string of all things seen before, and checking
>> with wordPos(), you might instead put all things seen into a Set and
>> check with hasIndex()
>>
>> 5)
>>
>> Generally, using Arrays may be more efficient if you can save the Stem.0
>> handling.
>> But then, using the proper type of Collection and an appropriate algorithm
>> may help much more.
>> To give suggestions for that, I'd need more detail on what exactly
>> you would like to achieve.
>>
>> On Tue, Jun 27, 2017 at 7:55 AM, P.O. Jonsson <oor...@jonases.se> wrote:
>>
>>> Dear developers,
>>>
>>> I have had the memory bloating problem again, this time I reached 48 GB
>>> (the maximum for one CPU in my machine) and the process only ended after
>>> some 13 CPU hours with 100% CPU the whole time.
>>>
>>>
>>>
>>>
>>> From the logging info I could confirm that the program was stuck
>>> somewhere here most of the time, here are the rough steps
>>>
>>> Language pairs detected in C routine -> External call, no memory
>>> bloating
>>> Data processing finished after 2107 Seconds 00:58:12
>>> Splitting finished after 49487 Seconds 14:42:59 *-> Routine Split_data*
>>> Sorting finished after 16527 Seconds 19:18:27 *-> Routine Sort_data*
>>> Processing of Data file finished after 68123 Seconds
>>> Writing the Logfile TR_DE-EN-eu_logfile.txt 26 Jun 2017 19:18:28
>>>
>>> I have enclosed the Routines in question.
>>>
>>> In my dropbox I have stored the complete program with some test data to
>>> replicate the processing, the problem is reproducible. Just put the folder
>>> somewhere, move there and perform the command indicated.
>>>
>>> https://www.dropbox.com/sh/vettlcb4f8ae3cw/AACWIQivo_F2Khhyt
>>> J6izkbFa?dl=0
>>>
>>> I run Open Object Rexx Version 5.0.0, Build date: May 20 2017,
>>> Addressing mode: 64
>>> Hardware Mac Pro with dual-CPU Xeon Processors running Mac OS Sierra
>>> 10.12.5
>>>
>>> PS as I was making the screenshot the process finished nicely, no crash
>>> or anything and the memory was released. So maybe it is just bad
>>> programming, but at least you can confirm that then :-)
>>>
>>>
>>>
>>>
>>> Hälsningar/Regards/Grüsse,
>>> P.O. Jonsson
>>> oor...@jonases.se
>>>
>>>
>>>
>>>
>>>
>>> ------------------------------------------------------------
>>> ------------------
>>> Check out the vibrant tech community on one of the world's most
>>> engaging tech sites, Slashdot.org <http://slashdot.org/>!
>>> http://sdm.link/slashdot
>>> _______________________________________________
>>> Oorexx-devel mailing list
>>> Oorexx-devel@lists.sourceforge.net
>>> https://lists.sourceforge.net/lists/listinfo/oorexx-devel
>>>
>>>
>>
>>
>>
>>
>>
>
>
>
>
>