Hi P.O.,

> executing other peoples code .. please give it a try

I can run code if you provide it as a platform-independent test case.
I have no Mac, and the Mac binaries you provide won't run on Ubuntu or
Windows, the platforms I can test on.

> tr.rex gets stuck in the routine split_data

The code in tr.rex is inefficient when applied to a large data set.
This is likely what causes the very long run-time and high memory
consumption you experience. Let me give you an example of the gains
that may be achieved by coding things differently. Using string
append and wordPos(), this code takes about a minute to execute for
n = 100000:
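
    n = 100000
    call time "r"                   -- reset the elapsed-time clock
    call random , , 42              -- fixed seed for repeatable results
    rs = ""
    do n
      r = random(1, n % 2)
      if rs~wordPos(r) = 0 then do  -- scans an ever-growing string
        rs = rs r
        stem.r = r
      end
      else
        stem.r = stem.r r
    end
    say time("e")~format(, 2) "sec"
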
Achieving something very similar with a StringTable and Arrays runs
in a tenth of a second for the same n:

    n = 100000
    call time "r"                   -- reset the elapsed-time clock
    call random , , 42              -- fixed seed for repeatable results
    table = .StringTable~new(n)
    do n
      r = random(1, n % 2)
      if \table~hasIndex(r) then    -- direct index lookup, no string scan
        table[r] = .Array~of(r)
      else
        table[r]~append(r)
    end
    say time("e")~format(, 2) "sec"

A change like this gives a 600-fold improvement, and the numbers you
are working with are much larger than 100000.
I also noticed that the DE-EN-Cleaned.txt you provide contains more
than 56% duplicate lines; cleaning this up might also bring some
improvement.
The gains are getting
On Wed, Jun 28, 2017 at 11:46 PM, P.O. Jonsson <oor...@jonases.se> wrote:
Hälsningar/Regards/Grüsse,
P.O. Jonsson
oor...@jonases.se
Hello again Erich,
I know executing other people's code can be a p.i.t.a., but please
give it a try. See it as a golden opportunity to stress test ooRexx :-)
On 28.06.2017 at 21:21, Erich Steinböck <erich.steinbo...@gmail.com> wrote:
> > Please download the complete test set and let it run and
> I neither have a Mac nor do I have 50 GB of memory

I can share my machine over remote logon if that would help, or we
can try to look at it using a shared screen. You do not need much
memory to run the program; 5 GB is more than sufficient for ONE
instance of the program, and that is enough to simulate the problem.

> > I had a REPRODUCIBLE scenario where this problem occurs
> > Out of 1200 or so runs it was only this single run that
> > produced memory bloating
> > see if you can reproduce the memory problem
> Can you explain the problem in more detail? What exactly happens
> when you run which command with what arguments? What are you
> expecting to happen instead and why?

The problem is that the program tr.rex gets stuck in the routine
split_data (in the main loop, when I break it) or in sort_data (on
the ~StableSort, presumably) for 1000 times longer for certain
intermediate data (read below) than for others. The amount of data
is not so much larger than in other runs that I would expect this
memory load. While the program is in one of these routines, the
memory allocated to the rexx process goes up and up until no memory
is left (and the machine starts to swap). At the beginning the
memory allocated to the rexx process is negligible, so you can try
it with whatever memory you have.

> > it finished in 7 hours 1200 individual ooRexx processes
> What does "1200 individual ooRexx processes" mean? Are you
> starting your program with 1200 different sets of arguments?
> Sequentially or in parallel? Which one of the programs shows the
> issue? Is it always the same one?

In order to use all cores/threads on my machine I use a bash shell
script to launch at most 24 instances of the same program in
parallel, running on the *same* data but with different parameters,
producing *different* intermediate data files (_RAW files) that are
read and processed in Split_data and handed over to Sort_data. When
one chunk of data has been processed, that process finishes (tr.rex
exits) and another one is started to do the same, over and over
again, up to around 1200 individual runs for one batch. There is
only one rexx program, and the problem only arises for specific
parameters in combination with specific input data. I have provided
you with two examples, one that runs like a charm and another one
that never finishes.

> > why is the interpreter not warning me when I overwrite an
> > object with a string?
> You're not overwriting an object with a string, you're changing
> a variable from referring to one object to referring to another
> one. That's totally normal .. similar to coding a = 1; a = 2;

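The rebinding described here can be illustrated with a small sketch. It is not taken from tr.rex; the variable names buf and other are made up for the illustration. It shows that assigning a String to a variable leaves the MutableBuffer it used to name untouched, as long as something else still refers to it.

```rexx
buf = .MutableBuffer~new("abc")   -- buf refers to a MutableBuffer
other = buf                       -- a second name for the same object
buf = "a plain string"            -- rebinds buf; the buffer is untouched
other~append("def")               -- the MutableBuffer is still alive
say other~string                  -- abcdef
say buf                           -- a plain string
```

Without the second variable the buffer would simply become unreachable after the assignment and be reclaimed by the garbage collector, which is why no warning is issued.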
I don't think this is normal, but never mind, I never liked objects
anyway :-) When I started using Rexx the credo was „Everything is
a string“, and I am still in the habit of programming like that,
hence the code you see before you.
In the past (4.1, 4.2?), if I did a /say myMutableBuffer/ it
reported „A Mutable Buffer“ or something; nowadays I get the value
stored in the MB. Is there a way to check what kind of object you
are referring to, something like a ~whatAreYou method? That would be
useful when you look for mistakes in your code (I occasionally write
imperfect code, unfortunately).
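One possible way to ask an object what it is, sketched here as a suggestion and not as something stated in this thread: every ooRexx object answers the class method, and isInstanceOf tests membership in a class. The variable name thing is made up for the illustration.

```rexx
thing = .MutableBuffer~new("some text")
say thing~class~id                        -- prints the name of the class
say thing~isInstanceOf(.MutableBuffer)    -- 1
say thing~isInstanceOf(.String)           -- 0
thing = "some text"                       -- now refers to a String
wasRebound = \thing~isInstanceOf(.MutableBuffer)
say wasRebound                            -- 1
```

A check like the last one, placed after a suspicious assignment, would flag the kind of accidental rebinding discussed above.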
On Tue, Jun 27, 2017 at 10:26 PM, P.O. Jonsson <oor...@jonases.se> wrote:
> maybe it is just bad programming
I guess I had it coming…
Thanks Erich for your advice. I will consider it all, but my
intention with this report was a different one: for the first
time I had a REPRODUCIBLE scenario where this problem
occurs. Out of 1200 or so runs it was only this single run
that produced memory bloating, so my assumption was that it
was not ONLY :-) bad programming.
Please download the complete test set and let it run and see
if you can reproduce the memory problem I have. If so, it is
easy for you to just improve the code and see where the
problem goes away. I have a feeling I am stuck at

    a = a~StableSort

for quite some time, maybe because of unfavorable data. But
I can't tell for sure.
PS I had the program run again overnight; it finished 1200
individual ooRexx processes in 7 hours with no problem. In
another run I am now at 53 GB in a single process running at
100% CPU for 10 hours.
Question on Mutable Buffers (there is a lot of *NEW* there):
I understand I need to ~append or ~insert for the MB but why
is the interpreter not warning me when I overwrite an object
with a string? Why is that not an error? Is there a reason
why it should be allowed to destroy an object like I did?
Hälsningar/Regards/Grüsse,
P.O. Jonsson
oor...@jonases.se
On 27.06.2017 at 17:15, Erich Steinböck <erich.steinbo...@gmail.com> wrote:
maybe it is just bad programming
Hi P.O.,
I had a look at Split_data and as far as I can see there
are a lot of things which can be improved.

1)
You may want to re-read how to work with a MutableBuffer.
E.g.

    tempMB = .mutablebuffer~new('')
    do while ..
      tempMB = qfileIn~linein

Initializing a variable with a MutableBuffer instance, and
afterwards assigning it a String (linein() returns a
String) doesn't make sense.
I can see quite a few instances of this, e.g.

    TranslatedMB = .mutablebuffer~new('')
    do while ..
      DO i=1 TO i_End
        DO j=1 TO j_End
          TranslatedMB = TranslatedMB TranslateWordMB

Again, the final TranslatedMB assignment is not what the
..MB ending of the variable names suggests.
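
As a minimal sketch of what point 1 asks for (the sample data and the variable names sample, entry and line are made up; this is not the code from tr.rex): the MutableBuffer is modified with append() and never reassigned, while each String gets its own variable.

```rexx
sample = .Array~of("Haus-house", "Baum-tree", "Hund-dog")  -- hypothetical data
translatedMB = .MutableBuffer~new       -- stays a MutableBuffer throughout
do entry over sample
  line = entry                          -- a String gets its own variable
  if translatedMB~length > 0 then translatedMB~append(" ")
  translatedMB~append(line)             -- modify the buffer in place
end
say translatedMB~string                 -- Haus-house Baum-tree Hund-dog
```

In the real program, line would be the result of qfileIn~linein.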

2)
You might move loop-invariant expressions (here:
LeftWordsMB~Word(i) || '-') out of the inner loop, e.g. in

    DO j=1 TO j_End
      TranslateWordMB = LeftWordsMB~Word(i) || '-' || RightWordsMB~Word(j)
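
A sketch of the hoisting suggested in point 2, under the assumption that the left and right words are plain blank-delimited strings (the data and the names leftWords, rightWords, prefix and pairs are illustrative only):

```rexx
leftWords  = "der die das"              -- hypothetical sample data
rightWords = "the a"
pairs = .Array~new
do i = 1 to leftWords~words
  prefix = leftWords~word(i) || "-"     -- computed once per outer iteration
  do j = 1 to rightWords~words
    pairs~append(prefix || rightWords~word(j))
  end
end
say pairs~items                         -- 6
```

The saving per iteration is small, but word() has to scan the string from the start on every call, so it adds up in a deeply nested loop.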

3)
Consider using a single startsWith() instead of the
code between lines 448 and 485.

4)

    IF TranslatedMB~WordPos(TranslateWordMB) > 0 THEN
      ..
    ELSE
      DO
        TranslatedMB = TranslatedMB TranslateWordMB

Instead of building a long string of all things seen
before, and checking with wordPos(), you might instead put
all things seen into a Set and check with hasIndex().
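
Point 4 could look roughly like the following sketch. The input data and the names seen and unique are invented for the illustration; only the Set with hasIndex() comes from the suggestion above.

```rexx
seen = .Set~new                         -- everything encountered so far
unique = .Array~new                     -- first occurrences, in input order
do word over .Array~of("a-b", "c-d", "a-b", "e-f", "c-d")  -- hypothetical data
  if \seen~hasIndex(word) then do       -- lookup cost does not grow with size
    seen~put(word)
    unique~append(word)
  end
end
say unique~items                        -- 3
```

Unlike wordPos() on a growing string, the lookup does not have to walk through everything collected so far.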

5)
Generally, using Arrays may be more efficient if that lets you
avoid the Stem.0 handling.
But then, using the proper type of Collection and an
appropriate algorithm may help much more.
To give suggestions for that, I'd need more detail on
what exactly you would like to achieve.
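
To illustrate the Stem.0 handling mentioned in point 5, here is a small side-by-side sketch with made-up data and variable names (not code from tr.rex):

```rexx
items = .Array~of("x", "y", "z")        -- hypothetical data

-- Stem: the count in lines.0 is maintained by hand
lines.0 = 0
do item over items
  n = lines.0 + 1
  lines.n = item
  lines.0 = n
end

-- Array: the collection tracks its own size
list = .Array~new
do item over items
  list~append(item)
end

say lines.0 list~items                  -- 3 3
```
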
On Tue, Jun 27, 2017 at 7:55 AM, P.O. Jonsson <oor...@jonases.se> wrote:
Dear developers,
I have had the memory bloating problem again, this time
I reached 48 GB (the maximum for one CPU in my machine)
and the process only ended after some 13 CPU hours with
100% CPU the whole time.
From the logging info I could confirm that the program
was stuck somewhere here most of the time. Here are the
rough steps:

    Language pairs detected in C routine -> External call, no memory bloating
    Data processing finished after 2107 Seconds 00:58:12
    Splitting finished after 49487 Seconds 14:42:59 *-> Routine Split_data*
    Sorting finished after 16527 Seconds 19:18:27 *-> Routine Sort_data*
    Processing of Data file finished after 68123 Seconds
    Writing the Logfile TR_DE-EN-eu_logfile.txt 26 Jun 2017 19:18:28

I have enclosed the Routines in question.
In my dropbox I have stored the complete program with
some test data to replicate the processing, the problem
is reproducible. Just put the folder somewhere, move
there and perform the command indicated.
https://www.dropbox.com/sh/vettlcb4f8ae3cw/AACWIQivo_F2KhhytJ6izkbFa?dl=0
I run Open Object Rexx Version 5.0.0, Build date: May
20 2017, Addressing mode: 64
Hardware Mac Pro with dual-CPU Xeon Processors running
Mac OS Sierra 10.12.5
PS as I was making the screenshot the process finished
nicely, no crash or anything and the memory was
released. So maybe it is just bad programming, but at
least you can confirm that then :-)
Hälsningar/Regards/Grüsse,
P.O. Jonsson
oor...@jonases.se
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the
world's most
engaging tech sites, Slashdot.org
<http://slashdot.org/>! http://sdm.link/slashdot
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
<mailto:Oorexx-devel@lists.sourceforge.net>
https://lists.sourceforge.net/lists/listinfo/oorexx-devel
<https://lists.sourceforge.net/lists/listinfo/oorexx-devel>