Hi P.O.,
I think the optimizations suggested by others are very relevant to your use
case. Looking at your you perform many operations that are linear and are
performed for every input word so roughly quadratic in runtime. Here are
some examples:

Position   = LeftWordsMB~WordPos(LeftWordMB)

The runtime of WordPos is linear in the length of the receiver object, i.e.
LeftWordMB (which is a string).

LeftWordsMB = LeftWordsMB LeftWordMB

This constructs a new object out of LeftWordsMB and LeftWordMB. It
allocates memory and copies data, so its time complexity roughly linear in
the length of the string being combined.

Both operations are performed for every input line(!), which leads to an
approximately quadratic runtime (actually lines*words, but good enough for
our analysis). Processing a million lines with a quadratic algorithm can be
expected to timeout and given you're allocating memory in a loop, also
exhaust Rexx's heap.

Note that ooRexx can only trigger garbage collection cycles at very
specific places in the code. I'm not sure which they are right out of my
head but sometimes tight loops with memory allocation can prevent the GC
from running. You should make use of better data structured to reduce the
run-time complexity of your algorithm, then it'll be much easier to help
you. See the suggestions from the other thread, i.e. use hashing to lookup
counts.

Moritz


On Fri, Jun 30, 2017 at 12:15 PM, P.O. Jonsson <oor...@jonases.se> wrote:

> Dear all,
>
> Just for the record: I did NOT ask anyone to *debug* my code, I
> volunteered a test case that could systematically exhaust the memory of my
> machine for a certain configuration whereas for all other data it performed
> well. Had I desired to be ridiculed I would have turned to Facebook or
> Twitter :-(
>
> I have received constructive feedback from Rick, Erich and others, I know
> what I have to do, but again, that was not the point with my posting; for
> that I would post to oorexx-users, and I have done so earlier so all that
> is in the pipeline.
>
> What I have made is just a proof-of-concept, not a candidate for Miss
> Universe. In its crappy state it is nevertheless producing the correct data
> output, and the program has ben run some 50 000 times in the last two weeks
> or so, and only in a few cases (five maybe) do I have the problem I have
> described here. I have harvested more useful translation data with this
> program than dict.cc (for EN -> FR and FR -> DE at least) can provide and
> that using a single corpus, the DCEP.
>
> @Erich,
>
> If you are still interested in seeing the behavior I have made a
> monolithic test case that should run on any platform. I have tested it om
> Mac OS,  Windows 7 and Windows 10 and the behavior is the same in all
> places.
>
> Testcases 1-4 run just fine, with the Rexx process starting nicely at
> 40-50 MB and going up to a maximum of 100 MB before exiting.
> Testcase 5, the problem case, quickly start to allocate memory and
> continue to do so until there is none left.
>
> I am still puzzled how the same program can allocate 1000 times more
> memory for the same processing of different data.
>
> Download from my Dropbox folder, there is no need to sign in or anything,
> just download & unpack  everything in a folder where you want it.
>
> https://www.dropbox.com/sh/vettlcb4f8ae3cw/AACWIQivo_F2KhhytJ6izkbFa?dl=0
>
>
> Hälsningar/Regards/Grüsse,
> P.O. Jonsson
> oor...@jonases.se
>
>
>
>
>
> ------------------------------------------------------------
> ------------------
> Check out the vibrant tech community on one of the world's most
> engaging tech sites, Slashdot.org! http://sdm.link/slashdot
> _______________________________________________
> Oorexx-devel mailing list
> Oorexx-devel@lists.sourceforge.net
> https://lists.sourceforge.net/lists/listinfo/oorexx-devel
>
>


-- 
Moritz Hoffmann;
http://antiguru.de/
------------------------------------------------------------------------------
Check out the vibrant tech community on one of the world's most
engaging tech sites, Slashdot.org! http://sdm.link/slashdot
_______________________________________________
Oorexx-devel mailing list
Oorexx-devel@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/oorexx-devel

Reply via email to