Roger

Sorry to be a pain.  Naturally I owe you and the forum
an apology, yet again.

In preparing example values to send to you,  I've
discovered the cause of the poor performance which
does seem to lie elsewhere than in the aggregation,
as you clearly suspected.  I did produce nice big
arrays of nj etc, but needn't bother you with them.

It appears that the "mj's" described earlier do indeed
become very long integers in the later stages.  The next
stage comes to grief not with the aggregation but with
the attempt to assign those mj's into the new array,
along the lines of

  for_iji. i.#mj do.
  ...
  ijk  =. (a contiguous block of indices into new array nj1)
  mji  =. iji { mj        NB. some mj ~ 2^30 ... 2^50
  nj1  =. mji ijk } nj1
  ...
  end.

jpm shows the indexed assignment to nj1 is the time-waster.
Presumably the machine starts thrashing for memory.

Of course the alternative approach saving the indices
rather than the large counts themselves

  ij1 =. iji ijk } ij1

doesn't suffer in the same way, at the cost of a less
obvious method of aggregation.
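
A minimal sketch of the index-array idea, using the toy
values from the earlier message quoted below.  The boxed
list "blocks" and the verb "fill" are hypothetical names
for illustration only, and these toy blocks are not
contiguous as the real ones are:

```j
NB. ASSUMPTION: blocks holds, per mj, the slots of the new array it fills
mj     =: x: 180 110
blocks =: 0 1 2 6 7 ; 3 4 5 8 9
vj1    =: 3 4 5 3 4 5 3 4 4 5

fill =: 3 : 0
 ij1 =. (+/ #&> y) $ 0        NB. small machine integers only
 for_iji. i. # y do.
   ijk =. > iji { y           NB. slots for this mj
   ij1 =. iji ijk } ij1       NB. store the index, not the big count
 end.
 ij1
)

ij1 =: fill blocks            NB. 0 0 0 1 1 1 0 0 1 1
vj1 (+//.) ij1 { mj           NB. 470 580 400
(vj1 </. ij1) (+/@:{)&> <mj   NB. same sums, without building ij1 { mj
```

The last line only ever selects from the short mj list,
group by group, so the long extended array is never built.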

So there are two lessons for me if no one else:
0)  Don't try to create multiple copies of large exact
integers when an index array will do;
1)  Do try to discover the source of a performance
problem before bothering others with it.  (Often
easier said than done.)

Many years of programming have however convinced me
that you (I) often only solve an inscrutable problem
through describing/explaining it to others - so thanks
for all the help.

Now for J602

Best wishes

Mike


Roger Hui wrote:
If you are really interested in the answer, please provide the vj ij nj values that exhibit the described behaviour.

   (3!:1 vj) 1!:2 <'\junk\vj'
   (3!:1 ij) 1!:2 <'\junk\ij'
   (3!:1 nj) 1!:2 <'\junk\nj'

Then send the 3 files, preferably zipped.



----- Original Message -----
From: Mike Day <[EMAIL PROTECTED]>
Date: Saturday, September 29, 2007 11:45
Subject: Re: [Jprogramming] Performance of exact keyed aggregate
To: Programming forum <[email protected]>

Thanks to Roger and Raul for their thoughts.
My original message on 28/9 didn't make clear
that the results of one stage of aggregation
supplied input to the next stage,  so the
exact placement of the x: wasn't as crucial
as it appeared from my flawed presentation.

So I'm afraid all the discussion has been on my
misunderstanding of floats and integers and has
not focused on the nub of my enquiry - why does
the version with indirect summation terminate in
finite time while the direct version doesn't
seem to?

This difficulty remains even when I resort to
Roger's preferred placement of the x: which I
now use in the following.

To illustrate the staged or cyclic process,
if we had:

  vj,:nj NB. stage j   - toy data
1  2  1  2  1  2  1  2  1
20 30 40 30 40 20 40 30 40

these would lead to

  [mj =: vj (+//.) nj NB. aggregate nj grouped by vj
180 110

The next stage j1 = j+1 scatters the mj around and
forms new values vj1  (the v's are always small integers
 - integer arrays in fact, but scalars do for the example)
so we might have

  vj1,:nj1 NB. stage j+1 NB. new v's, n's from stage j
3 4 5 3 4 5 3 4 4 5
180 180 180 110 110 110 180 180 110 110

and

   [mj1 =: vj1 (+//.) nj1
470 580 400

All n0 = 1 so it's somewhat immaterial whether I force
extended (exact) integers
with         vj(+//.) x: nj
or with     x: vj (+//.) nj

In practice, the m's are one or two orders of
magnitude larger than the n's in each stage, so we hit
the integer limits somewhere around stage 10 to 20.

However
a) the process does not produce the "correct" result at
the final stage if it is too large for datatype integer.
It is of course approximately correct.
b) the process does not terminate if I force extended
integers with the recommended vj(+//.) x: nj
c) the process does terminate with the "correct" result
if instead I use
(vj (</.) ij)(+/@:{)every <x:nj
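
As a small check, forms (b) and (c) can be compared on the
toy data above.  This is only a sketch: it assumes ij is
simply i.#nj, and it writes "every" as &> so that it does
not depend on the standard library being loaded:

```j
NB. toy data from above; ij is ASSUMED to be i.#nj
vj =: 1 2 1 2 1 2 1 2 1
nj =: 20 30 40 30 40 20 40 30 40
ij =: i. # nj

direct   =: vj (+//.) x: nj                  NB. form (b)
indirect =: (vj (</.) ij) (+/@:{)&> < x: nj  NB. form (c)

direct                NB. 180 110
direct -: indirect    NB. 1
3!:0 direct           NB. 64, i.e. extended integer
```

On data this small both forms return at once, so the check
shows only that they agree in value and type, not why one
is so much slower on the real arrays.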

Sorry about the red herring.  I think there is an
interesting problem here despite my distracting use of
x:@+/   - maybe not so entirely unimportant despite my
earlier remarks!

I hadn't noticed the appearance of j602 beta until today
- I'll download it and see if it makes any difference.

Thanks again

Mike
..............

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
