I'd want to see a detailed reference on this issue (~.!.0 on non-numeric arrays) before blowing another day or longer trying to reproduce the problem with that change.
Alternatively, I'd want to get into the C implementation and find out how this could happen. That should perhaps be done as a theoretical exercise (understanding how the algorithm works and how it can fail) rather than as a practical one.

Please also keep in mind that I have not eliminated hardware flaws from the plausible cause list. Memory corruption (or things equivalent to memory corruption, such as an intermittently failing logic component) is an all-too-likely possibility.

Thanks,

--
Raul

On Tue, Aug 19, 2014 at 5:15 PM, Henry Rich <[email protected]> wrote:

> ~.!.0 as I understand it uses a different algorithm from ~. even on
> nonnumerics, and might be worth trying.
>
> I am sure that ~.!.0 is much faster than ~. of floating-point arrays of
> rank > 1. I think ~. is OK when the rank is 1.
>
> Henry Rich
>
> On 8/19/2014 2:11 PM, Raul Miller wrote:
>
>> Please include the current time in the sequence of timestamps. The code was
>> still running at the point in time where I posted my email.
>>
>> That said, at this point, my attempt to interrupt succeeded, and I have
>> found the line of code which was stalled:
>>    data=. ~.data
>>
>> And, here is what it looks like (extra indent because the code is halted):
>>       $data
>> 194238 25
>>       3!:0 data
>> 32
>>
>> And at this point I understand what to do about this issue. I've solved
>> this before in a different context.
>>
>> Here's a slightly larger view of that bit of code:
>>
>>    assert. 2=#$data
>>    data=. a:,data
>>    data=. ~.data
>>    data=. }.data
>>
>> The fundamental point of this code is to remove some rows of aces which
>> were introduced in an earlier step, but only in some revisions of the code.
>> (This is "big data" where the volume of data is huge and manual mistakes
>> need to be compensated for.)
>>
>> A secondary point of this code would be to eliminate duplicate entries
>> which could happen for a variety of reasons (there were manual steps in the
>> original data preparation and sometimes the same data got packaged up
>> twice, for example).
>>
>> So, originally, this was a really cute solution - I could deal with two
>> different problems with the same expression. But the secondary point (which
>> was only a heuristic because manual mistakes could result in non-exact
>> duplicates) is too expensive to deal with at this stage. It'll have to be
>> dealt with later, when people can bring their attention to bear on the
>> problem (yay job creation!?).
>>
>> Anyways, the solution is this:
>>
>>    data=. data#~ data+./ .~: a:
>>
>> This accomplishes the primary objective - deleting those blank rows, and
>> without the asymptotically bad behavior from ~. on this kind of data.
>>
>> So that is what I will have to do.
>>
>> Thanks,
>>
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
