I'd want to see some detailed reference on this issue (~.!.0 on non-numeric
arrays) before I'd want to blow another day or longer trying to reproduce
the problem with that change.

Alternatively, I'd want to get into the C implementation and find out how
this could happen. That should probably be done as a theoretical exercise
(understanding how the algorithm works and how it can fail) rather than as
a practical exercise.

Please also keep in mind that I have not eliminated hardware flaws from the
plausible cause list. Memory corruption (or things equivalent to memory
corruption, such as an intermittently failing logic component) is an
all-too-likely possibility.

Thanks,

-- 
Raul



On Tue, Aug 19, 2014 at 5:15 PM, Henry Rich <[email protected]> wrote:

> ~.!.0 as I understand it uses a different algorithm from ~. even on
> nonnumerics, and might be worth trying.
>
> I am sure that ~.!.0 is much faster than ~. of floating-point arrays of
> rank > 1.  I think ~. is OK when the rank is 1.
>
> Henry Rich
>
>
> On 8/19/2014 2:11 PM, Raul Miller wrote:
>
>> Please include the current time in the sequence of timestamps. The code
>> was still running at the point in time where I posted my email.
>>
>> That said, at this point, my attempt to interrupt succeeded, and I have
>> found the line of code which was stalled:
>>    data=. ~.data
>>
>> And, here is what it looks like (extra indent because the code is halted):
>>        $data
>> 194238 25
>>        3!:0 data
>> 32
>>
>> And at this point I understand what to do about this issue. I've solved
>> this before in a different context.
>>
>> Here's a slightly larger view of that bit of code:
>>
>>    assert. 2=#$data
>>    data=. a:,data
>>    data=. ~.data
>>    data=. }.data
>>
>> The fundamental point of this code is to remove some rows of aces which
>> were introduced in an earlier step, but only in some revisions of the
>> code. (This is "big data" where the volume of data is huge and manual
>> mistakes need to be compensated for.)
>>
>> A secondary point of this code would be to eliminate duplicate entries
>> which could happen for a variety of reasons (there were manual steps in
>> the original data preparation and sometimes the same data got packaged
>> up twice, for example).
>>
>> So, originally, this was a really cute solution - I could deal with two
>> different problems with the same expression. But the secondary point
>> (which was only a heuristic because manual mistakes could result in
>> non-exact duplicates) is too expensive to deal with at this stage. It'll
>> have to be dealt with later, when people can bring their attention to
>> bear on the problem (yay job creation!?).
>>
>> Anyways, the solution is this:
>>
>>    data=. data#~ data+./ .~: a:
>>
>> This accomplishes the primary objective (deleting those blank rows)
>> without the asymptotically bad behavior of ~. on this kind of data.
>>
>> So that is what I will have to do.
>>
>> Thanks,
>>
>>  ----------------------------------------------------------------------
> For information about J forums see http://www.jsoftware.com/forums.htm
>
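
For anyone following along, the row filter quoted above
(data #~ data +./ .~: a:) can be tried on a small made-up table. This is
only a sketch: the names blank, mask and kept and the sample values are
hypothetical and are not taken from the real data set.

```j
NB. Hypothetical 4-row boxed table; rows 0 and 2 hold only aces (a:).
blank=: 3 $ a:
data=: 4 3 $ blank , ('x';'y';'z') , blank , 'p';'r';'q'

NB. 1 for each row with at least one box that differs from a:
mask=: data +./ .~: a:      NB. 0 1 0 1

NB. Keep only the non-blank rows; rows are never compared to each other.
kept=: mask # data          NB. $kept is 2 3

NB. With no duplicate rows in the sample, this matches the ~. version.
kept -: }. ~. a: , data     NB. 1
```

Each row is compared only against a: rather than against the other rows,
which is why the filter drops the blank rows but leaves any duplicate rows
in place.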