Can a significant fraction of the strings occupying memory be described as
coming from a "small" universe of distinct values?

In other words, are symbols (of the s: variety) an option for you? If you can
describe your main table as a collection of symbols (in their integer form,
i.e. 6 s: s:), numeric values, and foreign keys into other in-memory tables
(i.e. the integers which dyad i. returns), you could express your entire table
as numbers, which should provide significant savings in space and time over a
boxed representation.
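
As a rough sketch of what I mean (the column and table names here are made up
for illustration, not taken from your code), a boxed string column could be
turned into plain integers along these lines:

```j
NB. Hypothetical sample data: a boxed string column from a small universe.
city =: 'NYC';'LA';'NYC';'SF';'LA';'NYC'

sym =: s: city          NB. intern each boxed string as a symbol
id  =: 6 s: sym         NB. symbols -> integers (an ordinary numeric array)

NB. Round-trip check: integers -> symbols -> boxed strings.
city -: 5 s: _6 s: id

NB. Foreign key into another in-memory table: dyad i. returns row indices.
cities =: 'LA';'NYC';'SF'     NB. hypothetical lookup table of keys
fk =: cities i. city          NB. 1 0 1 2 0 1
```

The particular integers that 6 s: returns depend on what has already been
interned in the session, so treat them as opaque identifiers rather than
stable values.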

But that's a big re-engineering project, even just to get back to the point
where you have the same confidence in the numeric representation as you do in
the current boxed implementation. (Plus, s: numbers have issues with
transience: the integers are only meaningful relative to the session's symbol
table.)

Please excuse typos; sent from a phone.

> On Aug 19, 2014, at 7:39 PM, Raul Miller <[email protected]> wrote:
> 
> I updated the code in the live session and it's working much better now.
> 
> Or at least, that part is.
> 
> I'm also getting interface errors from 2!:0 and I am having to work around
> that issue also. :/ (This issue, I think, represents kernel memory
> fragmentation - I guess linux is not tuned for processes which hold huge
> amounts of memory making system calls...)
> 
> Thanks,
> 
> -- 
> Raul
> 
> 
> 
>> On Tue, Aug 19, 2014 at 7:34 PM, Dan Bron <[email protected]> wrote:
>> 
>> 
>> There is also integrated rank support (a specific category of special code)
>> for dyad -:"n , especially when n=1 (i.e. matching rows of tables has been
>> made particularly efficient).
>> 
>> That said, it's probably worth doing a few performance tests on
>> medium-sized data sets to compare the performance of -:"1 to that of
>> *./ . ~: rather than making a substitution on the blind and potentially
>> wasting a 24 hour run (or more) on the larger, production inputs.
>> 
>> -Dan
>> 
>> Please excuse typos; sent from a phone.
>> 
>>> On Aug 19, 2014, at 6:38 PM, Raul Miller <[email protected]> wrote:
>>> 
>>> I'd want to see some detailed reference on this issue (~.!.0 on
>>> non-numeric arrays) before I'd want to blow another day or longer trying
>>> to reproduce the problem with that change.
>>> 
>>> Alternatively, I'd want to get into the C implementation and find how
>>> this could happen. That maybe should be done as a theoretical exercise
>>> (understanding how the algorithm works and how it can fail) than as a
>>> practical exercise.
>>> 
>>> Please also keep in mind that I have not eliminated hardware flaws from
>>> the plausible cause list. Memory corruption (or things equivalent to
>>> memory corruption, such as an intermittently failing logic component) is
>>> an all-too-likely possibility.
>>> 
>>> Thanks,
>>> 
>>> --
>>> Raul
>>> 
>>> 
>>> 
>>>> On Tue, Aug 19, 2014 at 5:15 PM, Henry Rich <[email protected]> wrote:
>>>> 
>>>> ~.!.0 as I understand it uses a different algorithm from ~. even on
>>>> nonnumerics, and might be worth trying.
>>>> 
>>>> I am sure that ~.!.0 is much faster than ~. of floating-point arrays of
>>>> rank > 1.  I think ~. is OK when the rank is 1.
>>>> 
>>>> Henry Rich
>>>> 
>>>> 
>>>>> On 8/19/2014 2:11 PM, Raul Miller wrote:
>>>>> 
>>>>> Please include the current time in the sequence of timestamps. The code
>>>>> was still running at the point in time where I posted my email.
>>>>> 
>>>>> That said, at this point, my attempt to interrupt succeeded, and I have
>>>>> found the line of code which was stalled:
>> ----------------------------------------------------------------------
>> For information about J forums see http://www.jsoftware.com/forums.htm