Is it possible to describe a significant fraction of the strings occupying memory as coming from a "small" universe?
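
To make the question concrete, here is a minimal sketch of the kind of numeric representation I have in mind. The data and the names (names, universe, keys, ids) are made up for illustration and are not taken from your code; treat it as a sketch under those assumptions rather than a drop-in change.

```j
NB. Hypothetical data: a boxed column of names drawn from a small universe.
names    =: 'IBM';'AAPL';'IBM';'MSFT';'AAPL'

NB. Foreign keys: dyad i. gives each name's index into a lookup table.
universe =: ~. names                NB. distinct names, in order of first appearance
keys     =: universe i. names       NB. 0 1 0 2 1
assert names -: keys { universe     NB. the keys recover the boxed column

NB. Symbols: s: interns the strings, 6 s: gives their integer form.
ids      =: 6 s: s: names
assert names -: 5 s: _6 s: ids      NB. _6 s: integers to symbols, 5 s: symbols to boxes

NB. 7!:5 reports the bytes used by named nouns; only meaningful on realistically sized data.
7!:5 'names';'keys';'ids'
```

Either way the column ends up as a plain integer list, and the boxed strings are stored once rather than once per row.
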

In other words, are symbols (of the s: variety) an option for you?

If you can describe your main table as a collection of symbols (in their integer form, i.e. 6 s: s:), numeric values, and foreign keys into other in-memory tables (i.e. the integers which dyad i. returns), you could express your entire table as numbers, which should provide significant savings in space and time over a boxed representation.

But that's a big re-engineering project, even just to get back to the point where you have the same confidence in the numeric representation as you do in the current boxed implementation. (Plus, s: numbers have issues with transience.)

Please excuse typos; sent from a phone.

> On Aug 19, 2014, at 7:39 PM, Raul Miller <[email protected]> wrote:
>
> I updated the code in the live session and it's working much better now.
>
> Or at least, that part is.
>
> I'm also getting interface errors from 2!:0 and I am having to work around
> that issue also. :/ (This issue, I think, represents kernel memory
> fragmentation - I guess linux is not tuned for processes which hold huge
> amounts of memory making system calls...)
>
> Thanks,
>
> --
> Raul
>
>> On Tue, Aug 19, 2014 at 7:34 PM, Dan Bron <[email protected]> wrote:
>>
>> There is also integrated rank support (a specific category special code)
>> for dyad -:"n , especially when n=1 (ie matching rows of tables has been
>> made particularly efficient).
>>
>> That said, it's probably worth doing a few performance tests on
>> medium-sized data sets to compare the performance of -:"1 to that of
>> *./ . ~: rather than making a substitution on the blind and potentially
>> wasting a 24 hour run (or more) on the larger, production inputs.
>>
>> -Dan
>>
>> Please excuse typos; sent from a phone.
>>
>>> On Aug 19, 2014, at 6:38 PM, Raul Miller <[email protected]> wrote:
>>>
>>> I'd want to see some detailed reference on this issue (~.!.0 on
>>> non-numeric arrays) before I'd want to blow another day or longer trying
>>> to reproduce the problem with that change.
>>>
>>> Alternatively, I'd want to get into the C implementation and find how
>>> this could happen. That maybe should be done as a theoretical exercise
>>> (understanding how the algorithm works and how it can fail) than as a
>>> practical exercise.
>>>
>>> Please also keep in mind that I have not eliminated hardware flaws from
>>> the plausible cause list. Memory corruption (or things equivalent to
>>> memory corruption, such as an intermittently failing logic component) is
>>> an all-too-likely possibility.
>>>
>>> Thanks,
>>>
>>> --
>>> Raul
>>>
>>>> On Tue, Aug 19, 2014 at 5:15 PM, Henry Rich <[email protected]> wrote:
>>>>
>>>> ~.!.0 as I understand it uses a different algorithm from ~. even on
>>>> nonnumerics, and might be worth trying.
>>>>
>>>> I am sure that ~.!.0 is much faster than ~. of floating-point arrays of
>>>> rank > 1. I think ~. is OK when the rank is 1.
>>>>
>>>> Henry Rich
>>>>
>>>>> On 8/19/2014 2:11 PM, Raul Miller wrote:
>>>>>
>>>>> Please include the current time in the sequence of timestamps. The code
>>>>> was still running at the point in time where I posted my email.
>>>>>
>>>>> That said, at this point, my attempt to interrupt succeeded, and I have
>>>>> found the line of code which was stalled:

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
