John Darrington <[email protected]> writes:

> On Sat, Mar 17, 2012 at 11:13:56AM -0700, Ben Pfaff wrote:
> > The current default of 64 MB is fairly conservative for modern
> > systems. I'd be happy to adjust that downward on (presumably
> > older) systems that have little memory. Perhaps we could use the
> > gnulib "physmem" module to find that out.
> >
> > But: Are you sure that the problem is that the default setting is
> > too high? I would have guessed that the problem is actually one
> > of two things: either the setting is being raised manually to a
> > value that is too high for the system, or the categorical code
> > does not honor the setting regardless of its value. (Without
> > looking at code, I'd guess that the latter is the case.)
>
> 64MB is quite acceptable if it is being called just a few
> times, which would be typical of a normal use of a categorical procedure.
> However if a continuous variable is unwittingly specified as a categorical
> variable, then potentially the system will attempt to allocate 64MB * N
> where N is the number of distinct values of that variable. Clearly if N is
> very large, that's not going to work.
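To make the quoted "64MB * N" concern concrete, here is a minimal C sketch of the worst-case arithmetic. It assumes, hypothetically, that each distinct value of a variable wrongly treated as categorical gets its own instance, and that each instance may grow to the 64 MB default. `PER_INSTANCE_LIMIT` and `worst_case_bytes` are illustrative names only, not PSPP code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant: the per-instance workspace limit, set here to the
   64 MB default discussed in this thread. */
#define PER_INSTANCE_LIMIT ((uint64_t) 64 * 1024 * 1024)

/* Worst-case bytes requested if a variable with N_VALUES distinct values is
   treated as categorical and each value gets its own instance, each of which
   may grow to PER_INSTANCE_LIMIT.  The per-instance cap bounds each term of
   the sum, but nothing bounds the sum itself. */
static uint64_t
worst_case_bytes (uint64_t n_values)
{
  /* Guard against overflow of the product. */
  assert (n_values <= UINT64_MAX / PER_INSTANCE_LIMIT);
  return PER_INSTANCE_LIMIT * n_values;
}
```

For example, a continuous variable with 1,000 distinct values gives `worst_case_bytes (1000)` of 64,000 MB (62.5 GB), which is why a limit applied per instance cannot protect the system on its own.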
Ah, yes. I've been aware of related problems for a long time, but I haven't come up with a good solution. One must limit the total memory allocated, not the memory allocated per instance, of course, but the proper way to distribute the available memory among the competing users is not obvious.

I guess that the easiest way is first-come-first-served. That might be just fine in the common case, so perhaps we should implement it that way as a first cut. For categoricals, though, what's the fallback if the memory usage becomes too high? Can we fall back to some kind of on-disk storage, or do we just fail? "Just fail" is probably not a good way to go if first-come-first-served is the strategy we use, because it means that unrelated memory use (e.g. for cases) can cause even a small number of categories to break.

Here's another idea that comes to mind: is there a maximum number of categories that makes sense? Would a "max categories" setting, defaulting to, say, 1000, still allow most users to get real work done in realistic cases?

-- 
Ben Pfaff 
http://benpfaff.org

_______________________________________________
pspp-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/pspp-dev
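The two remedies floated in the reply, a single memory budget shared by all users and handed out first come, first served, plus a "max categories" cap, can be sketched together in C. This is a hypothetical illustration, not PSPP's actual allocator: `struct mem_budget`, `mem_budget_reserve`, `mem_budget_release`, `max_categories`, and `add_category` are all invented names, and the fallback on failure (on-disk storage or an error) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical: one budget shared by every consumer (cases, categoricals,
   ...), handed out first come, first served.  Invariant: used <= limit. */
struct mem_budget
  {
    size_t limit;   /* Total bytes that may be reserved. */
    size_t used;    /* Bytes currently reserved. */
  };

/* Reserves N bytes from B.  Returns true on success, false if the request
   would exceed the shared limit; the caller must then fall back (for
   example, to on-disk storage) or report an error. */
static bool
mem_budget_reserve (struct mem_budget *b, size_t n)
{
  if (n > b->limit - b->used)
    return false;
  b->used += n;
  return true;
}

/* Returns N bytes previously reserved from B. */
static void
mem_budget_release (struct mem_budget *b, size_t n)
{
  assert (n <= b->used);
  b->used -= n;
}

/* Hypothetical "max categories" setting, defaulting to the 1000 suggested
   in the reply. */
static size_t max_categories = 1000;

/* Adds one category needing BYTES of storage to a categorical that already
   has *N_CATEGORIES categories.  The category cap is checked before the
   shared budget, so exceeding it does not depend on unrelated memory use. */
static bool
add_category (struct mem_budget *b, size_t *n_categories, size_t bytes)
{
  if (*n_categories >= max_categories)
    return false;
  if (!mem_budget_reserve (b, bytes))
    return false;
  ++*n_categories;
  return true;
}
```

The design point this illustrates is the concern raised above: with first-come-first-served alone, a failed `mem_budget_reserve` can be caused by memory that cases already hold, whereas the category cap fails deterministically, based only on the data being tabulated.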
