John Darrington <[email protected]> writes:

> On Sat, Mar 17, 2012 at 12:15:17PM -0700, Ben Pfaff wrote:
>      John Darrington <[email protected]> writes:
>      
>      Ah, yes.  I've been aware of related problems for a long time,
>      but I haven't come up with a good solution.  One must limit the
>      total memory allocated, not the memory allocated per-instance, of
>      course, but the proper way to distribute the available memory
>      among the competing users is not obvious.  I guess that the
>      easiest way is first-come-first-served.  That might be just fine
>      in the common case, so perhaps we should implement it that way as
>      a first cut.
>
> Unless the number of cases per instance is known a priori
> (which in general it isn't), I don't see any better alternative
> to first-come-first-served -- perhaps geometrically decreasing
> allocations (by factors of ten, say) might be one way, on the
> assumption that if there are many instances, then hopefully
> they are small ones.
>
> Is it feasible for workspaces to change their allocation
> dynamically, or is that not possible?

For casereaders, it's easy enough to dynamically change, since
casereaders are able to dump all of their in-memory data to disk.

>      For categoricals, though, what's the fallback if the memory usage
>      becomes too high?  Can we fall back to some kind of on-disk
>      storage, or do we just fail?  "Just fail" is probably not a good
>      way to go, if first-come-first-served is the strategy we use,
>      because it means that unrelated memory use (e.g. for cases) can
>      cause even a small number of categories to break.
>
> Maybe we should do the "just fail" option in the first instance and see
> if we can improve it later.

OK.

>      Here's another idea that comes to mind: is there a maximum number
>      of categories that makes sense?  Would a "max categories" setting
>      defaulting to, say, 1000, still allow most users to get real work
>      done in realistic cases?
>
> 1000 would be much too high.  How many machines can allocate 64GB of heap?
> "Realistic cases" is somewhat subjective, but I cannot envisage more than
> 20 categories being involved in most instances - who knows, though?

I mean, 1000 categories per instance, not 1000 instances.
Presumably, 1000 categories do not need much memory (a few
kilobytes?) unless the space for categories is, say, O(n**2) in
the number of categories (I haven't looked).
-- 
Ben Pfaff 
http://benpfaff.org

_______________________________________________
pspp-dev mailing list
[email protected]
https://lists.gnu.org/mailman/listinfo/pspp-dev
