Bosko Milekic wrote:

On Fri, Jul 18, 2003 at 07:05:58PM +0200, Harti Brandt wrote:

Hi all,

it seems there is a problem with the zone allocator on SMP systems.

I have a zone with an upper limit on items that resolves to an upper
limit of 1 page. It turns out that allocations from this zone get
stuck from time to time. It seems to me that the following happens:

- on the first call to uma_zalloc a page is allocated and all the free
items are put into the cache of the CPU. uz_free of the zone is 0 and
uz_cachefree holds all the free items.

- when the next call to uma_zalloc occurs on the same CPU, everything is
fine. uma_zalloc just gets the next item from the cache.

- when the call happens on another CPU, the code finds uz_free to be 0 and
checks the page limit (uma_core.c:1492). It finds the limit already
reached and puts the process to sleep (uma_zalloc was called with
M_WAITOK).

- the process may sleep there forever (depending on circumstances).

If M_WAITOK is not set, the code will falsely return NULL while there
are still free items (albeit in the cache of another CPU).
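
The sequence above can be illustrated with a small user-space toy model.
This is only a sketch and not UMA code: struct toy_zone, toy_zone_init(),
toy_zalloc() and toy_total_free() are made-up names, and the model assumes
(as described above) that the first allocation moves every free item of the
new page into the allocating CPU's cache.

```c
#include <assert.h>

#define TOY_NCPU 2

/* Toy stand-in for a zone; the field names are illustrative only. */
struct toy_zone {
	int max_pages;            /* page limit derived from the item limit */
	int pages;                /* pages allocated so far */
	int items_per_page;       /* items that fit into one page */
	int zone_free;            /* free items held by the zone (cf. uz_free) */
	int cache_free[TOY_NCPU]; /* free items in each CPU's cache */
};

enum toy_result { TOY_OK, TOY_WOULD_BLOCK };

void
toy_zone_init(struct toy_zone *z, int max_pages, int items_per_page)
{
	int i;

	z->max_pages = max_pages;
	z->pages = 0;
	z->items_per_page = items_per_page;
	z->zone_free = 0;
	for (i = 0; i < TOY_NCPU; i++)
		z->cache_free[i] = 0;
}

/* Allocate one item on behalf of the given CPU. */
enum toy_result
toy_zalloc(struct toy_zone *z, int cpu)
{
	assert(cpu >= 0 && cpu < TOY_NCPU);

	/* Fast path: take an item from this CPU's own cache. */
	if (z->cache_free[cpu] > 0) {
		z->cache_free[cpu]--;
		return (TOY_OK);
	}
	/* Next, look at the zone's own free count. */
	if (z->zone_free > 0) {
		z->zone_free--;
		return (TOY_OK);
	}
	/* Nothing free in the zone: check the page limit. */
	if (z->pages >= z->max_pages)
		return (TOY_WOULD_BLOCK); /* sleep (M_WAITOK) or return NULL */

	/* New page: hand out one item, cache the rest on this CPU. */
	z->pages++;
	z->cache_free[cpu] += z->items_per_page - 1;
	return (TOY_OK);
}

/* Free items anywhere in the zone, including all CPU caches. */
int
toy_total_free(const struct toy_zone *z)
{
	int i, n = z->zone_free;

	for (i = 0; i < TOY_NCPU; i++)
		n += z->cache_free[i];
	return (n);
}
```

With a limit of one page and 60 items per page, two calls to
toy_zalloc() on CPU 0 succeed, while the next call on CPU 1 reports
TOY_WOULD_BLOCK even though toy_total_free() still counts 58 free items,
all of them in CPU 0's cache.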

I wonder whether this is intended behaviour. If yes, this should
definitely be documented. uma_zone_set_max() seems to be documented only
in the header file, and it does not mention that free items may not
actually be allocatable because they happen to sit in another CPU's cache.

If it is not intended (I would prefer this), I wonder how one can get the
items out of another CPU's cache. I'm not too familiar with this code.
I suppose this should be done somewhere around uma_core.c:1485?

Regards,
harti
--
harti brandt,
http://www.fokus.fraunhofer.de/research/cc/cats/employees/hartmut.brandt/private
[EMAIL PROTECTED], [EMAIL PROTECTED]


If the per-cpu caches are relatively small (which they ought to be, especially when you've hit a maximum number of allocations from the zone), then this is actually not that bad of a behavior.

  I spoke to Jeff about this and it seemed to me that he was leaning
  toward keeping the behavior this way and, in fact, also perhaps _not_
  even doing an internal free to the zone when UMA_ZFLAG_FULL is in
  effect but we still have space in the pcpu cache.  While I'm not sure
  if going that far is a good idea, I _don't_ really think that the
  current behavior is a bad idea.  As mentioned, when you have a zone
  that is mostly starved, all future frees will go back to the zone and
  not the per-cpu caches, but if you have some free items in another
  per-cpu cache, you're not likely to hit a starvation situation unless
  something is horribly wrong.  And having the free code actually drain
  the per-cpu caches in a zone-full situation may lead to bad behavior
  under heavy load.  Think about what happens under heavy load... your
  zone is starved and if you then flush all the pcpu caches and the load
  is still heavy, you're likely to have other threads try to allocate
  anyway, so they'll end up having to dip into the zone anyway;
  therefore, there doesn't seem to be much of a reason to push the
  cached objects back into the zone (if they're going to leave it again
  soon anyway).



Well, the problem is that nothing is starved. I have an idle machine and a zone that I have limited to 60 or so items. When allocating the 2nd item I get blocked on the zone limit. Normally I would be unblocked whenever an item is freed. That will not happen here, however, because I have neither reached the limit nor is there memory pressure in the system to which I could react. I may simply be blocked forever.

That makes the limit feature for zones rather useless, because I cannot predict how many of the items I can really allocate (this depends on the number of processors, the page size and the configuration of UMA itself).

Perhaps we could make the behaviour dependent on the maximum number of items. If it is rather low (a couple of pages' worth), and I would block on the zone limit while there are free items in another CPU's cache, then drain one of the caches.
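
A rough user-space sketch of that idea, to make the proposal concrete. None of this is UMA code: struct sk_zone, sk_drain_cache(), sk_zalloc() and the threshold SK_SMALL_LIMIT_PAGES are hypothetical names and values chosen for illustration, and locking is ignored entirely.

```c
#include <assert.h>

#define SK_NCPU              2
#define SK_SMALL_LIMIT_PAGES 2 /* hypothetical cut-off for a "rather low" limit */

/* Toy stand-in for a zone; the field names are illustrative only. */
struct sk_zone {
	int max_pages;           /* page limit derived from the item limit */
	int pages;               /* pages allocated so far */
	int items_per_page;      /* items that fit into one page */
	int zone_free;           /* free items held by the zone itself */
	int cache_free[SK_NCPU]; /* free items in each CPU's cache */
};

/* Move all cached items of one CPU back to the zone; returns the count. */
int
sk_drain_cache(struct sk_zone *z, int cpu)
{
	int n = z->cache_free[cpu];

	z->cache_free[cpu] = 0;
	z->zone_free += n;
	return (n);
}

/* Returns 1 on success, 0 if the caller would have to sleep or fail. */
int
sk_zalloc(struct sk_zone *z, int cpu)
{
	int other, drained;

	assert(cpu >= 0 && cpu < SK_NCPU);
	for (;;) {
		if (z->cache_free[cpu] > 0) { /* own cache first */
			z->cache_free[cpu]--;
			return (1);
		}
		if (z->zone_free > 0) {       /* then the zone */
			z->zone_free--;
			return (1);
		}
		if (z->pages < z->max_pages) { /* below the limit: new page */
			z->pages++;
			z->cache_free[cpu] += z->items_per_page;
			continue;
		}
		/* At the limit. Only small zones get the drain treatment. */
		if (z->max_pages > SK_SMALL_LIMIT_PAGES)
			return (0);
		drained = 0;
		for (other = 0; other < SK_NCPU && drained == 0; other++)
			if (other != cpu)
				drained = sk_drain_cache(z, other);
		if (drained == 0)
			return (0); /* truly exhausted */
	}
}
```

In this model a zone limited to one page of 60 items hands out all 60 items regardless of which CPU asks, and only then reports that the caller would block. Zones with a limit above the threshold keep the current behaviour, which leaves the heavy-load case discussed earlier in the thread untouched.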

Or I could simply remove the limits.


harti




_______________________________________________
[EMAIL PROTECTED] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-current