Re: CVS commit: src/sys/kern
The first step would be to analyze code, identify glocal resource allocations, and replace them to local ones. m_get() / m_clget() would become per-socket, by allocating per-socket pool_cache. On Tue, Nov 09, 2010 at 12:36:16PM +0900, Masao Uebayashi wrote: > The root cause of such a problem is, we don't do I/O scheduling at all. > > I think that pool(9) (and pool_cache(9)) is a great tool to manage > limited resources. The way to go would be to extend it to manage > bandwidth in I/O subsystems. (We already use pool_cache(9) for > KVA.) > > Resources are shared, and have dependencies. I/O resources would > look like some hierachical structure chained with pool backend > callbacks. Probably combined with device tree too. > > That said, I have no idea how to *fix* our networking stack. It's > unmaintainable IMO... > > On Thu, Oct 14, 2010 at 05:07:39PM +0900, o...@netbsd.org wrote: > > Hello, > > > > Mindaugas said: > > > Can you explain what the actual problem M_WAIT causes i.e. how would > > > it "freeze network stack"? > > > > In M_WAIT case, m_reclaim() will run and run until get mbuf cluster > > if mclpool limit reached. If m_reclaim() repeatedly but cannot to > > get new mbuf cluster, m_clget() will not return. > > > > network stacks using mbufs is use with M_DONTWAIT, but it will failed > > to get new mbuf cluster in this case. "freeze" means that. > > > > Generally NMBCLUSTERS is enough and SB_MAX is small, the problem > > is rarely caused. It depends parameter configuration, I think. > > > > In other words, to cause this problem, > > - many packets receives from the network, it will read by userland > > - userland program sends many packets, but not read packet from the stack > > - and mclpool limit reached. > > - no one calls pool_put() > > > > In M_DONTWAIT case, M_EXT flag is result of m_clget(). > > code of checking M_EXT is already prepared, but didn't used > > it with M_WAIT. I think this is careless miss. > > > > Thanks. > > -- > > Masaru OKI -- Masao Uebayashi / Tombi Inc. / Tel: +81-90-9141-4635
Re: CVS commit: src/sys/arch/mips/mips
David Holland wrote: > On Mon, Nov 08, 2010 at 06:09:39PM +, Antti Kantee wrote: > > Modified Files: > >src/sys/arch/mips/mips: locore_mips1.S > > > > Log Message: > > In TLBRead, restore PID before doing the saves so that the caller's > > TLB entries are used instead of the PID given as the argument. > > > > from Alessandro Forin > > [ ... ] > > Do you have any further explanations or reports of symptoms? I don't > see anything on any of the mips lists. I'm curious about this too. That code hasn't changed for at least 14 years... Cheers, Simon.
Re: CVS commit: src/sys/kern
The root cause of such a problem is, we don't do I/O scheduling at all. I think that pool(9) (and pool_cache(9)) is a great tool to manage limited resources. The way to go would be to extend it to manage bandwidth in I/O subsystems. (We already use pool_cache(9) for KVA.) Resources are shared, and have dependencies. I/O resources would look like some hierachical structure chained with pool backend callbacks. Probably combined with device tree too. That said, I have no idea how to *fix* our networking stack. It's unmaintainable IMO... On Thu, Oct 14, 2010 at 05:07:39PM +0900, o...@netbsd.org wrote: > Hello, > > Mindaugas said: > > Can you explain what the actual problem M_WAIT causes i.e. how would > > it "freeze network stack"? > > In M_WAIT case, m_reclaim() will run and run until get mbuf cluster > if mclpool limit reached. If m_reclaim() repeatedly but cannot to > get new mbuf cluster, m_clget() will not return. > > network stacks using mbufs is use with M_DONTWAIT, but it will failed > to get new mbuf cluster in this case. "freeze" means that. > > Generally NMBCLUSTERS is enough and SB_MAX is small, the problem > is rarely caused. It depends parameter configuration, I think. > > In other words, to cause this problem, > - many packets receives from the network, it will read by userland > - userland program sends many packets, but not read packet from the stack > - and mclpool limit reached. > - no one calls pool_put() > > In M_DONTWAIT case, M_EXT flag is result of m_clget(). > code of checking M_EXT is already prepared, but didn't used > it with M_WAIT. I think this is careless miss. > > Thanks. > -- > Masaru OKI
Re: CVS commit: src/sys/arch/mips/mips
On 11/08/2010 15:19, Antti Kantee wrote: On Mon Nov 08 2010 at 21:40:49 +, David Holland wrote: On Mon, Nov 08, 2010 at 06:09:39PM +, Antti Kantee wrote: > Modified Files: > src/sys/arch/mips/mips: locore_mips1.S > > Log Message: > In TLBRead, restore PID before doing the saves so that the caller's > TLB entries are used instead of the PID given as the argument. > > from Alessandro Forin This doesn't make any sense. As far as I can tell, the real problem is that the code is not attending to pipeline hazards properly; moving things around until it experimentally seems to work is just going to create other odd behavior sometime down the line under different timing circumstances. I don't have a mips1-specific reference on hand, but my recollection is that you need *three* nops when messing with cop0 registers for all effects to flush through. locore_mips1.S doesn't agree with your recollection, e.g. mips1_TLBUpdate right above mips1_TLBRead. mips1 had a 2 stage pipeline, so at most you'd need 1 nop. 3 nops is for the R4000's and similar. Then it got weird with multi-issue, threads, and sometimes silicon bugs that required more before MIPS32r2 and MIPS64r2 codified a single, special kind of nop that did the trick. Warner
Re: CVS commit: src/sys/arch/mips/mips
On Mon Nov 08 2010 at 21:40:49 +, David Holland wrote: > On Mon, Nov 08, 2010 at 06:09:39PM +, Antti Kantee wrote: > > Modified Files: > >src/sys/arch/mips/mips: locore_mips1.S > > > > Log Message: > > In TLBRead, restore PID before doing the saves so that the caller's > > TLB entries are used instead of the PID given as the argument. > > > > from Alessandro Forin > > This doesn't make any sense. > > As far as I can tell, the real problem is that the code is not > attending to pipeline hazards properly; moving things around until it > experimentally seems to work is just going to create other odd > behavior sometime down the line under different timing circumstances. > > I don't have a mips1-specific reference on hand, but my recollection > is that you need *three* nops when messing with cop0 registers for all > effects to flush through. locore_mips1.S doesn't agree with your recollection, e.g. mips1_TLBUpdate right above mips1_TLBRead.
Re: CVS commit: src/sys/arch/mips/mips
On Mon, Nov 08, 2010 at 06:09:39PM +, Antti Kantee wrote: > Modified Files: > src/sys/arch/mips/mips: locore_mips1.S > > Log Message: > In TLBRead, restore PID before doing the saves so that the caller's > TLB entries are used instead of the PID given as the argument. > > from Alessandro Forin This doesn't make any sense. As far as I can tell, the real problem is that the code is not attending to pipeline hazards properly; moving things around until it experimentally seems to work is just going to create other odd behavior sometime down the line under different timing circumstances. I don't have a mips1-specific reference on hand, but my recollection is that you need *three* nops when messing with cop0 registers for all effects to flush through. Do you have any further explanations or reports of symptoms? I don't see anything on any of the mips lists. -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/tests/fs/tmpfs
On Mon, Nov 08, 2010 at 03:25:50PM +, Antti Kantee wrote: > Module Name: src > Committed By:pooka > Date:Mon Nov 8 15:25:50 UTC 2010 > > Modified Files: > src/tests/fs/tmpfs: t_mkdir.sh > > Log Message: > Use su -m since _atf has nologin as its shell. > XXX: this will still fail if root's shell is csh If you're going to go to the trouble of littering the system with test users and groups, why not create a wrapper program that behaves in a useful and predictable manner instead of using su and blowing up on random user settings? -- David A. Holland dholl...@netbsd.org
Re: CVS commit: src/sys/arch/powerpc
On Sun, Nov 07, 2010 at 01:46:06PM +1100, Simon Burge wrote: > "Masao Uebayashi" wrote: > > > Module Name:src > > Committed By: uebayasi > > Date: Sat Nov 6 16:36:27 UTC 2010 > > > > Modified Files: > > > > src/sys/arch/powerpc/ibm4xx: pmap.c > > src/sys/arch/powerpc/include/ibm4xx: vmparam.h > > > > Log Message: > > > > Merge from uebayasi-xip: > > > > revision 1.60.2.5 > > date: 2010/08/14 02:09:57; author: uebayasi; state: Exp; lines: +2 -1 > > Teach TLB miss handler (pmap_tlbmiss()) to map "Expansion ROM" area as > > PA == VA. Now we don't need to reserve a TLB entry for it. > > > > At a quick glance, this seems to hard-wire in device addresses which may > not be the same on every ibm4xx platform (the 0xef00 address). > > Also I'm not sure I like the idea of modifying a general pmap to have > knowledge of device-specific "Expansion ROM" address ranges. Are these > _always_ guaranteed to live in the address range you've specified or > is there a chance a ROM could be mapped at different addresses on some > boards? The change as is also limits expansion ROMs to a bit over > 256MB which mightn't be a problem now but also indicates a problem with > hardcoding specific values in a general pmap. > > I'm not sure what the "correct" way of doing this is. I think I'd > prefer to see some sort of registration of expansion ROM space that you > want to reserve, but would need to put some more thought in to it. I'm sure what the correct way should be. This is part of fault handler specific to CPU. If we decide to map part kernel VA to fixed addreess backed by fixed pages on this CPU, we *resolve* it first. Otherwise we'll try more general path. So, we follow the strict definition that pmap(9) manages per-virtual space (proc) address translation information, this pmap_tlbmiss() breaks the rule. > > If this change was to remain in its current form, the comment > immediately above the change you made in the tlb miss handler is now > incorrect and would need to be adjusted. > > Cheers, > Simon.