Re: [PATCH] Re: zero-filed page on VOP_PUTPAGES

2011-08-22 Thread Emmanuel Dreyfus
Emmanuel Dreyfus m...@netbsd.org wrote:

 Or just avoid uvm_vnp_setsize() calls?

I wonder if that does not open the door to a situation where fsync
semantics get broken because of a skipped uvm_vnp_setsize().

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [PATCH] Re: zero-filed page on VOP_PUTPAGES

2011-08-22 Thread Emmanuel Dreyfus
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:

  We would have a PNODE_IN_RESIZE flag for struct pnode's pn_stat, set and
  cleared in dosetattr(), and use vp->v_size on uvm_vnp_setsize() calls
  when set? Or just avoid uvm_vnp_setsize() calls?
 
 just avoid the calls.

There is a problem if two threads enter dosetattr() at the same time:
the flag would be cleared when the first thread completes the operation,
and we get the same race again.

Therefore I think a mutex must guard setattr/uvm_vnp_setsize in
dosetattr(). Does it make sense to reuse an existing mutex (which
one?), or should I introduce a new one, for instance pn_inrewrite in
struct pnode?
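
For illustration only, here is a user-space sketch of the mutex idea,
using pthreads rather than kernel primitives. The names pnode_sketch,
pn_sizemtx and dosetattr_sketch are made up for this example and are not
the puffs code; the only point is that the SETATTR round trip and the
matching size update run as one critical section.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical user-space stand-in for the relevant part of struct pnode. */
struct pnode_sketch {
	pthread_mutex_t pn_sizemtx;	/* assumed name; guards the resize sequence */
	uint64_t server_size;		/* size as the file server sees it */
	uint64_t cached_size;		/* size as UVM would see it (vp->v_size) */
};

static void
pnode_sketch_init(struct pnode_sketch *pn, uint64_t size)
{
	assert(pn != NULL);
	pthread_mutex_init(&pn->pn_sizemtx, NULL);
	pn->server_size = size;
	pn->cached_size = size;
}

/*
 * Stand-in for dosetattr(): the simulated SETATTR and the simulated
 * uvm_vnp_setsize() happen under one lock, so a second resize cannot
 * slip in between them.
 */
static void
dosetattr_sketch(struct pnode_sketch *pn, uint64_t newsize)
{
	assert(pn != NULL);
	pthread_mutex_lock(&pn->pn_sizemtx);
	pn->server_size = newsize;	/* "SETATTR sent and completed" */
	pn->cached_size = newsize;	/* "uvm_vnp_setsize()" */
	pthread_mutex_unlock(&pn->pn_sizemtx);
}
```

With this shape, whichever thread takes the lock last decides the final
size, and both views of the size always agree when the lock is released.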

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: [PATCH] Re: zero-filed page on VOP_PUTPAGES

2011-08-22 Thread YAMAMOTO Takashi
 Emmanuel Dreyfus m...@netbsd.org wrote:
 
 Or just avoid uvm_vnp_setsize() calls?
 
 I wonder if that does not open the door to a situation where fsync
 semantics get broken because of a skipped uvm_vnp_setsize().

nothing i can think of right now.
do you have a particular idea what semantics might be broken?

YAMAMOTO Takashi

 
 -- 
 Emmanuel Dreyfus
 http://hcpnet.free.fr/pubz
 m...@netbsd.org


Re: [PATCH] Re: zero-filed page on VOP_PUTPAGES

2011-08-22 Thread Emmanuel Dreyfus
YAMAMOTO Takashi y...@mwd.biglobe.ne.jp wrote:

 do you have a particular idea what semantics might be broken?

Not sure, I am still searching. 

Here is a currently possible behaviour:

thread1 puffs_vnop_setattr  dosetattr size1  SETATTR size1 sent 
thread2 puffs_vnop_write  uvm_vnp_setsize increase to size2
thread2 puffs_vnop_fsync  dosetattr size2  SETATTR size2 sent
thread1 SETATTR completes, uvm_vnp_setsize size1, file gets truncated
thread2 SETATTR completes, uvm_vnp_setsize size2, but data was lost

Now we prevent uvm_vnp_setsize() calls while a SETATTR is in progress.
We get this:

thread1 puffs_vnop_setattr  dosetattr size1  SETATTR size1 sent 
thread2 puffs_vnop_write, uvm_vnp_setwritesize size2
thread2 puffs_vnop_fsync  dosetattr size2  SETATTR size2 sent
thread1 SETATTR completes, uvm_vnp_setsize size1
thread2 SETATTR completes, uvm_vnp_setsize size2
thread2 VOP_PUTPAGES called (with mutex held)
thread2 VOP_PUTPAGES returns (releases mutex)
thread2 puffs_vnop_fsync returns

That seems to work.
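
To make the first (broken) interleaving concrete, here is a toy replay
in plain C. It is only a model: file_model, model_write, model_setsize
and replay_lost_write are names invented for this sketch, and the model
assumes that shrinking the size discards the data beyond the new size
and that growing it again does not bring the data back.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model: "valid" counts the bytes that still hold written data. */
struct file_model {
	uint64_t size;
	uint64_t valid;
};

/* A write that extends the file up to "end" (assumed contiguous). */
static void
model_write(struct file_model *f, uint64_t end)
{
	if (end > f->size)
		f->size = end;
	if (end > f->valid)
		f->valid = end;
}

/* Stand-in for uvm_vnp_setsize(): shrinking throws data away. */
static void
model_setsize(struct file_model *f, uint64_t newsize)
{
	if (newsize < f->valid)
		f->valid = newsize;
	f->size = newsize;
}

/* Replays: write to size2, late completion of size1, then size2. */
static struct file_model
replay_lost_write(uint64_t size1, uint64_t size2)
{
	struct file_model f = { size1, size1 };

	assert(size1 <= size2);
	model_write(&f, size2);		/* thread2 puffs_vnop_write */
	model_setsize(&f, size1);	/* thread1 SETATTR completes */
	model_setsize(&f, size2);	/* thread2 SETATTR completes */
	return f;
}
```

In this model the file ends with the expected size, but the range
between size1 and size2 no longer holds the written data.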

-- 
Emmanuel Dreyfus
http://hcpnet.free.fr/pubz
m...@netbsd.org


Re: what to do on memory or cache errors?

2011-08-22 Thread Matt Thomas

On Aug 22, 2011, at 2:04 PM, paul_kon...@dell.com paul_kon...@dell.com 
wrote:

 I would think that memory errors are far more likely than cache errors.  If a 
 CPU gets cache errors, it is very badly broken. 

Probably true but.

 I'm not sure it's worth doing anything other than panic for cache errors.  

Specifically uncorrected cache errors on a dirty line.  If the cache line was 
clean, you could just clear it and keep going.  You might also want to keep a 
bitmap of cache lines to see whether cache errors keep happening for the same 
cache line.
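
A rough sketch of such a bitmap, in portable C.  NCACHELINES and the 
function name cache_err_record are placeholders chosen for this example; a 
real implementation would size the bitmap from the actual cache geometry.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define NCACHELINES	4096	/* hypothetical number of cache lines */
#define WORD_BITS	(sizeof(unsigned long) * CHAR_BIT)

/* One bit per cache line: set once an error has been seen on it. */
static unsigned long cache_err_seen[(NCACHELINES + WORD_BITS - 1) / WORD_BITS];

/*
 * Record an error on the given cache line.  Returns true if an error
 * had already been recorded for that line, i.e. this is a repeat.
 */
static bool
cache_err_record(size_t line)
{
	unsigned long *word, mask;
	bool repeat;

	assert(line < NCACHELINES);
	word = &cache_err_seen[line / WORD_BITS];
	mask = 1UL << (line % WORD_BITS);
	repeat = (*word & mask) != 0;
	*word |= mask;
	return repeat;
}
```

A repeat on the same line is the hint that the line itself is bad rather 
than the error being a one-off.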

 For memory errors, if you can get the failing address (which some CPUs can do 
 and some cannot) and you can associate that address with some process, then 
 you might kill that process instead of panicking.  Again, I'm not sure how 
 valuable that would be.  For highly fault tolerant control systems, perhaps.  
 For anything else, not clear.  Also, a highly fault tolerant system may well 
 use  replicated CPUs, in which case having one CPU panic simply means the 
 other one takes over.

If the ECC error was in a page backed by the vnode pager, you could just unmap 
the errant page, refill it with zeros (fixing ECC), return it to a free list, 
and let whoever wanted the page fault the contents back in.

 In short, is there a reason to change anything?

I don't know.  Which is why I'm asking.

Re: what to do on memory or cache errors?

2011-08-22 Thread Mouse
 besides panicking, of course.

Ideally, I think...

Corrected error: Usually, log and ignore.  Maybe watch for elevated
levels of corrected errors and disable either the containing page or
the containing memory stick, depending on how much the hardware lets
the kernel determine and maybe policy sysctls.  Maybe even allow
paranoid sysadmins to configure "elevated levels" to mean "any".
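
As a sketch of what such a knob might look like (ce_page,
ce_note_corrected and the meaning given to the threshold are all
assumptions of this example, not an existing interface):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Per-page bookkeeping for corrected errors (hypothetical). */
struct ce_page {
	unsigned int corrected;	/* corrected errors seen so far */
	bool retired;		/* page should no longer be used */
};

/*
 * Note one corrected error on a page.  "threshold" plays the role of
 * the policy sysctl: 0 means never retire, 1 means retire on any
 * corrected error, N means retire once N have been seen.
 * Returns true if the page is (now) retired.
 */
static bool
ce_note_corrected(struct ce_page *pg, unsigned int threshold)
{
	assert(pg != NULL);
	pg->corrected++;
	if (threshold != 0 && pg->corrected >= threshold)
		pg->retired = true;
	return pg->retired;
}
```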

Uncorrectable error: Log.  Disable the containing page and/or stick, as
mentioned above.  If it's for the contents of a dirty page, about all
we can do is deliver a memory-error signal.  If it's for a clean page
(including (most) instruction-stream fetches), re-fetch the virtual
page into a new physical page and carry on.

 This is going to involve a lot of help from UVM.

Probably.  Maybe the pmap, too, for things such as figuring out what
regions of RAM would have to be disabled to stop using the affected
memory stick, or the like.

 If uvm_page_error can't correct the error, it would panic.

I'd recommend doing that only for kernel accesses; for userland, I'd
much prefer to blow up at most the process incurring the fault.
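
Pulling the cases so far into one place, the decision might be sketched
like this.  The enum, the struct and mem_err_decide are invented for the
example, and treating a clean page as recoverable even for kernel
accesses is my assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum mem_err_action {
	ACT_LOG_ONLY,		/* corrected: log and carry on */
	ACT_REFETCH_PAGE,	/* clean page: fetch into a new physical page */
	ACT_SIGNAL_PROCESS,	/* dirty user page: memory-error signal */
	ACT_PANIC		/* dirty page hit by a kernel access */
};

struct mem_err {
	bool corrected;		/* hardware already fixed it */
	bool page_dirty;	/* contents cannot be re-fetched */
	bool kernel_access;	/* fault taken by the kernel itself */
};

/* Pick an action; every case is assumed to be logged as well. */
static enum mem_err_action
mem_err_decide(const struct mem_err *e)
{
	assert(e != NULL);
	if (e->corrected)
		return ACT_LOG_ONLY;
	if (!e->page_dirty)
		return ACT_REFETCH_PAGE;
	if (e->kernel_access)
		return ACT_PANIC;
	return ACT_SIGNAL_PROCESS;
}
```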

 Preemptively, we could have a thread force dirty cache lines to
 memory if they've been in L2 too long (thereby reducing the problem
 to an ECC error on a clean cache line which means you just toss the
 cache-line contents.)

Depends.  Are we talking ECC on L2 cache, or on main memory?  I'd say
the results should be different.

 We can also have a thread that reads all of memory (slowly) thereby
 causing any single bit errors to be corrected before they become
 double-bit errors.

Well, to be detected.  Whether the correct action upon detecting them
is to silently correct them is a policy matter I'd prefer to avoid
wiring into the kernel.
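
For what it's worth, the "read all of memory slowly" part is simple in
outline.  Below is a user-space sketch over an ordinary buffer;
scrub_step and its parameters are made up for this example, and it says
nothing about how a real kernel would walk physical memory or what the
hardware does when the read hits a bad word.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Read up to "chunk" words starting at *cursor, wrapping around at
 * the end of the region, and never more than one full pass.  Returns
 * the number of words read.  The caller sleeps between steps to keep
 * the scrub slow.
 */
static size_t
scrub_step(const volatile unsigned long *mem, size_t nwords,
    size_t *cursor, size_t chunk)
{
	size_t done = 0;

	assert(mem != NULL && cursor != NULL);
	if (nwords == 0)
		return 0;
	while (done < chunk && done < nwords) {
		(void)mem[*cursor];	/* volatile read forces a real load */
		*cursor = (*cursor + 1) % nwords;
		done++;
	}
	return done;
}
```

What happens on detection (silently correct, log, retire the page) stays
outside this loop, which is where I would want the policy to live.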

 I'm not familiar enough with UVM internals to actually know what to
 do but I hope someone else reading this is.

Me neither.  I have just about zero idea how implementable any of the
above is; I've been speaking in ideal generalities.  (My idea of ideal
generalities, that is, of course.)

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B


Where are the specific WARNS=n defined?

2011-08-22 Thread Paul Goyette
I'm trying to modularize a couple of drivers, and one of them is 
generating some gcc errors due to comparison of signed and unsigned 
values.


The driver module is currently being compiled with WARNS=4 (just picked 
that up from another Makefile).  Is there a more appropriate WARNS=n to 
use to permit the comparison?  Or should I simply add


CCOPTS+= -Wno-sign-compare

(which is used when the driver is being compiled as part of a kernel)?



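-------------------------------------------------------------------------
| Paul Goyette     | PGP Key fingerprint:     | E-mail addresses:       |
| Customer Service | FA29 0E3B 35AF E8AE 6651 | paul at whooppee.com    |
| Network Engineer | 0786 F758 55DE 53BA 7731 | pgoyette at juniper.net |
| Kernel Developer |                          | pgoyette at netbsd.org  |
-------------------------------------------------------------------------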


Re: Where are the specific WARNS=n defined?

2011-08-22 Thread Christos Zoulas
In article pine.neb.4.64.1108222146340.22...@screamer.whooppee.com,
Paul Goyette  p...@whooppee.com wrote:
I'm trying to modularize a couple of drivers, and one of them is 
generating some gcc errors due to comparison of signed and unsigned 
values.

The driver module is currently being compiled with WARNS=4 (just picked 
that up from another Makefile).  Is there a more appropriate WARNS=n to 
use to permit the comparison?  Or should I simply add

   CCOPTS+= -Wno-sign-compare

(which is used when the driver is being compiled as part of a kernel)?

It is best to fix the errors.

christos



Re: Where are the specific WARNS=n defined?

2011-08-22 Thread Mouse
 [...] gcc errors due to comparison of signed and unsigned values.

 It is best to fix the errors.

What errors?

It is not necessarily an error to compare signed and unsigned values.
In my experience, that warning produces so many more false positives
than useful warnings that I normally shut it off entirely.
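
To show both sides, here is a small C sketch; the function names are
invented for the example.  The first comparison is a real bug that the
warning catches, the second is its fix, and the loop in the third is the
kind of comparison that is harmless as long as the count fits in an int.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Real bug: a negative "remaining" is converted to a huge unsigned
 * value, so this claims there is room when there is not.
 */
static bool
has_room_buggy(int remaining, unsigned int needed)
{
	return remaining >= needed;
}

/* Fix: reject negative values before comparing as unsigned. */
static bool
has_room_fixed(int remaining, unsigned int needed)
{
	return remaining >= 0 && (unsigned int)remaining >= needed;
}

/*
 * Typical false positive: i is never negative here, so comparing it
 * with the unsigned count is fine (assuming n fits in an int).
 */
static unsigned int
count_nonzero(const int *v, unsigned int n)
{
	unsigned int count = 0;

	assert(v != NULL || n == 0);
	for (int i = 0; i < n; i++) {
		if (v[i] != 0)
			count++;
	}
	return count;
}
```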

/~\ The ASCII Mouse
\ / Ribbon Campaign
 X  Against HTML                mo...@rodents-montreal.org
/ \ Email!   7D C8 61 52 5D E7 2D 39  4E F1 31 3E E8 B3 27 4B