Re: DIOCGDISCARDINFO and DIOCDISCARD

2013-10-01 Thread David Holland
On Sat, Jun 15, 2013 at 11:56:55PM +, David Holland wrote:
  I had almost forgotten about this; but a few months back when I came
  into contact with the wd TRIM support in current I wanted to change
  the interface around before it appears in a release.

Ok, as of this writing I have preliminary patches for about half of
the following:

   - add a d_discard op to struct bdevsw
   - add VOP_FALLOCATE and VOP_FDISCARD
   - implement spec_fdiscard that calls d_discard
   - add posix_fallocate, fallocate, and fdiscard syscalls that
 call VOP_FALLOCATE and VOP_FDISCARD
   - add posix_fallocate, fallocate, and fdiscard to libc
   - change existing device DIOCDISCARD implementations to d_discard
 implementations
   - update the ffs -odiscard code to use the new interface
   - fix existing discard implementations to not require the caller to
 check the maximum range that can be discarded at once
   - remove DIOCGDISCARDPARAMS
   - remove DIOCDISCARD
   - implement d_discard on dk, closing PR 47940
   - fix PR 47435, where wd's discard lets you zoom off the end of a
 partition or (I think) even the whole disk
   - add compat_linux support for linux's native fallocate and
 linux's posix_fallocate

Notes and comments on this:

 - I hope adding an op to struct bdevsw isn't going to cause a ruckus
or open a can of worms. It seems like the right thing to do. However,
there are other disk ioctls that I would think probably ought to be
first-class bdevsw ops and I'm not sure what the conventional wisdom
is.

 - I'm not adding either fallocate or fdiscard support for regular
files to any fs yet; this batch of changes is all plumbing and for
access to existing TRIM support. Doing this shouldn't be that hard.

 - I have defined all the interfaces (including the bdevsw one) to use
off_t and byte counts and offsets. I don't think proliferation of
DEV_BSIZE is beneficial; plus if everything is bytes it reduces the
chance of using the wrong shift macro and messing everything up. For
an operation that erases data in a bulk fashion at a low level, I
think this is fairly important.

 - I don't think we should adopt Linux's native fallocate interface.
It has the ugly misfeature that discard is, rather than a separate
call, an extra flag passed to fallocate -- that is, you delete blocks
by allocating.

 - I haven't decided if our native fallocate and fdiscard interface
should expose the Linux option to leave the file size unchanged. (In
Linux this allows allocating blocks past EOF, which is wrong on many
levels...) If not, our native fallocate will end up the same as
posix_fallocate. Another option is to allow this for fdiscard (where
it makes more sense) but not fallocate.

int fallocate(int fd, off_t pos, off_t len, int keepsize);
int fdiscard(int fd, off_t pos, off_t len, int keepsize);
   or
int fallocate(int fd, off_t pos, off_t len);
int fdiscard(int fd, off_t pos, off_t len);
   or
int fallocate(int fd, off_t pos, off_t len);
int fdiscard(int fd, off_t pos, off_t len, int keepsize);

 - I think it's ok to have a fallocate that's incompatible with the
linux fallocate, as long as we also have posix_fallocate.

 - The vnode-level interface supports the keep-size option because
it's supposed to be able to support the linux fallocate.

 - Someone(TM) should add discard support to vnd and raidframe, and
probably also ld and in a few other places; I might do this afterwards
but I'm not (I think) going to do it as part of the main patch set.

 - This patch set is all interconnected so it looks unlikely that it
can be committed incrementally.

Thoughts and objections? I'll post a link to the patches when they're
more fully baked.

-- 
David A. Holland
dholl...@netbsd.org


lua_Number in the kernel

2013-10-01 Thread Alexander Nasonov
[ Ccing to Justin who seems to be interested in Lua in NetBSD
  but I'm not sure whether he's subscribed to tech-kern@ ].

Like some other people, I beleived that Lua kernel project is dormant
and was just waiting for any activity before starting a discussion here
but Marc replied today to an ongoing discussion on developers@. Hence,
my post.

It's very important to decide on lua_Number type in kernel early because
it affects nearly all Lua code and also because arithmetic operations is
often a cause of security holes.

In Lua, you can use your-own signed type for numbers. De-facto standard
for userspace is double (note that LuaJIT supports _only_ double type)
but ptrdiff_t is also recommented if for some reason double isn't an
option.

So, ptrdiff_t looks like a reasonable choice for the kernel but when I
was developing sljit binding for Lua [1] and I was trying to make it
robust against overflows for two different types (double and ptrdiff_t)
on both 32bit and 64bit platforms, I ended up creating bindings [2] for
arbitrary precision library.

The problem is that there are three different ranges for integer
arithmetic:

1. IEEE 754 double is a always -2^53-1 to 2^53-1 regardless of platform,
2. ptrdiff_t on 2s-complement 32bit platform is -2^31 to 2^31-1,
3. ptrdiff_t on 2s-complement 64bit platform is -2^63 to 2^63-1.

Note also that min values of ptrdiff_t on 2s-complement platforms don't
have a regular math semantic (e.g. negation produces identical value).

It's very challenging to write a code that supports all three options
without resorting to #ifdef (or Lua equivalent) or splitting a number
into low and high halves.

I'd like to propose that lua_Number in the kernel should always be
int64_t (*). This type will guarantee regular arithmetic rules for
the range (-2^53, 2^53) and for 32-bit signed integer range, in
particular.

When possible, Lua code can assume 32-bit arithmetic (without
2s-complement iggerularities). Some Lua libraries (for instance, BitOp)
assume 32bit. Other libraries that we might port to the kernel are often
written in assumption that lua_Number is double and int64_t should be
safe to use too.

When you need a full 64-bit range (signed or unsigned), arbitrary
precision library can be used instead.

[1] https://github.com/alnsn/luaSljit
[2] https://github.com/alnsn/luaBn
(*) I assume that int64_t type is available on all supported platforms
but it can be emulated by a compiler, of course.

Alex


Re: high load, no bottleneck

2013-10-01 Thread Michael
Hello,

on Saturday 28 September 2013 04:54:19 you wrote:
 Date:Sat, 28 Sep 2013 09:09:02 +0100
 From:Roland C. Dowdeswell el...@imrryr.org
 Message-ID:  20130928080902.gg4...@roofdrak.imrryr.org

   | I thought quite some time ago that it probably makes sense for us
   | to make the installer mount everything async to extract the sets
   | because, after all, if the install fails you are rather likely to
   | simply re-install rather than manually figure out where it was and
   | fix it.

 I have had the same thought - the one downside for this usage is the
 sync time at umount - for novice users this would appear to be a hang/crash
 after the install was almost finished (since it might take minutes to
 complete) to which a natural response would be to push the reset button
 and try again...   So, I think this would need to be an option to be used
 by those who know to expect that delay, and so default off.

Or just throw up a warning - Hey, this can take a few minutes, don't freak 
out!

have fun
Michael


Re: com(4) and LSI bug workaround

2013-10-01 Thread KIYOHARA Takashi
Hi!


From: Izumi Tsutsui tsut...@ceres.dti.ne.jp
Date: Mon, 30 Sep 2013 23:02:17 +0900

  It looks meaningless that the (*sc_vendor_workaround)() hook function
  is inside of MD if (sc-sc_type == COM_TYPE_ARMADAXP) statement.
  
  Isn't it simpler to prepare a MD hook function that returns
  (possible pre-processed) IIR register value that can be used
  to check if the interrupt for the device is pending or not?
 
 I performed operation to a com register into com.c.
 I think that it is not so brief to return the value of both IIR(u_char)
 and an error moreover.
 
 Sorry I don't understand what you mean.
 
 Do you think the name of sc_vendor_workaround is really appropriate
 for the function that is called only in COM_TYPE_ARMADAXP case?

I apologize for the first explanation having been lacking.

I consider calling this method for evasion processing of the problem
of LSI.  Only ARMADAXP uses it only in comintr() now.
However, this may be used during the processing from which LSI besides
the future differs.
In such a case, is a different method from this prepared?
  For example

  void *sc_vendor_workaround{1,2,3...}(void);

We are evaluating the conditions of COM_TYPE_ARMADAXP and can cope with
it by one method.


When it depends especially, LSI with much fault may come out to a market.
In this case, it may be able to be coped with by making a fault number
(tag ?) and a reason into an argument, and calling a method.

  void *sc_vendor_workaround(int reason, int value);


 The requirement in the MI interrupt handler is just
 whether interrupts are pending or not, isn't it?

Yes, it is.

Thanks,
--
kiyohara