Re: devstat overhead VS precision

2013-04-15 Thread Pawel Jakub Dawidek
On Sat, Apr 13, 2013 at 12:59:49PM +0300, Alexander Motin wrote:
 Hi.
 
 It is long known that collecting disk and GEOM statistics may cause 
 significant processing overhead under high IOPS. On my recent high-IOPS 
 benchmarks performance difference was reaching three times! Last time 
 situation improved a lot by more active use of TSC, but there are still 
 many systems where TSCs are not synchronized. I propose to switch that 
 statistics from using binuptime() to getbinuptime() to solve the problem 
 globally.
 
  From one side getbinuptime() resolution is limited by 1ms, but since 
 time is usually averaged over the many I/Os, additional sub-millisecond 
 precision will come from sampling. Since most of tools now show request 
 processing times up to 0.1ms, that precision should be sufficient. I 
 believe real disk performance is more important than the n-th digit in some
 statistics.
 
 The following patch does the change and makes disk performance 
 irrelevant to the timecounter performance:
 http://people.freebsd.org/~mav/devstat_time.patch
 
 Are there any objections against it?

No objections here, but I wonder if you were able to compare the results
somehow before and after the change so we have some hard numbers to show
that we don't lose much by applying the change.
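
A rough way to get a feel for such numbers without touching the kernel is
a small simulation. The sketch below is not the devstat code. It assumes a
1 ms tick, request start times spread uniformly over the tick, and a
made-up latency range (50 to 250 microseconds). It compares the mean
latency computed from exact timestamps with the mean computed from
timestamps rounded down to the tick, which is roughly what a
getbinuptime()-style clock would hand out.

```python
import random

TICK_US = 1000  # assumed clock resolution: 1 ms, expressed in microseconds


def coarse(t_us, tick_us=TICK_US):
    """Timestamp as a tick-driven clock would report it (rounded down)."""
    return t_us - (t_us % tick_us)


def simulate(n_requests, lo_us=50, hi_us=250, seed=1):
    """Return (exact_mean, coarse_mean) of request latency in microseconds.

    Latencies are drawn uniformly from [lo_us, hi_us]; this distribution
    is invented for the example, not measured.
    """
    rng = random.Random(seed)
    exact_total = 0
    coarse_total = 0
    for _ in range(n_requests):
        start = rng.randrange(0, 10_000_000)   # arbitrary start time
        latency = rng.randint(lo_us, hi_us)    # hypothetical service time
        end = start + latency
        exact_total += latency
        # Each coarse sample is 0 or 1000 us here; only the average is useful.
        coarse_total += coarse(end) - coarse(start)
    return exact_total / n_requests, coarse_total / n_requests


if __name__ == "__main__":
    exact, approx = simulate(200_000)
    print("exact mean: %.1f us, tick-based mean: %.1f us" % (exact, approx))
```

Individual tick-based samples are useless on their own, but their average
converges on the true mean as long as request start times are not
correlated with the tick. That is the assumption that would have to be
checked on real hardware.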

On a mostly unrelated note when two threads (T0 and T1) call get*time()
on two different cores, but T0 does that a bit earlier is it possible
that T0 can get later time than T1?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: devstat overhead VS precision

2013-04-15 Thread Pawel Jakub Dawidek
On Mon, Apr 15, 2013 at 10:18:15PM +0300, Konstantin Belousov wrote:
 On Mon, Apr 15, 2013 at 08:42:03PM +0200, Pawel Jakub Dawidek wrote:
  On a mostly unrelated note when two threads (T0 and T1) call get*time()
  on two different cores, but T0 does that a bit earlier is it possible
  that T0 can get later time than T1?
 
 Define earlier first.
 
 If you have taken sufficient measures to prevent preemption and interruption,
 e.g. by entering spinlock before the fragment that calls get*, then no,
 it is impossible, at least not with any x86 timekeeping hardware we use.
 
 On the other hand, if interrupts are allowed, all bets are off.

So if we consider only one thread, it is not possible for it to obtain
time t0, be scheduled to a different CPU and obtain t1 where t1 < t0?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com




Re: kmem_map auto-sizing and size dependencies

2013-01-21 Thread Pawel Jakub Dawidek
On Fri, Jan 18, 2013 at 08:26:04AM -0800, m...@freebsd.org wrote:
   Should it be set to a larger initial value based on min(physical,KVM) space
   available?
 
 It needs to be smaller than the physical space, [...]

Or larger, as the address space can get fragmented and you might not be
able to allocate memory even if you have physical pages available.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [RFQ] make witness panic an option

2012-11-25 Thread Pawel Jakub Dawidek
On Thu, Nov 15, 2012 at 04:39:55PM +, Attilio Rao wrote:
 On 11/15/12, Adrian Chadd adr...@freebsd.org wrote:
  On 15 November 2012 05:27, Giovanni Trematerra
  giovanni.tremate...@gmail.com wrote:
 
  I really do think that is a very bad idea.
  When a locking assertion fails you have just to stop your mind and
  think what's wrong,
  no way to postpone on this.
 
  Not all witness panics are actually fatal. For a developer who is
  sufficiently cluey in their area, they are quite likely able to just
  stare at the code paths for a while to figure out why the
  incorrectness occurred.
 
 The problem is that such mechanism can be abused, just like the
 BLESSING one and that's why this is disabled by default.

WITNESS is a development tool. We don't ship production kernels with
WITNESS even compiled in. What is more efficient use of developer time:
going through full reboot cycle every time or reading the warning from
console, unloading a module, fixing the bug and loading it again?

And if this option is turned off by default what is the problem?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [RFQ] make witness panic an option

2012-11-25 Thread Pawel Jakub Dawidek
On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org 
 wrote:
  WITNESS is a development tool. We don't ship production kernels with
  WITNESS even compiled in. What is more efficient use of developer time:
  going through full reboot cycle every time or reading the warning from
  console, unloading a module, fixing the bug and loading it again?
 
  And if this option is turned off by default what is the problem?
 
 Yes, so, why do you write here?

I'm trying to understand why do you object. Until now the only concern
you have that I found is that you are afraid of it being abused. I don't
see how this can be abused if it is turned off by default. If someone
will commit a change that will turn it on by default, believe me, I'll
unleash hell personally.

As I said, WITNESS is a development tool, a very handy one. This doesn't
mean we can't make it even more handy. It is there to help find bugs
faster, right? Adrian is proposing a change that will help to find and
fix bugs maybe even faster.

 Go ahead and fix BLESSED, make it the default, etc.

This is another story, but BLESSED is much less controversial to me.
It is turned off by default on the assumption that all the code that runs in
our kernel is developed for FreeBSD, which is not true. For example ZFS
is, I think, the biggest locking consumer in our kernel (around 120
locks), and it wasn't originally developed for FreeBSD; its locking order
was verified using different tools. Now on FreeBSD it triggers massive
LOR warnings from WITNESS, even though those are not bugs. At some point
I verified many of them and they were all false positives, so I simply
turned off WITNESS warnings for ZFS locks. Why? Because BLESSED is
turned off in fear of abuse, and this in turn is the cause of the mentioned
hack in ZFS.

 I have enough of your (not referred to you particulary but to the
 people which contributed to this and other thread) to not be able to
 respect others opinion.
 As I said I cannot forbid you guys from doing anything, just go ahead,
 write the code and commit it, albeit completely bypassing other
 people's opinion.

I'm sorry, I wasn't aware that your opinions are set in stone. I hoped
that with some new arguments you may want to reconsider:)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [RFQ] make witness panic an option

2012-11-25 Thread Pawel Jakub Dawidek
On Sun, Nov 25, 2012 at 01:37:19PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
  On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
  On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek p...@freebsd.org 
  wrote:
   WITNESS is a development tool. We don't ship production kernels with
   WITNESS even compiled in. What is more efficient use of developer time:
   going through full reboot cycle every time or reading the warning from
   console, unloading a module, fixing the bug and loading it again?
  
   And if this option is turned off by default what is the problem?
 
  Yes, so, why do you write here?
 
  I'm trying to understand why do you object. Until now the only concern
  you have that I found is that you are afraid of it being abused. I don't
  see how this can be abused if it is turned off by default. If someone
  will commit a change that will turn it on by default, believe me, I'll
  unleash hell personally.
 
 So I don't understand what are you proposing.
 You are not proposing to switch BLESSING on and you are not proposing
 to import Adrian's patches in, if I get it correctly. I don't
 understand then.

I propose to get Adrian's patches in, just leave current behaviour as
the default.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [RFQ] make witness panic an option

2012-11-25 Thread Pawel Jakub Dawidek
On Sun, Nov 25, 2012 at 01:48:23PM +, Attilio Rao wrote:
 On Sun, Nov 25, 2012 at 1:47 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
  On Sun, Nov 25, 2012 at 01:37:19PM +, Attilio Rao wrote:
  On Sun, Nov 25, 2012 at 1:12 PM, Pawel Jakub Dawidek p...@freebsd.org 
  wrote:
   On Sun, Nov 25, 2012 at 12:42:16PM +, Attilio Rao wrote:
   On Sun, Nov 25, 2012 at 12:39 PM, Pawel Jakub Dawidek 
   p...@freebsd.org wrote:
    WITNESS is a development tool. We don't ship production kernels with
    WITNESS even compiled in. What is more efficient use of developer time:
    going through full reboot cycle every time or reading the warning from
    console, unloading a module, fixing the bug and loading it again?
   
    And if this option is turned off by default what is the problem?
  
   Yes, so, why do you write here?
  
   I'm trying to understand why do you object. Until now the only concern
   you have that I found is that you are afraid of it being abused. I don't
   see how this can be abused if it is turned off by default. If someone
   will commit a change that will turn it on by default, believe me, I'll
   unleash hell personally.
 
  So I don't understand what are you proposing.
  You are not proposing to switch BLESSING on and you are not proposing
  to import Adrian's patches in, if I get it correctly. I don't
  understand then.
 
  I propose to get Adrian's patches in, just leave current behaviour as
  the default.
 
 So if I tell that I'm afraid this mechanism will be abused (and
 believe me, I really wanted to trimm out BLESSING stuff also for the
 same reason) and you say you can't see how there is not much we can
 discuss.

This is not what I said. I would see it as abuse if someone suddenly
decided to turn off locking assertions by default in FreeBSD base.

If he turns that off on his private machine, be it to speed up his
development (a good thing) or to shut up an important lock assertion (a bad
thing), this is entirely his decision. He can already do that, having all
the source code; it's just more complex. Make tools, not policies.

BLESSING is a totally different subject. You were afraid that people would
start to silence LORs they don't understand by committing blessed pairs
to FreeBSD base. That situation is abuse and I fully agree, but I
also still think BLESSING is useful, although I recognize it might be
hard to prevent the mentioned abuse.

In case of Adrian's patch nothing will change in how we enforce locking
assertions in FreeBSD base.

 You know how I think, there is no need to wait for me to reconsider,
 because I don't believe this will happen with arguments like I don't
 think, I don't agree, etc.

I provide valid arguments with, I hope, proper explanation. You choose not
to address them or to ignore them, and I hope this will change:)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: Training wheels for commandline (was Re: Pull in upstream before 9.1 code freeze?)

2012-07-07 Thread Pawel Jakub Dawidek
On Sat, Jul 07, 2012 at 11:25:07AM +0200, Wojciech Puchar wrote:
  something they probably don't even know about, than to skilled users to
  turn it off.
 
  If this feature is going to print quite a few extra lines, let's just
  add one more line saying:
 
  To disable this message run: echo set 31337mode >> ~/.tcshrc
 
  -- 
 should i - from now, understand that this way of extending OS is 
 considered right (i mean going down to newbies instead of going up) by 
 FreeBSD developers?

Not exactly. The "from now" part is a bit misleading. This is not
starting now; we have been trying hard to make FreeBSD easier to use, more
consistent and friendlier in general for a long time now.

In your terminology, making FreeBSD easier for newcomers is "going
down", which implies that "going up" is making it harder for newcomers.
I hate to break it to you, but you are living upside down.

 Please answer it is important for me, and many other people for a future.

You should definitely pay more attention, as this is happening every day.

Everyone was a newcomer once. I didn't succeed on my first attempt to
install FreeBSD, nor on the second attempt. It took me a few tries to
do it right. I knew nothing about UNIX back then. I consider myself
someone who improved FreeBSD a bit, but I could just as easily have given
up after the first two failed attempts to install it and moved to
something easier. How many people gave up after the first or second
attempt and never looked back?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: Pull in upstream before 9.1 code freeze?

2012-07-06 Thread Pawel Jakub Dawidek
On Thu, Jul 05, 2012 at 12:10:17AM -0600, Warner Losh wrote:
 
 On Jul 4, 2012, at 4:08 PM, Doug Barton wrote:
 
  On 07/04/2012 15:01, Mike Meyer wrote:
  On Wed, 04 Jul 2012 14:19:38 -0700
  Doug Barton do...@freebsd.org wrote:
  On 07/04/2012 11:51, Jason Hellenthal wrote:
  What would be really nice here is a command wrapper hooked into the
  shell so that when you type a command and it does not exist it presents
  you with a question for suggestions to install somewhat like Fedora has
  done.
  I would also like to see this feature, which is pretty much universal in
  linux at this point. It's very handy.
  
  I, on the other hand, count it as one of the many features of Linux
  that make me use FreeBSD.
  
  First, I agree that being able to turn it off should be possible. But I
  can't help being curious ... why would you *not* want a feature that
  tells you what to install if you type a command that doesn't exist on
  the system?
 
 Because I find on Linux it often gets it wrong and winds up being useless 
 noise.  Mostly, though, it is because I mistype commands more than I type 
 commands that should be there, but aren't.

It is even cooler than I thought initially. It punishes you for making
typos:) Cool.

I think this is very useful for newcomers. The only thing that is
missing is a one-liner on how to disable this feature, next to the
instructions on how to install the package containing the missing command.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: Training wheels for commandline (was Re: Pull in upstream before 9.1 code freeze?)

2012-07-06 Thread Pawel Jakub Dawidek
On Thu, Jul 05, 2012 at 12:15:44PM +0200, Jonathan McKeown wrote:
 On Thursday 05 July 2012 11:03:32 Doug Barton wrote:
  If the new feature gets created, and you don't want to use it, turn it
  off. No problem.
 
 No. I think this is entirely the wrong way round. If the new feature is 
 created and you want it, turn it on. Don't make me turn off something I 
 didn't want in the first place. [...]

This feature is targeted at new users, for whom it is harder to turn on
something they probably don't even know about, than to skilled users to
turn it off.

If this feature is going to print quite a few extra lines, let's just
add one more line saying:

To disable this message run: echo set 31337mode >> ~/.tcshrc

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Pawel Jakub Dawidek
On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
 
 On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
  
  All of the above is ugly, I'm afraid :(
 
 Indeed. The only sane way is to put the metadata in a partition of its own.
 Every compliant OS will respect that and consequently will not scribble over
 the data unintentionally. Any other scheme that puts valuable data in some
 undocumented or unregistered location is violating the GPT spec right away
 and is susceptible to being clobbered unintentionally.

If the user runs:

# gpart create -s GPT /dev/mirror/foo

for me it is obvious that he wants to partition the mirror device and
not individual disks. Because the mirror was configured earlier, do you
expect gmirror to somehow detect that someone is writing GPT metadata
later and magically place GPT metadata on the raw disk and move mirror's
metadata to some magic partition? Not to mention that the mirror itself
doesn't have to be configured on top of raw disks. And not to mention
that the mirror may never be partitioned.

If GPT in your opinion is limited only to raw disks then I guess the
best way to fix that is to refuse to configure GPT on anything except
raw disks (which was already proposed by Andrey?). In my opinion this is
unacceptable, but I think this is what you are suggesting.

One of the GEOM design goals was to be flexible. Let the user decide in
what order he wants to configure various layers. How do you know that in
every possible scenario software mirroring should come after
partitioning and encryption after mirroring? Why can't we provide
flexible tools to the user and let him decide? Maybe GPT nesting
violates standards, but why can't we support it as an extension, really?

I recognize the need to warn users if they use FreeBSD-specific
features. We do that with non-standard APIs. So how about this.

Let's modify gpart(8) to print a warning if GPT is configured on
something other than a raw disk. Let the warning say that such a
configuration is non-standard and that problems are to be expected if the
disk is shared with other OSes.

In my opinion that's fair.

With such a warning in place, I think we can allow users to decide on
their own if they really want that or not. Then, we can also improve
FreeBSD boot loader to play nice with FreeBSD-specific extensions.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Pawel Jakub Dawidek
On Thu, Jun 28, 2012 at 02:54:43PM -0700, Marcel Moolenaar wrote:
 On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:
  Or are you suggesting to
  convince all BIOS vendors to include the ability to boot from some kind
  of FreeBSD private partitioning scheme (not MBR as it is not
  suitable, not GPT as you are not OK to use it on a gmirror)?
 
 I would be having less problems if the mirroring didn't force the backup
 GPT header in anything but the last sector. [...]

GPT backup header is placed in the last sector of the mirror device,
just like the user asked. Gmirror doesn't force anything. User decides
to put GPT partitioning on the mirror device instead of raw disk.
Gmirror doesn't even know and doesn't have to know how the user uses
data area on the mirror device.
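
The layout being argued about can be written down as plain arithmetic.
The sketch below assumes only what is stated in this thread: gmirror keeps
its metadata in the last sector of each component, and GPT puts its backup
header in the last sector of whatever device it was created on. The
function name and the example sector count are made up for illustration.

```python
def layout(component_sectors):
    """Describe where things land when GPT is created on top of a mirror.

    Assumes a single metadata sector at the end of the component, as
    described in this thread. All values are LBAs counted from 0 on the
    raw disk, except the provider size, which is a sector count.
    """
    raw_last_lba = component_sectors - 1
    mirror_sectors = component_sectors - 1   # mirror hides its metadata sector
    return {
        "gmirror_metadata_lba": raw_last_lba,
        "mirror_provider_sectors": mirror_sectors,
        # Last sector of the mirror device is the second-to-last of the disk.
        "gpt_backup_header_lba": mirror_sectors - 1,
    }


if __name__ == "__main__":
    for key, value in layout(1000).items():
        print(key, value)
```

A tool that looks at the raw disk expects the backup header at the last
LBA and finds the mirror metadata there instead, with the backup header
one sector earlier. That one-sector shift is the source of the
disagreement.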

 [...] If the metadata was somewhere
 else, then we wouldn't need to kluge various places to deal with the
 ambiguity and visible interoperability problems of the various tools and
 OSes. [...]

Where is somewhere else, exactly?

If somewhere else on this disk, then where? At the beginning of the disk?
Then you would complain that it keeps metadata where the primary header
should be located and also MBR metadata, BSDlabel metadata, etc.
Somewhere in the middle of the disk? Some future GPTng may want to use
the same spot, but also gmirror-unaware boot loader will see corrupted
data (shifted by one sector). Come on...

If somewhere else is not on this disk, then I'm sorry, but this is
totally impractical. Disks are the place you store stuff. In 99% of the
cases there is no other place to store it, but the disk itself. Should
we ask users to use additional disk to keep mirror's metadata?

 [...] Thus, it's not that I object to the mirroring per se, just to the
 mirroring as it is currently implemented with gmirror.

Do you know of any software RAID (>=1) or volume manager that doesn't keep
metadata on component disks?

PS. We are discussing two totally different things here:
1. Does placing GPT on anything but a raw disk violate the spec? I can
   agree that it does and I'm happy with gpart(8) growing a warning.
2. How to do software mirroring. Despite trying really hard, I'm not sure
   what alternative you are proposing. Could you be more specific and
   describe how gmirror should be implemented in your opinion?

  What about multipathing? In case the disk is attached via two paths but
  multipath is not enabled, the OS sees the same disk (and the same
  identical unique disk identifier) multiple times. Is this a violation
  of the spec too?
 
 It's the same disk, isn't it? The OS can actually use the property
 of the ID to infer that it has already seen this disk and not create
 multiple device nodes.

You cannot trust some id that is found on disk to be unique, as all
your assumptions break when the user decides to dd(1)-copy content of
this disk to another disk, for example.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
  I don't think so. Most common case is to configure partitions on top of
  a mirror. Mirroring partitions is less common. Mostly because of
  hardware RAIDs being popular. You don't expect hardware RAID vendor to
  mirror partitions. Partition editors for other OS's won't work, but only
  because they don't support gmirror. If they wouldn't recognize and
  support some hardware (or pseudo-hardware) RAIDs there will be the same
  problem.
 
 Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
 editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
 as the logical volume is always seen by all OS's consistently. [...]

Only if you won't connect this disk to a different controller.

 [...] The same
 is even true of the software RAID that graid supports since the metadata
 is defined by the vendor and thus the logical volume is always seen other
 OS's consistently.

But is it seen without metadata by the boot loader?

What I'm trying to say is that it is fair to expect the user not to use
a gmirror-configured disk on a different OS. If the user wants to use
this disk in a different OS, then he has to use a format that is
recognized by both.

Because gmirror is supported by FreeBSD we should improve the support by
teaching boot loader about it. Pretending gmirror is special and
recommending to mirror partitions with it instead of raw disks is not
the solution.

I really can't see how gmirror is different in this regard from any
other software RAID or volume manager. If you try to use disk that
contains unrecognized metadata the behaviour is undefined (but hopefully
not a panic).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
  
  GPT really wants the backup header at the last LBA.  I know you can set it, 
  but I've interpreted that as a way to see if the primary header is correct 
  or 
  not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
  provider) will not work properly with partition editors for other OS's.  
  I'm 
  hesitant to encourage the use of this as I do think putting GPT inside of a 
  gmirror violates the GPT spec.
 
 Agreed.

Guys. This doesn't violate the GPT spec in any way. The spec is
narrow-minded if it talks only about raw disks, but you should think
about gmirror as pseudo-hardware RAID. That's all. If putting GPT on top
of RAID array is spec violation, then I guess we just have to live with it.

 While it is a nice trick to use the last sector for meta data, it does
 create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror
would keep its metadata in the first sector, gpart/gpt will find its
metadata in the last sector and will complain about missing primary
header.

 [...] The second is that when there's
 different metadata in the first *and* the last sector, you can't decide
 which is to take precedence without also looking at the other and know
 how to interpret it. We have not solved this second problem at all.  We
 do get reports about the problems though. At best we're handwaving or
 kluging.

This is different kind of problem. It took me a while to realize that,
but now I know:)

The real problem is that not all metadata formats are suitable for
autodetection. That's all.

The metadata I use in my GEOM classes play nice with autodetection.
The solution is very easy - keep size of the disk device within metadata.
This allows gmirror to figure out if it is configured on raw disk, last
slice or last partition within last slice, etc.
If GPT would keep disk size in its metadata the second problem you
mentioned would not exist. And to be honest GPT kinda does that by having
backup header's LBA stored in the primary header. And this is fine as
long the primary header is valid.
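
The size check described above can be sketched in a few lines. This is an
illustration, not the actual GEOM taste code: the metadata layout (a magic
string followed by a 64-bit provider size) and all names are invented for
the example. Every nested provider that ends on the same sector sees the
same last sector, but only the one whose size matches the recorded size
claims the metadata.

```python
import struct

SECTOR = 512
MAGIC = b"EXAMPLE::MIRROR\0"   # hypothetical 16-byte magic string
FMT = "<16sQ"                  # magic, provider size in bytes


def write_metadata(disk, offset, size):
    """Store metadata in the last sector of the provider [offset, offset+size)."""
    meta = struct.pack(FMT, MAGIC, size).ljust(SECTOR, b"\0")
    end = offset + size
    disk[end - SECTOR:end] = meta


def taste(disk, offset, size):
    """Return True if the provider [offset, offset+size) owns the metadata."""
    end = offset + size
    last_sector = bytes(disk[end - SECTOR:end])
    magic, recorded_size = struct.unpack_from(FMT, last_sector)
    return magic == MAGIC and recorded_size == size


if __name__ == "__main__":
    disk = bytearray(100 * SECTOR)
    whole = (0, 100 * SECTOR)                  # the raw disk
    last_part = (10 * SECTOR, 90 * SECTOR)     # a partition ending on the same sector
    write_metadata(disk, *whole)
    print(taste(disk, *whole), taste(disk, *last_part))
```

With the metadata written for the whole disk, the last partition sees the
same sector but does not claim it, because its own size differs from the
recorded one.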

The same problem is with things like UFS labels. There is no way to
properly support them using GEOM autodetection, because there is no
provider size in UFS superblock. UFS superblock contains file system
size, but it is not the same, as one can create smaller file system than
the underlying disk device.

 I think it's unwise to depend on FreeBSD-specific extensions or features
 in industry-standard partitioning schemes and as such make the use of
 foreign tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem?
Partitioning is not the end of the world. Even if you use
industry-standard partitioning schemes what file system are you going
to use to actually access your data? FAT? Of course if you do share your
disk between various OSes then probably your best bet is to use MBR or
GPT on raw disk and FAT file system. But if you use your disk with
FreeBSD only, then I see no reason to not to leverage FreeBSD-specific
features (be it gmirror, geli or zfs).

 A much more flexible approach is to support out-of-band configuration
 data. This allows us to mirror GPT disks without having to become non-
 standard as it removes the need to use the last sector for meta-data.
 The ability to construct GEOM hierarchies unambiguously is very
 important and our current approach has proven to not deliver on that.
 This is actually impacting existing FreeBSD consumers already, like
 Juniper. So, we should not go deeper into this rabbit hole. We should
 finally solve this problem for real...

Marcel, nothing stops anyone from implementing GEOM mirror class that
uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does
provide mechanism for autoconfiguration, but it is totally optional and
GEOM class might choose not to use it.

As an example you can take a look at two other GEOM classes of mine:
gconcat(8) and gstripe(8). You can use 'label' subcommand to store
metadata on component disks, which will take advantage of GEOM
autodetection and autoconfiguration. You can also use 'create'
subcommand to create ad hoc provider that stores no metadata and makes
use of entire disks, which also means it won't be automatically created
on next boot.

For Juniper it might be more handy to use out-of-band configuration as
you know the hardware you are running on, so you know where the disks
are exactly, etc. My company builds appliances too, so I have been there.
For most of our users automatic configuration is simply better, as they
can shuffle disks around and not wonder if the system will boot or not.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http

Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
  
  As for sharing disk with other OS. If you share the disk with OS that
  doesn't support gmirror, you shouldn't use gmirror in the first place.
  You probably want to use only formats that are recognized by all your
  OSes.
 
 This statement is ridiculous by virtue of not being in touch with
 reality and by making gmirror useless for such wide range of cases
 that one can question why we have it at all.
 
 Put differently: a mirroring class is a fairly basic and useful thing
 to have. Limiting its use is nothing but artificial and follows from
 having to use the underlying provider to store metadata. This then
 changes the view of the underlying providing to consumers above gmirror
 in a way that makes the presence or absence of gmirror visible.
 Solving the visibility problem makes gmirror useful all the time.
 I see that as a better way of looking at it than simply blurting out
 that you shouldn't use gmirror when certain awkward and artificial
 conditions apply.

I'm sorry, Marcel, but what you describe here has nothing to do with
reality. To be able to implement reliable mirroring you have to use
on-disk metadata. There is no way around that. You can implement
non-redundant GEOM classes without using on-disk metadata, but
out-of-band configuration in case of mirroring is simply naive. How do
you detect that components are out of sync, for example?
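
One common way to answer that question is a counter stored on each
component and bumped whenever the set of active components changes, so
that a disk which missed some writes carries an older value. The sketch
below illustrates that idea only. The field and function names are
invented, and this is not gmirror's actual metadata format.

```python
from dataclasses import dataclass


@dataclass
class Component:
    name: str
    syncid: int   # hypothetical counter read from the component's own metadata


def split_by_freshness(components):
    """Return (fresh, stale) name lists: highest counter versus the rest."""
    newest = max(c.syncid for c in components)
    fresh = [c.name for c in components if c.syncid == newest]
    stale = [c.name for c in components if c.syncid < newest]
    return fresh, stale


if __name__ == "__main__":
    # ada1 was disconnected for a while and missed two counter bumps.
    disks = [Component("ada0", 7), Component("ada1", 5)]
    print(split_by_freshness(disks))
```

A configuration kept purely out of band has no per-disk state like this,
so a disk that was disconnected and later returned looks the same as one
that never left.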

And when it comes to visibility. Are you suggesting that gmirror should
present entire underlying provider to upper layers? Including its
metadata? I hope not, because we went through that hell already
(remember skipping first 16 sectors by UFS, as BSDlabel metadata might
be there? The same for swap?).
I think I did pretty good job by making the metadata as simple as
possible - I use exactly one sector at the end of the target device.
I'm really having a hard time to think of a simpler format.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
 Hi All,
 
 Some time ago i have started reading the code in the sys/boot.
 Especially i'm interested in the partition tables handling.
 I found several problems:
 1. There are several copies of the same code in the libi386/biosdisk.c
 and common/disk.c, and partially libpc98/biosdisk.c.
 2. ZFS probing is very slow, because the ZFS code doesn't know how many
 disks and partitions the system has:
   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
 3. The GPT support doesn't check CRC and even doesn't know anything
 about the secondary GPT header/table.

Just a quick note here. At some point when I was adding GPT attributes
to allow for test starts I greatly improved, at least parts of, the GPT
implementation. I did implement support for both CRC checksum
verification and fallback to backup GPT header when primary is broken.
And the code is still in sys/boot/common/gpt.c. So my question would be
what do you mean by this sentence?

 So, i have created the branch and committed the changes:
   http://svnweb.freebsd.org/base/user/ae/bootcode/
 The patch is here:
   http://people.freebsd.org/~ae/boot.diff
 
 What i already did:
 1. The partition tables handling now is machine independent,
 and it is compatible with the kernel's GEOM_PART implementation.
 There is new API for disk drivers in the loader to get information
 about partitions and tables:
 common/Makefile.inc
   common/part.c
   common/part.h
 
 2. The similar and general code from the disk drivers merged in the
 disk.c:
 common/disk.c
 common/disk.h
 i386/libi386/libi386.h
 i386/libi386/biosdisk.c
 userboot/test/test.c
 userboot/userboot/userboot_disk.c
 userboot/userboot.h
 3. ZFS code now uses the new API, and probing on systems with many disks
 should be much faster:
 zfs/zfs.c
 i386/loader/main.c
 4. The gptboot now searches the backup GPT header in the previous sectors,
 when it finds the GEOM:: signature in the last sector. PMBR code also
 tries to do the same:
 common/gpt.c
 i386/pmbr/pmbr.s
 
 5. Also the pmbr image now contains one fake partition record.
 When the first several sectors are damaged the kernel can't detect GPT
 (see the RECOVERING section in gpart(8)). We can restore the PMBR with the dd(1)
 command, but the old pmbr image has an empty partition table and the
 loader isn't able to boot from GPT when there is no partition record
 in the PMBR. Now it will be able to. When pmbr is installed via the 'gpart bootcode'
 command, the kernel correctly modifies this partition record. So, this is only
 for the first rescue step.
 
 6. I have changed the userboot interface. I guess there are no consumers except
 the one test program. But if that isn't the case, I can make it compatible.
 
 Any comments are welcome.
 
 -- 
 WBR, Andrey V. Elsukov
 
 



-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 06:01:26PM +0400, Andrey V. Elsukov wrote:
 On 26.06.2012 16:57, Pawel Jakub Dawidek wrote:
  On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
  Hi All,
 
  Some time ago i have started reading the code in the sys/boot.
  Especially i'm interested in the partition tables handling.
  I found several problems:
  1. There are several copies of the same code in the libi386/biosdisk.c
  and common/disk.c, and partially libpc98/biosdisk.c.
  2. ZFS probing is very slow, because the ZFS code doesn't know how many
  disks and partitions the system has:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
  3. The GPT support doesn't check CRC and even doesn't know anything
  about the secondary GPT header/table.
  
  Just a quick note here. At some point when I was adding GPT attributes
  to allow for test starts I greatly improved, at least parts of, the GPT
  implementation. I did implement support for both CRC checksum
  verification and fallback to backup GPT header when primary is broken.
  And the code is still in sys/boot/common/gpt.c. So my question would be
  what do you mean by this sentence?
 
 Yes, gptboot does that, but the loader/zfsloader doesn't. So there might
 be a situation when gptboot does boot, but loader(8) can't.

I see. I don't know if I'll find time for a proper review, but it is
really great that you are working on cleaning up this huge mess.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
  4. The gptboot now searches the backup GPT header in the previous sectors,
  when it finds the GEOM:: signature in the last sector. PMBR code also
  tries to do the same:
  common/gpt.c
  i386/pmbr/pmbr.s
 
 GPT really wants the backup header at the last LBA.  I know you can set it, 
 but I've interpreted that as a way to see if the primary header is correct or 
 not. [...]

My interpretation is different: The way to verify if the header is valid
is to check its checksum, not to check if the backup header location in
the primary header points at the last LBA.

Of course if primary header's checksum is incorrect it is hard to trust
that the backup header location is correct. And we need the backup
header when the primary header is invalid...
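
The checksum-first validation argued for above can be sketched roughly as
follows. This is a minimal userland illustration, not the code in
sys/boot/common/gpt.c: the names crc32_ieee(), read_le32() and
gpt_hdr_valid() are hypothetical, and the offsets assume the usual GPT header
layout (signature at byte 0, header size at byte 12, header CRC32 at byte 16,
with the CRC computed while that field is zeroed).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GPT_HDR_MIN_SIZE 92u	/* fixed part of the header (assumed) */
#define GPT_OFF_HDR_SIZE 12u	/* little-endian 32-bit header size */
#define GPT_OFF_HDR_CRC  16u	/* little-endian 32-bit header CRC32 */

/* Bitwise CRC-32 with the reflected polynomial 0xEDB88320. */
static uint32_t
crc32_ieee(const uint8_t *buf, size_t len)
{
	uint32_t crc = 0xFFFFFFFFu;
	size_t i;
	int bit;

	for (i = 0; i < len; i++) {
		crc ^= buf[i];
		for (bit = 0; bit < 8; bit++)
			crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
	}
	return (crc ^ 0xFFFFFFFFu);
}

/* Decode a little-endian 32-bit value. */
static uint32_t
read_le32(const uint8_t *p)
{
	return ((uint32_t)p[0] | ((uint32_t)p[1] << 8) |
	    ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24));
}

/*
 * Return 1 if the sector holds a header whose signature and CRC32 match,
 * 0 otherwise. The backup-header location is deliberately not consulted.
 */
static int
gpt_hdr_valid(const uint8_t *sector, size_t sector_size)
{
	uint8_t tmp[4096];
	uint32_t hdr_size, stored;

	if (sector_size < GPT_HDR_MIN_SIZE)
		return (0);
	if (memcmp(sector, "EFI PART", 8) != 0)
		return (0);
	hdr_size = read_le32(sector + GPT_OFF_HDR_SIZE);
	if (hdr_size < GPT_HDR_MIN_SIZE || hdr_size > sector_size ||
	    hdr_size > sizeof(tmp))
		return (0);
	stored = read_le32(sector + GPT_OFF_HDR_CRC);
	/* The CRC is computed over the header with the CRC field zeroed. */
	memcpy(tmp, sector, hdr_size);
	memset(tmp + GPT_OFF_HDR_CRC, 0, 4);
	return (crc32_ieee(tmp, hdr_size) == stored);
}
```

Only once this check passes does it make sense to trust the backup header
location stored in the primary header.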

 [...] It seems to me that GPT tables created in this fashion (inside a GEOM 
 provider) will not work properly with partition editors for other OS's.  I'm 
 hesitant to encourage the use of this as I do think putting GPT inside of a 
 gmirror violates the GPT spec.

I don't think so. The most common case is to configure partitions on top of
a mirror. Mirroring partitions is less common, mostly because of
hardware RAIDs being popular. You don't expect a hardware RAID vendor to
mirror partitions. Partition editors for other OS's won't work, but only
because they don't support gmirror. If they didn't recognize and
support some hardware (or pseudo-hardware) RAID, there would be the same
problem.

In other words, IMHO, our problem is that FreeBSD's boot code doesn't
recognize/support gmirror's metadata. What Andrey is proposing is to
recognize the metadata and act accordingly - in case of a gmirror we
simply need to skip it.

In the future we will have the same problem with graid - until we add
support for it to the boot code, we won't be able to boot from it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 02:41:31PM -0700, Kevin Oberman wrote:
 Long ago I saw a proposal to create a dedicated partition on GPT to
 hold the metadata. With the large number of partitions available on
 GPT, tying up one just for GEOM seems like a low price and it moves
 the device GEOM out of the realm of FreeBSD unique and subject to
 serious issues when/if a disk is shared with some other OS. I have
 seen little comment on this and have never seen any argument that that
 it could not work.
 
 I think this is an issue that will continue to bite users unless it is fixed.

I don't really see how dedicating a partition for metadata can work or
is a good idea, sorry.

As for sharing a disk with another OS: if you share the disk with an OS that
doesn't support gmirror, you shouldn't use gmirror in the first place.
You probably want to use only formats that are recognized by all your
OSes.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [RFC] last(1) with security.bsd.see_other_uids support

2012-06-06 Thread Pawel Jakub Dawidek
On Tue, Jun 05, 2012 at 11:31:01PM +0200, Jilles Tjoelker wrote:
 Also, the attack surface of such a daemon may be smaller than that of a
 setuid/setgid program.

Really? I don't see that. With the current patch and setgid to utmp the
process can only read some files that don't even contain very sensitive
data (like passwords).

Any privileged daemon is a much bigger threat. Also, do we really want a
daemon running all the time just to be able to parse utx files?

 Alternatively, the daemon could be a setgid program that is spawned by
 the utmpx APIs when needed.

Still seems a bit too far for my taste. Spawning a daemon somewhere from
within a library doesn't sound like a good idea to me... At least until we
have something like launchd that can start such services on demand.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: [RFC] last(1) with security.bsd.see_other_uids support

2012-06-04 Thread Pawel Jakub Dawidek
(_SC_NGROUPS_MAX);
if (ngroups_max == -1)
ngroups_max = NGROUPS_MAX;
ngroups_max++;

 + if ((groups = malloc(sizeof(gid_t) * (ngroups_max))) == NULL)
 + err(1, "malloc");

When this goes into a library you have to return an error here.

 + ngroups = ngroups_max;
 + (void) getgrouplist(pw->pw_name, pw->pw_gid, groups, &ngroups);

You know that getgrouplist(3) returns groups from the system files and
not the actual process groups? Was that intended? IMHO you should use
getgroups(2) here. And again you ignore the return value.

 + for (cnt = 0; cnt < ngroups; ++cnt) {
 + gid = groups[cnt];
 + group = getgrgid(gid);
 + /* User is in utmp or wheel group, they can see all */
 + if (strncmp("utmp", group->gr_name, 4) == 0 || 
 strncmp("wheel",
 group->gr_name, 5) == 0) {

strncmp(3) is a bad idea here. If the user is a member of the utmpfoo group or
the wheelx group you turn off restrictions.

I'd really use getgroups(2) and look for GID_WHEEL or _UTMP_GID.
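
The difference between matching group names by prefix and matching the
process's actual group IDs can be sketched as below. This is only an
illustration under stated assumptions: gid_in_list() and process_in_group()
are hypothetical helper names, not part of the patch under review, and the
caller is expected to pass the numeric ID it cares about (for example
GID_WHEEL or _UTMP_GID as mentioned above).

```c
#include <sys/types.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Return 1 if 'wanted' appears in groups[0..ngroups-1], else 0. */
static int
gid_in_list(const gid_t *groups, int ngroups, gid_t wanted)
{
	int i;

	for (i = 0; i < ngroups; i++) {
		if (groups[i] == wanted)
			return (1);
	}
	return (0);
}

/*
 * Return 1 if the calling process has 'wanted' as its real gid or among
 * its supplementary groups, 0 if not, and -1 on error (so that a library
 * caller can report the failure instead of exiting).
 */
static int
process_in_group(gid_t wanted)
{
	gid_t *groups;
	int n, found;

	if (getgid() == wanted)
		return (1);
	/* With a zero-sized list getgroups(2) only reports the count. */
	n = getgroups(0, NULL);
	if (n < 0)
		return (-1);
	if (n == 0)
		return (0);
	groups = malloc(sizeof(*groups) * (size_t)n);
	if (groups == NULL)
		return (-1);
	n = getgroups(n, groups);
	if (n < 0) {
		free(groups);
		return (-1);
	}
	found = gid_in_list(groups, n, wanted);
	free(groups);
	return (found);
}
```

Comparing numeric IDs sidesteps the prefix problem entirely:
strncmp("utmp", "utmpfoo", 4) returns 0, whereas an exact gid comparison
cannot be fooled by a similarly named group.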

 @@ -212,7 +255,30 @@ struct idtab {
   /* Load the last entries from the file. */
   if (setutxdb(UTXDB_LOG, file) != 0)
   err(1, "%s", file);
 +
 + /* drop setgid now that the db is open */

Style: Sentence should start with capital letter and end with a period.

 + setgid(getgid());

And if setgid(2) fails?

 + /* Lookup current user information */

Style: Sentence should end with a period.

 + pw = getpwuid(getuid());

And if getpwuid(3) fails?
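
A minimal sketch of handling both failure cases raised above, assuming
hypothetical helper names drop_setgid() and invoking_user_name() (neither is
from the patch):

```c
#include <sys/types.h>
#include <pwd.h>
#include <stddef.h>
#include <unistd.h>

/* Drop setgid privileges; return 0 on success, -1 if that failed. */
static int
drop_setgid(void)
{
	if (setgid(getgid()) != 0)
		return (-1);
	/* Double-check that the effective gid now matches the real gid. */
	if (getegid() != getgid())
		return (-1);
	return (0);
}

/*
 * Look up the invoking user's name. Returns NULL when getpwuid(3) fails,
 * so callers must check the result before using it.
 */
static const char *
invoking_user_name(void)
{
	struct passwd *pw;

	pw = getpwuid(getuid());
	if (pw == NULL || pw->pw_name == NULL)
		return (NULL);
	return (pw->pw_name);
}
```

A caller in a command-line tool would typically turn either failure into
err(3) or errx(3); inside a library it would return the error instead.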

 + len = sizeof(see_other_uids);
 + if (sysctlbyname("security.bsd.see_other_uids", &see_other_uids, &len,
 NULL, 0))

sysctlbyname(3) doesn't return bool.

 + see_other_uids = 0;
 + restricted = is_user_restricted(pw, see_other_uids);
 +
   while ((ut = getutxent()) != NULL) {
 + /* Skip this entry if the invoking user is not permitted
 +  * to see it */
 + if (restricted &&
 + !(ut->ut_type == BOOT_TIME ||
 + ut->ut_type == SHUTDOWN_TIME ||
 + ut->ut_type == OLD_TIME ||
 + ut->ut_type == NEW_TIME ||
 + ut->ut_type == INIT_PROCESS) &&
 + strncmp(ut->ut_user, pw->pw_name, sizeof(ut->ut_user)))

That's one complex if. And again strncmp(3) is used instead of strcmp(3).
Also strncmp(3) doesn't return bool. If getpwuid(3) failed earlier you
have a NULL pointer dereference here.
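
One way to tame the condition criticised above is to split it into small
named predicates with explicit comparisons. The sketch below is an
assumption-laden illustration: is_system_record() and entry_hidden() are
hypothetical names, both user names are assumed to be NUL-terminated strings,
and SHUTDOWN_TIME is only used where the platform's utmpx.h defines it.

```c
#include <string.h>
#include <utmpx.h>

/* Return 1 for record types that every user may see, else 0. */
static int
is_system_record(short type)
{
	switch (type) {
	case BOOT_TIME:
	case OLD_TIME:
	case NEW_TIME:
	case INIT_PROCESS:
#ifdef SHUTDOWN_TIME
	case SHUTDOWN_TIME:
#endif
		return (1);
	default:
		return (0);
	}
}

/*
 * Return 1 if the entry must be hidden from the invoking user, else 0.
 * Both strings are assumed to be valid and NUL-terminated.
 */
static int
entry_hidden(int restricted, short type, const char *entry_user,
    const char *invoking_user)
{
	if (!restricted)
		return (0);
	if (is_system_record(type))
		return (0);
	/* Explicit comparison: strcmp(3) returns an int, not a bool. */
	return (strcmp(entry_user, invoking_user) != 0);
}
```

The caller still has to make sure the user lookup succeeded before passing
the name in.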

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl




Re: NFS mount inside jail fails

2011-05-18 Thread Pawel Jakub Dawidek
On Tue, May 17, 2011 at 10:17:12PM +0200, Alexander Leidinger wrote:
 On Tue, 17 May 2011 12:56:40 -0700 Sean Bruno sean...@yahoo-inc.com
 wrote:
 
  Silly thing I ran into today.  User wanted to NFS mount a dir inside a
  jail.  After I groaned about the security implication of this, I noted
  that there is a sysctl that looks like it should allow this.  Namely,
  security.jail.mount_allowed.  I noted that setting this follows a path
  that *should* have allowed this silly thing to happen, except that the
  credentials in the nfsclient were not setup correctly.
 
 As you noticed, this is supposed to allow to mount inside a jail, IF
 the FS you want to mount is marked as secure/safe to do so. Nearly no
 FS is marked as such, as nobody wants to guarantee that it is safe
 (root in a jail should not be able to panic a system by trying to
 mount a corrupt/malicious FS-image) and secure (not possible to get
 elevated access/privileges).
 
 For NFS there is theoretically the problem that the outgoing address on
 requests could be the one of the physical host instead of the IP of the
 jail. If this is true in practice, I do not know. This could be
 the reason why NFS is not marked with VFCF_JAIL.

It is not marked with VFCF_JAIL, because I just had no time to audit
that it is safe. It might be safe in theory.

There are some file system types that can't be securely mounted within
a jail no matter what, like UFS, MSDOSFS, EXTFS, XFS, REISERFS, NTFS,
etc., because the user mounting it has access to raw storage and can
corrupt it in a way that will panic the entire system.

There are other file systems that don't require access to raw storage
for the user doing the mount and chances are they are safe to mount from
within a jail, like ZFS (the user can have access to ZFS datasets, but doesn't
need access to the ZFS pool), NFS, SMBFS, NULLFS, UNIONFS, PROCFS, FDESCFS,
etc. I added the VFCF_JAIL flag, so there is a general mechanism to mark file
systems as jail-friendly, but back then I only needed it for ZFS.
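
The flag-based gating described above can be modelled in userland roughly
like this. It is not the kernel's code: struct vfs_type_model,
VFCF_JAIL_MODEL (with an arbitrary value) and jail_mount_check() are
hypothetical stand-ins used only to show the decision order.

```c
#include <errno.h>

#define VFCF_JAIL_MODEL 0x1u	/* stand-in for the real VFCF_JAIL bit */

/* Simplified stand-in for a file system type description. */
struct vfs_type_model {
	const char	*name;
	unsigned int	 flags;
};

/*
 * Return 0 if the mount may proceed, EPERM otherwise.
 * 'jailed' says whether the caller runs inside a jail and
 * 'mount_allowed' models the security.jail.mount_allowed sysctl.
 */
static int
jail_mount_check(const struct vfs_type_model *vfc, int jailed,
    int mount_allowed)
{
	if (!jailed)
		return (0);
	if (!mount_allowed)
		return (EPERM);
	/* Only file systems explicitly marked jail-friendly pass. */
	if ((vfc->flags & VFCF_JAIL_MODEL) == 0)
		return (EPERM);
	return (0);
}
```

With this model, setting the sysctl alone is not enough: an unmarked file
system such as NFS is still refused from inside a jail.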

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: Add SUM sysctl

2011-04-18 Thread Pawel Jakub Dawidek
On Mon, Apr 18, 2011 at 08:24:57AM -0400, John Baldwin wrote:
 On Saturday, April 16, 2011 10:24:44 am rank1see...@gmail.com wrote:
  After compilation of kernel and world in MUM, kernel is installed in MUM, 
  but to install world, we reboot into SUM, then install world. (HANDBOOK)
  Now, in case of GELI usage AND if upgrading is taking place, i.e. 8.2 -> 
  8.3, once you reboot into SUM to install world, you are doomed, BECAUSE 
  ...
  Kernel will bitch (GELI part), about world-kernel mismatch and you won't 
  be able to install world as you cant decrypt geom providers!!
  The only way to save yourself in that case is to restore /boot/kernel.old, 
  or one is doomed.
 
 This seems broken to me.  An 8.3 kernel+modules should be able to handle GELI 
 devices with an 8.2 world.  If they can't, it means someone broke the ABI.  
 Even a 9.0 kernel should work fine with an 8.x-stable world.

It is generally not expected to have just a bit of the system encrypted.
You either have the whole root encrypted, and there is no userland involved
in attaching it, or you have some secure partition encrypted.
I don't fully understand how you can boot your system and then need to
attach a GELI provider to be able to install world. If you booted fine
then your system is available and not encrypted.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: looking for error codes

2011-04-03 Thread Pawel Jakub Dawidek
On Fri, Apr 01, 2011 at 06:18:54PM +0300, Andriy Gapon wrote:
 on 01/04/2011 18:04 Andrew Duane said the following:
  AFAIK, FreeBSD does not really detect read-only media. This was something I 
  had to add as a small project here at work, and was considering cleaning up 
  to try to get into CURRENT. If there's a real need for it, I could speed 
  that up.
  
 
 Yes, that's exactly the problem that I am looking at.
 So if you have anything to share it will be greatly appreciated at least by 
 me.
 But I think many more people could benefit from it (e.g. those having 
 SD/SDHC/etc
 cards).

Once you detect read-only media, I suggest implementing the support by
adding a new DISKFLAG_READONLY flag to the disk(9) API and simply denying write
access in g_disk_access() when DISKFLAG_READONLY is set.
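
The suggested check can be modelled in userland as follows. Nothing here is
existing kernel code: DISKFLAG_READONLY_MODEL, struct disk_model and
disk_access_model() are hypothetical, and the choice of EROFS as the error is
an assumption. The r/w/e arguments stand for the deltas to the read, write
and exclusive access counts that a GEOM access method receives.

```c
#include <errno.h>

#define DISKFLAG_READONLY_MODEL 0x1u	/* stand-in for the proposed flag */

/* Simplified stand-in for struct disk. */
struct disk_model {
	unsigned int d_flags;
};

/*
 * Return 0 if the requested access change is acceptable, or EROFS when
 * write access is being requested on read-only media.
 */
static int
disk_access_model(const struct disk_model *dp, int r, int w, int e)
{
	(void)r;
	(void)e;
	/* Only deny newly requested write access; releases (w < 0) pass. */
	if (w > 0 && (dp->d_flags & DISKFLAG_READONLY_MODEL) != 0)
		return (EROFS);
	return (0);
}
```

Denying the access request up front means consumers learn about the
read-only state when they open the provider, rather than when a write fails
later.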

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com




Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 03:57:44AM +0200, Ivan Voras wrote:
 Hi,
 
 In order to help users having 4k sector drives which the system
 recognizes as 512 byte sector drives, I'm proposing a patch to glabel
 which enables it to use a forced sector size for its native-labeled
 providers. It is naturally only usable with glabel-native labels
 (those created by glabel label) and not partition and file system
 labels because we cannot add arbitrary new fields to metadata of those
 types.
 
 The patch is here:
 
 http://people.freebsd.org/~ivoras/diffs/glabel_ssize.patch
[...]
 This mechanism is a band-aid until there's a better way of dealing
 with 4k drives.

So why do you want to obfuscate glabel with it? For people to start
depending on it? Once we start supporting 4kB sectors what do we do with
such a change? Remove it and decrease the version number? What will people
do with providers already labeled this way?

If it's temporary, just allow listing the providers whose sector size you
want to increase in /boot/loader.conf. Once we start supporting it properly
people might simply remove it from loader.conf and it should just work.

Glabel is not for that and I don't agree to such obfuscation.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 02:02:17PM +0200, Ivan Voras wrote:
 On 8.8.2010 12:30, Pawel Jakub Dawidek wrote:
  So why do you want to obfuscate glabel with it? For people to start
  depending on it? Once we start supporting 4kB sectors what do we do with
  such a change? Remove it and decrease the version number? What will people
  do with providers already labeled this way?
  
  If it's temporary, just allow listing the providers whose sector size you
  want to increase in /boot/loader.conf. Once we start supporting it properly
  people might simply remove it from loader.conf and it should just work.
  
  Glabel is not for that and I don't agree to such obfuscation.
 
 Of course, there are good and bad sides to it. My take on it is that the
 only bad side is that it really isn't glabel's primary function to
 (optionally) fixup geometry, while the good sides are:

It isn't its secondary function either.

 * glabel is in GENERIC and judging by the mailing lists' traffic it is
 one of the better used parts of the system so people are familiar with
 it. It is also already used as a perfectly valid fixup for device
 renaming, making both UFS and ZFS more stable for usage.

That's an excellent argument. But you know what? The em(4) driver is also in
GENERIC, so why not add it there?

 * You can't really make people depend on glabel both because it is in
 GENERIC and because of it storing metadata in the last sector, making
 the rest of the drive completely usable without it in the event native
 4k sector support is grown.

I never said that. I do want people to depend on glabel, because it is
free of such ugly hacks, so I know it won't bite them in the future.

I don't want people to start depending on the fact that glabel supports
changing sector sizes.

Once we start supporting 4kB sectors properly people's configurations will
stop working, because glabel won't be able to read its metadata anymore.
Your hack will break all configurations that started to depend on your
hack. In what I proposed, the GEOM provider will be presented to glabel (or
any other GEOM class) as a 4kB provider and everything will just work,
also after adding proper support for 4kB sectors.

 I'd like to hear comments from the wider audience. In respect with your
 comment, I will compromise: as 4k sector drives have become available
 over the counter more than 6 months ago and so far I think this is the
 first effort to give some support for them, I will commit this patch
 before 9.0 code freeze only if no other support gets developed.

I'll repeat. You won't commit this patch, because it is a totally wrong
solution and can only do a lot of damage in the future.
If you look forward, even temporary solutions can be done right.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 02:57:20PM +0200, Marius Nünnerich wrote:
 On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote:
  I'd like to hear comments from the wider audience. In respect with your
  comment, I will compromise: as 4k sector drives have become available
  over the counter more than 6 months ago and so far I think this is the
  first effort to give some support for them, I will commit this patch
  before 9.0 code freeze only if no other support gets developed.
 
 I do not like this at all. Even if it's just for the KISS and POLA
 principles. A geom should do one thing and do it right imo.
 Why not write a new geom class that does what you want?

A new GEOM class only for sectorsize conversion that can operate on
metadata will be useful, and not only to solve this particular problem.
Keep in mind, though, that if at some point disks are detected and
presented as 4kB providers to GEOM, this class won't be able to find
its metadata anymore (as it was stored in the last 512 bytes, not in the
last 4 kilobytes).
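
The arithmetic behind that warning is small enough to write down. This is a
sketch with hypothetical helper names (last_sector_offset(),
offset_in_sector()); it assumes the class keeps its metadata in the
provider's last sector and that the media size is at least one sector.

```c
#include <stdint.h>

/*
 * Byte offset of the provider's last sector, i.e. where a class that
 * keeps its metadata in the last sector reads and writes it.
 * Assumes mediasize >= sectorsize.
 */
static uint64_t
last_sector_offset(uint64_t mediasize, uint32_t sectorsize)
{
	return (mediasize - sectorsize);
}

/* Position of a byte offset inside the sector that contains it. */
static uint32_t
offset_in_sector(uint64_t offset, uint32_t sectorsize)
{
	return ((uint32_t)(offset % sectorsize));
}
```

For a 1 GiB provider, metadata written with 512-byte sectors lands at offset
1073741312, while a 4096-byte view of the same disk looks for it at
1073737728. The old metadata then sits 3584 bytes into the last 4kB sector
instead of at its start, so a reader expecting it at the beginning of the
last sector does not find it.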

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: GEOM_ULZMA

2010-03-02 Thread Pawel Jakub Dawidek
On Tue, Mar 02, 2010 at 08:32:20PM +0100, Dimitry Andric wrote:
 On 2010-03-02 09:47, Alexandr Rybalko wrote:
 Definitely separately, not sure where. There is an ongoing discussion
 somewhere on importing this algorithm to the base for tar(1) to use, it
 would be best to have only one copy of code in the tree.
 I have already said, that it would be good for embedded platforms have 
 only one copy of the code for the kernel and userland.
 It is not thought of how done it.
 
 I think Pawel means the *source* code in this case, not the executable
 code.  E.g. lzma source should most likely go under /usr/src/contrib,
 and be built separately for kernel and userland.

If it is going to be used by the kernel it has to be under sys/.

And yes, I was talking about one copy of the source, not the executable.
I think it would be a bad idea to do compression in the kernel for
userland applications for many reasons - the most important one is
security. Look at projects like Capsicum, where Robert enclosed, for example,
gzip in a tight sandbox even though gzip is not even set-uid; giving it a
chance to gain kernel access when a bug is found is very, very bad.
Another reason is performance. You can see how much faster, e.g., openssl
crypto is when doing it in userland than when forcing it to use software
crypto from the opencrypto kernel framework.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: GEOM_ULZMA

2010-03-01 Thread Pawel Jakub Dawidek
On Fri, Feb 19, 2010 at 04:36:44PM +0200, Alexandr Rybalko wrote:
 Hi,
 I wrote a module GEOM_ULZMA (such as GEOM_UZIP, but compression with lzma), 
 [...]

Wouldn't it be better to modify geom_uzip to be a universal decompression
class with various algorithms implemented as plugins?
This is basically what I did for the LABEL class - before that we had the
VOL_FFS class only for UFS labels.

 [...] in connection with this is an issue best left lzma
 code in the file geom_ulzma.c or store lzma library separately. If 
 separately, then where better?

Definitely separately, not sure where. There is an ongoing discussion
somewhere on importing this algorithm to the base for tar(1) to use, it
would be best to have only one copy of code in the tree.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Deadlock between GEOM and devfs device destroy and process exit.

2010-02-01 Thread Pawel Jakub Dawidek
On Sat, Jan 30, 2010 at 12:44:51PM +0100, Pawel Jakub Dawidek wrote:
 Maybe I'll add how I understand what's going on:
 
 GEOM calls destroy_dev() while holding the topology lock.
 
 destroy_dev() wants to destroy the device, but can't because there are
 threads that still have it open.
 
 The threads can't close it, because to close it they need the topology
 lock.
 
 The deadlock is quite obvious, IMHO.

Guys, changing destroy_dev() to destroy_dev_sched() in geom_dev.c fixes
the problem for me (at least it makes the race window so small that I can't
reproduce it). Is there anyone who isn't happy with such a change?
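
The idea behind the change can be illustrated with a toy userland model:
instead of performing the blocking destroy while the lock is held, the
request is queued and executed later from a context that does not hold the
lock. This is only a sketch of the general deferral pattern; struct
deferred_queue, defer_schedule() and defer_run() are hypothetical names and
do not reflect how destroy_dev_sched() is implemented in the kernel.

```c
#include <stddef.h>

#define MAX_DEFERRED 8

typedef void (*deferred_fn)(void *);

/* A tiny fixed-size queue of work to run once the lock is dropped. */
struct deferred_queue {
	deferred_fn	fn[MAX_DEFERRED];
	void		*arg[MAX_DEFERRED];
	size_t		count;
};

/*
 * Queue a request instead of executing it. Never blocks, so it is safe
 * to call with a lock held. Returns 0 on success, -1 if the queue is full.
 */
static int
defer_schedule(struct deferred_queue *q, deferred_fn fn, void *arg)
{
	if (q->count >= MAX_DEFERRED)
		return (-1);
	q->fn[q->count] = fn;
	q->arg[q->count] = arg;
	q->count++;
	return (0);
}

/*
 * Execute and clear all queued requests. Meant to be called from a
 * context that does not hold the lock. Returns the number of requests run.
 */
static size_t
defer_run(struct deferred_queue *q)
{
	size_t i, n;

	n = q->count;
	for (i = 0; i < n; i++)
		q->fn[i](q->arg[i]);
	q->count = 0;
	return (n);
}

/* Example request: count how many "devices" have been destroyed. */
static void
count_destroy(void *arg)
{
	(*(int *)arg)++;
}
```

Because the caller only queues the request, threads waiting for the lock in
their close path can acquire it, finish closing, and let the deferred destroy
complete afterwards.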

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Deadlock between GEOM and devfs device destroy and process exit.

2010-01-30 Thread Pawel Jakub Dawidek
On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
 Hi.
 
 Experimenting with SATA hot-plug I've found quite repeatable deadlock
 case. Problem observed when several SATA devices, opened via devfs,
 disappear at exactly same time. In my case, at time of unplugging SATA
 Port Multiplier with several disks beyond it. All I have to do is to run
 several `dd if=/dev/adaX of=/dev/null bs=1m ` commands and unplug
 multiplier. That causes predictable I/O errors and devices destruction.
 But with high probability several dd processes getting stuck in kernel.
[...]

I observed the same thing yesterday while stress-testing HAST:

 3659  2504  3659 0  DE+ GEOM top 0x8079a348 dd
 3658  2102  2102 0  DE+ GEOM top 0x8079a348 hastd
2 0 0 0  DL  devdrn   0x85b1bc68 [g_event]

Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path,
which is already held by the g_event thread.

Interesting backtraces:

db> bt 2
[...]
_sleep(85b1bc68,8079aab8,4c,80711ab3,64,...) at _sleep+0x339
destroy_devl(5,0,80711c53,85b1bcb0,804945cd,...) at destroy_devl+0x20f
destroy_dev(86a10a00,8070ea93,86a09800,860888e0,0,...) at destroy_dev+0x2f
g_dev_orphan(86a09800,8070f424,871038d8,90,6,...) at g_dev_orphan+0x6d
g_run_events(8079a378,0,4c,8070c221,64,...) at g_run_events+0x1c0
g_event_procbody(0,85b1bd38,80713228,343,85d0b7f8,...) at g_event_procbody+0x8a
[...]

db> bt 3658
[...]
sleepq_wait(8079a348,0,8070f822,3,0,...) at sleepq_wait+0x63
_sx_xlock_hard(8079a348,86974240,0,8070ea66,c8,...) at _sx_xlock_hard+0x496
_sx_xlock(8079a348,0,8070ea66,c8,2000,...) at _sx_xlock+0xc0
g_dev_close(85f8ee00,4003,2000,86974240,86974240,...) at g_dev_close+0xbd
devfs_close(dc49eaac,80745707,8,8,868be984,...) at devfs_close+0x2b2
VOP_CLOSE_APV(80753ac0,dc49eaac,80726500,128,2,...) at VOP_CLOSE_APV+0xc5
vn_close(868be984,4003,85fd5500,86974240,0,...) at vn_close+0x190
vn_closefile(86a20968,86974240,86a20968,0,dc49eb5c,...) at vn_closefile+0xe4
devfs_close_f(86a20968,86974240,0,0,86a20968,...) at devfs_close_f+0x2b
_fdrop(86a20968,86974240,14,80719d1a,0,dc49eb98,1,86975000,8635c22c,8635c22c,721,8071264b,dc49ebb8,804f87d0,8635c22c,8,8071264b,721)
 at _fdrop+0x43
closef(86a20968,86974240,721,71e,869742e4,...) at closef+0x290
fdfree(86974240,0,80712fdd,107,864c4330,...) at fdfree+0x3ea
exit1(86974240,0,dc49ed2c,806d830a,86974240,...) at exit1+0x513
sys_exit(86974240,dc49ecf8,86974240,dc49ed2c,202,...) at sys_exit+0x1d
[...]
db> bt 3659
[...]
sleepq_wait(8079a348,0,8070f822,3,0,...) at sleepq_wait+0x63
_sx_xlock_hard(8079a348,863e06c0,0,8070ea66,c8,...) at _sx_xlock_hard+0x496
_sx_xlock(8079a348,0,8070ea66,c8,2000,...) at _sx_xlock+0xc0
g_dev_close(86a10a00,3,2000,863e06c0,863e06c0,...) at g_dev_close+0xbd
devfs_close(dc4f6aac,80745707,8,8,86aa6c3c,...) at devfs_close+0x2b2
VOP_CLOSE_APV(80753ac0,dc4f6aac,80726500,128,2,...) at VOP_CLOSE_APV+0xc5
vn_close(86aa6c3c,3,870d4080,863e06c0,80cbac08,...) at vn_close+0x190
vn_closefile(871028f8,863e06c0,871028f8,0,dc4f6b5c,...) at vn_closefile+0xe4
devfs_close_f(871028f8,863e06c0,0,0,871028f8,...) at devfs_close_f+0x2b
_fdrop(871028f8,863e06c0,8071809c,40e,0,805354ab,8071809c,8071df19,8635d42c,8635d42c,721,8071264b,dc4f6bb8,804f87d0,8635d42c,8,8071264b,721)
 at _fdrop+0x43
closef(871028f8,863e06c0,721,71e,863e0764,...) at closef+0x290
fdfree(863e06c0,0,80712fdd,107,86153088,...) at fdfree+0x3ea
exit1(863e06c0,100,dc4f6d2c,806d830a,863e06c0,...) at exit1+0x513
sys_exit(863e06c0,dc4f6cf8,863e06c0,dc4f6d2c,202,...) at sys_exit+0x1d
[...]
db> show lock 0x8079a348
 class: sx
 name: GEOM topology
 state: XLOCK: 0x85d0d000 (tid 18, pid 2, g_event)
 waiters: exclusive

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Deadlock between GEOM and devfs device destroy and process exit.

2010-01-30 Thread Pawel Jakub Dawidek
On Sat, Jan 30, 2010 at 12:27:49PM +0100, Pawel Jakub Dawidek wrote:
 On Sat, Jan 30, 2010 at 12:58:26AM +0200, Alexander Motin wrote:
  Hi.
  
  Experimenting with SATA hot-plug I've found quite repeatable deadlock
  case. Problem observed when several SATA devices, opened via devfs,
  disappear at exactly same time. In my case, at time of unplugging SATA
  Port Multiplier with several disks beyond it. All I have to do is to run
  several `dd if=/dev/adaX of=/dev/null bs=1m ` commands and unplug
  multiplier. That causes predictable I/O errors and devices destruction.
  But with high probability several dd processes getting stuck in kernel.
 [...]
 
 I observed the same thing yesterday while stress-testing HAST:
 
  3659  2504  3659 0  DE+ GEOM top 0x8079a348 dd
  3658  2102  2102 0  DE+ GEOM top 0x8079a348 hastd
 2 0 0 0  DL  devdrn   0x85b1bc68 [g_event]
 
 Both dd(1) and hastd(8) wait for the GEOM topology lock in the exit path,
 which is already held by the g_event thread.

Maybe I'll add how I understand what's going on:

GEOM calls destroy_dev() while holding the topology lock.

destroy_dev() wants to destroy the device, but can't because there are
threads that still have it open.

The threads can't close it, because to close it they need the topology
lock.

The deadlock is quite obvious, IMHO.

I believe the problem could be solved by dropping the topology lock in
g_dev_orphan() when calling destroy_dev(dev), but it is hard to say if
it is safe to drop the topology lock there. Maybe Poul-Henning could
take a look.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ZFS group ownership

2009-09-22 Thread Pawel Jakub Dawidek
On Sat, Sep 12, 2009 at 01:49:36PM +0200, Giulio Ferro wrote:
[...]
 Now I try to do the same on a zfs partition on the same machine
 This is what I see with ls
 ---
 ls -la
 total 4
 drwxrwx---  3 www     www     4 Sep 12 13:43 .
 drwxr-xr-x  4 root    wheel   4 Sep 12 13:43 ..
 drwxrwx---  2 gferro  gferro  2 Sep 12 13:43 asda
 -rw-rw----  1 gferro  gferro  0 Sep 12 13:43 qweq
 ---
 
 As you can see, both file and directory belongs now to gferro and
 not www. This means that other users won't even be able to read
 my files / dir, let alone modify them.
 
 What I ask now is: is this a bug or a feature?

This is a bug. I changed the default ZFS behaviour (which is SYSV) to match
BSD behaviour (i.e. inherit group ownership from the parent directory),
but it became broken during the v6 -> v13 switch. Could you file a PR for
this? I should be able to fix it before 8.0-RELEASE.
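
The two behaviours mentioned above differ only in which group a newly created
file receives. A small model, using hypothetical names (enum gid_policy,
new_file_gid()) and not taken from the ZFS code:

```c
#include <sys/types.h>
#include <sys/stat.h>

enum gid_policy {
	GID_POLICY_BSD,		/* always inherit the parent's group */
	GID_POLICY_SYSV		/* creator's group unless parent is setgid */
};

/* Return the group a new file or directory gets under the given policy. */
static gid_t
new_file_gid(enum gid_policy policy, gid_t parent_gid, mode_t parent_mode,
    gid_t creator_egid)
{
	if (policy == GID_POLICY_BSD)
		return (parent_gid);
	/* SYSV: inherit only when the parent directory has the setgid bit. */
	if ((parent_mode & S_ISGID) != 0)
		return (parent_gid);
	return (creator_egid);
}
```

In the report above, the parent directory is owned by group www; under BSD
semantics the new file and directory would belong to www, while the observed
gferro group matches what the SYSV rule gives for a parent without the setgid
bit.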

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: sosend() and mbuf

2009-08-04 Thread Pawel Jakub Dawidek
On Mon, Aug 03, 2009 at 09:25:27PM +, Maslan wrote:
 No my code doesn't work, I thought it may be because that soaccept()
 -which is not found in man 9- is non-blocking, so i've to put my code
 in a thread.
 Now i got another problem, when I open a text file from this thread,
 the kernel crashes, I'm sure that its the thread.
 
 kthread_create((void *)thread_main, NULL, NULL, RFNOWAIT, 0, "thread");
 
 void thread_main(){
   struct thread *td = curthread;
   int ret;
   int fd;
   ret = f_open("/path/to/file.txt", &fd);
   printf("%d\n", ret);
   tsleep(td, PDROP, "test tsleep", 10*hz);
   f_close(fd);
   kthread_exit(0);
 }
 
 int f_open(char *filename, int *fd){
   struct thread *td = curthread;
   int ret = kern_open(td, filename, UIO_SYSSPACE, O_RDONLY, FREAD);
   if(!ret){
     *fd = td->td_retval[0];
     return 1;
   }
   return 0;
 }
 
 I've to finish up this problem to go back for the first one.
 Can you figure out what's wrong with this code, it works when I call
 thread_main() rather than kthread_create((void *)thread_main, .

When you called kern_open() without creating a kernel thread, it worked,
because kern_open() used the file descriptor table of your current
(userland) process. In FreeBSD 7.x kthread_create() creates a process
without a file descriptor table, so you can't use kern_open() there, and
you shouldn't be doing this anyway.

Take a look at sys/cddl/compat/opensolaris/kern/opensolaris_kobj.c,
where you can find functions to do what you want.

I guess you already considered doing all this in userland?:)

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Linker deadlock.

2008-08-03 Thread Pawel Jakub Dawidek
Hi.

The linker can easily deadlock when we try to load the same kernel module
from two processes at the same time. This is because we drop kld_sx in
linker_load_file() and reacquire it, which leads to a LOR, because we
already hold the vnode lock at this point. Interesting backtraces below.

First process:

db> tr 3066
Tracing pid 3066 tid 100090 td 0x8514b240
sched_switch(8514b240,0,104,177,bb6bbb2e,...) at sched_switch+0x40e
mi_switch(104,0,80681605,1ca,0,...) at mi_switch+0x200
sleepq_switch(8514b240,0,80681605,237,80a281ec,...) at sleepq_switch+0x14d
sleepq_wait(80a281ec,0,8067a18b,3,0,...) at sleepq_wait+0x63
_sx_xlock_hard(80a281ec,8514b240,0,8067a1cf,1a0,...) at _sx_xlock_hard+0x2c6
_sx_xlock(80a281ec,0,8067a1cf,1a0,0,...) at _sx_xlock+0x99
linker_load_module(853a1264,0,83ba8940,83ba893c,83ba8938,...) at linker_load_module+0xa4a
linker_load_dependencies(84fb8500,bb74,8539f000,2adc,156000,...) at linker_load_dependencies+0x194
link_elf_load_file(806b74e0,8557e4c0,83ba8c24,17c,0,...) at link_elf_load_file+0x4f0
linker_load_module(0,83ba8c4c,8067a1cf,3cd,280cb730,...) at linker_load_module+0x8db
kern_kldload(8514b240,8592d400,83ba8c70,0,b395eb11,...) at kern_kldload+0xc8
[...]
db> show lock 0x80a281ec
 class: sx
 name: kernel linker
 state: XLOCK: 0x8514bd80 (tid 100117, pid 3065, zpool)
 waiters: exclusive

Second process:

db> tr 3065
Tracing pid 3065 tid 100117 td 0x8514bd80
sched_switch(8514bd80,0,104,177,bb7e358b,...) at sched_switch+0x40e
mi_switch(104,0,80681605,1ca,50,...) at mi_switch+0x200
sleepq_switch(8514bd80,0,80681605,237,8523d9c0,...) at sleepq_switch+0x14d
sleepq_wait(8523d9c0,50,806906bb,4,0,...) at sleepq_wait+0x63
__lockmgr_args(8523d9c0,80100,8523da28,0,0,...) at __lockmgr_args+0x9a5
vop_stdlock(83bd2660,8508aa80,2,80100,8523d968,...) at vop_stdlock+0x65
VOP_LOCK1_APV(806c3560,83bd2660,806d2ac0,8523d968,80100,...) at VOP_LOCK1_APV+0xa5
_vn_lock(8523d968,80100,8068815b,802,804c9cb4,...) at _vn_lock+0x5e
vget(8523d968,80100,8514bd80,1b7,8065d00f,...) at vget+0xc9
cache_lookup(85090158,83bd2a00,83bd2a14,0,84f3b400,...) at cache_lookup+0x4c2
nfs_lookup(83bd2838,80688e43,806d2720,8,85090158,...) at nfs_lookup+0x101
VOP_LOOKUP_APV(806c3560,83bd2838,8068783d,1bd,83bd2a00,...) at VOP_LOOKUP_APV+0xe5
lookup(83bd29e8,8068783d,e0,c0,8506e52c,...) at lookup+0x52e
namei(83bd29e8,81159a38,80a352b4,4,8067be1f,...) at namei+0x48b
vn_open_cred(83bd29e8,83bd2a4c,0,84f3b400,0,...) at vn_open_cred+0x2ba
vn_open(83bd29e8,83bd2a4c,0,0,806b2a00,...) at vn_open+0x33
linker_lookup_file(3,0,3,8514bd80,0,...) at linker_lookup_file+0x163
linker_load_module(0,83bd2c4c,8067a1cf,3cd,280cb730,...) at linker_load_module+0x7bd
kern_kldload(8514bd80,85a7e400,83bd2c70,0,b395eb11,...) at kern_kldload+0xc8
[...]
db> show vnode 0x8523d968
vnode 0x8523d968: tag nfs, type VREG
usecount 1, writecount 0, refcount 189 mountedhere 0
flags ()
v_object 0x852489b0 ref 0 pages 372
 lock type nfs: EXCL by thread 0x8514b240 (pid 3066)
 with exclusive waiters pending
#0 0x804c2e5d at __lockmgr_args+0xa6d
#1 0x80546c85 at vop_stdlock+0x65
#2 0x8065dcd5 at VOP_LOCK1_APV+0xa5
#3 0x805627ee at _vn_lock+0x5e
#4 0x80557419 at vget+0xc9
#5 0x805444b2 at cache_lookup+0x4c2
#6 0x805c3b51 at nfs_lookup+0x101
#7 0x8065ee65 at VOP_LOOKUP_APV+0xe5
#8 0x8054a9be at lookup+0x52e
#9 0x8054b5eb at namei+0x48b
#10 0x805621da at vn_open_cred+0x2ba
#11 0x80562463 at vn_open+0x33
#12 0x804f45e8 at link_elf_load_file+0x68
#13 0x804c0f9b at linker_load_module+0x8db
#14 0x804c1568 at kern_kldload+0xc8
#15 0x804c1624 at kldload+0x74
#16 0x80650513 at syscall+0x283
#17 0x80634e40 at Xint0x80_syscall+0x20
[...]
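
The two traces above can be read as a wait-for cycle: pid 3066 holds the
vnode lock and sleeps on the kernel linker sx lock, while pid 3065 holds the
sx lock and sleeps on the vnode lock. As a rough userland illustration only
(this is not kernel code; struct waiter, owner_of() and in_deadlock() are
names made up for this sketch, and each thread is assumed to hold at most one
lock), such a cycle could be detected like this:

```c
#include <stddef.h>

/* Hypothetical model: a thread owns at most one lock and waits for at most one. */
struct waiter {
    int pid;    /* process id, as printed by ddb */
    int holds;  /* id of the lock it owns, or -1 */
    int wants;  /* id of the lock it sleeps on, or -1 */
};

/* Return the index of the thread owning 'lock', or -1 if nobody owns it. */
static int
owner_of(const struct waiter *w, size_t n, int lock)
{
    size_t i;

    if (lock == -1)
        return (-1);
    for (i = 0; i < n; i++)
        if (w[i].holds == lock)
            return ((int)i);
    return (-1);
}

/*
 * Follow the "waits for the owner of" chain starting at thread 'start'.
 * Return 1 if the chain leads back to 'start' (deadlock), 0 otherwise.
 */
static int
in_deadlock(const struct waiter *w, size_t n, size_t start)
{
    size_t steps;
    int cur = (int)start;

    for (steps = 0; steps < n; steps++) {
        cur = owner_of(w, n, w[cur].wants);
        if (cur == -1)
            return (0);     /* chain ends at a free lock */
        if ((size_t)cur == start)
            return (1);     /* came back to where we started */
    }
    return (0);
}
```

With lock 0 standing for kld_sx and lock 1 for the vnode lock, the entries
{3066, 1, 0} and {3065, 0, 1} form a cycle. If pid 3066 no longer holds the
vnode lock while it waits for kld_sx, the chain ends and no cycle is reported.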

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Linker deadlock.

2008-08-03 Thread Pawel Jakub Dawidek
On Sun, Aug 03, 2008 at 02:09:26PM +0300, Kostik Belousov wrote:
 Source line backtraces would be nicer, since the gcc inliner forces me to make
 a guess. It seems that linker_load_module() calls linker_load_file(),
 which drops and reacquires the linker lock.
 
 Then, it seems that dropping the module's vnode lock around the call to
 linker_load_dependencies() should help.

Yes, it doesn't deadlock now, thanks!

 diff --git a/sys/kern/link_elf.c b/sys/kern/link_elf.c
 index 2664ba9..52b3f8f 100644
 --- a/sys/kern/link_elf.c
 +++ b/sys/kern/link_elf.c
 @@ -802,7 +802,9 @@ link_elf_load_file(linker_class_t cls, const char* filename,
   goto out;
  link_elf_reloc_local(lf);
  
 +VOP_UNLOCK(nd.ni_vp, 0);
  error = linker_load_dependencies(lf);
 +vn_lock(nd.ni_vp, LK_EXCLUSIVE | LK_RETRY);
  if (error)
   goto out;
  #if 0/* this will be more trouble than it's worth for now */
 diff --git a/sys/kern/link_elf_obj.c b/sys/kern/link_elf_obj.c
 index d8e9219..657dd0e 100644
 --- a/sys/kern/link_elf_obj.c
 +++ b/sys/kern/link_elf_obj.c
 @@ -798,7 +798,9 @@ link_elf_load_file(linker_class_t cls, const char *filename,
   link_elf_reloc_local(lf);
  
   /* Pull in dependencies */
 + VOP_UNLOCK(nd.ni_vp, 0);
   error = linker_load_dependencies(lf);
 + vn_lock(nd.ni_vp, LK_EXCLUSIVE | LK_RETRY);
   if (error)
   goto out;

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: crypto(9) and maxoplen

2008-07-21 Thread Pawel Jakub Dawidek
On Mon, Jul 21, 2008 at 02:10:00PM +0200, Patrick Lamaizière wrote:
 On Sun, 20 Jul 2008 21:39:55 +0200,
 Pawel Jakub Dawidek [EMAIL PROTECTED] wrote:
 
 Hello,
 
   In the opencrypto framework the function crypto_register() has an
   argument 'maxoplen'.
   
   http://fxr.watson.org/fxr/source/opencrypto/crypto.c#L625
   
   Does somebody know what was the goal of this parameter? It is not
   used by the framework.
   
   The man page of crypto(9) says :
   For each algorithm the driver supports, it must then call
   crypto_register(). The first two arguments are the driver and
   algorithm identifiers.  The next two arguments specify the largest
   possible operator length (in bits, important for public key
   operations) and flags for this algorithm.
   
   I'm asking if it can help for this problem: the glxsb driver can
   perform AES-CBC algorithm only with 128 bits key and may be
   'maxoplen' was intended for this case. 
   
   Without something to specify the key's length, the driver is
   selected by the framework even with keys != 128 bits. So it fails
   when the session is opened. This prevents setkey/ipsec to work with
   key length != 128 bits if the driver is loaded.
  
  If I read code properly, there is currently no way for a driver to say
  to the opencrypto framework that only AES-CBC with 128bit key is
  supported. A driver can only state that it supports AES-CBC, that's
  all. As a workaround the driver should implement AES-CBC-192 and
  AES-CBC-256 in software.
 
 Yes, but my question is about the maxoplen parameter. Was it intended
 for this case? Why we keep this parameter?

Can't help here, no idea. Either way, it isn't something I'd like to see
implemented. 'maxoplen' is just a little better than what we have now.
And what if a driver supports 192 or 256 bits only?

 IMHO, it is far easier to hack the OCF to use this parameter than
 to implement a workaround. It would be a better solution; for
 example, we may want to use the driver for AES-128 and another
 piece of hardware that provides AES 192/256.
 
 Another (the best?) solution would be for the crypto framework to select
 another driver if the driver's newsession() fails.

There are many improvements that could be made to the opencrypto framework,
believe me. One of the things that annoys me a lot is that if you want
to use IPsec with a driver that supports only encryption, you have to
implement hash functions in software for the given driver.

Feel free to work on this, but be sure to avoid solutions like this
maxoplen thing, which basically isn't really a step forward. Choosing
another driver on newsession failure sounds reasonable, although we may
lose information like 'the caller wanted hardware crypto only'.
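
To make the "choose another driver when newsession fails" idea a bit more
concrete, here is a small userland sketch. It is not the opencrypto API:
struct drv, select_driver(), the two example callbacks and the hw_only flag
are all hypothetical names invented for illustration. The only point is that
the fallback loop has to carry the caller's "hardware only" wish along,
otherwise that information is lost.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical driver description; not the real opencrypto structures. */
struct drv {
    const char *name;
    int is_hardware;                /* 1 for hardware, 0 for software */
    int (*newsession)(int keybits); /* returns 0 or an errno value */
};

/* Example callback: a glxsb-like driver that only accepts 128-bit keys. */
static int
aes128_only_newsession(int keybits)
{
    return (keybits == 128 ? 0 : EINVAL);
}

/* Example callback: a software driver that accepts all three AES key sizes. */
static int
any_key_newsession(int keybits)
{
    return ((keybits == 128 || keybits == 192 || keybits == 256) ?
        0 : EINVAL);
}

/*
 * Try the drivers in order and return the first one whose newsession()
 * succeeds.  If hw_only is set, software drivers are skipped, so a
 * "hardware only" request is not silently downgraded on fallback.
 */
static const struct drv *
select_driver(const struct drv *drivers, size_t n, int keybits, int hw_only)
{
    size_t i;

    for (i = 0; i < n; i++) {
        if (hw_only && !drivers[i].is_hardware)
            continue;
        if (drivers[i].newsession(keybits) == 0)
            return (&drivers[i]);
    }
    return (NULL);
}
```

With a 128-bit-only hardware driver listed before a software driver, a
256-bit request falls through to software, unless hw_only is set, in which
case select_driver() returns NULL and the caller sees the failure.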

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: crypto(9) and maxoplen

2008-07-20 Thread Pawel Jakub Dawidek
On Sat, Jul 19, 2008 at 12:58:13AM +0200, Patrick Lamaizière wrote:
 Hello,
 
 In the opencrypto framework the function crypto_register() has an
 argument 'maxoplen'.
 
 http://fxr.watson.org/fxr/source/opencrypto/crypto.c#L625
 
 Does somebody know what was the goal of this parameter? It is not used
 by the framework.
 
 The man page of crypto(9) says :
 For each algorithm the driver supports, it must then call
 crypto_register(). The first two arguments are the driver and algorithm
 identifiers.  The next two arguments specify the largest possible
 operator length (in bits, important for public key operations) and
 flags for this algorithm.
 
 I'm asking if it can help for this problem: the glxsb driver can
 perform AES-CBC algorithm only with 128 bits key and may be 'maxoplen'
 was intended for this case. 
 
 Without something to specify the key's length, the driver is selected
 by the framework even with keys != 128 bits. So it fails when the
 session is opened. This prevents setkey/ipsec to work with key
 length != 128 bits if the driver is loaded.

If I read code properly, there is currently no way for a driver to say
to the opencrypto framework that only AES-CBC with 128bit key is
supported. A driver can only state that it supports AES-CBC, that's all.
As a workaround the driver should implement AES-CBC-192 and AES-CBC-256
in software.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Is there any way to increase the KVM?

2008-06-07 Thread Pawel Jakub Dawidek
On Thu, Jun 05, 2008 at 04:00:13PM +0200, Ivan Voras wrote:
 Pawel Jakub Dawidek wrote:
 
  If we're comparing who has bigger... :)
  
  beast:root:~# zpool list
  NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
  tank   732G   604G   128G   82%  ONLINE  -
  
  but:
  
  beast:root:~# zfs list | wc -l
  1932
  
  No panics.
  
  PS. I'm quite sure the ZFS version I've in perforce will fix most if not
  all 'kmem_map too small' panics. It's not yet committed, but I do want
  to MFC it into RELENG_7.
 
 At the risk of sounding repetitive, can you try a simple test on your
 ZFS pools, to see if you can panic the kernel? Do this:
 
 * install blogbench and bonnie++ from ports/benchmarks
 * run:
   blogbench -c 100 -d . -i 30 -r 50 -W 10 -w 10
   bonnie++ -d . -s 16G -n 80
   in parallel, until completion or crash. It shouldn't take too long to
 complete the above benchmarks, so you probably won't invest too much
 time in it even if it doesn't crash.

Both completed successfully (i386, 1GB of RAM, dual core CPU).

Can you now go and revert all the FUD you spread? You probably need to
invest much more time than that.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Is there any way to increase the KVM?

2008-06-06 Thread Pawel Jakub Dawidek
On Thu, Jun 05, 2008 at 02:10:02PM +0100, Hugo Silva wrote:
 Pawel Jakub Dawidek wrote:
 PS. I'm quite sure the ZFS version I've in perforce will fix most if not
 all 'kmem_map too small' panics. It's not yet committed, but I do want
 to MFC it into RELENG_7.
 
   
 
 Any guesstimate as to when the MFC will happen ?

Hard to tell, really. The number of changes is huge, so it's hard to
predict how much I'd need to fix after commit to HEAD. Two months sounds
possible. I'll provide patches for RELENG_7 probably earlier.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: AMD Geode LX crypto accelerator (glxsb)

2008-06-06 Thread Pawel Jakub Dawidek
On Fri, Jun 06, 2008 at 11:41:35PM +0200, Patrick Lamaizière wrote:
 Dears,
 
 I'm trying to port the glxsb driver from OpenBSD to FreeBSD 7-STABLE
 (via the NetBSD port).

Cool.

  The glxsb driver supports the security block of the Geode LX
 series processors.  The Geode LX is a member of the AMD Geode family
 of integrated x86 system chips.
  
 Driven by periodic checks for available data from the generator,
 glxsb supplies entropy to the random(4) driver for common usage.
 
 glxsb also supports acceleration of AES-128-CBC operations for
 crypto(4).
 
 I think that most of the work is done, except the random generator.
 Source in progress for 7-STABLE:
 http://user.lamaiziere.net/patrick/glxsb.c
 http://user.lamaiziere.net/patrick/glxsb.tar.gz (c+Makefile)
 
 Credits to OpenBSD and NetBSD, Thanks!
 
 Well, it seems to work but i've got few problems to test the module :
 
 - How check the encryption/decryption ?
 
 Openssl seems ok, i've got quite the same results as NetBSD on a Soekris
 net5501 box. But i must use -engine cryptodev, why ?

This is ok, as you may not want to use it, right?

 $ openssl speed -evp aes-128-cbc -engine cryptodev -elapsed
 engine cryptodev set.
 ...CUT...
 type             16 bytes     64 bytes    256 bytes   1024 bytes   8192 bytes
 aes-128-cbc      1151.08k     4134.25k    11936.49k    22504.83k    25576.36k
 
 When i test ssh -c aes128-cbc hostname, ssh does not use the crypto
 device. I receive a crypto_newsession() followed by a
 crypto_freesession(), i mean i don't receive any crypto_process().

Have you tried adding some debugging output to opencrypto? I believe openssh
should use it automatically, at least this was the case some time ago, AFAIR.

 So how can I be sure that the data is encrypted correctly?

Try comparing the result of openssl encryption with and without '-engine
cryptodev'. Remember to use -nosalt (and maybe -raw) to prevent openssl
from putting salt in front of the ciphertext.

 Also, I've got some questions to finish the driver:
 
 - between arc4rand() and read_random(), which function shall I use?

arc4rand() is preferred.

 - Shall I lock the sessions ? The padlock driver uses a mutex to lock
 the sessions
 http://fxr.watson.org/fxr/source/crypto/via/padlock.c?v=FREEBSD7#L211 
 
 Is it useful? Drivers ubsec, safe and hifn don't lock the sessions at
 all.

You should and they should as well.

 - during crypto_process() the driver uses s = splnet();. I'm not sure
 about this ?

Drop this one.

 - The driver does a busy wait to check the completion of the
 encryption. I think it would be better to use the interrupt. I will
 look later.

I remember looking at that code some time ago and that bit is really
lame, so lame that I think they would have done it in a different way if
that was possible. Maybe it's worth contacting OpenBSD/NetBSD and asking?
There might be a good reason for that.

 - Any comment is welcome, this is my first work on a driver.

Looks good:) I can do a final review and commit once you are done, provided
I am able to start my Soekris and test it.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Is there any way to increase the KVM?

2008-06-05 Thread Pawel Jakub Dawidek
On Thu, Jun 05, 2008 at 01:53:37AM +0800, Tz-Huan Huang wrote:
 On Thu, Jun 5, 2008 at 12:31 AM, Dag-Erling Smørgrav [EMAIL PROTECTED] wrote:
  Tz-Huan Huang [EMAIL PROTECTED] writes:
  The vfs.zfs.arc_max was set to 512M originally, the machine survived for
  4 days and panicked this morning. Now the vfs.zfs.arc_max is set to 64M
  by Oliver's suggestion, let's see how long it will survive. :-)
 
  [EMAIL PROTECTED] ~% uname -a
  FreeBSD ds4.des.no 8.0-CURRENT FreeBSD 8.0-CURRENT #27: Sat Feb 23 01:24:32 
  CET 2008 [EMAIL PROTECTED]:/usr/obj/usr/src/sys/ds4  amd64
  [EMAIL PROTECTED] ~% sysctl -h vm.kmem_size_min vm.kmem_size_max 
  vm.kmem_size vfs.zfs.arc_min vfs.zfs.arc_max
  vm.kmem_size_min: 1,073,741,824
  vm.kmem_size_max: 1,073,741,824
  vm.kmem_size: 1,073,741,824
  vfs.zfs.arc_min: 67,108,864
  vfs.zfs.arc_max: 536,870,912
  [EMAIL PROTECTED] ~% zpool list
  NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
  raid  1.45T   435G  1.03T   29%  ONLINE  -
  [EMAIL PROTECTED] ~% zfs list | wc -l
  210
 
  Haven't had a single panic in over six months.
 
 Thanks for your information, the major difference is that we
 run on 7-stable and the size of our zfs pool is much bigger.

I don't think the panics are related to pool size. They are more likely
related to the load and the characteristics of your workload.

 [EMAIL PROTECTED] uname -a
 FreeBSD cml2.csie.ntu.edu.tw 7.0-STABLE FreeBSD 7.0-STABLE #40: Sat
 May 31 10:29:16 CST 2008
 [EMAIL PROTECTED]:/usr/local/obj/usr/local/src/sys/CML2  amd64
 [EMAIL PROTECTED] sysctl -h vm.kmem_size_min vm.kmem_size_max vm.kmem_size
 vfs.zfs.arc_min vfs.zfs.arc_max
 vm.kmem_size_min: 0
 vm.kmem_size_max: 1,610,612,736
 vm.kmem_size: 1,610,612,736
 vfs.zfs.arc_min: 16,777,216
 vfs.zfs.arc_max: 67,108,864
 [EMAIL PROTECTED] zpool list
 NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
 sun   11.3T  9.03T  2.30T   79%  ONLINE  -
 [EMAIL PROTECTED] zfs list | wc -l
  295

If we're comparing who has bigger... :)

beast:root:~# zpool list
NAME   SIZE   USED  AVAIL   CAP  HEALTH  ALTROOT
tank   732G   604G   128G   82%  ONLINE  -

but:

beast:root:~# zfs list | wc -l
1932

No panics.

PS. I'm quite sure the ZFS version I've in perforce will fix most if not
all 'kmem_map too small' panics. It's not yet committed, but I do want
to MFC it into RELENG_7.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Is there any way to increase the KVM?

2008-06-03 Thread Pawel Jakub Dawidek
On Sat, May 31, 2008 at 01:52:56PM +0800, Tz-Huan Huang wrote:
 Hi,
 
 Our nfs server is running 7-stable/amd64 with 8G ram, the size of zfs
 pool is 12T. We have set vm.kmem_size and vm.kmem_size_max to
 1.5G, but the kernel still panics by kmem_map too small often.

Could you also try to decrease vfs.zfs.arc_max?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Security Flaw in Popular Disk Encryption Technologies

2008-02-26 Thread Pawel Jakub Dawidek
On Sat, Feb 23, 2008 at 02:08:54PM +1300, Atom Smasher wrote:
 article below. does anyone know how this affects eli/geli?
 
 from the geli man page: detach - Detach the given providers, which means 
 remove the devfs entry and clear the keys from memory. does that mean 
 that geli properly wipes keys from RAM when a laptop is turned off?

Yes, geli tries to clear sensitive information on detach (mostly keys).
I use a script to suspend my laptop, which detaches my encrypted partition
before suspend. In perforce I have suspend/resume geli(8) subcommands that
help a bit here - on the 'geli suspend' command the keys are cleared and
all I/O requests are suspended until 'geli resume' provides the proper keys.
This way one doesn't have to unmount file systems to allow 'geli detach'
to succeed.

Of course even if keys are cleared there could still be important data
in RAM (eg. file system's buffer cache).

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: A TrustedBSD voluntary sandbox policy.

2007-11-08 Thread Pawel Jakub Dawidek
On Wed, Nov 07, 2007 at 10:20:28PM -0500, [EMAIL PROTECTED] wrote:
 I'm considering developing a policy/module for TrustedBSD loosely based
 on the systrace concept - A process loads a policy and then executes
 another program in a sandbox with fine grained control over what that
 program can do.
 
 I'm aiming for a much simpler implementation, however. No interaction.
 No privilege elevation (only restriction). No system call rewriting,
 only access control.
 
 The interface will look something like this:
 
 (cat <<EOF
 deny all 
 allow file_open /etc/passwd
 allow file_open /dev/tty
 allow sock_connect 127.0.0.1 80
 allow sock_connect 208.77.188.166 80
 rlimit core 0
 rlimit cpu 20
 rlimit nofile 10
 EOF
 ) | sandbox /bin/ls -alF /bin
 
 Please note that the 'policy' given on the command line is purely for 
 the sake of example, no syntax or semantics have been decided upon.
 
 The implementation appears to be simple, as far as I'm aware. I'm sure
 there will be thorns and problems - that's what I'm here to find out.
 
 The 'sandbox' process compiles the policy text into a binary structure
 in userland, loads the binary structure into the kernel module via a
 system call implemented with mac_syscall(), sets various rlimits and 
 then runs /bin/ls with execve().  When the process exits, the memory for 
 the binary structure is freed.
 
 I would like, at this stage, to know if the above model is seriously
 incompatible with the way the MAC framework works, it's not entirely
 clear either way having read other policies such as mac_biba, mac_stub
 etc.
 
 For example - how to know when a process has exited? Policy for an
 executed process would be kept in a small hash table, indexed by process
 id. The policy will be enabled when the process successfully calls
 execve() for the first time and will be destroyed when the process
 exits.  If we're not notified when a process has exited, we can't remove
 policy from the table.
 
 Also, what should be done when a process decides to fork() or execve()?
 It'd be rather unfortunate if the process could break out of the sandbox
 just by executing another process but blocking all attempts to fork()
 or execve() would make classes of programs unusable.

The first problem is that it is hard to operate on file paths. MAC passes a
locked vnode to you and you cannot easily go from there to a file name.
You could do it by comparison: call VOP_GETATTR(9) on the given vnode,
do the same for /etc/passwd and the others, and compare their inode numbers
and file system ids. The performance hit may be significant for complex
policies.
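
The same "compare inode and file system id" idea can be tried out from
userland with stat(2). This is only an analogue of what a MAC policy would
do with VOP_GETATTR(9) inside the kernel, and same_file() is an illustrative
helper name, not an existing interface:

```c
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Return 1 if both paths name the same file (same device and inode),
 * 0 if they name different files, and -1 if either stat(2) call fails.
 */
static int
same_file(const char *a, const char *b)
{
    struct stat sa, sb;

    if (stat(a, &sa) != 0 || stat(b, &sb) != 0)
        return (-1);
    /* Two names refer to one file only if both ids match. */
    return (sa.st_dev == sb.st_dev && sa.st_ino == sb.st_ino);
}
```

A policy with many path rules would need one such comparison per rule on
every check, which is where the performance concern comes from.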

You can register yourself for process_exit, process_fork and
process_exec in-kernel events and do your cleanups from your event
handler. Take a look at EVENTHANDLER(9).

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: kern.ngroups (non) setting ... new bounty ?

2007-09-27 Thread Pawel Jakub Dawidek
On Tue, Sep 25, 2007 at 09:51:06AM -0700, rsync.net wrote:
 
 It has been impossible to change kern.ngroups - at least for several years
 now.  It was not fixed in either 5.x or 6.x :
 
 http://lists.freebsd.org/pipermail/freebsd-bugs/2007-January/022140.html
 
 It is seemingly a difficult problem:
 
 http://www.atm.tut.fi/list-archive/freebsd-stable/msg09969.html   [1]
 
 However it should be solved - we can't be the only ones out there trying
 to add a UID to more than 16 groups...
 
 
 -
 
 
 The rsync.net code bounties have been fairly successful this year - two of
 the five projects have been completed, and the large vmware 6 on FreeBSD
 project is now underway.
 
 We'd like to add a new bounty for this kern.ngroups issue.  We are posting
 to -hackers today to get some feedback on how long this will take and how
 much money might reasonably be expected to lure this work.
 
 
 --rsync.net Support
 
 
 
 [1]  Is it indeed true that these programs are broken by not following
  NGROUPS_MAX from syslimits.h?

I don't see how they can be broken. They may not see more than 16
groups, but they shouldn't blow up. The only possibility of bad usage I
see is something like this:

gid_t gids[NGROUPS_MAX];
int gidsetlen;

gidsetlen = getgroups(0, NULL);
getgroups(gidsetlen, gids);

But I guess the most common use is:

gid_t gids[NGROUPS_MAX];
int gidsetlen;

gidsetlen = getgroups(NGROUPS_MAX, gids);

Binaries using the latter method should be just fine.
BTW. The latter method is what all utilities from the base system use.
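
For code that does want to see every group no matter what NGROUPS_MAX was
at compile time, a third variant is possible: ask for the count first and
size the buffer from the answer instead of from the constant. A minimal
sketch, assuming nothing beyond POSIX getgroups(2); get_all_groups() is a
made-up helper name:

```c
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Return a malloc()ed array with the caller's supplementary groups and
 * store its length in *countp.  Returns NULL on failure.  The caller
 * frees the array.
 */
static gid_t *
get_all_groups(int *countp)
{
    gid_t *gids;
    int n;

    n = getgroups(0, NULL);     /* size query only, nothing is written */
    if (n < 0)
        return (NULL);
    /* Allocate at least one slot so that malloc(0) is never relied upon. */
    gids = malloc((n > 0 ? (size_t)n : 1) * sizeof(*gids));
    if (gids == NULL)
        return (NULL);
    n = getgroups(n, gids);     /* fails if the group set grew meanwhile */
    if (n < 0) {
        free(gids);
        return (NULL);
    }
    *countp = n;
    return (gids);
}
```

Unlike the first fragment above, the buffer here can never be smaller than
the count the kernel reported, so a larger kern.ngroups cannot overflow it.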

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hierarchical jails - any current work?

2007-09-20 Thread Pawel Jakub Dawidek
On Wed, Sep 19, 2007 at 01:30:44PM -0600, James Gritton wrote:
 Pawel Jakub Dawidek wrote:
 Something like this:
  http://garage.freebsd.pl/mljail.README
 
 I did it some time ago, and this is one of the features for the new jail
 implementation which is being designed
 
 Yes, that's just the thing I'm talking about, so it looks like I have 
 indeed been reinventing something.  (The jail scheduling work of cdjones 
 is something else I'm interested in, but for another time).
 
 Now the question becomes: how much jail work is out there, and what's 
 the likelihood is it seeing the light of day in a released kernel?  I 
 hate to be going about coding stuff that's been done before (well, 
 actually I enjoy coding it but you know...), but I only ever see 
 snippets of jail work mentioned here and there and nothing ever seems to 
 get anywhere official.  I figured the place to talk about this was the 
 freebsd-jail mailing list, but it seems to be mostly for stuff like 
 getting app X to work in a jail or the current jail rc scripts have 
 this or that deficiency.  That's why I cross-mailed to freebsd-hackers 
 - maybe more appropriate there?
 
 Where's the secret place people really go to communicate this kind of 
 thing?  I've done a lot of work in the general jail-like area, and while 
 much of it is the same as others' I'd like to share what isn't.  Of 
 course, with other people's jail-related projects staying on the 
 sidelines so long - and that by those with @freebsd.org stature - one 
 wonders if there's a point.  I don't mean to sound down on anything, 
 just wondering what the state of the jail community is.  Or where it is.

We are not hiding anything, don't worry:) We just had a developers summit
in Denmark where we talked about the future jail design. We also talked about
this at the developers summit in Milan last year.

Currently we have the big picture and quite a few details. I wouldn't
call it a finished project, because it's not, but we have definitely moved
forward. Once we polish the notes taken at the devsummit we will publish
them on a wiki page and give the community some time to comment on
them. If you want to work on jails I would hold on until the wiki page
is ready, because I suspect there will be a lot of work to do.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hierarchical jails - any current work?

2007-09-19 Thread Pawel Jakub Dawidek
On Tue, Sep 18, 2007 at 03:03:12PM -0600, James Gritton wrote:
 I've been doing some work on a hierarchical jail setup, but I've got
 this nagging feeling it's been done before.  Does anyone know of such
 an existing project?  If not, I'll put forward my own code.

Something like this:

http://garage.freebsd.pl/mljail.README

I did it some time ago, and this is one of the features for the new jail
implementation which is being designed.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: VFS locking questions

2007-08-06 Thread Pawel Jakub Dawidek
On Fri, Aug 03, 2007 at 09:29:33PM +0200, Ulf Lilleengen wrote:
 Hi,
 
 I have a couple of questions regarding VFS, since I'm trying to SMPify the
 fdescfs code in an effort to get some experience with VFS and FreeBSD
 locking...
 
 What is LK_INTERLOCK really? When should it be used? When should one acquire
 it (with VI_LOCK, I assume), and what are the semantics?

The vnode internal lock (v_interlock, VI_LOCK()) is used to protect various
fields in the vnode structure (those marked with the letter 'i' in vnode.h).
You pass the LK_INTERLOCK flag to functions like lockmgr(), vn_lock(),
VOP_UNLOCK() when you already hold the vnode's interlock. This way, if one of
those functions needs the vnode's interlock internally, it knows whether you
already hold it or not (in which case the function needs to acquire it on its
own). We could probably just use mtx_owned() inside those functions.

 Let's say I have a function that should return a locked vnode. I lock the
 hash-table with a regular mutex. Then, when I traverse the list, I check if
 the entry is what I look for. If it is, I call VI_LOCK() on the vnode, use
 vget to increment refcount, and then use vn_lock(vp, LK_EXCLUSIVE...) to lock
 the vnode before the function returns. Is this correct behaviour?

Instead of doing what you suggest:

VI_LOCK(vp);
vget(vp, LK_INTERLOCK, td);
vn_lock(vp, LK_EXCLUSIVE, td);

You can simply call:

vget(vp, LK_EXCLUSIVE, td);

This is why:
- You haven't passed LK_INTERLOCK, so vget() will lock it by itself if
  needed (it does need it).
- You passed LK_EXCLUSIVE, so vget() will return locked vnode.

 The LK_INTERLOCK bothers me a bit, because I'm not 100% sure on how it works.

It is probably mostly an optimization, and probably protection against some
races, so you can call various functions with the vnode's interlock already
held.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ufs_rename: fvp == tvp (can't happen), but it did

2007-01-14 Thread Pawel Jakub Dawidek
On Sun, Jan 14, 2007 at 07:18:04PM +0100, Attila Nagy wrote:
 On 2007.01.12. 20:06, Pawel Jakub Dawidek wrote:
 Silent data corruption happens, look for example at the problem with
 4T volume under FreeBSD thread on [EMAIL PROTECTED]
 
 I'd suggest configuring geli with data authentication on top of the FC
 array. geli will detect silent data corruptions.
   
 Data corruption was the first thing that came to my mind; I am currently
 trying to reproduce this on another machine. geli's data authentication is a
 good thing, but ZFS's ability to actually correct the errors (in this case,
 at least) is even better. :)
 
 Is there a newer patch for ZFS than this: 
 http://people.freebsd.org/~pjd/patches/zfs_20061117.patch.bz2 ?
 As far as I can see, you've put a tremendous amount of work into it in
 perforce...

There is no newer patch yet. It's quite time consuming to create such a
patch, test it, etc. so I'm trying to avoid doing it:)

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ufs_rename: fvp == tvp (can't happen), but it did

2007-01-12 Thread Pawel Jakub Dawidek
On Fri, Jan 12, 2007 at 11:35:44AM +0100, Attila Nagy wrote:
 On 01/11/07 02:01, Kris Kennaway wrote:
 On Sun, Jan 07, 2007 at 12:44:39PM +0100, Attila Nagy wrote:
   
 On 2007.01.07. 1:11, [EMAIL PROTECTED] wrote:
 
 It sounds as if the caller of ufs_rename() is confused.  You could
 try setting a breakpoint on the printf(), or change it to a panic()
 to get a dump, and try to figure out who the caller is and what is
 going on.

 Yes, this would be very good, especially if this weren't a production
 machine or if I could reproduce this on a test system. But neither of these
 is true. :(
 
 Maybe I will try it on a sleepless night, in the maintenance window, thanks 
 for the idea.
 
 
 Try forcing a fsck, sometimes bizarre FS panics are due to filesystem
 corruption.
   
 I've already thought of that, but in that case the FC array must be bad, 
 since going with only the locally attached disks in the mirror, the error 
 doesn't appear...

Silent data corruption does happen; look, for example, at the thread about
the problem with a 4T volume under FreeBSD on [EMAIL PROTECTED]

I'd suggest configuring geli with data authentication on top of the FC
array. geli will detect silent data corruption.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: iSCSI disconnects dilema

2007-01-12 Thread Pawel Jakub Dawidek
On Tue, Jan 09, 2007 at 09:06:46AM +0200, Danny Braniss wrote:
 Hi,
 While I think I have almost solved the problem of network disconnects,
 it dawned on me that there is a major problem:
 When a 'local' disk crashes, the kernel will probably hang/panic/crash.
 If I don't try to recover, then there is no change in the above scenario.
 If I try to recover, then the client does not know that it should
 umount/fsck/mount.
 While all this seems familiar (removing a floppy/disk-on-key while it's
 mounted, where we could always say "you shouldn't have done that!"), with
 a network connection it can happen very often: rebooting the target, a
 network hiccup, etc.
 
 So, any ideas?

In my opinion it should be done this way:

You have a queue of I/O requests. You send them to the other end and wait
for confirmation. Until confirmation is received, you keep the requests
queued. If the other end dies, you try to reconnect (until some timeout
expires, the processes which sent those requests will just wait). If you
reconnect successfully, you resend the unconfirmed requests; if you aren't
able to reconnect, you just pass the errors up.

This is what I did in ggate and it seems to work.
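
The scheme above can be sketched in a few lines of userland C. This is not
the ggate code, just an illustration with hypothetical names (reqqueue,
rq_submit(), rq_confirm(), rq_unconfirmed(), rq_fail_all()) and a fixed-size
table instead of a real queue; the network side is left out completely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RQ_MAX 16   /* arbitrary table size for the sketch */

struct request {
    unsigned long seq;     /* sequence number used to match confirmations */
    bool          queued;  /* true until the other end confirms it */
};

struct reqqueue {
    struct request slots[RQ_MAX];
    unsigned long  next_seq;
};

/* Queue a new request; returns its sequence number, or 0 if the table is full. */
unsigned long
rq_submit(struct reqqueue *q)
{
    assert(q != NULL);
    for (size_t i = 0; i < RQ_MAX; i++) {
        if (!q->slots[i].queued) {
            q->slots[i].queued = true;
            q->slots[i].seq = ++q->next_seq;
            return q->slots[i].seq;
        }
    }
    return 0;
}

/* The other end confirmed seq: only now is the request forgotten. */
bool
rq_confirm(struct reqqueue *q, unsigned long seq)
{
    for (size_t i = 0; i < RQ_MAX; i++) {
        if (q->slots[i].queued && q->slots[i].seq == seq) {
            q->slots[i].queued = false;
            return true;
        }
    }
    return false;
}

/*
 * After a successful reconnect: collect the sequence numbers that still
 * have to be resent. They come out in table order, which a real
 * implementation would probably want to replace with submission order.
 */
size_t
rq_unconfirmed(const struct reqqueue *q, unsigned long *out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < RQ_MAX && n < max; i++) {
        if (q->slots[i].queued)
            out[n++] = q->slots[i].seq;
    }
    return n;
}

/* Reconnect timed out: drop everything and report how many errors go up. */
size_t
rq_fail_all(struct reqqueue *q)
{
    size_t n = 0;

    for (size_t i = 0; i < RQ_MAX; i++) {
        if (q->slots[i].queued) {
            q->slots[i].queued = false;
            n++;
        }
    }
    return n;
}
```

The important property is that rq_confirm() is the only place where a
request leaves the table in the normal case, so whatever is still there
after a disconnect is exactly what has to be resent or failed.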

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Best practices for using gjournal with gmirror?

2007-01-12 Thread Pawel Jakub Dawidek
On Wed, Jan 10, 2007 at 11:21:01PM -0500, John Nielsen wrote:
 I have a few questions for pjd (or anyone else) about using gjournal, 
 particularly when used with gmirror.
 
 1) I'm running 6-STABLE and plan to test with gjournal6_20061030.patch (from 
 the mailing list; updated version of 20061024 that applies cleanly). Is 
 there a better/newer version for -STABLE that I should use instead?

There probably should be a newer version as there were some minor
changes after I committed the code to HEAD. I'll try to create a new
patch during the weekend.

 2) When using gjournal for a gmirror volume, does the journal need to be 
 mirrored as well to maintain redundancy? If so, when storing the journal on 
 the same physical disks as the mirror, is it better to mirror at the slice 
 level (journal and fs on different partitions in the same mirror) or at the 
 partition level (journal and fs each have their own mirror) or does it 
 matter?

The problem with mirroring each partition/slice separately is that when
you have a crash, on boot, gmirror will start to rebuild all partitions
at once, which may be problematic. On the other hand, when you mirror
each partition/slice separately and some partitions weren't modified in
the last few seconds before the crash, gmirror will not resync them on boot,
so not the entire disk will be synchronized.

When you run gjournal on top of gmirror/graid3 there is no need for a
resync after a crash, so basically all the cons of mirroring whole
disks and of mirroring partitions no longer apply. Both
configurations will work the same. In that case I'd suggest mirroring
the whole disks, because when one of your disks dies, you can just
replace it and be done with it. If you mirror partitions separately, you
first have to create partitions and insert each of them into its
mirror, which is more complex than a simple 'gmirror insert foo newdisk'.

 3) I remember reading where pjd said that gjournal plus gmirror or graid3 
 would eliminate the need to re-sync the array after a crash. While clearly 
 a design goal, is that actually the case with the version of the patch 
 mentioned above? If so, are any config changes needed or will it just 
 happen automagically?

No, it won't happen automagically; you need to tell gmirror not to
synchronize after a crash:

# gmirror configure -F mirror_name

 4) In the same vein as 3)--does a gjournal volume need to be fsck'ed after a 
 crash? If not, will it just work (e.g. fsck -p sees that the filesystem is 
 clean) or does it need to be disabled somehow?

A gjournaled file system still has to be fscked, but only to handle orphaned
files. Such an fsck on a multi-terabyte provider takes seconds, not hours.

 5) Finally, how dangerous is this code? I realize it's experimental and only 
 plan to use it with data that has recent backups, but how much should I 
 worry about it blowing up my system or corrupting my files?

I'm using it in production, my customer is using it in production on a large
number of FreeBSD servers, and I have already heard many success
stories, BUT I still consider the code to be experimental.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: FW: FreeBSD: driver for ssl hardware accelerator board based on broadcom bcm5825, bcm5862 chips

2006-12-19 Thread Pawel Jakub Dawidek
On Sun, Dec 17, 2006 at 07:38:30AM +0200, Alex Aronson wrote:
 
 
 -Original Message-
 From: Alex Aronson [mailto:[EMAIL PROTECTED]
 Sent: Wednesday, December 13, 2006 10:48 AM
 To: '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]'; '[EMAIL PROTECTED]';
 '[EMAIL PROTECTED]'
 Subject: FreeBSD: driver for ssl hardware accelerator board based on
 broadcom bcm5825, bcm5862 chips
 
 
 Hello,
 I am working on FreeBSD driver for bcm5825 (5862) based board.
 Would you please help me.
 
 First of all I tried to work with bcm5820 based board.
 I installed FreeBSD 6.1 and load ubsec module:
 kldload ubsec
 In dmesg I saw that module recognized board (ubsec0: Broadcom 5820), crypto
 module was also loaded.
 
 After that I run openssl test (openssl version 0.9.7e-p1 25 Oct 2004)
 openssl speed rsa1024 -engine ubsec
 can't use that engine
 830:error:2507006C:DSO support routines:DSO_load:functionality not supported:/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/dso/dso_lib.c:239:
 830:error:84069067:ubsec engine:UBSEC_INIT:dso failure:/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/engine/hw_ubsec.c:390:
 830:error:260B806D:engine routines:ENGINE_TABLE_REGISTER:init failed:/usr/src/secure/lib/libcrypto/../../../crypto/openssl/crypto/engine/eng_table.c:182:
 
 What am I missing?
 There is no libubsec.so in the system.
 
 Any help will be appreciated.

'-engine ubsec' will try to use the userland driver. If you loaded ubsec.ko
and cryptodev.ko, you should use '-engine cryptodev'.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: SEEK_HOLE and SEEK_DATA for sparse files any takers?

2006-11-14 Thread Pawel Jakub Dawidek
On Wed, Nov 08, 2006 at 02:38:36AM +0100, Pedro F Giffuni wrote:
 Hi;
 
 From http://blogs.sun.com/bonwick/date/200512
 
 At this writing, SEEK_HOLE and SEEK_DATA are Solaris-specific. I encourage
 (implore? beg?) other operating systems to adopt these lseek(2) extensions
 verbatim (100% tax-free) so that sparse file navigation becomes a ubiquitous
 feature that every backup and archiving program can rely on. It's long
 overdue.
 
 It should be mentioned that linux adopted them and they would help the ZFS
 port.

I've some starting code for this and I'm planning to implement them, at
least for ZFS.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Yet another magic symlinks implementation

2006-11-05 Thread Pawel Jakub Dawidek
On Sat, Nov 04, 2006 at 11:56:29AM +0300, Andrey V. Elsukov wrote:
 Hi, All!
 
 I've ported NetBSD magic symlinks implementation to FreeBSD.
 The description of magiclinks can be found here:
 http://www.daemon-systems.org/man/symlink.7.html
 
 Patch here:
 http://butcher.heavennet.ru/patches/kernel/magiclinks/

From what I know, NetBSD removed the mount flag and switched to a global
sysctl to enable/disable this feature. It would be good to know why and
possibly do the same.

I like the idea and I can probably work on getting it into the tree.
Creating a Perforce account for you would be a good start. Would you like
to work there?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: fsync: giving up on dirty

2006-08-26 Thread Pawel Jakub Dawidek
On Fri, Aug 25, 2006 at 09:22:26PM -0500, Eric Anderson wrote:
 I got this error today, while some very heavy disk access was occurring:
 
 
 Aug 25 13:47:07 snapshot1 kernel: fsync: giving up on dirty
 Aug 25 13:47:07 snapshot1 kernel: 0xff01bbb99a20: tag devfs, type VCHR
 Aug 25 13:47:07 snapshot1 kernel: usecount 1, writecount 0, refcount 445 mountedhere 0xff023ee20800
 Aug 25 13:47:07 snapshot1 kernel: flags ()
 Aug 25 13:47:07 snapshot1 kernel: v_object 0xff01c34afb60 ref 0 pages 16386
 Aug 25 13:47:07 snapshot1 kernel: lock type devfs: EXCL (count 1) by thread 0xff023f11d980 (pid 46)#0 0x803eeaa6 at lockmgr+0x5f6
 Aug 25 13:47:07 snapshot1 kernel: #1 0x8065e8d1 at VOP_LOCK_APV+0x81
 Aug 25 13:47:07 snapshot1 kernel: #2 0x8047015b at vn_lock+0x6b
 Aug 25 13:47:07 snapshot1 kernel: #3 0x805719be at ffs_sync+0x1fe
 Aug 25 13:47:07 snapshot1 kernel: #4 0x80472045 at vfs_write_suspend+0x95
 Aug 25 13:47:07 snapshot1 kernel: #5 0x80b794a5 at g_journal_switcher+0xa55
 Aug 25 13:47:07 snapshot1 kernel: #6 0x803e3cdb at fork_exit+0xbb
 Aug 25 13:47:07 snapshot1 kernel: #7 0x805f39ce at fork_trampoline+0xe
 Aug 25 13:47:07 snapshot1 kernel:
 Aug 25 13:47:07 snapshot1 kernel: dev label/vol11-data.journal
 Aug 25 13:47:07 snapshot1 kernel: GEOM_JOURNAL: Cannot suspend file system /vol11 (error=35).

I'm aware of this, but it is harmless. During this journal switch gjournal
was not able to synchronize the file system (error=35 is EAGAIN), so it will
try again later. It should probably be logged better (as a warning).

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: 6-STABLE snapshot (background fsck) lock-up

2006-08-26 Thread Pawel Jakub Dawidek
On Sat, Aug 26, 2006 at 07:23:36AM -0500, Eric Anderson wrote:
 Hmm - had another panic.  Again, screen shots are here:
 
 http://www.googlebit.com/freebsd/snapshots/gjournal_panic2/

I can't find the panic message. What was it?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: 6-STABLE snapshot (background fsck) lock-up

2006-08-26 Thread Pawel Jakub Dawidek
On Sat, Aug 26, 2006 at 08:19:40PM -0500, Eric Anderson wrote:
 On 08/26/06 07:44, Pawel Jakub Dawidek wrote:
 On Sat, Aug 26, 2006 at 07:23:36AM -0500, Eric Anderson wrote:
 Hmm - had another panic.  Again, screen shots are here:
 
 http://www.googlebit.com/freebsd/snapshots/gjournal_panic2/
 I can't find panic message. What was it?
 
 
 It was a deadlock.

This looks like a VM-related problem: the g_event thread is waiting for free
pages, but it never gets them.

Are you able to connect a serial console to this machine and also provide
the output from 'alltrace' if it happens again?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: 6-STABLE snapshot (background fsck) lock-up

2006-08-24 Thread Pawel Jakub Dawidek
On Tue, Aug 22, 2006 at 03:38:15PM -0500, Eric Anderson wrote:
 Did you get a chance to look at those screenshots?  I'm curious to know if
 you also think it is gjournal related.  I've stopped loading gjournal, and
 I've had no other related deadlocks.

This patch was not yet merged to RELENG_6, can you try it?

http://people.freebsd.org/~pjd/patches/vfs_subr.c.3.patch

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: 6-STABLE snapshot (background fsck) lock-up

2006-08-22 Thread Pawel Jakub Dawidek
On Tue, Aug 22, 2006 at 03:38:15PM -0500, Eric Anderson wrote:
 Did you get a chance to look at those screenshots?  I'm curious to know if
 you also think it is gjournal related.  I've stopped loading gjournal, and
 I've had no other related deadlocks.

I'm out of town tomorrow; I'll try to take a look when I'm back. We saw
snapshot/gjournal related deadlocks, but all were fixed. Maybe there is
a fix which wasn't committed.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ENOMEM @ RELENG_6 graid3

2006-06-27 Thread Pawel Jakub Dawidek
On Mon, Jun 26, 2006 at 07:34:51PM +0400, Dmitry Morozovsky wrote:
 Dear colleagues,
 
 turning on bootverbose reveals additional info to 
 
 ad10: FAILURE - out of memory in start
 
 under load this machine (5 ata disks, most of their space allocated for 2 
 graid3's) many messages like
 
 ENOMEM 0xc6e834a4 on 0xc493c080(ad8)
 ENOMEM 0xc703fdec on 0xc4960480(ad10)
 ENOMEM 0xc6b49528 on 0xc4901400(ad0)
 ENOMEM 0xc6c378c4 on 0xc493ca80(ad4g)
 ENOMEM 0xc662b210 on 0xc4900b00(ad0f)
 ENOMEM 0xc6b33630 on 0xc493c380(ad4)
 ENOMEM 0xc7320d68 on 0xc4901400(ad0)
 ENOMEM 0xc6bd6948 on 0xc493c380(ad4)
 ENOMEM 0xc7299dec on 0xc493c200(ad6)
 ENOMEM 0xc6d91528 on 0xc495f700(ad6g)
 ENOMEM 0xc47b07bc on 0xc4960480(ad10)
 ENOMEM 0xc7c22bdc on 0xc493c080(ad8)
 
 Machine is rather stable; however, it panics two or three times on 
 /ftp: bad dir ino 3454117 at offset 444: mangled entry
 panic: ufs_dirbad: bad dir
 
 Any hints to debug?

I hope the ENOMEM errors are not related to your panic, because on ENOMEM
GEOM should retry the request a bit later.

It would be good to know if you have similar panics without graid3, for
example on a plain disk, but with a 2kB sector size (you can do that with
gnop(8)). You can also try to gstripe(8) your disks with a small stripe size,
e.g. 512 bytes, and use gnop(8) on top of it to change the sector size, so
all disks will be used, in case there is a problem with your controller.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: freebsd 5.3, gmirror raid 1, PROBLEM

2006-06-06 Thread Pawel Jakub Dawidek
On Mon, May 29, 2006 at 05:44:04PM -0400, sara lidgey wrote:
+ Hi All,
+  
+  I've been running a server using FreeBSD 5.3 and gmirror to mirror two
+  identical IDE hard drives.  It's been running great for over a year.  But
+  recently everything went down and when I reboot and put a monitor on it I
+  get the following errors on screen:
+  
+  GEOM_MIRROR: Device gm0: provider ad1 disconnected
+  GEOM_MIRROR: Device gm0: provider mirror/gm0 destroyed
+  GEOM_MIRROR: Device gm0: rebuilding provider ad0 stopped
+  
+  Fatal trap 12: page fault while in kernel mode...  (this is followed by
+  details about the fault)
+  
+  These errors are preceded by other related error information that flies by
+  on the screen and I have no way of seeing them again.
+  
+  Does anyone know what steps I should take to figure out what is going on
+  and try to recover data or get the machine to boot?

Can you provide more info? There should be more interesting information
before the messages you pasted.
There were a lot of fixes to gmirror in 6.1, so you may want to consider an
upgrade.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: bus_dmamap_load_uio and uiomove

2006-06-06 Thread Pawel Jakub Dawidek
On Tue, May 30, 2006 at 02:49:22PM +0200, Jimmy Olgeni wrote:
+ 
+ Hello,
+ 
+ Just quick busdma question...
+ 
+ I'm currently upgrading a custom device driver to use bus_dmamap_load_uio
+ rather than uiomove. Everything works fine, but calls to write fail unless I
+ set uio->uio_resid to 0 by hand (as I'm not using uiomove anymore).
+ 
+ Am I supposed to set uio_resid by hand when using bus_dmamap_load_uio, or is
+ there a better way to signal that all the data in uio was used?

From what I see, bus_dmamap_load_uio() is using uio_resid as the number
of bytes to process, so it has to be set before the call.
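
As a rough userland model of what that means (the real struct uio and
bus_dmamap_load_uio() live in the kernel and look different; toy_uio,
toy_uio_init() and toy_uio_consume() are names invented for this sketch):
resid starts out as the total number of bytes described by the iovecs, and
whoever consumes the data is expected to lower it, which is what uiomove()
would otherwise do for the driver.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>   /* struct iovec */

/* Toy stand-in for the kernel's struct uio; only the fields discussed here. */
struct toy_uio {
    struct iovec *iov;     /* scatter/gather segments */
    int           iovcnt;  /* number of segments */
    size_t        resid;   /* bytes still to be processed */
};

/* Set resid to the total length of all segments, before anything consumes it. */
void
toy_uio_init(struct toy_uio *u, struct iovec *iov, int iovcnt)
{
    assert(iovcnt >= 0);
    u->iov = iov;
    u->iovcnt = iovcnt;
    u->resid = 0;
    for (int i = 0; i < iovcnt; i++)
        u->resid += iov[i].iov_len;
}

/*
 * Copy up to u->resid bytes into dst (capacity dstlen). Returns the number
 * of bytes consumed and lowers resid accordingly, so a caller can tell
 * "everything was used" from resid == 0.
 */
size_t
toy_uio_consume(struct toy_uio *u, char *dst, size_t dstlen)
{
    size_t done = 0;

    for (int i = 0; i < u->iovcnt && u->resid > 0 && done < dstlen; i++) {
        size_t n = u->iov[i].iov_len;

        if (n > u->resid)
            n = u->resid;
        if (n > dstlen - done)
            n = dstlen - done;
        memcpy(dst + done, u->iov[i].iov_base, n);
        u->iov[i].iov_base = (char *)u->iov[i].iov_base + n;
        u->iov[i].iov_len -= n;
        u->resid -= n;
        done += n;
    }
    return done;
}
```

In this model a consumer that moves the data by some other means and never
touches resid would look, to its caller, as if it had processed nothing.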

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Fingerprint Authentication

2006-05-09 Thread Pawel Jakub Dawidek
On Fri, May 05, 2006 at 03:58:06PM +0200, Fredrik Lindberg wrote:
+ Alin-Adrian Anton wrote:
+ Fredrik Lindberg wrote:
+ 
+ But that would sort of defeat the whole purpose of biometric
+ authentication and you could really just use public keys instead
+ which would be a lot faster and easier than scanning your finger
+ at each login. :)
+ 
+ Unless you locally encrypt your private key with information gathered by
+ the fingerprint reader, as a password.
+ 
+ That's exactly the problem with, at least, UPEKs driver. If you scan
+ one of your fingers twice you'll get two different BioAPI records.
+ That's different as in two binary data blobs which aren't equal.
+ To match these records with each other, you hand them over to the
+ driver which, as far as I know, hand them over to the hardware
+ which in turn performs some black magic and then tell you if
+ the records match or not.

That's right, but the idea of using asymmetric crypto is very apt.
Such a fingerprint reader should have a secure chip holding your private
key; on authentication, you provide the data from your finger scan
and the data to sign, and on a match it returns the signed data, which you
can use to continue the authentication process.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Using open system call in KLD

2006-03-06 Thread Pawel Jakub Dawidek
On Mon, Mar 06, 2006 at 02:10:10AM -0800, Anupam Deshpande wrote:
+ I successfully created a file using kern_open().
+ Now I want to 'write to' or 'read from' the file.What functions should
+ I use for that purpose?

This is not as trivial as it is in userland (but you already know
that:)).

Here are functions I created for one of my projects:

http://people.freebsd.org/~pjd/misc/kernio/subr_kernio.c
http://people.freebsd.org/~pjd/misc/kernio/kernio.h

There are only open/close/write functions and no read function, as I didn't
need one, so you will have to write one on your own.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: File creation using KLD

2006-02-05 Thread Pawel Jakub Dawidek
On Sun, Feb 05, 2006 at 11:42:04PM +0530, Pranav Sawargaonkar wrote:
+ Hi
+ I want  to create a file on disk using  KLD and then tryout some reading and
+ writing stuff on  that file,so can any one suggest me any solution i.e.
+ functions to use and locks which i need to carry out this.

This is a bit tricky, i.e. there is no clean API for this, but it is of
course possible.

There are a few frameworks in the kernel that do exactly this. One of them
is alq(9), so take a look at sys/kern/kern_alq.c.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



More user developers friendly memguard.

2005-12-27 Thread Pawel Jakub Dawidek
Here is the patch:

http://people.freebsd.org/~pjd/patches/kern_malloc.c.3.patch

It allows configuring the memory type to debug without recompiling the
kernel. It also allows debugging kernel modules with memguard.

The rules:
1. If the memory type is compiled into the kernel, vm.memguard_desc should be
   configured in /boot/loader.conf.
2. If the memory type is in a kernel module, the vm.memguard_desc sysctl
   should be configured before loading the module.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: accessing NetBSD filesystem

2005-12-18 Thread Pawel Jakub Dawidek
On Sun, Dec 18, 2005 at 04:16:16PM +0100, [EMAIL PROTECTED] wrote:
+ On Sun, Dec 18, 2005 at 01:54:18AM +0100, Gilbert Fernandes wrote:
+  
+  The FreeBSD UFS is the FFS accessed through the VFS layer, but basically
+  the format is the same. If you want to have access, from FreeBSD, to
+  NetBSD partitions, make sure the NetBSD partitions have been formated
+  using FFSv2 which is the port of UFS to NetBSD. There are some
+  differences though : no ACL support nor snapshots available there.
+ 
+ FFS v1 and v2 are both working. I'm using that everyday. The one part
+ which needs attention is soft updates: FreeBSD / DragonFly have it as
+ permanent flag, NetBSD as mount option.

Interesting. In FreeBSD fsck(8) works differently for an SU-enabled FS, so
having SU as a mount option won't be possible (if we want to protect our
users from shooting themselves in the foot).
And because of the way SU works, it is possible to run background fsck,
as the only problems are unreferenced objects (inodes, blocks, etc.).

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: unable to build geom_gate

2005-12-17 Thread Pawel Jakub Dawidek
On Fri, Dec 16, 2005 at 05:27:11PM +0700, Vitaliy Ovsyannikov wrote:
+ Hello, freebsd-hackers.
+ 
+ Please, look at the output and help if you can:
+ 
+ # tar -yxf geom_gate.tbz
+ # cd geom_gate
+ # make
[...]

Why don't you just use ggate from the base system?

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: SSH From within a Jail

2005-11-14 Thread Pawel Jakub Dawidek
On Sun, Nov 13, 2005 at 09:26:05PM +0100, Koen Martens wrote:
+ Just remembered something else: do you jexec into the jail, or do
+ you do a proper logon (eg. ssh into the jail). I think that if you
+ jexec into the jail and then try to ssh, you might have a problem
+ because you aren't really logged in to the jail and thus have no
+ (pseudo) tty associated with your session..

I just saw this thread. Yes, you are right, I can confirm this.
To be able to ssh to another server from within a jail, you need to
log in to the jail properly (have access to your terminal), so
jexec won't work here.
Try to ssh into the jail and then ssh to another box.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: GEOM for multipath? How?

2005-11-10 Thread Pawel Jakub Dawidek
On Thu, Nov 10, 2005 at 02:48:15PM +0100, Poul-Henning Kamp wrote:
+ In message [EMAIL PROTECTED], Sergey 
+ Babkin writes:
+ From: Danny Howard [EMAIL PROTECTED]
+ 
+ Hey ... yes, I recall there being issues with the QLogic drivers ... I
+ wonder if anyone has given the mpt drivers a shot?  I was able to speak
+ with an engineer at Engenio (now owned by LSI) and she said there were
+ some issues with the QLogic dual-port cards that were interesting to
+ her, but the LSI dual-port cards behaved differently ...
+ 
+ QLogic worked fine in multi-path configuration with UnixWare.
+ I think LSI and Adaptec did too. The only trick is to make sure 
+ that the IRQs of the cards are not shared between the cards or with
+ any other device.
+ 
+ I suspect it is not the card as much as the driver, but I am not sure.

I was able to modify the driver in such a way that multipathing started to
work (no more hanging requests when a path was disconnected). It was hackish,
but it worked, so I'm quite sure it's the driver's fault.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: [PATCH] IPv6 support for ggate

2005-11-01 Thread Pawel Jakub Dawidek
On Thu, Oct 27, 2005 at 11:04:50PM -0500, Craig Boston wrote:
+ Hi hackers:
+ 
+ Today I had a need to run ggate over an IPv6-only network.  I was a
+ little surprised that it didn't seem to like that, but not discouraged.
+ So here's a patch that adds IPv6 support for ggated(8) and ggatec(8)
+ ;)

Thanks a lot! Unfortunately I don't have time to set up a test environment
(I don't use IPv6 at all) and it may take a while before I'm ready
to commit this (unless someone else beats me to it).
I'd be grateful if you could file a PR and send me its number. Thanks!

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Kernel Source Divergence, Security (was: booting gbde-encrypted filesystem)

2005-07-31 Thread Pawel Jakub Dawidek
On Sun, Jul 31, 2005 at 04:07:27PM +0200, Poul-Henning Kamp wrote:
+ In message [EMAIL PROTECTED], Allan Fields writes:
+ 
+ Yes, this is all very nice, but when is someone actually going to
+ commit it? ;)
+ 
+ I'm (as always) short of time, and GBDE is not the top priority
+ for me for the time being.
+ 
+ So I am more than happy to see people band together and improve
+ gbde.
+ 
+ The main work necessary is to polish the userland program and that
+ is relatively trivial programming, so anyone should be able to pick
+ that up: just go for it.
+ 
+ Giving gbde a taste function so that the root filesystem can be
+ protected by GBDE, this is also OK by me in principle, but I'd like
+ to review the patch before it gets committed because there are a
+ large number of dragons.
+ 
+ In P4:phk_gbde there is the beginning of hw-crypto support through
+ opencrypto(9), if somebody wants to work on that, get in touch with
+ me.

I'm starting to wonder if we couldn't create one storage-crypto base
and rewrite gbde and geli on top of it.
geli(8) is complete, i.e. you can use any command on attached and
detached providers, you can back up your metadata, protect your passphrase
with PKCS#5v2, use files as part of the key, etc.
gbde(8) (the userland tool) is not finished (all those things I already have
in geli are on its todo list).

I have a plan for another crypto-storage class, which will provide privacy
and integrity verification (the very thing we are missing now).
I want another class because it will be slower than geli in both
crypto time and disk access time.
Another possibility is to integrate the two classes and allow the user to
decide whether he wants privacy, integrity verification or both.

If someone can spend time on integrating the gbde crypto scheme into geli,
where the userland part is complete, where crypto(9) is already used, etc.,
that'd be cool.
The truth is that the main difference between gbde and geli is how crypto is
used on disk; the other elements (managing keys, protecting passphrases,
metadata backups, encrypted root partition, etc.) are or could be the same.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: booting gbde-encrypted filesystem

2005-07-29 Thread Pawel Jakub Dawidek
On Fri, Jul 29, 2005 at 01:18:10PM +0800, Ronnel P. Maglasang wrote:
+ Hello,
+ 
+ I think there was already a thread on this. I just
+ want to raise the question again if anyone has successfully
+ booted a gbde-encrypted filesystem (everything encrypted except
+ the bootloader). The passphrase is entered at the bootloader prompt
+ or embedded in the bootloader.

This is not possible with the current GBDE.
I have patches which allow this here:

http://people.freebsd.org/~pjd/patches/gbde.patch

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: booting gbde-encrypted filesystem

2005-07-29 Thread Pawel Jakub Dawidek
On Fri, Jul 29, 2005 at 09:56:18AM +0200, Jeremie Le Hen wrote:
+  This is not possible with the current GBDE.
+  I have patches which allow this here:
+  
+ http://people.freebsd.org/~pjd/patches/gbde.patch
+ 
+ This is great.  Do you intend to commit it someday?  I know the GELI
+ framework allows using an encrypted root partition, but it would be
+ interesting for GBDE users to be provided with such functionality.

I sent those patches to phk@ a few months ago. If he decides to add
such functionality he is welcome to use them:)
I'm not going to commit them myself.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Google SoC idea

2005-06-07 Thread Pawel Jakub Dawidek
On Mon, Jun 06, 2005 at 06:11:23PM +0200, Ivan Voras wrote:
+ I have an idea that I could implement through Google's Summer of Code
+ project, but as I have little experience with the stuff it involves (kernel
+ programming / disks / filesystem optimization), I expect any answer
+ from "It won't work" or "It's useless" to "It can't be done".  :)
+ 
+ The idea is this: to implement a sort of GEOM-layer disk data journaling
+ system. I imagine it to be a GEOM class using two lower-level devices:
+ one for data and one for the journal (this way, the journal device can
+ be on a fast and small disk). Such a journaled device could be used to
+ host any filesystem, probably mounted with synchronous access, and it
+ will result in faster write access by keeping the writes sequential in
+ the journal device. Journal information will be committed to the data
+ disk periodically by a separate log-writer thread, or when it gets full.
+ The data disk will be consistent so it can be used without its
+ journal part (after a clean disconnect/rebuild) if needed. In the
+ worst case, I think this will help performance in cases when there's a
+ burst of write activity followed by a period of IO idleness.
+ 
+ I've made the above idea more-or-less from my head in one afternoon, so 
+ it's perfectly possible that I'm missing some vital point or that it's 
+ complete nonsense :)
+ 
+ Does it make sense to do it this way? Is it worth applying for the SoC?

Not sure. Basically this is similar to what softupdates does, I think.
On the other hand, softupdates are only available for UFS.
You probably want to hear scottl's and phk's opinions (CCed).

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: ggate failures.

2005-04-11 Thread Pawel Jakub Dawidek
On Sat, Apr 09, 2005 at 05:19:43PM -0400, David Gilbert wrote:
+ I have two systems, each with 4 300 gig SATA disks.  Let's call them
+ m0 and m1.  M1 exports its disks with ggated ... on two private GigE
+ networks.  M0, on those same two GigE networks, imports them with
+ ggatec.  M0, then does the following:
+ 
+ Mirror   Disks
+ ======   =====
+ s0       ggate0 da0s1g
+ s1       ggate1 da1s1g
+ s2       ggate2 da2s1g
+ s3       ggate3 da3s1g
+ 
+ And then:
+ 
+ concat   Disks
+ ==   =
+ v0   s0 s1 s2 s3
+ 
+ (so v0 is a concatenation of 4 mirrors that consist of a local and
+ remote disk, each)
+ 
+ Now... This all works, and we create a filesystem on v0.  The problem
+ arises that whenever a lot of activity occurs on v0 (untarring a copy
+ of /usr is sufficient), the ggate links break down.  An example
+ message from the dmesg:
+ 
+ GEOM_MIRROR: Request failed (error=5). ggate2[WRITE(offset=25989184, length=8192)]
+ 
+ Now... I don't know a lot about ggate, but this appears trivial to
+ trigger.  Has anyone tried similar configurations and is there any
+ wisdom about ggate configurations?

Set kern.geom.gate.debug to 1 and send the output that is generated on
failures.

I've improved ggate a lot in Perforce, but it still needs some polishing...

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: JKH Task: Stack saving/tracing functionality.

2005-04-11 Thread Pawel Jakub Dawidek
On Mon, Apr 11, 2005 at 09:08:33AM -0400, Jeff Roberson wrote:
+ I have proprietary code from a previous employer of mine that implements
+ some really useful debugging features.  I'm looking for someone who is
+ interested in cleaning it up, making it architecture independent, and
+ getting it running on current.  The code basically allows you to save and
+ manipulate stack information.
+ 
+ This would be very useful for things like lockmgr, which right now we
+ can't really pass file:line information down to without making #ifdef mess
+ of all of the APIs as options DEBUG_LOCKs does somewhat today.  Lockmgr
+ would have a buffer which contained the last N EIPs up the callstack, and
+ this information could be queried and printed using a simple api.
+ 
+ Interested parties please email me.  We can discuss this and I can provide
+ source.

It would probably be useful for witness: when a lock order is first
recorded, it could be stored together with the stack, and on a LOR both
backtraces could be shown.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Idea about skeleton jail

2005-02-01 Thread Pawel Jakub Dawidek
On Mon, Jan 31, 2005 at 11:13:04PM -0800, Justin Hopper wrote:
+ We are considering open sourcing all of our stuff, to contribute back
+ what we can to the OS that allowed us to build our entire company.  I'd
+ really like to see what others have done to make jails more manageable,
+ as it seems like there is so much that can be done but not many people
+ are working on it.  It seems jails have the potential to become an
+ incredible way to virtually partition servers, and it would not be that
+ hard to implement solid tools for managing them.  We have things like
+ JID-aware top and tools for automated jail builds, but it would be great
+ to work with some FreeBSD heavies to finish up clean development of
+ things like jail resource restrictions (CPU,MEM,#PROCS,etc) and perhaps
+ a clean and universally useful way to easily configure and launch full
+ jail environments.

Yes, it would be useful (I mean CPU/MEM/#PROCS limits), but as I understand
it, there are two schools of thought about jails. The first is that jails
should be extended to allow creating a real virtual server; the second is
that they should stay lightweight.

+ Pawel had some really interesting ideas for jails, but it seems that
+ he's too busy to work on them at the moment.  Speaking of which, his
+ multiple IPs patch for 5.3 is still broken, and I haven't been able to
+ find what the problem is =(

Could you describe the brokenness? I made some fixes a week or so ago, and
I just created a patch against HEAD if you want to try it:

http://people.freebsd.org/~pjd/patches/jail_2005020101.patch

There can still be some remaining issues, but I don't have time for more
detailed tests.


One thing that could be useful, IMHO, is the ability to use
reboot(8)/shutdown(8), etc. inside a jail, but...
unfortunately I'm too busy with other (probably less interesting, but
profitable) projects.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Idea about skeleton jail

2005-02-01 Thread Pawel Jakub Dawidek
On Wed, Feb 02, 2005 at 12:52:17AM +0800, Xin LI wrote:
+ On Tue, 2005-02-01 at 11:40 +0100, Pawel Jakub Dawidek wrote:
+  The thing that can be useful IMHO is possibility to use
+  reboot(8)/shutdown(8), etc. inside a jail, but...
+  I'm unfortunately too busy with other (probably less interesting, but
+  profitable) projects.
+ 
+ Quick question:  Is this mean we can have init(8) running in jail?

Yes, I started a branch for this work (pjd_jailinit), but...

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Idea about skeleton jail

2005-02-01 Thread Pawel Jakub Dawidek
On Tue, Feb 01, 2005 at 01:31:11PM -0800, Justin Hopper wrote:
+   I've made some fixes a week or something
+  ago, I just created a patch against HEAD if you want to try it:
+  
+ http://people.freebsd.org/~pjd/patches/jail_2005020101.patch
+  
+  There can still be some remaining issues, but I don't have time for more
+  detailed tests.
+ 
+ Excellent, I'll try the patch here in a couple of minutes.  Can you tell
+ me what the known issues are with the patch?  Perhaps I can lend a hand
+ on helping to resolve them.

Frankly, I don't know. It just needs detailed testing.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: 5.3-STABLE: handle_workitem_freefile panic

2005-01-23 Thread Pawel Jakub Dawidek
On Sun, Jan 23, 2005 at 05:38:55PM +0300, Dmitry Morozovsky wrote:
+ On Sun, 23 Jan 2005, Dmitry Morozovsky wrote:
+ 
+ DM I'm building debug kernel now and enabling kernel dumps, just to be sure, but
+ DM it seems some sporadic file system inconsistencies...
+ 
+ Hmmm... it reveals geom_mirror I use (over two SATA drives) both does not 
+ support dumping, and hides underlying ad*[a-h] partitions. Should I gmirror 
+ distinct partitions instead of whole ad4 and ad6?

You cannot dump to GEOM providers, especially from rank > 1 geoms.
It is better to create a separate mirror for every partition (or slice)
you want to mirror instead of mirroring the whole disk.
One of the benefits is that gmirror marks a mirror as clean if there were no
WRITE requests for a few seconds, so even after a power failure
resynchronization is not needed. When you have many smaller mirrors, after an
unclean shutdown you probably don't need to rebuild all of them.
The argument against could be that when you synchronize many mirrors on
the same disks in parallel, your disks are less happy (in the one big mirror
scenario, the disk heads don't have to jump from one place to another so
often).
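
As a rough illustration of the clean-marking behaviour described above, here is a minimal sketch in C. It is not gmirror's actual code: the structure, the function names and the 5-second idle threshold are all assumptions chosen for the example (the message only says "a few seconds").

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed idle threshold; the message only says "a few seconds". */
#define IDLE_SECONDS 5

/* Hypothetical per-mirror state, not gmirror's real structure. */
struct mirror_state {
	bool dirty;          /* true while recent writes may be unsynced */
	uint64_t last_write; /* time of the last WRITE, in seconds */
};

/* Record a WRITE request: the mirror becomes dirty. */
void
note_write(struct mirror_state *m, uint64_t now)
{
	m->dirty = true;
	m->last_write = now;
}

/* Periodic check: mark the mirror clean after IDLE_SECONDS without writes. */
void
idle_check(struct mirror_state *m, uint64_t now)
{
	if (m->dirty && now - m->last_write >= IDLE_SECONDS)
		m->dirty = false;
}

/*
 * Simulate one write at write_time, an idle check at crash_time, and then a
 * power failure. Returns true if the mirror would need resynchronization.
 * Assumes crash_time >= write_time.
 */
bool
crash_needs_resync(uint64_t write_time, uint64_t crash_time)
{
	struct mirror_state m = { false, 0 };

	note_write(&m, write_time);
	idle_check(&m, crash_time);
	return (m.dirty);
}
```

With many small mirrors, each one would carry its own dirty flag, so a crash only forces a rebuild of the mirrors that were written to shortly before it.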

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: 5.3-STABLE: handle_workitem_freefile panic

2005-01-23 Thread Pawel Jakub Dawidek
On Sun, Jan 23, 2005 at 07:24:26PM +0300, Dmitry Morozovsky wrote:
+ However: how can the following goal be achieved: have mirrored swap (to 
+ keep redundancy and HA) and a place to dump panic images to, modulo having 
+ scratch disk and/or scratch unused partition?

When you have a dedicated mirror only for swap (e.g. a mirror on ad0s1b and
ad2s1b), you should probably be able to dump to ad[02]s1b (but I didn't
test it).

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: geom mirror and gbde

2005-01-21 Thread Pawel Jakub Dawidek
On Fri, Jan 21, 2005 at 09:56:58AM +0100, Attila Nagy wrote:
+ Hello,
+ 
+ I would like to use gbde on a geom mirror, but /etc/rc.d/gbde fails if 
+ there is a slash in the device name.
+ 
+ I don't know what would be the clean solution, I used the attached diff 
+ to solve the problem.
+ 
+ Please review it and if there is a better solution, commit it.

Aha! I fixed gbde(8) to accept devices with / in them, but forgot about
rc.d/gbde.

+ @@ -81,16 +81,17 @@
+  for device in $gbde_devices; do
+  parent=${device%.bde}
+  parent=${parent#/dev/}
+ -eval "lock=\${gbde_lock_${parent}-\"${gbde_lockdir}/${parent}.lock\"}"
+ +parent_=`echo ${parent} | sed "s/\//_/g"`
+ +eval "lock=\${gbde_lock_${parent_}-\"${gbde_lockdir}/${parent_}.lock\"}"
+  if [ -e /dev/${parent} -a ! -e /dev/${parent}.bde ]; then
+  echo Configuring Disk Encryption for ${parent}.

Only this part is needed.
Committed to HEAD, MFC after 1 week. Thanks!

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Freeze when using atapicam

2005-01-06 Thread Pawel Jakub Dawidek
On Wed, Jan 05, 2005 at 09:26:00PM +0100, Olivier Certner wrote:
+  Hello all,
+ 
+  Would someone have the time to look at my previous post dated January 4th, 
+ 15:43 GMT on the freebsd-questions mailing list?

(I haven't read your post on questions@, but...)
I get a hang on boot when I use atapicam with my DVD-RW. With a CD-ROM
everything is OK.

I'm able to boot and work without any problems with my DVD-RW only with
ATAPI DMA turned off in /boot/loader.conf:

hw.ata.atapi_dma=0

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Odd geom behaviour.

2004-12-21 Thread Pawel Jakub Dawidek
On Mon, Dec 20, 2004 at 10:51:11PM -0500, David Gilbert wrote:
+ I have a set of 12 disks.  2x9G and 10x4.5G.  I have a setup whereby I
+ run a gmirror on each pair of disks and then a gconcat on the
+ mirrors.  Attached is a copy of the gmirror and gconcat lists.
+ 
+ Now... I shutdown -r this machine (which happens to be an alpha) and
+ it shuts down happily.  However, _every_ time it reboots, it wishes to
+ rebuild the mirrors:
+ 
+ GEOM_MIRROR: Device m1 created (id=4055141955).
+ GEOM_MIRROR: Device m1: provider da1 detected.
+ GEOM_MIRROR: Device m1: provider da2 detected.
+ GEOM_MIRROR: Device m1: provider da2 activated.
+ GEOM_MIRROR: Device m1: provider mirror/m1 launched.
+ GEOM_MIRROR: Device m1: rebuilding provider da1.
+ 
+ (x5 more for the other mirrors).  Now this isn't particularly bad, I
+ suppose, except that the machine is occupied for some number of
+ minutes after boot with this activity.  No fsck ... the filesystem is
+ happy.
+ 
+ The machine is available for testing should someone want to look at
+ it.  In fact, the machine is part of my retrocluster of hardware
+ running FreeBSD and NetBSD (if someone needs hardware with serial
+ consoles to debug, this is the purpose of the retrocluster).
+ 
+ Anyways... ideas?
+ 
+ (note that in these files, the mirrors are still rebuilding)

What system version are you using? If this is 5.3, you should add:

swapoff="YES"

to your /etc/rc.conf and use the shutdown(8) command to reboot or turn off
your machine.
This is already fixed in HEAD in a much cleaner way.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: rc.shutdown and jails

2004-12-11 Thread Pawel Jakub Dawidek
On Sat, Dec 11, 2004 at 12:44:12AM -0800, Julian Elischer wrote:
+ I think we should introduce an init process for jails..
+ 
+ It would be responsible for all that the normal init is responsible for
+ except for being the default parent.. (some might argue for that too).
+ Sending it a particular signal would notify it to
+ send shutdown signals to all its compatriots in the jail etc.

I started to work on this in perforce: pjd_jailinit.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)

2004-12-08 Thread Pawel Jakub Dawidek
On Wed, Dec 08, 2004 at 02:10:02AM +0200, Alexandr Kovalenko wrote:
+ Hello, Pawel Jakub Dawidek!
+  
+  This is known race, which is already fixed in HEAD. I want to commit it
+  soon.
+ 
+ Any plans on backporting it to RELENG_5 (RELENG_5_3 maybe?)?

I'm probably going to MFC it to RELENG_5 this weekend; RELENG_5_3 is
closed for changes like this one.

+ To be on original topic - is there any way to make a mirror from live
+ system? I mean I have running FreeBSD on da0 and I want to make a
+ gmirror on it (I'm planning to add second drive soon). How to avoid
+ those disklabel warnings correctly?

You still need to reboot. You should find the full instructions in the
freebsd-geom@ mailing list archives; I wrote about this a few times, AFAIR.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)

2004-12-08 Thread Pawel Jakub Dawidek
On Wed, Dec 08, 2004 at 11:40:44AM +0200, Alexandr Kovalenko wrote:
+  + To be on original topic - is there any way to make a mirror from live
+  + system? I mean I have running FreeBSD on da0 and I want to make a
+  + gmirror on it (I'm planning to add second drive soon). How to avoid
+  + those disklabel warnings correctly?
+  
+  You still need to reboot. The whole instruction you should find in
+  freebsd-geom@ mailing list archives, I wrote about this few times, AFAIR.
+ 
+ Could you please remember subject of that thread?
+ 
+ I was able to make a gmirror on live system using
+ kern.geom.debugflags=16, but problem with disklabel remains.

You cannot do this on a live system, because you need to mount the root file
system on top of the mirror, and remounting the root file system is not
possible. You need to create the mirror on the 2nd disk first, etc.

Even if you store metadata on the disk (with debugflags=16), changes will
not be propagated to the 2nd disk, because I/O requests go to the disk
provider, not to the mirror provider.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Multiple IPs in jail

2004-12-08 Thread Pawel Jakub Dawidek
On Wed, Dec 08, 2004 at 09:51:31AM -0800, Justin Hopper wrote:
+ Thanks for understanding my question, Devon.  I guess at this point I'll
+ patch a system here and begin testing with it, and hopefully PJD or PHK
+ or somebody else @freebsd will respond with any plans to roll this
+ functionality into the base system.  It's really not a problem if there
+ is no plans to do it, I just don't want to spend a lot of time fiddling
+ with a patch and then find it in the base system in 5.4 or something.

My patch still has some issues. I updated the patch against HEAD from a
minute ago:

http://people.freebsd.org/~pjd/patches/jail_2004120901.patch

I don't have time to work on this right now, so can't say if/when it'll
be committed.

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
[EMAIL PROTECTED]   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)

2004-11-29 Thread Pawel Jakub Dawidek
On Mon, Nov 29, 2004 at 12:52:40PM -0200, João Carlos Mendes Luís wrote:
+ + Indeed, the -h option is what I wanted and the bug is in the 
+ + manual. What would happen if I change the disc ID in this case?
+ 
+ Your disk will not be detected as a mirror component, because hardcoded
+ name is different.
+ 
+ Oops.  Is there a check for that?  For example, let's say that ad0s1 got 
+ renamed to ad1s1, and hardcoded a reference to ad0s1.  In this case, 
+ there is a disk called ad0s1 in the system.  Is gmirror smart enough in 
+ this case?

In this case ad1s1 will not be connected to the mirror (but don't worry,
ad0s1 will not be connected as well).

+ + sigesc::root jcmendes [553] disklabel mirror/vol0
+ + # /dev/mirror/vol0:
+ + 8 partitions:
+ + #        size   offset    fstype   [fsize bsize bps/cpg]
+ +   a: 16498864       16    unused        0     0
+ +   c: 16498880        0    unused        0     0         # "raw" part, don't edit
+ + sigesc::root jcmendes [554]
+ + 
+ +   Seems good until now.  Except for the offset 16 of the a partition. 
+ +  Is this necessary?  The man page says that the only sector reserved 
+ + for metadata is the provider's last one.
+ 
+ Ehh, blame disklabel(8). First 16 sectors are reserved for boot code.
+ 
+ And why this does not happen with ad0s1, etc?

I think it should, only using sysinstall for this will not allocate those
sectors. Anyway, it has nothing to do with gmirror.

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)

2004-11-29 Thread Pawel Jakub Dawidek
On Mon, Nov 29, 2004 at 07:27:51PM -0200, João Carlos Mendes Luís wrote:
+ I finally got the system to boot with gmirror fully enabled.  But I got 
+ this during boot:
+ 
+ 
+ GEOM_MIRROR: Device vol0 created (id=3592859320).
+ GEOM_MIRROR: Device vol0: provider ad0s1 detected.
+ GEOM_MIRROR: Device vol0: provider ad1s1 detected.
+ GEOM_MIRROR: Device vol0: provider ad1s1 activated.
+ GEOM_MIRROR: Cannot update metadata on disk ad0s1 (error=1).
+ GEOM_MIRROR: Device vol0: provider ad0s1 activated.
+ GEOM_MIRROR: Device vol0: provider mirror/vol0 launched.
+ GEOM_MIRROR: Cannot update metadata on disk ad0s1 (error=1).
+ GEOM_MIRROR: Device vol0: provider ad0s1 disconnected.

This is a known race, which is already fixed in HEAD. I want to commit it
soon.

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: Hand on gmirror (Was: Re: gmirror bugs, how many?)

2004-11-28 Thread Pawel Jakub Dawidek
On Fri, Nov 26, 2004 at 05:55:51PM -0200, João Carlos Mendes Luís wrote:
+ Pawel Jakub Dawidek wrote:
[...]
+ What error do you get when you try to do this?
+ 
+ Step by step:
+ 
+ - The system has started with a preloaded geom_mirror:
[...]
+ - There is a running mirror partition:
[...]
+ - Now let's try to remove (disable was my intention, a bad idea):
+ 
+ sigesc::root jcmendes [524] gmirror unload
+ Could not unload module: Device not configured.
+ sigesc::root jcmendes [525] gmirror list
+ sigesc::root jcmendes [526] gmirror load
+ Command 'load' not available.
+ sigesc::root jcmendes [527] gmirror list
+ sigesc::root jcmendes [528] kldstat
+ Id Refs AddressSize Name
+  1   13 0xc040 3126c4   kernel
+  21 0xc0713000 10be8geom_mirror.ko
+  3   14 0xc0724000 59340acpi.ko
+  41 0xc106a000 6000 linprocfs.ko
+  51 0xc107 18000linux.ko
+  61 0xc1183000 2000 fade_saver.ko
+ sigesc::root jcmendes [529] ls -l /dev/mirror/
+ total 1
+ dr-xr-xr-x  2 root  wheel  512 Nov 26 12:19 .
+ dr-xr-xr-x  5 root  wheel  512 Nov 26 12:19 ..
+ sigesc::root jcmendes [530]
+ 
+ - Well, something not good happened.  The device did not unload, and do 
+ not list any device anymore.  Trying to reload it has no effect.
+ - This used to work before preloading it in loader.conf, but then I 
+ would not be able to boot a mirror partition.
[...]

The 'unload' command fails because of a bug in GEOM. For now, to avoid a
deadlock, you get an error (ENXIO), but the mirror will still be destroyed.
The next 'unload' should be OK. To avoid those errors, you should first
stop all mirrors (using the 'stop' command) and then unload the kernel module.
BTW, there is no 'reload' command.

+ Indeed, the -h option is what I wanted and the bug is in the 
+ manual. What would happen if I change the disc ID in this case?

Your disk will not be detected as a mirror component, because hardcoded
name is different.

+ sigesc::root jcmendes [553] disklabel mirror/vol0
+ # /dev/mirror/vol0:
+ 8 partitions:
+ #        size   offset    fstype   [fsize bsize bps/cpg]
+   a: 16498864       16    unused        0     0
+   c: 16498880        0    unused        0     0         # "raw" part, don't edit
+ sigesc::root jcmendes [554]
+ 
+   Seems good until now.  Except for the offset 16 of the a partition. 
+  Is this necessary?  The man page says that the only sector reserved 
+ for metadata is the provider's last one.

Ehh, blame disklabel(8). First 16 sectors are reserved for boot code.

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: gmirror bugs, how many?

2004-11-26 Thread Pawel Jakub Dawidek
On Fri, Nov 26, 2004 at 02:56:15AM -0200, João Carlos Mendes Luís wrote:
+ Pawel Jakub Dawidek wrote:
+ First mistake - wrong order. Create a mirror, then partition a mirror
+ provider.
+ 
+ Is this a constraint in the design?  In my point of view, geom would 
+ treat all block devices equally, no matter if they are whole disks or 
+ single partitions.

You're right and it does so.

+ If this is not the case, then maybe this should be noted in the man page.
+ 
+ Note that sometimes it is not necessary to have a whole disk redundant. 
+  I could use part of it to temporary data, for example.  I've done this 
+ with vinum in 4-stable more than once: I get two disks, each with a copy 
+ of the root partition (which I intended to mirror with gmirror), a swap 
+ partition, a mirror vinum subdisk and a stripe vinum subdisk.  Note that 
+ in this case the data integrity and cost is more important than 
+ continuous operation.  If a disk fail, the server will stop, but no 
+ *important* data will get lost.  This is the scenario which I was testing.

You can do that with gmirror.
All I'm saying is that you should first create a mirror, then create slices
and partitions on top of the mirror provider, because you want to use
mirror/vol0s1a, not ad0s1a. Note that mirror/vol0 is one sector shorter
than ad0, and imagine a situation where gmirror stores its metadata in the
same place where BSD stores its own: if you first create a mirror and then
create partitions on ad0, you'll overwrite the gmirror metadata. It is very
painful that with MBR the metadata is visible on the traffic providers, and
I don't want to repeat that mistake.
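
To make the sector arithmetic concrete, here is a small sketch in C. It assumes, as the man page is quoted as saying elsewhere in this thread, that gmirror keeps its metadata in the provider's last sector. The function names are made up for the illustration, and 16498881 is simply the mirror size from the disklabel output in this thread (16498880 sectors) plus that one metadata sector.

```c
#include <assert.h>
#include <stdint.h>

/* Index of the sector assumed to hold gmirror metadata: the last one. */
uint64_t
metadata_sector(uint64_t provider_sectors)
{
	return (provider_sectors - 1);
}

/* Size of the mirror provider: one sector shorter than the component. */
uint64_t
mirror_sectors(uint64_t provider_sectors)
{
	return (provider_sectors - 1);
}

/* Nonzero if the range [start, start + count) covers the metadata sector. */
int
overlaps_metadata(uint64_t start, uint64_t count, uint64_t provider_sectors)
{
	return (count > 0 && start + count > metadata_sector(provider_sectors));
}
```

A partition table written to the raw component may describe all 16498881 sectors and so reach into the metadata sector, while one written to the mirror provider can only describe 16498880 and cannot.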

+ + Now, lets reboot.  I could not unload geom_mirror, since it was 
+ + preloaded during boot, is this expected?  The device could not be 
+ + unloaded, but the volume disapeared (gmirror list, ls /dev/mirror). 
+ 
+ If there is no mirror configured, you should be able to unload it.
+ 
+ Before putting it in /boot/loader.conf, unload worked, even with mirror 
+ devices configured, IIRC.  Only after loader.conf preloading this 
+ problem appeared.

What error do you get when you try to do this?

+ The man page says only:
+ 
+ -h   Hardcode providers' names in metadata.
+ 
+ and does not explain when I should use this.
+ 
+ Do you mean that if I want it to use ad1s1 as the provider, and not ad1, 
+ -h is what I want?

Only when you share the last sector between those two providers.
You can still create ad1s1, which is one sector shorter.

+ 
+ + Is there any gmirror hacker around to fix these?
+ 
+ There is nothing to fix.
+ 
+ Surely there is.  At least the manual.

I have to agree here. :)

+ And even if gmirror is correct, there's also the problem shown with 
+ disklabel in my previous email.

What problem is there when you use proper order of doing things?

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: FreeBSD Kernel buffer overflow

2004-09-19 Thread Pawel Jakub Dawidek
On Sat, Sep 18, 2004 at 09:13:42PM -0700, Julian Elischer wrote:
+ +#if (__i386__) && (INVARIANTS)
+ +   KASSERT(new_sysent->nargs >= 0 && new_sysent->nargs <= i386_SYS_ARGS,
+ +   "invalid number of syscalls");
+ +#endif
+ +
+*old_sysent = sysent[*offset];
+sysent[*offset] = *new_sysent;
+return 0;
+ 
+ 
+ Why panic the machine at this point?  Just refuse to install the syscall
+ and return an error.
+ 
+ and the test for INVARIANTS is un-needed.. KASSERT only compiles to anything
+ when INVARIANTS is defined.

...and it should be '#ifdef', not '#if'.
...and the panic message should be inside ().

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: FreeBSD Kernel buffer overflow

2004-09-18 Thread Pawel Jakub Dawidek
On Fri, Sep 17, 2004 at 12:37:12PM +0300, Giorgos Keramidas wrote:
+ % +#ifdef INVARIANTS
+ % +   KASSERT(0 <= narg && narg <= 8, ("invalid number of syscall args"));
+ % +#endif

Maybe:
KASSERT(0 <= narg && narg <= sizeof(args) / sizeof(args[0]),
("invalid number of syscall args"));

So if we decide to increase/decrease it someday, we don't have to remember
about this KASSERT().
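
A minimal userland sketch of the idea, assuming an 8-element argument array as in the quoted patch. NITEMS, args and narg_is_valid are names invented for this example, and a plain predicate stands in for the KASSERT() condition.

```c
#include <assert.h>
#include <stddef.h>

/* Element count of a true array (does not work on pointers). */
#define NITEMS(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical argument buffer; 8 slots as in the quoted patch. */
long args[8];

/* Returns 1 if narg arguments fit into args[], 0 otherwise. */
int
narg_is_valid(int narg)
{
	return (narg >= 0 && (size_t)narg <= NITEMS(args));
}
```

If args[] later grows to 10 elements, narg_is_valid(9) starts returning 1 without anyone touching the check.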

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: FreeBSD Kernel buffer overflow

2004-09-18 Thread Pawel Jakub Dawidek
On Sat, Sep 18, 2004 at 02:18:55AM -0700, Don Lewis wrote:
+ On 18 Sep, Pawel Jakub Dawidek wrote:
+  On Fri, Sep 17, 2004 at 12:37:12PM +0300, Giorgos Keramidas wrote:
+  + % +#ifdef INVARIANTS
+  + % +   KASSERT(0 <= narg && narg <= 8, ("invalid number of syscall args"));
+  + % +#endif
+  
+  Maybe:
+  KASSERT(0 <= narg && narg <= sizeof(args) / sizeof(args[0]),
+  ("invalid number of syscall args"));
+  
+  So if we decide to increase/decrease it someday, we don't have to remember
+  about this KASSERT().
+ 
+ What keeps the attacker from installing two syscalls, the first of which
+ pokes NOPs over the KASSERT code, and the second of which accepts too
+ many arguments?

First of all, this is not protection from an attacker, but help for bad
programmers.

+ If you think we really need this bit of extra security, why not just
+ prevent the syscall with too many arguments from being registered by
+ syscall_register()?  At least that keeps the check out of the most
+ frequently executed path.

Good point, this is a much better place for it.
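
A sketch of what such a registration-time check could look like, combined with the earlier suggestion in this thread to return an error instead of panicking. This is not the real syscall_register(): the table, the structure, the limit of 8 arguments and all names ending in _sketch are assumptions made for the example.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define MAX_SYSCALL_ARGS 8   /* assumed limit, as in the quoted patch */
#define TABLE_SIZE 16        /* arbitrary table size for the example */

/* Simplified stand-in for a syscall table entry. */
struct sysent_sketch {
	int nargs;
	int (*call)(void *);
};

struct sysent_sketch table_sketch[TABLE_SIZE];

/* Install an entry, refusing bad slots and bad argument counts. */
int
register_sketch(int offset, const struct sysent_sketch *new_ent)
{
	if (offset < 0 || offset >= TABLE_SIZE)
		return (EINVAL);
	if (new_ent->nargs < 0 || new_ent->nargs > MAX_SYSCALL_ARGS)
		return (EINVAL);
	table_sketch[offset] = *new_ent;
	return (0);
}

/* Convenience wrapper: register a handler-less entry with nargs arguments. */
int
register_nargs_sketch(int offset, int nargs)
{
	struct sysent_sketch ent = { nargs, NULL };

	return (register_sketch(offset, &ent));
}
```

Because the check runs once per registration, the per-call dispatch path stays free of it.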

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: kern___getcwd() returns ENOTDIR

2004-06-28 Thread Pawel Jakub Dawidek
On Sun, Jun 27, 2004 at 11:12:20AM -0700, David Schultz wrote:
+ On Sun, Jun 27, 2004, Kentucky Mandeloid Mo. wrote:
+  I'm writing a small kernel module that catches file access syscalls.
+  At every syscall I need the full name of the file being passed to the syscall.
+  I build it from the path passed to the syscall, and if the path does not start
+  with / I get the current working directory of the process using kern___getcwd().
+  In every syscall this works just fine except rmdir & unlink.
+  Sometimes in unlink and every time in rmdir it returns a "not a directory" error.
+  I already know that kern___getcwd() works through the vnode cache and that this
+  method is not a reliable way to get file names.
+  So is there any other way to get the cwd of a process?
+ 
+ linux_getcwd() works in more cases than kern___getcwd(), but it
+ has other problems.

What problems does it have? Could you provide more details?
Was it discussed when the patch replacing kern___getcwd() with
linux_getcwd() was introduced?

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!




Re: api for sharing memory from kernel to userspace?

2004-05-19 Thread Pawel Jakub Dawidek
On Wed, May 19, 2004 at 05:29:07AM -0700, Alfred Perlstein wrote:
+ I need to share about 100megs of memory between kernel and userspace.
+ 
+ The memory can not be paged and should appear contig in the process's
+ address space.  Any suggestions?
+ 
+ I need a way to either:
+ map user memory into the kernel's address space.
+ map kernel memory into the user's address space.
+ 
+ I was looking at pmap_qenter() but it didn't see attractive because
+ it's for short term mappings, this mapping will exist for quite a
+ while.

I am also interested in mapping kernel memory into a user's address space,
for GEOM Gate and other evil projects.

-- 
Pawel Jakub Dawidek   http://www.FreeBSD.org
[EMAIL PROTECTED]   http://garage.freebsd.pl
FreeBSD committer Am I Evil? Yes, I Am!



