Re: fsid change of ZFS?

2011-08-24 Thread Pawel Jakub Dawidek
On Wed, Aug 24, 2011 at 04:41:25PM +0300, Kostik Belousov wrote:
 On Wed, Aug 24, 2011 at 09:36:37AM -0400, Rick Macklem wrote:
  Well, doesn't this result in the same issue as the fixed table?
  In other words, the developer has to supply the suggested byte for
  fsid and make sure that it doesn't conflict with other suggested byte
  values or suffer the same consequence as forgetting to update the fixed
  table. (ie. It just puts the fixed value in a different place, from what
  I see, for in-tree modules. Also, with a fixed table, they are all in
  one place, so it's easy to choose a non-colliding value?)
 The reason for my proposal was Pawel's note that a porter of the filesystem
 should be aware of some place in kern/ where to register, besides writing
 the module.

Well, he has to be aware, but we should do all we can to minimize the
number of places he needs to update, as it is easy to forget some.

I agree with Rick that what you proposed is similar to a fixed table of
file system names and I'd prefer to avoid that. If we can have a
name-based hash that produces no collisions for in-tree file systems and
known current 3rd party file systems, plus collision detection for the
future, then it is good enough, IMHO. And this is what Rick proposed with
his patch.
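
A minimal user-space sketch of the idea (a name-based 8-bit hash plus
collision detection) might look like the following. This is not Rick's
patch: the hash function (FNV-1a folded to 8 bits), the fsid_owner table,
and the names fsname_hash8() and fsid_register() are all assumptions made
for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	FSID_SLOTS	256	/* an 8-bit value has 256 possible slots */

/* Hypothetical registry: which name currently owns each 8-bit value. */
static const char *fsid_owner[FSID_SLOTS];

/* Illustrative hash only (FNV-1a folded to 8 bits); an assumption. */
static uint8_t
fsname_hash8(const char *name)
{
	uint32_t h = 2166136261u;

	assert(name != NULL);
	for (; *name != '\0'; name++) {
		h ^= (uint8_t)*name;
		h *= 16777619u;
	}
	return ((uint8_t)(h ^ (h >> 8) ^ (h >> 16) ^ (h >> 24)));
}

/*
 * Register a file system name and return its 8-bit value, or -1 if
 * every slot is taken. If the preferred slot belongs to a different
 * name, probe linearly for the next free one and set *collided.
 */
static int
fsid_register(const char *name, int *collided)
{
	unsigned int i, slot, start;

	start = fsname_hash8(name);
	for (i = 0; i < FSID_SLOTS; i++) {
		slot = (start + i) % FSID_SLOTS;
		if (fsid_owner[slot] == NULL)
			fsid_owner[slot] = name;
		if (strcmp(fsid_owner[slot], name) == 0) {
			*collided = (i != 0);
			return ((int)slot);
		}
	}
	return (-1);
}
```

With something of this shape, a new file system only has to supply its
name; a collision is detected at registration time instead of being
prevented by a hand-maintained table.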

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


Re: fsid change of ZFS?

2011-08-23 Thread Pawel Jakub Dawidek
On Sat, Aug 20, 2011 at 08:15:34PM -0400, Rick Macklem wrote:
 Hiroki, could you please test the attached patch.
 
 One problem with this patch is that I don't know how to create a fixed
 table that matches what systems would already have been getting.
 (I got the first 6 entries by booting a GENERIC i386 kernel with a
  printf in vfs_init(), so I suspect those don't change much, although
  I'm not sure if ZFS will usually end up before or after them?)
 
 Do you guys know what ZFS gets assigned typically? (I realize that
 changes w.r.t. when it gets loaded, so the question also becomes
 do you know how it typically gets loaded so the table can have
 that vfc_typenum value assigned to it?)
 Maybe you could boot a system with a printf like:
 
 printf("%s, %d\n", vfc->vfc_name, vfc->vfc_typenum);
 
 just after vfc->vfc_typenum = maxvfsconf++; in vfs_init() and
 then look in dmesg after booting, to see what your tables look like?
 (Without the attached patch installed.)

Rick, I'm sorry to arrive so late, but in my opinion hardcoding a list of
file systems in the kernel is really a step in the wrong direction.
We are trying to keep things modularized, so that there are no such things
lying around that have to be cleaned up when a file system goes away or
updated when a new file system arrives.

I remember, for example, the fts code, where I found that it keeps a list
of file systems that can be handled faster. ZFS could have been handled
faster too, but I only found this after a few years.
For this case there should be a VFCF_* flag that fts should recognize,
instead of hardcoding file system names.

This was also the reason that, when I added support for jail-friendly
file systems and for file systems with delegated administration,
I didn't create a list of file system types that support it, but added
the VFCF_JAIL and VFCF_DELEGADMIN flags.
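
To illustrate the difference from a hardcoded name list, here is a small
user-space sketch of a flag-based capability check. The struct fs_type,
the FSCAP_* bits and their values, and fs_supports() are hypothetical
stand-ins modeled on the VFCF_* idea, not the kernel's definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical capability bits; the values are made up for this sketch. */
#define	FSCAP_JAIL		0x01u
#define	FSCAP_DELEGADMIN	0x02u

/* Hypothetical per-type record: each file system declares its own flags. */
struct fs_type {
	const char	*name;
	unsigned int	 flags;
};

/* Return non-zero if the file system type advertises every bit in cap. */
static int
fs_supports(const struct fs_type *fs, unsigned int cap)
{

	assert(fs != NULL);
	return ((fs->flags & cap) == cap);
}
```

The consumer asks "does this type have the capability?" and never needs
to know file system names, so nothing has to be edited when a file system
is added or removed.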

Here you cannot use those flags to solve the problem, but hardcoding
file system types in an array is really not the way to go.

I much prefer Ben's idea of calculating a hash from file system name and
detecting collisions.



Re: fsid change of ZFS?

2011-08-23 Thread Pawel Jakub Dawidek
On Tue, Aug 23, 2011 at 10:09:41AM -0400, Rick Macklem wrote:
 Ok, I'll admit I wasn't very fond of a fixed table that would inevitably
 get out of date someday, either.
 
 I didn't think hashing for the cases not in the table was worth the effort,
 but doing a hash instead of a table seems reasonable.
 
 I see that ZFS only uses the low order 8 bits, so I'll try and come up
 with an 8bit hash solution and will post a patch for testing/review soon.
 
 I don't think the vfs_sysctl() is that great a concern, given that it
 appears to be deprecated already anyhow. (With an 8bit hash, vfs_typenum
 won't be that sparse.) I'll also make sure that whatever hash I use
 doesn't collide for the current list of file names (although I will include
 code that handles a collision in the patch).

Sounds great. Thanks!



Re: fsid change of ZFS?

2011-08-23 Thread Pawel Jakub Dawidek
On Tue, Aug 23, 2011 at 04:11:20PM -0400, Rick Macklem wrote:
 Pawel Jakub Dawidek wrote:
  On Tue, Aug 23, 2011 at 10:09:41AM -0400, Rick Macklem wrote:
   Ok, I'll admit I wasn't very fond of a fixed table that would
   inevitably
   get out of date someday, either.
  
   I didn't think hashing for the cases not in the table was worth the
   effort,
   but doing a hash instead of a table seems reasonable.
  
   I see that ZFS only uses the low order 8 bits, so I'll try and come
   up
   with an 8bit hash solution and will post a patch for testing/review
   soon.
  
   I don't think the vfs_sysctl() is that great a concern, given that
   it
   appears to be deprecated already anyhow. (With an 8bit hash,
   vfs_typenum
   won't be that sparse.) I'll also make sure that whatever hash I use
   doesn't collide for the current list of file names (although I will
   include
   code that handles a collision in the patch).
  
  Sounds great. Thanks!
  
 Here's the patch. (Hiroki could you please test this, thanks, rick.)
 ps: If the white space gets trashed, the same patch is at:
http://people.freebsd.org/~rmacklem/fsid.patch

The patch is fine by me. Thanks, Rick!



Re: Weird issue with hastd(8)

2011-06-27 Thread Pawel Jakub Dawidek
On Sat, Jun 25, 2011 at 05:54:13PM +0300, Mikolaj Golub wrote:
 For me the idea to send updates to secondary only via
 synchronization thread, starting it periodically looks
 interesting. Sure it should not be the replacement for real
 async mode, but having something like this in hast apart other
 synchronization modes might be useful.
 
 Comparing it with real async  that is described in manual it has
 the following advantages:
 
 1) It is much easier to implement.
 
 2) If you have frequent updates of the same blocks, real async
 will send them all, while with sync thread approach we will skip
 many intermediate updates.

I must say I don't agree with your points here. We should not implement
one more replication mode just because it is easier to implement. Imagine
the situation when we finally get a proper 'async' mode and need to
explain to users the difference between the 'async' and 'async2' modes:
async2 was easier to implement back when we had no async yet, but
for you it does more or less the same. And we will need to keep supporting
both of them. If anything, I'd prefer to call it 'async' and then
change the underlying algorithm entirely. This will handle user confusion,
but still leaves the need for protocol compatibility between hastds
implementing the older and newer 'async'.

The second argument reveals a weakness of this approach. The very
important thing is to keep data consistent when nodes are connected.
By 'consistent' I mean that at every point in time, if the primary dies,
the secondary can start operating - it may have slightly older data in
async mode, but the data will be consistent - you can fsck the file system
and start your services. In the way you described, no care is taken to move
the data to the secondary node in proper order, i.e. some later writes
can be sent before earlier writes, because e.g. they are placed in a lower
extent, and if you have a primary failure right there, the secondary's data
view won't be consistent and your file system will most likely be
corrupt.
In async mode you can skip and combine only consecutive writes.
For example if your queue contains the following writes
(number. offset size):

1.    0 1024
2.  512 1024
3.    0 1024
4. 4096 1024
5.    0 1536

You can compress it to:

2+3.    0 1536
  4. 4096 1024
  5.    0 1536

Here we ignore the first write entirely and combine writes 2 and 3, but we
cannot simply skip the first three writes just because the fifth write
covers them, as there is the 4096,1024 request in between.
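
The rule above (only consecutive writes may be combined) can be sketched
in C as follows. This is an illustration only, not hastd code: struct wreq
and wqueue_compress() are hypothetical names, and the sketch tracks extents
only (in a real queue the later write's bytes would win where ranges
overlap).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical queued write: byte offset and length. */
struct wreq {
	uint64_t	off;
	uint64_t	len;
};

/*
 * Compress the queue in place and return the new number of entries.
 * A request is merged only into the entry directly before it, and only
 * when the two ranges overlap or touch, so the relative order of writes
 * to unrelated ranges is preserved.
 */
static size_t
wqueue_compress(struct wreq *q, size_t n)
{
	uint64_t end, lastend;
	size_t i, out;

	assert(q != NULL || n == 0);
	if (n == 0)
		return (0);
	out = 0;			/* index of the last kept entry */
	for (i = 1; i < n; i++) {
		lastend = q[out].off + q[out].len;
		end = q[i].off + q[i].len;
		if (q[i].off <= lastend && q[out].off <= end) {
			/* Overlapping or adjacent: keep the union. */
			if (q[i].off < q[out].off)
				q[out].off = q[i].off;
			if (end < lastend)
				end = lastend;
			q[out].len = end - q[out].off;
		} else {
			/* Unrelated range: must stay a separate write. */
			q[++out] = q[i];
		}
	}
	return (out + 1);
}
```

Run on the five-entry queue from the example, this yields three entries:
0/1536, 4096/1024 and 0/1536. The fifth write is not folded into the
first group because write 4 sits between them.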



Re: Randomization in hastd(8) synchronization thread

2011-05-21 Thread Pawel Jakub Dawidek
On Tue, May 17, 2011 at 12:39:19PM -0700, Maxim Sobolev wrote:
 Hi Pawel,
 
 I am trying to use hastd(8) over slow links and one problem is
 apparent right now - current approach with synchronizing content
 sequentially is not working in this case. What happens is that hastd
 hits the first frequently updated block and cannot make any progress
 anymore. In my case I have 30GB of dirty space to be synchronized
 over just 1mbps uplink.
 
 The quick fix that I've applied is randomization in the block
 selection code. This way  eventually all least used blocks will be
 synchronized, leaving only hot ones dirty. More effective approach
 would be to use some kind of LRU selection algorithm, but
 statistical approach would work just as good in this case.
 
 Please review the patch below:
 
 http://sobomax.sippysoft.com/activemap.c.diff

Hmm, hastd keeps a separate bitmap for synchronization. It is stored in
the am_syncmap field. Blocks that are dirtied during regular writes should
not affect the synchronization bitmap or synchronization progress.



Re: geli on r221012

2011-05-08 Thread Pawel Jakub Dawidek
On Mon, Apr 25, 2011 at 01:31:55PM +, Anton Yuzhaninov wrote:
 Geli no longer works for me after upgrade to r221012.
 
 # geli attach -k ~citrin/private.key /dev/label/spool2
 Enter passphrase:
 #
 
 from dmesg:
 GEOM_ELI: Device label/spool2.eli created.
 GEOM_ELI: Encryption: Blowfish-CBC 128
 GEOM_ELI:  Integrity: HMAC/MD5
 GEOM_ELI: Crypto: software
 
 # dd if=/dev/label/spool2.eli of=/dev/null
 dd: /dev/label/spool2.eli: Invalid argument
 0+0 records in
 0+0 records out
 0 bytes transferred in 0.000669 secs (0 bytes/sec)

Thanks for the report! It should be fixed in r221628.



Re: panic: g_eli_key_hold: sc_ekeys_total=1

2011-04-24 Thread Pawel Jakub Dawidek
On Sun, Apr 24, 2011 at 11:12:03AM +0200, Fabian Keil wrote:
 The panic can be reproduced with:
 /sbin/geli onetime -l 256 -s 4096 /dev/ada0s1b

That's why I asked for ada0s1b size. It should be fixed in HEAD (r220984).



Re: panic: g_eli_key_hold: sc_ekeys_total=1

2011-04-22 Thread Pawel Jakub Dawidek
On Fri, Apr 22, 2011 at 05:04:01PM +0200, Fabian Keil wrote:
 With sources from today my system panics at boot time
 after attaching the swap device:
 
 GEOM_ELI: Device ada0s1b.eli created.
 GEOM_ELI: Encryption: AES-XTS 256
 GEOM_ELI: Crypto: software
 panic: g_eli_key_hold: sc_ekeys_total=1
 cpuid = 0
 KDB: enter: panic
 Uptime: 2m16s
 Physical memory: 1974 MB
 Dumping 213 MB: 198 182 166 150 134 118 102 86 70 54 38 22 6
[...]

Could you provide the output of:

# diskinfo -v /dev/ada0s1b

And could you try:

# /sbin/geli onetime -l 256 -s 4096 /dev/ada0s1b



Re: Any success stories for HAST + ZFS?

2011-04-02 Thread Pawel Jakub Dawidek
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
 [Not sure which list is most appropriate since it's using HAST + ZFS
 on -RELEASE, -STABLE, and -CURRENT.  Feel free to trim the CC: on
 replies.]
 
 I'm having a hell of a time making this work on real hardware, and am
 not ruling out hardware issues as yet, but wanted to get some
 reassurance that someone out there is using this combination (FreeBSD
 + HAST + ZFS) successfully, without kernel panics, without core dumps,
 without deadlocks, without issues, etc.  I need to know I'm not
 chasing a dead rabbit.

I just committed a fix for a problem that might look like a deadlock.
With trociny@'s patch and my last fix (to GEOM GATE and hastd), do you
still have any issues?



Re: Any success stories for HAST + ZFS?

2011-03-25 Thread Pawel Jakub Dawidek
On Thu, Mar 24, 2011 at 01:36:32PM -0700, Freddie Cash wrote:
 I've tried with FreeBSD 8.2-RELEASE, 8-STABLE, 8-STABLE w/ZFSv28
 patches, and 9-CURRENT (after the ZFSv28 commit).  Things work well
 until I start hastd.  Then either the system locks up, or hastd causes
 a kernel panic, or hastd dumps core.

The minimum amount of information (as always) would be a backtrace from
the kernel and also a hastd backtrace when it dumps core. There is really
decent logging in hast, so I'm also sure it logs something
interesting on the primary or secondary. Another useful thing would be to
turn on debugging in hast (a single -d option for hastd).

The best you can do is to give me the simplest and quickest procedure to
reproduce the issue, e.g. configure two hast resources, put a ZFS mirror on
top, start an rsync of /usr/src to the file system on top of hast and
switch roles. The simpler the better.



Re: missing files in readdir(3) on NFS export of ZFS volume (since v28?)

2011-03-08 Thread Pawel Jakub Dawidek
On Mon, Mar 07, 2011 at 01:08:46AM +0100, Pierre Beyssac wrote:
 Hello,
 
 I'm running a 9-current server as compiled on Sat Mar  5 02:17:14
 CET 2011.
 
 Since I upgraded to ZFS v28 I noticed missing files from NFS. The
 files are still accessible through NFS but they don't show up on a
 readdir(3).
[...]

Could you try r219404?



Re: [head tinderbox] failure on ia64/ia64

2011-03-06 Thread Pawel Jakub Dawidek
On Mon, Mar 07, 2011 at 01:06:11AM +, FreeBSD Tinderbox wrote:
 TB --- 2011-03-07 00:25:55 - tinderbox 2.6 running on 
 freebsd-current.sentex.ca
 TB --- 2011-03-07 00:25:55 - starting HEAD tinderbox run for ia64/ia64
 TB --- 2011-03-07 00:25:55 - cleaning the object tree
 TB --- 2011-03-07 00:26:06 - cvsupping the source tree
 TB --- 2011-03-07 00:26:06 - /usr/bin/csup -z -r 3 -g -L 1 -h cvsup.sentex.ca 
 /tinderbox/HEAD/ia64/ia64/supfile
 TB --- 2011-03-07 00:26:19 - building world
 TB --- 2011-03-07 00:26:19 - MAKEOBJDIRPREFIX=/obj
 TB --- 2011-03-07 00:26:19 - PATH=/usr/bin:/usr/sbin:/bin:/sbin
 TB --- 2011-03-07 00:26:19 - TARGET=ia64
 TB --- 2011-03-07 00:26:19 - TARGET_ARCH=ia64
 TB --- 2011-03-07 00:26:19 - TZ=UTC
 TB --- 2011-03-07 00:26:19 - __MAKE_CONF=/dev/null
 TB --- 2011-03-07 00:26:19 - cd /src
 TB --- 2011-03-07 00:26:19 - /usr/bin/make -B buildworld
 >>> World build started on Mon Mar  7 00:26:20 UTC 2011
 >>> Rebuilding the temporary build tree
 >>> stage 1.1: legacy release compatibility shims
 >>> stage 1.2: bootstrap tools
 >>> stage 2.1: cleaning up the object tree
 >>> stage 2.2: rebuilding the object tree
 >>> stage 2.3: build tools
 >>> stage 3: cross tools
 >>> stage 4.1: building includes
 >>> stage 4.2: building libraries
 >>> stage 4.3: make dependencies
 [...]
 mkdep -f .depend -a /src/sbin/growfs/growfs.c
 echo growfs: /obj/ia64.ia64/src/tmp/usr/lib/libc.a >> .depend
 ===> sbin/gvinum (depend)
 rm -f .depend
 mkdep -f .depend -a-I/src/sbin/gvinum/../../sys /src/sbin/gvinum/gvinum.c 
 /src/sbin/gvinum/../../sys/geom/vinum/geom_vinum_share.c
 echo gvinum: /obj/ia64.ia64/src/tmp/usr/lib/libc.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libreadline.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libtermcap.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libdevstat.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libkvm.a 
 /obj/ia64.ia64/src/tmp/usr/lib/libgeom.a >> .depend
 ===> sbin/hastctl (depend)
 make: don't know how to make hast_compression.c. Stop
 *** Error code 2

Interesting race. hast_compression.c was added in the same commit that
added it to the hastctl Makefile.



Re: HEADS UP: ZFSv28 is in!

2011-03-01 Thread Pawel Jakub Dawidek
On Mon, Feb 28, 2011 at 08:34:08AM +0100, Martin Sugioarto wrote:
  PS. If you like my work, you help me to promote yomoli.com:)
  
  http://yomoli.com
  http://www.facebook.com/pages/Yomolicom/178311095544155
  
 
 I would like, but you should at least tell me what it is (what will be
 sold there). I don't like to advertise things I don't know or even
 things that seem evil to me.
 
 I'll post your answer to a well-known German *BSD forum, if you want.

Well, I didn't want to say too much about it here, as it isn't really
related to FreeBSD. It is a startup I'm working on: a location-based
chat that allows users to communicate with their neighborhood.



Re: HEADS UP: ZFSv28 is in!

2011-02-28 Thread Pawel Jakub Dawidek
On Sun, Feb 27, 2011 at 04:03:01PM -0700, Shawn Webb wrote:
 I'm so excited for your work. Thanks so much for bringing zpool v28 to
 FreeBSD. Will v28 come to 8-stable?

Yes, hopefully in 1-2 month(s).



Re: HEADS UP: ZFSv28 is in!

2011-02-28 Thread Pawel Jakub Dawidek
On Mon, Feb 28, 2011 at 10:37:25AM +, krad wrote:
 On 28 February 2011 08:47, Pawel Jakub Dawidek p...@freebsd.org wrote:
  On Sun, Feb 27, 2011 at 04:03:01PM -0700, Shawn Webb wrote:
  I'm so excited for your work. Thanks so much for bringing zpool v28 to
  FreeBSD. Will v28 come to 8-stable?
 
  Yes, hopefully in 1-2 month(s).
 
  --
  Pawel Jakub Dawidek                       http://www.wheelsystems.com
  FreeBSD committer                         http://www.FreeBSD.org
  Am I Evil? Yes, I Am!                     http://yomoli.com
 
 
 ive never managed to be able to boot off my 4k aligned pool
 (ashift=12) on stable, does the import to head provide all the patches
 for this or is it a case of using the latest zfs v28 patch set for
 stable? I have no dying need for v28 yet, it just want to be able to
 boot onto the 4k drive and tidy things up.

Support for this is included in what I committed to HEAD. Even HEAD
couldn't boot off of pools with ashift != 9 until now.



HEADS UP: ZFSv28 is in!

2011-02-27 Thread Pawel Jakub Dawidek
Hi.

I just committed ZFSv28 to HEAD.

New major features:

- Data deduplication.
- Triple parity RAIDZ (RAIDZ3).
- zfs diff.
- zpool split.
- Snapshot holds.
- zpool import -F. Allows rewinding a corrupted pool to an earlier
  transaction group.
- Ability to import a pool in read-only mode.

PS. If you like my work, you can help me promote yomoli.com :)

http://yomoli.com
http://www.facebook.com/pages/Yomolicom/178311095544155



Re: [PATCH] OpenSolaris/ZFS: C++ compatibility

2011-02-05 Thread Pawel Jakub Dawidek
On Fri, Feb 04, 2011 at 11:03:53AM -0700, Justin T. Gibbs wrote:
 The attached patch is sufficient to allow a C++ program to use libzfs.
 The motivation for these changes is work I'm doing on a ZFS fault
 handling daemon that is written in C++.  SpectraLogic's intention
 is to return this work to the FreeBSD project once it is a bit more
 complete.
 
 Since these changes modify files that come from OpenSolaris, I want to be
 sure I understand the project's policies regarding divergence from
 the vendor before I check them in.  All of the changes save one should
 be trivial to merge with vendor changes and I will do that work for the
 v28 import.  Is there any reason I should not commit these changes?

Now that OpenSolaris is dead we don't have to be so strict about keeping
the diff against the vendor small at all costs. I'd prefer not to modify
vendor code whenever possible, so it is easier for us to cooperate with
IllumOS (we already took some code from them).

My company and I are also interested in a fault management daemon
(although not restricted to ZFS, but a more general purpose mechanism
like FMA in Solaris). My question would be: is there any chance you could
be convinced to use plain C? With C we might be able to help, but not
with C++.



Re: [PATCH] OpenSolaris/ZFS: C++ compatibility

2011-02-05 Thread Pawel Jakub Dawidek
On Sat, Feb 05, 2011 at 02:36:40PM -0700, Justin T. Gibbs wrote:
 On 2/5/2011 8:39 AM, Pawel Jakub Dawidek wrote:
  On Fri, Feb 04, 2011 at 11:03:53AM -0700, Justin T. Gibbs wrote:
  The attached patch is sufficient to allow a C++ program to use libzfs.
  The motivation for these changes is work I'm doing on a ZFS fault
  handling daemon that is written in C++.  SpectraLogic's intention
  is to return this work to the FreeBSD project once it is a bit more
  complete.
 
  Since these changes modify files that come from OpenSolaris, I want to be
  sure I understand the project's policies regarding divergence from
  the vendor before I check them in.  All of the changes save one should
  be trivial to merge with vendor changes and I will do that work for the
  v28 import.  Is there any reason I should not commit these changes?
  
  Now that OpenSolaris is dead we don't have to be so strict with keeping
  the diff against vendor small at all cost. I'd prefer not to modify
  vendor code whenever possible so it is easier for us to cooperate with
  IllumOS (we already took some code from them).
 
 Perhaps IllumOS will accept these changes back?  As I mentioned in the
 change descriptions included with the patch, the header files already
 show the intention of providing C++ support (extern "C" blocks), they
 just don't quite deliver.  The changes shouldn't be controversial.

Sure. To be clear: I'm not against those changes, I think they are worth
it. And getting IllumOS to accept them back is definitely a good idea.

  Me and my company are also interested in fault management daemon
  (although not restricted to ZFS, but a more general purpose mechanism
  like FMA in Solaris).
 
 We have talked internally about this at Spectra too.  Since we don't have
 BSD licensed nvpair code, we've thought of using Google protocol buffers
 to allow extensible encoding of fault data.  The GP implementation is
 MIT licensed and looks like it might be less cumbersome to use than
 nvpairs.  For the first release of our product, however, we are just
 making do with the string data that devctl provides.

I developed a similar API during the HAST work; maybe it is a good starting
point? See src/sbin/hastd/nv.{c,h}.

  My question would be are there any chances you may
  be convinced to use plain C? With C we might be able to help, but not
  with C++.
 
 The core FMA support needs to be reasonably accessible from C code of
 course (fully functional and not cumbersome to use).  But we should
 allow FMA agents to be coded in whatever language is convenient to the
 developer.  The project may only be able to accept agents in C (and I'm
 voting for C++ too) into its distribution, but that policy should not
 drive us to make the FMA architecture hard to access from shell, python,
 ruby, or some other language.

Yes, agents should not be limited to one language. I wouldn't be
surprised if the majority of agents turn out to be shell scripts.

 The reason I chose C++ for this task is that devd, the source of the
 events I process, already requires C++ so using C++ in zfsd doesn't
 impose any new requirements on the system.  Zfsd, like even the C
 kernel of FreeBSD, is coded in an object oriented fashion, but it's
 much cleaner to implement this type of design in a language that
 inherently supports object oriented concepts.  Could I rewrite all
 that I have in C?  Sure, but there would have to be some compelling
 reasons to offset the reduction in clarity and maintainability such
 a change would cause.

Hmm, so zfsd will receive events from devd? I'm of the opinion that we
should leave devd alone. In my initial port I used devd, because it was
the closest match, but if we want to clean it up, we shouldn't go through
devd. For example, ZFS v28 can report whole binary blocks where the checksum
doesn't match, and passing those through devd would be cumbersome.

 Is your inability to help on a C++ version of this code due to distaste
 for C++ or just a lack of experience with it?

The latter. I'm sure there are many committers that are fluent in C++,
but all of them know C. I was under the impression that Warner implemented
devd in C++ also as a kind of experiment, which nobody really followed.



Re: Replacing a failed disk in raidz2 zfs (and gpt)

2011-02-03 Thread Pawel Jakub Dawidek
On Thu, Feb 03, 2011 at 06:11:34AM +, Philip M. Gollucci wrote:
 All,
 
 I have a zroot(mirror)+zmysql(raidz2) setup on a MySQL db box.
 One drive failed (mfid3).  We've since replaced it.
 
 I can't for the life of me get zpool to replace it. I can't remember why
 I used gpt instead of direct disks for the zmysql pool (but thats how it
 is).  I've tried all of the following commands with different errors,
 and I must say I'm stumped.  I've done this several times before for the
 ASF (but no gpt at play there).
 
 $ zpool scrub zmysql
 just runs, and completes, no error
 
 $ zpool replace zmysql gpt/disk3
 cannot replace gpt/disk3 with gpt/disk3: one or more devices is
 currently unavailable
[...]
 $ zpool offline zmysql gpt/disk3
 cannot offline gpt/disk3: no valid replicas

I'm afraid this is a ZFS bug that is fixed in v28 for sure; I'm not sure
about v14/v15.



Re: Replacing a failed disk in raidz2 zfs (and gpt)

2011-02-03 Thread Pawel Jakub Dawidek
On Thu, Feb 03, 2011 at 07:52:52PM +, Philip M. Gollucci wrote:
 Do you have a bug ID ?

I think it is 6328632. Change 5a60f16123ba. Note, there are many, many
other unrelated changes.

 Do you have any work arounds?

From what I can see, this change is in HEAD already, so I'll try that.

 Will a reboot help ?

No idea, sorry.



Re: Replacing a failed disk in raidz2 zfs (and gpt)

2011-02-03 Thread Pawel Jakub Dawidek
On Thu, Feb 03, 2011 at 08:08:15PM +, Philip M. Gollucci wrote:
 On 02/03/11 20:02, Pawel Jakub Dawidek wrote:
  On Thu, Feb 03, 2011 at 07:52:52PM +, Philip M. Gollucci wrote:
  Do you have a bug ID ?
  
  I think it is 6328632. Change 5a60f16123ba. Note, there are many, many
  other unrelated changes.
  
  Do you have any work arounds?
  
  From what I can see, this change is in HEAD already, so I'll try that.
 Do you have a pointer to how to get the hg repo handy.  There's no diff
 there.

The repo is still online:

ssh://a...@hg.opensolaris.org/hg/onnv/onnv-gate

But if you are thinking about extracting only part of the change
responsible for your problem that might not be easy.



Re: [head tinderbox] failure on ia64/ia64

2011-02-01 Thread Pawel Jakub Dawidek
On Mon, Jan 31, 2011 at 04:56:06PM -0800, Marcel Moolenaar wrote:
 
 On Jan 31, 2011, at 3:51 PM, Pawel Jakub Dawidek wrote:
 
  On Mon, Jan 31, 2011 at 10:56:18PM +, FreeBSD Tinderbox wrote:
  [...]
  cc -O2 -pipe  -I/src/sbin/hastctl/../hastd -DINET -DINET6 -DYY_NO_UNPUT 
  -DYY_NO_INPUT -DHAVE_CRYPTO -std=gnu99 -Wsystem-headers -Werror -Wall 
  -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes 
  -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual 
  -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align 
  -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls 
  -Wold-style-definition -Wno-pointer-sign -c 
  /src/sbin/hastctl/../hastd/proto_common.c
  cc1: warnings being treated as errors
  /src/sbin/hastctl/../hastd/proto_common.c: In function 
  'proto_common_descriptor_send':
  /src/sbin/hastctl/../hastd/proto_common.c:116: warning: cast increases 
  required alignment of target type
  /src/sbin/hastctl/../hastd/proto_common.c: In function 
  'proto_common_descriptor_recv':
  /src/sbin/hastctl/../hastd/proto_common.c:146: warning: cast increases 
  required alignment of target type
  /src/sbin/hastctl/../hastd/proto_common.c:149: warning: cast increases 
  required alignment of target type
  *** Error code 1
  
  Marcel, do you have an idea how one can use CMSG_NXTHDR() on ia64 with
  high WARNS? With WARNS=6 I get those errors and I've no idea how to fix
  it properly. If there is a fix, CMSG_NXTHDR() should probably be fixed,
  but maybe I'm wrong?
 
 this warning indicates that you're casting from a pointer to type P
 (P having alignment constraints Ap) to a pointer to type Q (Q having
 alignment constraints Aq), and Aq > Ap. The compiler tells you that
 you may end up with misaligned accesses.
 
 If you know that the pointer satisfies Aq, you can cast through (void *)
 to silence the compiler. If you cannot guarantee that, you have a bigger
 problem. Solutions include packing type Q to reduce Aq or to copy the
 data to a local variable.
 
 Take the statement at line 116 for example:
   *((int *)CMSG_DATA(cmsg)) = fd;
 
 We're effectively casting from a (char *) to a (int *) and then doing
 a 32-bit access (write). The easy fix (casting through (void *)) is not
 possible, because you cannot guarantee that the address is properly
 aligned. cmsg points to memory set aside by the following local
 variable:
   unsigned char ctrl[CMSG_SPACE(sizeof(fd))];
 
 There's no guarantee that the compiler will align the character array
 at a 32-bit boundary (though in practice it seems to be). I have seen
 this kind of construct fail on ARM and PowerPC for example.
 
 In any case: The safest approach here is to use le32enc or be32enc
 rather than casting through (void *). Obviously these functions encode
 using a fixed byte order when the original code is using the native
 byte order of the CPU. Having native encoding functions would help.
 
 You could use bcopy as well, but the compiler is typically too smart
 for its own good and it will try to optimize the call away. This
 leaves you with the same misaligned access that you tried to avoid
 by using bcopy(). You need to trick the compiler so that it won't
 optimize the bcopy away, like:
   bcopy((void *)&fd, CMSG_DATA(cmsg), sizeof(fd));

Interesting. I did use bcopy() to silence the warning, but the need to
cast to (void *) is surprising.

Still, I'm more concerned with the CMSG_NXTHDR() macro, which from what I
see might not be fixed by casting arguments.
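
For reference, here is a minimal sketch of the copy-based approach
discussed above. It assumes the control buffer is declared as a union so
that it is aligned for struct cmsghdr, and it moves the descriptor with
memcpy() instead of storing through a cast pointer. The names fd_ctrl,
fd_ctrl_pack() and fd_ctrl_unpack() are made up for illustration; this is
not the code in proto_common.c, and it says nothing about CMSG_NXTHDR()
itself.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>

/* The union forces the byte buffer to be aligned for struct cmsghdr. */
union fd_ctrl {
	struct cmsghdr hdr;
	unsigned char buf[CMSG_SPACE(sizeof(int))];
};

/* Fill msg/ctrl with a single SCM_RIGHTS control message carrying fd. */
int
fd_ctrl_pack(union fd_ctrl *ctrl, struct msghdr *msg, int fd)
{
	struct cmsghdr *cmsg;

	memset(ctrl, 0, sizeof(*ctrl));
	memset(msg, 0, sizeof(*msg));
	msg->msg_control = ctrl->buf;
	msg->msg_controllen = sizeof(ctrl->buf);

	cmsg = CMSG_FIRSTHDR(msg);
	if (cmsg == NULL)
		return (-1);
	cmsg->cmsg_level = SOL_SOCKET;
	cmsg->cmsg_type = SCM_RIGHTS;
	cmsg->cmsg_len = CMSG_LEN(sizeof(fd));
	/* Byte-wise copy: no (int *) cast, so no alignment assumption. */
	memcpy(CMSG_DATA(cmsg), &fd, sizeof(fd));
	return (0);
}

/* Read the descriptor back out of the first control message. */
int
fd_ctrl_unpack(struct msghdr *msg)
{
	struct cmsghdr *cmsg;
	int fd;

	cmsg = CMSG_FIRSTHDR(msg);
	if (cmsg == NULL || cmsg->cmsg_level != SOL_SOCKET ||
	    cmsg->cmsg_type != SCM_RIGHTS)
		return (-1);
	memcpy(&fd, CMSG_DATA(cmsg), sizeof(fd));
	return (fd);
}
```

A real sender would still have to fill in msg_iov and call sendmsg(2);
the sketch only shows how the control data can be written and read
without the cast that -Wcast-align complains about.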

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: [head tinderbox] failure on ia64/ia64

2011-01-31 Thread Pawel Jakub Dawidek
On Mon, Jan 31, 2011 at 10:56:18PM +, FreeBSD Tinderbox wrote:
[...]
 cc -O2 -pipe  -I/src/sbin/hastctl/../hastd -DINET -DINET6 -DYY_NO_UNPUT 
 -DYY_NO_INPUT -DHAVE_CRYPTO -std=gnu99 -Wsystem-headers -Werror -Wall 
 -Wno-format-y2k -W -Wno-unused-parameter -Wstrict-prototypes 
 -Wmissing-prototypes -Wpointer-arith -Wreturn-type -Wcast-qual 
 -Wwrite-strings -Wswitch -Wshadow -Wunused-parameter -Wcast-align 
 -Wchar-subscripts -Winline -Wnested-externs -Wredundant-decls 
 -Wold-style-definition -Wno-pointer-sign -c 
 /src/sbin/hastctl/../hastd/proto_common.c
 cc1: warnings being treated as errors
 /src/sbin/hastctl/../hastd/proto_common.c: In function 
 'proto_common_descriptor_send':
 /src/sbin/hastctl/../hastd/proto_common.c:116: warning: cast increases 
 required alignment of target type
 /src/sbin/hastctl/../hastd/proto_common.c: In function 
 'proto_common_descriptor_recv':
 /src/sbin/hastctl/../hastd/proto_common.c:146: warning: cast increases 
 required alignment of target type
 /src/sbin/hastctl/../hastd/proto_common.c:149: warning: cast increases 
 required alignment of target type
 *** Error code 1

Marcel, do you have an idea how one can use CMSG_NXTHDR() on ia64 with
high WARNS? With WARNS=6 I get those errors and I've no idea how to fix
it properly. If there is a fix, CMSG_NXTHDR() should probably be fixed,
but maybe I'm wrong?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: My ZFS v28 Testing Experience

2011-01-13 Thread Pawel Jakub Dawidek
On Wed, Jan 12, 2011 at 11:03:19PM -0400, Chris Forgeron wrote:
 I've been testing out the v28 patch code for a month now, and I've yet to 
 report any real issues other than what is mentioned below. 
 
 I'll detail some of the things I've tested, hopefully the stability of v28 in 
 FreeBSD will convince others to give it a try so the final release of v28 
 will be as solid as possible.
 
 I've been using FreeBSD 9.0-CURRENT as of Dec 12th, and 8.2PRE as of Dec 16th
 
 What's worked well:
 
 - I've made and destroyed small raidz's (3-5 disks), large 26 disk raid-10's, 
 and a large 20 disk raid-50.
 - I've upgraded from v15, zfs 4, no issues on the different arrays noted above
 - I've confirmed that a v15 or v28 pool will import into Solaris 11 Express, 
 and vice versa, with the exception about dual log or cache devices noted 
 below. 
 - I've run many TB of data through the ZFS storage via benchmarks from my 
 VM's connected via NFS, to simple copies inside the same pool, or copies from 
 one pool to another. 
 - I've tested pretty much every compression level, and changing them as I 
 tweak my setup and try to find the best blend.
 - I've added and subtracted many a log and cache device, some in failed 
 states from hot-removals, and the pools always stayed intact.

Thank you very much for all your testing, that's really a valuable
contribution. I'll be happy to work with you on tracking down the
bottleneck in ZFSv28.

 Issues:
 
 - Import of pools with multiple cache or log devices. (May be a very minor 
 point)
 
 A v28 pool created in Solaris 11 Express with 2 or more log devices, or 2 or 
 more cache devices won't import in FreeBSD 9. This also applies to a pool 
 that is created in FreeBSD, is imported in Solaris to have the 2 log devices 
 added there, then exported and attempted to be imported back in FreeBSD. No 
 errors, zpool import just hangs forever. If I reboot into Solaris, import the 
 pool, remove the dual devices, then reboot into FreeBSD, I can then import 
 the pool without issue. A single cache, or log device will import just fine. 
 Unfortunately I deleted my witness-enabled FreeBSD-9 drive, so I can't easily 
 fire it back up to give more debug info. I'm hoping some kind soul will 
 attempt this type of transaction and report more detail to the list.
 
 Note - I just decided to try adding 2 cache devices to a raidz pool in 
 FreeBSD, export, and then importing, all without rebooting. That seems to 
 work. BUT - As soon as you try to reboot FreeBSD with this pool staying 
 active, it hangs on boot. Booting into Solaris, removing the 2 cache devices, 
 then booting back into FreeBSD then works. Something is kept in memory 
 between exporting then importing that allows this to work.  

Unfortunately I'm unable to reproduce this. It works for me with 2 cache
and 2 log vdevs. I tried to reboot, etc. My exact test looks like
this:

# zpool create tank raidz ada0 ada1
# zpool add tank cache ada0 ada1
# zpool export tank
# kldunload zfs
# zpool import tank
works
# reboot
works

 - Speed. (More of an issue, but what do we do?)
 
 Wow, it's much slower than Solaris 11 Express for transactions. I do 
 understand that Solaris will have a slight advantage over any port of ZFS. 
 All of my speed tests are made with a kernel without debug, and yes, these 
 are -CURRENT and -PRE releases, but the speed difference is very large.

Before we go any further could you please confirm that you commented out
this line in sys/modules/zfs/Makefile:

CFLAGS+=-DDEBUG=1

This turns on all kinds of ZFS debugging and slows it down a lot, but it
is invaluable for correctness testing. This will be turned off once we
import ZFS into FreeBSD-CURRENT.

BTW. In my testing Solaris 11 Express is much, much slower than
FreeBSD/ZFSv28. And by much I mean two or more times in some tests.
I was wondering if they have some debug turned on in Express.

 At first, I thought it may be more of an issue with the ix0/Intel X520DA2 
 10Gbe drivers that I'm using, since the bulk of my tests are over NFS (I'm 
 going to use this as a SAN via NFS, so I test in that environment). 
 
 But - I did a raw cp command from one pool to another of several TB. I 
 executed the same command under FreeBSD as I did under Solaris 11 Express. 
 When executed in FreeBSD, the copy took 36 hours. With a fresh destination 
 pool of the same settings/compression/etc under Solaris, the copy took 7.5 
 hours. 

With compression turned off (because it turns all-zero blocks into
holes) you can test it simply with:

# dd if=/dev/zero of=/zfs_fs/zero bs=1m

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2011-01-04 Thread Pawel Jakub Dawidek
On Wed, Dec 15, 2010 at 10:15:40AM +0200, Andrei Kolu wrote:
 2010/12/14 Pawel Jakub Dawidek p...@freebsd.org
 
  On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
   Hi.
  
   The new patchset is ready for testing:
  
         http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
 
  You can also download the whole source tree already patched from here:
 
         http://people.freebsd.org/~pjd/zfs_20101212.tbz
 
 
 # uname -a
 FreeBSD freebsd9.raidon.eu 9.0-CURRENT FreeBSD 9.0-CURRENT #0: Tue Dec
 14 14:37:01 EET 2010
 r...@freebsd9.raidon.eu:/usr/obj/usr/src/sys/GENERIC  amd64
 
 Create files filled with zeroes:
 # mkfile 512m disk1 disk2 disk3 disk4
 # zpool create andmed raidz /home/antik/disk{1,2,3,4}
 # zpool status andmed
   pool: andmed
  state: ONLINE
  scan: none requested
 config:
 
 NAME                   STATE     READ WRITE CKSUM
 andmed                 ONLINE       0     0     0
   raidz1-0             ONLINE       0     0     0
     /home/antik/disk1  ONLINE       0     0     0
     /home/antik/disk2  ONLINE       0     0     0
     /home/antik/disk3  ONLINE       0     0     0
     /home/antik/disk4  ONLINE       0     0     0
 
 errors: No known data errors
 
 Now let's try to scrub:
 # zpool scrub andmed
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 1; apic id = 01
 fault virtual address = 0x1fb8007b
 fault code = supervisor read data, page not present
 instruction pointer = 0x20:0x812967d2
 stack pointer = 0x20:0xff80ee605548
 frame pointer = 0x28:0xff80ee605730
 code segment = base 0x0, limit 0xf, type 0x1b
  = DPL 0, pres1, long 1, def32 0, gran 1
 processor eflags = interrupt enabled, resume, IOPL = 0
 current process = 2081 (initial thread)
 [ thread pid 2081 tid 100121 ]
 Stopped at  vdev_file_open+0x92:  testb  $0x20,0x7b(%rax)

Could you verify if this patch fixes the problem for you?

http://people.freebsd.org/~pjd/patches/vdev_file.c.2.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2010-12-19 Thread Pawel Jakub Dawidek
On Fri, Dec 17, 2010 at 12:54:36AM +0300, Rechistov Grigory (Речистов Григорий) 
wrote:
 I started to check the new ZFS version inside a VirtualBox machine. So far  
 it works for me without crashes, but I got some observations worth  
 mentioning. Here are the steps I made:
 
 1. Installed 8.1-RELEASE (from minimal install  CD)
 2. Csup'ped sources to CURRENT (as of 14/12/2010) [note that I haven't  
 used SVN repository]
 3. Applied the patch in question.
 4. Created a zpool raidz of two disks of old  version 15. Also some usual  
 tuning of ZFS in loader.conf was done as I am running 32 bit version with  
 low amount of memory.  zfs_enable=YES in rc.conf was added too.
 4.1 Moved /usr/ports to ZFS to have some files on it.
 5. Make buildworld, buildkernel, installkernel, installworld - all the  
 canonical steps from the Handbook.
 6. After reboot to final 9.0-CURRENT world I got a dmesg with some trace  
 stack related to ZFS and also a rc.d script message about unrecognized  
 command 'volinit' (see the text of it in attachment).

This one is because mergemaster(8) skips files with the same $FreeBSD$
value, so you need to copy /usr/src/etc/rc.d/zvol to /etc/rc.d/ by hand.

 7. Nevertheless the system booted. Files
 8. `zpool upgrade -a` worked all right and reported that now I have ZFS  
 version 28
 
 Overall I am pleasantly surprised how streamlined the whole process was.

That's good to hear, thanks.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2010-12-15 Thread Pawel Jakub Dawidek
On Wed, Dec 15, 2010 at 10:15:00PM -0500, ben wilber wrote:
 On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
  Hi.
  
  The new patchset is ready for testing:
 
 Running fine for 24 hours now under load with a ~50 disk v15 (not
 upgraded) pool from -CURRENT.  Thanks!
 
 Only strange thing is the rc script complains:
 
 /etc/rc: DEBUG: run_rc_command: doit: zvol_start 
 unrecognized command 'volinit'
 usage: zfs command args ...

Did you run mergemaster(8) after the upgrade? The patch includes a change
to etc/rc.d/zvol that removes 'zfs volinit'/'zfs volfini', which are no
longer available.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2010-12-14 Thread Pawel Jakub Dawidek
On Tue, Dec 14, 2010 at 03:20:05PM +0100, Olivier Smedts wrote:
  make installworld
 
 That's what I wanted to do, and why I rebooted single-user on the new
 kernel. But isn't the v13-v15 userland supposed to work with the v28
 kernel ?

Yes, it is supposed to work, exactly so that one can follow the common
FreeBSD upgrade path. Martin was working on this (CCed).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
Hi.

The new patchset is ready for testing:

http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2

When applying the patch be sure to use correct options for patch(1)!:

# cd /usr/src
# fetch http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
# bzip2 -d zfs_20101212.patch.bz2
# patch -E -p0 < zfs_20101212.patch

The patch is against FreeBSD HEAD as of 2010-12-12.

Some of the changes since the last patchset (zfs_20100831.patch):

- Boot support for ZFS v28 (only RAIDZ3 is not yet supported).
- Various fixes for the existing ZFS boot code.
- Support for sendfile(2) (by avg@).
- Userland-kernel compatibility with v13-v15 (by mm@).
- ACL fixes (by trasz@).
- Various bug fixes.

Please test, test, test. Chances are this is the last patchset before
v28 goes to HEAD (finally). Especially test new changes, like boot
support and sendfile(2) support. Also be sure to verify that you can
import existing ZFS pools (v13-v15) when running v28 and boot from
your existing pools.

Enjoy!

PS. Martin (mm@) will be providing patch against 8-STABLE soon.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
 Hi.
 
 The new patchset is ready for testing:
 
   http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
 
 When applying the patch be sure to use correct options for patch(1)!:
 
   # cd /usr/src
   # fetch http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2
   # bzip2 -d zfs_20101212.patch.bz2
   # patch -E -p0 < zfs_20101212.patch
[...]

If patch(1) reports reject of sys/cddl/compat/opensolaris/sys/sysmacros.h
file or you see the following error while compiling world:

/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:249:
 undefined reference to `MIN'
strtab.o(.text+0x28d): In function `strtab_insert':
/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:119:
 undefined reference to `MIN'
strtab.o(.text+0x3a1):/usr/src/cddl/usr.bin/ctfconvert/../../../cddl/contrib/opensolaris/tools/ctf/cvt/strtab.c:145:
 undefined reference to `MIN'
*** Error code 1

Simply remove the sys/cddl/compat/opensolaris/sys/sysmacros.h file from the tree.

Unfortunately the patch can work either on source downloaded via cvsup or on
source downloaded via subversion, but not both, as those two have different
$FreeBSD$ id strings (at least in the case of this file). The patch is
generated from the subversion source, so if you use cvsup, you will most
likely see the reject and the error.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2010 at 10:45:56PM +0100, Pawel Jakub Dawidek wrote:
 Hi.
 
 The new patchset is ready for testing:
 
   http://people.freebsd.org/~pjd/patches/zfs_20101212.patch.bz2

You can also download the whole source tree already patched from here:

http://people.freebsd.org/~pjd/zfs_20101212.tbz

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: Next ZFSv28 patchset ready for testing.

2010-12-13 Thread Pawel Jakub Dawidek
On Mon, Dec 13, 2010 at 11:00:31PM -, Steven Hartland wrote:
 What's the expected behaviour for the sendfile changes as
 sendfile is one of the problems we have here with the
 double memory allocation required for it under ZFS compared
 to UFS. Does this patch address that?

No. The patch doesn't address that. It only adds support for
sendfile(2), as it was commented out in the previous patchset.

 Inspecting the patch the following segment looks odd:-
 --- sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c.orig
 +++ sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c
 ...
while (n > 0) {
nbytes = MIN(n, zfs_read_chunk_size -
P2PHASE(uio->uio_loffset, zfs_read_chunk_size));
 
 +#ifdef __FreeBSD__
 +   if (uio->uio_segflg == UIO_NOCOPY)
 +   error = mappedread_sf(vp, nbytes, uio);
 +   else
 +#endif /* __FreeBSD__ */
if (vn_has_cached_data(vp))
error = mappedread(vp, nbytes, uio);
else
 
 Is there an extra else in there which will break things or should
 the __FreeBSD__ mappedread_sf block replace the standard mappedread
 call or is the indentation just a bit weird?

The code is correct. It is just hard to split 'else' and 'if' with an
'#endif' and keep the indentation pretty. Depending on the conditions, we
use one of the three methods to read the data.
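
To make the control flow easier to see, here is the same three-way
choice written without the preprocessor lines. This is only a sketch:
the enum and the function pick_read_method() are invented for
illustration, and the third branch is not shown in the quoted hunk, so
it is assumed here to be the ordinary DMU read.

```c
#include <stdbool.h>

/* Hypothetical labels for the three read paths. */
enum read_method {
	READ_MAPPED_SF,	/* sendfile(2) path: uio_segflg == UIO_NOCOPY */
	READ_MAPPED,	/* vnode has cached data: mappedread() */
	READ_DMU	/* assumed fallback: plain read from the DMU */
};

/*
 * Mirrors the if / else if / else chain that the #ifdef block produces
 * on FreeBSD: the NOCOPY check comes first and its 'else' binds to the
 * existing 'if (vn_has_cached_data(vp))'.
 */
enum read_method
pick_read_method(bool nocopy, bool has_cached_data)
{
	if (nocopy)
		return (READ_MAPPED_SF);
	else if (has_cached_data)
		return (READ_MAPPED);
	else
		return (READ_DMU);
}
```

On non-FreeBSD builds the first test simply disappears and the chain
starts at the cached-data check.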

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: taskqueue_create() name parameter lieftime

2010-11-16 Thread Pawel Jakub Dawidek
On Tue, Nov 16, 2010 at 08:27:11AM -0500, John Baldwin wrote:
 On Tuesday, November 16, 2010 7:20:47 am Andriy Gapon wrote:
  
  taskqueue_create() documentation never explicitly says this, but current
  taskqueue_create() implementation just stores a 'name' pointer parameter
  internally.  Thus it depends on the 'name' having a life time encompassing 
  that of
  the taskqueue.
  I think that alternatively we could have copied the name (or a portion of 
  it) into
  an internal buffer.
  I don't have any argument for either approach, just curious which one looks more
  preferable from general (FreeBSD, kernel) programming practices point of 
  view.
 
 Hmm, in many other places we store a separate copy (e.g. all the interrupt
 code uses separate MAXCOMLEN char arrays to hold names).  If that is easy to
 do, that is probably the best approach.

The most friendly API would keep the name internally, but would also
allow me to provide the name in printf-like format, so I don't have to use
sprintf()/snprintf() before calling it. This unfortunately would change
the taskqueue API, as name is the first argument, which makes it not worth
the pain.
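
Just to illustrate what I mean by "keep the name internally" with a
printf-like interface, here is a small userland sketch. Everything in
it (struct named_queue, nq_set_name(), NQ_NAMELEN) is hypothetical and
is not part of the taskqueue(9) API; in the kernel the buffer would
more likely be sized like the MAXCOMLEN arrays John mentioned.

```c
#include <stdarg.h>
#include <stdio.h>

#define NQ_NAMELEN	20	/* hypothetical buffer size */

/* The queue owns a copy of its name, so the caller's string may go away. */
struct named_queue {
	char nq_name[NQ_NAMELEN];
};

/*
 * Format the name straight into the queue's own buffer.
 * Returns what vsnprintf() returns: the length the full name would
 * have had, so a value >= NQ_NAMELEN means it was truncated.
 */
int
nq_set_name(struct named_queue *nq, const char *fmt, ...)
{
	va_list ap;
	int n;

	va_start(ap, fmt);
	n = vsnprintf(nq->nq_name, sizeof(nq->nq_name), fmt, ap);
	va_end(ap);
	return (n);
}
```

With something like this the caller can write
nq_set_name(&q, "zfs_%s_%d", "taskq", 3) and free or reuse its own
strings immediately afterwards.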

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: ZFS v28 is ready for wider testing.

2010-11-03 Thread Pawel Jakub Dawidek
On Wed, Nov 03, 2010 at 07:28:15PM +0100, Olivier Smedts wrote:
         http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2
 
 Hello,
 
 Any status update on this ? I regularly check
 http://people.freebsd.org/~pjd/patches/ to see if there's an updated
 version of your patch. 2 months old is quite a bit for -CURRENT, which
 often receives commits on zfsco parts.
 
 Thanks for all your work on FreeBSD (not only ZFS).

It took a while, but I should have something new shortly. I recently
finished boot support for v28 (the most missing feature in the previous
patch?) and will work on new patch soon. I'm heading to meetBSD
California tomorrow and I'll be back in a week, so nothing will happen
till then for sure.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: letting glabel recognise a media change

2010-10-11 Thread Pawel Jakub Dawidek
On Mon, Oct 11, 2010 at 11:03:26AM -0400, John Baldwin wrote:
 With CD drives you are also rather stuck in that the existing ABI for
 controlling CD drives (e.g. ioctls in 3rd party software to eject a CD) are
 done on the /dev/cdX device.  Ideally enclosures for removable media would
 be separate devices from the removable media itself, but a lot of existing
 software for CD's would break if this changes now.

Right, but I still wonder if we could execute provider orphan and
retaste on various events like media insertion or removal. If media is
removed we orphan the provider and recreate it, which will trigger a
retaste, and this is fine: there will be nothing to read from or write to
(we will simply return errors as we do now, I think). This way we nicely
co-operate with GEOM, but also with other tools that don't require media
to be present (if there is no media, the devfs entry still exists and
handles ioctls, it just returns errors on read requests).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: letting glabel recognise a media change

2010-10-10 Thread Pawel Jakub Dawidek
On Thu, Sep 30, 2010 at 08:46:11PM +0300, Alexander Motin wrote:
 Andriy Gapon wrote:
  on 30/09/2010 01:28 Matthew Jacob said the following:
  If something like that was in place, I assure you that things would start 
  to use
  it very quickly.
  
  I am not sure about this.
  Because, e.g. I don't see an easy way to know that media is changed in 
  scsi_cd
  driver.  That is, without polling.  I don't consider polling to be an easy 
  way for
  a number of reasons.
 
 SATA specification defines concept of Asynchronous Notification. It is
 already used by port multipliers to report about PHY events. It is also
 supposed to be used by CD drives to report media change. I haven't seen
 such devices yet, but hope they may appear sometimes.
 
 And even without AN support it would be nice to implement proper
 handling for SCSI UA - media changed errors within CAM. It still won't
 be perfect without using polling, but probably still something.

I'd like to know the original reason why CD device is represented by
GEOM provider and not CD media. For my naive thinking CD media should be
GEOM provider that we taste once the media is inserted and orphan once
the media is removed. I don't see any reasons for CD device to be useful
GEOM provider, but maybe I'm overlooking something.

Poul-Henning or Soren, do you remember who made this design choice and why?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Recent GELI additions.

2010-09-25 Thread Pawel Jakub Dawidek
Hi.

I'd like to inform about three new features in GELI available in HEAD:

1. AES-XTS encryption. XTS mode is a standard that is recommended these
   days for storage encryption. This is the default now. AES-XTS support
   was also added to opencrypto framework and aesni(4) driver.

2. Multiple encryption keys. GELI will use one encryption key for at
   most 2^20 blocks (sectors), as it is not recommended to use the same
   encryption key for too much data. It generates an array of keys from
   the master key on attach and uses it accordingly. This is the default now.

3. Passphrase can now be loaded from a file (-J and -j options).
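
To give a feeling for point 2, here is a small arithmetic sketch of
how a byte offset could be mapped to a key number when every key covers
2^20 sectors. The names KEY_SHIFT and key_index() are made up for this
example and the real GELI code may compute this differently; with
512-byte sectors one key would cover 2^20 * 512 bytes = 512MB.

```c
#include <stdint.h>

#define KEY_SHIFT	20	/* one key per 2^20 sectors */

/*
 * Hypothetical helper: which entry of the key array would be used
 * for the sector that contains byte_offset.
 */
uint64_t
key_index(uint64_t byte_offset, uint32_t sector_size)
{
	uint64_t sector = byte_offset / sector_size;

	return (sector >> KEY_SHIFT);
}
```

So the first 512MB of a provider with 512-byte sectors would use key 0,
the next 512MB key 1, and so on.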

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: gptboot rewrite, bootonce, etc.

2010-09-20 Thread Pawel Jakub Dawidek
On Mon, Sep 20, 2010 at 09:46:56AM +0100, krad wrote:
 does it work for zfs boot as that would be really nice if it did?

No, it doesn't. ZFS works a bit differently. ZFS operates on pools, not
really on partitions. One ZFS file system can span multiple
disks/partitions. I'm not yet sure how to implement it so that it is
intuitive, but I also haven't spent much time thinking about it. We
needed UFS and that is what I implemented. It took me much more time
than I expected anyway :)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: gptboot rewrite, bootonce, etc.

2010-09-20 Thread Pawel Jakub Dawidek
On Mon, Sep 20, 2010 at 01:17:38AM +0200, Oliver Pinter wrote:
 Hi PJD!
 
 Can you this patcheset release for 7-STABLE?

I've no plans atm to port this work to 7-STABLE. I don't even have 7.x
systems anymore. Not sure how boot code differs, maybe the patch will
apply without modifications? No idea. I'd like to MFC this to 8-STABLE,
though.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: gptboot rewrite, bootonce, etc.

2010-09-20 Thread Pawel Jakub Dawidek
On Sun, Sep 19, 2010 at 09:10:52PM +0400, Boris Samorodov wrote:
 Hi!
 
 On Sat, 18 Sep 2010 01:45:42 +0200 Pawel Jakub Dawidek wrote:
 
  My company was in need for functionality similar to nextboot(8), but on
  boot loader level, so we can have two partitions we boot from where one
  is known to be good and the other is used for upgrades. We upgrade by
  dd(1)ing entire partition image onto unused partition, we mark it as
  try-to-boot-from-it-but-only-once, reboot and if we fail to boot from
  the new partition, we fall back to the old, good partition. If we
  succeed on the other hand, we mark the new partition as our boot
  partition and mark the other one as unused.
 
  Well, how hard can it be?
 
  After around two weeks of work, I ended up rewriting gptboot in large
  parts, reorganizing a lot of code, improving and extending gpart a bit
  and implementing the desired functionality.
 
  Here is the patch for review and test:
 
  http://people.freebsd.org/~pjd/patches/gptboot.patch
 
 Great! Since I need to have both i386 and amd64 at my box
 here are my test results:
 -
 [~]b...@alya% uname -a
 FreeBSD alya 9.0-CURRENT FreeBSD 9.0-CURRENT #1 r212758M: Sat Sep 18 16:13:38 
 MSD 2010
 b...@alya:/space/FreeBSD/base/head/obj/space/FreeBSD/base/head/src/sys/ALYA 
 amd64
 
 [~]b...@alya% glabel status
                                       Name  Status  Components
 gptid/c6053c9b-abcc-11df-b740-00251124aff4     N/A  ad4p1
                              label/9-amd64     N/A  ad4p2
                                 label/swap     N/A  ad4p3
                                label/space     N/A  ad4p4
                               label/9-i386     N/A  ad4p5
 [~]b...@alya% mount
 /dev/label/9-amd64 on / (ufs, local)
 devfs on /dev (devfs, local, multilabel)
 /dev/label/space on /space (ufs, local)
 /dev/md0 on /tmp (ufs, local, nosuid, soft-updates)
 procfs on /proc (procfs, local)
 linprocfs on /compat/linux/proc (linprocfs, local)
 linsysfs on /compat/linux/sys (linsysfs, local)
 fdescfs on /dev/fd (fdescfs)
 
 [~]b...@alya% gpart show
 =>       34  490234685  ad4  GPT  (234G)
          34        128    1  freebsd-boot  (64K)
         162   41943040    2  freebsd-ufs  (20G)
    41943202    8388608    3  freebsd-swap  (4.0G)
    50331810  209715200    4  freebsd-ufs  (100G)
   260047010   41943040    5  freebsd-ufs  (20G)
   301990050  188244669       - free -  (90G)
 
 [~]b...@alya% gpart set -a bootme -i 2 ad4
 bootme set on ad4p2
 [~]b...@alya% gpart set -a bootonce -i 5 ad4
 bootonce set on ad4p5
 [~]b...@alya% gpart show
 =>       34  490234685  ad4  GPT  (234G)
          34        128    1  freebsd-boot  (64K)
         162   41943040    2  freebsd-ufs  [bootme]  (20G)
    41943202    8388608    3  freebsd-swap  (4.0G)
    50331810  209715200    4  freebsd-ufs  (100G)
   260047010   41943040    5  freebsd-ufs  [bootonce,bootme]  (20G)
   301990050  188244669       - free -  (90G)
 -
 
 Install i386 kernel/world to ad4p5, successful reboot, get i386
 system. Next reboot (get amd64 system back):
 -
 [~]b...@alya% gpart show
 =>       34  490234685  ad4  GPT  (234G)
          34        128    1  freebsd-boot  (64K)
         162   41943040    2  freebsd-ufs  [bootme]  (20G)
    41943202    8388608    3  freebsd-swap  (4.0G)
    50331810  209715200    4  freebsd-ufs  (100G)
   260047010   41943040    5  freebsd-ufs  (20G)
   301990050  188244669       - free -  (90G)
 -
 
 All seems to work fine.

Great, thanks for testing!

  Any comments or suggestions?
 
 Only one for now. With current default syslog configuration
 logging to local0.warning and local0.info goes nowhere.
 It will be good if those messages have traces at the
 default system.

Good point. I changed those to local0.notice.
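
For those who want the idea without reading the patch, the selection
can be sketched roughly like this. It is a simplified model written
for this mail, not the code from gptboot.patch: struct part, the ATTR_*
values and pick_boot_part() are all hypothetical, and clearing the
leftover bootonce flag after a successful boot is left out.

```c
#include <stddef.h>

#define ATTR_BOOTME	0x1	/* hypothetical flag values */
#define ATTR_BOOTONCE	0x2

struct part {
	int index;		/* partition index, e.g. 2 for ad4p2 */
	unsigned int attrs;	/* ATTR_* bits */
};

/*
 * Pick the partition to try next.  A partition marked bootonce+bootme
 * wins, and its bootme flag is cleared before it is tried, so that a
 * failed attempt is not repeated on the following boot.  Otherwise the
 * first partition with bootme is used.  Returns NULL if nothing is
 * bootable.
 */
struct part *
pick_boot_part(struct part *parts, size_t n)
{
	const unsigned int once = ATTR_BOOTONCE | ATTR_BOOTME;
	size_t i;

	for (i = 0; i < n; i++) {
		if ((parts[i].attrs & once) == once) {
			parts[i].attrs &= ~ATTR_BOOTME;
			return (&parts[i]);
		}
	}
	for (i = 0; i < n; i++) {
		if (parts[i].attrs & ATTR_BOOTME)
			return (&parts[i]);
	}
	return (NULL);
}
```

With the attributes from the test above (bootme on index 2, bootonce and
bootme on index 5) the first call picks index 5 and the next one falls
back to index 2.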

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


gptboot rewrite, bootonce, etc.

2010-09-17 Thread Pawel Jakub Dawidek
 things will have to wait until I can
sleep at nights again. Well, there is still dedup support that waits to
be implemented in gptzfsboot...

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: ZFS v28 is ready for wider testing.

2010-09-03 Thread Pawel Jakub Dawidek
On Fri, Sep 03, 2010 at 04:50:44PM +0100, Peter Molnar, BSD wrote:
 Hi,
 I would like to try ZFS + VirtualBox but I have got problems:
 
 
 1) Linux 2.6.32-24-generic #42-Ubuntu SMP Fri Aug 20 14:21:58 UTC 2010 
 x86_64 GNU/Linux
 
 I tried import that file in my  VirtualBox but I have got error:
 Failed to import appliance.
 /home/peter/FreeBSD/zfsv28.ovf
 Too many IDE controllers in OVF; import facility only supports one.

Which VirtualBox version do you use? 3.2.8?

Exporting appliances is a bit broken (if you have more than one disk, it
will point all disks at the last one from configuration), so I had to
edit .ovf file manually to fix this. Maybe I messed something up, but I
was able to successfully import it before publishing it.

PS. I waited for so long for decent virtualization software for FreeBSD,
and I must say VirtualBox is really great, and free, and open-source.
Are you reading this, VMWare?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!


Re: ZFS v28 is ready for wider testing.

2010-09-02 Thread Pawel Jakub Dawidek
On Thu, Sep 02, 2010 at 01:55:51AM -0700, Rob Farmer wrote:
 On Tue, Aug 31, 2010 at 2:59 PM, Pawel Jakub Dawidek p...@freebsd.org wrote:
 
  Ok, now that I know you read everything carefully, here is the patch:
 
         http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2
 
 
 buildworld on i386 (yes I know ZFS isn't ideal there):
[...]

Yes, I know about this problem. You can use the attached patch or wait
for the full patch, which I'll be sending later today.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!
--- sys/cddl/compat/opensolaris/sys/atomic.h
+++ sys/cddl/compat/opensolaris/sys/atomic.h
@@ -39,10 +39,9 @@
 #ifndef __LP64__
 extern void atomic_add_64(volatile uint64_t *target, int64_t delta);
 extern void atomic_dec_64(volatile uint64_t *target);
-extern void *atomic_cas_ptr(volatile void *target, void *cmp,  void *newval);
 #endif
 #ifndef __sparc64__
-extern uint64_t atomic_cas_32(volatile uint32_t *target, uint32_t cmp,
+extern uint32_t atomic_cas_32(volatile uint32_t *target, uint32_t cmp,
 uint32_t newval);
 extern uint64_t atomic_cas_64(volatile uint64_t *target, uint64_t cmp,
 uint64_t newval);
@@ -119,21 +118,19 @@
 }
 
 #ifndef COMPAT_32BIT
-#if defined(__LP64__)
+#ifdef __LP64__
 static __inline void *
 atomic_cas_ptr(volatile void *target, void *cmp,  void *newval)
 {
-	return ((void *)atomic_cas_64((volatile uint64_t *)target, (uint64_t)cmp,
-	(uint64_t)newval));
+	return ((void *)atomic_cas_64(target, (uint64_t)cmp, (uint64_t)newval));
 }
 #else
 static __inline void *
 atomic_cas_ptr(volatile void *target, void *cmp,  void *newval)
 {
-	return ((void *)atomic_cas_32((volatile uint64_t *)target, (uint64_t)cmp,
-	(uint64_t)newval));
+	return ((void *)atomic_cas_32(target, (uint32_t)cmp, (uint32_t)newval));
 }
 #endif
-#endif
+#endif	/* !COMPAT_32BIT */
 
 #endif	/* !_OPENSOLARIS_SYS_ATOMIC_H_ */
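
The return-type fix in the patch above (atomic_cas_32() returning uint32_t rather than uint64_t) follows from what a compare-and-swap reports back: the previous value of the target, which has the same width as the target. A minimal userland sketch of that contract, assuming C11 <stdatomic.h> and a hypothetical helper name sketch_cas_32(); this is not the kernel implementation:

```c
#include <stdatomic.h>
#include <stdint.h>

/*
 * Userland illustration only; sketch_cas_32() is a hypothetical name and
 * is not the kernel's atomic_cas_32().
 * Returns the value that was stored in *target before the call. The swap
 * took place if and only if that value equals cmp.
 */
static uint32_t
sketch_cas_32(_Atomic uint32_t *target, uint32_t cmp, uint32_t newval)
{
	uint32_t expected = cmp;

	/* On failure, expected is overwritten with the current value. */
	atomic_compare_exchange_strong(target, &expected, newval);
	return (expected);
}
```

A caller would compare the returned value with cmp to learn whether the store happened.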



Re: ZFS v28 is ready for wider testing.

2010-09-02 Thread Pawel Jakub Dawidek
On Tue, Aug 31, 2010 at 11:59:15PM +0200, Pawel Jakub Dawidek wrote:
[...]
 Ok, now that I know you read everything carefully, here is the patch:
 
   http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Now it is even easier to test new ZFS! :)

Here you can find VirtualBox Appliance (113MB) with
FreeBSD 9-CURRENT and ZFSv28:

http://people.freebsd.org/~pjd/misc/FreeBSD9_ZFSv28_0.1.tgz

Untar it, import it (zfsv28.ovf) to VirtualBox and have fun.

You can log in as root with no password (via virtual console or via SSH).
The system IP address is 192.168.56.66/24.
There are 16 ada(4) disks to play with. For example:

zfsv28:root:~# zpool create tank raidz3 ada{0,1,2,3,4,5,6,7} raidz3 ada{8,9,10,11,12,13,14,15}
zfsv28:root:~# zpool status
  pool: tank
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz3-0  ONLINE       0     0     0
            ada0    ONLINE       0     0     0
            ada1    ONLINE       0     0     0
            ada2    ONLINE       0     0     0
            ada3    ONLINE       0     0     0
            ada4    ONLINE       0     0     0
            ada5    ONLINE       0     0     0
            ada6    ONLINE       0     0     0
            ada7    ONLINE       0     0     0
          raidz3-1  ONLINE       0     0     0
            ada8    ONLINE       0     0     0
            ada9    ONLINE       0     0     0
            ada10   ONLINE       0     0     0
            ada11   ONLINE       0     0     0
            ada12   ONLINE       0     0     0
            ada13   ONLINE       0     0     0
            ada14   ONLINE       0     0     0
            ada15   ONLINE       0     0     0

errors: No known data errors

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ZFS v28 is ready for wider testing.

2010-09-01 Thread Pawel Jakub Dawidek
On Tue, Aug 31, 2010 at 11:59:15PM +0200, Pawel Jakub Dawidek wrote:
 Ok, now that I know you read everything carefully, here is the patch:
 
   http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Important note. Please patch with the following command:

# patch -E -p0 < zfs_20100831.patch

If you don't use the -E option, patch(1) won't remove empty files and you
won't be able to compile it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



ZFS v28 is ready for wider testing.

2010-08-31 Thread Pawel Jakub Dawidek
Hello.

I'd like to give you ZFS v28 for testing. If you are neither brave nor
mad, you can stop here.

The patchset is very experimental. It can eat your cookie and hurt your
teddy bear, so be warned. Don't try it for anything except testing.

This patchset is also a message we, as the FreeBSD project, would like
to send to our users: Even though OpenSolaris is dead, the ZFS file
system is going to stay in FreeBSD. At this point we have quite a few
developers involved in ZFS on FreeBSD as well as several companies.
We are also looking forward to working with IllumOS.

So, what does this new ZFS bring?

- Data deduplication. Read more here:

http://blogs.sun.com/bonwick/entry/zfs_dedup

- Triple parity RAIDZ (RAIDZ3). Read more here:

http://dtrace.org/blogs/ahl/2009/07/21/triple-parity-raid-z/

- zfs diff. Read more here:

http://arc.opensolaris.org/caselog/PSARC/2010/105/20100328_tim.haley

- zpool split. Read more here:

http://arc.opensolaris.org/caselog/PSARC/2009/511/20090924_mark.musante

- Snapshot holds. Read more here:

http://arc.opensolaris.org/caselog/PSARC/2009/297/20090511_chris.kirby

- zpool import -F. Allows rewinding a corrupted pool to an earlier
  transaction group.

- Possibility to import a pool in read-only mode.

And much, much more, including plenty of performance improvements and bug
fixes.

So test whatever you can and report back. Look for regressions, strange
behaviour, missing features, deadlocks, livelocks, performance
degradation, etc.

The boot code is not updated at all, so booting off of ZFS doesn't
currently work.

The patch is against today's FreeBSD HEAD.

The patch enables (in sys/modules/zfs/Makefile) ZFS internal debugging,
please don't turn it off. Also, compile your kernel with the following
options:

options KDB
options DDB
options INVARIANTS
options INVARIANT_SUPPORT
options WITNESS
options WITNESS_SKIPSPIN
options DEBUG_LOCKS
options DEBUG_VFS_LOCKS

Ignore all the LOR (Lock Order Reversal) reports from WITNESS. There will
be plenty of those, and you'll desperately want to report them, but please
don't.

The best way to report a problem is to reply to this e-mail with the
shortest possible procedure for reproducing it, plus debugging info. I'd
prefer a textdump if possible. Below you can find a quick procedure for
setting up textdumps:

Choose spare/swap disk/partition in your system, let's say it is
/dev/ad0s1b.

Add the following line to /etc/fstab:

/dev/ad0s1b	none	swap	sw	0	0

Add the following line to /etc/rc.conf:

ddb_enable="YES"

Run the following commands:

# /etc/rc.d/swap1 start
# /etc/rc.d/dumpon start
# /etc/rc.d/ddb start

This will set up swap, mark it as the dump device and set up some DDB
scripts. Or you can just reboot.

Now when your system panics or deadlocks, enter DDB and run the
following command:

ddb run kdb.enter.panic

It will execute all the commands I need, dump the output in text format to
your swap device and reboot the machine.

After the reboot, you should find a textdump.tar.0 file in the /var/crash/
directory. This is the debug info I need.

End of textdumps procedure.

Ok, now that I know you read everything carefully, here is the patch:

http://people.freebsd.org/~pjd/patches/zfs_20100831.patch.bz2

Good luck! :)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: [CFT] Improved ZFS metaslab code (faster write speed)

2010-08-28 Thread Pawel Jakub Dawidek
On Sat, Aug 28, 2010 at 05:03:42AM -0400, jhell wrote:
 On 08/28/2010 04:20, Andriy Gapon wrote:
  on 28/08/2010 04:24 jhell said the following:
  The modified patch from avg@ (portion patch) is:
 
  #ifdef _KERNEL
  if (arc_reclaim_needed()) {
  needfree = 0;
  wakeup(&needfree);
  }
  #endif
 
 I still moved that down to below _KERNEL for the obvious reasons.  But
  when I was using the original patch with if (needfree) I noticed a
  performance degradation after ~12 hours of use with and without UMA
  turned on. So far with ~48 hours of testing with the top half of that
  being with the above change, I have not seen more degradation of
  
  This is quite unexpected.
  needfree should be checked as the very first thing in arc_reclaim_needed()
  [unless you have patched it locally].  So if needfree is 1 then
  arc_reclaim_needed() should also return 1.  But the converse is not true,
  arc_reclaim_needed() may return 1 even if needfree is zero.
  
  So if your testing results are conclusive then it must mean that some extra
  wakeups on needfree are needed.  I.e. needfree is zero, so there shouldn't be
  anything waiting on it (see arc_lowmem) and no notification should be needed,
  but issuing somehow does make difference,
  Hmm...
  
 
 I will look further into this and see if I can throw a counter around it
 or some printf's so I can at least log what its doing in both instances.
 
 I thought the very same thing you said above when I saw your patch for
 that and was astounded at the results that were returned from it. So in
 short testing I reverted it back quickly to see if that was the cause of
 the problem and sure enough everything resumed to the way it was before.
 
 Anyway thanks for the reply. I will get back to you if I see anything
 cool arise from this.
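
The one-way relationship Andriy describes in the quoted text (needfree set implies arc_reclaim_needed() returns 1, but not the other way round) can be pictured with a toy model. This is only a sketch: the struct, the function name and the other_pressure flag are hypothetical stand-ins, and the real arc_reclaim_needed() performs further checks that are not reproduced here.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the ARC state; not the real arc.c globals. */
struct sketch_arc_state {
	int	needfree;	/* nonzero while a low-memory request is pending */
	bool	other_pressure;	/* assumed placeholder for the later checks */
};

/* Mirrors only the ordering described above: needfree is tested first. */
static bool
sketch_reclaim_needed(const struct sketch_arc_state *s)
{
	if (s->needfree)
		return (true);
	return (s->other_pressure);
}
```

With needfree equal to zero the function can still return true, which is why replacing "if (needfree)" with "if (arc_reclaim_needed())" can issue wakeups in cases the original test would have skipped.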

Could you include the following patch in your testing:

http://people.freebsd.org/~pjd/patches/arc.c.9.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Mounting cd9660 multiple times gives EBUSY [Was: unionfs a little improvement]

2010-08-22 Thread Pawel Jakub Dawidek
On Wed, Aug 18, 2010 at 12:48:53PM +0200, Ed Schouten wrote:
 Hi Daichi,
 
 I think Keith Packard of Xorg once wrote a commit message along the
 lines of "5000 lines of code removed, feature added". This seems to be
 similar, albeit on a smaller scale. ;-)
 
 Apart from this issue with unionfs, I am also experiencing another
 issue, where for some reason I cannot perform a second mount of the CD
 right after booting the system. Basically, my WIP FreeBSD boot CD does
 the following (but written in C):
 
   mount -t cd9660 /dev/iso9660/freebsd /mnt
   mount -t tmpfs none /tmp
   mount -t unionfs /tmp /mnt
   mount -t devfs none /mnt/dev
   chroot /mnt /sbin/init
 
 The first step fails with EBUSY. I use the following hack to get it
 working, but I don't think it's the proper way to solve it:

So what you are trying to do here is mount /dev/iso9660/freebsd a
second time? This is not supported. The check is there to prevent this,
as it will panic on you when you try to unmount the first mount (not
really a problem in your case, as the first mount is /, so you probably
don't want to unmount it, but it is a problem in general).

You should be able to reproduce the panic with your patch applied by
doing the following:

# mount -t cd9660 /dev/iso9660/freebsd /mnt0
# mount -t cd9660 /dev/iso9660/freebsd /mnt1
# umount /mnt0

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 02:02:17PM +0200, Ivan Voras wrote:
 On 8.8.2010 12:30, Pawel Jakub Dawidek wrote:
  So why do you want to obfuscate glabel with it? For people to start
  depending on it? Once we start supporting 4kB sectors what do we do with
  such a change? Remove it and decrease the version number? What will people
  do with providers already labeled this way?
  
  If it's temporary, just allow listing the providers whose sector size you
  want to increase in /boot/loader.conf. Once we start supporting it properly
  people can simply remove it from loader.conf and it should just work.
  
  Glabel is not for that and I don't agree to such obfuscation.
 
 Of course, there are good and bad sides to it. My take on it is that the
 only bad side is that it really isn't glabel's primary function to
 (optionally) fixup geometry, while the good sides are:

It isn't its secondary function either.

 * glabel is in GENERIC and judging by the mailing lists' traffic it is
 one of the better used parts of the system so people are familiar with
 it. It is also already used as a perfectly valid fixup for device
 renaming, making both UFS and ZFS more stable for usage.

That's an excellent argument. But you know what? em(4) is also in
GENERIC, so why not add it there?

 * You can't really make people depend on glabel both because it is in
 GENERIC and because of it storing metadata in the last sector, making
 the rest of the drive completely usable without it in the event native
 4k sector support is grown.

I never said that. I do want people to depend on glabel, because it is
free of such ugly hacks, so I know it won't bite them in the future.

I don't want people to start depending on the fact that glabel supports
changing sector sizes.

Once we start supporting 4kB sectors properly, people's configurations will
stop working, because glabel won't be able to read its metadata anymore.
Your hack will break all configurations that started to depend on it.
With what I proposed, the GEOM provider will be presented to glabel (or
any other GEOM class) as a 4kB provider and everything will just work,
also after adding proper support for 4kB sectors.

 I'd like to hear comments from the wider audience. In respect with your
 comment, I will compromise: as 4k sector drives have become available
 over the counter more than 6 months ago and so far I think this is the
 first effort to give some support for them, I will commit this patch
 before 9.0 code freeze only if no other support gets developed.

I'll repeat. You won't commit this patch, because it is a totally wrong
solution and can only do a lot of damage in the future.
If you look forward, even temporary solutions can be done right.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: glabel force sectorsize patch

2010-08-08 Thread Pawel Jakub Dawidek
On Sun, Aug 08, 2010 at 02:57:20PM +0200, Marius Nünnerich wrote:
 On Sun, Aug 8, 2010 at 14:02, Ivan Voras ivo...@freebsd.org wrote:
  I'd like to hear comments from the wider audience. In respect with your
  comment, I will compromise: as 4k sector drives have become available
  over the counter more than 6 months ago and so far I think this is the
  first effort to give some support for them, I will commit this patch
  before 9.0 code freeze only if no other support gets developed.
 
 I do not like this at all. Even if it's just for the KISS and POLA
 principles. A geom should do one thing and do it right imo.
 Why not write a new geom class that does what you want?

A new GEOM class only for sectorsize conversion that can operate on
metadata would be useful, and not only to solve this particular problem.
Keep in mind, though, that if at some point disks are detected and
presented to GEOM as 4kB providers, this class won't be able to find
its metadata anymore (as it was stored in the last 512 bytes, not in the
last 4 kilobytes).
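
The arithmetic behind that warning can be sketched as follows, assuming (as stated earlier in the thread for glabel) that the class keeps its metadata in the provider's last sector. The function name and the 1 GiB media size are made up for illustration:

```c
#include <stdint.h>

/*
 * Byte offset of a provider's last sector, which is where a class that
 * stores its metadata "in the last sector" would look for it.
 * Hypothetical helper, not a GEOM function.
 */
static uint64_t
sketch_last_sector_offset(uint64_t mediasize, uint32_t sectorsize)
{
	return (mediasize - sectorsize);
}
```

For a 1 GiB provider (1073741824 bytes) the metadata written with 512-byte sectors sits at offset 1073741312, while a class tasting the same disk as a 4kB provider reads from offset 1073737728, so it sees a 4096-byte block whose last 512 bytes hold the old metadata rather than a sector that starts with it.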

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: AESNI driver and fpu_kern KPI

2010-05-18 Thread Pawel Jakub Dawidek
On Sat, May 15, 2010 at 01:04:01PM +0300, Kostik Belousov wrote:
 Hello,
 
 please find at http://people.freebsd.org/~kib/misc/aesni.1.patch the
 combined patch, containing the fpu_kern KPI and Intel AESNI crypto(9)
 driver.  I did development and some testing on the hardware generously
 provided by Sentex Communications to Netperf cluster.

Nice work. A few comments:

- Could you modify this chunk in padlock.c:

+	td = curthread;
+	error = fpu_kern_enter(td, &ses->ses_fpu_ctx);
+	if (error != 0)
+		goto out;
 	error = padlock_hash_setup(ses, macini);
+	fpu_kern_leave(td, &ses->ses_fpu_ctx);
+out:

  To something without goto, e.g.:

	td = curthread;
	error = fpu_kern_enter(td, &ses->ses_fpu_ctx);
	if (error == 0) {
		error = padlock_hash_setup(ses, macini);
		fpu_kern_leave(td, &ses->ses_fpu_ctx);
	}

- I see that in sys/dev/random/nehemiah.c you don't check the return
  value of fpu_kern_enter(). That's the only place where you ignore it.
  Is that intended?

- Unfortunately the driver in its current version can't be used with
  IPsec or with GELI where authentication is enabled. This is because
  the driver doesn't support sessions where both encryption and
  authentication are defined. Do you have plans to change it?
  I saw that you based the crypto(9) bits on padlock, which does support
  sessions with authentication by calculating hashes in software.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Switchover to CAM ATA?

2010-04-26 Thread Pawel Jakub Dawidek
On Mon, Apr 26, 2010 at 10:33:27AM -0600, M. Warner Losh wrote:
 I've read most of this thread.  I think this is cool technology.
 However, before we move forward with this, we need to have a plan for
 the various issues that have come up.  The plan needs to be specific,
 have owners for key items, warnings about ownerless == obsoleted, and
 target dates.
 
 I think this is one of the cases where we should record the plan of
 record on a wiki.  It worked well for other times we've had big,
 disruptive changes.
 
 My opinion for the path forward:
 (1) Send a big heads up about the future of ataraid(5).  It will be
 shot in the head soon, to be replaced by a bunch of geom classes
 for each different container format.  At least that seems to be
 the rough consensus I've seen so far.  We need worker bees to do
 many of these classes, although much can be mined from the ataraid
 code today.

This shouldn't be a bunch of GEOM classes. This should be one class which
recognizes multiple formats, just like the LABEL class.
I don't think it is feasible to reuse gmirror for that; it wasn't
designed with something like this in mind.

 (2) Send another big heads up strongly recommending people go to
 glabel based fstabs.  Maybe the right option here is to provide a
 simple script to walk people through the conversion.  This will
 render the carnage of ad -> ada (or da) a mostly non-event, and
 also protect people from the 'oops' of rebooting with that thumb drive
 in the system.
 (3) Create a wiki to record all the new geom classes needed.  Find
 people to own each one, or note it is unowned, and support will be
 dropped if no owner can be found.
 (4) sysinstall should default to creating label systems, if it doesn't
 already.
 (5) Issues with glabel and ataraid(5) need an owner, and need to be
 resolved, since the device names here are likely to change.

What are the issues?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Switchover to CAM ATA?

2010-04-26 Thread Pawel Jakub Dawidek
On Mon, Apr 26, 2010 at 12:19:46PM -0600, M. Warner Losh wrote:
 In message: 20100426181209.gb3...@garage.freebsd.pl
 Pawel Jakub Dawidek p...@freebsd.org writes:
 : On Mon, Apr 26, 2010 at 10:33:27AM -0600, M. Warner Losh wrote:
 :  I've read most of this thread.  I think this is cool technology.
 :  However, before we move forward with this, we need to have a plan for
 :  the various issues that have come up.  The plan needs to be specific,
 :  have owners for key items, warnings about ownerless == obsoleted, and
 :  target dates.
 :  
 :  I think this is one of the cases where we should record the plan of
 :  record on a wiki.  It worked well for other times we've had big,
 :  disruptive changes.
 :  
 :  My opinion for the path forward:
 :  (1) Send a big heads up about the future of ataraid(5).  It will be
 :  shot in the head soon, to be replaced by a bunch of geom classes
 :  for each different container format.  At least that seems to be
 :  the rough consensus I've seen so far.  We need worker bees to do
 :  many of these classes, although much can be mined from the ataraid
 :  code today.
 : 
 : This shouldn't be a bunch of GEOM classes. This should be one class which
 : recognizes multiple formats, just like the LABEL class.
 : I don't think it is feasible to reuse gmirror for that; it wasn't
 : designed with something like this in mind.
 
 OK.  Maybe I got the consensus wrong...  My key point is that we need
 a plan moving forward, we need to identify what's actively being
 worked on vs somebody else[tm] should do this and when it needs to
 be done or else.

You most likely got it right, I'm just saying that creating a separate GEOM
class for each metadata format is the wrong direction. :)

 :  (5) Issues with glabel and ataraid(5) need an owner, and need to be
 :  resolved, since the device names here are likely to change.
 : 
 : What are the issues?
 
 ataraid doesn't remove the underlying ad* devices, so glabel often
 picks those up instead of the ataraid device, and you only get 1
 disk's worth of raid device...  So no mirroring or only 1/2 a striped
 volume.

It not only leaves the ad* devices, it doesn't even open them properly using
GEOM. It's an internal ATA hack, which is a PITA.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ZFS behavior when device disappears

2010-04-20 Thread Pawel Jakub Dawidek
On Tue, Apr 13, 2010 at 05:39:30PM -0600, Jason J. W. Williams wrote:
 Hello,
 
 Currently, we're an OpenSolaris shop but with the way things are going
 over at Oracle/Sun we're starting to evaluate our options for keeping
 ZFS but moving off Solaris. One of my concerns is that FreeBSD is
 implementing ZFSv14 (ZFS itself is up to v23 I believe). For quite a
 long time, ZFS under Solaris had a real problem with the following
 scenario:
 
 * Hard drive starts to die
 * Controller and SCSI subsystem continue to retry an I/O rather than
 failing fast
 * Even if the I/O does fail fast ZFS doesn't really notice a spike in
 I/O failures and continues to use the drive.
 * Result: I/O on the zpool stalls completely while the I/Os continue
 to be tried against the drive.
 
 This got fixed in later revs of OpenSolaris by enhancements to ZFS and
 greater integration with the Fault Management Architecture (FMA) of
 Solaris...lots of I/Os failing on a drive get communicated to ZFS who
 then offlines the drive out of the pool.
 
 My question is, what is the situation in FreeBSD 8 with ZFS if that
 type of situation occurs?

I believe FreeBSD does whatever OpenSolaris did for this version of ZFS.
There is ongoing work to bring v24 to FreeBSD. Basic functionality works
already, but a lot of work is still needed. At some point I'll see what we
can do about it, because we don't have FMA in FreeBSD and we would need
to find another way to deal with it. I have limited time I can spend on
ZFS right now, so I'm making small steps, but I'm making good progress
too.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ZFS behavior when device disappears

2010-04-20 Thread Pawel Jakub Dawidek
On Tue, Apr 20, 2010 at 07:24:53AM -0600, Jason J. W. Williams wrote:
 Hi Pawel,
 
 Thank you very much for the response! Please forgive some of my
 questions, as I'm a bit unfamiliar with the FreeBSD port.
 
 What is the nature of the port? Is it something where each new version
 of ZFS is a from-scratch effort to some degree? Or is it a point where
 new ZFS versions are a matter of just making the newer features
 operational?

Definitely the latter, but there are some problems:

- Some changes in OpenSolaris ZFS are very hard to port in a short time,
  and when it takes a lot of time, new versions arrive and it is nice to
  get them too, etc., which makes the whole process take a long time.

  A good example here is moving some functionality to Python, where we
  have to decide what to do about that without importing Python into the
  base system.

- OpenSolaris ZFS is experimental and I don't think the Solaris version is
  published anywhere. This means it needs extensive testing on our side,
  which of course takes time.

- OpenSolaris changes are often not easy to understand. They have
  different commit rules than we have. Commit logs are not very helpful
  and multiple fixes are committed in one go, which makes it hard to
  separate individual changes if we just need a fix and not the intrusive
  change that came along with it.

I'm doing my best, but my time is limited. I see more and more people
are interested in helping with ZFS, which is a very good sign that I had
been waiting for for a long time :)

It is of course still wonderful that we can use ZFS. All my servers and
my laptop are running exclusively on ZFS at this point :)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: Increasing MAXPHYS

2010-03-22 Thread Pawel Jakub Dawidek
On Mon, Mar 22, 2010 at 08:23:43AM +, Poul-Henning Kamp wrote:
 In message 4ba633a0.2090...@icyb.net.ua, Andriy Gapon writes:
 on 21/03/2010 16:05 Alexander Motin said the following:
  Ivan Voras wrote:
  Hmm, it looks like it could be easy to spawn more g_* threads (and,
  barring specific class behaviour, it has a fair chance of working out of
  the box) but the incoming queue will need to also be broken up for
  greater effect.
  
  According to notes, looks there is a good chance to obtain races, as
  some places expect only one up and one down thread.
 
 I haven't given any deep thought to this issue, but I remember us discussing
 them over beer :-)
 
 The easiest way to obtain more parallelism, is to divide the mesh into
 multiple independent meshes.
 
 This will do you no good if you have five disks in a RAID-5 config, but
 if you have two disks each mounted on its own filesystem, you can run
 a g_up & g_down for each of them.

A class is supposed to interact with other classes only via GEOM, so I
think it should be safe to choose g_up/g_down threads for each class
individually, for example:

/dev/ad0s1a (DEV)
   |
g_up_0 + g_down_0
   |
 ad0s1a (BSD)
   |
g_up_1 + g_down_1
   |
 ad0s1 (MBR)
   |
g_up_2 + g_down_2
   |
 ad0 (DISK)

We could easily calculate the g_down thread based on bio_to->geom->class and
the g_up thread based on bio_from->geom->class, so we know I/O requests for
our class are always coming from the same threads.

If we could make the same assumption for geoms it would allow for even
better distribution.
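
A rough sketch of that per-class selection, using stripped-down stand-ins for the GEOM structures. All sketch_* names, the pool size of four threads and the pointer-hash are assumptions made for illustration; nothing here is existing GEOM code:

```c
#include <stdint.h>

#define	SKETCH_NTHREADS	4	/* assumed size of the g_up/g_down pools */

/* Minimal stand-ins for the GEOM structures; only the links used here. */
struct sketch_class { const char *name; };
struct sketch_geom { struct sketch_class *class; };
struct sketch_provider { struct sketch_geom *geom; };
struct sketch_consumer { struct sketch_geom *geom; };
struct sketch_bio {
	struct sketch_consumer *bio_from;
	struct sketch_provider *bio_to;
};

/* Map a class to a thread index; the same class always gets the same index. */
static unsigned
sketch_class_to_thread(const struct sketch_class *cp)
{
	return ((unsigned)(((uintptr_t)cp >> 4) % SKETCH_NTHREADS));
}

/* Requests going down are keyed on the class that will receive them. */
static unsigned
sketch_down_thread(const struct sketch_bio *bp)
{
	return (sketch_class_to_thread(bp->bio_to->geom->class));
}

/* Completions going up are keyed on the class that issued the request. */
static unsigned
sketch_up_thread(const struct sketch_bio *bp)
{
	return (sketch_class_to_thread(bp->bio_from->geom->class));
}
```

Because the index depends only on the class pointer, every request handled by a given class lands on the same thread, which is the property the paragraph above relies on. Keying on the geom instead of the class would spread the load further, at the cost of the stronger assumption mentioned there.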

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: check for jailed environment for adjkerntz

2010-03-01 Thread Pawel Jakub Dawidek
On Mon, Mar 01, 2010 at 02:15:41AM +0300, Subbsd wrote:
 jail with complete type have standard crontab a file of tasks. However not
 all standard task are adapted for work in jail an environment. For example
 adjkerntz which generates
 
 adjkerntz [46733]: sysctl (set: machdep.wall_cmos_clock): Operation not
 permitted
 
 I suggest to give adjkerntz concept about jail in which to it it is not
 necessary to work:
[...]

I also always found that annoying, but only your e-mail made me
think about ways to fix it. Maybe a simple patch like the one
below will do?

--- etc/crontab (revision 204363)
+++ etc/crontab (working copy)
@@ -22,4 +22,4 @@
 #
 # Adjust the time zone if the CMOS clock keeps local time, as opposed to
 # UTC time.  See adjkerntz(8) for details.
-1,31	0-5	*	*	*	root	adjkerntz -a
+1,31	0-5	*	*	*	root	[ `sysctl -n security.jail.jailed` -eq 0 ] && adjkerntz -a

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



HAST (Highly Available Storage) now in HEAD.

2010-02-19 Thread Pawel Jakub Dawidek
Hi.

Yesterday I committed HAST to the HEAD branch.

HAST allows data to be stored transparently on two physically separated
machines connected over a TCP/IP network. HAST works in a
Primary-Secondary (Master-Backup, Master-Slave) configuration, which
means that only one of the cluster nodes can be active at any given
time. Only the Primary node is able to handle I/O requests to HAST-managed
devices. Currently HAST is limited to two cluster nodes in total.

HAST operates on the block level - it provides disk-like devices in the
/dev/hast/ directory for use by file systems and/or applications.
Working on the block level makes it transparent for file systems and
applications. There is no difference between using a HAST-provided device
and a raw disk, partition, etc. All of them are just regular GEOM
providers in FreeBSD.

For more information please consult hastd(8), hastctl(8) and
hast.conf(5) manual pages, as well as:

http://wiki.FreeBSD.org/HAST

On the wiki page above you should find instructions on how to initialize
HAST and integrate it with ucarp.

Let me know (using the freebsd...@freebsd.org mailing list) if you have any
questions or comments.

And last, but not least, I'd like to thank the sponsors who made this
project possible:

The FreeBSD Foundation, http://www.freebsdfoundation.org
OMCnet Internet Service GmbH, http://www.omc.net
TransIP BV, http://www.transip.nl

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: ZFS: statfs and recordsize problem

2010-02-19 Thread Pawel Jakub Dawidek
On Thu, Feb 18, 2010 at 03:39:28PM +0300, Alexander Zagrebin wrote:
 I have noticed, that statfs called for ZFS file systems,
 returns the value of FS's recordsize property in both f_bsize and
 f_iosize.
 
 It's a problem for some software.
 For example, squid uses block size of cache's file system to calculate
 the space occupied by file.
 So by default it considers that any small file uses 128KB of a cache
 (when default value of recordsize is used), though really this file
 may use 512B only.
 This miscalculation leads to unreasonable cleaning of a cache.
 
 IMHO the behavior of statfs have to be changed, as ZFS uses variable
 (up to recordsize) block sizes.
 It must return 512 as f_bsize and recordsize as f_iosize.
 One of possible solutions is the attached patch.
 Could somebody look it?

I committed (a slightly modified version of) your patch to HEAD.
Thanks!
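
To make the miscalculation from the report concrete, here is a small sketch of block-rounded space accounting. It assumes an application that estimates a file's disk usage by rounding its size up to whole blocks of the reported f_bsize; the function name is hypothetical and this is not squid's actual code:

```c
#include <stdint.h>

/*
 * Estimate the space a file occupies by rounding its size up to a
 * multiple of the file system block size (as reported in f_bsize).
 * Hypothetical helper for illustration only.
 */
static uint64_t
sketch_space_used(uint64_t filesize, uint64_t blocksize)
{
	return ((filesize + blocksize - 1) / blocksize * blocksize);
}
```

With the default 128KB recordsize reported as f_bsize, a 300-byte object is charged 131072 bytes; with f_bsize reported as 512 it is charged 512 bytes, which matches what the report says the file may really use.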

-- 
Pawel Jakub Dawidek   http://www.wheel.pl
p...@freebsd.org   http://www.FreeBSD.org
FreeBSD committer Am I Evil? Yes, I Am!



Re: jail and emulators/linux_base

2003-12-03 Thread Pawel Jakub Dawidek
On Wed, Dec 03, 2003 at 10:22:16AM +0100, Niklas Saers Mailinglistaccount wrote:
+ I'm running CURRENT and set up a jail where I want to install SUN JDK
+ 1.4.2. In the process, linux emulation needs to be installed. While
+ installing emulators/linux_base, I get the following:
+ 
+ === Installing for linux_base-7.1_5
+ Un-mounting linprocfs...
+ umount: retrying using path instead of file system ID
+ ===  Generating temporary packing list
+ === Checking if emulators/linux_base already installed
+ mknod: /compat/linux/dev/null: Operation not permitted
+ *** Error code 1
+ 
+ While Linux-emulation is already up and running on the host-machine, it
+ seems the jail is not allowed to create what it needs to run it. I
+ understand allowing mknod(8) within a jail is dangerous in the case where
+ you allow untrusted users to be root. Is there some way to either say I
+ don't let untrusted users be root thus allowing this or to compile
+ emulators/linux_base more jail-friendly, possibly setting things up from
+ outside the jail?

Erm. You may install it using chroot(8) only and then run jail with the
same path. You may also use chroot(8) instead of jail if you're looking
for full functionality.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net



Panic: if_simloop: attempted use of a free mbuf!

2003-11-28 Thread Pawel Jakub Dawidek
Hello.

I'm hitting an assertion at /sys/net/if_loop.c:270.

This is very easy to reproduce:

First you need to put loopback into promiscuous mode:

# tcpdump -i lo0

Then try to connect to loopback, for example:

# telnet 127.0.0.1 22

Enjoy!:)

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: panic: sleeping without a mutex (acd related)

2003-11-25 Thread Pawel Jakub Dawidek
On Tue, Nov 25, 2003 at 11:21:03AM +0100, Christian Laursen wrote:
+ I have been experiencing some random lockups after upgrading from
+ 5.1-RELEASE to 5.2-BETA.
+ 
+ I then went on and enabled all the debug options in my kernel config
+ hoping to be able to find the cause.
+ 
+ But now I cannot boot at all. In the end of the boot process when
+ detecting ATA drives, I get this:
+ 
+ ad0: 76319MB ST380011A [155061/16/63] at ata0-master UDMA100  
+ acd0-5: CDROM with 6 CD changer CD-C68E at ata1-master PIO4   
+ acd6: DVDROM CREATIVEDVD5240E-1 at ata1-slave PIO4
+ panic: sleeping without a mutex 
+ Debugger(panic)   
+ Stopped at  Debugger+0x54:  xchgl   %ebx,in_Debugger.0  
+ db>
+ db> trace
+ Debugger(c06e3744,c07549a0,c06e3ec9,d861ab60,100) at Debugger+0x54  
+ panic(c06e3ec9,0,c06e3eb8,c06d6584,10) at panic+0xd5
+ msleep(c45173d8,0,4c,c06d6584,0) at msleep+0x505
+ acd_geom_access(c452de00,1,0,0,0) at acd_geom_access+0x115  

Yeah. There are two calls to tsleep(9) without a timeout set
(at lines 499 and 509), so this KASSERT is reached:

	KASSERT(timo != 0 || mtx_owned(&Giant) || mtx != NULL,
	    ("sleeping without a mutex"));
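
The condition in that KASSERT can be read as a plain predicate. A minimal
userland sketch of the logic only; sleep_allowed() and giant_owned are
hypothetical names for illustration, not kernel interfaces:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Userland model of the assertion's condition (hypothetical helper, not a
 * kernel API): a sleep is acceptable if it has a timeout, or the caller
 * owns Giant, or an interlock mutex is passed in.
 */
static bool
sleep_allowed(int timo, bool giant_owned, const void *mtx)
{
	assert(timo >= 0);	/* this model only uses non-negative timeouts */
	return (timo != 0 || giant_owned || mtx != NULL);
}
```

With no timeout, no Giant and no mutex, sleep_allowed(0, false, NULL) is
false, which corresponds to the case that trips the panic quoted above.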

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Panic after mount() fail.

2003-11-17 Thread Pawel Jakub Dawidek
Hello.

There is a problem with mount(2) failures. It can cause panics.

How-to-repeat.

# dd if=/dev/random of=/test.img bs=1m count=8
# mdconfig -a -t vnode -f /test.img -u 25
# mkdir -p /mnt/test
# mount /dev/md25 /mnt/test
(fail)
# mount /dev/md25 /mnt/test
(panic Memory modified after free ...)

This is because the mutex is not destroyed on failure.

Patch:

--- vfs_mount.c.orig	Sun Nov 16 15:46:56 2003
+++ vfs_mount.c	Sun Nov 16 15:21:48 2003
@@ -1061,6 +1061,7 @@ update:
 		vfs_unbusy(mp, td);
 	else {
 		mp->mnt_vfc->vfc_refcount--;
+		mtx_destroy(&mp->mnt_mtx);
 		vfs_unbusy(mp, td);
 #ifdef MAC
 		mac_destroy_mount(mp);
@@ -1142,6 +1143,7 @@ update:
 		vp->v_iflag &= ~VI_MOUNT;
 		VI_UNLOCK(vp);
 		mp->mnt_vfc->vfc_refcount--;
+		mtx_destroy(&mp->mnt_mtx);
 		vfs_unbusy(mp, td);
 #ifdef MAC
 		mac_destroy_mount(mp);
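
The same rule (destroy the lock before the memory holding it is released on
an error path) can be sketched in userland with pthreads. This is only an
illustration under assumptions: struct fake_mount and both functions are
hypothetical, and pthread mutexes stand in for mtx(9):

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mount structure that embeds a lock. */
struct fake_mount {
	pthread_mutex_t	fm_mtx;
	int		fm_ready;
};

/*
 * Allocate and initialize a fake_mount.  If "fail" is non-zero the
 * filesystem-specific step is treated as failed; the error path then
 * destroys the mutex before freeing the memory that contains it.
 */
static int
fake_mount_create(int fail, struct fake_mount **out)
{
	struct fake_mount *fm;

	assert(out != NULL);
	*out = NULL;
	fm = calloc(1, sizeof(*fm));
	if (fm == NULL)
		return (ENOMEM);
	if (pthread_mutex_init(&fm->fm_mtx, NULL) != 0) {
		free(fm);
		return (EAGAIN);
	}
	if (fail) {
		/* The step the patch adds: tear the lock down first. */
		pthread_mutex_destroy(&fm->fm_mtx);
		free(fm);
		return (EINVAL);
	}
	fm->fm_ready = 1;
	*out = fm;
	return (0);
}

/* Normal teardown mirrors the error path: destroy, then free. */
static void
fake_mount_destroy(struct fake_mount *fm)
{
	pthread_mutex_destroy(&fm->fm_mtx);
	free(fm);
}
```

Calling fake_mount_create(1, &fm) twice in a row models the two failing
mount attempts from the how-to-repeat steps; each call cleans up fully.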

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


LOR (ffs_snapshot.c:651 vm_map.c:2258).

2003-11-05 Thread Pawel Jakub Dawidek
Hello.

lock order reversal
 1st 0xc66a6db0 vnode interlock (vnode interlock) @ /usr/src/sys/ufs/ffs/ffs_snapshot.c:651
 2nd 0xc0c2f110 system map (system map) @ /usr/src/sys/vm/vm_map.c:2258
Stack backtrace:
backtrace(c05bbfcb,c0c2f110,c05c650b,c05c650b,c05c6581) at backtrace+0x17
witness_lock(c0c2f110,8,c05c6581,8d2,c0c2f0b0) at witness_lock+0x686
_mtx_lock_flags(c0c2f110,0,c05c6581,8d2,c6aee000) at _mtx_lock_flags+0xb5
_vm_map_lock(c0c2f0b0,c05c6581,8d2,c69e61b0,0) at _vm_map_lock+0x36
vm_map_remove(c0c2f0b0,c6aee000,c6af,e1b1a7f0,c0555f99) at vm_map_remove+0x30
kmem_free(c0c2f0b0,c6aee000,2000,e1b1a80c,c05579f9) at kmem_free+0x32
page_free(c6aee000,2000,22,c060c4b8,c05e9100) at page_free+0x3a
uma_large_free(c69e61b0,e1b1a83c,c0487f64,c66a6db0,2000) at uma_large_free+0xf9
free(c6aee000,c05e9100,c05c3358,28b,c25aff00) at free+0xe9
ffs_snapshot(c6522600,80c39a0,70,c04b5d36,c060d3e0) at ffs_snapshot+0x23f4
ffs_mount(c6522600,c69c4380,bfbffcc0,e1b1abf0,c6496720) at ffs_mount+0x617
vfs_mount(c6496720,c258ecd0,c69c4380,1211000,bfbffcc0) at vfs_mount+0x7d1
mount(c6496720,e1b1ad14,c05cd44e,3ee,4) at mount+0xba
syscall(2f,2f,2f,0,bfbffdc0) at syscall+0x28f
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (21), eip = 0x80557bb, esp = 0xbfbffb6c, ebp = 0xbfbffd48 ---

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Panic (in_pcb.c:866).

2003-11-02 Thread Pawel Jakub Dawidek
Hello.

I got this panic while doing 'killall -9 ppp' on FreeBSD 5-CURRENT (kernel
from October 31st):

panic: mtx_lock() of spin mutex @ /usr/src/sys/netinet/in_pcb.c:866
[...]
db> trace
[...]
Debugger [...]
panic [...]
_mtx_lock_flags [...]
[...] in_losing+0x40
[...] tcp_timer_rexmt+0x23e
[...] softclock+0x1ad
[...] ithread_loop+0x177
[...] fork_exit+0xb5
[...] fork_trampoline+0x8
[...]

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Panic (route.c:99).

2003-11-02 Thread Pawel Jakub Dawidek
Hello.

Kernel from October 31st, while doing 'killall ssh'.

panic: mtx_lock() of spin mutex @ /usr/src/sys/net/route.c:99
db> trace
Debugger [...]
panic [...]
_mtx_lock_flags [...]
[...] rtalloc_ign+0x4b
[...] rtalloc+0x19
[...] tcp_rtlookup+0x39
[...] tcp_gettaocache+0x11
[...] tcp_output+0x161
[...] tcp_usr_shutdown+0xb2
[...] soshutdown+0x42
[...] shutdown+0x6c
[...] syscall+0x28f
[...] Xint0x80_syscall+0x1d

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


LOR (rtsock.c:388 route.c:133).

2003-11-02 Thread Pawel Jakub Dawidek
Hello.

A similar one was reported, but not exactly this one:

lock order reversal
 1st 0xc4516e90 rtentry (rtentry) @ /usr/src/sys/net/rtsock.c:388
 2nd 0xc43e327c radix node head (radix node head) @ /usr/src/sys/net/route.c:133

Stack backtrace:
backtrace(c05c6672,c43e327c,c05cb651,c05cb651,c05cb6a7) at backtrace+0x17
witness_lock(c43e327c,8,c05cb6a7,85,c437c300) at witness_lock+0x686
_mtx_lock_flags(c43e327c,0,c05cb6a7,85,c4516c90) at _mtx_lock_flags+0xb4
rtalloc1(c4539a6c,1,1,435,0) at rtalloc1+0x74
rt_setgate(c4516e00,c437c300,c4539a6c,184,0) at rt_setgate+0x23c
route_output(c1926700,c44f8dd0,8c,c1926700,1f74) at route_output+0x674
raw_usend(c44f8dd0,0,c1926700,0,0) at raw_usend+0x76
rts_send(c44f8dd0,0,c1926700,0,0) at rts_send+0x35
sosend(c44f8dd0,0,e8916c80,c1926700,0) at sosend+0x429
soo_write(c4497374,e8916c80,c452d480,0,c446b4c0) at soo_write+0x92
dofilewrite(c446b4c0,c4497374,2,bfbfeab0,8c) at dofilewrite+0xe3
write(c446b4c0,e8916d14,c05d6e5b,3f0,3) at write+0x6f
syscall(2f,2f,2f,2,3) at syscall+0x28f
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (4), eip = 0x2826e173, esp = 0xbfbfe89c, ebp = 0xbfbfe8c8 ---

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: HEADSUP: MPSAFE network drivers

2003-10-30 Thread Pawel Jakub Dawidek
On Wed, Oct 29, 2003 at 11:52:48AM -0700, Sam Leffler wrote:
+ I'm committing changes to mark various network drivers' interrupt handlers 
+ MPSAFE. To insure folks have a way to backout if they hit problems I've also 
+ added a tunable that lets you disable this w/o rebuilding your kernel.  By 
+ default all network drivers that register an interrupt handler INTR_MPSAFE 
+ are setup to run their ISR w/o Giant.  If you want to defeat this w/o 
+ changing the code you can set
+ 
+ debug.mpsafenet=0
+ 
+ from the loader when booting and the MPSAFE bit will automatically be removed.  
+ I plan to use this to also control forthcoming changes for registering MPSAFE 
+ netisrs.
+ 
+ The following drivers are marked MPSAFE:
+ 
+ ath, em, ep, fxp, sn, wi, sis
+ 
+ I've got changes coming for bge.  Other drivers probably can be marked MPSAFE 
+ but I'm only doing it for those drivers that I can test.

Because there are so many drivers, maybe you could prepare some regression
tests designed to exercise the changed code. This would allow people to test
your changes - it is not very easy now, as we don't know exactly what we're
looking for, PLUS those drivers aren't marked MPSAFE by default.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


LOR (swap_pager.c:1134 vm_kern.c:328).

2003-10-24 Thread Pawel Jakub Dawidek
Hello.

Was this reported already?

 1st 0xc0c1ede0 vm object (vm object) @ /usr/src/sys/vm/swap_pager.c:1134
 2nd 0xc0c2f110 system map (system map) @ /usr/src/sys/vm/vm_kern.c:328
Stack backtrace:
backtrace(c05cce28,c0c2f110,c05d72b0,c05d72b0,c05d7147) at backtrace+0x17
witness_lock(c0c2f110,8,c05d7147,148,c0c2f0b0) at witness_lock+0x686
_mtx_lock_flags(c0c2f110,0,c05d7147,148,101) at _mtx_lock_flags+0xbb
_vm_map_lock(c0c2f0b0,c05d7147,148,c061a748,c061a770) at _vm_map_lock+0x36
kmem_malloc(c0c2f0b0,1000,101,c46b78bc,c056aead) at kmem_malloc+0x3a
page_alloc(c0c3a3c0,1000,c46b78af,101,0) at page_alloc+0x27
slab_zalloc(c0c3a3c0,1,c0c3a3d4,8,c05d8ac1) at slab_zalloc+0xb3
uma_zone_slab(c0c3a3c0,1,c05d8ac1,68c,0) at uma_zone_slab+0xda
uma_zalloc_internal(c0c3a3c0,0,1,0,c0c206b0) at uma_zalloc_internal+0x3e
bucket_alloc(80,1,c05d8ac1,70b,0) at bucket_alloc+0x5e
uma_zfree_arg(c0c20600,c472ebdc,0,7b6,8000) at uma_zfree_arg+0x299
swp_pager_meta_ctl(c0c1ede0,1f,0,2,c46b7a9c) at swp_pager_meta_ctl+0x10d
swap_pager_unswapped(c0cbfb28,1,c05c7357,bd,c46b7a14) at swap_pager_unswapped+0x2a
vm_fault(c0d415e8,bfbff000,2,8,c154d390) at vm_fault+0x1186
trap_pfault(c46b7b0c,0,bfbffcc8,c063cee0,bfbffcc8) at trap_pfault+0x119
trap(18,10,10,bfbffcc8,c46b7bac) at trap+0x2f7
calltrap() at calltrap+0x5
--- trap 0xc, eip = 0xc059d82c, esp = 0xc46b7b4c, ebp = 0xc46b7cb8 ---
slow_copyout(c154d390,5,bfbffcc8,bfbffc48,0) at slow_copyout+0x4
select(c154d390,c46b7d14,c05dd181,3ed,5) at select+0x67
syscall(2f,2f,2f,bfbffcc8,1) at syscall+0x28f
Xint0x80_syscall() at Xint0x80_syscall+0x1d
--- syscall (93), eip = 0x280bbad3, esp = 0xbfbffbfc, ebp = 0xbfbffda0 ---

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


AES is broken.

2003-10-17 Thread Pawel Jakub Dawidek
Hello.

After recent changes to AES, GBDE is broken.

How to repeat:

# mdconfig -a -t malloc -s 16M
# gbde init /dev/md0 -L /etc/md0.lock
# gbde attach md0 -l /etc/md0.lock
# newfs -O2 /dev/md0.bde || echo BROKEN

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: GEOM_BDE

2003-10-15 Thread Pawel Jakub Dawidek
On Tue, Oct 14, 2003 at 09:49:03PM +0200, Jacek Serwatynski wrote:
+ I have a problem compiling my kernel. I wanted to play with gbde, so I
+ added options GEOM_BDE. I ran cvsup at Tue Oct 14 20:43:17 2003 CEST.
+ My kernel config:
[...]

You have to add 'device random' to your kernel config.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: GEOM_BDE

2003-10-15 Thread Pawel Jakub Dawidek
On Wed, Oct 15, 2003 at 09:56:57AM +0200, Poul-Henning Kamp wrote:
+ I have a problem compiling my kernel. I wanted to play with gbde, so I
+ added options GEOM_BDE. I ran cvsup at Tue Oct 14 20:43:17 2003 CEST.
+ My kernel config:
+ 
+ /usr/src/sys/geom/bde/g_bde.h:180: undefined reference to `rijndael_cipherInit'
+ /usr/src/sys/geom/bde/g_bde.h:207: undefined reference to `rijndael_blockDecrypt'
+ 
+ I had the same problem until I added 'device random' to the kernel config file.
+ 
+ Yes, the recent commits to the rijndael code must have messed up something 

No, this was always a problem. There was never any way to use BDE when
'device random' is not compiled into the kernel but loaded as a kernel module.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


LOR (route.c:182 route.c:133).

2003-10-15 Thread Pawel Jakub Dawidek
Hello.

Already reported?

lock order reversal
 1st 0xc47b6490 rtentry (rtentry) @ /usr/src/sys/net/route.c:182
 2nd 0xc44be77c radix node head (radix node head) @ /usr/src/sys/net/route.c:133

Stack backtrace:
backtrace(c05b43a7,c44be77c,c05b934c,c05b934c,c05b93a2) at backtrace+0x17
witness_lock(c44be77c,8,c05b93a2,85,c4358540) at witness_lock+0x686
_mtx_lock_flags(c44be77c,0,c05b93a2,85,246) at _mtx_lock_flags+0xb4
rtalloc1(c05dcadc,1,1,3d7,d762bb44) at rtalloc1+0x74
rt_setgate(c47b6400,c4358540,c05dcadc,c0600868,c4425000) at rt_setgate+0x264
rtredirect(c05dcacc,c05dcadc,0,6,c05dcaec) at rtredirect+0x1ad
icmp_input(c192a000,14,c04b3a4a,c0600610,c0600868) at icmp_input+0x500
ip_input(c192a000,0,c05b91a8,89,0) at ip_input+0x922
netisr_processqueue(c0624a90,0,c05b91a8,e5,c190df00) at netisr_processqueue+0x8a
swi_net(0,0,c05af115,215,c1916974) at swi_net+0x90
ithread_loop(c1914d80,d762bd48,c05aef87,314,c1914d80) at ithread_loop+0x177
fork_exit(c047ce45,c1914d80,d762bd48) at fork_exit+0xc2
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xd762bd7c, ebp = 0 ---

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


LOR (tcp_input.c:654 tcp_usrreq.c:621).

2003-10-14 Thread Pawel Jakub Dawidek
Hello.

I'm not sure if this was reported already.

lock order reversal
 1st 0xc51046ec inp (inp) @ /usr/src/sys/netinet/tcp_input.c:654
 2nd 0xc0642cac tcp (tcp) @ /usr/src/sys/netinet/tcp_usrreq.c:621
Stack backtrace:
backtrace(c05d0e2c,c0642cac,c05d63bc,c05d63bc,c05d76ab) at backtrace+0x17
witness_lock(c0642cac,8,c05d76ab,26d,74) at witness_lock+0x671
_mtx_lock_flags(c0642cac,0,c05d76ab,26d,74) at _mtx_lock_flags+0xba
tcp_usr_rcvd(c5574400,80,c05d1437,db70ca84,3b9aca00) at tcp_usr_rcvd+0x30
soreceive(c5574400,db70cac0,db70cacc,db70cac4,0) at soreceive+0x7ff
nfsrv_rcv(c5574400,c7a79480,4,c5105de8,18) at nfsrv_rcv+0x87
sowakeup(c5574400,c557444c,c05d6dc0,446,108) at sowakeup+0x89
tcp_input(c1bfc800,14,c06428d4,c05f066c,db70cc48) at tcp_input+0xed1
ip_input(c1bfc800,0,c05d5bfc,89,0) at ip_input+0x81f
netisr_processqueue(c0641350,0,c05d5bfc,e5,c1bc8100) at netisr_processqueue+0x8e
swi_net(0,0,c05cb58d,215,c1bda974) at swi_net+0x8c
ithread_loop(c1bd8d80,db70cd48,c05cb3ff,314,c1bd8d80) at ithread_loop+0x172
fork_exit(c047d9e0,c1bd8d80,db70cd48) at fork_exit+0xc0
fork_trampoline() at fork_trampoline+0x8
--- trap 0x1, eip = 0, esp = 0xdb70cd7c, ebp = 0 ---

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


GEOM Gate.

2003-10-14 Thread Pawel Jakub Dawidek
Hello hackers...

Ok, GEOM Gate is ready for testing.
Those who don't know what it is can read the README:

http://garage.freebsd.pl/geom_gate.README

and the presentation from the WIP/BSDCon03 session:

http://garage.freebsd.pl/GEOM_Gate.pdf

After compilation (cd geom_gate; make; make install) you should run
the regression tests:

# regression/runtests.sh

If everything goes OK, you can play with GEOM Gate and report any bugs.

I've spent some time making GEOM Gate force-remove-safe, so using the
'-f' option with ggc(8) should always be safe.

Ah! Four manual pages have been added, so feel free to read them first
(gg(4), geom_gate(4), ggc(8), ggd(8)):

http://garage.freebsd.pl/geom_gate.tbz

Enjoy!

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: need some debugging help

2003-09-01 Thread Pawel Jakub Dawidek
On Fri, Aug 29, 2003 at 10:03:57PM -0600, Kenneth D. Merry wrote:
+ I've been working on a set of patches to remove the sysctl variable creation
+ from interrupt context in the cd(4) and da(4) drivers.
+ 
+ To fix the problem, I've created a new taskqueue that runs in a thread
+ context, instead of inside a software interrupt like the current task
+ queues.  (The eventual fix will involve moving the CAM probe inside a
+ thread; this will provide a more temporary solution that will hopefully
+ also work on -stable, until we can change the CAM probe code.)
+ 
+ I think I have everything setup correctly, but I keep getting panics inside
+ the GEOM code with these patches.  (Memory modified after free.)  I don't
+ know whether I've just exposed some race condition, or whether I've done
+ something wrong.
+ 
+ I've seen several different panics, all with the same root cause (memory
+ modified after free), and with two different previous memory pools -- geom
+ and devbuf.

I was getting the same panics while I was working on GEOM Gate.
After many hours of debugging I tracked this down - I had initialized
a mutex, but I hadn't destroyed it.

I suspect you're loading cd(4) as a kld module?

It seems that you're making exactly the same mistake:

	mtx_init(&kthread_mutex, "taskqueue kthread", NULL, MTX_DEF);

And where is mtx_destroy()?

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: need some debugging help

2003-09-01 Thread Pawel Jakub Dawidek
On Mon, Sep 01, 2003 at 02:13:45AM +0200, Pawel Jakub Dawidek wrote:
+ I was getting the same panics while I was working on GEOM Gate.
+ After many hours of debugging I tracked this down - I had initialized
+ a mutex, but I hadn't destroyed it.
+ 
+ I suspect you're loading cd(4) as a kld module?

No, you don't need to load it as a kld module, because you initialize
this mutex on every function call (and the mutex is locally allocated, too),
so try putting mtx_destroy() at the end of this function; this should help.
(I hope there is no problem with calling msleep(9) with a mutex on the stack.)
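
A rough userland illustration of that advice, assuming pthreads in place of
mtx(9); run_once() is a hypothetical function, not the code from the patch
under discussion:

```c
#include <assert.h>
#include <pthread.h>

/*
 * Hypothetical worker: the mutex lives on the stack and is initialized
 * on every call, so it is also destroyed before the function returns.
 */
static int
run_once(int *counter)
{
	pthread_mutex_t lock;
	int error;

	assert(counter != NULL);
	error = pthread_mutex_init(&lock, NULL);
	if (error != 0)
		return (error);

	pthread_mutex_lock(&lock);
	(*counter)++;
	pthread_mutex_unlock(&lock);

	/* Without this, the lock's state would outlive the stack frame. */
	return (pthread_mutex_destroy(&lock));
}
```

Each call pairs one init with one destroy, so repeated calls never leave a
stale lock behind in memory that is about to be reused.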

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: need some debugging help

2003-09-01 Thread Pawel Jakub Dawidek
On Mon, Sep 01, 2003 at 12:48:41AM -0600, Kenneth D. Merry wrote:
+  - I tried just holding a mutex all the time, but obviously you can't
+malloc while holding a mutex (except Giant), and the sysctl code does a
+number of mallocs.  (The original cause of this problem -- M_WAITOK
+mallocs.)

Some time ago I proposed changing M_WAITOK to M_NOWAIT, because the
functions/macros responsible for sysctl creation can already fail for other
reasons, so I don't see any reason why they couldn't also fail because of
insufficient memory. The caller is obliged to check the return value...
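
A small userland sketch of that contract, under assumptions: node_create()
and alloc_never() are hypothetical, and a failing allocator stands in for a
malloc(9) call with M_NOWAIT that returns NULL:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Allocator signature, so a failing allocator can be substituted. */
typedef void *(*alloc_fn)(size_t);

/* Hypothetical allocator that always fails, as a no-wait request may. */
static void *
alloc_never(size_t size)
{
	(void)size;
	return (NULL);
}

/*
 * Create an integer "node" without ever blocking for memory.  A failed
 * allocation is reported to the caller, who must check the return value.
 */
static int
node_create(alloc_fn alloc, int value, int **out)
{
	int *node;

	assert(alloc != NULL && out != NULL);
	node = alloc(sizeof(*node));
	if (node == NULL)
		return (ENOMEM);
	*node = value;
	*out = node;
	return (0);
}
```

Because the function never sleeps waiting for memory, a caller could invoke
it while holding a lock, at the price of handling ENOMEM itself.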

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: Lot's of SIGILL, SIGSEGV

2003-08-18 Thread Pawel Jakub Dawidek
On Sun, Aug 17, 2003 at 08:00:54PM -0700, David O'Brien wrote:
+  This is a FAQ. In the future, please search the archives before posting.
+  
+  At this moment in time, 'p4' isn't a safe CPUTYPE (It produces broken
+  code). 'p3' or 'i686' are what's recommended for Pentium 4s.
+ 
+ Andre, I think you are out of date -- CPUTYPE=p4 is now safe with GCC
+ 3.3.1.

I think he is right, because when upgrading a host that has gcc 3.2 to
current -CURRENT (with gcc 3.3), 'make world' builds make(1) first,
and it is built by gcc 3.2 with CPUTYPE=p4, so it will be broken.

So gcc has to be upgraded first (with CPUTYPE=p3).

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: fuword(), suword(), etc.

2003-07-24 Thread Pawel Jakub Dawidek
On Wed, Jul 23, 2003 at 02:48:41PM -0700, Julian Elischer wrote:
+ I'd like to have a suptr and fuptr to be able to save and read
+ user pointers in a machine independent manner..
+ at the moment I need to know the size of a pointer and select the
+ appropriate 32 or 64 version.. It would just be another ENTRY files in 
+ support.[sS] alongside the appropriate sized entry
+ for each architecture so it wouldn't 'cost' anything..
+ 
+ for i386 it would be an alternate name for fuword32() and suword32()
+ I'm not sure what it would be on other architectures
+ 
+ comments?

Yes, good idea. For now I'm using something like this:

static __inline void *
fuptr(void *uaddr)
{
	void *ptr;

	if (copyin(uaddr, &ptr, sizeof(void *)) != 0)
		return ((void *)-1);

	return (ptr);
}

For numbers it is always better to use copyin(9)/copyout(9). Functions
like fubyte(9), etc. make no sense to me: -1 is returned on error, but also
when the stored value really is -1, so one cannot tell whether there was an
error or not.
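
The ambiguity can be shown with a userland model. This is only a sketch:
fetch_inband() and fetch_checked() are hypothetical names, and a NULL
pointer stands in for an invalid user address:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * fuword()-style interface (userland model): the value and the error
 * share one return channel, so a stored -1 looks like a failure.
 */
static long
fetch_inband(const long *addr)
{
	if (addr == NULL)
		return (-1);
	return (*addr);
}

/*
 * copyin()-style interface (userland model): the return value carries
 * only the error, and the data goes through a separate pointer.
 */
static int
fetch_checked(const long *addr, long *out)
{
	assert(out != NULL);
	if (addr == NULL)
		return (EFAULT);
	*out = *addr;
	return (0);
}
```

With a stored value of -1, fetch_inband() returns the same thing for the
valid and the invalid address, while fetch_checked() tells them apart.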

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


File system deadlock. GBDE(4) and/or MD(4) related.

2003-07-24 Thread Pawel Jakub Dawidek
Hello.

I've found deadlock in gbde(4) and/or md(4).

Here is a complete procedure how to repeat it:

# touch /mnt/test.file
# mdconfig -a -t vnode -f /mnt/test.file -s 512M -u 1
# mkdir /etc/gbde
# gbde init /dev/md1 -L /etc/gbde/md1
Enter new passphrase:
Reenter new passphrase:
Wrote key 0 at 25444352
# gbde attach md1 -l /etc/gbde/md1
Enter passphrase:
# newfs -U -O2 /dev/md1.bde
/dev/md1.bde: 496.5MB (1016768 sectors) block size 16384, fragment size 2048
using 4 cylinder groups of 124.12MB, 7944 blks, 15936 inodes.
with soft updates
super-block backups (for fsck -b #) at:
 160, 254368, 508576, 762784
# mkdir /mnt/test
# mount /dev/md1.bde /mnt/test
# cp -R /usr/src /mnt/test 
[ wait about 10 seconds ]
# ls /mnt/te[TAB]
or
# sync;sync

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: File system deadlock. GBDE(4) and/or MD(4) related.

2003-07-24 Thread Pawel Jakub Dawidek
On Thu, Jul 24, 2003 at 03:57:07PM +0200, Pawel Jakub Dawidek wrote:
+ I've found deadlock in gbde(4) and/or md(4).

Yes, it is gbde's fault:

db> show lockedvnods
[...]
0xc3332920: tag ufs, type VREG, usecount 1, writecount 0, refcount 21, flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc2ca5000
	ino 8214, on dev md0.bde (4, 23)
0xc33325b4: tag ufs, type VREG, usecount 2, writecount 1, refcount 25, flags (VV_OBJBUF), lock type ufs: EXCL (count 1) by thread 0xc2ec3980
	ino 8217, on dev md0.bde (4, 23)
[...]

Look at refcounts.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: File system deadlock. GBDE(4) and/or MD(4) related.

2003-07-24 Thread Pawel Jakub Dawidek
On Thu, Jul 24, 2003 at 05:03:23PM +0200, Poul-Henning Kamp wrote:
+ # touch /mnt/test.file
+ 
+ You are probably missing:
+ 
+  dd if=/dev/null of=/mnt/test.file bs=1m count=512

You mean /dev/zero? But this doesn't change anything.

+ # mdconfig -a -t vnode -f /mnt/test.file -s 512M -u 1
+ 
+ What you have found has nothing to do with GBDE, I think it is the
+ usual vnode backed md(4) deadlock.

Hmm? So you're trying to tell me that this is somehow normal behaviour?

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: File system deadlock. GBDE(4) and/or MD(4) related.

2003-07-24 Thread Pawel Jakub Dawidek
On Thu, Jul 24, 2003 at 08:38:02PM +0200, Poul-Henning Kamp wrote:
+ + What you have found has nothing to do with GBDE, I think it is the
+ + usual vnode backed md(4) deadlock.
+ 
+ Hmm? So you're trying to tell that this is somehow normal behaviour?
+ 
+ We've had problems like this before with vnode backed MD(4) devices
+ (and vn(4) devices before that).
+ 
+ One way or another:  It is _not_ a GBDE problem.

Hey, Poul! I'm not trying to show that gbde(4) is buggy software,
I'm not trying to destroy your work, your image or FreeBSD, really.

I believe that this isn't a bug in gbde(4); my fault, sorry.

But one thing I know is that there is a bug somewhere, and I just want to
help track it down.

This information could be useful:

When I mounted the file system on /private (not on /mnt/private), the
problem went away. So maybe the deadlock is caused by some directory
locking or something? Because if the file system is mounted on /mnt/private,
the deadlock is 100% reproducible.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: possible unionfs bug

2003-07-20 Thread Pawel Jakub Dawidek
On Sun, Jul 20, 2003 at 03:02:21PM +0200, Divacky Roman wrote:
+ I might be wrong but this:
+ 
+ free(mp->mnt_data, M_UNIONFSMNT);   /* XXX */
+  mp->mnt_data = 0;
+  
+ seems to me wrong and might cause crashes etc.
+ am I correct or wrong?

Could you describe a scenario in which this could be dangerous? Or why do
you think it is?

This memory is allocated while mounting a unionfs file system,
so it is quite natural to free it while unmounting the file system.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: Screen saver: bsd_saver.

2003-06-27 Thread Pawel Jakub Dawidek
On Fri, Jun 27, 2003 at 09:20:03AM +, Bosko Milekic wrote:
+  Hello there.
+  
+  I've wrote screen saver for FreeBSD 5.x with rotating bsd logo.
+  
+http://garage.freebsd.pl/bsd_saver.tbz
+  
+  Any chance to add it to tree?
+  
+  I don't know whether it works or not, but this contains 
+  floating point instruction, which is hardly used and needs careful 
+  treatment. (As far as I know, FP instruction is used only on
+  i586_bcopy) What do you think about it?
+ 
+   FWIW, I've tested this yesterday and wanted to commit it but
+   shamefully I must admit that I don't know how to properly prepare a
+   port.  The screen saver works and is pretty neat although I had to
+   build in low video mode.

Andrew Kenneth Milton [EMAIL PROTECTED] suggested that I automatically
fall back to low video mode if high mode cannot be enabled, and that I
automatically load vesa.ko if required.
I think those suggestions are good and I'll implement them.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: Screen saver: bsd_saver.

2003-06-27 Thread Pawel Jakub Dawidek
On Fri, Jun 27, 2003 at 03:35:19PM +0200, Pawel Jakub Dawidek wrote:
+ +   FWIW, I've tested this yesterday and wanted to commit it but
+ +   shamefully I must admit that I don't know how to properly prepare a
+ +   port.  The screen saver works and is pretty neat although I had to
+ +   build in low video mode.
+ 
+ Andrew Kenneth Milton [EMAIL PROTECTED] suggested that I automatically
+ fall back to low video mode if high mode cannot be enabled, and that I
+ automatically load vesa.ko if required.
+ I think those suggestions are good and I'll implement them.

Ok. Done.

http://garage.freebsd.pl/bsd_saver.tbz

I'm not able to add a dependency on the vesa module without this patch:

http://garage.freebsd.pl/vesa.patch

So for now it will try to run at 1024x768, then at 800x600, and finally
at 320x200. If vesa is loaded it should run at 1024x768, and if not, at
320x200.
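
The fallback order can be sketched as a simple loop over a preference
table. This is not the code from bsd_saver; pick_mode() and the probe
callbacks are hypothetical, and the probe stands in for whatever call
actually tries to switch the video mode:

```c
#include <assert.h>
#include <stddef.h>

struct video_mode {
	int width;
	int height;
};

/* Preference order described above: best mode first. */
static const struct video_mode preferred_modes[] = {
	{ 1024, 768 },
	{ 800, 600 },
	{ 320, 200 },
};

/* Caller-supplied probe; returns non-zero if the mode can be set. */
typedef int (*mode_probe_fn)(const struct video_mode *);

/* Return the first mode the probe accepts, or NULL if none works. */
static const struct video_mode *
pick_mode(mode_probe_fn probe)
{
	size_t i, n;

	assert(probe != NULL);
	n = sizeof(preferred_modes) / sizeof(preferred_modes[0]);
	for (i = 0; i < n; i++) {
		if (probe(&preferred_modes[i]))
			return (&preferred_modes[i]);
	}
	return (NULL);
}

/* Example probe: pretend only the plain 320x200 mode is available. */
static int
probe_vga_only(const struct video_mode *m)
{
	return (m->width == 320 && m->height == 200);
}

/* Example probe: pretend every mode is available (vesa loaded). */
static int
probe_all(const struct video_mode *m)
{
	(void)m;
	return (1);
}
```

With probe_vga_only the loop ends up at 320x200, and with probe_all it
stops at the first entry, 1024x768.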

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Screen saver: bsd_saver.

2003-06-26 Thread Pawel Jakub Dawidek
Hello there.

I've written a screen saver for FreeBSD 5.x with a rotating BSD logo.

http://garage.freebsd.pl/bsd_saver.tbz

Any chance to add it to the tree?

Thanks.

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


3d screen saver.

2003-06-24 Thread Pawel Jakub Dawidek
Hello:)

I want to present a one-night hack: a 3d CERB logo screen saver.

It is intended for FreeBSD 5.x and it is quite nice (IMHO).

You can download it from:

http://garage.freebsd.pl/cerb_saver.tbz

Enjoy!

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: 5.1-RELEASE panic, trace included

2003-06-14 Thread Pawel Jakub Dawidek
On Sat, Jun 14, 2003 at 02:28:33AM -0400, Robert Watson wrote:
+ If you have the kernel.debug for this kernel, could you send the gdb -k
+ output of: 
+ 
+ l *in6_pcbbind+0x2a7

I've looked at objdump -d kernel, and it looks like this is somewhere here:

214:	t = in_pcblookup_local(pcbinfo,
215:	    sin.sin_addr, lport,
216:	    INPLOOKUP_WILDCARD);
217:	if (t &&
218:	    (so->so_cred->cr_uid !=
219:	     t->inp_socket->so_cred->cr_uid) &&
220:	    (ntohl(t->inp_laddr.s_addr) !=
221:	     INADDR_ANY ||
222:	     INP_SOCKAF(so) ==
223:	     INP_SOCKAF(t->inp_socket)))
224:		return (EADDRINUSE);

We're talking about this line:

	test   %eax,%eax
	je     c03ac9c7 <in6_pcbbind+0x2e7>
	mov    0x64(%eax),%eax
	mov    %eax,0xffd0(%ebp)
=>	mov    0xc4(%eax),%edx
	mov    0xc4(%esi),%eax
	mov    0x4(%eax),%eax
	cmp    0x4(%edx),%eax
	je     c03ac9c7 <in6_pcbbind+0x2e7>

We're loading inp_socket->so_cred into edx here.
So it looks like inp_socket is NULL. Hmm, is that possible?
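
To make the reasoning concrete: a load of a structure member through a
pointer touches the address "pointer + member offset". A sketch under
assumptions follows; struct fake_socket is a hypothetical layout that only
places so_cred at the 0xc4 displacement seen in the disassembly, it is not
the real struct socket:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical layout: only the offset of so_cred matters here.  It is
 * placed at 0xc4 to match the displacement in the disassembly.
 */
struct fake_socket {
	unsigned char	pad[0xc4];
	uint32_t	so_cred;	/* pointers are 32 bits wide on i386 */
};

static_assert(offsetof(struct fake_socket, so_cred) == 0xc4,
    "so_cred must sit at displacement 0xc4");

/* Address touched when loading so_cred through the pointer value "base". */
static uintptr_t
so_cred_load_address(uintptr_t base)
{
	return (base + offsetof(struct fake_socket, so_cred));
}
```

If inp_socket were NULL, the load would touch address 0xc4, so a fault
address of 0xc4 in the panic message would be consistent with this guess.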

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


pam_unix.c [PATCH].

2003-05-29 Thread Pawel Jakub Dawidek
Hello.

I think there is no need to open a PR for this.
The 'flags' argument is marked as unused, but it is used in these functions:

--- pam_unix.c.orig Wed May 28 23:31:54 2003
+++ pam_unix.c  Wed May 28 23:32:40 2003
@@ -95,8 +95,7 @@
  * authentication management
  */
 PAM_EXTERN int
-pam_sm_authenticate(pam_handle_t *pamh, int flags __unused,
-int argc, const char *argv[])
+pam_sm_authenticate(pam_handle_t *pamh, int flags, int argc, const char *argv[])
 {
login_cap_t *lc;
struct options options;
@@ -159,8 +158,7 @@
  * account management
  */
 PAM_EXTERN int
-pam_sm_acct_mgmt(pam_handle_t *pamh, int flags __unused,
-int argc, const char *argv[])
+pam_sm_acct_mgmt(pam_handle_t *pamh, int flags, int argc, const char *argv[])
 {
struct addrinfo hints, *res;
struct options options;

-- 
Pawel Jakub Dawidek   [EMAIL PROTECTED]
UNIX Systems Programmer/Administrator http://garage.freebsd.pl
Am I Evil? Yes, I Am! http://cerber.sourceforge.net


pgp0.pgp
Description: PGP signature


Re: 5-STABLE Roadmap

2003-02-16 Thread Pawel Jakub Dawidek
On Thu, Feb 13, 2003 at 08:28:43PM -0800, Sam Leffler wrote:
+ This can quickly turn into a bikeshed, but suggest ones.  We're looking for
+ good benchmarks. [...]

Look at:

http://www.web-polygraph.org

It provides tests for web cache/proxy software.
We can test many things with it:

- how fast we can generate workload,
- how heavy a load we can handle,
- how fast squid runs on FreeBSD,
- how fast squid runs when rewritten with libkse,
- etc.

It is also a good stability test.
It is really good, free software; I use it on 4.x.

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.





Re: 5-STABLE Roadmap

2003-02-16 Thread Pawel Jakub Dawidek
On Sun, Feb 16, 2003 at 02:08:35PM -0700, Scott Long wrote:
+ Pawel Jakub Dawidek wrote:
+ 
+ On Thu, Feb 13, 2003 at 08:28:43PM -0800, Sam Leffler wrote:
+ + This can quickly turn into a bikeshed, but suggest ones.  We're looking for
+ + good benchmarks. [...]
+ 
+ Look at:
+ 
+ http://www.web-polygraph.org
+ 
+ It provides tests for www-cache/proxy stuff.
+ We can test many things with it:
+ 
+ - how fast could we generate workload,
+ - how heavy load could we handle,
+ - how fast is squid running on FreeBSD,
+ - how fast is squid rewritten with libkse,
+ - etc.
+ 
+ It is also a good stability test.
+ It is really good, free software; I use it on 4.x.
+ 
+ Thanks for the pointer, this looks very interesting.  How hard
+ is it to set up?  [...]

Setting it up is quite simple, but it doesn't compile with gcc 3.x...
The authors recommend using it with FreeBSD 4.x, so it is well
tested on our favorite system :)

+ [...] Do you have any test configurations and/or
+ scripts that we could adapt?

Yes, kernel patches for tuning are available on the website, but for recent
4.x releases they aren't necessary; everything can be configured with kernel
options and sysctls (for 4.8):

options MAXFILES=16384
options HZ=1000
options NMBCLUSTERS=32678

kern.ipc.somaxconn=1024
net.inet.ip.portrange.last=4
net.inet.tcp.delayed_ack=0
net.inet.tcp.msl=3000

The rest is quite simple and well documented. In theory the tests can be run
on one machine, so... Here are some nice-looking results generated by
web-polygraph:

Without any proxy:
http://garage.freebsd.pl/pm3-15-11-2k2
With squid:
http://garage.freebsd.pl/pm3-05-11-2k2
http://garage.freebsd.pl/pm3-06-11-2k2
With external proxy:
http://garage.freebsd.pl/pm3-29-01-2k3

PS. I'm CC-ing this thread to one of polygraph's authors; he may be
interested as well.

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.





LOR: if_ether.c - route.c.

2003-02-03 Thread Pawel Jakub Dawidek
Hello.

We got a lock order reversal here:

 1st 0xc0384800 arp mutex (arp mutex) @ /usr/src/sys/netinet/if_ether.c:151
 2nd 0xc1886b7c radix node head (radix node head) @ /usr/src/sys/net/route.c:549

Simple backtrace:
rtrequest1() [route.c]
rtrequest() [route.c]
arptfree() [if_ether.c]
arptimer() [if_ether.c]

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.





Re: LOR: if_ether.c - route.c.

2003-02-03 Thread Pawel Jakub Dawidek
On Mon, Feb 03, 2003 at 12:06:28PM +0100, Pawel Jakub Dawidek wrote:
+ We got lock order reversal here:
+ 
+  1st 0xc0384800 arp mutex (arp mutex) @ /usr/src/sys/netinet/if_ether.c:151
+  2nd 0xc1886b7c radix node head (radix node head) @ /usr/src/sys/net/route.c:549
+ 
+ Simple backtrace:
+ rtrequest1() [route.c]
+ rtrequest() [route.c]
+ arptfree() [if_ether.c]
+ arptimer() [if_ether.c]

I think that MTX_DUPOK is needed here, so:

--- radix.h.orig Sun Feb  2 20:07:42 2003
+++ radix.h Mon Feb  3 21:48:30 2003
@@ -159,7 +159,7 @@
 
 
 #define RADIX_NODE_HEAD_LOCK_INIT(rnh) \
-	mtx_init(&(rnh)->rnh_mtx, "radix node head", NULL, MTX_DEF | MTX_RECURSE)
+	mtx_init(&(rnh)->rnh_mtx, "radix node head", NULL, MTX_DEF | MTX_RECURSE | MTX_DUPOK)
 #define RADIX_NODE_HEAD_LOCK(rnh)	mtx_lock(&(rnh)->rnh_mtx)
 #define RADIX_NODE_HEAD_UNLOCK(rnh)	mtx_unlock(&(rnh)->rnh_mtx)
 #define RADIX_NODE_HEAD_DESTROY(rnh)	mtx_destroy(&(rnh)->rnh_mtx)

Am I right?
The radix node head is locked the first time in arptimer() and a second time
in rtrequest1(). And (if I understand the code correctly) the locking should
stay in both functions, because rtrequest1() is called not only via arptimer()
but also from other functions that don't lock it earlier.
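
The calling pattern described above can be sketched in userland as follows. This is only a single-threaded toy under stated assumptions: toy_mtx, toy_lock(), toy_unlock(), inner_request() and outer_timer() are hypothetical names, the "lock" merely counts recursion depth, and nothing here models WITNESS or the MTX_DUPOK flag. It shows just why the inner function has to take the lock itself while the outer caller may already hold it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy lock: records an owner id and a recursion depth. */
struct toy_mtx {
	int owner;	/* 0 means unowned */
	int depth;	/* how many times the owner currently holds it */
};

static void
toy_lock(struct toy_mtx *m, int self)
{
	/* A recursive lock lets its current owner acquire it again. */
	assert(m->owner == 0 || m->owner == self);
	m->owner = self;
	m->depth++;
}

static void
toy_unlock(struct toy_mtx *m, int self)
{
	assert(m->owner == self && m->depth > 0);
	if (--m->depth == 0)
		m->owner = 0;
}

/* Stand-in for rtrequest1(): always takes the lock itself. */
static int
inner_request(struct toy_mtx *m, int self)
{
	int depth_seen;

	toy_lock(m, self);
	depth_seen = m->depth;	/* 1 when called directly, 2 when nested */
	toy_unlock(m, self);
	return (depth_seen);
}

/* Stand-in for arptimer(): holds the lock while calling inward. */
static int
outer_timer(struct toy_mtx *m, int self)
{
	int depth_seen;

	toy_lock(m, self);
	depth_seen = inner_request(m, self);
	toy_unlock(m, self);
	return (depth_seen);
}
```

Called directly, inner_request() sees a depth of 1; reached through outer_timer() it sees 2, and in both cases the lock is fully released afterwards.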

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.





Lock order reversal in if_tl device.

2003-01-03 Thread Pawel Jakub Dawidek
Hello...

tl0: device timeout
lock order reversal
 1st 0xc0331020 ifnet (ifnet) @ /usr/src/sys/net/if.c:1181
 2nd 0xc2576ab8 tl0 (network driver) @ /usr/src/sys/pci/if_tl.c:2067

The driver is loaded as a kld module from /boot/loader.conf.
To get the interface up I need to unload and reload the module.

Here are my kernel config and dmesg.boot:

http://prioris.mini.pw.edu.pl/~nick/dmesg.boot
http://prioris.mini.pw.edu.pl/~nick/SLAYER

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.




Re: -CURRENT panic on SMP Athlon box.

2002-12-21 Thread Pawel Jakub Dawidek
On Sat, Dec 21, 2002 at 08:51:27AM +0100, Poul-Henning Kamp wrote:
+ My SMP Athlon box panicked again tonight, and this time my serial
+ console caught it in the act.
+ 
+ I have no idea what has caused this, and have no idea if it has any
+ significance for 5.0-R or not.  I wonder if we have a memory leak ?

Maybe a good way to debug it would be to print memory statistics, as sysctl
kern.malloc does, before this panic (or any panic caused by insufficient
memory) is triggered?

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.





Panic in jail [patch].

2002-12-20 Thread Pawel Jakub Dawidek
Hello.

The mutex initialized for the prison isn't destroyed on error.
The kernel will panic on every such error.

Here is a patch for this:

--- kern_jail.c.orig Fri Dec 20 15:11:10 2002
+++ kern_jail.c Fri Dec 20 15:14:03 2002
@@ -103,6 +103,7 @@
 	PROC_UNLOCK(p);
 	crfree(newcred);
 bail:
+	mtx_destroy(&pr->pr_mtx);
 	FREE(pr, M_PRISON);
 	return (error);
 }
---
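
The idea behind the patch, that every lock initialized before a failure must be destroyed on the error path before its memory is freed, can be sketched in userland like this. All names (toy_prison, toy_lock_init(), toy_lock_destroy(), toy_jail(), live_locks) are hypothetical, and the counter only stands in for the bookkeeping a real mtx_init()/mtx_destroy() pair would do.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical count of locks that are initialized but not destroyed. */
static int live_locks;

struct toy_prison {
	int lock_ready;	/* non-zero while the toy lock is initialized */
};

static void
toy_lock_init(struct toy_prison *pr)
{
	pr->lock_ready = 1;
	live_locks++;
}

static void
toy_lock_destroy(struct toy_prison *pr)
{
	assert(pr->lock_ready);
	pr->lock_ready = 0;
	live_locks--;
}

/*
 * Create a prison.  When fail_step is non-zero, simulate an error that
 * happens after the lock has been initialized.  Returns NULL on error.
 */
static struct toy_prison *
toy_jail(int fail_step)
{
	struct toy_prison *pr;

	pr = malloc(sizeof(*pr));
	if (pr == NULL)
		return (NULL);
	toy_lock_init(pr);
	if (fail_step)
		goto bail;
	return (pr);
bail:
	toy_lock_destroy(pr);	/* the cleanup step the patch adds */
	free(pr);
	return (NULL);
}
```

After a failed toy_jail(1) call the live_locks counter is back at zero; without the destroy call in the bail path it would stay at one while the memory behind the lock had already been freed.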

BTW, maybe it is time to implement jails with more features?
Multiple IPs, protection of statfs-like calls, or even multi-level jails?
By multi-level jail I mean a jail created inside a jail, etc.

-- 
Pawel Jakub Dawidek
UNIX Systems Administrator
http://garage.freebsd.pl
Am I Evil? Yes, I Am.




