Re: main [and, likely, stable/14]: do not set vfs.zfs.bclone_enabled=1 with that zpool feature enabled because it still leads to panics

2023-09-08 Thread Pawel Jakub Dawidek

On 9/8/23 15:09, Alexander Motin wrote:
Thank you, Martin.  I was able to reproduce the issue with your script 
and found the cause.


I first though the issue is triggered by the `cp`, but it appeared to be 
triggered by `cat`.  It also got copy_file_range() support, but later 
than `cp`.  That is probably why it slipped through testing.  This patch 
fixes it for me: https://github.com/openzfs/zfs/pull/15251 .


Mark, could you please try the patch?


Thank you Alex for the fix!

--
Pawel Jakub Dawidek




Re: another crash and going forward with zfs

2023-04-17 Thread Pawel Jakub Dawidek

On 4/18/23 05:14, Mateusz Guzik wrote:

On 4/17/23, Pawel Jakub Dawidek  wrote:

Correct me if I'm wrong, but from my understanding there were zero
problems with block cloning when it wasn't in use or now disabled.

The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly
avoid mess like this and give us more time to sort all the problems out
while making it easy for people to try it.

If there is no plan to revert the whole import, I don't see what value
removing just block cloning will bring if it is now disabled by default
and didn't cause any problems when disabled.



The feature definitely was not properly stress tested and what not and
trying to do it keeps running into panics. Given the complexity of the
feature I would expect there are many bug lurking, some of which
possibly related to the on disk format. Not having to deal with any of
this is can be arranged as described above and is imo the most
sensible route given the timeline for 14.0


Block cloning doesn't create, remove or modify any on-disk data until it 
is in use.


Again, if we are not going to revert the whole merge, I see no point in 
reverting block cloning as until it is enabled, its code is not 
executed. This allow people who upgraded the pools to do nothing special 
and it will allow people to test it easily.


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 21:28, José Pérez wrote:

Hi Pawel,
thank you for your reply and for the fixes.

I think there is a 4th issue that needs to be addressed: how do we 
recover from the worst case scenario which is a machine with a kernel > 
2a58b312b62f and ZFS root upgraded with block cloning enabled.


In particular, is it safe to turn such a machine on in the first place, 
and what are the risks involved in doing so? Any potential data loss?


Would such a machine be able to fix itself by compiling a kernel, or 
would compilation fail and might data be corrupted in the process?


I have two poudriere builders powered off (I am not alone in this 
situation) and I need to recover them, ideally minimizing data loss. The 
builders are also hosting current and used to build kernels and worlds 
for 13 and current: as of now all my production machines are stuck on 
the 13 they run, I cannot update binaries nor packages and I would like 
to be back online.


José,

I can only speak of block cloning in details, but I'll try to address 
everything.


The easiest way to avoid block_cloning-related corruption on the kernel 
after the last OpenZFS merge, but before e0bb199925 is to set the 
compress property to 'off' and the sync property to something other than 
'disabled'. This will avoid the block_cloning-related corruption and 
zil_replaying() panic.


As for the other corruption, unfortunately I don't know the details, but 
my understanding is that it is happening under higher load. Not sure I'd 
trust a kernel built on a machine with this bug present. What I would do 
is to compile the kernel as of 068913e4ba somewhere else, boot the 
problematic machine in single-user mode and install the newly built kernel.


As far as I can tell, contrary to some initial reports, none of the 
problems introduced by the recent OpenZFS merge corrupt the pool 
metadata, only file's data. You can locate the files modified with the 
bogus kernel using find(1) with a proper modification time, but you have 
to decide what to do with them (either throw them away, restore them 
from backup or inspect them).


--
Pawel Jakub Dawidek




Re: another crash and going forward with zfs

2023-04-17 Thread Pawel Jakub Dawidek

On 4/18/23 03:51, Mateusz Guzik wrote:

After bugfixes got committed I decided to zpool upgrade and sysctl
vfs.zfs.bclone_enabled=1 vs poudriere for testing purposes. I very
quickly got a new crash:

panic: VERIFY(arc_released(db->db_buf)) failed

cpuid = 9
time = 1681755046
KDB: stack backtrace:
db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe0a90b8e5f0
vpanic() at vpanic+0x152/frame 0xfe0a90b8e640
spl_panic() at spl_panic+0x3a/frame 0xfe0a90b8e6a0
dbuf_redirty() at dbuf_redirty+0xbd/frame 0xfe0a90b8e6c0
dmu_buf_will_dirty_impl() at dmu_buf_will_dirty_impl+0xa2/frame
0xfe0a90b8e700
dmu_write_uio_dnode() at dmu_write_uio_dnode+0xe9/frame 0xfe0a90b8e780
dmu_write_uio_dbuf() at dmu_write_uio_dbuf+0x42/frame 0xfe0a90b8e7b0
zfs_write() at zfs_write+0x672/frame 0xfe0a90b8e960
zfs_freebsd_write() at zfs_freebsd_write+0x39/frame 0xfe0a90b8e980
VOP_WRITE_APV() at VOP_WRITE_APV+0xdb/frame 0xfe0a90b8ea90
vn_write() at vn_write+0x325/frame 0xfe0a90b8eb20
vn_io_fault_doio() at vn_io_fault_doio+0x43/frame 0xfe0a90b8eb80
vn_io_fault1() at vn_io_fault1+0x161/frame 0xfe0a90b8ecc0
vn_io_fault() at vn_io_fault+0x1b5/frame 0xfe0a90b8ed40
dofilewrite() at dofilewrite+0x81/frame 0xfe0a90b8ed90
sys_write() at sys_write+0xc0/frame 0xfe0a90b8ee00
amd64_syscall() at amd64_syscall+0x157/frame 0xfe0a90b8ef30
fast_syscall_common() at fast_syscall_common+0xf8/frame 0xfe0a90b8ef30
--- syscall (4, FreeBSD ELF64, write), rip = 0x103cddf7949a, rsp =
0x103cdc85dd48, rbp = 0x103cdc85dd80 ---
KDB: enter: panic
[ thread pid 95000 tid 135035 ]
Stopped at  kdb_enter+0x32: movq$0,0x9e4153(%rip)

The posted 14.0 schedule which plans to branch stable/14 on May 12 and
one cannot bet on the feature getting beaten up into production shape
by that time. Given whatever non-block_clonning and not even zfs bugs
which are likely to come out I think this makes the feature a
non-starter for said release.

I note:
1. the current problems did not make it into stable branches.
2. there was block_cloning-related data corruption (fixed) and there may be more
3. there was unrelated data corruption (see
https://github.com/openzfs/zfs/issues/14753), sorted out by reverting
the problematic commit in FreeBSD, not yet sorted out upstream

As such people's data may be partially hosed as is.

Consequently the proposed plan is as follows:
1. whack the block cloning feature for the time being, but make sure
pools which upgraded to it can be mounted read-only
2. run ztest and whatever other stress testing on FreeBSD, along with
restoring openzfs CI -- I can do the first part, I'm sure pho will not
mind to run some tests of his own
3. recommend people create new pools and restore data from backup. if
restoring from backup is not an option, tar or cp (not zfs send) from
the read-only mount

block cloning beaten into shape would use block_cloning_v2 or whatever
else, key point that the current feature name would be considered
bogus (not blocking RO import though) to prevent RW usage of the
current pools with it enabled.

Comments?


Correct me if I'm wrong, but from my understanding there were zero 
problems with block cloning when it wasn't in use or now disabled.


The reason I've introduced vfs.zfs.bclone_enabled sysctl, was to exactly 
avoid mess like this and give us more time to sort all the problems out 
while making it easy for people to try it.


If there is no plan to revert the whole import, I don't see what value 
removing just block cloning will bring if it is now disabled by default 
and didn't cause any problems when disabled.


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 18:15, Pawel Jakub Dawidek wrote:

There were three issues that I know of after the recent OpenZFS merge:

1. Data corruption unrelated to block cloning, so it can happen even 
with block cloning disabled or not in use. This was the problematic commit:

 
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9

It was reverted in 63ee747febbf024be0aace61161241b53245449e.

2. Data corruption with embedded blocks when block cloning is enabled. 
It can happen when compression is enabled and the block contains between 
60 to 112 bytes (this might be hard to determine). Fix exists, it is 
merged to OpenZFS already, but isn't in FreeBSD yet.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739

3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is triggered 
when block cloning is enabled, the sync property is set to disabled and 
copy_file_range(2) is used. Easy fix exists, it is not yet merged to 
OpenZFS and not yet in FreeBSD HEAD.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758

Block cloning was disabled in 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, 
so 2 and 3 should not occur.


As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are 
fixed, as far as I can tell.


Block cloning remains disabled for now just to be on the safe side, but 
can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.


Don't relay on this sysctl as it will be removed in 2-3 weeks.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Pawel Jakub Dawidek

On 4/16/23 01:07, Florian Smeets wrote:
On the pool that has block_cloning enabled I see the above insta panic 
when poudriere starts building. I found a workaround though:


--- /usr/local/share/poudriere/include/fs.sh.orig    2023-04-15 
18:03:50.090823000 +0200
+++ /usr/local/share/poudriere/include/fs.sh    2023-04-15 
18:04:04.144736000 +0200

@@ -295,7 +295,6 @@
  fi

  zfs clone -o mountpoint=${mnt} \
-    -o sync=disabled \
  -o atime=off \
  -o compression=off \
  ${fs}@${snap} \

With this workaround I was able to build thousands of packages without 
panics or failures due to data corruption.


Thank you, Florian, that was very helpful!

This should fix the problem:

https://github.com/openzfs/zfs/pull/14758

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 09:23, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
Here is the change that reverts most of the modifications and disables 
cloning new blocks. It does retain ability to free existing cloned 
blocks and keeps block_cloning feature around, so upgraded pools can 
be imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could 
try it.


https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Does not apply to sys/contrib/openzfs tip, conflicts in 
module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.


This should work:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 07:52, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


Testing with mjg@ earlier today revealed that block_cloning was not the 
cause of poudriere bulk build (and similar cp(1)/install(1)-based) 
corruption, although may have exacerbated it.


Can you please elaborate how were you testing and what exactly did you 
exclude?


Thanks.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 07:40, Pawel Jakub Dawidek wrote:

On 4/13/23 22:56, Cy Schubert wrote:
I'm in the process of building a branch reverting the merge altogether 
and

will test it on my sandbox machine later today.


Cy,

thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


Here is the change that reverts most of the modifications and disables 
cloning new blocks. It does retain ability to free existing cloned 
blocks and keeps block_cloning feature around, so upgraded pools can be 
imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could try it.

https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Thank you guys for your help!

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/13/23 22:56, Cy Schubert wrote:

I'm in the process of building a branch reverting the merge altogether and
will test it on my sandbox machine later today.


Cy,

thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/13/23 23:05, Shawn Webb wrote:

I've learned over the years downstream that it's not really my place
to tell upstream what to do or how to do it. However, I think given
the seriousness of this, upstream might do well to revert the commit
until a solid fix is in place. Upstream might want to consider the
impacts this is having not just with downstream projects, but also
regular users.

Really bad timing to have a lot of new tax documentation that I really
don't want to lose. I'd really like to have an up-to-date, security
patched OS, but I guess I'll stay behind so that I don't risk losing
critical financial documentation.


Shawn,

I'm working on a patch to safely revert this that would also work for 
people who already upgraded their pools.


I'm sorry for this mess.

--
Pawel Jakub Dawidek




Re: dumpdev in loader.conf vs rc.d/dumpon

2015-09-24 Thread Pawel Jakub Dawidek
On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote:
> On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote:
> 
> > On 23.09.2015 19:57, Andriy Gapon wrote:
> > > I do not have a strong opinion.  Either option, rc.d/dumpon change or 
> > > geom_dev
> > > change, is fine with me.
> > 
> > I added the ability to set dumpdev via loader. But I wasn't aware that
> > it was used in rc.d script.
> > 
> > If you have set dumpdev kenv, it will be already enabled in the time
> > when rc.d/dumpon  will be run. So, I think it is useless to try to
> > enable dumpdev again. I prefer remove this old code from rc.d script.
> 
> rc.d script can redirect dump to device, not available at boot time,
> iSCSI disk, for examle.

No. Dump device is very special. It runs in an environment when kernel
already paniced, there are no interrupt, so there is no networking.
Storage controllers have special methods to handle dumping kernel memory
- it doesn't go through GEOM, it cannot go through GEOM as the scheduler
doesn't work too.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpfeiUIcx0t9.pgp
Description: PGP signature


Re: dumpdev in loader.conf vs rc.d/dumpon

2015-09-24 Thread Pawel Jakub Dawidek
On Fri, Sep 25, 2015 at 12:11:51AM +0300, Slawa Olhovchenkov wrote:
> On Thu, Sep 24, 2015 at 10:58:00PM +0200, Pawel Jakub Dawidek wrote:
> 
> > On Thu, Sep 24, 2015 at 02:18:50PM +0300, Slawa Olhovchenkov wrote:
> > > On Thu, Sep 24, 2015 at 11:28:05AM +0300, Andrey V. Elsukov wrote:
> > > 
> > > > On 23.09.2015 19:57, Andriy Gapon wrote:
> > > > > I do not have a strong opinion.  Either option, rc.d/dumpon change or 
> > > > > geom_dev
> > > > > change, is fine with me.
> > > > 
> > > > I added the ability to set dumpdev via loader. But I wasn't aware that
> > > > it was used in rc.d script.
> > > > 
> > > > If you have set dumpdev kenv, it will be already enabled in the time
> > > > when rc.d/dumpon  will be run. So, I think it is useless to try to
> > > > enable dumpdev again. I prefer remove this old code from rc.d script.
> > > 
> > > rc.d script can redirect dump to device, not available at boot time,
> > > iSCSI disk, for examle.
> > 
> > No. Dump device is very special. It runs in an environment when kernel
> > already paniced, there are no interrupt, so there is no networking.
> > Storage controllers have special methods to handle dumping kernel memory
> > - it doesn't go through GEOM, it cannot go through GEOM as the scheduler
> > doesn't work too.
> 
> Can be ZFS VOL act as dump device?

I don't think so. IIRC there was a hack in Illumos to allocate
contiguous space for dump in one of the vdevs (then I think it was
extended to multiple vdevs). I don't think any ZFS feature has worked
for such a ZVOL (no checksumming, no compression, etc.).

Others may have more up-to-date info about that.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpSlioURbq3y.pgp
Description: PGP signature


Re: Read-only /usr/obj/ no longer kosher?

2015-08-26 Thread Pawel Jakub Dawidek
On Tue, Aug 25, 2015 at 03:32:35PM -0700, NGie Cooper wrote:
 On Tue, Aug 25, 2015 at 3:21 PM, Xin Li delp...@delphij.net wrote:
  On 08/25/15 14:55, Pawel Jakub Dawidek wrote:
  Now that I think of it, it might have been that I did
  buildworld/buildkernel before -p1. Then freebsd-update updated
  newvers.sh and then I was trying to do installworld.
 
  Yes, I can now reproduce it with source updated to -p2.
 
  Yes, that's because freebsd-version.sh is generated from the files (but
  it's not clear to me whether if it's a bug or a feature that 'make
  install' checks if it's up-to-date and decides to regenerate it...).
 
 It's a quirk for sure. If you change the behavior, people will
 definitely complain as they will now need to go back and rebuild
 everything.

What we have now is misleading. People should recompile. It is rather
rare to see security advisory which bumps only patch level and something
that doesn't require recompilation (eg. a shell script). Current
behaviour would make people think they are running latest patch level
because freebsd-version says so, eventhough they only did 'make
installworld' without rebuilding affected binaries.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpRsLnByGkaA.pgp
Description: PGP signature


Re: Read-only /usr/obj/ no longer kosher?

2015-08-25 Thread Pawel Jakub Dawidek
On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote:
 On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote:
  
  
  On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
   I used to build world and kernel on one machine and export both /usr/src/ 
   and
   /usr/obj read-only to other machines. It doesn't work anymore (this is 
   from
   'make installworld'):
   
   === bin/freebsd-version (install)
   eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
   /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e  
   s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
   s/@@BRANCH@@/${BRANCH}/g;   
   /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; 
   then  rm -f freebsd-version.sh ;  exit 1 ;  fi
   cannot create freebsd-version.sh: Permission denied
   rm: freebsd-version.sh: Read-only file system
   *** Error code 1
  
  What's the modification times of
  /usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
  /usr/src/bin/freebsd-version/freebsd-version.sh and
  /usr/src/sys/conf/newvers.sh?
 
 I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE,
 I've send it to current@ by mistake. All in all my expectation is that
 we shouldn't modify obj/ during installworld.

Now that I think of it, it might have been that I did
buildworld/buildkernel before -p1. Then freebsd-update updated
newvers.sh and then I was trying to do installworld.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Read-only /usr/obj/ no longer kosher?

2015-08-25 Thread Pawel Jakub Dawidek
On Tue, Aug 25, 2015 at 11:53:47PM +0200, Pawel Jakub Dawidek wrote:
 On Tue, Aug 25, 2015 at 11:04:37PM +0200, Pawel Jakub Dawidek wrote:
  On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote:
   
   
   On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
I used to build world and kernel on one machine and export both 
/usr/src/ and
/usr/obj read-only to other machines. It doesn't work anymore (this is 
from
'make installworld'):

=== bin/freebsd-version (install)
eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
/usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e 
 s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
s/@@BRANCH@@/${BRANCH}/g;   
/usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh 
; then  rm -f freebsd-version.sh ;  exit 1 ;  fi
cannot create freebsd-version.sh: Permission denied
rm: freebsd-version.sh: Read-only file system
*** Error code 1
   
   What's the modification times of
   /usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
   /usr/src/bin/freebsd-version/freebsd-version.sh and
   /usr/src/sys/conf/newvers.sh?
  
  I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE,
  I've send it to current@ by mistake. All in all my expectation is that
  we shouldn't modify obj/ during installworld.
 
 Now that I think of it, it might have been that I did
 buildworld/buildkernel before -p1. Then freebsd-update updated
 newvers.sh and then I was trying to do installworld.

Yes, I can now reproduce it with source updated to -p2.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Re: Read-only /usr/obj/ no longer kosher?

2015-08-25 Thread Pawel Jakub Dawidek
On Sun, Aug 23, 2015 at 03:29:01PM -0700, Xin Li wrote:
 
 
 On 8/23/15 14:55, Pawel Jakub Dawidek wrote:
  I used to build world and kernel on one machine and export both /usr/src/ 
  and
  /usr/obj read-only to other machines. It doesn't work anymore (this is from
  'make installworld'):
  
  === bin/freebsd-version (install)
  eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
  /usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e  
  s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  
  s/@@BRANCH@@/${BRANCH}/g;   
  /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; 
  then  rm -f freebsd-version.sh ;  exit 1 ;  fi
  cannot create freebsd-version.sh: Permission denied
  rm: freebsd-version.sh: Read-only file system
  *** Error code 1
 
 What's the modification times of
 /usr/obj/usr/bin/freebsd-version/freebsd-version.sh,
 /usr/src/bin/freebsd-version/freebsd-version.sh and
 /usr/src/sys/conf/newvers.sh?

I saw it twice, but cannot reproduce it anymore. This is 10.2-RELEASE,
I've send it to current@ by mistake. All in all my expectation is that
we shouldn't modify obj/ during installworld.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org


Read-only /usr/obj/ no longer kosher?

2015-08-23 Thread Pawel Jakub Dawidek
I used to build world and kernel on one machine and export both /usr/src/ and
/usr/obj read-only to other machines. It doesn't work anymore (this is from
'make installworld'):

=== bin/freebsd-version (install)
eval $(egrep '^(TYPE|REVISION|BRANCH)=' 
/usr/src/bin/freebsd-version/../../sys/conf/newvers.sh) ;  if ! sed -e  
s/@@TYPE@@/${TYPE}/g;  s/@@REVISION@@/${REVISION}/g;  s/@@BRANCH@@/${BRANCH}/g; 
  /usr/src/bin/freebsd-version/freebsd-version.sh.in freebsd-version.sh ; 
then  rm -f freebsd-version.sh ;  exit 1 ;  fi
cannot create freebsd-version.sh: Permission denied
rm: freebsd-version.sh: Read-only file system
*** Error code 1

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgp0DzHE4AU2t.pgp
Description: PGP signature


Re: Memory modified after free, seemingly geli related

2015-08-05 Thread Pawel Jakub Dawidek
On Thu, Aug 06, 2015 at 04:06:40AM +0200, Pawel Jakub Dawidek wrote:
 On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote:
  I've encountered a few memory modified after free panics recently,
  which seem to be from geli. I don't yet have any debugging to
  completely confirm it's geli, but it has not happened on my other test
  laptop which configured similarly but without geli.
  
  This has a few local patches from my to-commit-to-HEAD queue.
  FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10
  r284409+6a002d9(staging): Tue Jul  7 17:57:01 EDT 2015
  
  panic: Memory modified after free 0xf80009d504d8(248) val=0 @
  0xf80009d50518
 
 I'm seeing it too. I tracked it down to ZFS. The bio was last owned by
 the ZFS::VDEV GEOM class, which is modyfing bio_error on freed bio. I'm
 investigating further and will let you know here once I find the
 cause.

Ok. It was bio from ZFS in my case, but it was GELI which modified
bio_error after delivering bio.

This patch fixes the race:

http://people.freebsd.org/~pjd/patches/geom_eli.patch

Using bio after calling crypto_dispatch() is a bug. 'done' callbacks
might have already called g_io_deliver() and upper layer might have
already freed the bio.

I'm not fully convinced that panic is the right response to
crypto_dispatch() failure. It means that the driver failed our request
and didn't call our callback, which is bad as we never complete the I/O.
The crypto drivers tend to return errors only if the request itself is
bogus, but that is program's bug and not a runtime condition. In other
words panic should be fine here.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpRu2X0EJLDP.pgp
Description: PGP signature


Re: Memory modified after free, seemingly geli related

2015-08-05 Thread Pawel Jakub Dawidek
On Wed, Aug 05, 2015 at 03:24:26AM +, Ed Maste wrote:
 I've encountered a few memory modified after free panics recently,
 which seem to be from geli. I don't yet have any debugging to
 completely confirm it's geli, but it has not happened on my other test
 laptop which configured similarly but without geli.
 
 This has a few local patches from my to-commit-to-HEAD queue.
 FreeBSD volta 11.0-CURRENT FreeBSD 11.0-CURRENT #10
 r284409+6a002d9(staging): Tue Jul  7 17:57:01 EDT 2015
 
 panic: Memory modified after free 0xf80009d504d8(248) val=0 @
 0xf80009d50518

I'm seeing it too. I tracked it down to ZFS. The bio was last owned by
the ZFS::VDEV GEOM class, which is modyfing bio_error on freed bio. I'm
investigating further and will let you know here once I find the
cause.

 cpuid = 1
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfe011414a880
 vpanic() at vpanic+0x189/frame 0xfe011414a900
 panic() at panic+0x43/frame 0xfe011414a960
 trash_ctor() at trash_ctor+0x48/frame 0xfe011414a970
 uma_zalloc_arg() at uma_zalloc_arg+0x573/frame 0xfe011414a9e0
 g_clone_bio() at g_clone_bio+0x1d/frame 0xfe011414aa00
 g_eli_start() at g_eli_start+0xbd/frame 0xfe011414aa30
 g_io_schedule_down() at g_io_schedule_down+0xe6/frame 0xfe011414aa60
 g_down_procbody() at g_down_procbody+0x7d/frame 0xfe011414aa70
 fork_exit() at fork_exit+0x84/frame 0xfe011414aab0
 fork_trampoline() at fork_trampoline+0xe/frame 0xfe011414aab0
 --- trap 0, rip = 0, rsp = 0xfe011414ab70, rbp = 0 ---

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpMR9ZeEaVYN.pgp
Description: PGP signature


Re: diskid documentation

2014-06-03 Thread Pawel Jakub Dawidek
On Mon, Jun 02, 2014 at 03:27:06PM -0700, John-Mark Gurney wrote:
 Pawel Jakub Dawidek wrote this message on Mon, Jun 02, 2014 at 22:26 +0200:
  The problem is that GPT labels (or GPT IDs for that matter) should not
  be implemented within GLABEL. This is wrong. It should be implemented as
  part of GPART, so that GPART would create ada0p1, gpt/label and
  gptid/whatever. Opening one of those should not make the others
  disappear then. Only opening ada0 for writting would make them disappear.
 
 even gpart would be wrong IMO... What happens if there is another
 provider like GPART, but different, do they need to implement diskid
 creation too to prevent the same issue?
 
 Shouldn't geom be updated to say, this ident is an alias, everything
 you do w/ this, it's exactly the same as the other one?  This would
 also have the advantage of possibly removing one layer in the call
 chain when dealing w/ IO. (or does GEOM has a pass-through flag that
 says, I don't do anything, just skip me?)

As for disk IDs it definitely shouldn't be implemented in GPART or
GLABEL. IMHO the right place is the DISK class - both ada0 and
diskid-of-ada0 should exist on the same rights (two providers of one
geom). This also would address your concern about additional layer.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpUBPO31HwbW.pgp
Description: PGP signature


Re: diskid documentation

2014-06-02 Thread Pawel Jakub Dawidek
On Mon, Jun 02, 2014 at 11:01:08AM -0700, John-Mark Gurney wrote:
 Michael W. Lucas wrote this message on Mon, Jun 02, 2014 at 11:36 -0400:
  On Mon, Jun 02, 2014 at 10:45:52AM -0400, Ryan Stone wrote:
   On Mon, Jun 2, 2014 at 9:26 AM, Allan Jude allanj...@freebsd.org wrote:
It also tends to sometimes hide the gpt label provider on me (not sure
in which cases it does this, but it is annoying)
   
   This happens when something (e.g. zfs) happens to open the diskid
   provider instead of the gpt label.  For me this ended up being a bit
   more than annoying; my swap was mounted in /etc/fstab via a gpt label
   so I silently lost my swap when I did an upgrade.
  
  Wait-- one type of one label can hide another?
  
  I thought a big point of labels was to remove ambiguity...
 
 Surprisingly, yes...  I didn't think about this, but it's true...
 
 A disk will get exported via two different devices, diskid and normal
 da/ada...  The tasting will go through and create all the necessary
 sub devices, but the problem is that we now have two different paths,
 and if something opens the diskid path, then the da/ada paths all
 disappear...
 
 This sounds like we need to fix geom to bind the two together so
 that when one opens, the other doesn't disappear... The problem is
 that geom views them as two separate disks when in fact they are the
 same...  someone who knows geom well should think about how to solve
 this problem, as diskid isn't the first time this has happened, just
 most prevalent w/ ZFS and diskid.

The problem is that GPT labels (or GPT IDs for that matter) should not
be implemented within GLABEL. This is wrong. It should be implemented as
part of GPART, so that GPART would create ada0p1, gpt/label and
gptid/whatever. Opening one of those should not make the others
disappear then. Only opening ada0 for writting would make them disappear.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgp8HNxiuiLpJ.pgp
Description: PGP signature


Re: [PATCH RFC] Disable save-entropy in jails

2013-12-25 Thread Pawel Jakub Dawidek
On Tue, Dec 24, 2013 at 12:44:34PM -0800, Xin Li wrote:
 Hi,
 
 I think we shouldn't save entropy inside jails, as the data is not going
 to be used by rc script (pjd@126744).  If there is no objections, I will
 commit this changeset on January 1, 2014.

I agree we shouldn't do it. I have this line in my crontab that I wanted
to commit at some point:

1,310-5 *   *   *   root[ `sysctl -n 
security.jail.jailed` -eq 0 ]  adjkerntz -a

It prevents executing adjkerntz from within a jail, but allows to keep
the same crontab in and outside jails.

We could do the same for save-entropy. It would be even nicer to have
some flag so that even sysctl(8) is not executed.

 Index: libexec/save-entropy/save-entropy.sh
 ===
 --- libexec/save-entropy/save-entropy.sh  (revision 259828)
 +++ libexec/save-entropy/save-entropy.sh  (working copy)
 @@ -42,6 +42,10 @@ elif [ -r /etc/rc.conf ]; then
   . /etc/rc.conf 2/dev/null
  fi
 
 +if [ `/sbin/sysctl -n security.jail.jailed` -eq 1 ]; then
 + exit 0
 +fi
 +
  case ${entropy_dir} in
  [Nn][Oo])
   exit 0

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpiq6LgOgUJX.pgp
Description: PGP signature


Re: dhclient can't limit bpf descriptor?

2013-12-15 Thread Pawel Jakub Dawidek
On Sat, Dec 14, 2013 at 12:12:23PM -0800, Tim Kientzle wrote:
 Opened up an old VM from a month or so ago (r257910) and dhclient won’t start.
 
 Specifically, dhclient complains (when run by root):
  “can’t limit bpf descriptor: Bad address”
 and then immediately exits.
 
 What does this mean?   I don’t know anything about the capabilities
 framework and certainly haven’t configured it in any way.

Maybe your userland and kernel are out of sync? There was
backward-incompatible change in Capsicum that could result in error like
that (capability rights are now passed as a pointer to a structure and
not uint64_t bitfield).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpOTGPPh0ZvY.pgp
Description: PGP signature


Re: r253070 and disappearing zpool

2013-07-25 Thread Pawel Jakub Dawidek
On Wed, Jul 24, 2013 at 02:47:11PM +0300, Andriy Gapon wrote:
 on 22/07/2013 23:38 Pawel Jakub Dawidek said the following:
  The /boot/ has to be unencrypted and can be stored on eg. USB
  pendrive which is never left unattended, unlike laptop which can be left
  in eg. a hotel room, but with entire disk encrypted.
 
 As we discussed elsewhere, there are many options of configuring full disk
 encryption.  Including decisions whether root filesystem should be separate 
 from
 boot filesystem, choice of filesystem type for boot fs, ways of tying various
 pieces together, and many more.
 
 I do not believe that my change is incompatible with full disk encryption in
 general.

Maybe you can imagine many ways of configuring it, but definiately the
most typical one is to have separate /boot/ from /, where /boot/ is
unencrypted and where you use one file system type for both (UFS or ZFS).

 Let's also recall that the system was not created / configured by any of the
 existing official or semi-official tools and thus it does not represent any
 recommended way of setting up such systems.  Glen configured it this way, but 
 it
 doesn't mean that that is the way.

Note that there are no official tools to install FreeBSD on ZFS. Is that
enough reason to stop supporting it?

What Glen did is the recommended way of setting up full disk encryption
with ZFS. I'd do it the same way and I'd recommend this configuration to
anyone who will (or did) ask me.

 I think that there are many of ways of changing configuration of that system 
 to
 make behave as before again.
 Three I mentioned already.  Another is to add rc script to import the boot 
 pool,
 given that it is a special, designated pool.  Yet another is to place
 zpool.cache onto the root pool and use nullfs (instead of a symlink) to make
 /boot be from the boot pool but /boot/zfs be from the root pool.

Come on...

  BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's
  fine by me, although the migration might be tricky.
 
 Yes, that's migration that's scary to me too.
 
 
 Now, about the postponed points.
 I will reproduce a section from my email that you've snipped.
 
  P.S.
  ZFS/FreeBSD boot process is extremely flexible.  For example zfsboot can 
  take
  zfsloader from pool1/fsA, zfsloader can boot kernel from pool2/fsB and 
  kernel
  can mount / from pool3/fsC.  Of these 3 filesystems from where should
  zpool.cache be taken?
  My firm opinion is that it should be taken from / (pool3/fsC in the example
  above).  Because it is the root filesystem that defines what a system is 
  going
  to do ultimately: what daemons are started, with what configurations, etc.
  And thus it should also determine what pools to auto-import.
  We can say that zpool.cache is analogous to /etc/fstab in this respect.
 
 So do you or do you not agree with my reasoning about from where zpool.cache
 should be taken?
 If you do not, then please explain why.
 If you do, then please explain how this would be compatible with the old way 
 of
 loading zpool.cache.

I don't have a strong opinion about this. As I said above I'm fine with
moving zpool.cache to /etc/zfs/ if we can ensure it won't break existing
installations.

Still I'm not sure this was your initial goal, because you weren't aware
of systems with separate boot pool until recently (if you were aware of
this I hope you wouldn't commit the change without prior discussion).
Which means in your eyes zpool.cache was always part of the root pool,
because /boot/ was.

 I think that ensuring that zpool.cache is always loaded from a root filesystem
 is the gain from my change.

Were people complaining about zpool.cache being loaded from /boot/zfs/
and not from /etc/zfs/? I don't think so. But people do complain about
boot pool not being autoimported. In my opinion for the end user it
doesn't really matter if it is /etc/zfs/zpool.cache or
/boot/zfs/zpool.cache, as both directories are available once the system
is booted. For most people those two directories are placed on the same
file system. For some people who actually care if this is /etc/zfs/ or
/boot/zfs/, because those are separate file systems the latter works,
the former doesn't.

In my opinion the gain, if any, is only theoretical.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpF_b3WkFXBB.pgp
Description: PGP signature


Re: r253070 and disappearing zpool

2013-07-22 Thread Pawel Jakub Dawidek
On Mon, Jul 22, 2013 at 10:29:40AM +0300, Andriy Gapon wrote:
 I think that this setup (on ZFS level) is quite untypical, although not
 impossible on FreeBSD (and perhaps only FreeBSD).
 It's untypical because you have separate boot pool (where loader, loader.conf
 and kernel are taken from) and root pool (where / is mounted from).

As I said elsewhere, it is pretty typical when full disk encryption is
used. The /boot/ has to be unencrypted and can be stored on eg. USB
pendrive which is never left unattended, unlike laptop which can be left
in eg. a hotel room, but with entire disk encrypted.

 So, I see three ways of resolving the problem that my changes caused for your
 configuration.
 
 1.  [the easiest] Put zpool.cache loading instructions that used to be in
 defaults/loader.conf into your loader.conf.  This way everything should work 
 as
 before -- zpool.cache would be loaded from your boot pool.
 
 2. Somehow (I don't want to go into any technical details here) arrange that
 your root pool has /boot/zfs/zpool.cache that describes your boot pool.  This 
 is
 probably hard given that your /boot is a symlink at the moment.  This probably
 would be easier to achieve if zpool.cache lived in /etc/zfs.
 
 3. [my favorite]  Remove an artificial difference between your boot and root
 pools, so that they are a single root+boot pool (as zfs gods intended).  As 
 far
 as I understand your setup, you use GELI to protect some sensitive data.
 Apparently your kernel is not sensitive data, so I wonder if your /bin/sh or
 /sbin/init are really sensitive either.
 So perhaps you can arrange your unencrypted pool to hold all of the base 
 system
 (boot + root) and put all your truly sensitive filesystems (like e.g. /home or
 /var/data or /opt/xyz) onto your encrypted pool.

If all you care about is laptop being stolen, then that would work.

If you however want to be protected from someone replacing your /sbin/init
with something evil then you use encryption or even better integrity
verification also supported by GELI.

Remember, tools not policies.

There is also option number 4 - backing out your commit.

When I saw your commit removing those entries from defaults/loader.conf,
I thought it is fine, as we now don't require zpool.cache to import the
root pool, which was, BTW, very nice and handy improvement. Now that we
know it breaks existing installations I'd prefer the commit to be backed
out. This is because apart from breaking some existing installations it
doesn't gain us anything.

 So I understand that my change causes a problem for a setup like yours, but I
 believe that the change is correct.

The change is clearly incorrect or incomplete as it breaks existing
installations and doesn't allow for full disk encryption configuration
on ZFS-only systems.

BTW. If moving zpool.cache to /etc/zfs/ will work for both cases that's
fine by me, although the migration might be tricky.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpG8GeaQjVQd.pgp
Description: PGP signature


Re: HEADSUP! dhclient(8) sandboxing.

2013-07-04 Thread Pawel Jakub Dawidek
On Wed, Jul 03, 2013 at 11:04:21PM -0700, Alfred Perlstein wrote:
 On 7/3/13 3:52 PM, Pawel Jakub Dawidek wrote:
  Hi.
 
  I've just committed Capsicum sandboxing for the dhclient(8).
  Let me know (ideally by sending e-mail to current@ and CCing me) if you
  notice any weird behaviour.
 
  The work was sponsored by the FreeBSD Foundation.
 
 It broke running dhclient on igb0 for me.  It says interface not found 
 or something to that effect.
 
 Can I help somehow?
 
 Basically just ifconfig down igb0 then try to run dhclient.  It will 
 not work.  If you up the interface and then run it, it is OK.
 
 See attached image.

Thanks for the report. Could you try this patch?

http://people.freebsd.org/~pjd/patches/dhclient.c.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgp0Gs_Z1C8ZM.pgp
Description: PGP signature


Re: HEADSUP! dhclient(8) sandboxing.

2013-07-04 Thread Pawel Jakub Dawidek
On Thu, Jul 04, 2013 at 04:55:14PM +0400, Andrey Chernov wrote:
 On 04.07.2013 2:52, Pawel Jakub Dawidek wrote:
  I've just committed Capsicum sandboxing for the dhclient(8).
  Let me know (ideally by sending e-mail to current@ and CCing me) if you
  notice any weird behaviour.
 
 I don't test one your very recent commit yet, but whole previous commits
 chain case dhclient broken:
 
 Starting dhclient.
 em0: no link .. got link
 em0: not found
 exiting.
 /etc/rc.d/dhclient: WARNING: failed to start dhclient
 
 and a bit later in rc
 
 Starting dhclient.
 em0: not found
 exiting.
 /etc/rc.d/dhclient: WARNING: failed to start dhclient

It should be fixed in r252697. Could you give it a try?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpvt0u0Xs1b4.pgp
Description: PGP signature


HEADSUP! dhclient(8) sandboxing.

2013-07-03 Thread Pawel Jakub Dawidek
Hi.

I've just committed Capsicum sandboxing for the dhclient(8).
Let me know (ideally by sending e-mail to current@ and CCing me) if you
notice any weird behaviour.

The work was sponsored by the FreeBSD Foundation.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpG83WhmJLcx.pgp
Description: PGP signature


Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168

2013-07-01 Thread Pawel Jakub Dawidek
On Sun, Jun 30, 2013 at 01:18:36PM +0200, Mateusz Guzik wrote:
 On Sun, Jun 30, 2013 at 05:21:42PM +1000, Kubilay Kocak wrote:
  I'm seeing what I believe is related panic, reliably being generated by
  the Python regression test suite on a newly created FreeBSD 10-CURRENT
  buildbot.
  
  Symptoms first seen in an freebsd.org FTP snapshot dated Thu May 30
  20:01:46 UTC 2013 and also reproducible on a freshly updated r252400
  
  It is additionally reproducible after checking out pure upstream python
  sources, using the following steps:
  
  hg clone http://hg.python.org/cpython
  cd cpython  configure  make buildbottest
  
  An interesting possible correlation is that it seems to drop out
  during/around test_socket
  
 
 Turns out the bug is quite funny ;)
 
 Try this:
 diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
 index 5d8e814..7a4db04 100644
 --- a/sys/kern/uipc_usrreq.c
 +++ b/sys/kern/uipc_usrreq.c
 @@ -1764,8 +1764,8 @@ unp_externalize(struct mbuf *control, struct mbuf 
 **controlp, int flags)
   }
   for (i = 0; i  newfds; i++, fdp++) {
   fde = fdesc-fd_ofiles[*fdp];
 - fde-fde_file = fdep[0]-fde_file;
 - filecaps_move(fdep[0]-fde_caps,
 + fde-fde_file = fdep[i]-fde_file;
 + filecaps_move(fdep[i]-fde_caps,
   fde-fde_caps);
   if ((flags  MSG_CMSG_CLOEXEC) != 0)
   fde-fde_flags |= UF_EXCLOSE;

Thanks for tracking it down before I had time to get to it!
The change looks good.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgpHVKRcu5rHH.pgp
Description: PGP signature


Re: r248583 Kernel panic: negative refcount 0xfffffe0031b59168

2013-04-15 Thread Pawel Jakub Dawidek
, v_mount = 
 0x0, v_nmntvnodes = {
 tqe_next = 0xfe014fd95760, tqe_prev = 0xfe011d500958}, v_un = 
 {vu_mount = 0x0, vu_socket = 0x0, 
 vu_cdev = 0x0, vu_fifoinfo = 0x0}, v_hashlist = {le_next = 0x0, le_prev = 
 0x0}, v_cache_src = {
 lh_first = 0x0}, v_cache_dst = {tqh_first = 0x0, tqh_last = 
 0xfe01967007b0}, v_cache_dd = 0x0, 
   v_lock = {lock_object = {lo_name = 0x80dddbb1 zfs, lo_flags = 
 91881472, lo_data = 0, 
   lo_witness = 0x0}, lk_lock = 1, lk_exslpfail = 0, lk_timo = 51, lk_pri 
 = 96}, v_interlock = {
 lock_object = {lo_name = 0x807bfbb9 vnode interlock, lo_flags = 
 16908288, lo_data = 0, 
   lo_witness = 0x0}, mtx_lock = 6}, v_vnlock = 0xfe01967007c8, 
 v_actfreelist = {
 tqe_next = 0xfe0031985b10, tqe_prev = 0xfe014fd95820}, v_bufobj = 
 {bo_mtx = {lock_object = {
 lo_name = 0x807bfbc9 bufobj interlock, lo_flags = 16908288, 
 lo_data = 0, 
 lo_witness = 0x0}, mtx_lock = 6}, bo_ops = 0x80a5af10, 
 bo_object = 0x0, bo_synclist = {
   le_next = 0x0, le_prev = 0x0}, bo_private = 0xfe0196700760, 
 __bo_vnode = 0xfe0196700760, 
 bo_clean = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe0196700880}, 
 bv_root = 0x0, bv_cnt = 0}, 
 bo_dirty = {bv_hd = {tqh_first = 0x0, tqh_last = 0xfe01967008a0}, 
 bv_root = 0x0, bv_cnt = 0}, 
 bo_numoutput = 0, bo_flag = 0, bo_bsize = 131072}, v_pollinfo = 0x0, 
 v_label = 0x0, v_lockf = 0x0, 
   v_rl = {rl_waiters = {tqh_first = 0x0, tqh_last = 0xfe01967008e8}, 
 rl_currdep = 0x0}, v_cstart = 0, 
   v_lasta = 0, v_lastw = 0, v_clen = 0, v_holdcnt = 0, v_usecount = 0, 
 v_iflag = 128, v_vflag = 4, 
   v_writecount = 0, v_hash = 26636295, v_type = VBAD}
 
 
 # kgdb -n 0
 GNU gdb 6.1.1 [FreeBSD]
 Copyright 2004 Free Software Foundation, Inc.
 GDB is free software, covered by the GNU General Public License, and you are
 welcome to change it and/or distribute copies of it under certain conditions.
 Type show copying to see the conditions.
 There is absolutely no warranty for GDB.  Type show warranty for details.
 This GDB was configured as amd64-marcel-freebsd...
 
 Unread portion of the kernel message buffer:
 panic: negative refcount 0xfe0059a400c8
 cpuid = 0
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xff823aff8770
 kdb_backtrace() at kdb_backtrace+0x39/frame 0xff823aff8820
 vpanic() at vpanic+0x127/frame 0xff823aff8860
 kassert_panic() at kassert_panic+0x136/frame 0xff823aff88d0
 closef() at closef+0x1ff/frame 0xff823aff8960
 closefp() at closefp+0xa0/frame 0xff823aff89b0
 amd64_syscall() at amd64_syscall+0x1f9/frame 0xff823aff8ab0
 Xfast_syscall() at Xfast_syscall+0xfb/frame 0xff823aff8ab0
 --- syscall (6, FreeBSD ELF64, sys_close), rip = 0x80aeaaa8a, rsp = 
 0x7fffbd28, rbp = 0x7fffbd40 ---
 Uptime: 21m3s
 [...]
 (kgdb) bt
 #0  doadump (textdump=1) at pcpu.h:231
 #1  0x804f5827 in kern_reboot (howto=260) at 
 /freebsd-src/local/sys/kern/kern_shutdown.c:447
 #2  0x804f5d36 in vpanic (fmt=value optimized out, ap=value 
 optimized out)
 at /freebsd-src/local/sys/kern/kern_shutdown.c:754
 #3  0x804f5bc6 in kassert_panic (fmt=value optimized out)
 at /freebsd-src/local/sys/kern/kern_shutdown.c:642
 #4  0x804b900f in closef (fp=value optimized out, td=value 
 optimized out) at refcount.h:66
 #5  0x804b7030 in closefp (fdp=0xfe018dc79800, fd=value 
 optimized out, fp=0xfe0059a400a0, 
 td=0xfe016dfca920, holdleaders=value optimized out)
 at /freebsd-src/local/sys/kern/kern_descrip.c:1136
 #6  0x806e26c9 in amd64_syscall (td=0xfe016dfca920, traced=0) at 
 subr_syscall.c:134
 #7  0x806cb13b in Xfast_syscall () at exception.S:387
 #8  0x00080aeaaa8a in ?? ()
 Previous frame inner to this frame (corrupt stack?)
 Current language:  auto; currently minimal
 (kgdb) 
 
  
  Thanks,
  
  Shawn Webb
  ___
  freebsd-current@freebsd.org mailing list
  http://lists.freebsd.org/mailman/listinfo/freebsd-current
  To unsubscribe, send any mail to freebsd-current-unsubscr...@freebsd.org

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://mobter.com


pgprAfmBcAPgt.pgp
Description: PGP signature


Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-14 Thread Pawel Jakub Dawidek
On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 13.03.13 23:08, Pawel Jakub Dawidek wrote:
 
  I think I considered something similar at first, but the change I 
  proposed was optimal, IMHO at the cost of producing pretty large
  diff, because of indentation change. But to be sure, can you send a
  patch of your proposed change?
 
 http://erdgeist.org/arts/software/Code/pidfile.c.diff

Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true
on FreeBSD, but is not portable. Also in case pidptr is NULL you compare
errno three times instead of just one (not a big deal of course, just
something that could be done a bit more optimal:)).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpdpnLR5mSxt.pgp
Description: PGP signature


Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-14 Thread Pawel Jakub Dawidek
On Thu, Mar 14, 2013 at 09:42:40AM -0400, John Baldwin wrote:
 On Thursday, March 14, 2013 4:44:20 am Pawel Jakub Dawidek wrote:
  On Thu, Mar 14, 2013 at 08:28:25AM +0100, Dirk Engling wrote:
   -BEGIN PGP SIGNED MESSAGE-
   Hash: SHA1
   
   On 13.03.13 23:08, Pawel Jakub Dawidek wrote:
   
I think I considered something similar at first, but the change I 
proposed was optimal, IMHO at the cost of producing pretty large
diff, because of indentation change. But to be sure, can you send a
patch of your proposed change?
   
   http://erdgeist.org/arts/software/Code/pidfile.c.diff
  
  Right. Your patch assumes EWOULDBLOCK is equal to EAGAIN, which is true
  on FreeBSD, but is not portable. Also in case pidptr is NULL you compare
  errno three times instead of just one (not a big deal of course, just
  something that could be done a bit more optimal:)).
 
 Geeze, why not just add an else.  That's the really short diff:

Heh, I did consider that as well, but here you check errno twice,
instead of once. Guys, is there anything wrong with the patch I
proposed?

 Index: pidfile.c
 ===
 --- pidfile.c (revision 248162)
 +++ pidfile.c (working copy)
 @@ -140,7 +140,8 @@ pidfile_open(const char *path, mode_t mode, pid_t
   *pidptr = -1;
   if (errno == 0 || errno == EAGAIN)
   errno = EEXIST;
 - }
 + } else if (errno == EWOULDBLOCK)
 + errno = EEXIST;
   free(pfh);
   return (NULL);
   }

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpXFp9L0bjdx.pgp
Description: PGP signature


Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-14 Thread Pawel Jakub Dawidek
On Thu, Mar 14, 2013 at 10:11:07AM -0700, Chuck Swiger wrote:
 Hi--
 
 On Mar 14, 2013, at 9:50 AM, John Baldwin wrote:
  On Thursday, March 14, 2013 12:29:58 pm Pawel Jakub Dawidek wrote:
 
 [ ... ]
  Heh, I did consider that as well, but here you check errno twice,
  instead of once. Guys, is there anything wrong with the patch I
  proposed?
  
  I'm sure the compiler can work that out just fine and it should do whatever
  is most readable to the programmer.  I don't care either way.
 
 Strong +1.  Having the code be correct and readable is much more important
 then trying to hand-optimize a single-digit # of integer compares in
 startup code that usually runs ~once per process.

Well, I think my version is more obvious, just the diff is larger.
Anyway, I think enough has been said already about this crucial change:)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpeAnHpmUc3i.pgp
Description: PGP signature


Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-13 Thread Pawel Jakub Dawidek
On Wed, Mar 13, 2013 at 11:18:36AM -0400, John Baldwin wrote:
 On Tuesday, March 12, 2013 4:16:32 pm Dirk Engling wrote:
  While debugging my own daemon I noticed that pidfile_open does not
  perform the appropriate checks for a running daemon if the caller does
  not provide a pidptr to pidfile_open
  
  fd = flopen(pfh-pf_path,
  O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
  
  fails when another daemon holds the lock and flopen sets errno to
  EAGAIN, the check 4 lines below in
  
  if (errno == EWOULDBLOCK  pidptr != NULL) {
  
  means that the pidfile_read is never executed.
  
  This results in my second daemon receiving an EAGAIN which clearly was
  meant to report a race condition between two daemons starting at the
  same time and the first one not yet finishing pidfile_write.
  
  The expected behavior would be to set errno to EEXIST, even if no pidptr
  was passed.
 
 Yes, I think it should actually perform the same logic even if pidptr is
 NULL of waiting for the other daemon to finish starting up.  Something like
 this:
 
 Index: lib/libutil/pidfile.c
 ===
 --- pidfile.c (revision 248162)
 +++ pidfile.c (working copy)
 @@ -100,6 +100,7 @@ pidfile_open(const char *path, mode_t mode, pid_t
   struct stat sb;
   int error, fd, len, count;
   struct timespec rqtp;
 + pid_t dummy;
  
   pfh = malloc(sizeof(*pfh));
   if (pfh == NULL)
 @@ -126,7 +127,9 @@ pidfile_open(const char *path, mode_t mode, pid_t
   fd = flopen(pfh-pf_path,
   O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC | O_NONBLOCK, mode);
   if (fd == -1) {
 - if (errno == EWOULDBLOCK  pidptr != NULL) {
 + if (errno == EWOULDBLOCK) {
 + if (pidptr == NULL)
 + pidptr = dummy;
   count = 20;
   rqtp.tv_sec = 0;
   rqtp.tv_nsec = 500;

I agree EEXIST should be returned, but I don't like reading existing
pidfile (including waiting for the other process to write its PID) just
to throw read PID away.

How about this patch?

http://people.freebsd.org/~pjd/patches/pidfile.c.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp85B_RZoSMW.pgp
Description: PGP signature


Re: pidfile_open incorrectly returns EAGAIN when pidfile is locked

2013-03-13 Thread Pawel Jakub Dawidek
On Wed, Mar 13, 2013 at 10:59:17PM +0100, Dirk Engling wrote:
 
 On Wed, 13 Mar 2013, Pawel Jakub Dawidek wrote:
 
  How about this patch?
 
  http://people.freebsd.org/~pjd/patches/pidfile.c.patch
 
 If you move the lines
 
 + if (errno == 0 || errno == EAGAIN)
 + errno = EEXIST;
 
 out of the else branch, you can get rid of the if branch, guard the else 
 branch by a
 
 + if (pidptr) {
 
 and let the if (errno == 0 || errno == EAGAIN) fix the errno

I think I considered something similar at first, but the change I
proposed was optimal, IMHO at the cost of producing pretty large diff,
because of indentation change. But to be sure, can you send a patch of
your proposed change?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp1futIji1g8.pgp
Description: PGP signature


Re: r247839: broken pipe - for top, sudo and ports

2013-03-10 Thread Pawel Jakub Dawidek
On Wed, Mar 06, 2013 at 08:04:57AM -0500, John Baldwin wrote:
 On Tuesday, March 05, 2013 2:35:48 pm Hartmann, O. wrote:
  On recent FreeBSD 10.0-CURRENT/amd64 (CLANG buildworld, serveral systems
  (3) the same symptoms)), many services drop a sporadic
  
  broken pipe
  
  This happesn to system's top (I have to type it several times to get
  finally a top), it happens to sudo su -, it happens to SSH (drops
  connection with broken pipe) and as I reported earlier, it seems to
  affect the entire port system, since I can not build any port, I receive
  
  *** [do-extract] Signal 13
  
  This is dramatic for me, because several modules (rtc, linux_adobe ...)
  can not be recompiled as it is required by the last /usr/src/UPDATING
  entry 20130304.
  
  Since dbus fails to start and even the nVidia driver (which is a kernel
  module, it canot be built and therefore ... ).
  
  Dimitry, I put you into CC, just in case. It seems that the last commits
  (not only the new DRM2 mess) broke something.
  
  I hope that others using FreeBSD 10.0CURRENT with CLANG can confirm this.\
 
 Have you tried backing up to just before all of pjd@'s file descriptor and
 capsicum commits?  It broke some other stuff initially related to fd passing,
 so I don't think it is beyond imagination that it broke something with UNIX
 domain sockets in general.

Is there a consensus already if this is result of my changes or davide's
r247804?

I just upgraded my laptop to today's HEAD and I don't see any weird
behaviour yet. If someone can provide a way to reproduce the problem,
I'd be happy to investigate.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp7F62niw8U0.pgp
Description: PGP signature


Re: HEADS UP: Capsicum overhaul.

2013-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote:
 Pawel Jakub Dawidek p...@freebsd.org writes:
 
  I just committed pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you will find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be looking at current@ mailing
  list closly, so report here if you find problems that look related to my
  change.
 
 tmux started to behave weirdly, sometimes failing to attach:
 
   $ printenv
   PATH=/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin
   OLDPWD=/
   DISPLAY=:0
   PWD=/home/foo
   TERM=xterm
   USER=foo
   HOME=/home/foo
   SHELL=/bin/sh
 
   $ ktrace -i tmux -L test -f /dev/null
   $ echo $?
   1
   $ kdump -r | pastebinit -a 'tmux fails to attach'
   http://pastebin.com/U3nCPrFY
 
   $ env -i TERM=$TERM ktrace -i /usr/local/bin/tmux -L test -f /dev/null
   $ ^D
   [exited]
   $ kdump -r | pastebinit -a 'tmux fails to attach (workaround)'
   http://pastebin.com/w1dsUAU4
 
 I've tried so far:
 
   * booting allbsd.org snapshot - no joy
   * enabling capsicum options - no joy
   * reverting recent capsicum commits - works fine

Yes, it was already reported to me and I'm investigating the problem.
Thanks.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpg0QzEgXbEG.pgp
Description: PGP signature


Re: HEADS UP: Capsicum overhaul.

2013-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 03, 2013 at 10:18:02PM +0300, Jan Beich wrote:
 Pawel Jakub Dawidek p...@freebsd.org writes:
 
  I just committed pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you will find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be looking at current@ mailing
  list closly, so report here if you find problems that look related to my
  change.
 
 tmux started to behave weirdly, sometimes failing to attach:

I committed a work-around in r247740, but the root of the problem is yet
to be found.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp7tAe7RGpg5.pgp
Description: PGP signature


Re: kernel build failure

2013-03-03 Thread Pawel Jakub Dawidek
On Sun, Mar 03, 2013 at 06:47:00PM -0500, Michael Butler wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 SVN r247736 prompts this ..
 
 cc -c -O2 -pipe -fno-strict-aliasing -march=pentium4 -std=c99  -Wall
 - -Wredundant-decls -Wnested-externs -Wstrict-prototypes
 - -Wmissing-prototypes -Wpointer-arith -Winline -Wcast-qual  -Wundef
 - -Wno-pointer-sign -fformat-extensions  -Wmissing-include-dirs
 - -fdiagnostics-show-option  -Wno-error-tautological-compare
 - -Wno-error-empty-body  -Wno-error-parentheses-equality -nostdinc  -I.
 - -I/usr/src/sys -I/usr/src/sys/contrib/altq -D_KERNEL
 - -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h  -mno-aes -mno-avx
 - -mno-mmx -mno-sse -msoft-float -ffreestanding -fstack-protector -Werror
  /usr/src/sys/kern/uipc_usrreq.c
 /usr/src/sys/kern/uipc_usrreq.c:1689:18: error: use of undeclared
 identifier 'fdep'; did you mean 'fde'?
 filecaps_free(fdep-fde_caps);
^~~~
fde
 /usr/src/sys/kern/uipc_usrreq.c:1682:36: note: 'fde' declared here
 unp_freerights(struct filedescent *fde, int fdcount)
^
 1 error generated.

This was because I divided larger change into smaller changes.
r247738 should be fine.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp3VnbiqBD_t.pgp
Description: PGP signature


Re: HEADS UP: Capsicum overhaul.

2013-03-02 Thread Pawel Jakub Dawidek
On Fri, Mar 01, 2013 at 10:24:01PM -0500, Michael Butler wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 03/01/13 21:05, Pawel Jakub Dawidek wrote:
  I just committed pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you will find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be looking at current@ mailing
  list closly, so report here if you find problems that look related to my
  change.
  
 
 Two things I noted with a kernel @ SVN r247608:
 
 On reboot ..
 
 newsyslog: can't mv /var/log/debug.log.zGwQSBb to /var/log/debug.log:
 Capabilities insufficient
 
  .. and X/intel/KMS fails to start
 
 reverting to SVN r247497 (my previous compilation) performs as expected,

I've fixed the rename problem in r247616. Not sure if this will fix X.
Could you give it a try?

Thank you for the report!

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpNcW5DpTt_W.pgp
Description: PGP signature


Re: HEADS UP: Capsicum overhaul.

2013-03-02 Thread Pawel Jakub Dawidek
On Fri, Mar 01, 2013 at 09:45:02PM -0600, Larry Rosenman wrote:
 On Sat, 2 Mar 2013, Pawel Jakub Dawidek wrote:
 
  I just committed pretty large change that affects not only Capsicum, but
  also descriptor handling code in the kernel. If you will find some
  strange problems after r243611 (like panics, unexpected application
  errors, etc.) I may be at fault. I'll be looking at current@ mailing
  list closly, so report here if you find problems that look related to my
  change.
 
 
 
 Similar to another post:
 vn up
 Updating '.':
 Udatabases/py-sqlite3/Makefile
 Udatabases/py-sqlite3/files/setup.py
 Udatabases/py-sqlite3/files/setup3.py
 svn: E93: Can't move '/usr/ports/.svn/tmp/svn-X6U5KQ' to 
 '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient
 # svn up
 svn: E155037: Previous operation has not finished; run 'cleanup' if it was 
 interrupted
 # svn cleanup
 svn: E93: Can't move '/usr/ports/.svn/tmp/svn-Bb1iSM' to 
 '/usr/ports/databases/py-sqlite3/Makefile': Capabilities insufficient

This should be now fixed in r247616.

Thank you for the report!

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpRi27mevghK.pgp
Description: PGP signature


HEADS UP: Capsicum overhaul.

2013-03-01 Thread Pawel Jakub Dawidek
I just committed pretty large change that affects not only Capsicum, but
also descriptor handling code in the kernel. If you will find some
strange problems after r243611 (like panics, unexpected application
errors, etc.) I may be at fault. I'll be looking at current@ mailing
list closly, so report here if you find problems that look related to my
change.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpD8eAHegq2k.pgp
Description: PGP signature


Re: geli(8) breaks after a couple hours of uptime

2013-02-10 Thread Pawel Jakub Dawidek
On Sun, Feb 10, 2013 at 09:50:58AM +0200, Andriy Gapon wrote:
 on 10/02/2013 01:35 Pawel Jakub Dawidek said the following:
  geli(8) almost exclusively deals with sensitive data. Even mlocking
  MAXPHYS would fail with current limits, but this is bad idea.
  
  With mlockall() I am sure I didn't miss anything - be it forgetting
  about mlocking some buffer or zeroing it before munlock. I'm also sure
  someone else who can modify geli(8) in the future won't miss anything
  too.
 
 Well, the geli is not such a complex program really.  It seems to use only two
 or so buffers for sensitive data. [...]

Maybe it isn't very complex, but complex enough that you missed a dozen
or so buffers that would need mlocking (almost everything that is
bzero'ed), not to mention internal states for hash and encryption
algorithms that operate on blocks, so they can keep plain data until
their update method gather entire block. Encryption and HMAC calculation
is done by API used by both userland and kernel parts, so it would need
some ifdefs to make it work, thus further complicating entire thing.

 [...] As far as I can see geli deals only with some
 key management (reading keys, generating key from key material, etc). There is
 definitely no need to mlock the code, etc.

I fully agree there is no need to mlock the code and I'd be happy to use
mlockall(2) flag that protects only the data. Until such flag is
introduced I'll keep mlocking everything.

 I think that PAGE_SIZE (or at most a small multiple of it) should be 
 sufficient.
 I don't think that we currently have (or expect to see in the near future)
 algorithms where keys with more than 4096 size provide any additional 
 security.

geli(8) deals just fine with files that are larger than buffers, so even
with smaller buffer it can read the data in few steps.

The proposed patch is here if someone would like to give it a try:

http://people.freebsd.org/~pjd/patches/geom_eli.c.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpm_Gk3PBtuD.pgp
Description: PGP signature


Re: geli(8) breaks after a couple hours of uptime

2013-02-09 Thread Pawel Jakub Dawidek
On Sat, Feb 09, 2013 at 06:00:53PM +0200, Andriy Gapon wrote:
 on 09/02/2013 17:04 Andrey Zonov said the following:
  On 2/9/13 5:07 PM, Fabian Keil wrote:
 
  This would at least prevent the segfault.
 
  
  I see two possibilities to get segfault:
- no checking for result from memory allocation functions
- too big stack
  
  I have no found any broken memory allocation checking, but I found two
  big objects on the stack.  One is buf[MAXPHYS] in eli_genkey_files() and
  another is passbuf[MAXPHYS] in eli_genkey_passphrase().  If we change
  these two to malloc(), then we can handle error from malloc(), print
  some useful message and prevent segfault.
 
 I'd rather do what Kostik suggested and Fabian mentioned: instead of
 mlockall(MCL_FUTURE) the code should mlock only the (explicitly designated)
 buffers that can contain sensitive data.

geli(8) almost exclusively deals with sensitive data. Even mlocking
MAXPHYS would fail with current limits, but this is bad idea.

With mlockall() I am sure I didn't miss anything - be it forgetting
about mlocking some buffer or zeroing it before munlock. I'm also sure
someone else who can modify geli(8) in the future won't miss anything
too.

geli(8) is relatively simple program, it doesn't allocate megabytes of
memory for different pruposes and just needs few pages for sensitive
data. As I said most of the memory it uses is for sensitive data.

The obvious problem is allocating MAXPHYS on the stack. This has to be
changed, especially that we may want to rise MAXPHYS in the future.

Other than that I expect thing would be tuned properly so that geli(8)
can work by default. I'm happy to use smaller buffers than MAXPHYS -
keyfiles are far smaller usually than 128kB, so there shouldn't be any
issue with this.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpjcefoSL7Mf.pgp
Description: PGP signature


Re: acpi resume related patch

2013-01-28 Thread Pawel Jakub Dawidek
On Sat, Jan 26, 2013 at 04:18:57PM +0200, Andriy Gapon wrote:
 on 25/01/2013 18:08 Andriy Gapon said the following:
  on 25/01/2013 15:51 John Baldwin said the following:
  On Friday, January 25, 2013 3:43:33 am Andriy Gapon wrote:
 
  If you have ACPI suspend/resume working, if it used to work but stopped 
  working
  at some time, if it never worked, but you are still hoping, could you 
  please
  test the following patch and report back?
 
  http://svn.freebsd.by/files/acpi-apic-wakeup-final.patch
 
  This will break systems not using the local APIC since you unconditionally
  call lapic_setup() on resume.This was part of the feature of the previous
  code that by using a dummy pic it could register it only when the local 
  APIC
  was used.
  
  Thank you for drawing my attention to this.  I will try to fix this issue.
  The reason I want to remove lapic from 'pics' (and I already described it 
  in a
  private email) is that Local APIC is a special kind of PIC.  It's already
  explicitly initialized by APs.  Putting it into 'pics' tailq just 
  obfuscates the code.
  
  It should also be registered before any of the I/O APICs are by
  the design of the local_apic.c code.
  
  In fact, as I see in the code, Local APIC is always registered _after_ I/O 
  APICs.
  And thus lapic_resume was called after ioapic_resume.
  Additionally, currently there is no synchronization between initialization 
  of
  Local APICs on APs and initialization of I/O APICs at the wakeup/resume 
  time.
  
 
 Here is an updated version of the patch:
 http://people.freebsd.org/~avg/acpi-apic-wakeup.2.patch

FYI, it doesn't change anything for me. Resume seems to work, but
suspend just reset my laptop.

I unload all driver modules (including if_em, sound, nvidia, usb).
The only driver I keep is ahci.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpeGRk3y8xZd.pgp
Description: PGP signature


Re: acpi resume related patch

2013-01-28 Thread Pawel Jakub Dawidek
On Mon, Jan 28, 2013 at 03:07:59PM +0200, Andriy Gapon wrote:
 on 28/01/2013 14:45 Pawel Jakub Dawidek said the following:
  FYI, it doesn't change anything for me. Resume seems to work, but
  suspend just reset my laptop.
 
 Sorry, I am a little confused by this description.
 How can you know that resume works if suspend resets the machine?

Because everything turns off, including display and the power led starts
to throb and it does that until I try to resume it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp9FvUaKpvfH.pgp
Description: PGP signature


Re: acpi resume related patch

2013-01-28 Thread Pawel Jakub Dawidek
On Mon, Jan 28, 2013 at 03:40:18PM +0200, Andriy Gapon wrote:
 on 28/01/2013 15:24 Pawel Jakub Dawidek said the following:
  On Mon, Jan 28, 2013 at 03:07:59PM +0200, Andriy Gapon wrote:
  on 28/01/2013 14:45 Pawel Jakub Dawidek said the following:
  FYI, it doesn't change anything for me. Resume seems to work, but
  suspend just reset my laptop.
 
  Sorry, I am a little confused by this description.
  How can you know that resume works if suspend resets the machine?
  
  Because everything turns off, including display and the power led starts
  to throb and it does that until I try to resume it.
  
 
 Ah, so suspend and resume got accidentally reverted...
 I see, thank you for the report.

I just noticed that, yes, sorry.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpZdzy5Vr92A.pgp
Description: PGP signature


Re: ACPI panic on unplugging the power cord.

2013-01-26 Thread Pawel Jakub Dawidek
On Fri, Jan 25, 2013 at 04:19:44PM -0500, Jung-uk Kim wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 2013-01-25 04:26:02 -0500, Pawel Jakub Dawidek wrote:
  On Thu, Jan 24, 2013 at 05:18:48PM +0100, Pawel Jakub Dawidek
  wrote:
  One is when I leave laptop idle for some time (few hours?):
  
  http://people.freebsd.org/~pjd/misc/acpi_idle_panic_0.jpg 
  http://people.freebsd.org/~pjd/misc/acpi_idle_panic_1.jpg
  
  Small update. This panic doesn't happen when the system is idle,
  it happens we I close the lid to the point when display is turned
  off (which is almost closed, but not entirely closed).
  
  Closing lid doesn't trigger suspend or anything like that. It just
  turns off the display.
 
 Please try the attached patch (with my previous patch).  Also,
 available from here:
 
 http://people.freebsd.org/~jkim/acpi_exit.diff

Your patch fixes panic triggered by unplugging power cable as well the
one triggered by closing the lid. I haven't tried booting from battery
and connecting power cable, but I expect happy end there as well.
Thank you very much for the fix.

BTW. When I'm closing the lid and opening it again (not cosing it
entirely, just to the point when display is turned off) I see this
message to be printed twice on the console:

CPU0: local APIC error 0x80

Although I must admit I see this message from time to time, but I
haven't found a pattern. Closing and opening the lid always make it
appear. Not sure how much it is related to ACPI, though.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpgOFfnWqmSv.pgp
Description: PGP signature


Re: ACPI panic on unplugging the power cord.

2013-01-25 Thread Pawel Jakub Dawidek
On Thu, Jan 24, 2013 at 05:18:48PM +0100, Pawel Jakub Dawidek wrote:
 One is when I leave laptop idle for some time (few hours?):
 
   http://people.freebsd.org/~pjd/misc/acpi_idle_panic_0.jpg
   http://people.freebsd.org/~pjd/misc/acpi_idle_panic_1.jpg

Small update. This panic doesn't happen when the system is idle, it
happens we I close the lid to the point when display is turned off
(which is almost closed, but not entirely closed).

Closing lid doesn't trigger suspend or anything like that. It just turns
off the display.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpt_6zGlZaPi.pgp
Description: PGP signature


Re: ACPI panic on unplugging the power cord.

2013-01-24 Thread Pawel Jakub Dawidek
On Wed, Jan 23, 2013 at 07:54:57PM -0500, Jung-uk Kim wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA1
 
 On 2013-01-22 12:56:29 -0500, Pawel Jakub Dawidek wrote:
  I just upgraded to HEAD today and was wondering what will explode. 
  Now I know.
  
  When I unplug power cord from my laptop, ACPI panics. Pictures
  here:
  
  http://people.freebsd.org/~pjd/misc/acpi_panic_0.jpg 
  http://people.freebsd.org/~pjd/misc/acpi_panic_1.jpg
  
  Let me know if you need more info.
 
 Can you please try the attached patch?  It is also available from here:
 
 http://people.freebsd.org/~jkim/utcache.diff
 
 Please note the patch may or may not fix the problem but I think I
 found an ancient bug. :-(

This patch didn't fix the panic:

http://people.freebsd.org/~pjd/misc/acpi_unplug_panic_0.jpg
http://people.freebsd.org/~pjd/misc/acpi_unplug_panic_1.jpg

In the meantime I found two other panics.

One is when I leave laptop idle for some time (few hours?):

http://people.freebsd.org/~pjd/misc/acpi_idle_panic_0.jpg
http://people.freebsd.org/~pjd/misc/acpi_idle_panic_1.jpg

And when is when I boot laptop without power connected and I connect the
power:

http://people.freebsd.org/~pjd/misc/acpi_power_connect_panic_0.jpg
http://people.freebsd.org/~pjd/misc/acpi_power_connect_panic_1.jpg

BTW. On the acpi_power_connect_panic_0.JPG photo, at the top of the
screen you can see error messages that are logged every second when I
have power disconnected. This is not a new problem. I had this problem
when I bought this laptop, but now that I'm reporting those bugs, I can
as well let you know about this one.

I'm running this on Thinkpad T530.

I understand that those problems are specific to my laptop and that on
your laptop all of the above work just fine?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpEcz4SnSEh8.pgp
Description: PGP signature


Re: ZFS + usb in trouble?

2013-01-23 Thread Pawel Jakub Dawidek
On Wed, Jan 23, 2013 at 09:11:23PM +0900, Alexander Nedotsukov wrote:
 And now both are failing, although zfs in a different way.

Yes, but now we are sure this is not ZFS issue. Sequential read simply
doesn't trigger the corruption, but ZFS access patterns do. The read
errors you are seeing are due to GELI returning EIO on integrity
verification error.

 # zpool status
   pool: testpool
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
   scan: scrub repaired 1M in 0h2m with 0 errors on Wed Jan 23 20:57:11 2013
 config:
 
   NAME  STATE READ WRITE CKSUM
   testpool  ONLINE   0 0 0
 raidz1-0ONLINE   0 0 0
   da5.elid  ONLINE   0 0 0
   da5.elie  ONLINE  21 0 0
   da5.elif  ONLINE   0 0 0
 
 # dmesg | tail -20
 GEOM_ELI: da1.eli: Failed to authenticate 131072 bytes of data at offset 
 1010827264.
 ath0: bb hang detected (0x4), resetting
 GEOM_ELI: Device da5.eli created.
 GEOM_ELI: Encryption: AES-XTS 128
 GEOM_ELI:  Integrity: HMAC/SHA1
 GEOM_ELI: Crypto: software
 GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 4096.
 GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 0.
 GEOM_ELI: da5.eli: Failed to authenticate 5792 bytes of data at offset 
 229288288.
 GEOM_ELI: da5.eli: Failed to authenticate 65536 bytes of data at offset 
 229429248.
 GEOM_ELI: da5.eli: Failed to authenticate 131072 bytes of data at offset 
 229298176.
 GEOM_ELI: da5.eli: Failed to authenticate 26496 bytes of data at offset 
 229494784.
 GEOM_ELI: da5.eli: Failed to authenticate 12288 bytes of data at offset 
 270299136.
 GEOM_ELI: da5.eli: Failed to authenticate 65536 bytes of data at offset 
 270319616.
 GEOM_ELI: da5.eli: Failed to authenticate 5792 bytes of data at offset 
 273095488.
 GEOM_ELI: da5.eli: Failed to authenticate 56864 bytes of data at offset 
 273105376.
 GEOM_ELI: da5.eli: Failed to authenticate 2400 bytes of data at offset 
 273326080.
 GEOM_ELI: da5.eli: Failed to authenticate 65536 bytes of data at offset 
 272560128.
 GEOM_ELI: da5.eli: Failed to authenticate 131072 bytes of data at offset 
 272429056.
 GEOM_ELI: da5.eli: Failed to authenticate 32768 bytes of data at offset 
 272396288.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpN4AvBRoAtD.pgp
Description: PGP signature


ACPI panic on unplugging the power cord.

2013-01-22 Thread Pawel Jakub Dawidek
I just upgraded to HEAD today and was wondering what will explode.
Now I know.

When I unplug power cord from my laptop, ACPI panics. Pictures here:

http://people.freebsd.org/~pjd/misc/acpi_panic_0.jpg
http://people.freebsd.org/~pjd/misc/acpi_panic_1.jpg

Let me know if you need more info.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpaejQVh5GkP.pgp
Description: PGP signature


Re: ZFS + usb in trouble?

2013-01-22 Thread Pawel Jakub Dawidek
On Tue, Jan 22, 2013 at 11:47:29PM +0900, Alexander Nedotsukov wrote:
 Hi Pawel,
 
 Here what I did.
 
 # geli onetime -a hmac/sha1 -s 4096 /dev/da5 
 # dmesg | tail -15
 wlan0: link state changed to UP
 ugen5.4: USBest Technology at usbus5
 umass2: USBest Technology USB Mass Storage Device, class 0/0, rev 2.00/1.00, 
 addr 4 on usbus5
 umass2:  SCSI over Bulk-Only; quirks = 0x0100
 umass2:4:2:-1: Attached to scbus4
 da5 at umass-sim2 bus 2 scbus4 target 0 lun 0
 da5: USB2.0 FlashDisk 0.00 Removable Direct Access SCSI-2 device 
 da5: 40.000MB/s transfers
 da5: 983MB (2015231 512 byte sectors: 64H 32S/T 983C)
 GEOM_ELI: Device da5.eli created.
 GEOM_ELI: Encryption: AES-XTS 128
 GEOM_ELI:  Integrity: HMAC/SHA1
 GEOM_ELI: Crypto: software
 GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 4096.
 GEOM_ELI: da5.eli: Failed to authenticate 4096 bytes of data at offset 0.
 
 # dd if=/dev/random of=/dev/da5.eli bs=1m
 dd: /dev/da5.eli: short write on character device
 dd: /dev/da5.eli: end of device
 875+0 records in
 874+1 records out
 917151744 bytes transferred in 1530.831674 secs (599120 bytes/sec)
 
 # dd if=/dev/da5.eli of=/dev/null bs=1m
 874+1 records in
 874+1 records out
 917151744 bytes transferred in 178.874312 secs (5127353 bytes/sec)
 
 All clear. No new errors.
 
 Zpool created on the same usb stick is still failing with cksum errors.
 
 # usbconfig -d 5.4 add_quirk UQ_MSC_NO_SYNC_CACHE
 
 umass2:  SCSI over Bulk-Only; quirks = 0x4000
 
 Re-created zpool is still failing with cksum errors. Every new scrub run is 
 triggering another repairing.

Interesting. Can you put ZFS on top of GELI with authentication after
filling da0.eli with random random and see if GELI will report
corruption when ZFS will or if only ZFS will report corruption?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpwnenyZmG1u.pgp
Description: PGP signature


Re: ZFS + usb in trouble?

2013-01-21 Thread Pawel Jakub Dawidek
On Sat, Jan 19, 2013 at 11:26:39PM +0900, Alexander Nedotsukov wrote:
 Let's use some space out of it.
 
 #dd if=/dev/zero of=/tank/foo
 ^C250939+0 records in
 250938+0 records out
 128480256 bytes transferred in 30.402453 secs (4225983 bytes/sec)
 
 Oops...
 
 #zpool status
   pool: tank
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
see: http://illumos.org/msg/ZFS-8000-9P
   scan: scrub repaired 5K in 0h0m with 0 errors on Sat Jan 19 23:11:20 2013
 config:
 
   NAMESTATE READ WRITE CKSUM
   tankONLINE   0 0 0
 raidz1-0  ONLINE   0 0 0
   da1 ONLINE   0 0 1
   da2 ONLINE   0 0 0
   da3 ONLINE   0 0 1

This seems like umass/usb issue (hps@ CCed). Can you also try GELI with
data integrity verification?


# geli onetime -a hmac/sha1 -s 4096 /dev/da0

It will report some problems due to GEOM tasting, but ignore them and run:

# dd if=/dev/random of=/dev/da0.eli bs=1m

Once that done try to read the data back and look at the logs to see if
any corruptions are reported.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpRaNMtMVNc2.pgp
Description: PGP signature


Re: kmem_map auto-sizing and size dependencies

2013-01-21 Thread Pawel Jakub Dawidek
On Fri, Jan 18, 2013 at 08:26:04AM -0800, m...@freebsd.org wrote:
   Should it be set to a larger initial value based on min(physical,KVM) space
   available?
 
 It needs to be smaller than the physical space, [...]

Or larger, as the address space can get fragmented and you might not be
able to allocate memory even if you have physical pages available.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgphr9pcEGGow.pgp
Description: PGP signature


Re: ZFS + usb in trouble?

2013-01-21 Thread Pawel Jakub Dawidek
I used wrong Hans' login, resending with the proper e-mail.

On Mon, Jan 21, 2013 at 09:58:52PM +0100, Pawel Jakub Dawidek wrote:
 On Sat, Jan 19, 2013 at 11:26:39PM +0900, Alexander Nedotsukov wrote:
  Let's use some space out of it.
  
  #dd if=/dev/zero of=/tank/foo
  ^C250939+0 records in
  250938+0 records out
  128480256 bytes transferred in 30.402453 secs (4225983 bytes/sec)
  
  Oops...
  
  #zpool status
pool: tank
   state: ONLINE
  status: One or more devices has experienced an unrecoverable error.  An
  attempt was made to correct the error.  Applications are unaffected.
  action: Determine if the device needs to be replaced, and clear the errors
  using 'zpool clear' or replace the device with 'zpool replace'.
 see: http://illumos.org/msg/ZFS-8000-9P
scan: scrub repaired 5K in 0h0m with 0 errors on Sat Jan 19 23:11:20 2013
  config:
  
  NAMESTATE READ WRITE CKSUM
  tankONLINE   0 0 0
raidz1-0  ONLINE   0 0 0
  da1 ONLINE   0 0 1
  da2 ONLINE   0 0 0
  da3 ONLINE   0 0 1
 
 This seems like umass/usb issue (hps@ CCed). Can you also try GELI with
 data integrity verification?
 
 
   # geli onetime -a hmac/sha1 -s 4096 /dev/da0
 
 It will report some problems due to GEOM tasting, but ignore them and run:
 
   # dd if=/dev/random of=/dev/da0.eli bs=1m
 
 Once that done try to read the data back and look at the logs to see if
 any corruptions are reported.
 
 -- 
 Pawel Jakub Dawidek   http://www.wheelsystems.com
 FreeBSD committer http://www.FreeBSD.org
 Am I Evil? Yes, I Am! http://tupytaj.pl


-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgphVtGYdmMmd.pgp
Description: PGP signature


Re: auditdistd config file location

2012-12-13 Thread Pawel Jakub Dawidek
On Tue, Dec 04, 2012 at 04:38:13PM +0100, Johan Hendriks wrote:
 Hello all.
 
 I had some spare time so i thought it was a good moment to look at 
 auditdistd .
 One thing i noticed was the default config file location.
 The man page and the wiki tells me it is /etc/security/auditdistd
 
 I enabled audistd by placing the following in the rc.conf file
 auditdistd_enable=YES
 
 However if i want to start the deamon, it tells me the config file is 
 not present.
 And that is correct, because my config is in /etc/security/ and not /etc/
 
 [root@smb-filer01 ~]# /etc/rc.d/auditdistd start
 /etc/rc.d/auditdistd: WARNING: /etc/auditdistd.conf is not readable.
 /etc/rc.d/auditdistd: WARNING: failed precmd routine for auditdistd
 [root@smb-filer01 ~]#
 
 I think the default location of the config file needs to be modified to 
 match that of the wiki page and the man page.

The one in the man page is correct (/etc/security/auditdistd.conf).
This was last minute change, which I just fixed (r244181).
Thank you for the report!

If you have any other questions, don't hesitate to ask. Feel free to CC
me when sending a question to the mailing list.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp59eN4q5ojB.pgp
Description: PGP signature


Re: [head tinderbox] failure on ia64/ia64

2012-12-01 Thread Pawel Jakub Dawidek
On Sat, Dec 01, 2012 at 03:17:57AM +, FreeBSD Tinderbox wrote:
 cc -c -O2 -pipe -fno-strict-aliasing  -std=c99  -Wall -Wredundant-decls 
 -Wnested-externs -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith 
 -Winline -Wcast-qual  -Wundef -Wno-pointer-sign -fformat-extensions  
 -Wmissing-include-dirs -fdiagnostics-show-option   -nostdinc  -I. -I/src/sys 
 -I/src/sys/contrib/altq -I/src/sys/contrib/ia64/libuwx/src -D_KERNEL 
 -DHAVE_KERNEL_OPTION_HEADERS -include opt_global.h -fno-common 
 -finline-limit=15000 --param inline-unit-growth=100 --param 
 large-function-growth=1000 -fno-builtin -mconstant-gp -ffixed-r13 
 -mfixed-range=f32-f127 -fpic -ffreestanding -Werror  
 /src/sys/kern/vfs_lookup.c
 /src/sys/kern/vfs_lookup.c:214:54: error: macro AUDIT_ARG_UPATH1 passed 4 
 arguments, but takes just 3
 /src/sys/kern/vfs_lookup.c: In function 'namei':
 /src/sys/kern/vfs_lookup.c:214: error: 'AUDIT_ARG_UPATH1' undeclared (first 
 use in this function)
 /src/sys/kern/vfs_lookup.c:214: error: (Each undeclared identifier is 
 reported only once
 /src/sys/kern/vfs_lookup.c:214: error: for each function it appears in.)
 /src/sys/kern/vfs_lookup.c:216:54: error: macro AUDIT_ARG_UPATH2 passed 4 
 arguments, but takes just 3
 /src/sys/kern/vfs_lookup.c:216: error: 'AUDIT_ARG_UPATH2' undeclared (first 
 use in this function)
 *** [vfs_lookup.o] Error code 1
[...]

My recent commits broke the build in two ways, both should be fixed now,
I'm very sorry about that.

The code did compile for me on my development box, but it turned out
source code on my laptop didn't have latest changes. I tried to compile
it on my laptop before the commit, but run into problems with my
environment (my laptop is running HEAD from before the switch to clang
and the compilation was failing very early for other reasons), it is
late at night to figure out the fix (which was trivial: CC=gcc), so I
assumed that if the code builds on dev box it is fine. It wasn't.
My mistake, I should've waited till moring. Once again, sorry about
that!

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpMxZDeqDTng.pgp
Description: PGP signature


Re: [HEADSUP] zfs root pool mounting

2012-12-01 Thread Pawel Jakub Dawidek
On Fri, Nov 30, 2012 at 08:51:48AM +0200, Daniel Braniss wrote:
  
  Recently some changes were made to how a root pool is opened for root 
  filesystem
  mounting.  Previously the root pool had to be present in zpool.cache.  Now 
  it is
  automatically discovered by probing available GEOM providers.
  The new scheme is believed to be more flexible.  For example, it allows to 
  prepare
  a new root pool at one system, then export it and then boot from it on a new
  system without doing any extra/magical steps with zpool.cache.  It could 
  also be
  convenient after zpool split and in some other situations.
  
  The change was introduced via multiple commits, the latest relevant 
  revision in
  head is r243502.  The changes are partially MFC-ed, the remaining parts are
  scheduled to be MFC-ed soon.
  
  I have received a report that the change caused a problem with booting on 
  at least
  one system.  The problem has been identified as an issue in local 
  environment and
  has been fixed.  Please read on to see if you might be affected when you 
  upgrade,
  so that you can avoid any unnecessary surprises.
  
  You might be affected if you ever had a pool named the same as your current 
  root
  pool.  And you still have any disks connected to your system that belonged 
  to that
  pool (in whole or via some partitions).  And that pool was never properly
  destroyed using zpool destroy, but merely abandoned (its disks
  re-purposed/re-partitioned/reused).
  
  If all of the above are true, then I recommend that you run 'zdb -l disk' 
  for
  all suspect disks and their partitions (or just all disks and partitions).  
  If
  this command reports at least one valid ZFS label for a disk or a partition 
  that
  do not belong to any current pool, then the problem may affect you.
  
  The best course is to remove the offending labels.
  
  If you are affected, please follow up to this email.
 
 GREATE
 in a diskless environment, /boot is read only, and the zpool.cache issue
 has been bothering me ever since, there was no way (and I tried) to re route 
 it.

I believe zpool.cache is not required only for root pool anymore and
that you still need it if you want non-root pools to be automatically
configured after reboot. Am I right, Andriy?

Zpool.cache basically tells ZFS which pools should be automatically
imported and file systems mounted. You can have disks in your system
with ZFS pools that should not be auto-imported and zpool.cache is the
way to tell the difference.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpbgMyrchJJv.pgp
Description: PGP signature


Re: ZFS TRIM support committed to HEAD.

2012-09-25 Thread Pawel Jakub Dawidek
On Sun, Sep 23, 2012 at 08:20:41PM -0400, Glen Barber wrote:
 Hi Pawel,
 
 On Sun, Sep 23, 2012 at 09:53:58PM +0200, Pawel Jakub Dawidek wrote:
  FYI, I just committed TRIM support to ZFS, especially useful for
  SSD-only pools. This is something I implemented long time ago, but was
  now motivated to get back to it and commit it finally by some great
  fixes and improvements from the zfsonlinux project (made by Etienne
  Dechamps).
  
 
 Great!  Thanks for this.
 
 Any chance you can document the following sysctls?

None od the kstat sysctls are documented, they emulate kstat framework
from Solaris. We would need to modify a lot of vendor code to document
those in 'sysctl -d' output. It still would be good to have them
documented even elsewhere, eventhough most are rather self-explanatory.

 root@kaos:/root # sysctl -d kstat.zfs.misc.zio_trim
 kstat.zfs.misc.zio_trim: 
 kstat.zfs.misc.zio_trim.zio_trim_bytes: 
 kstat.zfs.misc.zio_trim.zio_trim_success: 
 kstat.zfs.misc.zio_trim.zio_trim_unsupported: 
 kstat.zfs.misc.zio_trim.zio_trim_failed: 

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpYMnkgz0Sv6.pgp
Description: PGP signature


Re: ZFS TRIM support committed to HEAD.

2012-09-24 Thread Pawel Jakub Dawidek
On Mon, Sep 24, 2012 at 04:55:20PM +0100, Steven Hartland wrote:
 - Original Message - 
 From: Bob Bishop r...@gid.co.uk
 To: Pawel Jakub Dawidek p...@freebsd.org
 Cc: freebsd-current@freebsd.org; freebsd...@freebsd.org; Steven 
 Hartland kill...@multiplay.co.uk
 Sent: Monday, September 24, 2012 3:17 PM
 Subject: Re: ZFS TRIM support committed to HEAD.
 
 
  Hi,
  
  On 23 Sep 2012, at 23:25, Pawel Jakub Dawidek wrote:
  
  I have a patch against stable/8, but not stable/9:
  
  http://people.freebsd.org/~pjd/patches/zfstrim8.patch
  
  Running with that in an otherwise 8-STABLE GENERIC amd64 kernel, I'm 
  getting:
  
  kstat.zfs.misc.zio_trim.zio_trim_bytes: 0
  kstat.zfs.misc.zio_trim.zio_trim_success: 0
  kstat.zfs.misc.zio_trim.zio_trim_unsupported: 0
  kstat.zfs.misc.zio_trim.zio_trim_failed: 2742
  
  which doesn't look like it's working. The SSDs are:
  
  ad4: 114473MB ADATA SSD S510 120GB 3.3.2 at ata2-master UDMA100 SATA 3Gb/s
  ad6: 114473MB ADATA SSD S510 120GB 3.3.2 at ata3-master UDMA100 SATA 3Gb/s
  
  Any suggestions? Thanks
 
 Don't think ad supports TRIM, switch to ada (ahci) and you should be good.
 
 Although I'm surprised your seeing that many reported failures as it should
 have disabled it on a pool level after the first few failures.
 
 Is it still increasing?

Note that 'failed' count is increasing, not the 'unsupported' count.
We disable TRIM automatically if we get EOPNOTSUPP and ATA is returning
some other error(s).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpNdLRO6o97M.pgp
Description: PGP signature


Re: ZFS TRIM support committed to HEAD.

2012-09-24 Thread Pawel Jakub Dawidek
On Tue, Sep 25, 2012 at 12:14:24AM +0100, Bob Bishop wrote:
 Hi,
 
 Still seems to be working OK, but:
 
 seagoon# zpool status
   pool: m1
  state: ONLINE
 status: One or more devices has experienced an unrecoverable error.  An
   attempt was made to correct the error.  Applications are unaffected.
 action: Determine if the device needs to be replaced, and clear the errors
   using 'zpool clear' or replace the device with 'zpool replace'.
see: http://www.sun.com/msg/ZFS-8000-9P
   scan: scrub repaired 0 in 0h2m with 0 errors on Mon Sep 24 23:52:08 2012
 config:
 
   NAME   STATE READ WRITE CKSUM
   m1 ONLINE   0 0 0
 mirror-0 ONLINE   0 0 0
   gpt/disk1  ONLINE109M 0 0
   gpt/disk0  ONLINE109M 0 0
 
 errors: No known data errors
 seagoon# sysctl -a |grep _trim
 kstat.zfs.misc.zio_trim.zio_trim_bytes: 228731904
 kstat.zfs.misc.zio_trim.zio_trim_success: 19406
 kstat.zfs.misc.zio_trim.zio_trim_unsupported: 0
 kstat.zfs.misc.zio_trim.zio_trim_failed: 0
 seagoon# 
 
 No device errors logged in messages, and scrub comes up clean as you can see. 
 The read error count is increasing, but otherwise everything appears to work 
 OK.

Are you sure your world and kernel are in sync? I remember seeing
similar problem when my userland was updated.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpEsD6sK3yL7.pgp
Description: PGP signature


ZFS TRIM support committed to HEAD.

2012-09-23 Thread Pawel Jakub Dawidek
FYI, I just committed TRIM support to ZFS, especially useful for
SSD-only pools. This is something I implemented long time ago, but was
now motivated to get back to it and commit it finally by some great
fixes and improvements from the zfsonlinux project (made by Etienne
Dechamps).

Note that this functionality is turned off by default for now.
To turn it on you need to add vfs.zfs.trim_disable=0 to
/boot/loader.conf. You can see some statistics under
kstat.zfs.misc.zio_trim sysctl.

Big thanks to multiplay.co.uk for sponsoring this work!

BTW. If you find this useful, so tell me about it during EuroBSDcon 2012
in a month that will be held this year in Warsaw, Poland:

http://eurobsdcon.org

:)

On Sun, Sep 23, 2012 at 07:40:58PM +, Pawel Jakub Dawidek wrote:
 Author: pjd
 Date: Sun Sep 23 19:40:58 2012
 New Revision: 240868
 URL: http://svn.freebsd.org/changeset/base/240868
 
 Log:
   Add TRIM support.
   
   The code builds a map of regions that were freed. On every write the
   code consults the map and eventually removes ranges that were freed
   before, but are now overwritten.
   
   Freed blocks are not TRIMed immediately. There is a tunable that defines
   how many txg we should wait with TRIMming freed blocks (64 by default).
   
   There is a low priority thread that TRIMs ranges when the time comes.
   During TRIM we keep in-flight ranges on a list to detect colliding
   writes - we have to delay writes that collide with in-flight TRIMs in
   case something will be reordered and write will reached the disk before
   the TRIM. We don't have to do the same for in-flight writes, as
   colliding writes just remove ranges to TRIM.
   
   Sponsored by:   multiplay.co.uk
   
   This work includes some important fixes and some improvements obtained
   from the zfsonlinux project, including TRIMming entire vdevs on pool
   create/add/attach and on pool import for spare and cache vdevs.
   
   Obtained from:  zfsonlinux
   Submitted by:   Etienne Dechamps etienne.decha...@ovh.net

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpuE8lmhRoHn.pgp
Description: PGP signature


Re: ZFS TRIM support committed to HEAD.

2012-09-23 Thread Pawel Jakub Dawidek
On Sun, Sep 23, 2012 at 10:24:53PM +0100, Bob Bishop wrote:
 Hi,
 
 On 23 Sep 2012, at 20:53, Pawel Jakub Dawidek wrote:
 
  FYI, I just committed TRIM support to ZFS, especially useful for
  SSD-only pools. [etc]
 
 Is any of this applicable to -STABLE or 8.x?

I have a patch against stable/8, but not stable/9:

http://people.freebsd.org/~pjd/patches/zfstrim8.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpO35odYgQfi.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Pawel Jakub Dawidek
On Thu, Jun 28, 2012 at 08:33:17AM -0700, Marcel Moolenaar wrote:
 
 On Jun 28, 2012, at 3:10 AM, Stefan Esser wrote:
  
  All of the above is ugly, U'm afraid :(
 
 Indeed. The only sane way is to put the metadata in a partition of its own.
 Every compliant OS will respect that and consequently will not scribble over
 the data unintentionally. Any other scheme that puts valuable data in some
 undocumented or unregistered location is violating the GPT spec right away
 and is susceptible to being clobbered unintentionally.

If the user runs:

# gpart create -s GPT /dev/mirror/foo

for me it is obvious that he wants to partition the mirror device and
not individual disks. Because the mirror was configured earlier, do you
expect gmirror to somehow detect that someone is writting GPT metadata
later and magically place GPT metadata on the raw disk and move mirror's
metadata to some magic partition? Not to mention that the mirror itself
doesn't have to be configured on top of raw disks. And not to mention
that the mirror may never be partitioned.

If GPT in your opinion is limited only to raw disks then I guess the
best way to fix that is to refuse to configure GPT on anything except
raw disks (which was already proposed by Andrey?). In my opinion this is
unacceptable, but I think this is what you are suggesting.

One of the GEOM design goals was to be flexible. Let the user decide in
what order he wants to configure various layers. How do you know that in
every possible scenerio software mirroring should come after
partitioning and encryption after mirroring? Why can't we provide
flexible tools to the user and let him decide? Maybe GPT nesting
violates standards, but why can't we support it as an extention, really?

I recognize the need to warn users if they use FreeBSD-specific
features. We do that with non-standard APIs. So how about this.

Let's modify gpart(8) to print a warning if GPT is configured on
something else than raw disk. Let's the warning say that such
configuration is non-standard and problems are expected if the disk is
shared between other OSes.

In my opinion that's fair.

With such a warning in place, I think we can allow users to decide on
their own if they really want that or not. Then, we can also improve
FreeBSD boot loader to play nice with FreeBSD-specific extensions.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpk4jIfg3rJt.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-28 Thread Pawel Jakub Dawidek
On Thu, Jun 28, 2012 at 02:54:43PM -0700, Marcel Moolenaar wrote:
 On Jun 28, 2012, at 12:49 PM, Alexander Leidinger wrote:
  Or are you suggesting to
  convince all BIOS vendors to include the ability to boot from some kind
  of FreeBSD private partitioning scheme (not MBR as it is not
  suitable, not GPT as you are not OK to use it on a gmirror)?
 
 I would be having less problems if the mirroring didn't force the backup
 GPT header in anything but the last sector. [...]

GPT backup header is placed in the last sector of the mirror device,
just like the user asked. Gmirror doesn't force anything. User decides
to put GPT partitioning on the mirror device instead of raw disk.
Gmirror doesn't even know and doesn't have to know how the user uses
data area on the mirror device.

 [...] If the metadata was somewhere
 else, then we wouldn't need to kluge various places to deal with the
 ambiguity and visible interoperability problems of the various tools and
 OSes. [...]

Where is somewhere else, exactly?

If somewhere else on this disk, then where? At the begining of the disk?
Then you would complain that it keeps metadata where the primary header
should be located and also MBR metadata, BSDlabel metadata, etc.
Somewhere in the middle of the disk? Some future GPTng may want to use
the same spot, but also gmirror-unaware boot loader will see corrupted
data (shifted by one sector). Come on...

If somewhere else is not on this disk, then I'm sorry, but this is
totally impractical. Disks are the place you store stuff. In 99% of the
cases there is no other place to store it, but the disk itself. Should
we ask users to use additional disk to keep mirror's metadata?

 [...] Thus, it's not that I object to the mirroring per se, just to the
 mirroring as it is currently implemented with gmirror.

Do you know software RAID (=1) or volume manager that doesn't keep
metadata on component disks?

PS. We are discussing two totally different things here:
1. Is placing GPT on anything but raw disk violates the spec? I can
   agree that it does and I'm happy with gpart(8) growing a warning.
2. How to do software mirroring. Besides trying really hard I'm not sure
   what alternative are you proposing. Could you be more specific and
   describe how gmirror should be implemented in your opinion?

  What about multipathing? In case the disk is attached via two paths but
  multipath is not enabled, the OS sees the same disk (and the same
  identical unique disk identifier) multiple times. Is this a violation
  of the spec too?
 
 It's the same disk, isn't it? The OS can actually use the property
 of the ID to infer that it has already seen this disk and not create
 multiple device nodes.

You cannot trust some id that is found on disk to be unique, as all
your assumptions break when the user decides to dd(1)-copy content of
this disk to another disk, for example.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp4nneOO8jiW.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 08:22:25AM -0400, John Baldwin wrote:
  I don't think so. Most common case is to configure partitions on top of
  a mirror. Mirroring partitions is less common. Mostly because of
  hardware RAIDs being popular. You don't expect hardware RAID vendor to
  mirror partitions. Partition editors for other OS's won't work, but only
  because they don't support gmirror. If they wouldn't recognize and
  support some hardware (or pseudo-hardware) RAIDs there will be the same
  problem.
 
 Hardware RAIDs hide the metadata from the disk that the BIOS (and disk
 editors) see.  Thus, putting a GPT on a hardware RAID volume works fine
 as the logical volume is always seen by all OS's consistently. [...]

Only if you won't connect this disk to a different controller.

 [...] The same
 is even true of the software RAID that graid supports since the metadata
 is defined by the vendor and thus the logical volume is always seen other
 OS's consistently.

But is it seen without metadata by the boot loader?

What I'm trying to say is that it is fair to expect from the user to not
use gmirror-configured disk on different OS. If the user wants to use
this disk in different OS then he has to use format that is recognized
by both.

Because gmirror is supported by FreeBSD we should improve the support by
teaching boot loader about it. Pretending gmirror is special and
recommending to mirror partitions with it instead of raw disks is not
the solution.

I really can't see how gmirror is different in this regard from any
other software RAID or volume manager. If you try to use disk that
contains unrecognized metadata the behaviour is undefined (but hopefully
not a panic).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpQmYWVuPgKs.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:37:11AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 10:37 AM, John Baldwin wrote:
  
  GPT really wants the backup header at the last LBA.  I know you can set it, 
  but I've interpreted that as a way to see if the primary header is correct 
  or 
  not.  It seems to me that GPT tables created in this fashion (inside a GEOM 
  provider) will not work properly with partition editors for other OS's.  
  I'm 
  hesitant to encourage the use of this as I do think putting GPT inside of a 
  gmirror violates the GPT spec.
 
 Agreed.

Guys. This doesn't violate the GPT spec in any way. The spec is
narrow-minded if it talks only about raw disks, but you should think
about gmirror as pseudo-hardware RAID. That's all. If putting GPT on top
of RAID array is spec violation, then I guess we just have to live with it.

 While it is a nice trick to use the last sector for meta data, it does
 create 2 problems. 1 is mentioned above. [...]

It doesn't really matter where gmirror puts its metadata. If gmirror
would keep its metadata in the first sector, gpart/gpt will find its
metadata in the last sector and will complain about missing primary
header.

 [...] The second is that when there's
 different metadata in the first *and* the last sector, you can't decide
 which is to take precedence without also looking at the other and know
 how to interpret it. We have not solved this second problem at all.  We
 do get reports about the problems though. At best we're handwaving or
 kluging.

This is different kind of problem. It took me a while to realize that,
but now I know:)

The real problem is that not all metadata formats are suitable for
autodetection. That's all.

The metadata I use in my GEOM classes play nice with autodetection.
The solution is very easy - keep size of the disk device within metadata.
This allows gmirror to figure out if it is configured on raw disk, last
slice or last partition within last slice, etc.
If GPT would keep disk size in its metadata the second problem you
mentioned would not exist. And to be honest GPT kinda does that by having
backup header's LBA stored in the primary header. And this is fine as
long the primary header is valid.

The same problem is with things like UFS labels. There is no way to
properly support them using GEOM autodetection, because there is no
provider size in UFS superblock. UFS superblock contains file system
size, but it is not the same, as one can create smaller file system than
the underlying disk device.

 I think it's unwise to depend on FreeBSD-specific extensions or features
 in industry-standard partitioning schemes and as such make the use of
 foreign tools hard if not impossible.

If you plan to use the given disk with FreeBSD only, what's the problem?
Partitioning is not the end of the world. Even if you use
industry-standard partitioning schemes what file system are you going
to use to actually access your data? FAT? Of course if you do share your
disk between various OSes then probably your best bet is to use MBR or
GPT on raw disk and FAT file system. But if you use your disk with
FreeBSD only, then I see no reason to not to leverage FreeBSD-specific
features (be it gmirror, geli or zfs).

 A much more flexible approach is to support out-of-band configuration
 data. This allows us to mirror GPT disks without having to become non-
 standard as it removes the need to use the last sector for meta-data.
 The ability to construct GEOM hierarchies unambiguously is very
 important and our current approach has proven to not deliver on that.
 This is actually impacting existing FreeBSD consumers already, like
 Juniper. So, se should not go deeper into this rabbit hole. We should
 finally solve this problem for real...

Marcel, nothing stops anyone from implementing GEOM mirror class that
uses no on-disk metadata. GEOM is not a limiting factor here. GEOM does
provide mechanism for autoconfiguration, but it is totally optional and
GEOM class might choose not to use it.

As an example you can take a look at two other GEOM classes of mine:
gconcat(8) and gstripe(8). You can use 'label' subcommand to store
metadata on component disks, which will take advantage of  GEOM
autodetection and autoconfiguration. You can also use 'create'
subcommand to create ad hoc provider that stores no metadata and makes
use of entire disks, which also means it won't be automatically created
on next boot.

For Juniper it might be more handy to use out-of-band configuration as
you know the hardware you are running on, so you know where the disks
are exactly, etc. My company build appliances too, so I have been there.
For most of our users automatic configuration is simply better, as they
can shuffle disks around and not wonder if the system will boot or not.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http

Re: [CFC/CFT] large changes in the loader(8) code

2012-06-27 Thread Pawel Jakub Dawidek
On Wed, Jun 27, 2012 at 10:45:35AM -0700, Marcel Moolenaar wrote:
 
 On Jun 26, 2012, at 2:43 PM, Pawel Jakub Dawidek wrote:
  
  As for sharing disk with other OS. If you share the disk with OS that
  doesn't support gmirror, you shouldn't use gmirror in the first place.
  You probably want to use only formats that are recognized by all your
  OSes.
 
 This statement is ridicuous by virtue of not being in touch with
 reality and by making gmirror useless for such wide range of cases
 that one can question why we have it at all.
 
 Put differently: a mirroring class is a fairly basic and useful thing
 to have. Limiting it's use is nothing but artificial and follows from
 having to use the underlying provider to store metadata. This then
 changes the view of the underlying providing to consumers above gmirror
 in a way that makes the presence or absence of gmirror visible.
 Solving the visibility problem makes gmirror useful all the time.
 I see that as a better way of looking at it than simply blurting out
 that you shouldn't use gmirror when certain awkward and artifical
 conditions apply.

I'm sorry, Marcel, but what you describe here has nothing to do with
reality. To be able to implement realiable mirroring you have to use
on-disk metadata. There is no way around that. You can implement
non-redundant GEOM classes without using on-disk metadata, but
out-of-band configuration in case of mirroring is simply naive. How do
you detect that components are out of sync, for example?

And when it comes to visablity. Are you suggesting that gmirror should
present entire underlying provider to upper layers? Including its
metadata? I hope not, because we went through that hell already
(remember skipping first 16 sectors by UFS, as BSDlabel metadata might
be there? The same for swap?).
I think I did pretty good job by making the metadata as simple as
possible - I use exactly one sector at the end of the target device.
I'm really having a hard time to think of a simpler format.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpwcpFwrh1lk.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
 Hi All,
 
 Some time ago i have started reading the code in the sys/boot.
 Especially i'm interested in the partition tables handling.
 I found several problems:
 1. There are several copies of the same code in the libi386/biosdisk.c
 and common/disk.c, and partially libpc98/biosdisk.c.
 2. ZFS probing is very slow, because the ZFS code doesn't know how many
 disks and partitions the system has:
   http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
   http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
 3. The GPT support doesn't check CRC and even doesn't know anything
 about the secondary GPT header/table.

Just a quick note here. At some point when I was adding GPT attributes
to allow for test starts I greatly improved, at least parts of, the GPT
implementation. I did implement support for both CRC checksum
verification and fallback to backup GPT header when primary is broken.
And the code is still in sys/boot/common/gpt.c. So my question would be
what do you mean by this sentence?

 So, i have created the branch and committed the changes:
   http://svnweb.freebsd.org/base/user/ae/bootcode/
 The patch is here:
   http://people.freebsd.org/~ae/boot.diff
 
 What i already did:
 1. The partition tables handling now is machine independent,
 and it is compatible with the kernel's GEOM_PART implementation.
 There is new API for disk drivers in the loader to get information
 about partitions and tables:
 common/Makefile.inc
   common/part.c
   common/part.h
 
 2. The similar and general code from the disk drivers merged in the
 disk.c:
 common/disk.c
 common/disk.h
 i386/libi386/libi386.h
 i386/libi386/biosdisk.c
 userboot/test/test.c
 userboot/userboot/userboot_disk.c
 userboot/userboot.h
 3. ZFS code now uses new API and probing on the systems with many disks
 should be greatly increased:
 zfs/zfs.c
 i386/loader/main.c
 4. The gptboot now searches the backup GPT header in the previous sectors,
 when it finds the GEOM:: signature in the last sector. PMBR code also
 tries to do the same:
 common/gpt.c
 i386/pmbr/pmbr.s
 
 5. Also the pmbr image now contains one fake partition record.
 When several first sectors are damaged the kernel can't detect GPT
 (see RECOVERING section in the gpart(8)). We can restore PMBR with dd(1)
 command, but the old pmbr image has an empty partition table and
 loader doesn't able to boot from GPT, when there is no partition record
 in the PMBR. Now it will be able. When pmbr is installed via 'gpart bootcode'
 command, the kernel correctly modifies this partition record. So, this is only
 for the first rescue step.
 
 6. I have changed userboot interface. I guess there is none consumers except
 the one test program. But if it isn't that, i can make it compatible.
 
 Any comments are welcome.
 
 -- 
 WBR, Andrey V. Elsukov
 
 



-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpWMg9tnH9tN.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 06:01:26PM +0400, Andrey V. Elsukov wrote:
 On 26.06.2012 16:57, Pawel Jakub Dawidek wrote:
  On Tue, Jun 26, 2012 at 04:50:36PM +0400, Andrey V. Elsukov wrote:
  Hi All,
 
  Some time ago i have started reading the code in the sys/boot.
  Especially i'm interested in the partition tables handling.
  I found several problems:
  1. There are several copies of the same code in the libi386/biosdisk.c
  and common/disk.c, and partially libpc98/biosdisk.c.
  2. ZFS probing is very slow, because the ZFS code doesn't know how many
  disks and partitions the system has:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=148296
 http://www.freebsd.org/cgi/query-pr.cgi?pr=161897
  3. The GPT support doesn't check CRC and even doesn't know anything
  about the secondary GPT header/table.
  
  Just a quick note here. At some point when I was adding GPT attributes
  to allow for test starts I greatly improved, at least parts of, the GPT
  implementation. I did implement support for both CRC checksum
  verification and fallback to backup GPT header when primary is broken.
  And the code is still in sys/boot/common/gpt.c. So my question would be
  what do you mean by this sentence?
 
 Yes, gptboot does that, but the loader/zfsloader doesn't. So there might
 be a situation when gptboot does boot, but loader(8) can't.

I see. I don't know if I'll find time for a proper review, but it is
really great that you are working on cleaning up this huge mess.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgp62kXgtS21u.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 01:37:11PM -0400, John Baldwin wrote:
  4. The gptboot now searches the backup GPT header in the previous sectors,
  when it finds the GEOM:: signature in the last sector. PMBR code also
  tries to do the same:
  common/gpt.c
  i386/pmbr/pmbr.s
 
 GPT really wants the backup header at the last LBA.  I know you can set it, 
 but I've interpreted that as a way to see if the primary header is correct or 
 not. [...]

My interpretation is different: The way to verify if the header is valid
is to check its checksum, not to check if the backup header location in
the primary header points at the last LBA.

Of course if primary header's checksum is incorrect it is hard to trust
that the backup header location is correct. And we need the backup
header when the primary header is invalid...

 [...] It seems to me that GPT tables created in this fashion (inside a GEOM 
 provider) will not work properly with partition editors for other OS's.  I'm 
 hesitant to encourage the use of this as I do think putting GPT inside of a 
 gmirror violates the GPT spec.

I don't think so. Most common case is to configure partitions on top of
a mirror. Mirroring partitions is less common. Mostly because of
hardware RAIDs being popular. You don't expect hardware RAID vendor to
mirror partitions. Partition editors for other OS's won't work, but only
because they don't support gmirror. If they wouldn't recognize and
support some hardware (or pseudo-hardware) RAIDs there will be the same
problem.

In other words, IMHO, our problem is that FreeBSD's boot code doesn't
recognize/support gmirror's metadata. What Andrey is proposing is to
recognize the metadata and act accordingly - in case of a gmirror we
simply need to skip it.

In the future we will have the same problem with graid - until we add
support for it to the boot code, we won't be able to boot from it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpxIGC3e4HIp.pgp
Description: PGP signature


Re: [CFC/CFT] large changes in the loader(8) code

2012-06-26 Thread Pawel Jakub Dawidek
On Tue, Jun 26, 2012 at 02:41:31PM -0700, Kevin Oberman wrote:
 Long ago I saw a proposal to create a dedicated partition on GPT to
 hold the metadata. With the large number of partitions available on
 GPT, tying up one just for GEOM seems like a low price and it moves
 the device GEOM out of the realm of FreeBSD unique and subject to
 serious issues when/if a disk is shared with some other OS. I have
 seen little comment on this and have never seen any argument that that
 it could not work.
 
 I think this is an issue that will continue to bite users unless it is fixed.

I don't really see how dedicating a partition for metadata can work or
is good idea, sorry.

As for sharing disk with other OS. If you share the disk with OS that
doesn't support gmirror, you shouldn't use gmirror in the first place.
You probably want to use only formats that are recognized by all your
OSes.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgprySQFgONfF.pgp
Description: PGP signature


Re: trim/discard success story

2012-04-06 Thread Pawel Jakub Dawidek
On Tue, Apr 03, 2012 at 06:18:43PM -0700, Julian Elischer wrote:
 for flash drives this is great news..
 Now if ZFS would get trim support, that too would be great.

I'm working on it.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpnW1D0zD64a.pgp
Description: PGP signature


Re: gptboot rewrite, bootonce, etc.

2012-01-31 Thread Pawel Jakub Dawidek
On Tue, Jan 31, 2012 at 06:49:51PM +0300, Andrey Fesenko wrote:
 This work if use ZFS?
 My issues Root on ZFS  GPT and boot to ufs partition
 http://lists.freebsd.org/pipermail/freebsd-fs/2012-January/013514.html
 
 I test
 # gpart show
 =   34  625142381  ada0  GPT  (298G)
  34128 1  freebsd-boot  (64k)
 162   26621952 2  freebsd-ufs  [bootonce,bootme]  (12G)
266221148388608 3  freebsd-swap  (4.0G)
35010722  590131693 4  freebsd-zfs  (281G)
 
 system ada0p2 not boot.

This functionality only works with UFS (gptboot).

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpgkHKPPasLZ.pgp
Description: PGP signature


Snapshot listing speedup.

2012-01-22 Thread Pawel Jakub Dawidek
If you have many snapshots and you were complaining that listing them
takes a lot of time, you may find the commit below useful.

It only works if your listing is limited to snapshot names and you want
to sort also by snapshot name (by default snapshots are sorted by
creation time).

On Sat, Jan 21, 2012 at 09:12:53PM +, Pawel Jakub Dawidek wrote:
 Author: pjd
 Date: Sat Jan 21 21:12:53 2012
 New Revision: 230438
 URL: http://svn.freebsd.org/changeset/base/230438
 
 Log:
   Dramatically optimize listing snapshots when user requests only snapshot
   names and wants to sort them by name, ie. when executes:
   
   # zfs list -t snapshot -o name -s name
   
   Because only name is needed we don't have to read all snapshot properties.
   
   Below you can find how long does it take to list 34509 snapshots from a 
 single
   disk pool before and after this change with cold and warm cache:
   
   before:
   
   # time zfs list -t snapshot -o name -s name  /dev/null
   cold cache: 525s
   warm cache: 218s
   
   after:
   
   # time zfs list -t snapshot -o name -s name  /dev/null
   cold cache: 1.7s
   warm cache: 1.1s

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://tupytaj.pl


pgpCh8Ntx3o73.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-12-29 Thread Pawel Jakub Dawidek
On Thu, Dec 29, 2011 at 11:12:59AM -0500, John Baldwin wrote:
 On Sunday, December 25, 2011 11:01:33 am Коньков Евгений wrote:
  Здравствуйте, John.
  
  Вы писали 20 декабря 2011 г., 16:52:44:
  
  JB On Saturday, December 17, 2011 6:21:27 pm Pawel Jakub Dawidek wrote:
   On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote:
An update.  I've sent Pawel a testing patch to see if my hypothesis is 
correct
(www.freebsd.org/~jhb/patches/tcp_negwin_test.patch).  If it is then I 
intend
to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix.
   
   Unfortunately it paniced today. Take a look at:
   
 http://people.freebsd.org/~pjd/misc/tcp_panic.jpg
  
  JB Ok, the one use case I was worried about is happening regularly before 
  your
  JB panic, so that is good.  Can you use gdb to figure out which call to
  JB tcp_output() is actually panic'ing?  I wonder if it is this case:
  
  JB /*
  JB  * Return any desired output.
  JB  */
  JB if (needoutput || (tp-t_flags  TF_ACKNOW)) {
  JB (void) tcp_output(tp);
  JB /* XXX: Debug */
  JB KASSERT(SEQ_GEQ(tp-rcv_adv, tp-rcv_nxt),
  JB (tcp_input: negative window after ACK));
  
  JB And if 'needoutput' is true, but TF_ACKNOW is not set, and tcp_output() 
  decides
  JB to not do anything.  I've updated tcp_negwin_test.patch to not panic if 
  that call
  JB to tcp_output() doesn't actually send a packet.  Please re-test.
  
  
  # uname -a
  FreeBSD meta-up 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #4: Sat Dec 24 
  13:59:20 EET 2011 @:/usr/obj/usr/src/sys/KES_KERN_v10  i386
  
  rebooting once per day. Now I compile kernel with debug options.
  Can you advice me which and where I find debug info when it will
  reboting next time? so I can help to debug problem
 
 Are you using the patch at the URL above (tcp_negwin_test.patch)?  If not,
 can you try applying that patch and seeing if you still get any panics?

I applied 1.5 days ago, so far now panics and no other messages.
I modified the patch a bit to not panic, but print a message when panic
was suppose to happen. This box is too valuable for me to panic it too
often. Because there were no debug messages I understand that the
scenerio didn't happen yet and not that the problem is fixed, right?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgp9eZdmwuxI5.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-12-17 Thread Pawel Jakub Dawidek
On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote:
 An update.  I've sent Pawel a testing patch to see if my hypothesis is correct
 (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch).  If it is then I intend
 to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix.

Unfortunately it paniced today. Take a look at:

http://people.freebsd.org/~pjd/misc/tcp_panic.jpg

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpdhigg3fyKF.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-12-14 Thread Pawel Jakub Dawidek
On Mon, Dec 12, 2011 at 11:00:23AM -0500, John Baldwin wrote:
 An update.  I've sent Pawel a testing patch to see if my hypothesis is correct
 (www.freebsd.org/~jhb/patches/tcp_negwin_test.patch).  If it is then I intend
 to commit www.freebsd.org/~jhb/patches/tcp_negwin2.patch as the fix.

Sorry for the delay, but I'm just rebooting the box that triggered the
panic with John's patch. I should know more in a day or two.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgp4LFFZCMTh1.pgp
Description: PGP signature


Re: zfs-related(?) panic in cache_enter: wrong vnode type

2011-12-08 Thread Pawel Jakub Dawidek
On Wed, Dec 07, 2011 at 06:50:35PM +0200, Andriy Gapon wrote:
 (kgdb) bt
 #0  doadump (textdump=1) at pcpu.h:224
 #1  0x804f6d3b in kern_reboot (howto=260) at
 /usr/src/sys/kern/kern_shutdown.c:447
 #2  0x804f63e9 in panic (fmt=0x104 Address 0x104 out of bounds) at
 /usr/src/sys/kern/kern_shutdown.c:635
 #3  0x80585f46 in cache_enter (dvp=0xfe003d4763c0,
 vp=0xfe0142517000, cnp=0xff82393b3708) at 
 /usr/src/sys/kern/vfs_cache.c:726
 #4  0x81a90900 in zfs_lookup (dvp=0xfe003d4763c0,
 nm=0xff82393b3140 .., vpp=0xff82393b36e0, cnp=0xff82393b3708,
 nameiop=0, cr=0xfe0042e88100, td=0xfe000fdfa480,
 flags=0) at
 /usr/src/sys/modules/zfs/../../cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:1470

Which FreeBSD version is it? Because lines doesn't seem to match neither
to HEAD nor to stable/8. It would be best of you could send me few lines
around zfs_vnops.c:1470 and vfs_cache.c:726.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpI7RzOpvuwY.pgp
Description: PGP signature


Re: [RFC] winbond watchdog driver for FreeBSD/i386 and FreeBSD/amd64

2011-12-07 Thread Pawel Jakub Dawidek
On Tue, Jun 28, 2011 at 03:32:41PM -0700, Xin LI wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256
 
 Hi,
 
 I'd like to request for comments on the attached driver, which supports
 watchdogs on several Winbond super I/O chip models and have been tested
 on a few of recent Supermicro motherboards.

Is there any reason this is not yet committed? Please commit, pretty
please.

One minor problem I found is described below. Other than that it works
for me, thanks!

[...]
 +static void
 +winbondwd_event(void *arg, unsigned int cmd, int *error)
 +{
 + struct winbondwd_softc *sc = arg;
 + unsigned char rtimeout;
 + uint64_t timeout;
 +
 + if (cmd == 0)
 + winbondwd_set_timeout(sc, 0);
 + else {
 + timeout = (uint64_t)1  (cmd  WD_INTERVAL);
 + if (timeout  (uint64_t)0xff * 1000 * 1000 * 1000) {
 + rtimeout = timeout / (1000 * 1000 * 1000);
 + if (rtimeout == 0)
 + rtimeout = 0xff;
 + winbondwd_set_timeout(sc, rtimeout);
 + } else {
 + device_printf(sc-device,
 + Value %u too big, disabling\n, cmd  WD_INTERVAL);
 + /* Proposed timeout can not be satisified */
 + winbondwd_set_timeout(sc, 0);
 + }

You should add '*error = 0;' right here. Without it we return some
random error, in my case watchdgod is able to arm the watchdog but exits
when trying to pat it, as it gets an error, which leads to a reset short
aferwards.

 + }
 +}

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpOBGKr7YKAI.pgp
Description: PGP signature


Re: zfs i/o hangs on 9-PRERELEASE

2011-11-28 Thread Pawel Jakub Dawidek
On Fri, Nov 25, 2011 at 01:20:01PM -0600, Mark Felder wrote:
 13:14:32 nas:~  uname -a
 FreeBSD nas.feld.me 9.0-PRERELEASE FreeBSD 9.0-PRERELEASE #3 r227971M: 
 Fri Nov 25 10:07:48 CST 2011 
 r...@nas.feld.me:/usr/obj/tank/svn/sys/GENERIC  amd64
 
 This seemed to start happening sometime after RC1. I tried 8-STABLE and 
 it's happening there too right now. I think whatever caused this was 
 MFC'd. I've also reproduced this on completely different hardware 
 running a single disk ZFS pool.
 
 
 I'm getting this output in dmesg after these hangs I keep seeing.

Mark, those backtrace are not related to ZFS, but to PF. Not sure if
they are at all related to your hangs. Most cases where ZFS I/O seems to
hang are hardware problems, where I/O requests are not completed.

'procstat -kk -a' output might be useful once the hang happens.

 uma_zalloc_arg: zone pfrktable with the following non-sleepable locks 
 held:
 exclusive sleep mutex pf task mtx (pf task mtx) r = 0 
 (0x8199af20) locked @ 
 /tank/svn/sys/modules/pf/../../contrib/pf/net/pf_ioctl.c:1589
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 kdb_backtrace() at kdb_backtrace+0x37
 _witness_debugger() at _witness_debugger+0x2e
 witness_warn() at witness_warn+0x2c4
 uma_zalloc_arg() at uma_zalloc_arg+0x335
 pfr_create_ktable() at pfr_create_ktable+0xd8
 pfr_ina_define() at pfr_ina_define+0x12b
 pfioctl() at pfioctl+0x1c5a
 devfs_ioctl_f() at devfs_ioctl_f+0x7a
 kern_ioctl() at kern_ioctl+0xcd
 sys_ioctl() at sys_ioctl+0xfd
 amd64_syscall() at amd64_syscall+0x3ac
 Xfast_syscall() at Xfast_syscall+0xf7
 --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800da711c, rsp = 
 0x7fff9d28, rbp = 0x7fffa1f0 ---

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgp98vrUuromO.pgp
Description: PGP signature


Are there any users of accelerated asymmetric cryptography?

2011-11-12 Thread Pawel Jakub Dawidek
Hi.

I'm in the process of rewriting large parts of the opencrypto framework
and was wondering if there are any users that actually make use of
asymmetric cryptography. From what I see only ubsec(4) driver supports
it and from what I remember those cards are hard to find.

If there are no users of this functionality I plan to remove support for
asym crypto from the opencrypto framework. It is just too hard to maintain
this without test hardware and also quite pointless if nobody uses it.
If some users can be found and they really need it, we have two options:
1. Keep asym crypto around, but whoever steps up have to be ready to
   test patches.
2. Use alternative way of accessing it (eg. via direct ioctl to ubsec
   device), at least until we start to support more cards.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpfM9n21ZWWb.pgp
Description: PGP signature


Re: ZFS: invalid zap_type=134218628

2011-11-11 Thread Pawel Jakub Dawidek
On Tue, Nov 08, 2011 at 08:02:13PM +0100, Sebastian Chmielewski wrote:
 I've got following error after upgrading my 9.0-RC1 to RC2 using source and
 got this error at loader prompt:
 
 ZFS: invalid zap_type=134218628
 
 when I tried zfsboottest from 9.0-STABLE it failed
 ./zfsboottest /boot/kernel/kernel /dev/ada0p4
 returns pool status correctly but ends with the same message
 
 zfsboottest from 10.0-CURRENT returns pool status, zfsboottest.sh zroot
 printed OK message at the end
 (very useful zfsboottest.sh wasn't yet merged from current to stable).
 I've installed gptzfsboot, zfsloader, loader from -CURRENT and it boots
 fine.
 
 I'm running ZFS on GPT partitions, zfs version 28 with dedup on.
 File system can be successfully mount using FreeBSD memstick.
 
 I think that patches for boot should be merged to 9.0-RELEASE otherwise
 users may have the same problems I had.

Yes, I plan to include those fixes in 9.0-RELEASE.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpWkozxezUrm.pgp
Description: PGP signature


Re: RFC: GEOM MULTIPATH rewrite

2011-11-01 Thread Pawel Jakub Dawidek
On Mon, Oct 31, 2011 at 10:10:14PM +0200, Alexander Motin wrote:
 Hi.
 
 Attempt to fix some GEOM MULTIPATH issues made me almost rewrite it. So
 I would like to present my results and request for testing and feedback.
 
 The main changes:
  - Improved locking and destruction process to fix crashes in many cases.
  - Improved automatic configuration method to make it safe by reading
 metadata back from all specified paths after writing to one.
  - Added provider size check to reduce chance of conflict with other
 GEOM classes.
  - Added manual configuration method without using on-disk metadata.
  - Added add and remove commands to manage paths manually.
  - Failed paths no longer dropped from GEOM, but only marked as FAIL and
 excluded from I/O operations.
  - Automatically restore failed paths when all others paths are marked
 as failed, for example, because of device-caused (not transport) errors.
  - Added fail and restore commands to manually control FAIL flag.
  - GEOM is now destroyed on last provider disconnection. IMHO it is
 right to do if device was completely removed.
  - Added optional Active/Active mode support. Unlike Active/Passive
 mode, load evenly distributed between all working paths. If supported by
 device, it allows to significantly improve performance, utilizing
 bandwidth of all paths. It is controlled by -A option during creation.
 Disabled by default now.
  - Improved `status` and `list` commands output.
 
 Latest patch can be found here:
 http://people.freebsd.org/~mav/gmultipath4.patch
 
 Feedbacks are welcome!
 
 Sponsored by: iXsystems, Inc.

There are two possible issues that comes to my mind, not sure if you
address them.

1. When configuration is based on on-disk metadata, GEOM spoil/taste is
   not fully helpful - if you have two paths: da0 and da1 and I write
   to da0, gmultipath won't be informed by GEOM that da1 changed as well.
   One solution is to basically keep all paths open exclusively all the
   time, even if gmultipath provider is not open or emulate spoil/taste
   for other paths if any path was modified.

2. In active/active mode do you do anything to handle possible
   reordering? Ie. if you have overlapping writes and send both of them
   using different paths, you cannot be sure that order will be
   preserved. Most of the time that's not a problem, as file systems
   rarely if at all send overlapping writes to device, but this is weak
   assumption.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpJEqgC1I0SO.pgp
Description: PGP signature


Re: umass(4) regression in 9.0-RC1.

2011-10-28 Thread Pawel Jakub Dawidek
On Fri, Oct 28, 2011 at 09:11:42AM +0200, Hans Petter Selasky wrote:
 On Thursday 27 October 2011 20:51:15 Pawel Jakub Dawidek wrote:
  On Thu, Oct 27, 2011 at 08:42:09PM +0200, Hans Petter Selasky wrote:
   This is the root HUB. Can you also show the actual device?
  
  Sorry, it wasn't connected, here it goes:
  
  ugen0.2: USB2.0-CRW Generic at usbus0, cfg=255 md=HOST spd=HIGH (480Mbps)
  pwr=ON
  
bLength = 0x0012
bDescriptorType = 0x0001
bcdUSB = 0x0200
bDeviceClass = 0x
bDeviceSubClass = 0x
bDeviceProtocol = 0x
bMaxPacketSize0 = 0x0008
idVendor = 0x0bda
idProduct = 0x0119
bcdDevice = 0x1981
iManufacturer = 0x0001  retrieving string failed
iProduct = 0x0002  retrieving string failed
iSerialNumber = 0x0003  retrieving string failed
bNumConfigurations = 0x0001
 
 Hi,
 
 The control request in question is mandatory according to the UMASS 
 specification, and I wonder why it times out and all other control requests 
 aswell.
 
 Could you try setting the no-synchronize cache quirk instead, and then plug 
 your device.
 
 I'm sorry, but this problem needs further investigation before we can make a 
 patch.

It wasn't immediately obvious for me how to set the no-synchronize cache
quirk, but I think I found it:

# usbconfig add_quirk UQ_MSC_NO_SYNC_CACHE

And it seems to work:

umass0: Generic USB2.0-CRW, class 0/0, rev 2.00/19.81, addr 2 on usbus0
(probe0:umass-sim0:0:0:0): TEST UNIT READY. CDB: 0 0 0 0 0 0
(probe0:umass-sim0:0:0:0): CAM status: SCSI Status Error
(probe0:umass-sim0:0:0:0): SCSI status: Check Condition
(probe0:umass-sim0:0:0:0): SCSI sense: UNIT ATTENTION asc:28,0 (Not ready to 
ready change, medium may have changed)
da0 at umass-sim0 bus 0 scbus13 target 0 lun 0
da0: Generic- SD/MMC 1.00 Removable Direct Access SCSI-0 device
da0: 40.000MB/s transfers
da0: 30799MB (63076352 512 byte sectors: 255H 63S/T 3926C)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpU6OrpvyK5i.pgp
Description: PGP signature


umass(4) regression in 9.0-RC1.

2011-10-27 Thread Pawel Jakub Dawidek
Hi.

My SDHC card (via adapter) is no longer being detected after upgrade to
9.0-RC1. The same card with this adapter works on my laptop with older
HEAD.

usbus0: EHCI version 1.0
usbus0: NVIDIA nForce MCP55 USB 2.0 controller on ehci0
usbus0: 480Mbps High Speed USB v2.0
ugen0.1: nVidia at usbus0
uhub0: nVidia EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1 on 
usbus0
uhub0: 10 ports with 10 removable, self powered

# usbdevs -v
usbdevs: no USB controllers found

ehci0@pci0:0:10:1:  class=0x0c0320 card=0x81fb1043 chip=0x036d10de 
rev=0xa2 hdr=0x00
vendor = 'nVidia Corporation'
device = 'MCP55 USB Controller'
class  = serial bus
subclass   = USB

After connecting the adapter with the card inside:

kernel: ugen0.2: Generic at usbus0

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpsXGKnZ7vPK.pgp
Description: PGP signature


Re: umass(4) regression in 9.0-RC1.

2011-10-27 Thread Pawel Jakub Dawidek
On Thu, Oct 27, 2011 at 08:30:44PM +0200, Hans Petter Selasky wrote:
 What does usbconfig dump_device_desc, say about this device?

ugen0.1: EHCI root HUB nVidia at usbus0, cfg=0 md=HOST spd=HIGH (480Mbps) 
pwr=SAVE

  bLength = 0x0012
  bDescriptorType = 0x0001
  bcdUSB = 0x0200
  bDeviceClass = 0x0009
  bDeviceSubClass = 0x
  bDeviceProtocol = 0x0001
  bMaxPacketSize0 = 0x0040
  idVendor = 0x
  idProduct = 0x
  bcdDevice = 0x0100
  iManufacturer = 0x0001  nVidia
  iProduct = 0x0002  EHCI root HUB
  iSerialNumber = 0x  no string
  bNumConfigurations = 0x0001

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpP7jchJhwkx.pgp
Description: PGP signature


Re: umass(4) regression in 9.0-RC1.

2011-10-27 Thread Pawel Jakub Dawidek
On Thu, Oct 27, 2011 at 08:42:09PM +0200, Hans Petter Selasky wrote:
 This is the root HUB. Can you also show the actual device?

Sorry, it wasn't connected, here it goes:

ugen0.2: USB2.0-CRW Generic at usbus0, cfg=255 md=HOST spd=HIGH (480Mbps) 
pwr=ON

  bLength = 0x0012
  bDescriptorType = 0x0001
  bcdUSB = 0x0200
  bDeviceClass = 0x
  bDeviceSubClass = 0x
  bDeviceProtocol = 0x
  bMaxPacketSize0 = 0x0008
  idVendor = 0x0bda
  idProduct = 0x0119
  bcdDevice = 0x1981
  iManufacturer = 0x0001  retrieving string failed
  iProduct = 0x0002  retrieving string failed
  iSerialNumber = 0x0003  retrieving string failed
  bNumConfigurations = 0x0001

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpR8r24wP9lu.pgp
Description: PGP signature


Re: [PATCH] Default scrub interval to whole weeks

2011-10-27 Thread Pawel Jakub Dawidek
On Thu, Oct 27, 2011 at 12:02:22PM -0700, Xin LI wrote:
 -BEGIN PGP SIGNED MESSAGE-
 Hash: SHA256
 
 Hi,
 
 I think it might be better if we set scrub interval to whole weeks
 (proposed changeset changes it to 5 weeks).
 
 The reason for this is to make it easier for system administrators to
 estimate when the scrub happens, for instance, if a scrub was done on
 Saturday, it always happen on Saturdays. [...]

Sounds reasonable.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpRonlwMWRTb.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-10-27 Thread Pawel Jakub Dawidek
On Fri, Oct 28, 2011 at 11:29:34AM +1100, Lawrence Stewart wrote:
 On 10/26/11 22:53, John Baldwin wrote:
  The assertion would be triggered when the next packet arrives (as I said
  above).  Try modifying your debugging output to also log if the ACK is
  delayed.  I suspect it is not delayed until the last one.  (Pushing out an
  ACK will reset rcv_adv to be beyond rcv_nxt in tcp_output(), so in the case
  of an immediate ACK, rcv_nxt  rcv_adv is only a transient condition all
  under a single lock invocation so never visible to other consumers of the
  protocol control block.)  If that is what you see, then that confirms what
  I guessed above and I will likely just remove the assertion in tcp_input()
  and patch the timewait code to handle this case.
 
 
 Pawel, have you been able to confirm John's hypothesis? [...]

Yeah, sorry. I moved the debug to the points where we drop the t_inpcb
lock and I still see rcv_nxt being greater than rcv_adv:

tcp_do_segment:2970 negative window: tp 0xfe00685ee3d0 rcv_nxt 
1312878324 rcv_adv 1312878187

This is just before the INP_WUNLOCK(tp-t_inpcb) under 'check_delack'
label. I see this a lot (it was logged 545 times for 11 different tp
pointers during 24h period).

tcp_do_segment:3009 negative window: tp 0xfe005cfc6000 rcv_nxt 
1442546453 rcv_adv 1442545722

This is just before calling tcp_output(). This one was logged 65 times
for 3 different tp pointers.
I placed a debug also after tcp_output() call, but it is not logged, so
once we return from tcp_output() everything is fine.

The panic would be triggered 115 times for 5 different tp pointers
during that time.

I write 'tp pointers' as I'm not 100% sure if the same pointer always
represents the same connection or if it is reused.

 [...] What I don't 
 quite get is why we haven't had a lot more reports of this issue...

Maybe because my TCP/IP stack is heavly modified? ...not:)

No idea to be honest. Ask Ken to turn on INVARIANTS in 9.0-RC2 and we
will see:)

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpN4xRhoFGYE.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-10-26 Thread Pawel Jakub Dawidek
On Mon, Oct 24, 2011 at 08:14:22AM -0400, John Baldwin wrote:
 On Sunday, October 23, 2011 11:58:28 am Pawel Jakub Dawidek wrote:
  On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
   On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
My suggestion would be that if we won't be able to fix it before 9.0,
we should turn this assertion off, as the system seems to be able to
recover.
   
   Shipped kernels have all assertions turned off.
  
  Yes, I'm aware of that, but many people compile their production kernels
  with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
  corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
  it into a printf, so it will be visible.
 
 No, the kernel is corrupting things in other places when this is true, so
 if you are running with INVARIANTS, we want to know about it.   Specifically,
 in several places in TCP we assume that rcv_adv = rcv_nxt, and depend on
 being able to do 'rcv_adv - rcv_nxt'.
 
 In this case, it looks like the difference is consistently less than one 
 frame.  I suspect the other end of the connection is sending just beyond the 
 end of the advertised window (it probably assumes it is better to send a full 
 frame if it has that much pending data even though part of it is beyond the 
 window edge vs sending a truncated packet that just fills the window) and that
 that frame is accepted ok in the header prediction case and it's ACK is 
 delayed, but the next packet to arrive then trips over this assumption.
 
 Since 'win' is guaranteed to be non-negative and we explicitly cast
 'rcv_adv - rcv_nxt' to (int) in the following line that the assert is checking
 for:
 
   tp-rcv_wnd = imax(win, (int)(tp-rcv_adv - tp-rcv_nxt));
 
 I think we already handle this case ok and perhaps the assertion can just be
 removed?  Not sure if others feel that it warrants a comment to note that this
 is the case being handled.

I added debug to the places where rcv_adv and rcv_nxt are modified. Here
is what happens before the panic occurs:

tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022361548 
rcv_adv 4022360100 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022362298 
rcv_adv 4022361548 diff -750
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022363746 
rcv_adv 4022362298 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022364836 
rcv_adv 4022363746 diff -1090
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022366284 
rcv_adv 4022364836 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022370628 
rcv_adv 4022369690 diff -938
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022379140 
rcv_adv 4022377692 diff -1448
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022387792 
rcv_adv 4022386344 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022388890 
rcv_adv 4022387792 diff -1098
tcp_do_segment:1722 negative window: tp 0xfe000dab1b70 rcv_nxt 4022390338 
rcv_adv 4022388890 diff -1448
tcp_do_segment:2847 negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 
rcv_adv 4022394342 diff -221
panic: tcp_input negative window: tp 0xfe000dab1b70 rcv_nxt 4022394563 
rcv_adv 4022394342 win=0 diff -221

I can send you the full log if you want, I've plenty of messages where
rcv_adv  rcv_nxt, not all of them trigger this assertion.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpjTSToeUp9v.pgp
Description: PGP signature


Re: Problems with ada0: Previously was known as ad6 functionality

2011-10-26 Thread Pawel Jakub Dawidek
On Tue, Oct 25, 2011 at 05:01:31PM +0200, Jason Edwards wrote:
 On Tue, Oct 25, 2011 at 4:35 PM, Nikolay Denev nde...@gmail.com wrote:
  I was struck with this problem too yesterday.
  I was able to import the pool by specifying the directory where zpool 
  should look for devices.
  Like : zpool import -d /dev/gptid
  Alternatively you can try putting kern.cam.ada.legacy_aliases=0 in your 
  /boot/loader.conf
 
 Hello Nikolay,
 
 Thanks for your reply!
 
 The kern.cam.ada.legacy_aliases=0 indeed makes the /dev/ad6 device go
 away. However, unlike I suspected, it did not solve the zpool import
 problem. It still lists my pool as corrupted. If I use the zpool
 import -d /dev/gpt as you suggested, the pool displays as ONLINE and I
 can import it just fine. Using -d /dev works too, so I guess this
 works fine as a workaround.

Hmm, that's really strange. 'zpool import -d /dev pool' works, but
'zpool import pool' doesn't?

Does, eg. 'geom disk status' work or returns an error?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpdz8xMIULHW.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-10-23 Thread Pawel Jakub Dawidek
On Sun, Oct 23, 2011 at 12:35:15PM +1100, Lawrence Stewart wrote:
 On 10/22/11 19:49, Pawel Jakub Dawidek wrote:
  The panic message says:
 
  panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 
  3718269252 rcv_adv 3718268291
 
  I only have picture of the backtrace:
 
  http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg
 
 
 ewww that is not good. Can you give us any more information about the 
 machine and what it's doing? Is it terminating TCP connections from the 
 internet at large or only local LAN (i.e. is there likely to be packet 
 loss happening)? Are you doing TSO or LRO? Do you have any non-default 
 tuning in place?

It is my local file server. It is doing NFS and AFP over LAN and also
downloads files from the internet. It is triggered after few hours.
I changed the KASSERT() into printf() and added printing 'win' variable
and this is what got logged during the night:

05:16:24 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107827269 
rcv_adv 1107826256 win=242
05:16:29 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107833451 
rcv_adv 1107832977 win=880
05:16:41 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1107849563 
rcv_adv 1107848860 win=639
05:20:02 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108108230 
rcv_adv 1108107331 win=567
05:24:30 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108433302 
rcv_adv 1108432272 win=974
05:24:46 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108450385 
rcv_adv 1108450060 win=751
05:26:44 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108574818 
rcv_adv 1108573851 win=71
05:28:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108654103 
rcv_adv 1108653166 win=0
05:28:43 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108692396 
rcv_adv 1108691451 win=0
05:30:06 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1108781258 
rcv_adv 1108780372 win=235
05:35:05 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109067578 
rcv_adv 1109067335 win=663
05:37:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109180403 
rcv_adv 1109179411 win=0
05:41:03 tcp_input negative window: tp 0xfe0026772b70 rcv_nxt 1109428265 
rcv_adv 1109427375 win=170

And the systems seems to be fine.

I'm happy to test patches, but one round would take 24h.

My suggestion would be that if we won't be able to fix it before 9.0,
we should turn this assertion off, as the system seems to be able to
recover.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpcUNlslP3cT.pgp
Description: PGP signature


Re: 9.0-RC1 panic in tcp_input: negative winow.

2011-10-23 Thread Pawel Jakub Dawidek
On Sun, Oct 23, 2011 at 11:44:45AM +0300, Kostik Belousov wrote:
 On Sun, Oct 23, 2011 at 08:10:38AM +0200, Pawel Jakub Dawidek wrote:
  My suggestion would be that if we won't be able to fix it before 9.0,
  we should turn this assertion off, as the system seems to be able to
  recover.
 
 Shipped kernels have all assertions turned off.

Yes, I'm aware of that, but many people compile their production kernels
with INVARIANTS/INVARIANT_SUPPORT to fail early instead of eg.
corrupting data. I'd be fine in moving this under DIAGNOSTIC or changing
it into a printf, so it will be visible.

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpe47qzpCChn.pgp
Description: PGP signature


9.0-RC1 panic in tcp_input: negative winow.

2011-10-22 Thread Pawel Jakub Dawidek
The panic message says:

panic: tcp_input negative window: tp 0xfe007763e000 rcv_nxt 
3718269252 rcv_adv 3718268291

I only have picture of the backtrace:

http://people.freebsd.org/~pjd/misc/panic_negative_window.jpg

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpo3XZOvEVzL.pgp
Description: PGP signature


Re: incorrect use of pidfile(3)

2011-10-14 Thread Pawel Jakub Dawidek
On Thu, Oct 13, 2011 at 04:11:40PM +0200, Dag-Erling Smørgrav wrote:
 Pawel Jakub Dawidek p...@freebsd.org writes:
  I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same
  value on FreeBSD) should be converted to EEXIST on pidfile_open()
  return.
 
 The historical (and documented) behavior is to return EAGAIN.

We don't want to duplicate the code of handling EAGAIN into every single
pidfile(3) consumer. This is why we hav pidfile(3) API in the first
place - to make it easy for people to use.

  Also if we now have for loop, why not to put count in there?
 
 Because if we do, there will be a nanosleep after the last
 pidfile_read() attempt.  We need to break the loop after pidfile_read()
 failed but before nanosleep().

Right, ok.

  I'm not very happy about touching pidptr in case of error other than
  EEXIST. This is not documented, but a bit unexpected anyway.
 
 Well, it was your idea, I just moved it to before the loop :)

In my patch *pidptr was set to -1 only in the case of EAGAIN from
pidfile_read(), not for every other error.

BTW. With your patch we will continue even when flopen(3) failed for
other reasons, instead of returning NULL. Checking for fd being -1
should not be done in the same statement with other checks.

After proposed changes it would look like this, what do you think?

http://people.freebsd.org/~pjd/patches/pidfile.3.patch

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpNGouMOHFTt.pgp
Description: PGP signature


Re: incorrect use of pidfile(3)

2011-10-13 Thread Pawel Jakub Dawidek
On Thu, Oct 13, 2011 at 12:54:38PM +0200, Dag-Erling Smørgrav wrote:
 I looked at some of the programs that use pidfile(3) in base, and they
 pretty much all get it wrong.  Consider these two scenarios:
 
 1) common case
 
 process A   process B
 
 main()
   pidfile_open() - success
   perform_initialization()
   daemon()
 pidfile_write() - success
 perform_work()  main()
   pidfile_open() - EEXIST
   exit()
 
 2) very unlikely but still possible case
 
 process A   process B
 
 main()
   pidfile_open() - success main()
   perform_initialization()pidfile_open() - EAGAIN
   daemon()perform_initialization()
 pidfile_write() - successdaemon()
 perform_work()  perform_work()
 
 The problem is that most of them (at least the ones I checked) ignore a
 pidfile_open() failure unless errno == EEXIST.
 
 How do we fix this?  My suggestion is to loop until pidfile_open()
 succeeds or errno != EAGAIN.  Does anyone have any objections to that
 approach?

I think we already do that internally in pidfile_open(). Can you take a look at
the source and confirm that this is what you mean?

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgp8t7uJHg24p.pgp
Description: PGP signature


Re: incorrect use of pidfile(3)

2011-10-13 Thread Pawel Jakub Dawidek
On Thu, Oct 13, 2011 at 02:54:16PM +0200, Dag-Erling Smørgrav wrote:
 After discussing this with pjd@ on IRC, I arrived at the attached patch,
 which increases the length of time pidfile_open() itself waits (I hadn't
 noticed that it already looped) and sets *pidptr to -1 if it fails to read
 a pid.

I'm still in opinion that EWOULDBLOCK and EAGAIN (which is the same
value on FreeBSD) should be converted to EEXIST on pidfile_open()
return.

Also if we now have for loop, why not to put count in there?

I'm not very happy about touching pidptr in case of error other than
EEXIST. This is not documented, but a bit unexpected anyway.

I'm happy with increasing the timeout.

 Index: lib/libutil/pidfile.c
 ===
 --- lib/libutil/pidfile.c (revision 226271)
 +++ lib/libutil/pidfile.c (working copy)
 @@ -118,22 +118,19 @@
*/
   fd = flopen(pfh-pf_path,
   O_WRONLY | O_CREAT | O_TRUNC | O_NONBLOCK, mode);
 - if (fd == -1) {
 - count = 0;
 + if (fd == -1  errno == EWOULDBLOCK  pidptr != NULL) {
 + *pidptr = -1;
 + count = 20;
   rqtp.tv_sec = 0;
   rqtp.tv_nsec = 500;
 - if (errno == EWOULDBLOCK  pidptr != NULL) {
 - again:
 + for (;;) {
   errno = pidfile_read(pfh-pf_path, pidptr);
 - if (errno == 0)
 - errno = EEXIST;
 - else if (errno == EAGAIN) {
 - if (++count = 3) {
 - nanosleep(rqtp, 0);
 - goto again;
 - }
 - }
 + if (errno != EAGAIN || --count == 0)
 + break;
 + nanosleep(rqtp, 0);
   }
 + if (errno == 0)
 + errno = EEXIST;
   free(pfh);
   return (NULL);
   }

-- 
Pawel Jakub Dawidek   http://www.wheelsystems.com
FreeBSD committer http://www.FreeBSD.org
Am I Evil? Yes, I Am! http://yomoli.com


pgpIlSX6pcKvL.pgp
Description: PGP signature


  1   2   >