from:"Peter Jeremy"

Re: Possible PEBKAC bug for fwget(8)?

2023-07-07 Thread Peter Jeremy

On 2023-Jul-07 08:03:40 +0100, Graham Perrin  wrote:
>PCI pictured at
><https://en.wikipedia.org/wiki/Peripheral_Component_Interconnect>, somehow I
>don't imagine finding that type of slot inside the HP EliteBook where I ran
>the command ;-)

Whilst you probably don't have a full-size PCI or PCIe connector in
your laptop, it's very likely that it has a Mini PCIe connector for
the WiFi adapter.  Even without that, there are virtual PCI buses
inside your CPU chip - have a look at the output of "pciconf -lv".
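
As a rough illustration of the point about internal buses, the sketch below lists the distinct PCI buses a machine reports.  It assumes the usual "pciconf -l" selector layout, name@pciDOMAIN:BUS:SLOT:FUNCTION:, and the function name pci_buses is made up for this example.

```sh
# pci_buses: read "pciconf -l" output on stdin and print each distinct
# domain:bus pair once, in numeric order.
# Assumes selectors of the form name@pciDOMAIN:BUS:SLOT:FUNCTION:
pci_buses() {
    sed -n 's/^[^@]*@pci\([0-9]*\):\([0-9]*\):.*/\1:\2/p' |
        sort -t: -k1,1n -k2,2n -u
}
```

On a FreeBSD system this would be run as "pciconf -l | pci_buses".  More than one line of output means more than one bus, even on a laptop with no physical slots.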

-- 
Peter Jeremy

signature.asc
Description: PGP signature

ntpd fails on recent -current/arm64

2023-04-23 Thread Peter Jeremy

Somewhere between c283016-g607bc91d90a3 and c283077-g7f658f99f7ed,
some change in the kernel has made ntpd stop working on my arm64 test
box.  (My amd64 test box is a couple of days behind so I'm not sure if
it's arm-specific).

What I've identified so far:
* The problem is in the kernel, not userland.
* The impact seems to be limited to ntpd (in particular, ntpdate works).
* ntpd appears to be correctly exchanging NTP packets with peers.
* ntpd is not responding to "ntpq -p" queries.
* ntp_gettime and ntp_adjtime both return TIME_ERROR to ntptime.

I've looked through the commits and, beyond much of netinet being
roto-tilled, I can't see anything obvious.

Is anyone else seeing anything similar?  Can anyone suggest where
to look next?

-- 
Peter Jeremy



Re: Beadm can't create snapshot

2022-08-23 Thread Peter Jeremy

On 2022-Aug-23 15:19:34 +0200, Ronald Klop  wrote:
>From: Kyle Evans 
>> I was not aware that beadm touches loader.conf, but I find that
>> slightly horrifying. I won't personally make bectl do that, but I
>> guess I could at least document that it doesn't...
>
>Today I looked up something for boot environments myself and read this: 
>https://wiki.freebsd.org/BootEnvironments#Setting_Boot_Dataset
>
>"In order for boot environments to be effective, you must let the bootfs zpool 
>property control which dataset gets mounted as the root. Particularly, 
>/etc/fstab must be purged of any / mount, and /boot/loader.conf must not be 
>setting vfs.root.mountfrom directly. "
>
>So it is documented somewhere at least.

Looking at the wiki history, Kyle wrote that in January 2020.  I
wonder if he recalls where that requirement came from.

I've gone rummaging through the mailing list history and other wiki
pages.  It seems that vfs.root.mountfrom used to be required - e.g.
 https://lists.freebsd.org/pipermail/freebsd-fs/2011-September/012482.html
 https://lists.freebsd.org/pipermail/svn-src-head/2011-October/030641.html
and people wanted to change that - e.g.
 https://lists.freebsd.org/pipermail/freebsd-current/2009-October/012933.html
 https://lists.freebsd.org/pipermail/freebsd-fs/2010-March/008010.html
resulting in it becoming optional in May 2012:
 https://lists.freebsd.org/pipermail/svn-src-head/2012-May/036902.html

Based on the quoted wiki entry, it seems that sometime between May
2012 and January 2020, vfs.root.mountfrom went from "must be set" to
"must not be set" and I can't find anywhere where that is publicised.
This is a serious problem because we now have the situation where
some documentation still says to set vfs.root.mountfrom - e.g.
 https://wiki.freebsd.org/RootOnZFS/GPTZFSBoot/Mirror step 2.6
and people are still using it without being warned that it shouldn't
be used - e.g. the thread starting
 https://lists.freebsd.org/pipermail/freebsd-fs/2020-July/028351.html

I've had a look at the beadm source and it preserves/updates
vfs.root.mountfrom if it's present in loader.conf but doesn't add it
if it's not present.

IMO, if bectl isn't going to update loader.conf, it needs to warn and
fail if loader.conf contains a vfs.root.mountfrom that points to a
BE that's different to bootfs.  (And ideally, a similar check of
/etc/fstab, though beadm doesn't touch that).
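
The kind of check suggested above can be sketched in a few lines of shell.  This is only an illustration under stated assumptions: check_mountfrom is a hypothetical helper, not part of bectl or beadm, and it assumes loader.conf uses the usual vfs.root.mountfrom="zfs:pool/dataset" form.

```sh
# check_mountfrom LOADER_CONF BOOTFS
# Succeeds if LOADER_CONF has no vfs.root.mountfrom setting, or if the
# setting names the same dataset as BOOTFS.  Otherwise it prints a warning
# to stderr and fails.  Hypothetical helper, not part of bectl or beadm.
check_mountfrom() {
    conf=$1
    bootfs=$2
    [ -r "$conf" ] || return 0
    # Take the last uncommented assignment and strip the quotes.
    mountfrom=$(sed -n 's/^[[:space:]]*vfs\.root\.mountfrom[[:space:]]*=[[:space:]]*//p' "$conf" |
        tail -n 1 | tr -d '"')
    if [ -z "$mountfrom" ]; then
        return 0
    fi
    # Drop the "zfs:" filesystem-type prefix to get the dataset name.
    dataset=${mountfrom#zfs:}
    if [ "$dataset" != "$bootfs" ]; then
        echo "warning: vfs.root.mountfrom ($dataset) differs from bootfs ($bootfs)" >&2
        return 1
    fi
    return 0
}
```

A possible invocation, assuming a pool named zroot, is:
check_mountfrom /boot/loader.conf "$(zpool get -H -o value bootfs zroot)"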

-- 
Peter Jeremy


Re: Beadm can't create snapshot

2022-08-22 Thread Peter Jeremy

On 2022-Aug-22 10:56:51 +0200, "Patrick M. Hausen"  wrote:
>> On 22.08.2022 at 10:45, Peter Jeremy wrote:
>> On 2022-Aug-17 18:07:20 +0200, "Patrick M. Hausen"  wrote:
>>> Isn't beadm retired in favour of bectl?
>> 
>> 2) "bectl activate" doesn't update /boot/loader.conf so the wrong
>>   root filesystem is mounted.
>
>You mean the vfs.root.mountfrom option? I thought that, too, was deprecated and
>replaced by the bootfs property of the zpool.

I've looked through the mailing list archives and searched the 'net and
haven't found anything saying vfs.root.mountfrom is deprecated.
loader(8) mentions that it will fall back to using "currdev" if there's
no root entry in /etc/fstab and vfs.root.mountfrom isn't set.

At the very least, it's an undocumented incompatibility between beadm
and bectl: I can't take an existing system that's using beadm and just
switch to using bectl.

-- 
Peter Jeremy


Re: Beadm can't create snapshot

2022-08-22 Thread Peter Jeremy

On 2022-Aug-17 18:07:20 +0200, "Patrick M. Hausen"  wrote:
>Isn't beadm retired in favour of bectl?

bectl still has a number of bugs:
1) The output from "bectl list" is in filesystem/bename order rather
   than creation date order.  This is an issue if you use (eg) git
   commit hashes as the name.
2) "bectl activate" doesn't update /boot/loader.conf so the wrong
   root filesystem is mounted.

That said, "bectl create" appears to be a workable replacement for
"beadm create" and avoids the current "'snapshots_changed' is
readonly" bugs.
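
For the ordering issue in (1), one possible workaround is to ask ZFS for the creation times and sort on those.  The sketch below assumes input lines of the form "name<TAB>creation-epoch", which is what "zfs list -Hp -o name,creation" produces; the function name be_by_creation is made up for this example.

```sh
# be_by_creation: read "name<TAB>creation-epoch" lines on stdin and print
# the names oldest first.
be_by_creation() {
    sort -t "$(printf '\t')" -k2,2n | cut -f1
}
```

Assuming the boot environments live under zroot/ROOT, this could be fed with "zfs list -Hp -d 1 -o name,creation zroot/ROOT | be_by_creation" (the parent dataset itself will also appear in the output).  "zfs list -s creation" gives a similar ordering directly.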

-- 
Peter Jeremy



Re: recover deleted file

2022-04-16 Thread Peter Jeremy

On 2022-Apr-17 01:13:02 +0300, Sami Halabi  wrote:
>I understand its hard to undelete since no one designed UFS/ZFS to do so..
>that why I asked in later replies to see if someone would step in and
>implement such a "feature" and I suggested some directions/thoughts.

As you point out, neither UFS nor ZFS was designed to support an
"undelete" function: once an inode has no references (open files
or directory entries), the inode and all associated data blocks are
returned to the free list and could be used by a subsequent allocation.

What semantics would you like UFS or ZFS to implement instead?  Is it
just that the inode and associated data blocks should stay in limbo
for some period?  If so, what controls the period?  What if a file is
truncated to 0 or overwritten before being unlinked?  How much would
you be willing to pay for "undelete" functionality?

>As soren@ suggested in later reply it maybe would be easier to implement
>custom rm script that moves files to "Recycle bin" directory (and empty it
>after some period)

Alternatively, you could alias "rm" to "rm -i".

>but as a programmer I know that perfection is needed :)
>so It might start as a simple task and end in many what-if's
>(unfortunattly I did my last C programming in late 2003!).

This doesn't need to be C.  You could do this in your scripting
language of choice.  Or you could offer to pay someone to do this
for you.
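
A minimal sketch of the "recycle bin" approach in plain sh follows.  The TRASH_DIR location and the function name trash are assumptions made for this example, and it deliberately ignores most of the what-ifs raised in this thread.

```sh
# Holding directory for "deleted" files.  The location and the function
# name are assumptions made for this sketch.
TRASH_DIR=${TRASH_DIR:-$HOME/.trash}

# trash FILE...: move each file into its own timestamped subdirectory of
# TRASH_DIR instead of unlinking it.
trash() {
    mkdir -p "$TRASH_DIR" || return 1
    stamp=$(date +%Y%m%dT%H%M%S)
    for f in "$@"; do
        # One fresh subdirectory per file, so equal names never collide.
        slot=$(mktemp -d "$TRASH_DIR/$stamp.XXXXXX") || return 1
        mv -- "$f" "$slot/" || return 1
    done
}
```

This only protects files removed via "trash notes.txt" and the like.  It does nothing for files unlinked by other programs or truncated before removal, and emptying the directory after some period is left out.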

>What amzes me is that this "feature" was asked too much in the last decade
>or two and no one ever implemented it, maybe it's not needed in daily
>usage, but in disasters it would be super userful, save admins many time
>and nerves..

I went rummaging back through my mail archives and it actually doesn't
seem to come up that often.  You seem to be about the 3rd person this
century on the lists I read.  I did find a discussion in zfs-discuss
from May/June 2006 about supporting undelete but it seems that no
agreement on the desired behaviour was achieved.

>For now I did some backup tools locally and used chflags to mark them
>undeletable so I wouldn't do that mistake again,

You could also consider snapshots - both UFS and ZFS support snapshots.

If the information is very critical (you mentioned legal consequences)
then you might like to consider real-time replication of the MySQL redo
logs to another system - though that won't necessarily protect you
from someone accidentally doing a "DELETE FROM xxx;" or "DROP TABLE xxx;"

-- 
Peter Jeremy


Re: Rock64 configuration fails to boot for main 22c4ab6cb015 but worked for main 06bd74e1e39c (Nov 21): e.MMC mishandled?

2021-12-08 Thread Peter Jeremy

On 2021-Dec-09 08:19:30 +0100, Emmanuel Vadot  wrote:
>
> Hi Mark,
>
>On Wed, 8 Dec 2021 20:36:20 -0800
>Mark Millard via freebsd-current  wrote:
>
>> [ Note: w...@freebsd.org is only a guess, based on:
>> https://lists.freebsd.org/archives/dev-commits-src-main/2021-December/001931.html
>>  ]
>> 
>> Attempting to update to:
>> 
>> main-n251456-22c4ab6cb015-dirty: Tue Dec  7 19:38:53 PST 2021
>> 
>> resulted in boot failure (showing some boot -v output):
[hang just before root is mounted]
> Could you try reverting 
>8661e085fb953855dbc7059f21a64a05ae61b22c "mmc: Fix HS200/HS400
>capability check" and let me know ?

I had exactly the same boot failure but was still working backwards
through the root mount code trying to isolate the issue.  Reverting
8661e085fb953855dbc7059f21a64a05ae61b22c solves the problem for me.
I'd noticed the mmc1 difference and mmcsd1 error:
 mmc1:  bus: 8bit, 200MHz (HS200 timing)
 mmc1:  memory: 30310400 blocks, erase sector 1024 blocks
mmc1: setting transfer rate to 150.000MHz (HS200 timing)

but I didn't think it was the cause.

I had tracked down that the hang was somewhere between
https://cgit.freebsd.org/src/tree/sys/kern/vfs_mountroot.c#n779 and
https://cgit.freebsd.org/src/tree/sys/kern/vfs_mountroot.c#n1008
which led me to suspect that the problem might be in the geom
layer (eg g_waitidle()) but was still considering where to add
my next tranche of printf's when I saw Mark's mail.

-- 
Peter Jeremy


Re: Install to ZFS root is using device names hence failing when device tree is changed.

2021-09-07 Thread Peter Jeremy

On 2021-Sep-06 17:45:31 +0200, Karel Gardas  wrote:
>just installed 14-current snapshot from 2.9. on uefi amd64 machine. 
>Installed from USB memstick which was detected as da0 into the ssd 
>hanging on usb3 in external enclosure which was detected as da1.
>
>ZFS root pool is then using /dev/da1p3 as swap and /dev/da1p1 as 
>/boot/efi and probably also something as root zpool.
>
>Anyway, expected thing happen. When I pulled out USB stick identified as 
>da0 on reboot, the drive on USB3 switch from da1 to da0 and result is 
>unbootable system with complains about various /dev/da1xx drives missing 
>for swap efi boot etc.

Can you give more details about exactly what the errors are and when they
occur during the boot cycle?  In particular:
* Low-level boot (anything prior to the FreeBSD kernel) knows nothing
  about da0 or da1, so any problems there are associated with your
  BIOS config, not FreeBSD.
* The swap partition will, by default, appear as a hard-wired device
  name in /etc/fstab - that will definitely need updating.  This will
  prevent the "swapon" working but won't prevent the boot.
* ZFS doesn't care about device names - it looks for ZFS labels on all
  possible devices.
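
To find the hard-wired entries mentioned above, something like the following could be used.  It is a sketch only: the function name fstab_raw_devices and the list of disk driver prefixes are assumptions, and the list is not exhaustive.

```sh
# fstab_raw_devices FSTAB: print the device, mountpoint and type of every
# fstab entry whose device field is a bare disk name such as /dev/da1p3,
# which changes if the disks are enumerated in a different order.
# The driver prefixes below are an illustrative, incomplete list.
fstab_raw_devices() {
    awk '$1 !~ /^#/ && $1 ~ /^\/dev\/(da|ada|nvd|nda|vtbd)[0-9]+/ { print $1, $2, $3 }' "$1"
}
```

Running "fstab_raw_devices /etc/fstab" shows the entries worth converting to something stable, for example a GPT label under /dev/gpt/.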

-- 
Peter Jeremy


Re: Files in /etc containing empty VCSId header

2021-06-09 Thread Peter Jeremy via freebsd-current

On 2021-Jun-08 17:13:45 -0600, Ian Lepore  wrote:
>On Tue, 2021-06-08 at 15:11 -0700, Rodney W. Grimes wrote:
>> There is a command for that which does or use to do a pretty
>> decent job of it called whereis(1).

Thanks.  That looks useful.

>revolution > whereis ntp.conf
>ntp.conf:
>revolution > whereis netif
>netif:
>revolution > whereis services
>services:
>
>So how does that help me locate the origin of these files in the source
>tree?

It works for me™:
server% whereis ntp.conf
ntp.conf: /usr/src/usr.sbin/ntp/ntpd/ntp.conf
server% whereis netif   
netif: /usr/src/libexec/rc/rc.d/netif
server% whereis services
services: /usr/src/contrib/unbound/services

Is your source tree somewhere other than /usr/src?

-- 
Peter Jeremy



Re: geli broken in 13.0-BETA4 and later on armv8

2021-03-06 Thread Peter Jeremy via freebsd-current

On 2021-Mar-06 10:39:02 -0800, Oleksandr Tymoshenko  wrote:
>Peter Jeremy via freebsd-current (freebsd-current@freebsd.org) wrote:
>> [Adding arm@ and making it clearer that this is armv8-only]
>> 
>> On 2021-Mar-06 20:26:19 +1100, Peter Jeremy  
>> wrote:
>> >On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable 
>> > wrote:
>> >>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>> >>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>> >>RK3399, arm64) has changed so that a geli-encrypted partition (using
>> >>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>> >>13.0-BETA4.
>> >
>> >I've confirmed that the problem is f76393a6305b - reverting that
>> >commit fixes the problem in releng/13.0.
>> >
>> >I've further verified that the bug is still present in main (14.x)
>> >at 028616d0dd69.
>
>Could you test this patch and let me know if it fixes the issue?
>
>https://people.freebsd.org/~gonzo/patches/armv8crypto-xts-fix.diff

Yes, it does.  Thank you very much.

--- 
Peter Jeremy



Re: geli broken in 13.0-BETA4 and later on armv8

2021-03-06 Thread Peter Jeremy via freebsd-current

[Adding arm@ and making it clearer that this is armv8-only]

On 2021-Mar-06 20:26:19 +1100, Peter Jeremy  wrote:
>On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable 
> wrote:
>>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>>RK3399, arm64) has changed so that a geli-encrypted partition (using
>>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>>13.0-BETA4.
>
>I've confirmed that the problem is f76393a6305b - reverting that
>commit fixes the problem in releng/13.0.
>
>I've further verified that the bug is still present in main (14.x)
>at 028616d0dd69.

-- 
Peter Jeremy



Re: geli broken in 13.0-BETA4 and later

2021-03-06 Thread Peter Jeremy via freebsd-current

On 2021-Mar-06 19:18:37 +1100, Peter Jeremy via freebsd-stable 
 wrote:
>Somewhere between 13.0-ALPHA2 (c256201-g02611ef8ee9) and 13.0-BETA4
>(releng/13.0-n244592-e32bc253629), geli (at least on my RockPro64 -
>RK3399, arm64) has changed so that a geli-encrypted partition (using
>AES-XTS 128) that was readable on 13.0-ALPHA2 becomes garbage on
>13.0-BETA4.

I've confirmed that the problem is f76393a6305b - reverting that
commit fixes the problem in releng/13.0.

I've further verified that the bug is still present in main (14.x)
at 028616d0dd69.

-- 
Peter Jeremy


Re: New Xorg - different key-codes

2020-03-11 Thread Peter Jeremy

On 2020-Mar-11 10:29:08 +0100, Niclas Zeising  wrote:
>This has to do with switching to using evdev to handle input devices on 
>FreeBSD 12 and CURRENT.  There's been several reports, and suggested 
>solutions to this, as well as an UPDATING entry detailing the change.

The UPDATING entry says that it's switched from devd to udev.  There's no
mention of evdev or that the keycodes have been roto-tilled.  It's basically
a vanilla "things have been changed, see the documentation" entry.  Given
that entry, it's hardly surprising that people are confused.

-- 
Peter Jeremy


Re: System clock is slow

2020-03-09 Thread Peter Jeremy

On 2020-Mar-09 19:59:09 -0400, Theron  wrote:
>Since switching from 12.1-RELEASE to CURRENT I've noticed timing 
>problems with audio applications.  It turns out that the problem is not 
>with the audio drivers, but with the system clock driver, which now 
>reports passage of time 0.3% too slow.  Although I discovered this only 
>recently, it's been broken since r352684 made on Sept. 25.  Has anyone 
>else noticed?

Note that r352684 was MFC'd to both 11-stable (r353007) and 12-stable
(r353006) in early October and I don't recall seeing any adverse
reports before this.

Are you running NTP?  If so, is NTP maintaining lock and what is the
reported PLL frequency (ntpq -c kerni)?

What does "sysctl kern.timecounter" report and have you tried using
any of the alternative timecounters listed in kern.timecounter.choice?

Are you overclocking your CPU (or doing anything else non-standard)?
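
For scale, the reported 0.3% error can be converted into the units NTP uses.  The helper names below are made up for this sketch; the only input taken from the thread is the 0.3% figure.

```sh
# drift_ppm PERCENT: convert a clock rate error in percent to parts per
# million.
drift_ppm() {
    awk -v pct="$1" 'BEGIN { printf "%.0f\n", pct * 10000 }'
}

# drift_seconds_per_day PERCENT: seconds gained or lost over 24 hours at
# that rate error.
drift_seconds_per_day() {
    awk -v pct="$1" 'BEGIN { printf "%.1f\n", pct / 100 * 86400 }'
}
```

"drift_ppm 0.3" gives 3000 and "drift_seconds_per_day 0.3" gives 259.2, i.e. more than four minutes a day.  That is well beyond the 500 ppm frequency error that ntpd is able to correct, which is why the NTP lock status is of interest.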

-- 
Peter Jeremy


Re: Which AMD CPUs are supported -- temperature

2020-02-16 Thread Peter Jeremy

On 2020-Feb-13 13:27:17 -0800, Chris  wrote:
>My BIOS appears to have the correct temp reading. Would it be of any use
>to anyone besides myself, if I were to decompile it, and get the source
>for the temp reading/monitoring from it?

I would definitely like to have this information.  If you are able to
share the two constants (both step size and reference temperature), that
would be great.

-- 
Peter Jeremy


Re: Which AMD CPUs are supported -- temperature

2020-02-12 Thread Peter Jeremy

On 2020-Feb-12 15:23:51 -0500, mike tancsa  wrote:
>Not sure about the older Athlon CPUs, but the 2 generations of Ryzen's I
>have seem correct as well as an APU
>
>CPU: AMD GX-412TC SOC    (998.17-MHz K8-class CPU)

OTOH, I'm not confident about temperatures on my APU.  The publicly
available data just says that the SoC reports "a temperature on its own
scale" relative to a Tctl_max which "is specified in the power and thermal
data sheet" (that I have been unable to locate).  Everyone seems to assume
that the step size is 0.125K but I haven't found that publicly documented
anywhere.  The AMD Product Brief states that the maximum temperature is
90°C but using that as Tctl_max gives me temperature readings that don't
look right.

>And on a fanless APU
>
># sysctl -a dev.cpu.0.temperature
>dev.cpu.0.temperature: 62.6C
>
># sysctl -a dev.amdtemp.0.core0.sensor0
>dev.amdtemp.0.core0.sensor0: 63.1C

At what ambient temperature?  I see a similar value from my (idle) APU3
but don't believe the (implied) ~35K junction-to-ambient difference.

-- 
Peter Jeremy


Re: head -r356066 reaching kern.ipc.nmbclusters on Rock64 (CortexA53 with 4GiByte of RAM) while putting files on it via nfs: some evidence

2020-01-04 Thread Peter Jeremy

Sorry for the delay in responding.

On 2019-Dec-27 21:59:49 -0800, Mark Millard via freebsd-arm 
 wrote:
>The following sort of sequence leads to the Rock64 not
>responding on the console or over ethernet, after notifying
>of nmbclusters having been reached. (This limits what
>information I have of what things were like at the end.)

There's a bug in the dwc(4) driver such that it can leak mbuf clusters.
I've been running with the following patch but need to clean it up
somewhat before I can commit it:

Index: sys/dev/dwc/if_dwc.c
===================================================================
--- sys/dev/dwc/if_dwc.c	(revision 356350)
+++ sys/dev/dwc/if_dwc.c	(working copy)
@@ -755,7 +755,6 @@
 dwc_rxfinish_locked(struct dwc_softc *sc)
 {
 	struct ifnet *ifp;
-	struct mbuf *m0;
 	struct mbuf *m;
 	int error, idx, len;
 	uint32_t rdes0;
@@ -762,9 +761,8 @@
 
 	ifp = sc->ifp;
 
-	for (;;) {
+	for (; ; sc->rx_idx = next_rxidx(sc, sc->rx_idx)) {
 		idx = sc->rx_idx;
-
 		rdes0 = sc->rxdesc_ring[idx].tdes0;
 		if ((rdes0 & DDESC_RDES0_OWN) != 0)
 			break;
@@ -773,9 +771,9 @@
 		    BUS_DMASYNC_POSTREAD);
 		bus_dmamap_unload(sc->rxbuf_tag, sc->rxbuf_map[idx].map);
 
+		m = sc->rxbuf_map[idx].mbuf;
 		len = (rdes0 >> DDESC_RDES0_FL_SHIFT) & DDESC_RDES0_FL_MASK;
 		if (len != 0) {
-			m = sc->rxbuf_map[idx].mbuf;
 			m->m_pkthdr.rcvif = ifp;
 			m->m_pkthdr.len = len;
 			m->m_len = len;
@@ -784,24 +782,33 @@
 			/* Remove trailing FCS */
 			m_adj(m, -ETHER_CRC_LEN);
 
+			/* Consume the mbuf and mark it as consumed */
+			sc->rxbuf_map[idx].mbuf = NULL;
 			DWC_UNLOCK(sc);
 			(*ifp->if_input)(ifp, m);
 			DWC_LOCK(sc);
+			m = NULL;
 		} else {
 			/* XXX Zero-length packet ? */
 		}
 
-		if ((m0 = dwc_alloc_mbufcl(sc)) != NULL) {
-			if ((error = dwc_setup_rxbuf(sc, idx, m0)) != 0) {
-				/*
-				 * XXX Now what?
-				 * We've got a hole in the rx ring.
-				 */
+		if (m == NULL) {
+			if ((m = dwc_alloc_mbufcl(sc)) == NULL) {
+				if_inc_counter(sc->ifp, IFCOUNTER_IQDROPS, 1);
+				continue;
 			}
-		} else
+		}
+
+		if ((error = dwc_setup_rxbuf(sc, idx, m)) != 0) {
+			m_free(m);
+			device_printf(sc->dev,
+			    "dwc_setup_rxbuf returned %d\n", error);
 			if_inc_counter(sc->ifp, IFCOUNTER_IQDROPS, 1);
-
-		sc->rx_idx = next_rxidx(sc, sc->rx_idx);
+			/*
+			 * XXX Now what?
+			 * We've got a hole in the rx ring.
+			 */
+		}
 	}
 }

-- 
Peter Jeremy



buildworld has mandatory dependency on optional executable.

2019-10-30 Thread Peter Jeremy

I've just discovered that "make buildworld" has a mandatory dependency
on kbdcontrol (see
https://svnweb.freebsd.org/base/head/Makefile.inc1?annotate=354138#l2207 )
but, if WITHOUT_LEGACY_CONSOLE is defined then kbdcontrol isn't built
(https://svnweb.freebsd.org/base/head/usr.sbin/Makefile?annotate=352949#l162 )
and the installed version will be deleted by "make delete-old":
https://svnweb.freebsd.org/base/head/tools/build/mk/OptionalObsoleteFiles.inc?annotate=353358#l4520

This seems undesirable...

The "make buildworld" failure doesn't make the cause obvious - it just
reports "*** Error code 1" in bootstrap-tools.  Having traced the failure,
I now see ".ERROR_TARGET='_bootstrap-tools-link-kbdcontrol'" but that was
only obvious in hindsight.

-- 
Peter Jeremy



Re: Reproducable deadlock in NFS client

2019-10-03 Thread Peter Jeremy

On 2019-Oct-03 23:28:07 +, Rick Macklem  wrote:
>1 - kib@ just put a patch up on phabricator that reorganizes the handling
>  of vnode_pager_setsize().
>  D21883
>  (If you could test this patch, that might be the best approach.)

That fixes my problem.  I've added a note to D21883.

>ps: Btw, capturing "procstat -kk" and "ps axHl" would give you/us more info.
> (The "H" on "ps" shows the iod threads.)
>  If you can drop into the debugger when it is hung as above, you could
>  capture the stuff listed here:
>https://www.freebsd.org/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug-deadlocks.html

Thanks for the pointer and sorry for leaving that out.

-- 
Peter Jeremy



Reproduceable deadlock in NFS Client

2019-10-03 Thread Peter Jeremy

My diskless Rock64 has taken to deadlocking reproducibly whilst
building libprivatesqlite3.a as part of buildworld when running
r352792.  At the time of the deadlock, the relevant running process
is:
ar -crD libprivatesqlite3.a sqlite3.o

And those files are:
-rw-r--r--1 root  wheel  3178496  4 Oct 01:10 libprivatesqlite3.a
-rw-r--r--1 root  wheel  7975272  4 Oct 01:10 sqlite3.o

The "ar" reports it's in bo_wwait and, after about 30 minutes, I get:
deadlres_td_sleep_q: possible deadlock detected for 0xfd00012c9560, blocked 
for 1800613 ticks

cpuid = 2
time = 1570117920
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
 pc = 0x0054b83c  lr = 0x000e2b08
 sp = 0x4030a790  fp = 0x4030a9a0

db_trace_self_wrapper() at vpanic+0x18c
 pc = 0x000e2b08  lr = 0x0027fb54
 sp = 0x4030a9b0  fp = 0x4030aa50

vpanic() at panic+0x44
 pc = 0x0027fb54  lr = 0x0027f904
 sp = 0x4030aa60  fp = 0x4030aae0

panic() at deadlkres+0x33c
 pc = 0x0027f904  lr = 0x0021c19c
 sp = 0x4030aaf0  fp = 0x4030ab50

deadlkres() at fork_exit+0x7c
 pc = 0x0021c19c  lr = 0x002404f4
 sp = 0x4030ab60  fp = 0x4030ab90

fork_exit() at fork_trampoline+0x10
 pc = 0x002404f4  lr = 0x0056743c
 sp = 0x4030aba0  fp = 0x0000
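
The tick count in the deadlock message lines up with the "about 30 minutes" above, provided the kernel is running at 1000 ticks per second.  That value of kern.hz is an assumption, and the helper name is made up for this sketch.

```sh
# ticks_to_minutes TICKS [HZ]: convert a kernel tick count to minutes.
# HZ defaults to 1000, which is an assumption about this machine's kern.hz.
ticks_to_minutes() {
    awk -v t="$1" -v hz="${2:-1000}" 'BEGIN { printf "%.1f\n", t / hz / 60 }'
}
```

"ticks_to_minutes 1800613" prints 30.0, so the thread had been blocked for the whole period before the deadlock resolver fired.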


-- 
Peter Jeremy



Re: panic: sleeping thread on r352386

2019-09-18 Thread Peter Jeremy

On 2019-Sep-17 15:24:30 +0300, Konstantin Belousov  wrote:
>Try this.
>
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
>index 63ea4736707..a23b4ba4efa 100644

Sorry for the delay but I'm not seeing problems with this version of
your patch (now r352457) either.  Thank you for your efforts.

-- 
Peter Jeremy


Re: panic: sleeping thread on r352386

2019-09-17 Thread Peter Jeremy

On 2019-Sep-17 11:06:58 +0300, Konstantin Belousov  wrote:
>Try the following change, which more accurately tries to avoid
>vnode_pager_setsize().  The real cause requires much more extensive
>changes.
>
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
>index 63ea4736707..16dc7745c77 100644
>--- a/sys/fs/nfsclient/nfs_clport.c
>+++ b/sys/fs/nfsclient/nfs_clport.c
...

With that patch, I'm back to "Sleeping thread (...) owns a non-sleepable
lock" panics.

-- 
Peter Jeremy



Re: "Sleeping with non-sleepable lock" in NFS on recent -current

2019-09-16 Thread Peter Jeremy

On 2019-Sep-16 11:19:02 +0300, Konstantin Belousov  wrote:
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
>index 471e029a8b5..63ea4736707 100644
...

Thanks, that patch seems much more stable.

-- 
Peter Jeremy



Re: "Sleeping with non-sleepable lock" in NFS on recent -current

2019-09-16 Thread Peter Jeremy

On 2019-Sep-16 09:32:52 +0300, Konstantin Belousov  wrote:
>On Mon, Sep 16, 2019 at 04:12:05PM +1000, Peter Jeremy wrote:
>> I'm consistently seeing panics in the NFS code on recent -current on aarm64.
>> The panics are one of the following two:
>> Sleeping on "vmopar" with the following non-sleepable locks held:
>> exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0 
>> (0xfd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432
>> 
>> Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock
>> 
>> Both panics have nearly identical backtraces (see below).  I'm running
>> diskless on a Rock64 with both filesystem and swap over NFS.  The panics
>> can be fairly reliably triggered by any of:
>> * "make -j4 buildworld"
>> * linking the kernel (as part of buildkernel)
>> * "make installworld"
>> 
>> Has anyone else seen this?
...

>Weird since this should have been fixed long time ago.  Anyway, please
>try the following, it should fix the rest of cases.
>
>diff --git a/sys/fs/nfsclient/nfs_clport.c b/sys/fs/nfsclient/nfs_clport.c
...
>@@ -540,7 +541,7 @@ nfscl_loadattrcache(struct vnode **vpp, struct nfsvattr 
>*nap, void *nvaper,
>   } else {
>   np->n_size = vap->va_size;
>   np->n_flag |= NSIZECHANGED;
>-  vnode_pager_setsize(vp, np->n_size);
>+  setnsize = 1;

Should this else block include a "nsize = np->n_size;"?  Without it,
nsize will remain set to 0, which looks wrong.

-- 
Peter Jeremy



"Sleeping with non-sleepable lock" in NFS on recent -current

2019-09-15 Thread Peter Jeremy

I'm consistently seeing panics in the NFS code on recent -current on arm64.
The panics are one of the following two:
Sleeping on "vmopar" with the following non-sleepable locks held:
exclusive sleep mutex NEWNFSnode lock (NEWNFSnode lock) r = 0 
(0xfd0078b346f0) locked @ /usr/src/sys/fs/nfsclient/nfs_clport.c:432

Sleeping thread (tid 100077, pid 35) owns a non-sleepable lock

Both panics have nearly identical backtraces (see below).  I'm running
diskless on a Rock64 with both filesystem and swap over NFS.  The panics
can be fairly reliably triggered by any of:
* "make -j4 buildworld"
* linking the kernel (as part of buildkernel)
* "make installworld"

Has anyone else seen this?

The first panic (sleeping on vmopar) has a backtrace:
sched_switch() at mi_switch+0x19c
 pc = 0x002ab368  lr = 0x0028a9f4
 sp = 0x61192660  fp = 0x61192680

mi_switch() at sleepq_switch+0x100
 pc = 0x0028a9f4  lr = 0x002d56dc
 sp = 0x61192690  fp = 0x611926d0

sleepq_switch() at sleepq_wait+0x48
 pc = 0x002d56dc  lr = 0x002d5594
 sp = 0x611926e0  fp = 0x61192700

sleepq_wait() at _sleep+0x2c4  [***]
 pc = 0x002d5594  lr = 0x00289eec
 sp = 0x61192710  fp = 0x611927b0

_sleep() at vm_object_page_remove+0x178  [***]
 pc = 0x00289eec  lr = 0x0052211c
 sp = 0x611927c0  fp = 0x61192820

vm_object_page_remove() at vnode_pager_setsize+0xc0
 pc = 0x0052211c  lr = 0x00539a70
 sp = 0x61192830  fp = 0x61192870

vnode_pager_setsize() at nfscl_loadattrcache+0x2e8
 pc = 0x00539a70  lr = 0x001ed4b4
 sp = 0x61192880  fp = 0x611928e0

nfscl_loadattrcache() at ncl_writerpc+0x104
 pc = 0x001ed4b4  lr = 0x001e2158
 sp = 0x611928f0  fp = 0x61192a40

ncl_writerpc() at ncl_doio+0x36c
 pc = 0x001e2158  lr = 0x001f0370
 sp = 0x61192a50  fp = 0x61192ae0

ncl_doio() at nfssvc_iod+0x228
 pc = 0x001f0370  lr = 0x001f1d88
 sp = 0x61192af0  fp = 0x61192b50

nfssvc_iod() at fork_exit+0x7c
 pc = 0x001f1d88  lr = 0x0023ff5c
 sp = 0x61192b60  fp = 0x61192b90

fork_exit() at fork_trampoline+0x10
 pc = 0x0023ff5c  lr = 0x00562c34
 sp = 0x61192ba0  fp = 0x


For the second panic, the [***] change to:
sleepq_wait() at vm_page_sleep_if_busy+0x80
vm_page_sleep_if_busy() at vm_object_page_remove+0xfc


-- 
Peter Jeremy



Re: "panic: Duplicate alloc" in dwmmc_attach on Rock64

2019-06-23 Thread Peter Jeremy

On 2019-Jun-21 20:59:39 +1000, Peter Jeremy  wrote:
>Since r349169, my Rock64 has consistently panic'd whilst attaching
>rockchip_dwmmc1.  A kernel built at r349135 works OK.  The relevant
>output looks like:
>rockchip_dwmmc0: (RockChip)> mem 0xff50-0xff503fff irq 40 on ofwbus0
>rockchip_dwmmc0: Hardware version ID is 270a
>mmc0:  on rockchip_dwmmc0
>rockchip_dwmmc1: (RockChip)> mem 0xff52-0xff523fff irq 42 on ofwbus0
>rockchip_dwmmc1: Hardware version ID is 270a
>panic: Duplicate alloc of 0xfd89cf50 from zone 0xfd817540(16) 
>slab 0xfd89cf90(0)

I did some more digging and narrowed this down to r349151 (which has nothing
that would be an obvious cause).  And the problem went away somewhere
between r349269 and r349288.  Since there's nothing obvious there either, I
presume this is something more subtle like a race condition that has been
provoked by the code changes.
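
One generic way to narrow a revision range like this is a binary search.  The sketch below is not a record of how the range above was actually narrowed; bisect_revs and its test-command convention are made up for this example, and it assumes a single good-to-bad transition, which a race condition may well violate.

```sh
# bisect_revs GOOD BAD TEST_CMD
# Binary-search the integer revision range (GOOD, BAD] for the first
# revision at which TEST_CMD fails.  TEST_CMD is called with one argument,
# the revision number, and must exit 0 for "good" and non-zero for "bad".
# Assumes a single good->bad transition in the range.
bisect_revs() {
    good=$1
    bad=$2
    test_cmd=$3
    while [ $((bad - good)) -gt 1 ]; do
        mid=$(( (good + bad) / 2 ))
        if "$test_cmd" "$mid"; then
            good=$mid
        else
            bad=$mid
        fi
    done
    echo "$bad"
}
```

In practice TEST_CMD would check out the revision, build a kernel, boot it and report whether the panic occurred, so each step is slow.  With 34 revisions between r349135 and r349169 that is about five or six builds.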

-- 
Peter Jeremy


"panic: Duplicate alloc" in dwmmc_attach on Rock64

2019-06-21 Thread Peter Jeremy

Since r349169, my Rock64 has consistently panic'd whilst attaching
rockchip_dwmmc1.  A kernel built at r349135 works OK.  The relevant
output looks like:
rockchip_dwmmc0:  mem 0xff50-0xff503fff irq 40 on ofwbus0
rockchip_dwmmc0: Hardware version ID is 270a
mmc0:  on rockchip_dwmmc0
rockchip_dwmmc1:  mem 0xff52-0xff523fff irq 42 on ofwbus0
rockchip_dwmmc1: Hardware version ID is 270a
panic: Duplicate alloc of 0xfd89cf50 from zone 0xfd817540(16) 
slab 0xfd89cf90(0)

cpuid = 0
time = 1
KDB: stack backtrace:
db_trace_self() at db_trace_self_wrapper+0x28
 pc = 0x00535d54  lr = 0x000df10c
 sp = 0x000104d0  fp = 0x000106e0

db_trace_self_wrapper() at vpanic+0x18c
 pc = 0x000df10c  lr = 0x00278218
 sp = 0x000106f0  fp = 0x00010790

vpanic() at panic+0x44
 pc = 0x00278218  lr = 0x00277fc8
 sp = 0x000107a0  fp = 0x00010820

panic() at uma_dbg_alloc+0x144
 pc = 0x00277fc8  lr = 0x004fa4b0
 sp = 0x00010830  fp = 0x00010850

uma_dbg_alloc() at uma_zalloc_arg+0x9b0
 pc = 0x004fa4b0  lr = 0x004f9960
 sp = 0x00010860  fp = 0x000108e0

uma_zalloc_arg() at malloc+0x9c
 pc = 0x004f9960  lr = 0x00252a8c
 sp = 0x000108f0  fp = 0x00010920

malloc() at bounce_bus_dmamem_alloc+0x4c
 pc = 0x00252a8c  lr = 0x00533b64
 sp = 0x00010930  fp = 0x00010960

bounce_bus_dmamem_alloc() at dwmmc_attach+0x5fc
 pc = 0x00533b64  lr = 0x00556f14
 sp = 0x00010970  fp = 0x000109e0

dwmmc_attach() at device_attach+0x3f4
 pc = 0x00556f14  lr = 0x002abd8c
 sp = 0x000109f0  fp = 0x00010a40

device_attach() at bus_generic_new_pass+0x12c
 pc = 0x002abd8c  lr = 0x002adb40
 sp = 0x00010a50  fp = 0x00010a80
...

I've looked through all the intervening commits and don't see any
smoking gun.  Does anyone have any suggestions?

-- 
Peter Jeremy



Re: error: yacc.h: No such file or directory

2019-06-20 Thread Peter Jeremy

On 2019-Jun-18 07:01:31 -0700, Enji Cooper  wrote:
>
>> On Jun 18, 2019, at 06:59, Enji Cooper  wrote:
>> PS This is one of the reasons why I wasn’t quick to discount Peter Jeremy’s 
>> reported build issue.
>
>Correction: I meant Julian Stacey.

I'm not sure how I feel about being confused with jhs.

Actually, I had also seen this problem in both mkesdb_static and
mkcsmapper_static but hadn't reported it because I was investigating
something else and wasn't certain that it wasn't self-inflicted.

-- 
Peter Jeremy


Re: FreeBSD 12 kernel broken

2019-03-24 Thread Peter Jeremy

On 2019-Mar-22 19:08:18 +0300, Rozhuk Ivan  wrote:
>ld: error: undefined symbol: xz_dec_init
>>>> referenced by g_uzip_lzma.c:106 (/usr/src/sys/geom/uzip/g_uzip_lzma.c:106)
>>>>   g_uzip_lzma.o:(g_uzip_lzma_ctor)
>
>ld: error: undefined symbol: xz_dec_run
>>>> referenced by g_uzip_lzma.c:81 (/usr/src/sys/geom/uzip/g_uzip_lzma.c:81)
>>>>   g_uzip_lzma.o:(g_uzip_lzma_decompress)
>
>ld: error: undefined symbol: xz_dec_end
>>>> referenced by g_uzip_lzma.c:60 (/usr/src/sys/geom/uzip/g_uzip_lzma.c:60)
>>>>   g_uzip_lzma.o:(g_uzip_lzma_free)
>--- kernel.full ---
>*** [kernel.full] Error code 1

Are you talking about FreeBSD 12 or FreeBSD 13?

-- 
Peter Jeremy



Re: Optimization bug with floating-point?

2019-03-14 Thread Peter Jeremy

On 2019-Mar-13 23:30:07 -0700, Steve Kargl wrote:
>AFAICT, all libm float routines need to be modified to conditionally
>include ieeefp.h and call fpsetprec(FP_PD).  This will work around
>issues in FP and libm.  FreeBSD needs to issue an erratum about
>the numerical issues with clang.

I vaguely recall looking into the x87 initialisation a long time ago
and seem to recall that the startup code (either crtX or in the kernel)
does a fninit() to set the precision.  I don't recall exactly where.

IMO, calling fpsetprec() in every libm float function is overkill. It
should be enough to fpsetprec() before main() and add a note in the
man pages that libm is built to use the default FPU configuration and
changing the configuration (precision or rounding) may result in larger
errors.

-- 
Peter Jeremy


Re: how to browse svnweb source?

2018-05-28 Thread Peter Jeremy

On 2018-May-28 18:06:07 -0700, Jeffrey Bouquet wrote:
>> > Suddenly the site www.secnetix.de/olli/FreeBSD/svnews which showed
>> > sequential source as for example xx1966 on april 3  xx2040 on april 4
>> > this year, is not loading in the browser.

That site is not associated with the FreeBSD Project so you would need to
discuss the absence of information on that site with whoever runs it.

>I tried that url every which way, sorting the headings, etc, and onscreen
>would be at best, a description of the new source but not specifically which
>files were changed and their complete path. Nothing like the url mentioned
>above at .de in the latter's overview.

Without knowing what that site displayed, it's very difficult to know where
(or if) svnweb provides the information.  Given a known revision, you can
check (eg) https://svnweb.freebsd.org/base?view=revision&revision=333926

If you want a sequential list of commits, you might be better off with (eg)
https://lists.freebsd.org/pipermail/svn-src-all/

-- 
Peter Jeremy


Re: Strange ARC/Swap/CPU on yesterday's -CURRENT

2018-03-20 Thread Peter Jeremy

On 2018-Mar-11 10:43:58 -1000, Jeff Roberson  wrote:
>Also, if you could try going back to r328953 or r326346 and let me know if 
>the problem exists in either.  That would be very helpful.  If anyone is 
>willing to debug this with me contact me directly and I will send some 
>test patches or debugging info after you have done the above steps.

I ran into this on 11-stable and tracked it to r326619 (MFC of r325851).
I initially got around the problem by reverting that commit but either
it or something very similar is still present in 11-stable r331053.

I've seen it in my main server (32GB RAM) but haven't managed to reproduce
it in smaller VBox guests - one difficulty I faced was artificially filling
ARC.

-- 
Peter Jeremy


Re: Build error: 'emmintrin.h' file not found

2018-01-24 Thread Peter Jeremy

On 2018-Jan-24 17:34:33 +0100, Florian Limberger wrote:
>since a few days I can't build 12-CURRENT anymore, due to the 'emmintrin.h'
>header missing.

I ran into a similar problem about a month ago.  First of all, does
your host system have emmintrin.h?  E.g. what is the output of "find
/usr/lib/clang -name emmintrin.h" ?

-- 
Peter Jeremy


Re: Unable to build 12-current/amd64

2017-12-25 Thread Peter Jeremy

On 2017-Dec-23 13:42:40 +0100, Dimitry Andric  wrote:
>On 23 Dec 2017, at 10:56, Peter Jeremy  wrote:
>> 
>> Since r326496, buildworld on my 12-current/amd64 system has consistently
>> died as follows.
>...
>> /usr/src/contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp:1166:10: fatal 
>> error: 'emmintrin.h' file not found
>> #include <emmintrin.h>
>> ^
>> 1 error generated.
>> *** Error code 1
>> 
>> Stop.
>> make[4]: stopped in /usr/src/lib/clang/libclang
>> 
>> I'm building on a 12.0-CURRENT VirtualBox guest at r326430.  I've checked
>> that my /usr/src is clean and deleted /usr/obj to no effect.  I have dug
>> into SourceManager.cpp and the #include is protected by a #if __SSE2__,
>> which is relying on clang internal checks to define (and my CPU supports
>> SSE2).  Does anyone have any ideas to explain what is going on?
>
>First of all, does your host system have emmintrin.h?  E.g. what is the
>output of "find /usr/lib/clang -name emmintrin.h" ?

Aha.  Somehow my entire /usr/lib/clang/5.0.0 tree was missing.  I'm not sure
if that was an installworld glitch or something I accidentally did.  In any
case, restoring it has fixed the problem.  Thanks for the pointer.

-- 
Peter Jeremy



Unable to build 12-current/amd64

2017-12-23 Thread Peter Jeremy

Since r326496, buildworld on my 12-current/amd64 system has consistently
died as follows.  I have no problems building on i386 or building
12-current/amd64 on 11-stable.

...
>>> stage 3: cross tools
--
cd /usr/src; INSTALL="sh /usr/src/tools/install.sh"  
TOOLS_PREFIX=/usr/obj/usr/src/amd64.amd64/tmp  
PATH=/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/sbin:/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/bin:/usr/obj/usr/src/amd64.amd64/tmp/legacy/bin:/sbin:/bin:/usr/sbin:/usr/bin
  WORLDTMP=/usr/obj/usr/src/amd64.amd64/tmp  MAKEFLAGS="-m 
/usr/src/tools/build/mk  -m /usr/src/share/mk" make  -f Makefile.inc1  DESTDIR= 
 OBJTOP='/usr/obj/usr/src/amd64.amd64/tmp/obj-tools'  OBJROOT='${OBJTOP}/'  
MAKEOBJDIRPREFIX=  BOOTSTRAPPING=1200054  BWPHASE=cross-tools  SSP_CFLAGS=  
MK_HTML=no NO_LINT=yes MK_MAN=no  -DNO_PIC MK_PROFILE=no -DNO_SHARED  
-DNO_CPU_CFLAGS MK_WARNS=no MK_CTF=no  MK_CLANG_EXTRAS=no MK_CLANG_FULL=no  
MK_LLDB=no MK_TESTS=no  MK_INCLUDES=yes  TARGET=amd64 TARGET_ARCH=amd64  
MK_GDB=no MK_LLD_IS_LD=no MK_TESTS=no cross-tools
...
===> lib/clang/libclang (all)
...
c++  -O2 -pipe -I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libclang 
-I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libllvm 
-I/usr/src/contrib/llvm/tools/clang/lib/Driver 
-I/usr/src/contrib/llvm/tools/clang/include -I/usr/src/lib/clang/include 
-I/usr/src/contrib/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_LIMIT_MACROS 
-D__STDC_CONSTANT_MACROS 
-DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd12.0\" 
-DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd12.0\" 
-DDEFAULT_SYSROOT=\"/usr/obj/usr/src/amd64.amd64/tmp\" -ffunction-sections 
-fdata-sections -gline-tables-only -MD -MF.depend.Basic_SourceLocation.o 
-MTBasic/SourceLocation.o -Qunused-arguments 
-I/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/include  -std=c++11 
-fno-exceptions -fno-rtti -gline-tables-only -stdlib=libc++ 
-Wno-c++11-extensions  -c 
/usr/src/contrib/llvm/tools/clang/lib/Basic/SourceLocation.cpp -o 
Basic/SourceLocation.o
c++  -O2 -pipe -I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libclang 
-I/usr/obj/usr/src/amd64.amd64/tmp/obj-tools/lib/clang/libllvm 
-I/usr/src/contrib/llvm/tools/clang/lib/Driver 
-I/usr/src/contrib/llvm/tools/clang/include -I/usr/src/lib/clang/include 
-I/usr/src/contrib/llvm/include -DLLVM_BUILD_GLOBAL_ISEL -D__STDC_LIMIT_MACROS 
-D__STDC_CONSTANT_MACROS 
-DLLVM_DEFAULT_TARGET_TRIPLE=\"x86_64-unknown-freebsd12.0\" 
-DLLVM_HOST_TRIPLE=\"x86_64-unknown-freebsd12.0\" 
-DDEFAULT_SYSROOT=\"/usr/obj/usr/src/amd64.amd64/tmp\" -ffunction-sections 
-fdata-sections -gline-tables-only -MD -MF.depend.Basic_SourceManager.o 
-MTBasic/SourceManager.o -Qunused-arguments 
-I/usr/obj/usr/src/amd64.amd64/tmp/legacy/usr/include  -std=c++11 
-fno-exceptions -fno-rtti -gline-tables-only -stdlib=libc++ 
-Wno-c++11-extensions  -c 
/usr/src/contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp -o 
Basic/SourceManager.o
/usr/src/contrib/llvm/tools/clang/lib/Basic/SourceManager.cpp:1166:10: fatal 
error: 'emmintrin.h' file not found
#include <emmintrin.h>
 ^
1 error generated.
*** Error code 1

Stop.
make[4]: stopped in /usr/src/lib/clang/libclang

I'm building on a 12.0-CURRENT VirtualBox guest at r326430.  I've checked
that my /usr/src is clean and deleted /usr/obj to no effect.  I have dug
into SourceManager.cpp and the #include is protected by a #if __SSE2__,
which is relying on clang internal checks to define (and my CPU supports
SSE2).  Does anyone have any ideas to explain what is going on?

-- 
Peter Jeremy



Re: get_swap_pager(x) failed

2017-12-13 Thread Peter Jeremy

On 2017-Dec-13 11:23:46 +0000, Gary Palmer  wrote:
>An open question would be why ARC is not reducing if the system is
>under memory pressure.  It's meant to, but there have been various
>bugs in that implementation.

The OP doesn't say what version of -current he is running but I would
point the finger at r325851.  I have discovered that, in 11-stable,
r326619 (which is the MFC of r325851) stops ARC responding to memory
backpressure.

-- 
Peter Jeremy


Re: dump trying to access incorrect block numbers?

2017-07-07 Thread Peter Jeremy

On 2017-Jul-07 10:44:36 -0400, Michael Butler wrote:
>Recent builds doing a backup (dump) cause nonsensical errors in syslog:

I can't directly offer any ideas but some more background might help:
When did you first notice this (what SVN revision)?
Do you know what the last good SVN revision was?
Is this a new or old filesystem?
Is the filesystem mounted/active or not when you dump it?
What are the relevant parameters for the filesystem on ada0s3a?
Are you running softupdates, journalling etc?
Which dump(8) phase is reporting the errors?
What are the exact dump and fsck commands you ran?

>I now have two UFS-based systems showing the same symptoms - what's up 
>with this?

Was there anything you did on either filesystem that might have triggered it?

-- 
Peter Jeremy


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Peter Jeremy

On 2017-May-24 20:21:54 +0300, Konstantin Belousov  wrote:
>No SIGSEGV etc, so I think that the effects seen are due to build system.
>rm -rf obj/* is the safest trick, I believe.

But the behaviour does indicate that meta mode is not doing the right thing
under all circumstances.  It's blatantly breaking in this scenario but could
be causing more subtle (and unnoticed) breakage in other cases.  This makes
me feel that this is worth investigating further.

-- 
Peter Jeremy


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Peter Jeremy

On 2017-May-24 18:01:42 -0700, "Simon J. Gerraty"  wrote:
>Peter Jeremy  wrote:
>> as follows.  My suspicion is that meta mode isn't seeing enough of the
>> differences between the bootstrap and main build steps and so causing make
>> to incorrectly skip steps.
>
>I see a number of places in src/Makefile* where BUILD_TOOLS_META=.NOMETA
>is added to env of things like CROSSENV, CD2MAKE, LIBCOMPATWMAKEENV
>
>Use of .NOMETA could be leading to problems - but I'm not familiar with
>where BUILD_TOOLS_META is used.

I've not looked at the guts of how meta mode works or is inhibited either.

In my case, I have "WITH_META_MODE=yes" in /etc/src-env.conf and was
using "make buildworld" - which failed.  The upgrade worked cleanly
when I manually deleted all the .meta files.  If I get a round tuit,
I'll try to revert to before the update and have a closer look at what
broke with the "normal" build, if no-one else beats me to it.

-- 
Peter Jeremy


Re: ino64? r318606 -> r318739 OK; r318739 -> r318781 fails SIGSEGV

2017-05-24 Thread Peter Jeremy

On 2017-May-24 08:47:41 -0700, Ngie Cooper  wrote:
>There was another report on the list about a stale MAKEOBJDIRPREFIX
>causing someone grief. I think it's safe to say that meta mode and -DNO_CLEAN
>might not work across this transition--in particular meta mode tends to err
>on the side of not rebuilding things.

I ran into a very similar problem trying to update from r318744 to r318781.
In my case, even two "make clean" wasn't enough and "make buildworld" died
as follows.  My suspicion is that meta mode isn't seeing enough of the
differences between the bootstrap and main build steps and so causing make
to incorrectly skip steps.

--
>>> stage 2.3: build tools
--
cd /usr/src; MAKEOBJDIRPREFIX=/usr/obj  INSTALL="sh /usr/src/tools/install.sh"  
TOOLS_PREFIX=/usr/obj/usr/src/tmp  
PATH=/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/sbin:/bin:/usr/sbin:/usr/bin
  WORLDTMP=/usr/obj/usr/src/tmp  MAKEFLAGS="-m /usr/src/tools/build/mk  -m 
/usr/src/share/mk" /usr/obj/usr/src/make.amd64/bmake  -f Makefile.inc1  
TARGET=amd64 TARGET_ARCH=amd64  DESTDIR=  BOOTSTRAPPING=1200031  SSP_CFLAGS=  
-DNO_LINT  -DNO_CPU_CFLAGS MK_WARNS=no MK_CTF=no  MK_CLANG_EXTRAS=no 
MK_CLANG_FULL=no  MK_LLDB=no MK_TESTS=no build-tools
...
===> usr.bin/mkesdb_static (obj,build-tools)
Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_bcs.o
Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_db_factory.o
Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_db_hash.o
Building /usr/obj/usr/src/usr.bin/mkesdb_static/citrus_lookup_factory.o
Building /usr/obj/usr/src/usr.bin/mkesdb_static/lex.c
Building /usr/obj/usr/src/usr.bin/mkesdb_static/lex.o
/usr/src/usr.bin/mkesdb/lex.l:44:10: fatal error: 'yacc.h' file not found
#include "yacc.h"
 ^~~~
 1 error generated.
 *** Error code 1

Stop.
bmake[3]: stopped in /usr/src/usr.bin/mkesdb_static
.ERROR_TARGET='lex.o'
.ERROR_META_FILE='/usr/obj/usr/src/usr.bin/mkesdb_static/lex.o.meta'
.MAKE.LEVEL='3'
MAKEFILE=''
.MAKE.MODE='meta missing-filemon=yes missing-meta=yes silent=yes verbose'
.CURDIR='/usr/src/usr.bin/mkesdb_static'
.MAKE='/usr/obj/usr/src/make.amd64/bmake'
.OBJDIR='/usr/obj/usr/src/usr.bin/mkesdb_static'
.TARGETS='build-tools'
DESTDIR=''
LD_LIBRARY_PATH=''
MACHINE='amd64'
MACHINE_ARCH='amd64'
MAKEOBJDIRPREFIX='/usr/obj'
MAKESYSPATH='/usr/src/share/mk'
MAKE_VERSION='20161212'
PATH='/usr/obj/usr/src/tmp/legacy/usr/sbin:/usr/obj/usr/src/tmp/legacy/usr/bin:/usr/obj/usr/src/tmp/legacy/bin:/sbin:/bin:/usr/sbin:/usr/bin'
SRCTOP='/usr/src'
OBJTOP='/usr/obj/usr/src'
.MAKE.MAKEFILES='/usr/src/share/mk/sys.mk /usr/src/share/mk/local.sys.env.mk 
/usr/src/share/mk/src.sys.env.mk /etc/src-env.conf 
/usr/src/share/mk/bsd.mkopt.mk /usr/src/share/mk/bsd.suffixes.mk /etc/make.conf 
/usr/src/share/mk/local.sys.mk /usr/src/share/mk/src.sys.mk 
/usr/src/usr.bin/mkesdb_static/Makefile /usr/src/usr.bin/mkesdb/Makefile.inc 
/usr/src/tools/build/mk/bsd.prog.mk /usr/src/share/mk/bsd.prog.mk 
/usr/src/share/mk/bsd.init.mk /usr/src/share/mk/bsd.opts.mk 
/usr/src/share/mk/bsd.cpu.mk /usr/src/share/mk/local.init.mk 
/usr/src/share/mk/src.init.mk /usr/src/usr.bin/mkesdb_static/../Makefile.inc 
/usr/src/share/mk/bsd.own.mk /usr/src/share/mk/bsd.compiler.mk 
/usr/src/share/mk/bsd.compiler.mk /usr/src/share/mk/bsd.libnames.mk 
/usr/src/share/mk/src.libnames.mk /usr/src/share/mk/src.opts.mk 
/usr/src/share/mk/bsd.nls.mk /usr/src/share/mk/bsd.confs.mk 
/usr/src/share/mk/bsd.files.mk /usr/src/share/mk/bsd.incs.mk 
/usr/src/share/mk/bsd.links.mk /usr/src/share/mk/bsd.man.mk 
/usr/src/share/mk/bsd.dep.mk /usr/src/share/mk/bsd.clang-analyze.mk 
/usr/src/share/mk/bsd.obj.mk /usr/src/share/mk/bsd.subdir.mk 
/usr/src/share/mk/bsd.sys.mk /usr/src/tools/build/mk/Makefile.boot'
.PATH='. /usr/src/usr.bin/mkesdb_static /usr/src/lib/libc/iconv 
/usr/src/usr.bin/mkesdb'
*** Error code 1

I've done a "find /usr/obj -name \*.meta -print0 | xargs -0 rm" and am still
waiting for that to complete, though it has passed the above failure point.
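
For anyone who prefers a script to the find/xargs pipeline above, the same
cleanup can be sketched in Python.  This is only a sketch: the function name
remove_meta_files is made up for illustration, and it assumes that deleting
every regular file whose name ends in ".meta" under the object tree is all
that is wanted.

```python
import sys
from pathlib import Path


def remove_meta_files(objdir):
    """Delete every regular file ending in .meta under objdir; return the count."""
    removed = 0
    # Collect the matches first so the tree is not modified while it is walked.
    for path in list(Path(objdir).rglob("*.meta")):
        if path.is_file():
            path.unlink()
            removed += 1
    return removed


if __name__ == "__main__":
    # Hypothetical usage: pass the object directory (e.g. /usr/obj) as an argument.
    for objdir in sys.argv[1:]:
        print(objdir, remove_meta_files(objdir))
```

As with the shell version, this only forces meta mode to rebuild everything;
it says nothing about why the stale .meta files were trusted in the first
place.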

-- 
Peter Jeremy


Re: effect of strip(1) on du(1)

2017-03-03 Thread Peter Jeremy

On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" wrote:
>> du(1) is using fts_read(3), which is based on the stat(2) information.
>> The OpenGroup defines st_blocksize as "Number of blocks allocated for
>> this object."  In the case of ZFS, a write(2) may return before any
>> blocks are actually allocated.  And thanks to compression, gang
...
>My gut tells me that this is gona cause problems, is it ONLY
>the st_blocksize data that is incorrect then not such a big
>problem, or are we returning other meta data that is wrong?

Note that it's st_blocks, not st_blocksize.

I did an experiment, writing a (roughly) 113MB file (some data I had
lying around), close()ing it and then stat()ing it in a loop.  This is
FreeBSD 10.3 with ZFS and lz4 compression.  Over the 26ms following the
close(), st_blocks gradually rose from 24169 to 51231.  It then stayed
stable until 4.968s after the close, when st_blocks again started
increasing until it stabilized after a total of 5.031s at 87483.  Based
on this, st_blocks reflects the actual number of blocks physically
written to disk.  None of the other fields in the struct stat vary.

The 5s delay is presumably the TXG delay (since this system is basically
unloaded).  I'm not sure why it writes roughly ½ the data immediately
and the rest as part of the next TXG write.
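
The experiment described above can be approximated with a short script.  This
is a rough sketch rather than the program actually used: it assumes a POSIX
system where Python's os.stat() exposes st_blocks, and the file size, random
data, polling interval and function names are all illustrative choices.

```python
import os
import tempfile
import time


def sample_st_blocks(path, duration=0.05, interval=0.005):
    """Poll st_blocks for path, returning (elapsed_seconds, st_blocks) pairs."""
    samples = []
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        samples.append((elapsed, os.stat(path).st_blocks))
        if elapsed >= duration:
            break
        time.sleep(interval)
    return samples


def write_then_sample(size=1 << 20, duration=0.05):
    """Write size bytes to a temporary file, close it, then poll st_blocks."""
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(os.urandom(size))  # random data, so compression gains little
        # The file is closed at this point; nothing here forces it to disk.
        return sample_st_blocks(path, duration=duration)
    finally:
        os.unlink(path)


if __name__ == "__main__":
    for elapsed, blocks in write_then_sample():
        print(f"{elapsed:8.4f}s  st_blocks={blocks}")
```

To see a TXG-sized delay the duration would need to be several seconds, and
the temporary file would need to live on the filesystem under test rather
than wherever the default temporary directory happens to be.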

>My expectations of executing a stat(2) call on a file would
>be that the data returned is valid and stable.  I think almost
>any program would expect that.

I think a case could be made that st_blocks is a valid representation
of "the number of blocks allocated for this object" - with the number
increasing as the data is physically written to disk.  As for it being
stable, consider a (hypothetical) filesystem that can transparently
migrate data between different storage media, with different compression
algorithms etc (ZFS will be able to do this once the mythical block
rewrite code is written).

-- 
Peter Jeremy


Re: effect of strip(1) on du(1)

2017-03-02 Thread Peter Jeremy

On 2017-Mar-02 22:29:46 +0300, Subbsd  wrote:
>During some interval after strip call, du will show 512B for any file.
>If execute du(1) after strip(1) without delay, this behavior is reproduced 
>100%:

What filesystem are you using?  strip(1) rewrites the target file and du(1)
reports the number of blocks reported by stat(2).  It seems that you are
hitting a situation where the file metadata isn't immediately updated.
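
As a rough illustration of what is being compared, the block count that
stat(2) reports can be read directly.  This is a sketch under the assumption
that st_blocks is counted in 512-byte units (as Python documents it); the
function names are made up for the example.

```python
import os


def allocated_bytes(path):
    """Disk space charged to path: st_blocks counted in 512-byte units."""
    return os.stat(path).st_blocks * 512


def apparent_vs_allocated(path):
    """Return (st_size, allocated bytes) so the two figures can be compared."""
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512
```

Calling apparent_vs_allocated() on a file immediately after it has been
rewritten, and again a few seconds later, should show whether the allocated
figure lags behind the apparent size on a given filesystem.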

-- 
Peter Jeremy


Re: removing SVR4 binary compatibilty layer

2017-02-15 Thread Peter Jeremy

On 2017-Feb-14 10:32:32 -0800, Gleb Smirnoff  wrote:
>  After some discussion on svn mailing list [1], there is intention
>to remove SVR4 binary compatibility layer from FreeBSD head, meaning
>that FreeBSD 12.0-RELEASE, available in a couple of years, would
>be shipped without it. There is no intention of merge of the removal.
>The stable@ mailing list added for wider audience.

Can I suggest that we put some warnings into the SVr4 image activation
code and MFC that to at least 11 to try and smoke out anyone who might
actually be using it?

-- 
Peter Jeremy



Re: Somethign missing in my environment?

2016-08-17 Thread Peter Jeremy

On 2016-Aug-16 23:14:45 +0200, Willem Jan Withagen  wrote:
>And I'm running:
>make -j8 buildworld
>So getting a good target that give the error is hard.
>
>So I continued with make -DNOCLEAN -DNO_CLEAN buildworld.

There's nothing immediately obvious.  I suggest trying without the
"-DNOCLEAN -DNO_CLEAN" - they are shortcuts that aren't guaranteed to
work under all circumstances.  And if that still fails, skip the '-j8'
because it's possible there are still race conditions in buildworld
(though that is very unlikely).

-- 
Peter Jeremy


Re: Somethign missing in my environment?

2016-08-16 Thread Peter Jeremy

On 2016-Aug-16 20:31:57 +0200, Willem Jan Withagen  wrote:
>I'm trying to compile world, but I keep getting:
>
>/usr/obj/usr/srcs/head/src/tmp/usr/lib/libgcc_s.so: undefined reference
>to `__gxx_personality_v0'
>cc: error: linker command failed with exit code 1 (use -v to see invocation)
>*** [h_raw.full] Error code 1
>
>Even after refetching the complete tree.

We need more context:
- What SVN revision of (presumably) -current is this?
- What architecture are you compiling on/for?
- What do you have in /etc/make.conf and /etc/src.conf?
- What is your current environment?
- What is the output leading up to that error (what is being built)?

-- 
Peter Jeremy



Re: Mosh regression between 10.x and 11-stable

2016-08-11 Thread Peter Jeremy

On 2016-Aug-11 10:06:35 -0700, Ngie Cooper  wrote:
>
>> On Aug 11, 2016, at 09:30, John Hood  wrote:
>> 
>> I still can't reproduce this on 3 different 11.0-BETA4 servers and a
>> variety of clients and networks.  Can you try and identify a more
>> portable repro or at least figure out why it fails on your system?
>> 
>> Please try applying this patch, too.  It's a shot in the dark, though.
>
>Dumb question: what ssh key type(s) (dsa, rsa, etc) are you using Peter :)?

I'm using ECDSA for both the host and user keys.

-- 
Peter Jeremy



Re: Mosh regression between 10.x and 11-stable

2016-08-11 Thread Peter Jeremy

On 2016-Aug-11 12:30:23 -0400, John Hood  wrote:
>I still can't reproduce this on 3 different 11.0-BETA4 servers and a
>variety of clients and networks.  Can you try and identify a more
>portable repro or at least figure out why it fails on your system?
>
>Please try applying this patch, too.  It's a shot in the dark, though.

That patch seems to fix the problem I'm seeing.  Not waiting for output
to drain is consistent with the symptoms I'm seeing, though I have no
idea why only my Linux client is affected.

-- 
Peter Jeremy


Re: Mosh regression between 10.x and 11-stable

2016-08-11 Thread Peter Jeremy

On 2016-Aug-10 14:32:15 -0400, john hood  wrote:
>On 8/10/16 4:18 AM, Peter Jeremy wrote:
>> I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
>> r303811 and mosh to that host from my Linux laptop stopped working.  All
>> I get on the laptop is:
>> $ mosh remotehost
>> Connection to remotehost closed.
>> /usr/bin/mosh: Did not find mosh server startup message.

>> 1) the "MOSH CONNECT" message isn't making it out of the local ssh process.
>
>Do you know if the message is getting out of mosh-server?  into sshd?
>Do you know if mosh-server is actually running?  (It will log utmp
>entries on startup.)

mosh-server is running - I can see it from another session and redirecting
verbose output into a file, I get:

mosh-server (mosh 1.2.5) [build mosh 1.2.5]
Copyright 2012 Keith Winstein 
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

[mosh-server detached, pid = 4202]

Warning: termios IUTF8 flag not defined.
Character-erase of multibyte character sequence
probably does not work properly on this platform.

I can't tell if it's actually writing into the remote ssh process.

>> 2) it's racy because I can get it from "always fails" to "sometimes works".
>
>How do you get it there?

- Add '-v' to the local ssh command.
- ktrace the remote mosh-server process (this seems to make it consistently 
work).

-- 
Peter Jeremy


Mosh regression between 10.x and 11-stable

2016-08-10 Thread Peter Jeremy

I recently updated one of my VPS hosts from 10.3-RELEASE-p5 to 11.0-BETA4
r303811 and mosh to that host from my Linux laptop stopped working.  All
I get on the laptop is:
$ mosh remotehost
Connection to remotehost closed.
/usr/bin/mosh: Did not find mosh server startup message.

I've tried rebuilding mosh (and all dependencies) on the host to no avail.

This isn't the DSA change that's been discussed elsewhere: I can SSH from my
laptop to the host without problem.  I can also manually invoke mosh-client
and mosh-server and it works.  Unfortunately, mosh has no provision for
debugging.  I've tried hacking the mosh perl script to make it more verbose
and that shows that:
1) the "MOSH CONNECT" message isn't making it out of the local ssh process.
2) it's racy because I can get it from "always fails" to "sometimes works".

My suspicion is that something has changed in either sshd or TCP that
is resulting in the connection going away before the stdout from the
remote mosh-server makes it out from the local ssh process.

I've looked at tcpdump's of both successful and failed SSH sessions
but don't see anything obviously different (encryption makes it
difficult to decode the session).

Has anyone else seen this behaviour or have any ideas what might be
causing it?

-- 
Peter Jeremy



FreeBSD 11.0-BETA2 won't boot on an Acer Aspire 5560

2016-07-27 Thread Peter Jeremy

I'm trying to boot the 11.0-BETA2/amd64 memory stick image and the
kernel panics: (Following copied by hand):

ACPI APIC Table: 
...
acpi0:  on motherboard
ACPI Error: Hardware did not change modes (20160527/hwacpi-160)
ACPI Error: Could not transition to ACPI mode (20160527/evxfevnt-105)
ACPI Warning: AcpiEnable failed (20160527/utxfinit-184)
acpi0: Could not enable ACPI: AE_NO_HARDWARE_RESPONSE
device_attach: acpi0 attach returned 6

Followed by a NULL dereference panic at nexus_acpi_attach+0x89

The system boots a 10.0-RELEASE/amd64 memstick (the only other image I
have conveniently to hand) without problem.

-- 
Peter Jeremy



Re: Recognizing SMR HDDs

2016-05-26 Thread Peter Jeremy

On 2016-May-26 08:42:53 +0200, Gary Jennejohn  wrote:
>Now that ken@ has checked in the SMR code I'm wondering how I can see
>whether it's having any effect.

camcontrol(8) has been enhanced with SMR options and there's a new
zonectl(8) command - these should be able to report whether the drive
is recognized as a host-aware or host-managed SMR drive.  I believe
that drive-managed SMR drives don't admit to anything.

>Does the fact that the drive appears as a /dev/daX play any role?

USB drives are handled via the SCSI CAM layer rather than as SATA
drives.  It's possible that either the umass(4) driver or your USB
to SATA adapter are not correctly handling the relevant commands.

-- 
Peter Jeremy


Re: qsort() documentation

2016-04-20 Thread Peter Jeremy

On 2016-Apr-20 08:45:00 +0200, Hans Petter Selasky  wrote:
>There is something which I don't understand. Why is quicksort falling
>back to insertion sort which is an O(N**2) algorithm, when there exist
>O(log(N)*log(N)*N) algorithms, which I propose as a solution to the
>"bad" characteristics of qsort.

O() notation just describes the (normally, worst case) ratio of input size
to runtime for a given algorithm: Increasing the input size by (say) 100×
means an insertion sort will take about 10000× as long to run, whilst the
"best" algorithms would take about 2000× as long.  It says nothing about how
long sorting (say) 1000 items takes with either sort or how they behave on
"typical" inputs.  In general, the fancier algorithms might have better
worst-case O() numbers but they have higher overheads and may not perform
any better on typical inputs - so, for small inputs, insertion sort or
bubble sort may be faster.
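
The scaling argument can be checked with a little arithmetic.  The sketch
below is illustrative only: it compares an idealised N**2 cost with an
idealised N*log(N)**2 cost (the bound mentioned in the quoted text), ignores
constant factors and overheads entirely, and uses made-up function names.

```python
import math


def growth_quadratic(n, factor):
    """Ratio of an N**2 cost at factor*n to the same cost at n."""
    return (factor * n) ** 2 / n ** 2


def growth_n_log2_n(n, factor):
    """Ratio of an N*log(N)**2 cost at factor*n to the same cost at n (n > 1)."""
    big = factor * n
    return (big * math.log(big) ** 2) / (n * math.log(n) ** 2)


if __name__ == "__main__":
    for n in (10, 1000, 100000):
        print(n, growth_quadratic(n, 100), round(growth_n_log2_n(n, 100), 1))
```

The quadratic ratio is 100**2 whatever the starting size, whereas the
N*log(N)**2 ratio depends on the starting size.  Neither number says which
algorithm is faster for a particular input, which is the point being made
above.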

IMO:
- If you're only sorting a small number of items and/or doing it infrequently,
  the sort performance doesn't really matter and you can use any algorithm.
- If you're sorting lots of items and sort performance is a real issue, you
  need to examine the performance of a variety of algorithms on your input
  data and may need to roll your own implementation.

As long as qsort() behaves reasonably and its behaviour is documented
sufficiently well that someone can decide whether or not to rule it out
for their specific application, that is (IMHO) sufficient.

-- 
Peter Jeremy


Re: gettimeofday((void *)-1, NULL) implicates core dump on recent FreeBSD 11-CURRENT

2015-07-08 Thread Peter Jeremy

On 2015-Jul-08 12:22:03 -0700, Garrett Cooper  wrote:
>On Jul 8, 2015, at 12:17, Doug Rabson  wrote:
>
>> As far as I can tell, POSIX doesn't require either EFAULT or any other
>> behaviour - the text in http://www.open-std.org/jtc1/sc22/open/n4217.pdf
>> just says, "No errors are defined". Our man page is wrong and any real
>> program which relies on gettimeofday not faulting when given bad inputs is
>> broken.
>
>I would suggest the following:
>1. Document behavior in NOTES about gettimeofday returning EFAULT with the 
>specific scenarios kib mentioned, segfaulting otherwise (wordsmithing the 
>actual info of course). Otherwise, it might confuse people who look at the 
>manpage later.

I would suggest adding a comment to intro(2) noting that not all functions
listed in section 2 are necessarily system calls and may report error
conditions (or maybe "perform argument validation") differently when
implemented in userland.

Note that the issues with gettimeofday() also apply to clock_gettime().

I'm not sure if we want to explicitly document the conditions under which
gettimeofday() (or clock_gettime()) are implemented in userland vs syscalls
because that is guaranteed to get stale over time.  How about stating that
these functions are implemented as syscalls only if the AT_TIMEKEEP value
reported by "procstat -x" is NULL.

-- 
Peter Jeremy


Re: Bug-report of sorts...

2015-01-30 Thread Peter Jeremy

On 2015-Jan-30 22:24:50 +0000, Poul-Henning Kamp  wrote:
>But the point is I never get to the webpage, local_unbound just doesn't
>seem to be able to resolve anything through the DHCP appointed server,
>despite the fact that dig(1) does so just fine.

How about some packet captures showing the request/response differences
between dig(1) and local_unbound?

-- 
Peter Jeremy



Re: [CFT] Paravirtualized KVM clock

2015-01-21 Thread Peter Jeremy

On 2015-Jan-04 11:56:14 -0600, Bryan Venteicher wrote:
>For the last few weeks, I've been working on adding support for KVM clock
>in the projects/paravirt branch. Currently, a KVM VM guest will end up
>selecting either the HPET or ACPI as the timecounter source. Unfortunately,
>this is very costly since every timecounter fetch causes a VM exit. KVM
>clock allows the guest to use the TSC instead; it is very similar to the
>existing Xen timer.

A somewhat late response but have you looked at
https://github.com/blitz/freebsd/commit/cdc5f872b3e48cc0dda031fc7d6bdedc65c3148f
I've been running this[*] on a Google Compute Engine instance for about 6
months without problems.

[*] I had to patch out the test for KVM_FEATURE_CLOCKSOURCE_STABLE_BIT but
I think that's a GCE issue.

-- 
Peter Jeremy


Re: mk output during builds: duplicate script for target "...." ignored

2014-09-06 Thread Peter Jeremy

On 2014-Sep-05 18:18:15 +, "Bjoern A. Zeeb" wrote:
>Started the last 48 hours at some time:

It's now fixed for me.  I think the fix was r271168.

-- 
Peter Jeremy



Re: keyboard break to debugger broken?

2014-07-04 Thread Peter Jeremy

On 2014-Jul-04 02:28:48 -0700, John-Mark Gurney  wrote:
>So, I recently tried to break into the debugger w/ the various key
>sequences that I know about, and none of them worked... I've tried
>CTRL-ESC, ALT-ESC, CTRL-ALT-ESC, CTRL-PRTSCR, ALT-PRTSCR and
>CTRL-ALT-PRTSCR, and many other different ones...   I've verified that
>I can sysctl debug.kdb.enter=1 to enter the debugger, and the
>CTRL-ALT-PAUSE works to suspend the machine, and CTRL-ALT-DEL works
>to reboot...
>
>Does anyone know if this works?

It works for me on 10.0.  Do you have debug.kdb.break_to_debugger=1
and hw.syscons.kbd_debug=1 (if you're using syscons)?

-- 
Peter Jeremy



Re: OpenSSL vs. LibreSSL (OpenBSD)

2014-04-25 Thread Peter Jeremy

On 2014-Apr-25 05:00:38 -0400, Zack Gold  wrote:
>An important thing to note here is motive. The Linux Foundation is
>housing this "Core Infrastructure Initiative" project, and so they are
>the ones who get all the money. "The Initiative's funds will be
>administered by the Linux Foundation and a steering group comprised of
>backers of the project as well as key open source developers and other
>industry stakeholders." So, it might be in the interest of these
>people to not necessarily fix bugs. They might be interested in other
>things, like ownership. Though, this may be a bit irrational.

It has occurred to me that Linux (in general, not the Foundation)
contains a number of religious zealots and the current OpenSSL license
is not in keeping with their religion.  And there have been previous
cases where portable open source software has passed into the
maintainership of Linux groups and had all the cross-platform code
excised to make it Linux-only.

-- 
Peter Jeremy


Re: Import of DragonFly Mail Agent

2014-02-24 Thread Peter Jeremy

On 2014-Feb-24 10:44:30 -0600, Bryan Drewery  wrote:
>
>I have the Oreilly sendmail book here and it's thicker than The Design
>and Implementation of the FreeBSD Operating System. That's quite an
>application!

More impressively, ISTR it's thicker than "The Magic Garden Explained"
- which is the SVR4 internals.

-- 
Peter Jeremy



Re: ZFS command can block the whole ZFS subsystem!

2014-01-05 Thread Peter Jeremy

On 2014-Jan-05 09:11:38 +0100, "O. Hartmann" wrote:
>On Sun, 5 Jan 2014 10:14:26 +1100
>Peter Jeremy  wrote:
>
>> On 2014-Jan-04 23:26:42 +0100, "O. Hartmann"
>>  wrote:
>> >zfs list -r BACKUP00
>> >NAME  USED  AVAIL  REFER  MOUNTPOINT
>> >BACKUP00 1.48T  1.19T   144K  /BACKUP00
>> >BACKUP00/backup  1.47T  1.19T  1.47T  /backup
>> 
>> Well, that at least shows it's making progress - it's gone from 2.5T
>> to 1.47T used (though I gather that has taken several days).  Can you
>> pleas post the result of
>> zfs get all BACKUP00/backup

>BACKUP00/backup  deduponlocal

This is your problem.  Before it can free any block, it has to check
for other references to the block via the DDT and I suspect you don't
have enough RAM to cache the DDT.

Your options are:
1) Wait until the delete finishes.
2) Destroy the pool with extreme prejudice: Forcibly export the pool
   (probably by booting to single user and not starting ZFS) and write
   zeroes to the first and last MB of ada3p1.

BTW, this problem will occur on any filesystem where you've ever
enabled dedup - once there are any dedup'd blocks in a filesystem,
all deletes need to go via the DDT.

-- 
Peter Jeremy


Re: ZFS command can block the whole ZFS subsystem!

2014-01-04 Thread Peter Jeremy

On 2014-Jan-04 23:26:42 +0100, "O. Hartmann" wrote:
>zfs list -r BACKUP00
>NAME  USED  AVAIL  REFER  MOUNTPOINT
>BACKUP00 1.48T  1.19T   144K  /BACKUP00
>BACKUP00/backup  1.47T  1.19T  1.47T  /backup

Well, that at least shows it's making progress - it's gone from 2.5T
to 1.47T used (though I gather that has taken several days).  Can you
please post the result of
zfs get all BACKUP00/backup

-- 
Peter Jeremy


Re: ZFS command can block the whole ZFS subsystem!

2014-01-04 Thread Peter Jeremy

On 2014-Jan-03 20:25:35 +0100, "O. Hartmann" wrote:
>[~] zfs get all BACKUP00
>NAME  PROPERTY  VALUE SOURCE
...
>BACKUP00  usedbysnapshots   0 -
>BACKUP00  usedbydataset 144K  -
>BACKUP00  usedbychildren2.53T -
>BACKUP00  usedbyrefreservation  0 -

>Funny, the disk is supposed to be "empty" ... but is marked as used by
>2.5 TB ...

That says there's another filesystem inside BACKUP00 which has 2.5TB used.

What are the results of:
zpool status -v BACKUP00
zfs list -r BACKUP00

-- 
Peter Jeremy



Re: PACKAGESITE spam

2013-12-26 Thread Peter Jeremy

On 2013-Dec-22 11:53:17 -0800, Darren Pilgrim wrote:
>Because of that deinstall log.  When you use `pkg install` to upgrade a 
>port, you get something like this:
>
>Jul 10 23:06:40 chombo pkg-static: ca_root_nss-3.15.1 installed
>Nov 29 15:04:52 chombo pkg: ca_root_nss reinstalled: 3.15.2_1
>
>That information does not exist in the pkg database.

I agree that's a serious bug/regression in the pkg database: With the
old pkg system, I could tell when a port was installed by looking at
the timestamps on the +COMMENT file.  The install time is needed to
answer questions like "does this entry in UPDATING affect me" (ie have
I rebuilt the port since the entry date).  It's something I used
regularly and its absence is a PITA.

I shouldn't need to rummage through /var/log/messages - and in any case,
by default FreeBSD only keeps 500K of messages history (about a month
in my case) so the information has probably rotated into the bit bucket.

I agree that having a pkg audit trail would be useful.  Unfortunately,
what we have today is not an audit trail and isn't especially useful.

-- 
Peter Jeremy


Re: [Call For Help] Clang + OpenJDK + head + amd64 == cocktail of death (for clusters)

2013-07-25 Thread Peter Jeremy

On 2013-Jul-25 10:39:17 +0200, Baptiste Daroussin  wrote:
>After some investigation we discover that blacklisting openjdk6 allows the
>building process to go to completion again.
...
>It seems to happen only on head amd64, so far we think it is only
>happening when jdk is built with clang.

This mail arrives at an opportune time.  I've just discovered that if
I build openjdk6 with clang (on head/amd64), the resultant jdk SEGV's
if I again try to build openjdk6.  If I build it with "USE_GCC=any"
then the problem goes away.

>I have no time, neither skill to investigate that,

I don't have the time to investigate further but forcing the use of gcc
instead of clang is at least a workaround.

-- 
Peter Jeremy


Re: access to hard drives is "blocked" by writes to a flash drive

2013-03-04 Thread Peter Jeremy

On 2013-Mar-03 23:12:40 -0800, Don Lewis  wrote:
>On  4 Mar, Konstantin Belousov wrote:
>> It could be argued that the current typical value of 16MB for the
>> hirunningbufspace is too low, but experiments with increasing it did
>> not provided any measureable change in the throughput or latency for
>> some loads.
>
>The correct value is probably proportional to the write bandwidth
>available.

The problem is that write bandwidth varies widely depending on the
workload.  For spinning rust, this will vary between maybe 64KBps
(512B random writes) and 100-150MBps (single-threaded large sequential
writes).  The (low-end) SSD in my Netbook also has about 100:1 variance
due to erase blocking.  How do you tune hirunningbufspace in the face
of 2 or 3 orders of magnitude variance in throughput?  Especially since
SSDs don't gradually degrade - they hit a brick wall.
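
To put numbers on that, here is a minimal sketch in Python of how long
a full 16MB of outstanding writes takes to drain at each end of the
range quoted above.  It is only arithmetic; the names are made up for
illustration, nothing here is read from the kernel, and "16MB" is
treated as 16 MiB.

```python
# Illustrative arithmetic only; names are hypothetical, not kernel symbols.
HIRUNNINGSPACE_KIB = 16 * 1024  # the "typical" 16MB value quoted above


def drain_seconds(bandwidth_kib_per_s, buffered_kib=HIRUNNINGSPACE_KIB):
    """Seconds needed to write out buffered_kib at the given bandwidth."""
    return buffered_kib / bandwidth_kib_per_s


slow = drain_seconds(64)          # 512B random writes, ~64KBps
fast = drain_seconds(100 * 1024)  # large sequential writes, ~100MBps

print(f"random writes:     {slow:.2f} s")
print(f"sequential writes: {fast:.2f} s")
print(f"ratio:             {slow / fast:.0f}:1")
```

A single fixed value therefore means anything from a fraction of a
second to several minutes of queued I/O, depending on the workload.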

-- 
Peter Jeremy


Re: access to hard drives is "blocked" by writes to a flash drive

2013-03-02 Thread Peter Jeremy

On 2013-Mar-02 18:29:54 +0100, deeptech71  wrote:

>When one of my flash drives is being heavily written to; typically by
>``svn update'' on /usr/src, located on the flash drive; the following
>can be said about filesystem behavior:
>
>- ``svn update'' seems to be able to quickly update a bunch of files,
>   but is then unable to continue for a period of time. This behavior
>   is cyclical, and cycles several times, depending on the amount of
>   updating work to be done for a particular run of ``svn update''.

This sounds like normal flash behaviour:  You can only write to erased
blocks.  The SSD firmware attempts to keep a free pool of erased blocks
but if you write too fast, you empty the free pool and need to wait for
the wear-levelling algorithm to move blocks around and erase them.

Enabling TRIM (the '-t' flag on tunefs) will help if the drive supports
TRIM (if it doesn't, it'll probably just lock up).  Otherwise, you need
to either put up with it or upgrade to a better SSD.

I run into this regularly with the low-end SuperTalent drive in my
Netbook but have never seen it with the OCZ Agility4 that I use for
L2ARC in my fileserver.

-- 
Peter Jeremy


Re: No ZFS when loading modules from loeader prompt

2013-02-21 Thread Peter Jeremy

On Wed, Feb 20, 2013 at 7:05 AM, O. Hartmann wrote:
> At the loader prompt, I need to unload the buggy kernel and load the old
> working one via
>
> load /boot/kernel.old/kernel
>
> Then I load also the ZFS related modules
>
> load /boot/kernel.old/opensolaris.ko
> load /boot/kernel.old/zfs.ko
>
> Issuing boot at the end of that stage boots the kernel - the old one
> -successfully - but there is no working ZFS and no ZFS volume gets
> mounted although the rc.conf is executed correctly.
>
> What am I doing wrong at that point? Why isn't ZFS run and mount properly?

Last time I ran into this problem, the issue was that "unload" also
unloaded the zpool.cache file and the ZFS code relied on that to find
the kernel.  I don't recall what the workaround was.

On 2013-Feb-20 08:17:46 -0800, Freddie Cash  wrote:
>Sounds like a perfect use case for Boot Environments.  Create a new BE,
>install the new kernel into it, set it as the default, reboot.  If it
>fails, you manually set the previous BE as the default, and reboot.  That
>way, your "known-good", working environment is never affected.

How do you change your BE in the loader?  Or how do you change your
BE when you can't boot?

-- 
Peter Jeremy



Re: Zpool surgery

2013-01-27 Thread Peter Jeremy

On 2013-Jan-27 14:31:56 -, Steven Hartland  wrote:
>- Original Message - 
>From: "Ulrich Spörlein" 
>> I want to transplant my old zpool tank from a 1TB drive to a new 2TB
>> drive, but *not* use dd(1) or any other cloning mechanism, as the pool
>> was very full very often and is surely severely fragmented.
>
>Cant you just drop the disk in the original machine, set it as a mirror
>then once the mirror process has completed break the mirror and remove
>the 1TB disk.

That will replicate any fragmentation as well.  "zfs send | zfs recv"
is the only (current) way to defragment a ZFS pool.

-- 
Peter Jeremy



Re: Programmer dvorak layout for syscons

2012-11-19 Thread Peter Jeremy

On 2012-Nov-20 02:42:50 +0200, mbsd  wrote:
>I've been using this layout for a long time in X and I create kbdmap for
>syscons.
>
>Does it any chance to be put in source tree? So my question is, is it
>worth.

I suggest you write a PR that includes the keymap and an appropriate
patch for /usr/share/syscons/keymaps/INDEX.keymaps as well as explaining
how it differs from the 9 existing Dvorak keymaps.

-- 
Peter Jeremy


Re: HEADS UP: Forth Optimizations

2012-11-11 Thread Peter Jeremy

On 2012-Nov-10 16:53:10 -0800, Devin Teske  wrote:
>Can someone help review this for the commit log?

I've had a look through the proposed patch and my comments follow.
Other than that, it looks good to me.

>Index: menu-commands.4th
>===
>--- menu-commands.4th  (revision 242835)
>+++ menu-commands.4th  (working copy)
...
>@@ -185,21 +240,21 @@ variable root_state
...
>   s" set kernel=${kernel_prefix}${kernel[N]}${kernel_suffix}"
>-\ command to assemble full kernel-path
>-  -rot tuck 36 + c! swap\ replace 'N' with array index value
>-  evaluate  \ sets $kernel to full kernel-path
>+  36 +c! \ replace 'N' with ASCII numeral
>+  evaluate

I think the "sets $kernel to full kernel-path" comment is worth keeping.

>   s" set root=${root_prefix}${root[N]}${root_suffix}"
>-\ command to assemble root image-path
>-  -rot tuck 30 + c! swap\ replace 'N' with array index value
>-  evaluate  \ sets $kernel to full kernel-path
>+  30 +c! \ replace 'N' with ASCII numeral
>+  evaluate

Likewise, this could do with a (corrected) comment that it sets $root
to the full path to root.

>Index: menu.4th
>===
>--- menu.4th   (revision 242835)
>+++ menu.4th   (working copy)
>@@ -184,18 +223,15 @@ create init_text8 255 allot
> 
>   \ base name of environment variable
>   loader_color? if
>-  s" ansi_caption[x]"
>+  dup ansi_caption[x]
>   else
>-  s" menu_caption[x]"
>+  dup menu_caption[x]
>   then

Could this be simplified to

=   dup
=   loader_color? if
=   ansi_caption[x]
=   else
=   menu_caption[x]
=   then

Or, at a higher level, should this whole block be pulled into a new
word (along with similar words for toggled_{ansi,text}[x] and
{ansi,menu}_caption[x][y])?

>@@ -227,36 +263,26 @@ create init_text8 255 allot
...
>   getenv dup -1 <> if
>   \ Assign toggled text to menu caption

Some comments on stack contents around here would make it somewhat
easier to follow what is going on.

>@@ -329,19 +340,18 @@ create init_text8 255 allot
...
>   \ This is highly unlikely to occur, but to make
>   \ sure that things move along smoothly, allocate
>   \ a temporary NULL string
> 
>+  drop ( getenv cruft )
>   s" "
>   then
>   then

Is this the memory leak?  If so, can I suggest that this be committed
separately since it is a simple change and is distinct from the other
changes you are proposing.

>@@ -357,14 +367,14 @@ create init_text8 255 allot
>   \ 
>   \ Let's perform what we need to with the above.
> 
>-  \ base name of menuitem caption var
>+  \ Assign array value text to menu caption
>+  4 pick

According to the documentation just above this hunk, there are only 4
items on the stack, so "4 pick" seems wrong, though it is consistent
with my understanding of the old code.  The "2 pick [char] 0" you
added earlier seems to similarly be out-by-one, though consistent.

>@@ -521,17 +528,20 @@ create init_text8 255 allot
> 
>   \ If this is the ACPI menu option, act accordingly.
>   dup menuacpi @ = if
>-  acpimenuitem ( -- C-Addr/U | -1 )
>+  dup acpimenuitem ( n -- n n c-addr/u | n n -1 )
>+  dup -1 <> if
>+  13 +c! ( n n c-addr/u -- n ) \ replace 'x'

I think the stack here should be ( n n c-addr/u -- n c-addr/u )

>@@ -950,100 +914,43 @@ create init_text8 255 allot
> 
>   49 \ Iterator start (loop range 49 to 56; ASCII '1' to '8')
>   begin
>-  \ Unset variables in-order of appearance in menu.4th(8)

Does the order matter?  I notice you've changed it.



Re: [head tinderbox] failure on arm/arm

2012-11-10 Thread Peter Jeremy

On 2012-Nov-10 09:16:32 +1100, Brett  wrote:
>Just an observation: a few years ago when I got sick of Linux's
>"headlong rush" development model, I subscribed to various BSD
>mailing lists to see what else was out there. I considered FreeBSD at
>the time - there was a neverending avalanche of "[head tinderbox]
>failure" messages.

The Project tries to avoid it but occasional build failures on the
development branch are very likely to occur.  As a new user, you
would be much better off starting with a release branch.

>This told me that I would be more likely to be running code written
>by people who knew what they were doing if I went with Open, Net, or
>DragonflyBSD.

I think that's being unfair.  Do Open, Net or DFly have an equivalent
to the tinderboxes that do automated test builds and report failures?
And, since you have replied to an ARM failure, DragonflyBSD would not
be an option since it doesn't support ARM.

-- 
Peter Jeremy


Re: FORTRAN vs. Fortran (was: November 5th is Clang-Day)

2012-11-03 Thread Peter Jeremy

On 2012-Nov-02 11:21:10 -0500, Brooks Davis  wrote:
>On Fri, Nov 02, 2012 at 10:21:19AM +, Anton Shterenlikht wrote:
>> It's a shame though that, with LLVM as the
>> default compiler, further development of
>> FreeBSD/ia64 and FreeBSD/sparc64
>> will probably suffer and then stop altogether.
>
>If you read either my annoucment or the diff closly you will note that
>the default it only changing for x86 architectures.

Even with all the best of intentions, once the x86 architectures (which
cover the bulk of the user and developer mass) migrate to a different
toolchain, the risk of bitrot in the GNU toolchain becomes non-negligible.
And once it breaks, there may not be the critical mass to repair it.
This is basically what happened to the Alpha.

-- 
Peter Jeremy


Re: memory warnings r240891 | dmesgg

2012-10-06 Thread Peter Jeremy

On 2012-Oct-04 23:51:09 +0400, Sergey Kandaurov  wrote:
>On 4 October 2012 20:18, Darrel  wrote:
>> warning: total configured swap (2621440 pages) exceeds maximum
>> recommended amount (1852656 pages).
...
>This is because kernel needs some memory to manage swap too.
>Currently for amd64 this roughly reduces to the following rule
>(My apologies in advance for the extra simplification):
>
>100MB RAM per 800MB swap space.

That is oversimplified to the point of being wrong.  As of HEAD
r239255 and 9-stable r240097, there's no longer a limit on amd64.  The
limit is still required on 32-bit architectures due to the limited KVA
available.

The actual KVA requirement (RAM is only allocated when the swap space
is actually used) is about 5MB KVA per 1GB swap.  The default swzone
for i386 was 32MiB - which is sufficient for ~7GB swap (the 1852656
pages reported above) and was increased to 34.5MB for i386 in r239730
to support ~8GB swap (this is also in r240097).  (It's all approximate
because of the way swap space is allocated using struct swblock).
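
For anyone wanting to check the page counts in the warning against
those figures, a small sketch in Python, assuming 4 KiB pages (as on
x86).  The helper name is invented for illustration; the inputs are
the two page counts from the warning and the 32MiB swzone mentioned
above.

```python
# Illustrative arithmetic only; assumes 4 KiB pages.
PAGE_SIZE = 4096
GIB = 1 << 30


def pages_to_gib(pages):
    """Convert a count of 4 KiB pages to GiB."""
    return pages * PAGE_SIZE / GIB


configured = pages_to_gib(2621440)   # "total configured swap"
recommended = pages_to_gib(1852656)  # "maximum recommended amount"

# KVA per GiB of swap implied by a 32 MiB swzone covering the
# recommended amount.
swzone_mib_per_gib = 32 / recommended

print(f"configured:  {configured:.2f} GiB")
print(f"recommended: {recommended:.2f} GiB")
print(f"swzone KVA:  {swzone_mib_per_gib:.2f} MiB per GiB of swap")
```

The implied figure comes out a little under the "about 5MB per 1GB"
above, which is as close as the approximations allow.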

See the thread starting
http://lists.freebsd.org/pipermail/freebsd-current/2012-August/035839.html
for more details.

-- 
Peter Jeremy


Re: sysctl kern.ipc.somaxconn limit 65535 why?

2012-10-04 Thread Peter Jeremy

On 2012-Oct-03 19:45:01 +0100, free...@chrysalisnet.org wrote:
>In addition we had to migrate all our mysql servers from freebsd to debian
>because they were hitting some arbitary OS limit but I could never figure
>out what, sys% usage went through the roof when this limit was hit, issue
>didnt occur on debian.

Did you report this issue on any of the FreeBSD mailing lists?
Reporting a problem doesn't guarantee that it will be fixed
(unfortunately) but not reporting a problem makes it extremely
unlikely that it will be fixed.

>  I feel recently freebsd is more focused on desktop's
>and as such developer's never develop for a heavy server usage scenario,

This isn't intentionally true but it's true that few developers run
large servers so they may not run into some issues that only impact
large systems.  Again, it's up to people who do run such systems to
provide feedback about bottlenecks & issues they hit so that they can
be fixed.

>I keep coming across hardcoded low limits.  As rightly pointed out default

There are lots of defaults that were set some time (potentially
decades) ago and may no longer be optimal.  It's unrealistic to expect
that all the defaults are correct in all circumstances and this is one
area where end users can help by flagging defaults that they find need
tuning.

>values now days are useless 128 for somaxconn? maybe ok for a desktop.

But, as others have pointed out, this isn't one of them.  Can you
please provide more details on a use scenario where a listen(2)
backlog exceeding 128 is reasonable?

>  I cant tell app developers to
>fix their apps to work on FreeBSD, they dont care, if it works fine on
>windows and linux then the app isnt broken as far as they are concerned.

FreeBSD is not Windows or Linux and never will be.  There are lots of
grey areas in the various standards that *BSD, Linux, Solaris, Windows
etc comply with and some OSs interpret these grey areas differently to
others (in some areas, it seems Linux has deliberately done things
differently to other Unices for no obvious reason, and the GNU
embrace-and-extend philosophy doesn't help).  Writing portable code
takes more than adding some .ac/.am files to an arbitrary blob of code
and just because a developer thinks their app isn't broken doesn't
make them right.

BTW, I note that this was sent to -current.  Are you running HEAD on
production servers?  If so, your feedback on issues you encounter
would be appreciated so that they can be corrected before they make
it into a RELEASE.

-- 
Peter Jeremy

Re: Shouldn't world be able to build without /usr/include?

2012-09-16 Thread Peter Jeremy

No.  The first stage of the buildworld is creating cross-tools - which
run on the existing world (and hence need its include files and libs).

-- 
Peter Jeremy



Re: pkgng suggestion: renaming /usr/sbin/pkg to /usr/sbin/pkg-bootstrap

2012-08-26 Thread Peter Jeremy

On 2012-Aug-26 12:27:41 -0700, Doug Barton  wrote:
>On 08/26/2012 12:08, Ian Lepore wrote:
>> Maybe it could rename itself to /usr/local/sbin/pkg-bootstrap as part of
>> replacing itself, so that you could re-bootstrap your way out of a
>> problem later.
>
>That's certainly creative thinking, but I'm still queasy about 2
>commands with the same name that do 2 different things. And having it
>rename itself adds to the confusion down the road.

I also like the idea of a pkg-bootstrap command.  Possibly a symlink
from pkg to pkg-bootstrap, that gets removed as part of the bootstrap
process, would help - but it should just tell you how to run
pkg-bootstrap.  I don't like the idea of pkg{-bootstrap} autonomously
installing something I didn't ask for.  And I don't like the idea that
all pkg commands get bounced through a /usr/sbin/pkg once it has been
bootstrapped.

>Having a simple pkg bootstrapping tool in the base is a good idea. But
>the functionality needs to be extremely limited so that we don't
>increase the security exposure; and so that we don't end up in a
>situation where a bug fix for something in the base limits our ability
>to innovate with pkg in the ports tree.

Agreed.  BTW, one thing that needs to be considered is how to recover
from the embedded public key needing to be invalidated (eg due to the
private key being exposed).

-- 
Peter Jeremy


Re: dhclient cause up/down cycle after 239356 ?

2012-08-22 Thread Peter Jeremy

On 2012-Aug-22 15:35:01 -0400, John Baldwin  wrote:
>Hmm.  Perhaps we could use a debouncer to ignore "short" link flaps?  Kind of
>gross (and OpenBSD doesn't do this).  For now this change basically ignores
>link up events if they occur with 5 seconds of the link down event.  The 5 is
>hardcoded which is kind of yuck.

I'm also a bit concerned about this for similar reasons to adrian@.
We need to distinguish between short link outages caused by (eg) a
switch admin reconfiguring the switch (which needs the lease to be
re-checked) and those caused by broken NICs which report link status
changes when they are touched.  Maybe an alternative is to just ignore
link flaps when they occur within a few seconds of a script_go().
(And/or make the ignore timeout configurable).
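
To make that alternative concrete, here is a rough sketch of the
logic in Python.  It is not dhclient code (which is C); the class and
method names are hypothetical and the 5-second default simply mirrors
the hardcoded value mentioned above.  The caller supplies the
timestamps, so nothing depends on the real clock.

```python
# Hypothetical sketch of "ignore link flaps shortly after the script runs".
# Not dhclient's implementation; all names are invented for illustration.
class FlapFilter:
    def __init__(self, ignore_window=5.0):
        # Seconds after a script run during which link changes are ignored.
        self.ignore_window = ignore_window
        self.last_script_time = None

    def script_ran(self, now):
        """Record that dhclient-script was just invoked at time `now`."""
        self.last_script_time = now

    def should_ignore(self, now):
        """True if a link state change at `now` is probably self-inflicted."""
        if self.last_script_time is None:
            return False
        elapsed = now - self.last_script_time
        return 0 <= elapsed <= self.ignore_window


flt = FlapFilter(ignore_window=5.0)
flt.script_ran(100.0)
print(flt.should_ignore(101.0))  # flap 1s after the script: ignored
print(flt.should_ignore(130.0))  # link change 30s later: acted upon
```

The window is a constructor argument rather than a constant, which is
the "configurable" part; a real implementation would still need to
decide what to do about a genuine outage that happens to fall inside
the window.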

Apart from fxp(4), does anyone know how many NICs are similarly
broken?

Does anyone know why this issue doesn't bite OpenBSD?  Does it have
a work-around to avoid resetting the link, does it not report link
status changes, or has no-one simply noticed the issue?

BTW to jhb: Can you check your mailer's list configuration.  You
appear to be adding  and leaving
 in the Cc list.

-- 
Peter Jeremy


Re: r239356: does it mean, that synchronous dhcp and dhcplcinet with disabled devd gone?

2012-08-21 Thread Peter Jeremy

On 2012-Aug-21 17:25:23 -0400, John Baldwin  wrote:
>Ok, this is what I came up with, somewhat loosely based on OpenBSD's dhclient.
>I tested that it survives the following:

I've also done some limited testing on both bge and fxp NICs and
haven't run into any problems.  In particular the spurious link resets
from fxp don't seem to cause any problems.

-- 
Peter Jeremy


Re: dhclient cause up/down cycle after 239356 ?

2012-08-21 Thread Peter Jeremy

On 2012-Aug-21 19:42:17 +0300, Vitalij Satanivskij  wrote:
>Look's like dhclient do down/up sequence -

Not intentionally.

>Aug 21 19:21:00 home kernel: fxp0: link state changed to UP
>Aug 21 19:21:01 home kernel: fxp0: link state changed to DOWN
>Aug 21 19:21:01 home dhclient: New IP Address (fxp0): xx.xx.xx.xx
>Aug 21 19:21:01 home dhclient: New Subnet Mask (fxp0): 255.255.255.0
>Aug 21 19:21:01 home dhclient: New Broadcast Address (fxp0): xx.xx.xx.xx
>Aug 21 19:21:01 home dhclient: New Routers (fxp0): xx.xx.xx.xx
>Aug 21 19:21:03 home kernel: fxp0: link state changed to UP

I can reproduce this behaviour - but only on fxp (i82559 in my case)
NICs.  My bge (BCM5750) and rl (RTL8139) NICs do not report the
spurious DOWN/UP.  (I don't normally run DHCP on any fxp interfaces,
so I didn't see it during my testing).

The problem appears to be the
  $IFCONFIG $interface inet alias 0.0.0.0 netmask 255.0.0.0 broadcast 255.255.255.255 up
executed by /sbin/dhclient-script during PREINIT.  This is making the
fxp NIC reset the link (actually, assigning _any_ IP address to an fxp
NIC causes it to reset the link).  The post r239356 dhclient detects
the link going down and exits.

>Before r239356 iface just doing down/up without dhclient exit and
>everything work fine.

For you, anyway.  Failing to detect link down causes problems for me
because my dhclient was not seeing my cable-modem resets and therefore
failing to reacquire a DHCP lease.

-- 
Peter Jeremy


Re: buildworld c++ internal error

2012-08-20 Thread Peter Jeremy

On 2012-Aug-20 07:17:59 +0900, Randy Bush  wrote:
>the only thing a night's sleep got me was the idea of attaching an
>external sata drive and putting swap on it.

You can also swap to a file via NFS.

-- 
Peter Jeremy



Re: Time to bump default VM_SWZONE_SIZE_MAX?

2012-08-13 Thread Peter Jeremy

On 2012-Aug-12 15:44:07 -0700, Colin Percival  wrote:
>If I'm understanding things correctly, the "maxswzone" value -- set by the
>kern.maxswzone loader tunable or to VM_SWZONE_SIZE_MAX by default -- should
>be approximately 9 MiB per GiB of swap space.

I'm not sure how you got that value.  By default, struct swblock is
288 bytes (280 bytes on 32-bit archs) and can store up to 32 pages of
swap (the comment in vm/swap_pager.c:swap_pager_swap_init() is wrong).
For x86, this is 2.25 MiB per GiB (best case).
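
As a rough check of that figure, here is a minimal sketch of the
arithmetic in Python.  It assumes 4 KiB pages (true for x86) and takes
the swblock size and pages-per-swblock from the numbers above; the
function name and the utilisation parameter are mine, purely for
illustration, and nothing here is taken from vm/swap_pager.c.

```python
# Illustrative arithmetic only; constants come from the text above.
PAGE_SIZE = 4096        # bytes, assuming x86 4 KiB pages
PAGES_PER_SWBLOCK = 32  # swap pages tracked by one struct swblock
MIB = 1 << 20
GIB = 1 << 30


def swzone_mib_per_gib(swblock_size=288, utilisation=1.0):
    """MiB of swblock storage needed per GiB of swap.

    utilisation is the fraction of each swblock's 32 slots actually
    in use (1.0 is the best case).
    """
    pages = GIB // PAGE_SIZE
    swblocks = pages / (PAGES_PER_SWBLOCK * utilisation)
    return swblocks * swblock_size / MIB


print(swzone_mib_per_gib())                  # 64-bit best case: 2.25
print(swzone_mib_per_gib(swblock_size=280))  # 32-bit best case
print(swzone_mib_per_gib(utilisation=0.9))   # 64-bit, swblocks 90% full
```

Lower utilisation pushes the per-GiB cost up in inverse proportion.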

>The current default for VM_SWZONE_SIZE_MAX was set in August 2002 to 32 MiB;
>meaning that anyone who wants to use more than ~ 3.5 GB of swap space ought
>to set kern.maxswzone in /boot/loader.conf.

In practice, you can't fully populate each swblock.  I did a test on
my amd64 box by running multiple copies of a program that allocates
and dirties a big chunk of RAM and then pause()s.  That gave me a 90%
swblock utilisation - which I suspect is higher than a typical
scenario where memory pressure pushes more randomly unused pages out.

Realistically, I'd say that the default VM_SWZONE_SIZE_MAX can handle
about 9GB swap (at least, that was my experience).

BTW, if you plan on allocating lots of swap, be aware that each swap
device is limited to 32GiB - see vm/swap_pager.c:swaponsomething().

-- 
Peter Jeremy


Re: [HEADSUP & CFT] pkg 1.0rc1 and schedule

2012-07-16 Thread Peter Jeremy

On 2012-Jul-16 07:18:05 +0100, Matthew Seaman  wrote:
>No.  Parallel installs will not work -- the first to start will lock the
>DB, and the second won't be able to proceed.

Good - it was the locking I was mostly concerned about.  As long as
the install is locked, it's safe to run multiple port installs on
different terminals without them treading on each other.  (Next step,
outside pkgng, is to allow parallel builds).

Thank you for all the answers.

-- 
Peter Jeremy


Re: [HEADSUP & CFT] pkg 1.0rc1 and schedule

2012-07-15 Thread Peter Jeremy

On 2012-Jul-12 10:01:10 +, Baptiste Daroussin  wrote:
>What is pkg
>---
>pkg is a new package manager for FreeBSD. It is designed as a replacement for
>the pkg_* tools, and as a full featured binary package manager.

A couple of specific questions that I haven't seen answered during
this thread or in the wiki:
- Can pkgng cope with parallel installs?  What happens if I
  simultaneously (attempt to) install conflicting packages?
- If I use "pkg delete -f", what happens to packages that depended
  on the forcibly-deleted package?
- What happens if I delete a package where I've modified one of the
  files managed by the package?
- What facilities does it have for auditing and repairing the package
  database? (ie checking for inconsistencies between installed files
  and the content of the package database)
- How does it handle the situation where I install a package that
  depends on foo version 1.2.3 but have foo version 1.2.4 (or 1.2.2)
  installed?  What about if I have bar version 1.3, which is ABI-
  compatible with foo version 1.2.3, installed?
- Will it detect that a package install would overwrite an existing
  file?  What does it do in this case?
- I gather it handles "update package" more intelligently than
  "uninstall old package, install new package".  Will it avoid
  replacing an old file with an identical one in the new package?
  If so, what happens to the file metadata (particularly uid, gid
  and mtime)?
- Can it track user-edited configuration files that are associated
  with packages?
- Can it do 2- or 3-way merges of package configuration files?
- The README states "Directory leftovers are automatically removed if
  they are not in the MTREE."  How does this work for directories
  that are shared between multiple packages?  Does this mean that if
  I add a file to a directory that was created by a package, that
  file will be deleted automatically if I delete the package?

-- 
Peter Jeremy



Re: Use of C99 extra long double math functions after r236148

2012-07-13 Thread Peter Jeremy

On 2012-Jul-13 11:58:05 -0400, David Schultz  wrote:
>I propose we set a timeframe for this, on the order of a few months.
...
>If the schedule can't be met, then we can just import Cephes as an
>interim solution without further ado.  This provides Bruce and Steve
>an opportunity to commit what they have been working on, without
>forcing the rest of the FreeBSD community to wait indefinitely for
>the pie in the sky.

This sounds good to me as well and I'd be happy to help.

-- 
Peter Jeremy


pgpmY7CNvs676.pgp
Description: PGP signature

Re: Use of C99 extra long double math functions after r236148

2012-07-13 Thread Peter Jeremy

On 2012-Jul-11 15:32:47 -0700, Steve Kargl  
wrote:
>I know an approach to implementing many of the missing
>functions.

Are you willing to share this insight so someone else could do the work?

>  When I do find
>some free time, I look at what is missing and start to
>put together a new function.  At the moment, it seems
>that it takes 3+ years to get a new function written,
>tested, and committed.

And, from what I can see, much of this is done quietly - which opens
up the possibility that two people might both implement the same code
or that people will avoid the area in fear of treading on someone
else's toes.  As I said previously, I believe the existing wiki page
could be improved to form a central co-ordinating point to show what
activity is (or isn't) occurring.

>but most people seem to push the "easy button" and want
>to grab either cephes or netlib's libm.  There are
>technical issues with this approach that I won't 
>rehash again.

Doing it properly requires significant effort by people with fairly
specialised skills.  Whilst the project has several people with the
skills, it appears that none of them currently have the time.  In the
meantime, FreeBSD is taking free kicks from other FOSS groups that
have gone down the quick-and-dirty path.

AFAIK, none of the relevant standards (POSIX, IEEE754) have any
precision requirements for functions other than +-*/ and sqrt() - all
of which we have correctly implemented.  I therefore believe that, for
the remaining missing functions, the Project would be best served by
committing the best code that is currently available under a suitable
license and cleaning it up over time (as was done for the current
libm).

-- 
Peter Jeremy

pgpPVXxJTjV0R.pgp
Description: PGP signature

Re: Adding support for WC (write-combining) memory to bus_dma

2012-07-12 Thread Peter Jeremy

On 2012-Jul-12 10:40:27 -0400, John Baldwin  wrote:
>contigmalloc().  In fact, even better is to call kmem_alloc_contig() directly
>rather than using contigmalloc().
...
>Peter, this is somewhat orthognal (but related) to your bus_dma patch which is
>what prompted me to post this.

Overall, the change seems good to me.  My sole thought on the API was
whether the actual attribute should be passed, rather than having a
couple of new BUS_DMA_ flags but you've addressed that in a followup.

One change is that previously allocated memory was all charged to
M_DEVBUF via the malloc_type_allocated() call in contigmalloc()
whereas now only small allocations are counted.  This would seem to
indicate that large bus_dmamem_alloc() allocations won't be visible in
(eg) "vmstat -m".

-- 
Peter Jeremy

pgpZoejmmJeAW.pgp
Description: PGP signature

Re: Use of C99 extra long double math functions after r236148

2012-07-10 Thread Peter Jeremy

On 2012-Jul-08 19:01:07 -0700, Steve Kargl  
wrote:
>Well, on the most popular hardware (that being i386/amd64),
>ld80 will use hardware fp instruction while ld128 must be
>done completely in software.  The speed difference is
>significant.

AFAIK, of the architectures that FreeBSD supports, only sparc64
defines ld128 in the architecture and I don't believe there are any
SPARC chip implementations that implement ld128 math in hardware.

For that matter, I don't believe anything except x86 provides full
IEEE FP support in hardware - most architectures require software
assistance for subnormals and some corner cases.  If your application
happens to hit those cases often, performance will also suffer.

On 2012-Jul-08 20:05:04 -0700, Steve Kargl  
wrote:
>AFAIK, neither gcc in base nor clang would be c99 complaint
>even if all of the c99 math functions were available.

That sort of argument can easily get circular.  Let's get the C99 bits
of libm out of the way and then we can have another bikeshed about the
shortcomings of the compiler(s).

On 2012-Jul-08 19:56:52 -0400, David Schultz  wrote:
>Yes, Bruce has ld128 versions, and clusteradm very kindly got us a
>sparc64 machine to test on.  That was about the time I ran out of time
>to keep working on it.  If someone wants to pick it up, that would be
>great.

I have access to a couple of SPARC systems as well and would be willing
to help work on the missing bits.

On 2012-Jul-10 18:58:01 -0400, David Schultz  wrote:
>On Tue, Jul 10, 2012, Rainer Hurling wrote:
>> powl:   src/extra/trio/triostr.c
>> src/extra/trio/trio.c
>> src/main/format.c
>
>It's hard to do a good job on powl(), but the simple approach
>(exp(log(x)*y)) plus a few special cases may suffice for many uses.

A simplistic exp(log(x)*y) throws away 15 bits of precision (size of
the FP exponent field).  cephes has a powl() that appears to do better
or, alternatively, it shouldn't be too difficult to extend the approach
used by __ieee754_pow() using long doubles.
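
The precision loss from the naive identity is easy to see experimentally.
The Python sketch below is only an illustration under stated assumptions:
it uses IEEE doubles (11-bit exponent field) rather than long doubles, and
the base 3 and the exponent range are arbitrary choices, not taken from
cephes or msun.  It compares exp(log(x)*y) and the platform pow() against
exact integer arithmetic.

```python
import math
from fractions import Fraction


def rel_error(approx, exact):
    """Relative error of a float against an exact rational value."""
    return float(abs(Fraction(approx) - exact) / exact)


def naive_pow(x, y):
    """pow() via the identity x**y == exp(y * log(x))."""
    return math.exp(y * math.log(x))


def worst_errors(base=3, exponents=range(100, 600)):
    """Return (worst naive error, worst math.pow error) over the exponents."""
    worst_naive = worst_libm = 0.0
    x = float(base)
    for n in exponents:
        exact = Fraction(base) ** n  # exact integer result
        worst_naive = max(worst_naive, rel_error(naive_pow(x, float(n)), exact))
        worst_libm = max(worst_libm, rel_error(math.pow(x, float(n)), exact))
    return worst_naive, worst_libm


if __name__ == "__main__":
    naive, libm = worst_errors()
    print(f"worst relative error, exp(log(x)*y): {naive:.3e}")
    print(f"worst relative error, pow(x, y):     {libm:.3e}")
```

The rounding error in y*log(x) is an absolute error in the argument of
exp(), so it becomes a relative error in the result that grows with the
magnitude of y*log(x).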

>> BTW: There seems to be a discrepancy about missing functions listed in
>> http://wiki.freebsd.org/MissingMathStuff and in
>> http://svnweb.freebsd.org/base/head/lib/msun/src/math.h?r1=227472&r2=236148&pathrev=236148.
>> So the wiki is a bit outdated now?
>My list:
[elided]

I was thinking that a wiki page would be a good spot to co-ordinate
the work (as well as making it clear what is still to be done).  The
existing page needs some TLC to be useful.

-- 
Peter Jeremy

pgpJMDQgZRF8K.pgp
Description: PGP signature

RAM fragmentation problems

2012-07-05 Thread Peter Jeremy

I am running into a problem with RAM fragmentation causing contigmalloc()
failures and wonder if anyone has a tool that would allow me to
identify the owner(s) of pages of RAM within a region on amd64.

-- 
Peter Jeremy


pgpJ5bQo0Tiwa.pgp
Description: PGP signature

Re: Add new syscons font to FreeBSD current release

2012-06-21 Thread Peter Jeremy

On 2012-Jun-20 17:38:36 +0430, Mohammad Shafiee  
wrote:
>I've made a Persian font for FreeBSD syscons.
>You can download the font from here:
>http://sourceforge.net/projects/bsdpersiancons/
>
>How can I add this font to FreeBSD current release?

As a first step, I'd create a port for it.  See
http://www.freebsd.org/doc/en/books/porters-handbook/

-- 
Peter Jeremy


pgprd7bzEzHR2.pgp
Description: PGP signature

Re: Use of C99 extra long double math functions after r236148

2012-06-01 Thread Peter Jeremy

On 2012-Jun-01 10:29:13 -0400, John Baldwin  wrote:
>On Friday, June 01, 2012 1:55:10 am Eitan Adler wrote:
>> Also, are there BSD licensed naive implementations of these functions
>> we can use? Would it be okay to has slow, but accurate versions of
>> these functions as a stopgap?
>
>Peter Jeremy more or less has a stopgap already ready judging by the comments 
>in the thread thus far.

There's probably an hour's work by either stephen@ or myself to adapt
the work I did on cephes in Sage to a standalone FreeBSD port.
Unfortunately, both stephen@ & I are currently otherwise occupied and
other comments in this thread suggest that the inclusion of such a port
would be strongly opposed.

Note that cephes isn't "slow but accurate" - it's reasonably fast but
naive and therefore dodgy in edge cases.

-- 
Peter Jeremy

pgpHAsPC0mWbI.pgp
Description: PGP signature

Re: OptionalObsoleteFiles.inc completeness

2012-06-01 Thread Peter Jeremy

On 2012-Jun-01 20:50:24 +0200, Ulrich Spörlein  wrote:
>Why is xargs even calling /bin/echo when "utility" is not specified.

Because that's what it's documented as doing.

>Shouldn't it just print a certain number of arguments (one in this
>case)?

The current approach is simpler - there's always "utility" and it
defaults to "/bin/echo".  Therefore xargs can just always fork/exec.
I agree that special-casing the default to have xargs print the
relevant number of arguments would be more efficient.

-- 
Peter Jeremy

pgpjWzNyZgd8T.pgp
Description: PGP signature

Re: OptionalObsoleteFiles.inc completeness

2012-05-31 Thread Peter Jeremy

On 2012-May-30 13:27:03 +1000, Peter Jeremy  wrote:
>On 2012-May-29 02:18:25 +0400, Dmitry Marakasov  wrote:
>>Then you should try to profile it - my script basically runs
>>delete-old delete-old-libs for every knob (131 of them), and it
>>hadn't taken more than 4 seconds even once.
>
>I've done some investigating and the problem is that "xargs -n1"
>fork()/exec()s /bin/echo on each file (and there are 5538 files for
>me).  Changing this to "tr ' ' '\n'" reduces "make delete-old" runtime
>to 1.75s - which is much nicer.  I've checked a variety of other
>systems running 8.x & 9.x and the 97s seems to be anomalously long so
>I'll do some more investigating.

I've tracked the problem down to excessive VM faults caused by
jemalloc.  Whilst executing /bin/echo, jemalloc mmap()s two 4MiB
chunks of memory.  Unless you build with MALLOC_PRODUCTION (which I
hadn't), it then proceeds to verify that both blocks are zero-filled.
This causes 2048 (unnecessary) page faults (out of a total of 2133).
When I rebuilt jemalloc with MALLOC_PRODUCTION, this dropped to 87
page faults (cf 76 on 8.x and 62 on 9.x) and the elapsed time for
"make delete-old" dropped to slightly more than 8.x & 9.x.

"xargs -n1" is probably a worst case scenario for jemalloc but this
probably similarly affects other short-lived processes (and the shell
scripts that invoke them).  It's a pity that this particular test is a
compile-time option.
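
The page-fault figures quoted above are consistent with simple
arithmetic.  A small Python check, assuming the usual 4 KiB base page
size on amd64 (the page size is my assumption, it is not stated above):

```python
PAGE_SIZE = 4 * 1024          # assumed: 4 KiB base pages on amd64
CHUNK_SIZE = 4 * 1024 * 1024  # 4 MiB chunk, as described above
CHUNKS = 2                    # jemalloc mmap()s two chunks

# Touching every page of both chunks to verify that they are zero-filled
# costs one fault per page.
faults_from_check = CHUNKS * CHUNK_SIZE // PAGE_SIZE
total_faults = 2133           # measured total quoted above

print(faults_from_check)                 # 2048
print(total_faults - faults_from_check)  # 85
```

The remainder of 85 is close to the 87 faults measured after rebuilding
with MALLOC_PRODUCTION.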

I still think that saving 5500 fork()/exec() pairs is a good reason
to switch from "xargs -n1" to "tr ' ' '\n'".
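
To make the difference concrete, here is a rough Python sketch of the
two approaches.  It is not the Makefile code: the function names and the
file names are hypothetical, and the only point is that one variant
starts a child process per word while the other starts none.

```python
import subprocess


def split_like_tr(text):
    r"""In-process equivalent of tr ' ' '\n': no child processes.

    Assumes single spaces between words and no trailing newline.
    """
    return text.replace(" ", "\n") + "\n"


def split_like_xargs_n1(text, utility="/bin/echo"):
    """Equivalent of xargs -n1: one fork()/exec() of the utility per word."""
    out = []
    for word in text.split():
        result = subprocess.run([utility, word], capture_output=True,
                                text=True, check=True)
        out.append(result.stdout)
    return "".join(out)


if __name__ == "__main__":
    # Hypothetical file list; the real one had 5538 entries.
    files = "usr/bin/foo usr/lib/libbar.so.1 usr/share/man/man1/foo.1.gz"
    print(split_like_tr(files) == split_like_xargs_n1(files))
```

Both produce the same one-name-per-line output, so the per-word process
creation buys nothing here.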

-- 
Peter Jeremy

pgp66hvYrS7pF.pgp
Description: PGP signature

Re: OptionalObsoleteFiles.inc completeness

2012-05-29 Thread Peter Jeremy

On 2012-May-29 02:18:25 +0400, Dmitry Marakasov  wrote:
>* Peter Jeremy (pe...@rulingia.com) wrote:
>> My experience is that it now takes about 2½ minutes on 10.x with warm
>> caches, compared to less than 1 second on 8.x.
>
>Now = after applying my patch or after changing system? Which knobs
>were enabled?

"Now" as in -current as against 8.x.  But, that 2½ mins was wrong,
sorry.  I recalled "150s" but actually checking, it's really 1:50
(110s).  It occurred to me that was an oldish -current (r235127) so I
updated to r236183 and the time dropped to 107s.  Since this is an
oldish P4, I tried a UP kernel and that reduced it to 96s.  Your patch
made no noticeable change (ministat reported no difference with 95%
confidence).

The system is amd64 with no MK_* knobs defined.

>Then you should try to profile it - my script basically runs
>delete-old delete-old-libs for every knob (131 of them), and it
>hadn't taken more than 4 seconds even once.

I've done some investigating and the problem is that "xargs -n1"
fork()/exec()s /bin/echo on each file (and there are 5538 files for
me).  Changing this to "tr ' ' '\n'" reduces "make delete-old" runtime
to 1.75s - which is much nicer.  I've checked a variety of other
systems running 8.x & 9.x and the 97s seems to be anomalously long so
I'll do some more investigating.

-- 
Peter Jeremy

pgp23vtZvpadf.pgp
Description: PGP signature

Re: Use of C99 extra long double math functions after r236148

2012-05-28 Thread Peter Jeremy

On 2012-May-28 15:54:06 -0700, Steve Kargl  
wrote:
>Given that cephes was written years before C99 was even
>conceived, I suspect all functions are sub-standard.

Well, most of cephes was written before C99.  The C99 parts of
cephes were written to turn it into a complete C99 implementation.

>  For
>example, AFAIK, none of the long double functions are
>appropriate for any platform that has an 128-bit long double;
>as cephes was written for an Intel 80-bit format.

FreeBSD currently supports:
64-bit long doubles on ARM, MIPS and PowerPC;
80-bit long doubles on amd64, i386 and iA64;
128-bit long doubles on SPARC.

The lack of LD128 in cephes therefore only affects one (not widely
used) platform.  The lack of even de facto standards for long
double means that any applications wanting to use them already need
to cope with at least a 2:1 precision range.
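
As a rough illustration of that range, reading "precision" as
significand width (my interpretation) and assuming the 64-bit case is a
plain IEEE double, the 80-bit case the x87 extended format and the
128-bit case IEEE binary128:

```python
import math

# Significand widths in bits, counting the leading bit.
FORMATS = {
    "64-bit (IEEE double)": 53,
    "80-bit (x87 extended)": 64,
    "128-bit (IEEE binary128)": 113,
}


def decimal_digits(p):
    """Decimal digits guaranteed to survive: floor((p - 1) * log10(2))."""
    return math.floor((p - 1) * math.log10(2))


for name, bits in FORMATS.items():
    print(f"{name}: {bits} bits, about {decimal_digits(bits)} decimal digits")

print(f"ratio of widest to narrowest: {113 / 53:.2f}")
```

That is 15, 18 and 33 decimal digits respectively, with the widest
significand a little over twice the narrowest.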

>If portmgr or a port maintainer wants to use a library with
>untested implementations of missing libm functions, please do
>not put it into /usr/local/lib and call it libm.

There is some test code in cephes.  Can you point me to a suitable test
suite for LD80 and LD128?  The reason for calling it libm is to avoid
having to hack every consumer to add an additional library.

On 2012-May-28 16:30:35 -0700, Steve Kargl  
wrote:
>Who's writing the code to test the implementations?  That is
>better much the problem.  Without testing, one might get an
>implementation that appears to work until it doesn't!

That is equally true of the rest of FreeBSD.  The list of open PRs
suggests that FreeBSD still has a fair way to go before reaching
perfection.  And, most of this thread has been about using this code
in ports - where the bar is much lower.  Who is writing the code to
test all the other ports?  What is so special about this particular
proposed port that it needs to come with solid-gold credentials?

>  It took
>me 3+ years to get sqrtl() into libm, but bde and das (and
>myself) wanted to make sure the code worked.

Last time I checked (a couple of years ago), FreeBSD was missing 65
C99 libm functions.  At 3 years per function, we should have C99
support available early in the 23rd century - which may be a bit late.

On 2012-May-28 22:03:43 -0500, Stephen Montgomery-Smith  
wrote:
>1.  By being so picky about being so precise, FreeBSD is behind the time 
>line in rolling out a usable set of C99 functions.

And at the current rate, we'll all be long dead before they are
available.  Whilst I'd far prefer to have a properly verified library
function, I think we are better off with an implementation that has
some caveats regarding edge-case behaviour than having nothing.

>In the end, I do think it is good to ultimately settle on good C99 
>compliant code.  But having something intermediate that mostly works is 
>better than nothing.  Especially if it exists only in the ports, and not 
>in the base code.

I agree with this sentiment.

What do people do on other free OSs?  Does a tested open source C99
libm exist anywhere?  glibc implements cpow(x,y) as cexp(y*clog(x))
and cephes does better than that.  Is FreeBSD wasting its time writing
"correct" C99 code because all the libm consumers expect no better
than what glibc offers?

I agree that writing correct libm functions is hard.  I think a lot of
the problem is that it's a mix of lots of boilerplate code testing for
special conditions and edge cases that is boring to write and fiddly
to get right, together with a kernel that is a pile of polynomial
evaluations full of magic numbers that needs specialist skills to
write.  If we could get someone with the relevant skills to formally
list all the special conditions & edge cases for each function, it
should be possible to generate both the library C code and test cases
from that - which would remove a lot of the tedium.
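
A minimal sketch of the test-case half of that idea, in Python rather
than C.  The table is a small illustrative subset of pow() special cases
in the style of C99 Annex F, not a complete or authoritative list, and
the helper names are hypothetical.  It is exercised here against
Python's math.pow() simply because that is readily available.

```python
import math

INF, NAN = math.inf, math.nan

# (x, y, expected) triples; an illustrative subset only.
POW_SPECIAL_CASES = [
    (1.0, NAN, 1.0),     # pow(1, y) is 1 even for NaN y
    (NAN, 0.0, 1.0),     # pow(x, 0) is 1 even for NaN x
    (NAN, 1.0, NAN),
    (INF, 2.0, INF),
    (-INF, 3.0, -INF),   # odd integer y keeps the sign
    (-INF, -3.0, -0.0),  # ... including the sign of zero
    (2.0, -INF, 0.0),
    (0.5, -INF, INF),
    (-1.0, INF, 1.0),
]


def same_value(a, b):
    """Equality that treats NaN as equal to NaN and -0.0 as unequal to 0.0."""
    if math.isnan(a) or math.isnan(b):
        return math.isnan(a) and math.isnan(b)
    return a == b and math.copysign(1.0, a) == math.copysign(1.0, b)


def check(func, cases):
    """Return (x, y, expected, got) for every case that func gets wrong."""
    return [(x, y, want, func(x, y)) for x, y, want in cases
            if not same_value(func(x, y), want)]


if __name__ == "__main__":
    for failure in check(math.pow, POW_SPECIAL_CASES):
        print("FAIL: pow(%r, %r): expected %r, got %r" % failure)
```

The same table could in principle drive generation of the boilerplate
special-case tests at the top of the C implementation, leaving only the
polynomial kernel to be written by hand.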

-- 
Peter Jeremy

pgpUnZGDcc79l.pgp
Description: PGP signature

Re: Use of C99 extra long double math functions after r236148

2012-05-28 Thread Peter Jeremy

On 2012-May-28 13:31:59 -0700, Steve Kargl  
wrote:
>On Mon, May 28, 2012 at 11:01:24AM -0500, Stephen Montgomery-Smith wrote:
>> One thing that could be done is to have a "math/cephes" port that adds 
>> the extra C99 math functions.  This is already done in the math/sage 
>> port, using a rather clever patch due to Peter Jeremy, that applies to 
>> the cephes code.
...
>This is a horrible, horrible, horrible idea.  Have you
>looked at the cephes code, particularly the complex.h
>functions?

The cephes code is somewhat a mess layout-wise.  Algorithmically,
it seems somewhat variable - some functions are implemented (hopefully
correctly) using semi-numerical techniques, whereas others just use
mathematical identities which will result in precision loss - though
most of the functions include accuracy information.

I agree it would be far preferable to have a properly validated C99
libm with all functions having maximum errors of no more than a few
LSB over their complete domain, as well as correct support for signed
zeroes, infinities and signalling and non-signalling NaNs but that is
a non-trivial undertaking.

In the interim, how should FreeBSD handle apps that want a C99 libm?
1) Fail to build them
2) Provide possibly imperfect fallbacks for the unimplemented bits.

If someone (I don't have the expertise) wants to identify the cephes
functions that are sub-standard, we can include link-time warnings
(as done for eg gets(3)) when they are used.

-- 
Peter Jeremy

pgpcG5SKNkFm9.pgp
Description: PGP signature

Re: Use of C99 extra long double math functions after r236148

2012-05-28 Thread Peter Jeremy

On 2012-May-28 11:01:24 -0500, Stephen Montgomery-Smith  
wrote:
>One thing that could be done is to have a "math/cephes" port that adds 
>the extra C99 math functions.  This is already done in the math/sage 
>port, using a rather clever patch due to Peter Jeremy, that applies to 
>the cephes code.
>
>What it would do is to create a /usr/local/lib/libm.so that would 
>provide the extra functions not currently included in /lib/libm.so, and 
>then link in /lib/libm.so as well.  It would also create its own 
>/usr/local/include/math.h and /usr/local/include/complex.h as well.

Basically, as long as the compiler searches /usr/local/{include,lib}
before the base include/lib then <math.h>, <complex.h> and -lm give
the application a complete C99 math implementation by using base
functions where they exist and cephes functions where they don't.

The patch I wrote for sage can be found at
http://trac.sagemath.org/sage_trac/ticket/9543
If there's any interest, I could produce a port for this.

Another option would be to import cephes into base and use it to
provide the missing C99 functions.  Cephes includes copyright notices
but the closest I can find to a license is:
"   Some software in this archive may be from the book _Methods and
 Programs for Mathematical Functions_ (Prentice-Hall or Simon & Schuster
 International, 1989) or from the Cephes Mathematical Library, a
 commercial product. In either event, it is copyrighted by the author.
 What you see here may be used freely but it comes with no support or
 guarantee."

-- 
Peter Jeremy

pgpYmCz2gMd3i.pgp
Description: PGP signature

Re: OptionalObsoleteFiles.inc completeness

2012-05-28 Thread Peter Jeremy

On 2012-May-28 23:55:42 +0400, Dmitry Marakasov  wrote:
>* Peter Jeremy (pe...@rulingia.com) wrote:
>
>> >2) Is this ok to backport the list from current to stable branches? Pro
>> >- it's really simple, con - it will contain files never installed with
>> >this (old) branch.
>> 
>> Another con:  "make delete-old" on -current takes about 2 orders of
>> magnitude longer to run than on 8.x.  I would prefer to see some
>> effort put into speeding it up before it was backported.
>
>Is that really a reason while it is still under 4 seconds and is not
>usually run more often than updates (which take minutes if not hours)?

My experience is that it now takes about 2½ minutes on 10.x with warm
caches, compared to less than 1 second on 8.x.  For most of that time,
there's no output and there's no warning of the increased time.  I
actually wrote about the poor performance here a couple of weeks ago.

-- 
Peter Jeremy


pgpj1hAqZ4ktC.pgp
Description: PGP signature

Re: OptionalObsoleteFiles.inc completeness

2012-05-28 Thread Peter Jeremy

On 2012-May-27 18:05:41 +0400, Dmitry Marakasov  wrote:
>2) Is this ok to backport the list from current to stable branches? Pro
>- it's really simple, con - it will contain files never installed with
>this (old) branch.

Another con:  "make delete-old" on -current takes about 2 orders of
magnitude longer to run than on 8.x.  I would prefer to see some
effort put into speeding it up before it was backported.

-- 
Peter Jeremy

pgptJtyQZ4Lv8.pgp
Description: PGP signature

Re: UFS+J panics on HEAD

2012-05-24 Thread Peter Jeremy

On 2012-May-24 12:04:21 +0400, Lev Serebryakov  wrote:
>  I afraid, that after real hardware failure (like real HDD death,
>not these pseudo-broken-hardware situations, when HDDs is perfectly
>alive and in good condition), all data will be lost. I could restore
>data from remains of FFS by hands (format is straightforward and
>well-known), but ZFS is different story...

If your disk dies then you need a redundant copy of your data - either
via backups or via RAID.  Normally, you'd run ZFS with some level of
redundancy so that disk failures did not result in data loss.  That
said, ZFS is touchier about data - if it can't verify the checksums in
your data, it will refuse to give it to you - whereas UFS will hand
you back a pile of bytes that may or may not be the same as what you gave it
to store.  And you can't necessarily get _any_ data off a failed disk.

> Yes, backups is solution, but I don't have money to buy (reliable)
>hardware to backup 4Tb of data :(

4TB disks are available but not really economical at present.  2TB
disks still seem to be the happy medium.  If your data will compress
down to 2TB then save it to a disk, otherwise split your backups
across a pair of disks.  A 2TB disk with enclosure is <

>I attended "Solaris internals" 5-days training four years ago (when I
>worked for Sun Microsystems), and instructor says same words...

I have had lots of problems at $work with Solaris UFS quietly
corrupting data following crashes.  At least with ZFS, you have a
better chance of knowing when your data has been corrupted.

-- 
Peter Jeremy

pgpk4t2qrNnV7.pgp
Description: PGP signature

Re: "make delete-old" performance.

2012-05-16 Thread Peter Jeremy

On 2012-May-16 18:11:32 -0700, Devin Teske  wrote:
>Right now, I believe the most useful comparison between systems is
>(assuming UFS is in play) the output of "tunefs -p" for the
>filesystem that the slowness is appearing on.

These systems all run ZFS and apart from the first run, there doesn't
seem to be any disk activity at all.  It looks like the kernel is the
bottleneck.

>SoftUpdates (and whether it's enabled or disabled) can play a huge
>difference in how fast file-deletions are.

I've already successfully run "make delete-old" so there are no actual
file deletions.  This is all just looking for files that aren't present.

-- 
Peter Jeremy

pgpI6smYwen8A.pgp
Description: PGP signature
