Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-18 Thread Mark Millard
On Apr 18, 2023, at 15:02, José Pérez  wrote:

> On 2023-04-18 21:37, Mark Millard wrote:
>>> In this case it does because the value is "active". If it's "enabled"
>>> you do not need to do anything.
>> Well, if block_cloning is disabled it would not become active.
> [...]
>> So, in progressing past the vintage that corrupts zfs data,
>> one could end up with block_cloning enabled in the process.
> 
> You still have to willingly issue the command
> zpool upgrade 
> so you might not just end up with the feature enabled by running this
> or that kernel; that's why I suggested step 0: verify whether you are in
> the worst case scenario before you begin.

I was not really worried about the no-zpool-upgrade/disabled
case. I was worried about "enabled" vs "active" as the
transition enabled -> active is automatic based on activity.

But there is overall disabled vs. enabled vs. active for the
block_cloning feature so I mentioned all 3.


>>> Boot in single user mode and check if your pool has block cloning in
>>> use:
>>> # zpool get feature@block_cloning zroot
>>> NAME   PROPERTY   VALUE  SOURCE
>>> zroot  feature@block_cloning  active local
>>> In this case it does because the value is "active". If it's "enabled"
>>> you do not need to do anything.
> 
> If you did not upgrade the pool, the feature would just not be there and
> the pool is sane (*).

"Not being there" vs. "disabled" has some context to it. I worded
things based on the way my context presents them.

My example context has:

# zpool get all zroot | grep compat
zroot  compatibility  openzfs-2.1-freebsd  local

which explains the particular list of disabled features
reported below. (It is a "never had zpool upgrade" context
as well.)

# zpool get all zroot | grep disabled
zroot  feature@edonr  disabled   local
zroot  feature@zilsaxattr  disabled   local
zroot  feature@head_errlog  disabled   local
zroot  feature@blake3  disabled   local
zroot  feature@block_cloning  disabled   local

so "not be there" seems to mean "disabled", as zpool presents
things based on compatibility. Just to show the command you
listed in full, but in my type of context:

# zpool get feature@block_cloning zroot
NAME   PROPERTY   VALUE  SOURCE
zroot  feature@block_cloning  disabled   local

# zpool version
zfs-2.1.99-FreeBSD_g431083f75
zfs-kmod-2.1.99-FreeBSD_g431083f75

(Those are software versions, not properties of
specific pools.)

I'll note that I see:

# zpool get feature@JUNKNAME zroot
# 

So features that the software does not have in its
list of possibilities get an empty result.

> unaffected_machine# zpool get feature@block_cloning zroot
> unaffected_machine#

That is the same sort of output as in my feature@JUNKNAME
test above. It is not clear from what is presented that
the context had block_cloning in its list of possibilities.

In my normal environment (that still predates the import
of the openzfs update), I get the same sort of result
for feature@block_cloning as you show above.

> As said, if the feature has been enabled but no calls to
> copy_file_range() occurred, the pool is also sane.

At the time, yes, but more activity can change the status
because copy_file_range() could be called. So I expect that
the following step is relevant to avoid ending up with
block_cloning becoming active:

QUOTE
When in single user mode set compression property to "off" on any zfs 
active dataset that has compression other than "off" and the sync 
property to something other than "disabled".
END QUOTE
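For concreteness, that step could be scripted along these lines. This is only a sketch of my own: emit_disable_cmds is a made-up name, and the sample input below stands in for real "zfs get -H -o name,value compression" output.

```shell
#!/bin/sh
# Hypothetical helper: read "dataset<TAB>compression" lines (the format of
# `zfs get -H -o name,value compression`) and print the commands one would
# run in single user mode.  Datasets already at compression=off are skipped.
emit_disable_cmds() {
    tab=$(printf '\t')
    while IFS="$tab" read -r ds comp; do
        [ "$comp" = "off" ] && continue
        printf 'zfs set compression=off %s\n' "$ds"
        # "something other than disabled": standard is the zfs default.
        printf 'zfs set sync=standard %s\n' "$ds"
    done
}

# Sample input standing in for `zfs get -H -o name,value compression`:
printf 'zroot\toff\nzroot/usr\tlz4\n' | emit_disable_cmds
# -> zfs set compression=off zroot/usr
# -> zfs set sync=standard zroot/usr
```

One would review the printed commands before running them, and note what each dataset had before, so the properties can be restored later.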

> To summarize:
> no feature -> sane
> feature "enabled" -> sane
> feature "active" -> might not be sane
> 
> BR,
> 
> (*) as per this bug.


===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-18 Thread José Pérez

On 2023-04-18 21:37, Mark Millard wrote:

In this case it does because the value is "active". If it's "enabled"
you do not need to do anything.


Well, if block_cloning is disabled it would not become active.

[...]

So, in progressing past the vintage that corrupts zfs data,
one could end up with block_cloning enabled in the process.


You still have to willingly issue the command
zpool upgrade 
so you might not just end up with the feature enabled by running this
or that kernel; that's why I suggested step 0: verify whether you are in
the worst case scenario before you begin.


Boot in single user mode and check if your pool has block cloning in
use:
# zpool get feature@block_cloning zroot
NAME   PROPERTY   VALUE  SOURCE
zroot  feature@block_cloning  active local

In this case it does because the value is "active". If it's "enabled"
you do not need to do anything.


If you did not upgrade the pool, the feature would just not be there and
the pool is sane (*).

unaffected_machine# zpool get feature@block_cloning zroot
unaffected_machine#

As said, if the feature has been enabled but no calls to
copy_file_range() occurred, the pool is also sane.

To summarize:
no feature -> sane
feature "enabled" -> sane
feature "active" -> might not be sane
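The summary above could be turned into a small check. A sketch only (the function name is mine, not an existing tool); the argument is the VALUE column from "zpool get -H -o value feature@block_cloning <pool>", which is empty when the running zfs does not know the feature at all.

```shell
#!/bin/sh
# Hypothetical classifier for the VALUE column of
# `zpool get -H -o value feature@block_cloning <pool>`.
classify_feature() {
    case "$1" in
        "")       echo "unknown to this zfs version -> sane" ;;
        disabled) echo "disabled -> sane" ;;
        enabled)  echo "enabled -> sane (no copy_file_range() use yet)" ;;
        active)   echo "active -> might not be sane, inspect the pool" ;;
        *)        echo "unexpected value: $1" ;;
    esac
}

classify_feature active   # -> active -> might not be sane, inspect the pool
```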

BR,

(*) as per this bug.
--
José Pérez



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-18 Thread Mark Millard
José_Pérez  wrote on
Date: Tue, 18 Apr 2023 16:59:03 UTC :

> On 2023-04-17 21:59, Pawel Jakub Dawidek wrote:
> > José,
> > 
> > I can only speak of block cloning in details, but I'll try to address
> > everything.
> > 
> > The easiest way to avoid block_cloning-related corruption on the
> > kernel after the last OpenZFS merge, but before e0bb199925 is to set
> > the compress property to 'off' and the sync property to something
> > other than 'disabled'. This will avoid the block_cloning-related
> > corruption and zil_replaying() panic.
> > 
> > As for the other corruption, unfortunately I don't know the details,
> > but my understanding is that it is happening under higher load. Not
> > sure I'd trust a kernel built on a machine with this bug present. What
> > I would do is to compile the kernel as of 068913e4ba somewhere else,
> > boot the problematic machine in single-user mode and install the newly
> > built kernel.
> > 
> > As far as I can tell, contrary to some initial reports, none of the
> > problems introduced by the recent OpenZFS merge corrupt the pool
> > metadata, only file's data. You can locate the files modified with the
> > bogus kernel using find(1) with a proper modification time, but you
> > have to decide what to do with them (either throw them away, restore
> > them from backup or inspect them).
> 
> Sharing my experience on how to get out of the worst case scenario with 
> a building machine that is affected by the bug.
> 
> CAVEAT: this is my experience, take it at your own risk. It worked for 
> me; there is no guarantee that it will work for you. You may create 
> corrupted files and make your system harder to recover, or even 
> brick it. Don't blame me, you have been warned. YMMV.
> 
> Boot in single user mode and check if your pool has block cloning in 
> use:
> # zpool get feature@block_cloning zroot
> NAME   PROPERTY   VALUE  SOURCE
> zroot  feature@block_cloning  active local
> 
> In this case it does because the value is "active". If it's "enabled" 
> you do not need to do anything.

Well, if block_cloning is disabled it would not become active.

But, if it is enabled, it can automatically become active by
creating a first entry in the involved Block Reference Table
during any activity that meets the criteria for such. If the FreeBSD
vintage in place is one that corrupts zfs data for any reason,
one would still want to progress to a vintage that does not
corrupt zfs data, even if block_cloning is enabled but not
active just before starting such an update sequence.

So, in progressing past the vintage that corrupts zfs data,
one could end up with block_cloning enabled in the process.
At least, that is my understanding of the issue.

Maybe only a subset of the "causes data corruption" range of
vintages would have to worry about block_cloning becoming active
during the effort to get past all the sources of corruptions.
(If so, I've no clue what range that would be.)

I expect that the "you do not need to do anything" for
block_cloning being "enabled" instead of "active" may be too
strong of a claim, depending on the specific starting-vintage
inside the range with zfs data corruption problems.

(From what I've read, when the last Block Reference Table
entry is removed for any reason, the matching block_cloning
changes back from being indicated as active to being indicated
as enabled.)

> 1) When in single user mode set compression property to "off" on any zfs 
> active dataset that has compression other than "off" and the sync 
> property to something other than "disabled".
> 2) Boot multiuser and update your current sources, e.g.
> git pull --rebase
> 3) Build and install a new kernel without too much pressure (e.g. with 
> -j 1):
> make -j 1 kernel
> 4) Reboot with the new kernel
> 5) Now you have to reinstall the kernel with
> make installkernel
> This is because the new kernel files were written by the old kernel 
> and need to be removed.
> 6) Find out when the pool was upgraded (I used command history) and 
> create a file with that date, in my case:
> touch -t 2304161957 /tmp/from
> 7) Find out when you booted the new kernel (I used fgrep Copyright 
> /var/log/messages | tail -n 1) and create a file with that date, in my 
> case:
> touch -t 2304172142 /tmp/to
> 8) Find the files/dirs created between the two dates:
> find / -newerBm /tmp/from -and -not -newerBm /tmp/to > 
> /tmp/filelist.txt
> 9) Inspect /tmp/filelist.txt and save any important items. If the 
> important files are not corrupted you can do:
> cp important_file new; mv new important_file
> NOTA BENE: "touch important_file" would not work, you do need to 
> re-create the file.
> 10) Delete the remaining files/dirs in /tmp/filelist.txt. If you did 5) 
> you will remove /boot/kernel.old files, but not /boot/kernel files.
> 11) Restore your compression and sync properties where appropriate.
> 
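The NOTA BENE in step 9 can be demonstrated on any filesystem: cp followed by mv produces a newly created file (new inode, hence freshly written blocks), while touch only updates timestamps on the same inode. A quick sketch:

```shell
#!/bin/sh
# Demonstrate why "cp file new; mv new file" re-creates a file while
# "touch file" does not: the inode number changes only in the first case.
workdir=$(mktemp -d)
f="$workdir/important_file"
echo "data" > "$f"

inode() { ls -i "$1" | awk '{print $1}'; }

before=$(inode "$f")
touch "$f"                      # same inode: nothing was rewritten
after_touch=$(inode "$f")

cp "$f" "$workdir/new"
mv "$workdir/new" "$f"          # new inode: the file was re-created
after_cp_mv=$(inode "$f")

echo "touch kept inode: $([ "$before" = "$after_touch" ] && echo yes || echo no)"
echo "cp+mv changed inode: $([ "$before" = "$after_cp_mv" ] && echo no || echo yes)"
# -> touch kept inode: yes
# -> cp+mv changed inode: yes
```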

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-18 Thread José Pérez

On 2023-04-17 21:59, Pawel Jakub Dawidek wrote:

José,

I can only speak of block cloning in details, but I'll try to address
everything.

The easiest way to avoid block_cloning-related corruption on the
kernel after the last OpenZFS merge, but before e0bb199925 is to set
the compress property to 'off' and the sync property to something
other than 'disabled'. This will avoid the block_cloning-related
corruption and zil_replaying() panic.

As for the other corruption, unfortunately I don't know the details,
but my understanding is that it is happening under higher load. Not
sure I'd trust a kernel built on a machine with this bug present. What
I would do is to compile the kernel as of 068913e4ba somewhere else,
boot the problematic machine in single-user mode and install the newly
built kernel.

As far as I can tell, contrary to some initial reports, none of the
problems introduced by the recent OpenZFS merge corrupt the pool
metadata, only file's data. You can locate the files modified with the
bogus kernel using find(1) with a proper modification time, but you
have to decide what to do with them (either throw them away, restore
them from backup or inspect them).


Sharing my experience on how to get out of the worst case scenario with 
a building machine that is affected by the bug.


CAVEAT: this is my experience, take it at your own risk. It worked for 
me; there is no guarantee that it will work for you. You may create 
corrupted files and make your system harder to recover, or even 
brick it. Don't blame me, you have been warned. YMMV.


Boot in single user mode and check if your pool has block cloning in 
use:

# zpool get feature@block_cloning zroot
NAME PROPERTY   VALUE  SOURCE
zrootfeature@block_cloning  active local

In this case it does because the value is "active". If it's "enabled" 
you do not need to do anything.


1) When in single user mode set compression property to "off" on any zfs 
active dataset that has compression other than "off" and the sync 
property to something other than "disabled".

2) Boot multiuser and update your current sources, e.g.
   git pull --rebase
3) Build and install a new kernel without too much pressure (e.g. with 
-j 1):

   make -j 1 kernel
4) Reboot with the new kernel
5) Now you have to reinstall the kernel with
   make installkernel
   This is because the new kernel files were written by the old kernel 
and need to be removed.
6) Find out when the pool was upgraded (I used command history) and 
create a file with that date, in my case:

   touch -t 2304161957 /tmp/from
7) Find out when you booted the new kernel (I used fgrep Copyright 
/var/log/messages | tail -n 1) and create a file with that date, in my 
case:

   touch -t 2304172142 /tmp/to
8) Find the files/dirs created between the two dates:
   find / -newerBm /tmp/from -and -not -newerBm /tmp/to > 
/tmp/filelist.txt
9) Inspect /tmp/filelist.txt and save any important items. If the 
important files are not corrupted you can do:

   cp important_file new; mv new important_file
   NOTA BENE: "touch important_file" would not work, you do need to 
re-create the file.
10) Delete the remaining files/dirs in /tmp/filelist.txt. If you did 5) 
you will remove /boot/kernel.old files, but not /boot/kernel files.

11) Restore your compression and sync properties where appropriate.
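Steps 6 through 8 can be rehearsed on scratch files first. A sketch that runs anywhere: plain -newer compares modification times, whereas the procedure above uses -newerBm on FreeBSD to compare birth (creation) times.

```shell
#!/bin/sh
# Rehearse steps 6-8 on scratch files: mark a time window with two files,
# then list everything modified inside that window.
workdir=$(mktemp -d)    # stands in for the filesystem being inspected
mark=$(mktemp -d)       # stands in for /tmp, holding the window markers

touch -t 2304150000 "$workdir/before_window"
touch -t 2304170000 "$workdir/inside_window"
touch -t 2304190000 "$workdir/after_window"

touch -t 2304161957 "$mark/from"   # step 6: when the pool was upgraded
touch -t 2304172142 "$mark/to"     # step 7: when the new kernel booted

# Step 8.  Plain -newer compares modification times so this runs anywhere;
# the FreeBSD procedure uses -newerBm to compare birth (creation) times.
find "$workdir" -type f \
    -newer "$mark/from" -and -not -newer "$mark/to" \
    > "$mark/filelist.txt"

cat "$mark/filelist.txt"   # lists only .../inside_window
```

Keeping the marker files outside the searched tree matters: the "to" marker itself falls inside the window and would otherwise show up in the list.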

BR,

--
José Pérez



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 21:28, José Pérez wrote:

Hi Pawel,
thank you for your reply and for the fixes.

I think there is a 4th issue that needs to be addressed: how do we 
recover from the worst case scenario which is a machine with a kernel > 
2a58b312b62f and ZFS root upgraded with block cloning enabled.


In particular, is it safe to turn such a machine on in the first place, 
and what are the risks involved in doing so? Any potential data loss?


Would such a machine be able to fix itself by compiling a kernel, or 
would compilation fail and might data be corrupted in the process?


I have two poudriere builders powered off (I am not alone in this 
situation) and I need to recover them, ideally minimizing data loss. The 
builders are also hosting current and used to build kernels and worlds 
for 13 and current: as of now all my production machines are stuck on 
the 13 they run, I cannot update binaries nor packages and I would like 
to be back online.


José,

I can only speak of block cloning in details, but I'll try to address 
everything.


The easiest way to avoid block_cloning-related corruption on the kernel 
after the last OpenZFS merge, but before e0bb199925 is to set the 
compress property to 'off' and the sync property to something other than 
'disabled'. This will avoid the block_cloning-related corruption and 
zil_replaying() panic.


As for the other corruption, unfortunately I don't know the details, but 
my understanding is that it is happening under higher load. Not sure I'd 
trust a kernel built on a machine with this bug present. What I would do 
is to compile the kernel as of 068913e4ba somewhere else, boot the 
problematic machine in single-user mode and install the newly built kernel.


As far as I can tell, contrary to some initial reports, none of the 
problems introduced by the recent OpenZFS merge corrupt the pool 
metadata, only file's data. You can locate the files modified with the 
bogus kernel using find(1) with a proper modification time, but you have 
to decide what to do with them (either throw them away, restore them 
from backup or inspect them).


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Mark Millard
José_Pérez  wrote on
Date: Mon, 17 Apr 2023 12:28:40 UTC :

> On 2023-04-17 12:43, Pawel Jakub Dawidek wrote:
> > On 4/17/23 18:15, Pawel Jakub Dawidek wrote:
> >> There were three issues that I know of after the recent OpenZFS merge:
> >> 
> >> 1. Data corruption unrelated to block cloning, so it can happen even 
> >> with block cloning disabled or not in use. This was the problematic 
> >> commit:
> >> 
> >> 
> >> https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9
> >> 
> >> It was reverted in 63ee747febbf024be0aace61161241b53245449e.
> >> 
> >> 2. Data corruption with embedded blocks when block cloning is enabled. 
> >> It can happen when compression is enabled and the block contains 
> >> between 60 and 112 bytes (this might be hard to determine). Fix exists, 
> >> it is merged to OpenZFS already, but isn't in FreeBSD yet.
> >> OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739
> >> 
> >> 3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is 
> >> triggered when block cloning is enabled, the sync property is set to 
> >> disabled and copy_file_range(2) is used. Easy fix exists, it is not 
> >> yet merged to OpenZFS and not yet in FreeBSD HEAD.
> >> OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758
> >> 
> >> Block cloning was disabled in 
> >> 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur.
> > 
> > As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are
> > fixed, as far as I can tell.
> > 
> > Block cloning remains disabled for now just to be on the safe side,
> > but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.
> > 
> > Don't rely on this sysctl as it will be removed in 2-3 weeks.
> 
> Hi Pawel,
> thank you for your reply and for the fixes.
> 
> I think there is a 4th issue that needs to be addressed: how do we 
> recover from the worst case scenario which is a machine with a kernel > 
> 2a58b312b62f and ZFS root upgraded with block cloning enabled.
> 
> In particular, is it safe to turn such a machine on in the first place, 
> and what are the risks involved in doing so? Any potential data loss?
> 
> Would such a machine be able to fix itself by compiling a kernel, or 
> would compilation fail and might data be corrupted in the process?
> 
> I have two poudriere builders powered off (I am not alone in this 
> situation) and I need to recover them, ideally minimizing data loss. The 
> builders are also hosting current and used to build kernels and worlds 
> for 13 and current: as of now all my production machines are stuck on 
> the 13 they run, I cannot update binaries nor packages and I would like 
> to be back online.
> 
> Whatever the fixing procedure, it shall be outlined in the UPDATING 
> document.

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=270811 is an example
issue where a FreeBSD powerpc package building server can not boot
--after patching so it no longer gets a boot time "panic: floating-point
unavailable trap" (that jhibbits patch is still not committed):

QUOTE from the description:
. . .
nda1: 953869MB (1953525168 512 byte sectors)
GEOM_MIRROR: Device mirror/swap0 launched (2/2).
Mounting from zfs:zroot failed with error 6; retrying for 3 more seconds
Mounting from zfs:zroot failed with error 6.

Loader variables:
vfs.root.mountfrom=zfs:zroot

Manual root filesystem specification:
<fstype>:<device> [options]
Mount <device> using filesystem <fstype>
and with the specified (optional) option list.

eg. ufs:/dev/da0s1a
zfs:zroot/ROOT/default
cd9660:/dev/cd0 ro
(which is equivalent to: mount -t cd9660 -o ro /dev/cd0 /)

? List valid disk boot devices
. Yield 1 second (for background tasks)
<empty line> Abort manual input

mountroot>

This machine is part of the FreeBSD cluster for building PowerPC packages,
so we can build kernels to test anytime necessary.
END QUOTE

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread José Pérez

On 2023-04-17 12:43, Pawel Jakub Dawidek wrote:

On 4/17/23 18:15, Pawel Jakub Dawidek wrote:

There were three issues that I know of after the recent OpenZFS merge:

1. Data corruption unrelated to block cloning, so it can happen even 
with block cloning disabled or not in use. This was the problematic 
commit:
 
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9


It was reverted in 63ee747febbf024be0aace61161241b53245449e.

2. Data corruption with embedded blocks when block cloning is enabled. 
It can happen when compression is enabled and the block contains 
between 60 and 112 bytes (this might be hard to determine). Fix exists, 
it is merged to OpenZFS already, but isn't in FreeBSD yet.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739

3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is 
triggered when block cloning is enabled, the sync property is set to 
disabled and copy_file_range(2) is used. Easy fix exists, it is not 
yet merged to OpenZFS and not yet in FreeBSD HEAD.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758

Block cloning was disabled in 
46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, so 2 and 3 should not occur.


As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are
fixed, as far as I can tell.

Block cloning remains disabled for now just to be on the safe side,
but can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.

Don't rely on this sysctl as it will be removed in 2-3 weeks.


Hi Pawel,
thank you for your reply and for the fixes.

I think there is a 4th issue that needs to be addressed: how do we 
recover from the worst case scenario which is a machine with a kernel > 
2a58b312b62f and ZFS root upgraded with block cloning enabled.


In particular, is it safe to turn such a machine on in the first place, 
and what are the risks involved in doing so? Any potential data loss?


Would such a machine be able to fix itself by compiling a kernel, or 
would compilation fail and might data be corrupted in the process?


I have two poudriere builders powered off (I am not alone in this 
situation) and I need to recover them, ideally minimizing data loss. The 
builders are also hosting current and used to build kernels and worlds 
for 13 and current: as of now all my production machines are stuck on 
the 13 they run, I cannot update binaries nor packages and I would like 
to be back online.


Whatever the fixing procedure, it shall be outlined in the UPDATING 
document.


Thank you.

BR,

--
José Pérez



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread Pawel Jakub Dawidek

On 4/17/23 18:15, Pawel Jakub Dawidek wrote:

There were three issues that I know of after the recent OpenZFS merge:

1. Data corruption unrelated to block cloning, so it can happen even 
with block cloning disabled or not in use. This was the problematic commit:

 
https://github.com/openzfs/zfs/commit/519851122b1703b8445ec17bc89b347cea965bb9

It was reverted in 63ee747febbf024be0aace61161241b53245449e.

2. Data corruption with embedded blocks when block cloning is enabled. 
It can happen when compression is enabled and the block contains between 
60 and 112 bytes (this might be hard to determine). Fix exists, it is 
merged to OpenZFS already, but isn't in FreeBSD yet.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14739

3. Panic on VERIFY(zil_replaying(zfsvfs->z_log, tx)). This is triggered 
when block cloning is enabled, the sync property is set to disabled and 
copy_file_range(2) is used. Easy fix exists, it is not yet merged to 
OpenZFS and not yet in FreeBSD HEAD.

 OpenZFS pull request: https://github.com/openzfs/zfs/pull/14758

Block cloning was disabled in 46ac8f2e7d9601311eb9b3cd2fed138ff4a11a66, 
so 2 and 3 should not occur.


As of 068913e4ba3dd9b3067056e832cefc5ed264b5cc all known issues are 
fixed, as far as I can tell.


Block cloning remains disabled for now just to be on the safe side, but 
can be enabled by setting sysctl vfs.zfs.bclone_enabled to 1.


Don't rely on this sysctl as it will be removed in 2-3 weeks.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-17 Thread José Pérez

Hi Pawel,
thank you for the patch.

Can you please elaborate a little more?

Did you run any tests? Is it safe to use your patch to access pools with 
feature@block_cloning active? Is it possible to build a kernel from such 
a pool?


Asking for others: is this fixing any corrupted data?

Thank you.

BR,

On 2023-04-17 06:35, Pawel Jakub Dawidek wrote:

On 4/16/23 01:07, Florian Smeets wrote:
On the pool that has block_cloning enabled I see the above insta panic 
On the pool that has block_cloning enabled I see the above insta panic 
when poudriere starts building. I found a workaround though:


--- /usr/local/share/poudriere/include/fs.sh.orig    2023-04-15 
18:03:50.090823000 +0200
+++ /usr/local/share/poudriere/include/fs.sh    2023-04-15 
18:04:04.144736000 +0200

@@ -295,7 +295,6 @@
  fi

  zfs clone -o mountpoint=${mnt} \
-    -o sync=disabled \
  -o atime=off \
  -o compression=off \
  ${fs}@${snap} \

With this workaround I was able to build thousands of packages without 
panics or failures due to data corruption.


Thank you, Florian, that was very helpful!

This should fix the problem:

https://github.com/openzfs/zfs/pull/14758


--
José Pérez



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Pawel Jakub Dawidek

On 4/16/23 01:07, Florian Smeets wrote:
On the pool that has block_cloning enabled I see the above insta panic 
when poudriere starts building. I found a workaround though:


--- /usr/local/share/poudriere/include/fs.sh.orig    2023-04-15 
18:03:50.090823000 +0200
+++ /usr/local/share/poudriere/include/fs.sh    2023-04-15 
18:04:04.144736000 +0200

@@ -295,7 +295,6 @@
  fi

  zfs clone -o mountpoint=${mnt} \
-    -o sync=disabled \
  -o atime=off \
  -o compression=off \
  ${fs}@${snap} \

With this workaround I was able to build thousands of packages without 
panics or failures due to data corruption.


Thank you, Florian, that was very helpful!

This should fix the problem:

https://github.com/openzfs/zfs/pull/14758
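For anyone making the same one-line change by hand rather than with a diff, a sketch that operates on a throwaway copy instead of the real /usr/local/share/poudriere/include/fs.sh (the sed approach here is mine, not part of Florian's workaround):

```shell
#!/bin/sh
# Drop the "-o sync=disabled" override from a copy of poudriere's fs.sh,
# mirroring the diff above.  Works on a scratch copy, not the real file.
workdir=$(mktemp -d)
cat > "$workdir/fs.sh" <<'EOF'
zfs clone -o mountpoint=${mnt} \
    -o sync=disabled \
    -o atime=off \
    -o compression=off \
    ${fs}@${snap} \
EOF

sed '/-o sync=disabled/d' "$workdir/fs.sh" > "$workdir/fs.sh.new"
cat "$workdir/fs.sh.new"    # the sync=disabled line is gone
```

One would diff the old and new copies before replacing the installed file, since a poudriere update will overwrite the change anyway.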

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Mark Millard
On Apr 16, 2023, at 10:40, Mark Millard  wrote:

> On Apr 16, 2023, at 01:34, Mark Millard  wrote:
> 
>> On Apr 15, 2023, at 19:13, Mark Millard  wrote:
>> 
>>> A general question is all for this message.
>>> 
>>> So far no commit to FreeBSD's main seems to be
>>> analogous to the content of:
>>> 
>>> https://github.com/openzfs/zfs/pull/14739/files
>>> 
>>> After my existing poudriere bulk test finishes,
>>> should I avoid having the content of that change
>>> in place for future testing? Vs.: Should I keep
>>> using the content of that change?
>>> 
>>> (The question is prompted by the 2 recent commits
>>> that I will update my test environment to be using,
>>> in part by fetching and updating to a new head,
>>> avoiding the "no dnode_next_offset change" status
>>> that my existing test has.)
>>> 
>> 
>> Not knowing, I updated to:
>> 
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #92 
>> main-n262185-b1a00c2b1368-dirty: Sun Apr 16 00:10:51 PDT 2023 
>> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
>>  arm64 aarch64 1400086 1400086
>> 
>> with the following still in place:
>> 
>> # git -C /usr/main-src/ diff sys/contrib/openzfs/
>> diff --git a/sys/contrib/openzfs/module/zfs/dmu.c 
>> b/sys/contrib/openzfs/module/zfs/dmu.c
>> index ce985d833f58..cda1472a77aa 100644
>> --- a/sys/contrib/openzfs/module/zfs/dmu.c
>> +++ b/sys/contrib/openzfs/module/zfs/dmu.c
>> @@ -2312,8 +2312,10 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t 
>> offset, uint64_t length,
>>   dl->dr_overridden_by.blk_phys_birth = 0;
>>   } else {
>>   dl->dr_overridden_by.blk_birth = dr->dr_txg;
>> -   dl->dr_overridden_by.blk_phys_birth =
>> -   BP_PHYSICAL_BIRTH(bp);
>> +   if (!BP_IS_EMBEDDED(bp)) {
>> +   dl->dr_overridden_by.blk_phys_birth =
>> +   BP_PHYSICAL_BIRTH(bp);
>> +   }
>>   }
>> mutex_exit(&db->db_mtx);
>> 
>> 
>> 
>> and booted the update. I've done a:
>> 
>> # poudriere pkgclean -jmain-CA72-bulk_a -A
>> 
>> and started another package build run based
>> on that combination:
>> 
>> # poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt
>> . . .
>> [main-CA72-bulk_a-default] [2023-04-16_00h38m01s] [balancing_pool:] Queued: 
>> 476 Built: 0   Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 
>> 476  Time: 00:00:24
>> [00:00:37] Recording filesystem state for prepkg... done
>> [00:00:37] Building 476 packages using up to 16 builders
>> [00:00:37] Hit CTRL+t at any time to see build progress and stats
>> [00:00:37] [01] [00:00:00] Builder starting
>> [00:00:40] [01] [00:00:03] Builder started
>> [00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1
>> . . .
>> 
>> If there are no failures, it will be about 9 hrs before I know that.
>> Given that I'll be trying to sleep soon, it may be about that long
>> either way.
> 
> [Reminder: All my testing has been of a "block_cloning was
> never enabled" context. This one has the dnode_next_offset
> change involved, unlike the prior one.]
> 
> There was one failed fetch but no other failures:
> 
> [01:25:02] [04] [00:01:07] Finished ports-mgmt/fallout | fallout-1.0.4_8: 
> Failed: fetch
> . . .
> [09:13:58] Failed ports: ports-mgmt/fallout:fetch
> [main-CA72-bulk_a-default] [2023-04-16_00h38m01s] [committing:] Queued: 476 
> Built: 475 Failed: 1   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0
> Time: 09:13:45
> 
> Running the bulk again:
> 
> . . .
> [00:00:22] Building 1 packages using up to 1 builders
> [00:00:22] Hit CTRL+t at any time to see build progress and stats
> [00:00:22] [01] [00:00:00] Builder starting
> [00:00:24] [01] [00:00:02] Builder started
> [00:00:24] [01] [00:00:00] Building ports-mgmt/fallout | fallout-1.0.4_8
> [00:01:04] [01] [00:00:40] Finished ports-mgmt/fallout | fallout-1.0.4_8: 
> Success
> . . .
> 
> I do not expect the fetch issue is evidence of a problem.

By omission, I was too vague about that. The log's error message was:

go: golang.org/x/text@v0.3.7: read "https://proxy.golang.org/@v/v0.3.7.zip": 
read tcp 192.168.1.110:47155->142.251.215.241:443: read: connection reset by 
peer

> I'm counting this as:  No evidence of corruption problems.
> 



===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Mark Millard
On Apr 16, 2023, at 01:34, Mark Millard  wrote:

> On Apr 15, 2023, at 19:13, Mark Millard  wrote:
> 
>> A general question is all for this message.
>> 
>> So far no commit to FreeBSD's main seems to be
>> analogous to the content of:
>> 
>> https://github.com/openzfs/zfs/pull/14739/files
>> 
>> After my existing poudriere bulk test finishes,
>> should I avoid having the content of that change
>> in place for future testing? Vs.: Should I keep
>> using the content of that change?
>> 
>> (The question is prompted by the 2 recent commits
>> that I will update my test environment to be using,
>> in part by fetching and updating to a new head,
>> avoiding the "no dnode_next_offset change" status
>> that my existing test has.)
>> 
> 
> Not knowing, I updated to:
> 
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #92 
> main-n262185-b1a00c2b1368-dirty: Sun Apr 16 00:10:51 PDT 2023 
> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
>  arm64 aarch64 1400086 1400086
> 
> with the following still in place:
> 
> # git -C /usr/main-src/ diff sys/contrib/openzfs/
> diff --git a/sys/contrib/openzfs/module/zfs/dmu.c 
> b/sys/contrib/openzfs/module/zfs/dmu.c
> index ce985d833f58..cda1472a77aa 100644
> --- a/sys/contrib/openzfs/module/zfs/dmu.c
> +++ b/sys/contrib/openzfs/module/zfs/dmu.c
> @@ -2312,8 +2312,10 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t 
> offset, uint64_t length,
>dl->dr_overridden_by.blk_phys_birth = 0;
>} else {
>dl->dr_overridden_by.blk_birth = dr->dr_txg;
> -   dl->dr_overridden_by.blk_phys_birth =
> -   BP_PHYSICAL_BIRTH(bp);
> +   if (!BP_IS_EMBEDDED(bp)) {
> +   dl->dr_overridden_by.blk_phys_birth =
> +   BP_PHYSICAL_BIRTH(bp);
> +   }
>}
>  mutex_exit(&db->db_mtx);
> 
> 
> 
> and booted the update. I've done a:
> 
> # poudriere pkgclean -jmain-CA72-bulk_a -A
> 
> and started another package build run based
> on that combination:
> 
> # poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt
> . . .
> [main-CA72-bulk_a-default] [2023-04-16_00h38m01s] [balancing_pool:] Queued: 
> 476 Built: 0   Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 
> 476  Time: 00:00:24
> [00:00:37] Recording filesystem state for prepkg... done
> [00:00:37] Building 476 packages using up to 16 builders
> [00:00:37] Hit CTRL+t at any time to see build progress and stats
> [00:00:37] [01] [00:00:00] Builder starting
> [00:00:40] [01] [00:00:03] Builder started
> [00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1
> . . .
> 
> If there are no failures, it will be about 9 hrs before I know that.
> Given that I'll be trying to sleep soon, it may be about that long
> either way.

[Reminder: All my testing has been of a "block_cloning was
never enabled" context. This one has the dnode_next_offset
change involved, unlike the prior one.]

There was one failed fetch but no other failures:

[01:25:02] [04] [00:01:07] Finished ports-mgmt/fallout | fallout-1.0.4_8: 
Failed: fetch
. . .
[09:13:58] Failed ports: ports-mgmt/fallout:fetch
[main-CA72-bulk_a-default] [2023-04-16_00h38m01s] [committing:] Queued: 476 
Built: 475 Failed: 1   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 0
Time: 09:13:45

Running the bulk again:

. . .
[00:00:22] Building 1 packages using up to 1 builders
[00:00:22] Hit CTRL+t at any time to see build progress and stats
[00:00:22] [01] [00:00:00] Builder starting
[00:00:24] [01] [00:00:02] Builder started
[00:00:24] [01] [00:00:00] Building ports-mgmt/fallout | fallout-1.0.4_8
[00:01:04] [01] [00:00:40] Finished ports-mgmt/fallout | fallout-1.0.4_8: 
Success
. . .

I do not expect the fetch issue to be evidence of a problem.

I'm counting this as:  No evidence of corruption problems.

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Mark Millard
On Apr 15, 2023, at 19:13, Mark Millard  wrote:

> A general question is all for this message.
> 
> So far no commit to FreeBSD's main seems to be
> analogous to the content of:
> 
> https://github.com/openzfs/zfs/pull/14739/files
> 
> After my existing poudriere bulk test finishes,
> should I avoid having the content of that change
> in place for future testing? Vs.: Should I keep
> using the content of that change?
> 
> (The question is prompted by the 2 recent commits
> that I will update my test environment to be using,
> in part by fetching and updating to a new head,
> avoiding the "no dnode_next_offset change" status
> that my existing test has.)
> 

Not knowing, I updated to:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #92 
main-n262185-b1a00c2b1368-dirty: Sun Apr 16 00:10:51 PDT 2023 
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
 arm64 aarch64 1400086 1400086

with the following still in place:

# git -C /usr/main-src/ diff sys/contrib/openzfs/
diff --git a/sys/contrib/openzfs/module/zfs/dmu.c 
b/sys/contrib/openzfs/module/zfs/dmu.c
index ce985d833f58..cda1472a77aa 100644
--- a/sys/contrib/openzfs/module/zfs/dmu.c
+++ b/sys/contrib/openzfs/module/zfs/dmu.c
@@ -2312,8 +2312,10 @@ dmu_brt_clone(objset_t *os, uint64_t object, uint64_t 
offset, uint64_t length,
dl->dr_overridden_by.blk_phys_birth = 0;
} else {
dl->dr_overridden_by.blk_birth = dr->dr_txg;
-   dl->dr_overridden_by.blk_phys_birth =
-   BP_PHYSICAL_BIRTH(bp);
+   if (!BP_IS_EMBEDDED(bp)) {
+   dl->dr_overridden_by.blk_phys_birth =
+   BP_PHYSICAL_BIRTH(bp);
+   }
}
  mutex_exit(&db->db_mtx);



and booted the update. I've done a:

# poudriere pkgclean -jmain-CA72-bulk_a -A

and started another package build run based
on that combination:

# poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt
. . .
[main-CA72-bulk_a-default] [2023-04-16_00h38m01s] [balancing_pool:] Queued: 476 
Built: 0   Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 476  
Time: 00:00:24
[00:00:37] Recording filesystem state for prepkg... done
[00:00:37] Building 476 packages using up to 16 builders
[00:00:37] Hit CTRL+t at any time to see build progress and stats
[00:00:37] [01] [00:00:00] Builder starting
[00:00:40] [01] [00:00:03] Builder started
[00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1
. . .

If there are no failures, it will be about 9 hrs before I know that.
Given that I'll be trying to sleep soon, it may be about that long
either way.

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-16 Thread Mark Millard
On Apr 15, 2023, at 21:31, Mark Millard  wrote:

> On Apr 15, 2023, at 17:27, Mark Millard  wrote:
> 
>> On Apr 15, 2023, at 15:49, Mark Millard  wrote:
>> 
>>> . . .
 
 
 (Mostly written as I progressed but some material later
 inserted into/around previously written material.)
 
 Summary:
 
 As stands, it looks like reverting the dnode_is_dirty
 code is what fixes the corruptions that my type of
 test context produced via poudriere bulk activity.
 
 
 The details that lead to that summary . . .
 
 Using my build environment for updating my temporary,
 experimental context, an environment running a world
 and kernel that predate the import:
 
 # uname -apKU
 FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
 main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
 root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
  arm64 aarch64 1400082 1400082
 
 (Note the "nodbg": I normally run non-debug main builds,
 but with symbols not stripped.)
 
 The kernel and world for this are what is in old-main-CA72:
 
 # bectl list
 BEActive Mountpoint Space Created
 main-CA72 R  -  3.98G 2023-04-12 20:29
 old-main-CA72 N  /  1.08M 2023-02-06 19:44
 
 (Most everything else is outside the BE's and so is shared
 across the BE's.)
 
 I updated to also have (whitespace details likely
 not preserved in this note):
 
 # git -C /usr/main-src/ diff 
 /usr/main-src/sys/contrib/openzfs/module/zfs/dnode.c
 diff --git a/sys/contrib/openzfs/module/zfs/dnode.c 
 b/sys/contrib/openzfs/module/zfs/dnode.c
 index 367bfaa80726..49a7f59c0da4 100644
 --- a/sys/contrib/openzfs/module/zfs/dnode.c
 +++ b/sys/contrib/openzfs/module/zfs/dnode.c
 @@ -1772,17 +1772,7 @@ dnode_is_dirty(dnode_t *dn)
 {
 mutex_enter(&dn->dn_mtx);
 for (int i = 0; i < TXG_SIZE; i++) {
 -   list_t *list = &dn->dn_dirty_records[i];
 -   for (dbuf_dirty_record_t *dr = list_head(list);
 -   dr != NULL; dr = list_next(list, dr)) {
 -   if (dr->dr_dbuf == NULL ||
 -   (dr->dr_dbuf->db_blkid != DMU_BONUS_BLKID &&
 -   dr->dr_dbuf->db_blkid != DMU_SPILL_BLKID)) {
 -   mutex_exit(&dn->dn_mtx);
 -   return (B_TRUE);
 -   }
 -   }
 -   if (dn->dn_free_ranges[i] != NULL) {
 +   if (multilist_link_active(&dn->dn_dirty_link[i])) {
 mutex_exit(&dn->dn_mtx);
 return (B_TRUE);
 }
 
 
 
 
 I did my usual buildworld buildkernel sequence and then
 one of my normal install sequences into main-CA72 to
 update it to have the change, as well as the prior
 material involved in my first experiment that I'd
 reported on.
 
 I cleared the content of the jail that I use for
 temporary experiments, such as the prior testing that
 got the 11 builder failures:
 
 # poudriere pkgclean -jmain-CA72-bulk_a -A
 
 I then rebooted using the updated main-CA72 BE.
 
 Then I started the:
 
 # poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt
 . . .
 [00:00:37] Building 476 packages using up to 16 builders
 [00:00:37] Hit CTRL+t at any time to see build progress and stats
 [00:00:38] [01] [00:00:00] Builder starting
 [00:00:40] [01] [00:00:02] Builder started
 [00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1
 
 In the prior experiment it got:
 
 476 = 252 success + 11 failed + 213 skipped
 
 and it reported the time for that as: 00:37:52.
 
 A normal from-scratch build takes many hours (multiple
 compiler toolchains and such) so my first report after
 this point will be for one of:
 
 A) It got to, say, 00:40:00 or beyond with, or without
 failures.
 vs.
 B) It got failures and stopped before that.
 
 . . . TIME GOES BY . . .
 
 At about 00:40:00 the status was:
 
 [00:40:00] [06] [00:00:00] Building x11/libXv | libXv-1.0.12,1
 load: 30.73  cmd: sh 1508 [nanslp] 2400.88r 6.69u 11.90s 0% 3960k
 [main-CA72-bulk_a-default] [2023-04-15_14h47m19s] [parallel_build:] 
 Queued: 476 Built: 235 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   
 Tobuild: 241  Time: 00:40:01
 ID  TOTALORIGIN   PKGNAME   
 PHASE PHASETMPFS   CPU% MEM%
 [15] 00:07:44 devel/py-lxml@py39 | py39-lxml-4.9.2   
 stage 00:00:08 40.00 KiB 0%   0%
 [01] 00:00:34 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard
On Apr 15, 2023, at 17:27, Mark Millard  wrote:

> On Apr 15, 2023, at 15:49, Mark Millard  wrote:
> 
>> . . .
>>> 
>>> 
>>> (Mostly written as I progressed but some material later
>>> inserted into/around previously written material.)
>>> 
>>> Summary:
>>> 
>>> As stands, it looks like reverting the dnode_is_dirty
>>> code is what fixes the corruptions that my type of
>>> test context produced via poudriere bulk activity.
>>> 
>>> 
>>> The details that lead to that summary . . .
>>> 
>>> Using my build environment for updating my temporary,
>>> experimental context, an environment running a world
>>> and kernel that predate the import:
>>> 
>>> # uname -apKU
>>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
>>> main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
>>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
>>>  arm64 aarch64 1400082 1400082
>>> 
>>> (Note the "nodbg": I normally run non-debug main builds,
>>> but with symbols not stripped.)
>>> 
>>> The kernel and world for this are what is in old-main-CA72:
>>> 
>>> # bectl list
>>> BEActive Mountpoint Space Created
>>> main-CA72 R  -  3.98G 2023-04-12 20:29
>>> old-main-CA72 N  /  1.08M 2023-02-06 19:44
>>> 
>>> (Most everything else is outside the BE's and so is shared
>>> across the BE's.)
>>> 
>>> I updated to also have (whitespace details likely
>>> not preserved in this note):
>>> 
>>> # git -C /usr/main-src/ diff 
>>> /usr/main-src/sys/contrib/openzfs/module/zfs/dnode.c
>>> diff --git a/sys/contrib/openzfs/module/zfs/dnode.c 
>>> b/sys/contrib/openzfs/module/zfs/dnode.c
>>> index 367bfaa80726..49a7f59c0da4 100644
>>> --- a/sys/contrib/openzfs/module/zfs/dnode.c
>>> +++ b/sys/contrib/openzfs/module/zfs/dnode.c
>>> @@ -1772,17 +1772,7 @@ dnode_is_dirty(dnode_t *dn)
>>> {
>>>  mutex_enter(&dn->dn_mtx);
>>>  for (int i = 0; i < TXG_SIZE; i++) {
>>> -   list_t *list = &dn->dn_dirty_records[i];
>>> -   for (dbuf_dirty_record_t *dr = list_head(list);
>>> -   dr != NULL; dr = list_next(list, dr)) {
>>> -   if (dr->dr_dbuf == NULL ||
>>> -   (dr->dr_dbuf->db_blkid != DMU_BONUS_BLKID &&
>>> -   dr->dr_dbuf->db_blkid != DMU_SPILL_BLKID)) {
>>> -   mutex_exit(&dn->dn_mtx);
>>> -   return (B_TRUE);
>>> -   }
>>> -   }
>>> -   if (dn->dn_free_ranges[i] != NULL) {
>>> +   if (multilist_link_active(&dn->dn_dirty_link[i])) {
>>>  mutex_exit(&dn->dn_mtx);
>>>  return (B_TRUE);
>>>  }
>>> 
>>> 
>>> 
>>> 
>>> I did my usual buildworld buildkernel sequence and then
>>> one of my normal install sequences into main-CA72 to
>>> update it to have the change, as well as the prior
>>> material involved in my first experiment that I'd
>>> reported on.
>>> 
>>> I cleared the content of the jail that I use for
>>> temporary experiments, such as the prior testing that
>>> got the 11 builder failures:
>>> 
>>> # poudriere pkgclean -jmain-CA72-bulk_a -A
>>> 
>>> I then rebooted using the updated main-CA72 BE.
>>> 
>>> Then I started the:
>>> 
>>> # poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt
>>> . . .
>>> [00:00:37] Building 476 packages using up to 16 builders
>>> [00:00:37] Hit CTRL+t at any time to see build progress and stats
>>> [00:00:38] [01] [00:00:00] Builder starting
>>> [00:00:40] [01] [00:00:02] Builder started
>>> [00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1
>>> 
>>> In the prior experiment it got:
>>> 
>>> 476 = 252 success + 11 failed + 213 skipped
>>> 
>>> and it reported the time for that as: 00:37:52.
>>> 
>>> A normal from-scratch build takes many hours (multiple
>>> compiler toolchains and such) so my first report after
>>> this point will be for one of:
>>> 
>>> A) It got to, say, 00:40:00 or beyond with, or without
>>> failures.
>>> vs.
>>> B) It got failures and stopped before that.
>>> 
>>> . . . TIME GOES BY . . .
>>> 
>>> At about 00:40:00 the status was:
>>> 
>>> [00:40:00] [06] [00:00:00] Building x11/libXv | libXv-1.0.12,1
>>> load: 30.73  cmd: sh 1508 [nanslp] 2400.88r 6.69u 11.90s 0% 3960k
>>> [main-CA72-bulk_a-default] [2023-04-15_14h47m19s] [parallel_build:] Queued: 
>>> 476 Built: 235 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 
>>> 241  Time: 00:40:01
>>> ID  TOTALORIGIN   PKGNAME   
>>> PHASE PHASETMPFS   CPU% MEM%
>>> [15] 00:07:44 devel/py-lxml@py39 | py39-lxml-4.9.2   
>>> stage 00:00:08 40.00 KiB 0%   0%
>>> [01] 00:00:34 x11/libXxf86vm | libXxf86vm-1.1.4_3
>>> build-depends 00:00:03 56.00 KiB   2.3%   0%
>>> [16] 00:01:59 x11-toolkits/libXt | libXt-1.2.1,1 
>>> configure 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard
A general question is all for this message.

So far no commit to FreeBSD's main seems to be
analogous to the content of:

https://github.com/openzfs/zfs/pull/14739/files

After my existing poudriere bulk test finishes,
should I avoid having the content of that change
in place for future testing? Vs.: Should I keep
using the content of that change?

(The question is prompted by the 2 recent commits
that I will update my test environment to be using,
in part by fetching and updating to a new head,
avoiding the "no dnode_next_offset change" status
that my existing test has.)

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard



On Apr 15, 2023, at 15:49, Mark Millard  wrote:

> . . .
>> 
>> 
>> (Mostly written as I progressed but some material later
>> inserted into/around previously written material.)
>> 
>> Summary:
>> 
>> As stands, it looks like reverting the dnode_is_dirty
>> code is what fixes the corruptions that my type of
>> test context produced via poudriere bulk activity.
>> 
>> 
>> The details that lead to that summary . . .
>> 
>> Using my build environment for updating my temporary,
>> experimental context, an environment running a world
>> and kernel that predate the import:
>> 
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
>> main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
>>  arm64 aarch64 1400082 1400082
>> 
>> (Note the "nodbg": I normally run non-debug main builds,
>> but with symbols not stripped.)
>> 
>> The kernel and world for this are what is in old-main-CA72:
>> 
>> # bectl list
>> BEActive Mountpoint Space Created
>> main-CA72 R  -  3.98G 2023-04-12 20:29
>> old-main-CA72 N  /  1.08M 2023-02-06 19:44
>> 
>> (Most everything else is outside the BE's and so is shared
>> across the BE's.)
>> 
>> I updated to also have (whitespace details likely
>> not preserved in this note):
>> 
>> # git -C /usr/main-src/ diff 
>> /usr/main-src/sys/contrib/openzfs/module/zfs/dnode.c
>> diff --git a/sys/contrib/openzfs/module/zfs/dnode.c 
>> b/sys/contrib/openzfs/module/zfs/dnode.c
>> index 367bfaa80726..49a7f59c0da4 100644
>> --- a/sys/contrib/openzfs/module/zfs/dnode.c
>> +++ b/sys/contrib/openzfs/module/zfs/dnode.c
>> @@ -1772,17 +1772,7 @@ dnode_is_dirty(dnode_t *dn)
>> {
>>   mutex_enter(&dn->dn_mtx);
>>   for (int i = 0; i < TXG_SIZE; i++) {
>> -   list_t *list = &dn->dn_dirty_records[i];
>> -   for (dbuf_dirty_record_t *dr = list_head(list);
>> -   dr != NULL; dr = list_next(list, dr)) {
>> -   if (dr->dr_dbuf == NULL ||
>> -   (dr->dr_dbuf->db_blkid != DMU_BONUS_BLKID &&
>> -   dr->dr_dbuf->db_blkid != DMU_SPILL_BLKID)) {
>> -   mutex_exit(&dn->dn_mtx);
>> -   return (B_TRUE);
>> -   }
>> -   }
>> -   if (dn->dn_free_ranges[i] != NULL) {
>> +   if (multilist_link_active(&dn->dn_dirty_link[i])) {
>>   mutex_exit(&dn->dn_mtx);
>>   return (B_TRUE);
>>   }
>> 
>> 
>> 
>> 
>> I did my usual buildworld buildkernel sequence and then
>> one of my normal install sequences into main-CA72 to
>> update it to have the change, as well as the prior
>> material involved in my first experiment that I'd
>> reported on.
>> 
>> I cleared the content of the jail that I use for
>> temporary experiments, such as the prior testing that
>> got the 11 builder failures:
>> 
>> # poudriere pkgclean -jmain-CA72-bulk_a -A
>> 
>> I then rebooted using the updated main-CA72 BE.
>> 
>> Then I started the:
>> 
>> # poudriere bulk -jmain-CA72-bulk_a -w -f ~/origins/CA72-origins.txt
>> . . .
>> [00:00:37] Building 476 packages using up to 16 builders
>> [00:00:37] Hit CTRL+t at any time to see build progress and stats
>> [00:00:38] [01] [00:00:00] Builder starting
>> [00:00:40] [01] [00:00:02] Builder started
>> [00:00:40] [01] [00:00:00] Building ports-mgmt/pkg | pkg-1.19.1_1
>> 
>> In the prior experiment it got:
>> 
>> 476 = 252 success + 11 failed + 213 skipped
>> 
>> and it reported the time for that as: 00:37:52.
>> 
>> A normal from-scratch build takes many hours (multiple
>> compiler toolchains and such) so my first report after
>> this point will be for one of:
>> 
>> A) It got to, say, 00:40:00 or beyond with, or without
>>  failures.
>> vs.
>> B) It got failures and stopped before that.
>> 
>> . . . TIME GOES BY . . .
>> 
>> At about 00:40:00 the status was:
>> 
>> [00:40:00] [06] [00:00:00] Building x11/libXv | libXv-1.0.12,1
>> load: 30.73  cmd: sh 1508 [nanslp] 2400.88r 6.69u 11.90s 0% 3960k
>> [main-CA72-bulk_a-default] [2023-04-15_14h47m19s] [parallel_build:] Queued: 
>> 476 Built: 235 Failed: 0   Skipped: 0   Ignored: 0   Fetched: 0   Tobuild: 
>> 241  Time: 00:40:01
>> ID  TOTALORIGIN   PKGNAME   
>> PHASE PHASETMPFS   CPU% MEM%
>> [15] 00:07:44 devel/py-lxml@py39 | py39-lxml-4.9.2   
>> stage 00:00:08 40.00 KiB 0%   0%
>> [01] 00:00:34 x11/libXxf86vm | libXxf86vm-1.1.4_3
>> build-depends 00:00:03 56.00 KiB   2.3%   0%
>> [16] 00:01:59 x11-toolkits/libXt | libXt-1.2.1,1 
>> configure 00:00:52 40.00 KiB   0.3%   0%
>> [02] 00:01:40 devel/dbus | dbus-1.14.6,1 
>> configure 00:00:05 36.00 KiB   0.5%   0%
>> [03] 00:02:20 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard
On Apr 15, 2023, at 15:33, Mark Millard  wrote:

> On Apr 15, 2023, at 13:30, Mateusz Guzik  wrote:
> 
>> On 4/15/23, FreeBSD User  wrote:
>>> On Sat, 15 Apr 2023 07:36:25 -0700
>>> Cy Schubert  wrote:
>>> 
 In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>,
 FreeBSD Us
 er writes:
> On Thu, 13 Apr 2023 22:18:04 -0700
> Mark Millard  wrote:
> 
>> On Apr 13, 2023, at 21:44, Charlie Li  wrote:
>> 
>>> Mark Millard wrote:
 FYI: in my original report for a context that has never had
 block_cloning enabled, I reported BOTH missing files and
 file content corruption in the poudriere-devel bulk build
 testing. This predates:
 https://people.freebsd.org/~pjd/patches/brt_revert.patch
 but had the changes from:
 https://github.com/openzfs/zfs/pull/14739/files
 The files were missing from packages installed to be used
 during a port's build. No other types of examples of missing
 files happened. (But only 11 ports failed.)
>>> I also don't have block_cloning enabled. "Missing files" prior to
>>> brt_revert may actually be present, but as the corruption also messes
>>> with the file(1) signature, some tools like ldconfig report them as
>>> missing.
>> 
>> For reference, the specific messages that were not explicit
>> null-byte complaints were (some shown with a little context):
>> 
>> 
>> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not
>> found
>> ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
>> 
>> [CA72_ZFS] Installing libxml2-2.10.3_1...
>> [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
>> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found
>> 
>> (/usr/local/lib/libxml2.so) . . .
>> [CA72_ZFS] Extracting libxslt-1.1.37: .. done
>> ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found
>> 
>> (/usr/local/lib/libxslt.so) ===>   Returning to build of
>> py39-lxml-4.9.2
>> . . .
>> ===>  Configuring for py39-lxml-4.9.2
>> Building lxml version 4.9.2.
>> Building with Cython 0.29.33.
>> Error: Please make sure the libxml2 and libxslt development packages
>> are installed.
>> 
>> 
>> [CA72_ZFS] Extracting libunistring-1.1: .. done
>> ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not
>> found
> 
>> 
>> 
>> [CA72_ZFS] Extracting gmp-6.2.1: .. done
>> ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
>> 
>> 
>> 
>> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
>> ===>   Installing existing package /packages/All/gmp-6.2.1.pkg
>> [CA72_ZFS] Installing gmp-6.2.1...
>> the most recent version of gmp-6.2.1 is already installed
>> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
>> 
>> *** Error code 1
>> 
>> 
>> autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
>> 
>> 
>> checking for GNU
>> M4 that supports accurate traces... configure: error: no acceptable m4
>> could be found in
>> $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is
>> recommended.
>> GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
>> Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
>> 
>> 
>> ld: error: /usr/local/lib/libblkid.a: unknown file type
>> 
>> 
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>> 
>> 
> 
> Hello
> 
> What is the current status of fixing/mitigating this disastrous bug?
> Especially for those with the new option enabled on ZFS pools. Any advice?
> 
> In an act of precaution (or call it panic) I shut down several servers to
> prevent irreversible damage to databases and data storage. On one host with
> /usr/ports residing on ZFS we always face errors on the same files created
> while staging (using portmaster, leaving the system with uninstalled
> software, i.e. www/apache24 in our case). Deleting the work folder doesn't
> seem to change anything, even when starting a scrub of the entire pool
> (RAIDZ1 pool) - cause unknown; it always corrupts the same files. Same with
> devel/ruby-gems.
> 
> Poudriere has been shutdown for the time being to avoid further issues.
> 
> 
> Is there any advice on how to proceed, apart from conserving the boxes
> via shutdown?
> 
> Thank you ;-)
> oh
> 
> 
> 
> --
> O. Hartmann
 
 With an up-to-date tree + pjd@'s "Fix data corruption when cloning
 embedded
 blocks. #14739" patch I didn't have any 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard
On Apr 15, 2023, at 13:30, Mateusz Guzik  wrote:

> On 4/15/23, FreeBSD User  wrote:
>> On Sat, 15 Apr 2023 07:36:25 -0700
>> Cy Schubert  wrote:
>> 
>>> In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>,
>>> FreeBSD User writes:
 On Thu, 13 Apr 2023 22:18:04 -0700
 Mark Millard  wrote:
 
> On Apr 13, 2023, at 21:44, Charlie Li  wrote:
> 
>> Mark Millard wrote:
>>> FYI: in my original report for a context that has never had
>>> block_cloning enabled, I reported BOTH missing files and
>>> file content corruption in the poudriere-devel bulk build
>>> testing. This predates:
>>> https://people.freebsd.org/~pjd/patches/brt_revert.patch
>>> but had the changes from:
>>> https://github.com/openzfs/zfs/pull/14739/files
>>> The files were missing from packages installed to be used
>>> during a port's build. No other types of examples of missing
>>> files happened. (But only 11 ports failed.)
>> I also don't have block_cloning enabled. "Missing files" prior to
>> brt_revert may actually be present, but as the corruption also messes
>> with the file(1) signature, some tools like ldconfig report them as
>> missing.
> 
> For reference, the specific messages that were not explicit
> null-byte complaints were (some shown with a little context):
> 
> 
> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not
> found
> ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
> 
> [CA72_ZFS] Installing libxml2-2.10.3_1...
> [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found
> 
> (/usr/local/lib/libxml2.so) . . .
> [CA72_ZFS] Extracting libxslt-1.1.37: .. done
> ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found
> 
> (/usr/local/lib/libxslt.so) ===>   Returning to build of
> py39-lxml-4.9.2
> . . .
> ===>  Configuring for py39-lxml-4.9.2
> Building lxml version 4.9.2.
> Building with Cython 0.29.33.
> Error: Please make sure the libxml2 and libxslt development packages
> are installed.
> 
> 
> [CA72_ZFS] Extracting libunistring-1.1: .. done
> ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not
> found
 
> 
> 
> [CA72_ZFS] Extracting gmp-6.2.1: .. done
> ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
> 
> 
> 
> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> ===>   Installing existing package /packages/All/gmp-6.2.1.pkg
> [CA72_ZFS] Installing gmp-6.2.1...
> the most recent version of gmp-6.2.1 is already installed
> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> 
> *** Error code 1
> 
> 
> autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
> 
> 
> checking for GNU
> M4 that supports accurate traces... configure: error: no acceptable m4
> could be found in
> $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is
> recommended.
> GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
> Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
> 
> 
> ld: error: /usr/local/lib/libblkid.a: unknown file type
> 
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> 
> 
 
 Hello
 
 What is the current status of fixing/mitigating this disastrous bug?
 Especially for those with the new option enabled on ZFS pools. Any advice?
 
 In an act of precaution (or call it panic) I shut down several servers to
 prevent irreversible damage to databases and data storage. On one host with
 /usr/ports residing on ZFS we always face errors on the same files created
 while staging (using portmaster, leaving the system with uninstalled
 software, i.e. www/apache24 in our case). Deleting the work folder doesn't
 seem to change anything, even when starting a scrub of the entire pool
 (RAIDZ1 pool) - cause unknown; it always corrupts the same files. Same with
 devel/ruby-gems.
 
 Poudriere has been shutdown for the time being to avoid further issues.
 
 
 Is there any advice on how to proceed, apart from conserving the boxes
 via shutdown?
 
 Thank you ;-)
 oh
 
 
 
 --
 O. Hartmann
>>> 
>>> With an up-to-date tree + pjd@'s "Fix data corruption when cloning
>>> embedded
>>> blocks. #14739" patch I didn't have any issues, except for email messages
>>> 
>>> with corruption in my sent directory, nowhere else. I'm still
>>> investigating
>>> the email messages issue. IMO one is generally safe to run poudriere on
>>> 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mateusz Guzik
On 4/15/23, FreeBSD User  wrote:
> On Sat, 15 Apr 2023 07:36:25 -0700
> Cy Schubert  wrote:
>
>> In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>,
>> FreeBSD User writes:
>> > On Thu, 13 Apr 2023 22:18:04 -0700
>> > Mark Millard  wrote:
>> >
>> > > On Apr 13, 2023, at 21:44, Charlie Li  wrote:
>> > >
>> > > > Mark Millard wrote:
>> > > >> FYI: in my original report for a context that has never had
>> > > >> block_cloning enabled, I reported BOTH missing files and
>> > > >> file content corruption in the poudriere-devel bulk build
>> > > >> testing. This predates:
>> > > >> https://people.freebsd.org/~pjd/patches/brt_revert.patch
>> > > >> but had the changes from:
>> > > >> https://github.com/openzfs/zfs/pull/14739/files
>> > > >> The files were missing from packages installed to be used
>> > > >> during a port's build. No other types of examples of missing
>> > > >> files happened. (But only 11 ports failed.)
>> > > > I also don't have block_cloning enabled. "Missing files" prior to
>> > > > brt_revert may actually be present, but as the corruption also
>> > > > messes with the file(1) signature, some tools like ldconfig report
>> > > > them as missing.
>> > >
>> > > For reference, the specific messages that were not explicit
>> > > null-byte complaints were (some shown with a little context):
>> > >
>> > >
>> > > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not
>> > > found
>> > > ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
>> > >
>> > > [CA72_ZFS] Installing libxml2-2.10.3_1...
>> > > [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
>> > > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found
>> > >
>> > > (/usr/local/lib/libxml2.so) . . .
>> > > [CA72_ZFS] Extracting libxslt-1.1.37: .. done
>> > > ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found
>> > >
>> > > (/usr/local/lib/libxslt.so) ===>   Returning to build of
>> > > py39-lxml-4.9.2
>> > > . . .
>> > > ===>  Configuring for py39-lxml-4.9.2
>> > > Building lxml version 4.9.2.
>> > > Building with Cython 0.29.33.
>> > > Error: Please make sure the libxml2 and libxslt development packages
>> > > are installed.
>> > >
>> > >
>> > > [CA72_ZFS] Extracting libunistring-1.1: .. done
>> > > ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not
>> > > found
>> >
>> > >
>> > >
>> > > [CA72_ZFS] Extracting gmp-6.2.1: .. done
>> > > ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found
>> > >
>> > >
>> > >
>> > > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
>> > > ===>   Installing existing package /packages/All/gmp-6.2.1.pkg
>> > > [CA72_ZFS] Installing gmp-6.2.1...
>> > > the most recent version of gmp-6.2.1 is already installed
>> > > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
>> > >
>> > > *** Error code 1
>> > >
>> > >
>> > > autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
>> > >
>> > >
>> > > checking for GNU
>> > > M4 that supports accurate traces... configure: error: no acceptable m4
>> > > could be found in
>> > > $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is
>> > > recommended.
>> > > GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
>> > > Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
>> > >
>> > >
>> > > ld: error: /usr/local/lib/libblkid.a: unknown file type
>> > >
>> > >
>> > > ===
>> > > Mark Millard
>> > > marklmi at yahoo.com
>> > >
>> > >
>> >
>> > Hello
>> >
>> > What is the current status of fixing/mitigating this disastrous bug?
>> > Especially for those with the new option enabled on ZFS pools. Any advice?
>> >
>> > In an act of precaution (or call it panic) I shut down several servers to
>> > prevent irreversible damage to databases and data storage. On one host
>> > with /usr/ports residing on ZFS we always face errors on the same files
>> > created while staging (using portmaster, leaving the system with
>> > uninstalled software, i.e. www/apache24 in our case). Deleting the work
>> > folder doesn't seem to change anything, even when starting a scrub of
>> > the entire pool (RAIDZ1 pool) - cause unknown; it always corrupts the
>> > same files. Same with devel/ruby-gems.
>> >
>> > Poudriere has been shutdown for the time being to avoid further issues.
>> >
>> >
>> > Are there any advies to proceed apart from conserving the boxes via
>> > shutdown?
>> >
>> > Thank you ;-)
>> > oh
>> >
>> >
>> >
>> > --
>> > O. Hartmann
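[For the question of exposure: as discussed elsewhere in the thread, a pool's
block_cloning feature state distinguishes the cases - "disabled"/"enabled"
means no blocks have been cloned, "active" means some have. A minimal sketch
of classifying that state; the pool name "zroot" is an assumption, and the
state is canned here so the logic is self-contained:]

```shell
#!/bin/sh
# Sketch: classify a pool's block_cloning feature state.
# "zroot" and the canned value are assumptions; on a real system
# obtain the value with:
#   state=$(zpool get -H -o value feature@block_cloning zroot)
state='enabled'
case "$state" in
    active)      echo "blocks have already been cloned on this pool" ;;
    enabled)     echo "feature on, but no blocks cloned yet" ;;
    disabled|-)  echo "feature off or not present on this pool" ;;
esac
# prints: feature on, but no blocks cloned yet
```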
>>
>> With an up-to-date tree + pjd@'s "Fix data corruption when cloning
>> embedded
>> blocks. #14739" patch I didn't have any issues, except for email messages
>>
>> with corruption in my sent directory, nowhere else. I'm still
>> investigating
>> the email messages issue. IMO one is generally safe to run poudriere on
>> latest ZFS with the additional patch.

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard
On Apr 15, 2023, at 11:07, Cy Schubert  wrote:

> [...]

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert
In message <5a47f62d-0e78-4c3e-84c0-45eeb03c7...@yahoo.com>, Mark Millard
writes:
> [...]

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Mark Millard
On Apr 15, 2023, at 07:36, Cy Schubert  wrote:

> In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>, 
> FreeBSD User writes:
>> Am Thu, 13 Apr 2023 22:18:04 -0700
>> Mark Millard  schrieb:
>> 
>>> On Apr 13, 2023, at 21:44, Charlie Li  wrote:
>>> 
 Mark Millard wrote:  
> FYI: in my original report for a context that has never had
> block_cloning enabled, I reported BOTH missing files and
> file content corruption in the poudriere-devel bulk build
> testing. This predates:
> https://people.freebsd.org/~pjd/patches/brt_revert.patch
> but had the changes from:
> https://github.com/openzfs/zfs/pull/14739/files
> The files were missing from packages installed to be used
> during a port's build. No other types of examples of missing
> files happened. (But only 11 ports failed.)  
 I also don't have block_cloning enabled. "Missing files" prior to brt_rev
>> ert may actually
 be present, but as the corruption also messes with the file(1) signature,
>> some tools like
 ldconfig report them as missing.  
>>> 
>>> For reference, the specific messages that were not explicit
>>> null-byte complaints were (some shown with a little context):
>>> 
>>> 
>>> [...]
>>
>> [...]
> 
> With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded 
> blocks. #14739" patch I didn't have any issues, except for email messages 
> with corruption in my sent directory, nowhere else. I'm still investigating 
> the email messages issue. IMO one is generally safe to run poudriere on the 
> latest ZFS with the additional patch.

My poudriere testing failed when I tested such (14739 included),
as I reported, with block_cloning never having been enabled.
Others have also reported poudriere bulk build failures absent
block_cloning being involved and 14739 being in place. My tests
do predate:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

and I'm not sure of if Cy's activity had brt_revert.patch in
place or not.

Others' notes 
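[Charlie Li's point earlier in the thread - that the corruption also zeroes
the file(1) signature, so tools like ldconfig report libraries as "missing" -
suggests a quick scan for damaged shared objects. A rough sketch, run here
against two demo files rather than a real library directory; all paths and
file contents below are made up for illustration:]

```shell
#!/bin/sh
# Sketch of checking for the zeroed-signature symptom: a shared
# object whose first four bytes are no longer the ELF magic
# (0x7f 'E' 'L' 'F'). Demo files stand in for a real library dir.
D=/tmp/brt-sig-demo
mkdir -p "$D"
printf '\177ELF' > "$D/libgood.so"   # intact ELF magic
printf '\0\0\0\0' > "$D/libbad.so"   # leading null bytes, as corrupted
for f in "$D"/*.so; do
    sig=$(head -c 4 "$f" | od -An -tx1 | tr -d ' \n')
    [ "$sig" = "7f454c46" ] || echo "suspect: $f"
done
# prints: suspect: /tmp/brt-sig-demo/libbad.so
```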

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert
On Sat, 15 Apr 2023 18:07:34 +0200
Florian Smeets  wrote:

> On 15.04.23 17:51, FreeBSD User wrote:
> > Am Sat, 15 Apr 2023 07:36:25 -0700
> > Cy Schubert  schrieb:  
> >>
> >> With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded
> >> blocks. #14739" patch I didn't have any issues, except for email messages
> >> with corruption in my sent directory, nowhere else. I'm still investigating
> >> the email messages issue. IMO one is generally safe to run poudriere on the
> >> latest ZFS with the additional patch.  
> 
> This is also my current observation. I have 2 hosts where I was 
> unfortunate enough to update at the wrong time. I currently *think* that 
> I'm *not* seeing data corruption with head from April 12th and this 
> patch 
> https://github.com/openzfs/zfs/commit/d3a6e5ca3b2f684132238ca968bf0b96f17ec7e1.diff
>  
> applied.
> 
> One pool has been upgraded with feature@block_cloning and the other hasn't.
> > 
> > FreeBSD 14.0-CURRENT #8 main-n262175-5ee1c90e50ce: Sat Apr 15 07:57:16 CEST 
> > 2023 amd64
> > 
> > The box is crashing while trying to update ports with the well known issue:
> > 
> > Panic String: VERIFY(!zil_replaying(zilog, tx)) failed
> >   
> On the pool that has block_cloning enabled I see the above insta panic 
> when poudriere starts building. I found a workaround though:
> 
> --- /usr/local/share/poudriere/include/fs.sh.orig 2023-04-15 
> 18:03:50.090823000 +0200
> +++ /usr/local/share/poudriere/include/fs.sh  2023-04-15 
> 18:04:04.144736000 +0200
> @@ -295,7 +295,6 @@
>   fi
> 
>   zfs clone -o mountpoint=${mnt} \
> - -o sync=disabled \
>   -o atime=off \
>   -o compression=off \
>   ${fs}@${snap} \
> 
> With this workaround I was able to build thousands of packages without 
> panics or failures due to data corruption.

Thanks for this. I'll test this next week. One should be able to test
this by hand to capture a dump.

> 
> Florian



-- 
Cheers,
Cy Schubert 
FreeBSD UNIX: Web:  https://FreeBSD.org
NTP:   Web:  https://nwtime.org

e^(i*pi)+1=0
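[Florian's diff above removes the "-o sync=disabled" line from the zfs clone
call in poudriere's include/fs.sh. The same edit can be scripted; a sketch
against a local stand-in copy of that fragment - the real path,
/usr/local/share/poudriere/include/fs.sh, comes from the diff, and everything
else here is illustrative:]

```shell
#!/bin/sh
# Sketch of applying the workaround by deleting the sync=disabled
# option line. Operates on a stand-in copy of the fragment shown
# in the diff, not the real poudriere file.
FRAG=/tmp/fs.sh.fragment
cat > "$FRAG" <<'EOF'
	zfs clone -o mountpoint=${mnt} \
		-o sync=disabled \
		-o atime=off \
		-o compression=off \
		${fs}@${snap} \
EOF
cp "$FRAG" "$FRAG.orig"    # keep a backup, as the .orig in the diff does
# Drop the whole option line; its trailing backslash goes with it,
# so the multi-line zfs clone invocation remains a valid continuation.
sed '/-o sync=disabled/d' "$FRAG.orig" > "$FRAG"
grep 'sync=disabled' "$FRAG" || echo "sync=disabled removed"
# prints: sync=disabled removed
```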



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert
In message <20230415175218.777d0...@thor.intern.walstatt.dynvpn.de>, FreeBSD
User writes:
> [...]

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Florian Smeets

On 15.04.23 17:51, FreeBSD User wrote:

[...]


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread FreeBSD User
Am Sat, 15 Apr 2023 07:36:25 -0700
Cy Schubert  schrieb:

> [...]
> With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded 
> blocks. #14739" patch I didn't have any issues, except for email messages 
> with corruption in my sent directory, nowhere else. I'm still investigating 
> the email messages issue. IMO one is generally safe to run poudriere on the 
> latest ZFS with the additional patch.
> 
> My tests of the additional patch concluded that it resolved my last 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread Cy Schubert
In message <20230415115452.08911...@thor.intern.walstatt.dynvpn.de>, 
FreeBSD Us
er writes:
> Am Thu, 13 Apr 2023 22:18:04 -0700
> Mark Millard  schrieb:
>
> > On Apr 13, 2023, at 21:44, Charlie Li  wrote:
> > 
> > > Mark Millard wrote:  
> > >> FYI: in my original report for a context that has never had
> > >> block_cloning enabled, I reported BOTH missing files and
> > >> file content corruption in the poudriere-devel bulk build
> > >> testing. This predates:
> > >> https://people.freebsd.org/~pjd/patches/brt_revert.patch
> > >> but had the changes from:
> > >> https://github.com/openzfs/zfs/pull/14739/files
> > >> The files were missing from packages installed to be used
> > >> during a port's build. No other types of examples of missing
> > >> files happened. (But only 11 ports failed.)  
> > > I also don't have block_cloning enabled. "Missing files" prior to brt_rev
> ert may actually
> > > be present, but as the corruption also messes with the file(1) signature,
>  some tools like
> > > ldconfig report them as missing.  
> > 
> > For reference, the specific messages that were not explicit
> > null-byte complaints were (some shown with a little context):
> > 
> > 
> > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
> > ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg  
> > [CA72_ZFS] Installing libxml2-2.10.3_1...
> > [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
> > ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found
> > (/usr/local/lib/libxml2.so) . . .
> > [CA72_ZFS] Extracting libxslt-1.1.37: .. done
> > ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found
> > (/usr/local/lib/libxslt.so) ===>   Returning to build of py39-lxml-4.9.2  
> > . . .
> > ===>  Configuring for py39-lxml-4.9.2  
> > Building lxml version 4.9.2.
> > Building with Cython 0.29.33.
> > Error: Please make sure the libxml2 and libxslt development packages are
> > installed.
> > 
> > 
> > [CA72_ZFS] Extracting libunistring-1.1: .. done
> > ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not found
>   
> > 
> > 
> > [CA72_ZFS] Extracting gmp-6.2.1: .. done
> > ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found  
> > 
> > 
> > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> > ===>   Installing existing package /packages/All/gmp-6.2.1.pkg  
> > [CA72_ZFS] Installing gmp-6.2.1...
> > the most recent version of gmp-6.2.1 is already installed
> > ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found  
> > *** Error code 1
> > 
> > 
> > autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
> > 
> > 
> > checking for GNU M4 that supports accurate traces... configure: error:
> > no acceptable m4 could be found in $PATH.
> > GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
> > GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
> > Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
> > 
> > 
> > ld: error: /usr/local/lib/libblkid.a: unknown file type
> > 
> > 
> > ===
> > Mark Millard
> > marklmi at yahoo.com
> > 
> > 
>
> Hello,
>
> What is the current status of fixing/mitigating this disastrous bug,
> especially for those with the new option enabled on their ZFS pools?
> Any advice?
>
> As a precaution (or call it panic) I shut down several servers to prevent
> irreversible damage to databases and data storage. On one host with
> /usr/ports residing on ZFS, we always see errors on the same files created
> while staging (using portmaster, which leaves the system with uninstalled
> software, i.e. www/apache24 in our case). Deleting the work folder doesn't
> seem to change anything, even after starting a scrub of the entire pool
> (a RAIDZ1 pool) - cause unknown, and it is always the same files that are
> corrupted. Same with devel/ruby-gems.
>
> Poudriere has been shut down for the time being to avoid further issues.
>
> Is there any advice on how to proceed, apart from conserving the boxes via
> shutdown?
>
> Thank you ;-)
> oh
>
>
>
> -- 
> O. Hartmann

With an up-to-date tree + pjd@'s "Fix data corruption when cloning embedded 
blocks. #14739" patch I didn't have any issues, except for email messages 
with corruption in my sent directory, nowhere else. I'm still investigating 
the email messages issue. IMO one is generally safe to run poudriere on the 
latest ZFS with the additional patch.

My tests of the additional patch concluded that it resolved my last 
problems, except for the sent email problem I'm still investigating. I'm 
sure there's a simple explanation for it, i.e. the email thread was 
corrupted by the EXDEV regression which cannot be fixed by anything, even 
reverting to the previous ZFS -- the data in those files will remain 
damaged regardless.

I cannot speak to the others who have had poudriere and other issues. I 
never had any problems with poudriere on top of the new ZFS.


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-15 Thread FreeBSD User
Am Thu, 13 Apr 2023 22:18:04 -0700
Mark Millard  schrieb:

> On Apr 13, 2023, at 21:44, Charlie Li  wrote:
> 
> > Mark Millard wrote:  
> >> FYI: in my original report for a context that has never had
> >> block_cloning enabled, I reported BOTH missing files and
> >> file content corruption in the poudriere-devel bulk build
> >> testing. This predates:
> >> https://people.freebsd.org/~pjd/patches/brt_revert.patch
> >> but had the changes from:
> >> https://github.com/openzfs/zfs/pull/14739/files
> >> The files were missing from packages installed to be used
> >> during a port's build. No other types of examples of missing
> >> files happened. (But only 11 ports failed.)  
> > I also don't have block_cloning enabled. "Missing files" prior to 
> > brt_revert may actually
> > be present, but as the corruption also messes with the file(1) signature, 
> > some tools like
> > ldconfig report them as missing.  
> 
> For reference, the specific messages that were not explicit
> null-byte complaints were (some shown with a little context):
> 
> 
> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
> ===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg  
> [CA72_ZFS] Installing libxml2-2.10.3_1...
> [CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
> ===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found
> (/usr/local/lib/libxml2.so) . . .
> [CA72_ZFS] Extracting libxslt-1.1.37: .. done
> ===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found
> (/usr/local/lib/libxslt.so) ===>   Returning to build of py39-lxml-4.9.2  
> . . .
> ===>  Configuring for py39-lxml-4.9.2  
> Building lxml version 4.9.2.
> Building with Cython 0.29.33.
> Error: Please make sure the libxml2 and libxslt development packages are 
> installed.
> 
> 
> [CA72_ZFS] Extracting libunistring-1.1: .. done
> ===>   libidn2-2.3.4 depends on shared library: libunistring.so - not found  
> 
> 
> [CA72_ZFS] Extracting gmp-6.2.1: .. done
> ===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found  
> 
> 
> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
> ===>   Installing existing package /packages/All/gmp-6.2.1.pkg  
> [CA72_ZFS] Installing gmp-6.2.1...
> the most recent version of gmp-6.2.1 is already installed
> ===>   nettle-3.8.1 depends on shared library: libgmp.so - not found  
> *** Error code 1
> 
> 
> autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4
> 
> 
> checking for GNU 
> M4 that supports accurate traces... configure: error: no acceptable m4 could 
> be found in
> $PATH. GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
> GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
> Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.
> 
> 
> ld: error: /usr/local/lib/libblkid.a: unknown file type
> 
> 
> ===
> Mark Millard
> marklmi at yahoo.com
> 
> 

Hello,

What is the current status of fixing/mitigating this disastrous bug,
especially for those with the new option enabled on their ZFS pools?
Any advice?

As a precaution (or call it panic) I shut down several servers to prevent
irreversible damage to databases and data storage. On one host with
/usr/ports residing on ZFS, we always see errors on the same files created
while staging (using portmaster, which leaves the system with uninstalled
software, i.e. www/apache24 in our case). Deleting the work folder doesn't
seem to change anything, even after starting a scrub of the entire pool
(a RAIDZ1 pool) - cause unknown, and it is always the same files that are
corrupted. Same with devel/ruby-gems.

Poudriere has been shut down for the time being to avoid further issues.

Is there any advice on how to proceed, apart from conserving the boxes via
shutdown?

Thank you ;-)
oh



-- 
O. Hartmann



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-14 Thread Shawn Webb
On Thu, Apr 13, 2023 at 06:48:14PM -0400, Charlie Li wrote:
> Shawn Webb wrote:
> > Does the ZFS project have some sort of automated testing to catch
> > data-gobbling, pool killing bugs? It seems like this would have been
> > caught with some CI/CD stress testing automation scripts.
> > 
> I can't speak to how the OpenZFS project does things, but this particular
> corruption has no deterministic characteristics in either its pre- or
> post-conditions, so it would be hard to automate testing for.

My approach would be to have a policy by which any new feature
scheduled to land in the main branch must also not show any
regressions when running `poudriere bulk -ac`. Such a policy could be
enforced via server-side git commit hook. One problem, though, is that
implementing that policy isn't just a matter of code, but also
infrastructure, so there's a tangible monetary cost.
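The gating logic of such a hook could be sketched roughly as below. This is only an illustration, not an existing tool: `bulk_check` is a hypothetical stand-in for driving `poudriere bulk -ac` on real build infrastructure and reporting its exit status (here it is stubbed via `BULK_FAKE_STATUS` so the logic can be exercised).

```shell
#!/bin/sh
# Sketch of a pre-receive gate: reject pushes to main unless the bulk
# regression check passes. bulk_check is a stub for illustration only.

bulk_check() {
    # A real implementation would trigger `poudriere bulk -ac` for the
    # pushed revision on build hardware and return its exit status.
    return "${BULK_FAKE_STATUS:-0}"
}

gate_ref() {
    refname=$1
    newrev=$2
    # Only pushes to main are gated on a clean bulk build.
    if [ "$refname" = "refs/heads/main" ] && ! bulk_check "$newrev"; then
        echo "reject"
        return 1
    fi
    echo "accept"
    return 0
}

# A pre-receive hook receives "<old-rev> <new-rev> <refname>" lines on stdin:
# while read old new ref; do gate_ref "$ref" "$new" >/dev/null || exit 1; done
```

The monetary cost mentioned above comes from the `bulk_check` side: the hook itself is trivial, but it needs dedicated hardware behind it to run full bulk builds in reasonable time.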

I should mention that I appreciate the selfless hard work of those
involved in the FreeBSD and OpenZFS projects. I hope for continued
incremental improvements.

Thanks,

-- 
Shawn Webb
Cofounder / Security Engineer
HardenedBSD

https://git.hardenedbsd.org/hardenedbsd/pubkeys/-/raw/master/Shawn_Webb/03A4CBEBB82EA5A67D9F3853FF2E67A277F8E1FA.pub.asc


signature.asc
Description: PGP signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mark Millard
On Apr 13, 2023, at 21:44, Charlie Li  wrote:

> Mark Millard wrote:
>> FYI: in my original report for a context that has never had
>> block_cloning enabled, I reported BOTH missing files and
>> file content corruption in the poudriere-devel bulk build
>> testing. This predates:
>> https://people.freebsd.org/~pjd/patches/brt_revert.patch
>> but had the changes from:
>> https://github.com/openzfs/zfs/pull/14739/files
>> The files were missing from packages installed to be used
>> during a port's build. No other types of examples of missing
>> files happened. (But only 11 ports failed.)
> I also don't have block_cloning enabled. "Missing files" prior to brt_revert 
> may actually be present, but as the corruption also messes with the file(1) 
> signature, some tools like ldconfig report them as missing.

For reference, the specific messages that were not explicit
null-byte complaints were (some shown with a little context):


===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - not found
===>   Installing existing package /packages/All/libxml2-2.10.3_1.pkg
[CA72_ZFS] Installing libxml2-2.10.3_1...
[CA72_ZFS] Extracting libxml2-2.10.3_1: .. done
===>   py39-lxml-4.9.2 depends on shared library: libxml2.so - found 
(/usr/local/lib/libxml2.so)
. . .
[CA72_ZFS] Extracting libxslt-1.1.37: .. done
===>   py39-lxml-4.9.2 depends on shared library: libxslt.so - found 
(/usr/local/lib/libxslt.so)
===>   Returning to build of py39-lxml-4.9.2
. . .
===>  Configuring for py39-lxml-4.9.2
Building lxml version 4.9.2.
Building with Cython 0.29.33.
Error: Please make sure the libxml2 and libxslt development packages are 
installed.


[CA72_ZFS] Extracting libunistring-1.1: .. done
===>   libidn2-2.3.4 depends on shared library: libunistring.so - not found


[CA72_ZFS] Extracting gmp-6.2.1: .. done
===>   mpfr-4.2.0,1 depends on shared library: libgmp.so - not found


===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
===>   Installing existing package /packages/All/gmp-6.2.1.pkg
[CA72_ZFS] Installing gmp-6.2.1...
the most recent version of gmp-6.2.1 is already installed
===>   nettle-3.8.1 depends on shared library: libgmp.so - not found
*** Error code 1


autom4te: error: need GNU m4 1.4 or later: /usr/local/bin/gm4


checking for GNU 
M4 that supports accurate traces... configure: error: no acceptable m4 could be 
found in $PATH.
GNU M4 1.4.6 or later is required; 1.4.16 or newer is recommended.
GNU M4 1.4.15 uses a buggy replacement strstr on some systems.
Glibc 2.9 - 2.12 and GNU M4 1.4.11 - 1.4.15 have another strstr bug.


ld: error: /usr/local/lib/libblkid.a: unknown file type


===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Mark Millard wrote:

FYI: in my original report for a context that has never had
block_cloning enabled, I reported BOTH missing files and
file content corruption in the poudriere-devel bulk build
testing. This predates:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

but had the changes from:

https://github.com/openzfs/zfs/pull/14739/files

The files were missing from packages installed to be used
during a port's build. No other types of examples of missing
files happened. (But only 11 ports failed.)

I also don't have block_cloning enabled. "Missing files" prior to 
brt_revert may actually be present, but as the corruption also messes 
with the file(1) signature, some tools like ldconfig report them as missing.
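One way to look for that failure mode is to check whether a file's leading bytes are NULs instead of the ELF magic. This is only a sketch (the helper name and the example paths are made up here, not a procedure anyone in the thread used):

```shell
# is_nul_corrupt: report whether a file's first 4 bytes are all NUL.
# A healthy ELF object starts with the magic bytes \177ELF instead.
is_nul_corrupt() {
    magic=$(head -c 4 "$1" | od -An -tx1 | tr -d ' \n')
    [ "$magic" = "00000000" ]
}

# Example sweep over installed shared libraries (paths are illustrative):
# for f in /usr/local/lib/*.so*; do
#     [ -f "$f" ] && is_nul_corrupt "$f" && echo "suspect: $f"
# done
```

A file flagged this way would explain the ldconfig behavior: the library is present on disk, but its mangled signature makes tools treat it as absent.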


--
Charlie Li
…nope, still don't have an exit line.



OpenPGP_signature
Description: OpenPGP digital signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mark Millard
On Apr 13, 2023, at 21:27, Charlie Li  wrote:
> 
> Pawel Jakub Dawidek wrote:
>> On 4/14/23 09:23, Charlie Li wrote:
>>> Pawel Jakub Dawidek wrote:
 Here is the change that reverts most of the modifications and disables 
 cloning new blocks. It does retain ability to free existing cloned blocks 
 and keeps block_cloning feature around, so upgraded pools can be imported 
 and existing cloned blocks freed.
 
 It does not handle replaying ZIL with block-cloning logs, so make sure you 
 import pools that were cleanly exported.
 
 I'd appreciate if someone who can reproduce those corruptions could try it.
 
 https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103
 
>>> Does not apply to sys/contrib/openzfs tip, conflicts in 
>>> module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.
>> This should work:
>> https://people.freebsd.org/~pjd/patches/brt_revert.patch
> This results in missing files rather than corruption.

FYI: in my original report for a context that has never had
block_cloning enabled, I reported BOTH missing files and
file content corruption in the poudriere-devel bulk build
testing. This predates:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

but had the changes from:

https://github.com/openzfs/zfs/pull/14739/files

The files were missing from packages installed to be used
during a port's build. No other types of examples of missing
files happened. (But only 11 ports failed.)

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Pawel Jakub Dawidek wrote:

On 4/14/23 09:23, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
Here is the change that reverts most of the modifications and 
disables cloning new blocks. It does retain ability to free existing 
cloned blocks and keeps block_cloning feature around, so upgraded 
pools can be imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make 
sure you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could 
try it.


https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Does not apply to sys/contrib/openzfs tip, conflicts in 
module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.


This should work:

https://people.freebsd.org/~pjd/patches/brt_revert.patch


This results in missing files rather than corruption.

--
Charlie Li
…nope, still don't have an exit line.



OpenPGP_signature
Description: OpenPGP digital signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 09:23, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
Here is the change that reverts most of the modifications and disables 
cloning new blocks. It does retain ability to free existing cloned 
blocks and keeps block_cloning feature around, so upgraded pools can 
be imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could 
try it.


https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Does not apply to sys/contrib/openzfs tip, conflicts in 
module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.


This should work:

https://people.freebsd.org/~pjd/patches/brt_revert.patch

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mateusz Guzik
On 4/14/23, Charlie Li  wrote:
> Pawel Jakub Dawidek wrote:
>> On 4/14/23 07:52, Charlie Li wrote:
>>> Pawel Jakub Dawidek wrote:
 thank you for your testing and patience so far. I'm working on a
 patch to revert block cloning without affecting people who already
 upgraded their pools.

>>> Testing with mjg@ earlier today revealed that block_cloning was not
>>> the cause of poudriere bulk build (and similar cp(1)/install(1)-based)
>>> corruption, although may have exacerbated it.
>>
>> Can you please elaborate how were you testing and what exactly did you
>> exclude?
>>
> mjg@ prepared
> https://gitlab.com/vishwin/freebsd-src/-/commit/b41f187ba329621cda1e8e67a0786f07b1221a3c
>
> which only removes block_cloning, rebuilding kernel only (buildworld
> fails) for me to test poudriere bulk -c builds with. I used a world from
> https://gitlab.com/vishwin/freebsd-src/-/tree/zfs-revert which consists
> of reverting the merge commit plus a few other conflicts, but keeping
> vop_fplookup_vexec.
>

I'm going to narrow down the non-blockcopy corruption after my testjig
gets off the ground.

Basically I expect to have it sorted out on Friday.


-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Pawel Jakub Dawidek wrote:
Here is the change that reverts most of the modifications and disables 
cloning new blocks. It does retain ability to free existing cloned 
blocks and keeps block_cloning feature around, so upgraded pools can be 
imported and existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate if someone who can reproduce those corruptions could try it.

https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Does not apply to sys/contrib/openzfs tip, conflicts in 
module/os/freebsd/zfs/zfs_vnops_os.c and module/zfs/dmu.c.


--
Charlie Li
…nope, still don't have an exit line.



OpenPGP_signature
Description: OpenPGP digital signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Pawel Jakub Dawidek wrote:

On 4/14/23 07:52, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
thank you for your testing and patience so far. I'm working on a 
patch to revert block cloning without affecting people who already 
upgraded their pools.


Testing with mjg@ earlier today revealed that block_cloning was not 
the cause of poudriere bulk build (and similar cp(1)/install(1)-based) 
corruption, although may have exacerbated it.


Can you please elaborate how were you testing and what exactly did you 
exclude?


mjg@ prepared 
https://gitlab.com/vishwin/freebsd-src/-/commit/b41f187ba329621cda1e8e67a0786f07b1221a3c 
which only removes block_cloning, rebuilding kernel only (buildworld 
fails) for me to test poudriere bulk -c builds with. I used a world from 
https://gitlab.com/vishwin/freebsd-src/-/tree/zfs-revert which consists 
of reverting the merge commit plus a few other conflicts, but keeping 
vop_fplookup_vexec.


--
Charlie Li
…nope, still don't have an exit line.



OpenPGP_signature
Description: OpenPGP digital signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 07:52, Charlie Li wrote:

Pawel Jakub Dawidek wrote:
thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


Testing with mjg@ earlier today revealed that block_cloning was not the 
cause of poudriere bulk build (and similar cp(1)/install(1)-based) 
corruption, although may have exacerbated it.


Can you please elaborate how were you testing and what exactly did you 
exclude?


Thanks.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/14/23 07:40, Pawel Jakub Dawidek wrote:

On 4/13/23 22:56, Cy Schubert wrote:
I'm in the process of building a branch reverting the merge altogether 
and

will test it on my sandbox machine later today.


Cy,

thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


Here is the change that reverts most of the modifications and disables 
cloning of new blocks. It retains the ability to free existing cloned 
blocks and keeps the block_cloning feature around, so upgraded pools can 
be imported and their existing cloned blocks freed.


It does not handle replaying ZIL with block-cloning logs, so make sure 
you import pools that were cleanly exported.


I'd appreciate it if someone who can reproduce those corruptions could try it.

https://github.com/pjd/openzfs/commit/f2cfbcf76a733c44e25cba8c649162ef68047103

Thank you guys for your help!

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Pawel Jakub Dawidek wrote:
thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


Testing with mjg@ earlier today revealed that block_cloning was not the 
cause of poudriere bulk build (and similar cp(1)/install(1)-based) 
corruption, although may have exacerbated it.
I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


Due to non-deterministic conditions, there...kind of isn't one. Best is 
probably just poudriere bulk builds.
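Short of a full bulk build, the copy-and-compare smoke test mentioned elsewhere in the thread (cy@'s `cp -R` test) can be approximated as follows. The exact procedure was not spelled out, so this is only a sketch with made-up names and paths:

```shell
# copy_compare: copy a tree with cp -R, then recursively diff source and
# destination; on an affected system the copy may intermittently contain
# NUL-filled or missing files, which diff -r reports as differences.
copy_compare() {
    src=$1
    dst=$2
    cp -R "$src" "$dst"
    if diff -r "$src" "$dst" >/dev/null; then
        echo "no differing files"
    else
        echo "corruption suspected"
    fi
}

# e.g. copy_compare /usr/src /pool/copytest   # run against the suspect pool
```

Because the corruption is non-deterministic, a single clean pass proves little; repeated runs over a large tree on the suspect pool are more informative.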


--
Charlie Li
…nope, still don't have an exit line.



OpenPGP_signature
Description: OpenPGP digital signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Shawn Webb wrote:

Does the ZFS project have some sort of automated testing to catch
data-gobbling, pool killing bugs? It seems like this would have been
caught with some CI/CD stress testing automation scripts.

I can't speak to how the OpenZFS project does things, but this 
particular corruption has no deterministic characteristics in either its 
pre- or post-conditions, so it would be hard to automate testing for.


--
Charlie Li
…nope, still don't have an exit line.



OpenPGP_signature
Description: OpenPGP digital signature


Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/13/23 22:56, Cy Schubert wrote:

I'm in the process of building a branch reverting the merge altogether and
will test it on my sandbox machine later today.


Cy,

thank you for your testing and patience so far. I'm working on a patch 
to revert block cloning without affecting people who already upgraded 
their pools.


I'd also greatly appreciate if you could provide a procedure for me to 
reproduce the corruption, ideally without the internet access, as I'll 
be on the plane(s) for the next ~24h.


--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Pawel Jakub Dawidek

On 4/13/23 23:05, Shawn Webb wrote:

I've learned over the years downstream that it's not really my place
to tell upstream what to do or how to do it. However, I think given
the seriousness of this, upstream might do well to revert the commit
until a solid fix is in place. Upstream might want to consider the
impacts this is having not just with downstream projects, but also
regular users.

Really bad timing to have a lot of new tax documentation that I really
don't want to lose. I'd really like to have an up-to-date, security
patched OS, but I guess I'll stay behind so that I don't risk losing
critical financial documentation.


Shawn,

I'm working on a patch to safely revert this that would also work for 
people who already upgraded their pools.


I'm sorry for this mess.

--
Pawel Jakub Dawidek




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Shawn Webb
On Thu, Apr 13, 2023 at 06:56:35AM -0700, Cy Schubert wrote:
> In message , Mateusz Guzik writes:
> > On 4/13/23, Cy Schubert  wrote:
> > > On Thu, 13 Apr 2023 19:54:42 +0900
> > > Paweł Jakub Dawidek  wrote:
> > >
> > >> On Apr 13, 2023, at 16:10, Cy Schubert  wrote:
> > >> >
> > >> > In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert
> > >> > writes:
> > >> > In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert
> > >> > writes:
> > >> >> In message , Mark
> > >> >> Millard
> > >> >>> writes:
> > >> >>> [This just puts my prior reply's material into Cy's
> > >>  adjusted resend of the original. The To/Cc should
> > >>  be complete this time.]
> > >> 
> > >>  On Apr 12, 2023, at 22:52, Cy Schubert  wrote:
> > >> 
> > >>  In message , Mark
> > >> > Millard
> > >>  writes:
> > >> > From: Charlie Li  wrote on
> > >> >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> > >> >>
> > >> >> Charlie Li wrote:
> > >> >>> Mateusz Guzik wrote:
> > >>  can you please test poudriere with
> > >> > https://github.com/openzfs/zfs/pull/14739/files
> > >> >
> > >> > After applying, on the md(4)-backed pool regardless of block_cloning,
> > >> >> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> > >> >> report back on poudriere results (no block_cloning).
> > >> 
> > >>  As for poudriere, build failures are still rolling in. These are
> > >> >>> (and
> > >> >> have been) entirely random on every run. Some examples from this run:
> > >> 
> > >> >>> lang/php81:
> > >> >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> > >> >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> > >> >> - consumers fail to build due to corrupted php.conf packaged
> > >> >>>
> > >> >>> devel/ninja:
> > >> >>> - phase: stage
> > >> >>> - install -s -m 555
> > >> >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> > >> >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> > >> >>> - consumers fail to build due to corrupted bin/ninja packaged
> > >> >>>
> > >> >>> devel/netsurf-buildsystem:
> > >> >>> - phase: stage
> > >> >>> - mkdir -p
> > >> >> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> > >> >> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> > >> >> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> > >> >> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> > >> >>> cp makefiles/$M
> > >> >> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; \
> > >> >>> done
> > >> >>> - graphics/libnsgif fails to build due to NUL characters in
> > >> >>> Makefile.{clang,subdir}, causing nothing to link
> > >> >>
> > >> >> Summary: I have problems building ports into packages
> > >> >> via poudriere-devel use despite being fully updated/patched
> > >> >> (as of when I started the experiment), never having enabled
> > >> >> block_cloning ( still using openzfs-2.1-freebsd ).
> > >> >>
> > >> >> In other words, I can confirm other reports that have
> > >> >> been made.
> > >> >>
> > >> >> The details follow.
> > >> >>
> > >> >> [Written as I was working on setting up for the experiments
> > >> >> and then executing those experiments, adjusting as I went
> > >> >> along.]
> > >> >>
> > >> >> I've run my own tests in a context that has never had the
> > >> >> zpool upgrade and that jump from before the openzfs import to
> > >> >> after the existing commits for trying to fix openzfs on
> > >> >> FreeBSD. I report on the sequence of activities getting to
> > >> >> the point of testing as well.
> > >> >>
> > >> >> By personal policy I keep my (non-temporary) pool's compatible
> > >> >> with what the most recent ??.?-RELEASE supports, using
> > >> >> openzfs-2.1-freebsd for now. The 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert
In message , Mateusz Guzik writes:
> On 4/13/23, Cy Schubert  wrote:
> > On Thu, 13 Apr 2023 19:54:42 +0900
> > Paweł Jakub Dawidek  wrote:
> >
> >> On Apr 13, 2023, at 16:10, Cy Schubert  wrote:
> >> >
> >> > In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert
> >> > writes:
> >> > In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert
> >> > writes:
> >> >> In message , Mark
> >> >> Millard
> >> >>> writes:
> >> >>> [This just puts my prior reply's material into Cy's
> >>  adjusted resend of the original. The To/Cc should
> >>  be complete this time.]
> >> 
> >>  On Apr 12, 2023, at 22:52, Cy Schubert  wrote:
> >> 
> >>  In message , Mark
> >> > Millard
> >>  writes:
> >> > From: Charlie Li  wrote on
> >> >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >> >>
> >> >> Charlie Li wrote:
> >> >>> Mateusz Guzik wrote:
> >>  can you please test poudriere with
> >> > https://github.com/openzfs/zfs/pull/14739/files
> >> >
> >> > After applying, on the md(4)-backed pool regardless of block_cloning,
> >> >> the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
> >> >> report back on poudriere results (no block_cloning).
> >> 
> >>  As for poudriere, build failures are still rolling in. These are
> >> >>> (and
> >> >> have been) entirely random on every run. Some examples from this run:
> >> 
> >> >>> lang/php81:
> >> >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
> >> >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> >> >> - consumers fail to build due to corrupted php.conf packaged
> >> >>>
> >> >>> devel/ninja:
> >> >>> - phase: stage
> >> >>> - install -s -m 555
> >> >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
> >> >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> >> >>> - consumers fail to build due to corrupted bin/ninja packaged
> >> >>>
> >> >>> devel/netsurf-buildsystem:
> >> >>> - phase: stage
> >> >>> - mkdir -p
> >> >> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
> >> >> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> >> >> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
> >> >> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> >> >>> cp makefiles/$M
> >> >> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; \
> >> >>> done
> >> >>> - graphics/libnsgif fails to build due to NUL characters in
> >> >>> Makefile.{clang,subdir}, causing nothing to link
> >> >>
> >> >> Summary: I have problems building ports into packages
> >> >> via poudriere-devel use despite being fully updated/patched
> >> >> (as of when I started the experiment), never having enabled
> >> >> block_cloning ( still using openzfs-2.1-freebsd ).
> >> >>
> >> >> In other words, I can confirm other reports that have
> >> >> been made.
> >> >>
> >> >> The details follow.
> >> >>
> >> >> [Written as I was working on setting up for the experiments
> >> >> and then executing those experiments, adjusting as I went
> >> >> along.]
> >> >>
> >> >> I've run my own tests in a context that has never had the
> >> >> zpool upgrade and that jump from before the openzfs import to
> >> >> after the existing commits for trying to fix openzfs on
> >> >> FreeBSD. I report on the sequence of activities getting to
> >> >> the point of testing as well.
> >> >>
> >> >> By personal policy I keep my (non-temporary) pool's compatible
> >> >> with what the most recent ??.?-RELEASE supports, using
> >> >> openzfs-2.1-freebsd for now. The pools involved below have
> >> >> never had a zpool upgrade from where they started. (I've no
> >> >> pools that have ever had a zpool upgrade.)
> >> >>
> >> >> (Temporary pools are rare for me, such as this investigation.
> >> >> But I'm not testing block_cloning or anything new this time.)
> 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mateusz Guzik
On 4/13/23, Cy Schubert  wrote:
> On Thu, 13 Apr 2023 19:54:42 +0900
> Paweł Jakub Dawidek  wrote:
>
>> On Apr 13, 2023, at 16:10, Cy Schubert  wrote:
>> > [...]

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert
On Thu, 13 Apr 2023 19:54:42 +0900
Paweł Jakub Dawidek  wrote:

> On Apr 13, 2023, at 16:10, Cy Schubert  wrote:
> > [...]
Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mark Millard
On Apr 13, 2023, at 04:04, Charlie Li  wrote:

> Paweł Jakub Dawidek wrote:
>> Can you please try this patch:
>> 
>> Unfortunately I don’t see how this can happen with block cloning disabled.
> This patch made no difference in poudriere; corruption still rolled in.
> 

Same "made no difference" here.

https://lists.freebsd.org/archives/dev-commits-src-main/2023-April/014512.html

indicated that I'd applied the patch already: "I then also applied the patch
from: https://github.com/openzfs/zfs/pull/14739/files"

Cy reported the same in:

https://lists.freebsd.org/archives/dev-commits-src-main/2023-April/014519.html

"The EXDEV patch is applied. Block_cloning is disabled."

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Danilo Egea Gondolfo wrote:

I'm having a funny issue here and I'm wondering if it is related.

When building one of my ports I will eventually, though not always, get a 
file full of zeros as a result.


The build will create copies of crispy-setup and, every once in a while, 
one of them will be a blob of zeros:


I'm running the recent ZFS update but I never upgraded my pool:

This is exactly it. A copy operation within the same dataset will 
sometimes turn up corruption, and randomly, so it is not the same file(s) 
that get hit each time.


--
Charlie Li
…nope, still don't have an exit line.





Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Charlie Li

Paweł Jakub Dawidek wrote:

Can you please try this patch:


Unfortunately I don’t see how this can happen with block cloning disabled.


This patch made no difference in poudriere; corruption still rolled in.

--
Charlie Li
…nope, still don't have an exit line.





Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Danilo Egea Gondolfo



On 13/04/2023 06:28, Mark Millard wrote:

From: Charlie Li  wrote on
Date: Wed, 12 Apr 2023 20:11:16 UTC :


Charlie Li wrote:

Mateusz Guzik wrote:

can you please test poudriere with
https://github.com/openzfs/zfs/pull/14739/files


After applying, on the md(4)-backed pool regardless of block_cloning,
the cy@ `cp -R` test reports no differing (ie corrupted) files. Will
report back on poudriere results (no block_cloning).


As for poudriere, build failures are still rolling in. These are (and
have been) entirely random on every run. Some examples from this run:

lang/php81:
- post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development
${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
- consumers fail to build due to corrupted php.conf packaged

devel/ninja:
- phase: stage
- install -s -m 555
/wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja
/wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
- consumers fail to build due to corrupted bin/ninja packaged

devel/netsurf-buildsystem:
- phase: stage
- mkdir -p
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig
Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
cp makefiles/$M
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
\
done
- graphics/libnsgif fails to build due to NUL characters in
Makefile.{clang,subdir}, causing nothing to link
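
[Editorial sketch: the "NUL characters" symptom above can be checked for mechanically. The file names and simulated corruption below are invented for illustration; the check itself relies on the fact that stripping NUL bytes from a clean text file leaves it unchanged.]

```shell
# Flag files containing NUL bytes: if deleting NULs changes the byte
# stream, the file contained NULs. The sample files are fabricated here
# purely to demonstrate the check.
set -eu
tmp=$(mktemp -d)
printf 'all: ; @echo ok\n' > "$tmp/Makefile.good"
head -c 512 /dev/zero > "$tmp/Makefile.bad"   # simulate a zero-filled file
for f in "$tmp"/Makefile.*; do
    # prints a "NUL bytes:" line only for files that contain NUL bytes
    if ! tr -d '\0' < "$f" | cmp -s - "$f"; then
        echo "NUL bytes: $f"
    fi
done
rm -rf "$tmp"
```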

Summary: I have problems building ports into packages
via poudriere-devel use despite being fully updated/patched
(as of when I started the experiment), never having enabled
block_cloning ( still using openzfs-2.1-freebsd ).

In other words, I can confirm other reports that have
been made.

The details follow.


[Written as I was working on setting up for the experiments
and then executing those experiments, adjusting as I went
along.]

I've run my own tests in a context that has never had the
zpool upgrade and that jump from before the openzfs import to
after the existing commits for trying to fix openzfs on
FreeBSD. I report on the sequence of activities getting to
the point of testing as well.

By personal policy I keep my (non-temporary) pools compatible
with what the most recent ??.?-RELEASE supports, using
openzfs-2.1-freebsd for now. The pools involved below have
never had a zpool upgrade from where they started. (I've no
pools that have ever had a zpool upgrade.)

(Temporary pools are rare for me, such as this investigation.
But I'm not testing block_cloning or anything new this time.)

I'll note that I use zfs for bectl, not for redundancy. So
my evidence is more limited in that respect.

The activities were done on a HoneyComb (16 Cortex-A72 cores).
The system has and supports ECC RAM, 64 GiBytes of RAM are
present.

I started by duplicating my normal zfs environment to an
external USB3 NVMe drive and adjusting the host name and such
to produce the below. (Non-debug, although I do not strip
symbols.) :

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
 arm64 aarch64 1400082 1400082

I then did: git fetch, stash push ., merge --ff-only, stash apply . :
my normal procedure. I then also applied the patch from:

https://github.com/openzfs/zfs/pull/14739/files
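
[Editorial sketch: the update sequence described ("git fetch, stash push ., merge --ff-only, stash apply .") corresponds roughly to the following. The checkout directory and the explicit origin/main ref are assumptions; `git stash apply` takes no pathspec, so the last step is spelled without the dot.]

```shell
# Sketch of the described tree-update procedure: fetch upstream, set
# local changes aside, fast-forward, then re-apply the local changes.
# /usr/main-src is an assumed checkout location.
cd /usr/main-src
git fetch origin                  # bring in new upstream commits
git stash push .                  # stash local modifications in the tree
git merge --ff-only origin/main   # fast-forward only; refuse merge commits
git stash apply                   # re-apply the stashed local changes
```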

Then I did: buildworld buildkernel, install them, and rebooted.

The result was:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 
main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
 arm64 aarch64 1400086 1400086

The later poudriere-devel based build of packages from ports is
based on:

# ~/fbsd-based-on-what-commit.sh -C /usr/ports
4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: 
Bump to 12.2.0.
Author: John Baldwin 
Commit: John Baldwin 
CommitDate: 2023-03-25 00:06:40 +
branch: main
merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
merge-base: CommitDate: 2023-03-25 00:06:40 +
n613214 (--first-parent --count for merge-base)

poudriere attempted to build 476 packages, starting
with pkg (in order to build the 56 that I explicitly
indicate that I want). It is my normal set of ports.
The form of building is biased to allowing a high
load average compared to the number of hardware
threads (same as cores here): each builder is allowed
to use the full count of hardware threads. The build
used USE_TMPFS="data" instead of the USE_TMPFS=all I
normally use on the build machine involved.
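
[Editorial sketch: the tmpfs and parallelism settings described above map onto poudriere.conf roughly as follows. The option names are real poudriere.conf knobs; the values shown are assumptions matching the description (16 builders, each allowed the full hardware-thread count).]

```shell
# /usr/local/etc/poudriere.conf (excerpt, illustrative values)
USE_TMPFS="data"        # the build machine normally uses USE_TMPFS=all
PARALLEL_JOBS=16        # number of concurrent builders
ALLOW_MAKE_JOBS=yes     # let each builder run parallel make jobs
MAKE_JOBS_NUMBER=16     # full hardware-thread count per builder
```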

And it produced some random errors during the attempted
builds. A 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mark Millard
On Apr 12, 2023, at 23:52, Alexander Leidinger  wrote:

> Quoting Mark Millard  (from Wed, 12 Apr 2023 22:28:13 
> -0700):
> 
>> A fair number of errors are of the form: the build
>> installing a previously built package for use in the
>> builder but later the builder can not find some file
>> from the package's installation.
> 
> As a data point, last year I had such issues with one particular package. It
> was consistent no matter how often I was updating the ports tree. Poudriere
> always failed on port X, which depended on port Y (don't remember the
> names). The problem was that port Y built successfully, but an extract of
> it was missing a file it was supposed to have. IIRC I fixed the issue by
> building port Y manually, as re-building port Y with poudriere didn't
> change the outcome.
> 
> So it seems this may not be specific to the most recent ZFS version, but 
> could be an older issue. It may be the case that the more recent ZFS version 
> amplifies the problem. It can also be that it is related to a specific use 
> case in poudriere.

In my procedure I'm building the same versions of the same ports
that I'd built in the pre-ZFS-import context, just in my
jail for experiments instead of in the jail for normal use.
(So I still have the original package files available.) I
am working on distinct media that started as a copy of my
good context.

In other words, I was reporting differences with the known-status
as shown by prior builds of the same /usr/ports/ tree. The
difference is just my progressing FreeBSD's version.

I'm even using the exact same machine to do the builds, but with
distinct media. (My good environment's FreeBSD still predates
the zfs import.)

> I remember a recent mail which talks about poudriere failing to copy files in 
> resource-limited environments, see 
> https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025153.html
> While the issue you are trying to pin-point may not be related to this 
> discussion, I mention it because it smells to me like we could be in a
> situation where a similar combination of mutually unrelated FreeBSD
> features triggers the issue at hand.

My procedure eliminates this alternative.

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert
In message <20230413070426.8a54f...@slippy.cwsent.com>, Cy Schubert writes:
> In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert writes:
> > [...]

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert
In message <20230413064252.1e5c1...@slippy.cwsent.com>, Cy Schubert writes:
> In message , Mark Millard writes:
> > [...]

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Alexander Leidinger
Quoting Mark Millard  (from Wed, 12 Apr 2023  
22:28:13 -0700):



A fair number of errors are of the form: the build
installing a previously built package for use in the
builder but later the builder can not find some file
from the package's installation.


As a data point, last year I had such issues with one particular
package. It was consistent no matter how often I was updating the
ports tree. Poudriere always failed on port X, which depended on
port Y (don't remember the names). The problem was that port Y
built successfully, but an extract of it was missing a file it was
supposed to have. IIRC I fixed the issue by building port Y
manually, as re-building port Y with poudriere didn't change the
outcome.


So it seems this may not be specific to the most recent ZFS version,  
but could be an older issue. It may be the case that the more recent  
ZFS version amplifies the problem. It can also be that it is related  
to a specific use case in poudriere.


I remember a recent mail which talks about poudriere failing to copy  
files in resource-limited environments, see  
https://lists.freebsd.org/archives/dev-commits-src-all/2023-April/025153.html
While the issue you are trying to pin-point may not be related to this
discussion, I mention it because it smells to me like we could be in a
situation where a similar combination of mutually unrelated FreeBSD
features triggers the issue at hand.


Bye,
Alexander.

--
http://www.Leidinger.net alexan...@leidinger.net: PGP 0x8F31830F9F2772BF
http://www.FreeBSD.org netch...@freebsd.org : PGP 0x8F31830F9F2772BF




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Cy Schubert
In message , Mark Millard 
write
s:
> [This just puts my prior reply's material into Cy's
> adjusted resend of the original. The To/Cc should
> be coomplete this time.]
>
> On Apr 12, 2023, at 22:52, Cy Schubert  wrote:
>
> > In message , Mark Millard 
> > writes:
> >> From: Charlie Li  wrote on
> >> Date: Wed, 12 Apr 2023 20:11:16 UTC :
> >> 
> >>> Charlie Li wrote:
>  Mateusz Guzik wrote:
> > can you please test poudriere with
> > https://github.com/openzfs/zfs/pull/14739/files
> > 
>  After applying, on the md(4)-backed pool regardless of block_cloning, 
>  the cy@ `cp -R` test reports no differing (ie corrupted) files. Will 
>  report back on poudriere results (no block_cloning).
>  
> >>> As for poudriere, build failures are still rolling in. These are (and 
> >>> have been) entirely random on every run. Some examples from this run:
> >>> 
> >>> lang/php81:
> >>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development 
> >>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> >>> - consumers fail to build due to corrupted php.conf packaged
> >>> 
> >>> devel/ninja:
> >>> - phase: stage
> >>> - install -s -m 555 
> >>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja 
> >>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> >>> - consumers fail to build due to corrupted bin/ninja packaged
> >>> 
> >>> devel/netsurf-buildsystem:
> >>> - phase: stage
> >>> - mkdir -p 
> >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles 
> >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> >>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig 
> >>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> >>> cp makefiles/$M 
> >>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; 
> >>> \
> >>> done
> >>> - graphics/libnsgif fails to build due to NUL characters in 
> >>> Makefile.{clang,subdir}, causing nothing to link
> >> 
> >> Summary: I have problems building ports into packages
> >> via poudriere-devel use despite being fully updated/patched
> >> (as of when I started the experiment), never having enabled
> >> block_cloning ( still using openzfs-2.1-freebsd ).
> >> 
> >> In other words, I can confirm other reports that have
> >> been made.
> >> 
> >> The details follow.
> >> 
> >> 
> >> [Written as I was working on setting up for the experiments
> >> and then executing those experiments, adjusting as I went
> >> along.]
> >> 
> >> I've run my own tests in a context that has never had the
> >> zpool upgrade and that jump from before the openzfs import to
> >> after the existing commits for trying to fix openzfs on
> >> FreeBSD. I report on the sequence of activities getting to
> >> the point of testing as well.
> >> 
> >> By personal policy I keep my (non-temporary) pool's compatible
> >> with what the most recent ??.?-RELEASE supports, using
> >> openzfs-2.1-freebsd for now. The pools involved below have
> >> never had a zpool upgrade from where they started. (I've no
> >> pools that have ever had a zpool upgrade.)
> >> 
> >> (Temporary pools are rare for me, such as this investigation.
> >> But I'm not testing block_cloning or anything new this time.)
> >> 
> >> I'll note that I use zfs for bectl, not for redundancy. So
> >> my evidence is more limited in that respect.
> >> 
> >> The activities were done on a HoneyComb (16 Cortex-A72 cores).
> >> The system has and supports ECC RAM, 64 GiBytes of RAM are
> >> present.
> >> 
> >> I started by duplicating my normal zfs environment to an
> >> external USB3 NVMe drive and adjusting the host name and such
> >> to produce the below. (Non-debug, although I do not strip
> >> symbols.) :
> >> 
> >> # uname -apKU
> >> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
> >> main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
> >> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
> >> 
> >> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
> >> my normal procedure. I then also applied the patch from:
> >> 
> >> https://github.com/openzfs/zfs/pull/14739/files
> >> 
> >> Then I did: buildworld buildkernel, install them, and rebooted.
> >> 
> >> The result was:
> >> 
> >> # uname -apKU
> >> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 
> >> main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 
> >> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm6

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-13 Thread Mark Millard
[This just puts my prior reply's material into Cy's
adjusted resend of the original. The To/Cc should
be complete this time.]

On Apr 12, 2023, at 22:52, Cy Schubert  wrote:

> In message , Mark Millard writes:
>> From: Charlie Li  wrote on
>> Date: Wed, 12 Apr 2023 20:11:16 UTC :
>> 
>>> Charlie Li wrote:
 Mateusz Guzik wrote:
> can you please test poudriere with
> https://github.com/openzfs/zfs/pull/14739/files
> 
 After applying, on the md(4)-backed pool regardless of block_cloning, 
 the cy@ `cp -R` test reports no differing (ie corrupted) files. Will 
 report back on poudriere results (no block_cloning).
 
>>> As for poudriere, build failures are still rolling in. These are (and 
>>> have been) entirely random on every run. Some examples from this run:
>>> 
>>> lang/php81:
>>> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development 
>>> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
>>> - consumers fail to build due to corrupted php.conf packaged
>>> 
>>> devel/ninja:
>>> - phase: stage
>>> - install -s -m 555 
>>> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja 
>>> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
>>> - consumers fail to build due to corrupted bin/ninja packaged
>>> 
>>> devel/netsurf-buildsystem:
>>> - phase: stage
>>> - mkdir -p 
>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles 
>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
>>> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig 
>>> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
>>> cp makefiles/$M 
>>> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; 
>>> \
>>> done
>>> - graphics/libnsgif fails to build due to NUL characters in 
>>> Makefile.{clang,subdir}, causing nothing to link
>> 
>> Summary: I have problems building ports into packages
>> via poudriere-devel use despite being fully updated/patched
>> (as of when I started the experiment), never having enabled
>> block_cloning ( still using openzfs-2.1-freebsd ).
>> 
>> In other words, I can confirm other reports that have
>> been made.
>> 
>> The details follow.
>> 
>> 
>> [Written as I was working on setting up for the experiments
>> and then executing those experiments, adjusting as I went
>> along.]
>> 
>> I've run my own tests in a context that has never had the
>> zpool upgrade and that jump from before the openzfs import to
>> after the existing commits for trying to fix openzfs on
>> FreeBSD. I report on the sequence of activities getting to
>> the point of testing as well.
>> 
>> By personal policy I keep my (non-temporary) pool's compatible
>> with what the most recent ??.?-RELEASE supports, using
>> openzfs-2.1-freebsd for now. The pools involved below have
>> never had a zpool upgrade from where they started. (I've no
>> pools that have ever had a zpool upgrade.)
>> 
>> (Temporary pools are rare for me, such as this investigation.
>> But I'm not testing block_cloning or anything new this time.)
>> 
>> I'll note that I use zfs for bectl, not for redundancy. So
>> my evidence is more limited in that respect.
>> 
>> The activities were done on a HoneyComb (16 Cortex-A72 cores).
>> The system has and supports ECC RAM, 64 GiBytes of RAM are
>> present.
>> 
>> I started by duplicating my normal zfs environment to an
>> external USB3 NVMe drive and adjusting the host name and such
>> to produce the below. (Non-debug, although I do not strip
>> symbols.) :
>> 
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
>> main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
>> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
>> 
>> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
>> my normal procedure. I then also applied the patch from:
>> 
>> https://github.com/openzfs/zfs/pull/14739/files
>> 
>> Then I did: buildworld buildkernel, install them, and rebooted.
>> 
>> The result was:
>> 
>> # uname -apKU
>> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 
>> main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 
>> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
>> 
>> The later poudriere-devel based build of packages from ports is
>> based on:
>> 
>> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
>> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
>> Author: John Baldwin 
>> Commit: John Baldwin 
>> CommitDate: 2023-03-25 00:06:40 +
>> branch: main
>> 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-12 Thread Cy Schubert
In message , Mark Millard writes:
> From: Charlie Li  wrote on
> Date: Wed, 12 Apr 2023 20:11:16 UTC :
>
> > Charlie Li wrote:
> > > Mateusz Guzik wrote:
> > >> can you please test poudriere with
> > >> https://github.com/openzfs/zfs/pull/14739/files
> > >>
> > > After applying, on the md(4)-backed pool regardless of block_cloning, 
> > > the cy@ `cp -R` test reports no differing (ie corrupted) files. Will 
> > > report back on poudriere results (no block_cloning).
> > > 
> > As for poudriere, build failures are still rolling in. These are (and 
> > have been) entirely random on every run. Some examples from this run:
> > 
> > lang/php81:
> > - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development 
> > ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> > - consumers fail to build due to corrupted php.conf packaged
> > 
> > devel/ninja:
> > - phase: stage
> > - install -s -m 555 
> > /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja 
> > /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> > - consumers fail to build due to corrupted bin/ninja packaged
> > 
> > devel/netsurf-buildsystem:
> > - phase: stage
> > - mkdir -p 
> > /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles 
> > /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> > for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig 
> > Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> > cp makefiles/$M 
> > /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/; 
> > \
> > done
> > - graphics/libnsgif fails to build due to NUL characters in 
> > Makefile.{clang,subdir}, causing nothing to link
>
> Summary: I have problems building ports into packages
> via poudriere-devel use despite being fully updated/patched
> (as of when I started the experiment), never having enabled
> block_cloning ( still using openzfs-2.1-freebsd ).
>
> In other words, I can confirm other reports that have
> been made.
>
> The details follow.
>
>
> [Written as I was working on setting up for the experiments
> and then executing those experiments, adjusting as I went
> along.]
>
> I've run my own tests in a context that has never had the
> zpool upgrade and that jump from before the openzfs import to
> after the existing commits for trying to fix openzfs on
> FreeBSD. I report on the sequence of activities getting to
> the point of testing as well.
>
> By personal policy I keep my (non-temporary) pool's compatible
> with what the most recent ??.?-RELEASE supports, using
> openzfs-2.1-freebsd for now. The pools involved below have
> never had a zpool upgrade from where they started. (I've no
> pools that have ever had a zpool upgrade.)
>
> (Temporary pools are rare for me, such as this investigation.
> But I'm not testing block_cloning or anything new this time.)
>
> I'll note that I use zfs for bectl, not for redundancy. So
> my evidence is more limited in that respect.
>
> The activities were done on a HoneyComb (16 Cortex-A72 cores).
> The system has and supports ECC RAM, 64 GiBytes of RAM are
> present.
>
> I started by duplicating my normal zfs environment to an
> external USB3 NVMe drive and adjusting the host name and such
> to produce the below. (Non-debug, although I do not strip
> symbols.) :
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
> main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
> root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400082 1400082
>
> I then did: git fetch, stash push ., merge --ff-only, stash apply . :
> my normal procedure. I then also applied the patch from:
>
> https://github.com/openzfs/zfs/pull/14739/files
>
> Then I did: buildworld buildkernel, install them, and rebooted.
>
> The result was:
>
> # uname -apKU
> FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 
> main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 
> root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72 arm64 aarch64 1400086 1400086
>
> The later poudriere-devel based build of packages from ports is
> based on:
>
> # ~/fbsd-based-on-what-commit.sh -C /usr/ports
> 4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: Bump to 12.2.0.
> Author: John Baldwin 
> Commit: John Baldwin 
> CommitDate: 2023-03-25 00:06:40 +
> branch: main
> merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
> merge-base: CommitDate: 2023-03-25 00:06:40 +
> n613214 (--first-parent --count for merge-base)
>
> poudriere attempted to build 476 packages, starting
> with pkg (in order to build the 56 that I explicitly
> indicate that I want). It is 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75

2023-04-12 Thread Mark Millard
From: Charlie Li  wrote on
Date: Wed, 12 Apr 2023 20:11:16 UTC :

> Charlie Li wrote:
> > Mateusz Guzik wrote:
> >> can you please test poudriere with
> >> https://github.com/openzfs/zfs/pull/14739/files
> >>
> > After applying, on the md(4)-backed pool regardless of block_cloning, 
> > the cy@ `cp -R` test reports no differing (ie corrupted) files. Will 
> > report back on poudriere results (no block_cloning).
> > 
> As for poudriere, build failures are still rolling in. These are (and 
> have been) entirely random on every run. Some examples from this run:
> 
> lang/php81:
> - post-install: @${INSTALL_DATA} ${WRKSRC}/php.ini-development 
> ${WRKSRC}/php.ini-production ${WRKDIR}/php.conf ${STAGEDIR}/${PREFIX}/etc
> - consumers fail to build due to corrupted php.conf packaged
> 
> devel/ninja:
> - phase: stage
> - install -s -m 555 
> /wrkdirs/usr/ports/devel/ninja/work/ninja-1.11.1/ninja 
> /wrkdirs/usr/ports/devel/ninja/work/stage/usr/local/bin
> - consumers fail to build due to corrupted bin/ninja packaged
> 
> devel/netsurf-buildsystem:
> - phase: stage
> - mkdir -p 
> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles
>  
> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/testtools
> for M in Makefile.top Makefile.tools Makefile.subdir Makefile.pkgconfig 
> Makefile.clang Makefile.gcc Makefile.norcroft Makefile.open64; do \
> cp makefiles/$M 
> /wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage/usr/local/share/netsurf-buildsystem/makefiles/;
>  
> \
> done
> - graphics/libnsgif fails to build due to NUL characters in 
> Makefile.{clang,subdir}, causing nothing to link
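
[A mechanical check for the NUL-character corruption quoted above can
be sketched in plain sh; the helper name and the directory argument
are illustrative, not part of anyone's quoted workflow:

```shell
# Print every file under a directory that contains at least one NUL
# byte; staged text files like php.conf or Makefile.clang should
# never match.
find_nul_files() {
    # $1: directory to scan
    find "$1" -type f | while IFS= read -r f; do
        # Deleting NULs shrinks the byte count iff the file had any.
        if [ "$(LC_ALL=C tr -d '\000' < "$f" | wc -c)" -ne "$(wc -c < "$f")" ]; then
            printf '%s\n' "$f"
        fi
    done
}
```

Pointed at a stage directory such as
/wrkdirs/usr/ports/devel/netsurf-buildsystem/work/stage, it would list
the corrupted Makefiles directly.]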

Summary: I have problems building ports into packages
via poudriere-devel use despite being fully updated/patched
(as of when I started the experiment), never having enabled
block_cloning ( still using openzfs-2.1-freebsd ).

In other words, I can confirm other reports that have
been made.

The details follow.


[Written as I was working on setting up for the experiments
and then executing those experiments, adjusting as I went
along.]

I've run my own tests in a context that has never had the
zpool upgrade and that jump from before the openzfs import to
after the existing commits for trying to fix openzfs on
FreeBSD. I report on the sequence of activities getting to
the point of testing as well.

By personal policy I keep my (non-temporary) pool's compatible
with what the most recent ??.?-RELEASE supports, using
openzfs-2.1-freebsd for now. The pools involved below have
never had a zpool upgrade from where they started. (I've no
pools that have ever had a zpool upgrade.)

(Temporary pools are rare for me, such as this investigation.
But I'm not testing block_cloning or anything new this time.)

I'll note that I use zfs for bectl, not for redundancy. So
my evidence is more limited in that respect.

The activities were done on a HoneyComb (16 Cortex-A72 cores).
The system has and supports ECC RAM, 64 GiBytes of RAM are
present.

I started by duplicating my normal zfs environment to an
external USB3 NVMe drive and adjusting the host name and such
to produce the below. (Non-debug, although I do not strip
symbols.) :

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #90 
main-n261544-cee09bda03c8-dirty: Wed Mar 15 20:25:49 PDT 2023 
root@CA72_16Gp_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
 arm64 aarch64 1400082 1400082

I then did: git fetch, stash push ., merge --ff-only, stash apply . :
my normal procedure. I then also applied the patch from:

https://github.com/openzfs/zfs/pull/14739/files

Then I did: buildworld buildkernel, install them, and rebooted.

The result was:

# uname -apKU
FreeBSD CA72_4c8G_ZFS 14.0-CURRENT FreeBSD 14.0-CURRENT #91 
main-n262122-2ef2c26f3f13-dirty: Wed Apr 12 19:23:35 PDT 2023 
root@CA72_4c8G_ZFS:/usr/obj/BUILDs/main-CA72-nodbg-clang/usr/main-src/arm64.aarch64/sys/GENERIC-NODBG-CA72
 arm64 aarch64 1400086 1400086

The later poudriere-devel based build of packages from ports is
based on:

# ~/fbsd-based-on-what-commit.sh -C /usr/ports
4e94ac9eb97f (HEAD -> main, freebsd/main, freebsd/HEAD) devel/freebsd-gcc12: 
Bump to 12.2.0.
Author: John Baldwin 
Commit: John Baldwin 
CommitDate: 2023-03-25 00:06:40 +
branch: main
merge-base: 4e94ac9eb97fab16510b74ebcaa9316613182a72
merge-base: CommitDate: 2023-03-25 00:06:40 +
n613214 (--first-parent --count for merge-base)

poudriere attempted to build 476 packages, starting
with pkg (in order to build the 56 that I explicitly
indicate that I want). It is my normal set of ports.
The form of building is biased to allowing a high
load average compared to the number of hardware
threads (same as cores here): each builder is allowed
to use the full count of hardware threads. The build
used USE_TMPFS="data" instead of the USE_TMPFS=all I
normally use on the build machine involved.
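
[For context on the setting mentioned above: USE_TMPFS is a
poudriere.conf knob. A sketch of the relevant fragment follows; the
values and comments are illustrative, not Mark's actual file:

```shell
# /usr/local/etc/poudriere.conf (fragment, illustrative)
# USE_TMPFS accepts "all", "yes"/"no", or a space-separated subset
# such as "wrkdir data localbase". With just "data", poudriere's own
# status files sit on tmpfs while port work directories land on the
# ZFS-backed disk, which keeps the filesystem itself exercised
# during staging.
USE_TMPFS="data"
```
]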

And 

Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-08 Thread Kyle Evans
On Sat, Apr 8, 2023 at 5:24 AM Mateusz Guzik  wrote:
>
> On 4/8/23, Kyle Evans  wrote:
> > On Fri, Apr 7, 2023 at 4:54 PM Mateusz Guzik  wrote:
> >>
> >> On 4/7/23, Mark Millard  wrote:
> >> > On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
> >> >
> >> >> On 4/7/23, Mateusz Guzik  wrote:
> >> >>> can you try with this:
> >> >>>
> >> >>> diff --git
> >> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >> >>> index 16276b08c759..e1bca9ef140a 100644
> >> >>> ---
> >> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >> >>> +++
> >> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >> >>> @@ -71,7 +71,7 @@
> >> >>> #define ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
> >> >>> #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
> >> >>>
> >> >>> -#define kfpu_allowed() 1
> >> >>> +#define kfpu_allowed() 0
> >> >>> #define kfpu_begin() kernel_neon_begin()
> >> >>> #define kfpu_end() kernel_neon_end()
> >> >>> #define kfpu_init() (0)
> >> >>>
> >> >>>
> >> >>
> >> >> ops, wrong file
> >> >>
> >> >> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> >> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> >> index 178fbc3b3c6e..c462220289d6 100644
> >> >> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> >> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> >> @@ -46,7 +46,7 @@
> >> >> #include 
> >> >> #include 
> >> >>
> >> >> -#define kfpu_allowed() 1
> >> >> +#define kfpu_allowed() 0
> >> >> #define kfpu_initialize(tsk) do {} while (0)
> >> >> #define kfpu_begin() do {} while (0)
> >> >> #define kfpu_end() do {} while (0)
> >> >
> >> > It will take me a bit to setup a separate build/install
> >> > context for the source code vintage involved. Then more
> >> > time to do the build, install, and test. (I'm keeping
> >> > my normal environments completely before the mess.)
> >> >
> >> > FYI:
> >> >
> >> > I have used the artifact build just after your pair of zfs
> >> > related updates to confirm the VFP problem is still in
> >> > place as of that point:
> >> >
> >> > https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
> >> >
> >> > (No artifact build was exactly at either of your commits.)
> >> >
> >> > ===
> >> > Mark Millard
> >> > marklmi at yahoo.com
> >> >
> >> >
> >>
> >> I have arm64 + zfs at $job and just verified the above lets it boot
> >> again, so I committed already.
> >>
> >
> > This was a known issue that we were working on fixing properly over in
> > https://reviews.freebsd.org/D39448... this really could have waited
> > just a little bit longer. This problem was already brought up in
> > response to the commit in question days ago.
> >
>
> Mate, that's one confusing email.
>

Sorry, this was misdirected anger around this series of crappery.

> I had seen the upstream review, apparently there is opposition to the
> patch, it is clearly not going to land within hours.
>

The opposition is notably from a person who does not actually work on
this platform, and IMO it has no bearing on our downstream review. We'll
get past him eventually, because this is what needs to happen.

> Whatever the Real Fix(tm) might be, I'm confident my change has no
> impact on work on it, past the need to flip kfpu_allowed back to 1.
>
> At the same time things were broken to the point where aarch64 + zfs
> literally did not boot. Once more, I fail to see how restoring basic
operation by flipping a macro to 0 throws any wrenches into the effort
> to get simd working.
>

Thanks!

> If anything the question is how come a clearly *not* implemented simd
> support got kfpu_allowed set to 1.
>

Your guess is as good as mine -- it clearly could not have been tested
at all; I have no clue why they didn't err on the side of caution and
avoid FPU usage.

Thanks,

Kyle Evans



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-08 Thread Mateusz Guzik
On 4/8/23, Kyle Evans  wrote:
> On Fri, Apr 7, 2023 at 4:54 PM Mateusz Guzik  wrote:
>>
>> On 4/7/23, Mark Millard  wrote:
>> > On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
>> >
>> >> On 4/7/23, Mateusz Guzik  wrote:
>> >>> can you try with this:
>> >>>
>> >>> diff --git
>> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> index 16276b08c759..e1bca9ef140a 100644
>> >>> ---
>> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> +++
>> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> >>> @@ -71,7 +71,7 @@
>> >>> #define ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
>> >>> #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
>> >>>
>> >>> -#define kfpu_allowed() 1
>> >>> +#define kfpu_allowed() 0
>> >>> #define kfpu_begin() kernel_neon_begin()
>> >>> #define kfpu_end() kernel_neon_end()
>> >>> #define kfpu_init() (0)
>> >>>
>> >>>
>> >>
>> >> ops, wrong file
>> >>
>> >> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> index 178fbc3b3c6e..c462220289d6 100644
>> >> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> >> @@ -46,7 +46,7 @@
>> >> #include 
>> >> #include 
>> >>
>> >> -#define kfpu_allowed() 1
>> >> +#define kfpu_allowed() 0
>> >> #define kfpu_initialize(tsk) do {} while (0)
>> >> #define kfpu_begin() do {} while (0)
>> >> #define kfpu_end() do {} while (0)
>> >
>> > It will take me a bit to setup a separate build/install
>> > context for the source code vintage involved. Then more
>> > time to do the build, install, and test. (I'm keeping
>> > my normal environments completely before the mess.)
>> >
>> > FYI:
>> >
>> > I have used the artifact build just after your pair of zfs
>> > related updates to confirm the VFP problem is still in
>> > place as of that point:
>> >
>> > https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
>> >
>> > (No artifact build was exactly at either of your commits.)
>> >
>> > ===
>> > Mark Millard
>> > marklmi at yahoo.com
>> >
>> >
>>
>> I have arm64 + zfs at $job and just verified the above lets it boot
>> again, so I committed already.
>>
>
> This was a known issue that we were working on fixing properly over in
> https://reviews.freebsd.org/D39448... this really could have waited
> just a little bit longer. This problem was already brought up in
> response to the commit in question days ago.
>

Mate, that's one confusing email.

I had seen the upstream review, apparently there is opposition to the
patch, it is clearly not going to land within hours.

Whatever the Real Fix(tm) might be, I'm confident my change has no
impact on work on it, past the need to flip kfpu_allowed back to 1.

At the same time things were broken to the point where aarch64 + zfs
literally did not boot. Once more, I fail to see how restoring basic
operation by flipping a macro to 0 throws any wrenches into the effort
to get simd working.

If anything the question is how come a clearly *not* implemented simd
support got kfpu_allowed set to 1.
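
[Mechanically, the committed workaround is just the quoted one-line
macro flip. An equivalent scripted edit, as a sketch; the file path is
the one from the quoted diff, and the whitespace handling is an
assumption about the tree:

```shell
# Flip kfpu_allowed() from 1 to 0 in a simd header (sketch).
flip_kfpu() {
    # $1: e.g. sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
    # Keeps a .bak copy; tolerates any whitespace between the macro and 1.
    sed -i.bak 's/kfpu_allowed()\([[:space:]]*\)1/kfpu_allowed()\10/' "$1"
}
```
]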

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mark Millard
On Apr 7, 2023, at 16:29, Kyle Evans  wrote:

> On Fri, Apr 7, 2023 at 4:54 PM Mateusz Guzik  wrote:
>> 
>> On 4/7/23, Mark Millard  wrote:
>>> On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
>>> 
 On 4/7/23, Mateusz Guzik  wrote:
> can you try with this:
> 
> diff --git
> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> index 16276b08c759..e1bca9ef140a 100644
> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> @@ -71,7 +71,7 @@
> #define ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
> #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
> 
> -#define kfpu_allowed() 1
> +#define kfpu_allowed() 0
> #define kfpu_begin() kernel_neon_begin()
> #define kfpu_end() kernel_neon_end()
> #define kfpu_init() (0)
> 
> 
 
 ops, wrong file
 
 diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
 b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
 index 178fbc3b3c6e..c462220289d6 100644
 --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
 +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
 @@ -46,7 +46,7 @@
 #include 
 #include 
 
 -#define kfpu_allowed() 1
 +#define kfpu_allowed() 0
 #define kfpu_initialize(tsk) do {} while (0)
 #define kfpu_begin() do {} while (0)
 #define kfpu_end() do {} while (0)
>>> 
>>> It will take me a bit to setup a separate build/install
>>> context for the source code vintage involved. Then more
>>> time to do the build, install, and test. (I'm keeping
>>> my normal environments completely before the mess.)
>>> 
>>> FYI:
>>> 
>>> I have used the artifact build just after your pair of zfs
>>> related updates to confirm the VFP problem is still in
>>> place as of that point:
>>> 
>>> https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
>>> 
>>> (No artifact build was exactly at either of your commits.)
>>> 
>>> ===
>>> Mark Millard
>>> marklmi at yahoo.com
>>> 
>>> 
>> 
>> I have arm64 + zfs at $job and just verified the above lets it boot
>> again, so I committed already.
>> 
> 
> This was a known issue that we were working on fixing properly over in
> https://reviews.freebsd.org/D39448... this really could have waited
> just a little bit longer. This problem was already brought up in
> response to the commit in question days ago.

FYI:

I substituted the aarch64 kernel from:

https://artifact.ci.freebsd.org/snapshot/main/d6e24901349dc34a2f8040d67730eb2d510073ab/arm64/aarch64/kernel.txz

into the 2023-Apr-06 aarch64 snapshot based media that I'd been
testing with, rebooted, and tried the test. The result was good:

# zpool import
ZFS filesystem version: 5
ZFS storage pool version: features support (5000)

The use of appropriate:

https://artifact.ci.freebsd.org/snapshot/main/d6e24901349dc34a2f8040d67730eb2d510073ab/*/*/kernel*.txz

may be a way to get to a more normal status for then making
progress in a more normal manner, not just for aarch64 and
armv7, since the earlier zfs-update fixup drafts are also in
place at that point. Of course, one needs a way to substitute
the kernel materials into whatever type of boot media (UFS or
ZFS) is involved.
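
[For media whose root filesystem is already mounted, the substitution
can be sketched as follows; the mount point and the archive's
boot/kernel layout are assumptions, and the old kernel is kept around:

```shell
# Drop an artifact-build kernel into existing boot media (sketch).
replace_kernel() {
    # $1: downloaded kernel.txz, $2: mounted media root (e.g. /mnt)
    mv "$2/boot/kernel" "$2/boot/kernel.old" &&  # preserve current kernel
    tar -x -f "$1" -C "$2"                       # archive holds boot/kernel/...
}
```

For a ZFS-backed medium one would import the pool with an altroot
first rather than mounting a partition directly.]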

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Kyle Evans
On Fri, Apr 7, 2023 at 4:54 PM Mateusz Guzik  wrote:
>
> On 4/7/23, Mark Millard  wrote:
> > On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
> >
> >> On 4/7/23, Mateusz Guzik  wrote:
> >>> can you try with this:
> >>>
> >>> diff --git
> >>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >>> index 16276b08c759..e1bca9ef140a 100644
> >>> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >>> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> >>> @@ -71,7 +71,7 @@
> >>> #define ID_AA64PFR0_EL1 sys_reg(3, 0, 0, 1, 0)
> >>> #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
> >>>
> >>> -#define kfpu_allowed() 1
> >>> +#define kfpu_allowed() 0
> >>> #define kfpu_begin() kernel_neon_begin()
> >>> #define kfpu_end() kernel_neon_end()
> >>> #define kfpu_init() (0)
> >>>
> >>>
> >>
> >> ops, wrong file
> >>
> >> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> index 178fbc3b3c6e..c462220289d6 100644
> >> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> >> @@ -46,7 +46,7 @@
> >> #include 
> >> #include 
> >>
> >> -#define kfpu_allowed()  1
> >> +#define kfpu_allowed()  0
> >> #define kfpu_initialize(tsk) do {} while (0)
> >> #define kfpu_begin()         do {} while (0)
> >> #define kfpu_end()           do {} while (0)
> >
> > It will take me a bit to setup a separate build/install
> > context for the source code vintage involved. Then more
> > time to do the build, install, and test. (I'm keeping
> > my normal environments completely before the mess.)
> >
> > FYI:
> >
> > I have used the artifact build just after your pair of zfs
> > related updates to confirm the VFP problem is still in
> > place as of that point:
> >
> > https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
> >
> > (No artifact build was exactly at either of your commits.)
> >
> > ===
> > Mark Millard
> > marklmi at yahoo.com
> >
> >
>
> I have arm64 + zfs at $job and just verified the above lets it boot
> again, so I committed already.
>

This was a known issue that we were working on fixing properly over in
https://reviews.freebsd.org/D39448... this really could have waited
just a little bit longer. This problem was already brought up in
response to the commit in question days ago.

Thanks,

Kyle Evans



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mateusz Guzik
On 4/7/23, Mark Millard  wrote:
> On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:
>
>> On 4/7/23, Mateusz Guzik  wrote:
>>> can you try with this:
>>>
>>> diff --git
>>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> index 16276b08c759..e1bca9ef140a 100644
>>> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>>> @@ -71,7 +71,7 @@
> >>> #define ID_AA64PFR0_EL1  sys_reg(3, 0, 0, 1, 0)
> >>> #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
> >>>
> >>> -#define kfpu_allowed()  1
> >>> +#define kfpu_allowed()  0
> >>> #define kfpu_begin() kernel_neon_begin()
> >>> #define kfpu_end()   kernel_neon_end()
> >>> #define kfpu_init()  (0)
>>>
>>>
>>
> >> oops, wrong file
>>
>> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> index 178fbc3b3c6e..c462220289d6 100644
>> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
>> @@ -46,7 +46,7 @@
>> #include 
>> #include 
>>
> >> -#define kfpu_allowed()  1
> >> +#define kfpu_allowed()  0
> >> #define kfpu_initialize(tsk) do {} while (0)
> >> #define kfpu_begin()         do {} while (0)
> >> #define kfpu_end()           do {} while (0)
>
> It will take me a bit to setup a separate build/install
> context for the source code vintage involved. Then more
> time to do the build, install, and test. (I'm keeping
> my normal environments completely before the mess.)
>
> FYI:
>
> I have used the artifact build just after your pair of zfs
> related updates to confirm the VFP problem is still in
> place as of that point:
>
> https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz
>
> (No artifact build was exactly at either of your commits.)
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>

I have arm64 + zfs at $job and just verified the above lets it boot
again, so I committed already.

-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mark Millard
On Apr 7, 2023, at 14:26, Mateusz Guzik  wrote:

> On 4/7/23, Mateusz Guzik  wrote:
>> can you try with this:
>> 
>> diff --git
>> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> index 16276b08c759..e1bca9ef140a 100644
>> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
>> @@ -71,7 +71,7 @@
>> #define ID_AA64PFR0_EL1  sys_reg(3, 0, 0, 1, 0)
>> #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
>>
>> -#define kfpu_allowed()  1
>> +#define kfpu_allowed()  0
>> #define kfpu_begin() kernel_neon_begin()
>> #define kfpu_end()   kernel_neon_end()
>> #define kfpu_init()  (0)
>> 
>> 
> 
> oops, wrong file
> 
> diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> index 178fbc3b3c6e..c462220289d6 100644
> --- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> +++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
> @@ -46,7 +46,7 @@
> #include 
> #include 
> 
> -#define kfpu_allowed()  1
> +#define kfpu_allowed()  0
> #define kfpu_initialize(tsk) do {} while (0)
> #define kfpu_begin()         do {} while (0)
> #define kfpu_end()           do {} while (0)

It will take me a bit to setup a separate build/install
context for the source code vintage involved. Then more
time to do the build, install, and test. (I'm keeping
my normal environments completely before the mess.)

FYI:

I have used the artifact build just after your pair of zfs
related updates to confirm the VFP problem is still in
place as of that point:

https://artifact.ci.freebsd.org/snapshot/main/5e2e3615d91f9c0c688987915ff5c8de23c22bde/arm64/aarch64/kernel.txz

(No artifact build was exactly at either of your commits.)

===
Mark Millard
marklmi at yahoo.com




Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mateusz Guzik
On 4/7/23, Mateusz Guzik  wrote:
> can you try with this:
>
> diff --git
> a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> index 16276b08c759..e1bca9ef140a 100644
> --- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> +++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
> @@ -71,7 +71,7 @@
>  #define ID_AA64PFR0_EL1  sys_reg(3, 0, 0, 1, 0)
>  #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)
>
> -#define kfpu_allowed()  1
> +#define kfpu_allowed()  0
>  #define kfpu_begin() kernel_neon_begin()
>  #define kfpu_end()   kernel_neon_end()
>  #define kfpu_init()  (0)
>
>

oops, wrong file

diff --git a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
index 178fbc3b3c6e..c462220289d6 100644
--- a/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
+++ b/sys/contrib/openzfs/include/os/freebsd/spl/sys/simd_arm.h
@@ -46,7 +46,7 @@
 #include 
 #include 

-#define kfpu_allowed()  1
+#define kfpu_allowed()  0
 #define kfpu_initialize(tsk) do {} while (0)
 #define kfpu_begin()         do {} while (0)
 #define kfpu_end()           do {} while (0)


> On 4/7/23, Mark Millard  wrote:
>> Turns out that as of this commit aarch64 (Cortex-A72 and Cortex-A57
>> examples reported) gets the following even when no zfs media is
>> present (UFS boot):
>>
>> # zpool import
>>  x0: f0fa9168 (ucom_cons_softc + efbf1bb8)
>>  x1: ff90 ($d.1 + afa318)
>>  x2: ff900400 ($d.1 + afa718)
>>  x3: fec1b0a4 (sha_incremental + 0)
>>  x4:0
>>  x5:   10
>>  x6: 8e16db93
>>  x7:0
>>  x8: feb06168 (tf_sha256_neon + 0)
>>  x9: fea931fb ($d.1 + b)
>> x10: feb045f4 (SHA2Update + f4)
>> x11:   29
>> x12:1
>> x13:0
>> x14:0
>> x15:2
>> x16: feaf7500 ($d.0 + 0)
>> x17: 00476cf0 (nanouptime + 0)
>> x18: f0fa9000 (ucom_cons_softc + efbf1a50)
>> x19: f0fa9168 (ucom_cons_softc + efbf1bb8)
>> x20:  400
>> x21: ff90 ($d.1 + afa318)
>> x22: f0fa9198 (ucom_cons_softc + efbf1be8)
>> x23:0
>> x24:0
>> x25:0
>> x26: fed2df70 (sha256_neon_impl + 0)
>> x27:  203
>> x28:   31
>> x29: f0fa9040 (ucom_cons_softc + efbf1a90)
>>  sp: f0fa9000
>>  lr: feb04668 (SHA2Update + 168)
>> elr: feaf8684 (zfs_sha256_block_neon + 14)
>> spsr: 2045
>> esr: 1fe0
>> panic: VFP exception in the kernel
>> cpuid = 3
>> time = 1680786034
>> KDB: stack backtrace:
>> db_trace_self() at db_trace_self
>> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
>> vpanic() at vpanic+0x13c
>> panic() at panic+0x44
>> do_el1h_sync() at do_el1h_sync+0x210
>> handle_el1h_sync() at handle_el1h_sync+0x10
>> --- exception, esr 0xf0fa9198
>> (null)() at 0x400
>> KDB: enter: panic
>> [ thread pid 1446 tid 100101 ]
>> Stopped at  kdb_enter+0x44: undefined   f905c27f
>> db>
>>
> >> The above was produced using an artifact build's
> >> kernel based on that exact commit:
>>
>> https://artifact.ci.freebsd.org/snapshot/main/2a58b312b62f908ec92311d1bd8536dbaeb8e55b/arm64/aarch64/kernel.txz
>>
>> By contrast, the prior commit had an artifact build
> >> as well, but its kernel does not get the panic for
>> zpool import :
>>
>> https://artifact.ci.freebsd.org/snapshot/main/b98fbf3781df16f7797b2bbeabf205dc7d4985ae/arm64/aarch64/kernel.txz
>>
>> See also:
>>
>> https://lists.freebsd.org/archives/freebsd-current/2023-April/003417.html
>>
>> ===
>> Mark Millard
>> marklmi at yahoo.com
>>
>>
>
>
> --
> Mateusz Guzik 
>


-- 
Mateusz Guzik 



Re: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mateusz Guzik
can you try with this:

diff --git a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
index 16276b08c759..e1bca9ef140a 100644
--- a/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
+++ b/sys/contrib/openzfs/include/os/linux/kernel/linux/simd_aarch64.h
@@ -71,7 +71,7 @@
 #define ID_AA64PFR0_EL1  sys_reg(3, 0, 0, 1, 0)
 #define ID_AA64ISAR0_EL1 sys_reg(3, 0, 0, 6, 0)

-#define kfpu_allowed()  1
+#define kfpu_allowed()  0
 #define kfpu_begin() kernel_neon_begin()
 #define kfpu_end()   kernel_neon_end()
 #define kfpu_init()  (0)


On 4/7/23, Mark Millard  wrote:
> Turns out that as of this commit aarch64 (Cortex-A72 and Cortex-A57
> examples reported) gets the following even when no zfs media is
> present (UFS boot):
>
> # zpool import
>  x0: f0fa9168 (ucom_cons_softc + efbf1bb8)
>  x1: ff90 ($d.1 + afa318)
>  x2: ff900400 ($d.1 + afa718)
>  x3: fec1b0a4 (sha_incremental + 0)
>  x4:0
>  x5:   10
>  x6: 8e16db93
>  x7:0
>  x8: feb06168 (tf_sha256_neon + 0)
>  x9: fea931fb ($d.1 + b)
> x10: feb045f4 (SHA2Update + f4)
> x11:   29
> x12:1
> x13:0
> x14:0
> x15:2
> x16: feaf7500 ($d.0 + 0)
> x17: 00476cf0 (nanouptime + 0)
> x18: f0fa9000 (ucom_cons_softc + efbf1a50)
> x19: f0fa9168 (ucom_cons_softc + efbf1bb8)
> x20:  400
> x21: ff90 ($d.1 + afa318)
> x22: f0fa9198 (ucom_cons_softc + efbf1be8)
> x23:0
> x24:0
> x25:0
> x26: fed2df70 (sha256_neon_impl + 0)
> x27:  203
> x28:   31
> x29: f0fa9040 (ucom_cons_softc + efbf1a90)
>  sp: f0fa9000
>  lr: feb04668 (SHA2Update + 168)
> elr: feaf8684 (zfs_sha256_block_neon + 14)
> spsr: 2045
> esr: 1fe0
> panic: VFP exception in the kernel
> cpuid = 3
> time = 1680786034
> KDB: stack backtrace:
> db_trace_self() at db_trace_self
> db_trace_self_wrapper() at db_trace_self_wrapper+0x30
> vpanic() at vpanic+0x13c
> panic() at panic+0x44
> do_el1h_sync() at do_el1h_sync+0x210
> handle_el1h_sync() at handle_el1h_sync+0x10
> --- exception, esr 0xf0fa9198
> (null)() at 0x400
> KDB: enter: panic
> [ thread pid 1446 tid 100101 ]
> Stopped at  kdb_enter+0x44: undefined   f905c27f
> db>
>
> The above was produced using an artifact build's
> kernel based on that exact commit:
>
> https://artifact.ci.freebsd.org/snapshot/main/2a58b312b62f908ec92311d1bd8536dbaeb8e55b/arm64/aarch64/kernel.txz
>
> By contrast, the prior commit had an artifact build
> as well, but its kernel does not get the panic for
> zpool import :
>
> https://artifact.ci.freebsd.org/snapshot/main/b98fbf3781df16f7797b2bbeabf205dc7d4985ae/arm64/aarch64/kernel.txz
>
> See also:
>
> https://lists.freebsd.org/archives/freebsd-current/2023-April/003417.html
>
> ===
> Mark Millard
> marklmi at yahoo.com
>
>


-- 
Mateusz Guzik 



RE: git: 2a58b312b62f - main - zfs: merge openzfs/zfs@431083f75 [separate aarch64 panic for zpool import]

2023-04-07 Thread Mark Millard
Turns out that as of this commit aarch64 (Cortex-A72 and Cortex-A57
examples reported) gets the following even when no zfs media is
present (UFS boot):

# zpool import
 x0: f0fa9168 (ucom_cons_softc + efbf1bb8)
 x1: ff90 ($d.1 + afa318)
 x2: ff900400 ($d.1 + afa718)
 x3: fec1b0a4 (sha_incremental + 0)
 x4:0
 x5:   10
 x6: 8e16db93
 x7:0
 x8: feb06168 (tf_sha256_neon + 0)
 x9: fea931fb ($d.1 + b)
x10: feb045f4 (SHA2Update + f4)
x11:   29
x12:1
x13:0
x14:0
x15:2
x16: feaf7500 ($d.0 + 0)
x17: 00476cf0 (nanouptime + 0)
x18: f0fa9000 (ucom_cons_softc + efbf1a50)
x19: f0fa9168 (ucom_cons_softc + efbf1bb8)
x20:  400
x21: ff90 ($d.1 + afa318)
x22: f0fa9198 (ucom_cons_softc + efbf1be8)
x23:0
x24:0
x25:0
x26: fed2df70 (sha256_neon_impl + 0)
x27:  203
x28:   31
x29: f0fa9040 (ucom_cons_softc + efbf1a90)
 sp: f0fa9000
 lr: feb04668 (SHA2Update + 168)
elr: feaf8684 (zfs_sha256_block_neon + 14)
spsr: 2045
esr: 1fe0
panic: VFP exception in the kernel
cpuid = 3
time = 1680786034
KDB: stack backtrace:
db_trace_self() at db_trace_self
db_trace_self_wrapper() at db_trace_self_wrapper+0x30
vpanic() at vpanic+0x13c
panic() at panic+0x44
do_el1h_sync() at do_el1h_sync+0x210
handle_el1h_sync() at handle_el1h_sync+0x10
--- exception, esr 0xf0fa9198
(null)() at 0x400
KDB: enter: panic
[ thread pid 1446 tid 100101 ]
Stopped at  kdb_enter+0x44: undefined   f905c27f
db> 

The above was produced using an artifact build's
kernel based on that exact commit:

https://artifact.ci.freebsd.org/snapshot/main/2a58b312b62f908ec92311d1bd8536dbaeb8e55b/arm64/aarch64/kernel.txz

By contrast, the prior commit had an artifact build
as well, but its kernel does not get the panic for
zpool import :

https://artifact.ci.freebsd.org/snapshot/main/b98fbf3781df16f7797b2bbeabf205dc7d4985ae/arm64/aarch64/kernel.txz

See also:

https://lists.freebsd.org/archives/freebsd-current/2023-April/003417.html

===
Mark Millard
marklmi at yahoo.com