Re: GEOM has amnesia

2017-03-31 Thread Chris H
On Sat, 1 Apr 2017 01:36:54 +0300 "Andrey V. Elsukov"  wrote

> On 01.04.2017 00:58, Chris H wrote:
> > So. I spin up an old 11 server I have sitting in the closet, with
> > this external drive attached to it. I do *NOT* get the corrupt GPT
> > message. So I blank/partition/newfs the external drive &&
> > mount the partitions individually to /mnt && restore again. When I
> > reboot to the external drive still connected to the old 11 server,
> > I do *NOT* receive the corrupt GPT message. WooHoo! I think.
> > So I re-attach the drive to the new 12 server. Reboot, and can't
> > boot to it && get the corrupt GPT message.
> > 
> > GEOM seems to be broken in 12, maybe even (recent) 11. As the 11
> > server I used for testing is ~9 mos out.
> > 
> > What can I do to (help?) fix this mess?
> 
> Just a guess: the BIOS on the system where FreeBSD 12 is installed
> overwrites the last sector of your disks.
> I have seen such reports, and this was always the cause.
> 
> You can do the following steps to make sure:
> * on the old 11 system with the sane GPT save the last sector to some file.
> * reboot, save the sector again to another file and compare both files.
> * attach the disk to your 12 system, GPT should become corrupted. Save
> the last sector and compare with previous file.
> 
> You can look at the hexdump of this file, and it should probably be
> obvious what is extraneous in the data.
> 
> To save the last sector you need to know its number, it can be found by
> this command:
> 
>  # diskinfo da0 | awk '{print $4-1}'
> 
> Then use dd to save it:
> 
>  # dd if=/dev/da0 of=./sector skip=`diskinfo da0 | awk '{print $4-1}'`
>  # hexdump -C ./sector
> 
> You should see something like this:
>   45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
...
> *
> 0200
> 
> The dump of correct GPT header should not have more lines.
> 
Andrey, Thank you!

OK, I'm having trouble with the concept. But *indeed* the
output is *always* good on the 11 server (confirmed by
following your steps above).
Moving it to the new 12 server returns the corrupt secondary GPT
table message && the hexdump output is:

  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART\...|
0010  65 12 5c 16 00 00 00 00  2f 60 38 3a 00 00 00 00  |e.\./`8:|
0020  01 00 00 00 00 00 00 00  28 00 00 00 00 00 00 00  |(...|
0030  07 60 38 3a 00 00 00 00  91 e5 f5 c1 0d 16 e7 11  |.`8:|
0040  8d 49 00 24 81 ce ba 87  08 60 38 3a 00 00 00 00  |.I.$.`8:|
0050  80 00 00 00 80 00 00 00  00 00 00 00 86 da fa 98  ||
0060  61 66 13 80 09 fe d0 54  35 59 db 8e 43 b8 7e 37  |af.T5Y..C.~7|
0070  c9 77 0e 9d 35 fd 45 04  de 9a d3 ff 30 83 8f b4  |.w..5.E.0...|
0080  b9 84 1d 41 59 44 ef fd  fd 89 3e 1e 9e c6 23 e1  |...AYD>...#.|
0090  83 17 a7 53 e1 e7 51 c8  5f 87 2b 76 f8 60 c4 ca  |...S..Q._.+v.`..|
00a0  e2 3e 1e eb 12 69 12 32  33 c3 29 42 d6 aa 1a bc  |.>...i.23.)B|
00b0  90 af fc 4f d0 e1 58 c3  52 f5 5c 54 ca bd 05 8c  |...O..X.R.\T|
00c0  89 04 8d 7b 11 a3 b2 1e  07 6e fe 1b 79 00 c0 15  |...{.n..y...|
00d0  1a 39 79 28 91 a3 e8 24  93 1a 35 ef e9 f8 e5 17  |.9y(...$..5.|
00e0  e6 93 f1 a2 5d aa 3e 2f  40 dc b3 17 19 4c f6 05  |].>/@L..|
00f0  cf 75 3e 88 ad a4 2a 68  8c 04 c4 99 a1 bb a2 1c  |.u>...*h|
0100  9c 8d fe c7 3e e4 cb 56  ce 3d 33 5b 28 a5 c9 45  |>..V.=3[(..E|
0110  c7 3f aa e2 1e 98 bc e2  6d 9d 91 12 84 24 d6 13  |.?..m$..|
0120  3d b5 14 bd 9a 44 e9 ee  3f b5 91 31 73 86 79 7e  |=D..?..1s.y~|
0130  09 bd 4e 01 cb 06 81 b4  41 11 cd cf 97 dd 97 a1  |..N.A...|
0140  a7 73 e5 f7 c5 a4 75 c9  1f 6b 5e 88 fe 1a 92 d2  |.su..k^.|
0150  3a cc 70 21 1f b8 30 34  b9 0e 5c b2 d0 14 5e 82  |:.p!..04..\...^.|
0160  56 60 04 35 77 c9 25 04  7a af ce e1 8d 24 37 53  |V`.5w.%.z$7S|
0170  a3 0c dd 63 3c 15 fe 9f  a4 46 00 97 c1 b0 27 be  |...c

Re: FYI: what it takes for RAM+swap to build devel/llvm40 with 4 processors or cores and WITH__DEBUG= (powerpc64 example)

2017-03-31 Thread Mark Millard
On 2017-Mar-30, at 7:51 PM, Mark Millard  wrote:

> On 2017-Mar-30, at 1:22 PM, Mark Millard  wrote:
> 
>> Sounds like the ALLOW_OPTIMIZATIONS_FOR_WITH_DEBUG technique
>> would not change the "WITNESS and INVARIANTS"-like part of the
>> issue. In fact if WITH_DEBUG= causes the cmake debug-style
>> llvm40 build ALLOW_OPTIMIZATIONS_FOR_WITH_DEBUG might not
>> make any difference: separate enforcing of lack of optimization.
>> 
>> But just to see what results I've done "pkg delete llvm40"
>> and am doing another build with ALLOW_OPTIMIZATIONS_FOR_WITH_DEBUG=
>> and its supporting code in place in addition to using WITH_DEBUG=
>> as the type of build from FreeBSD's viewpoint.
>> 
>> If you know that the test is a waste of machine cycles, you can
>> let me know if you want.
> 
> The experiment showed that ALLOW_OPTIMIZATIONS_FOR_WITH_DEBUG
> use made no difference for devel/llvm40 so devel/llvm40 itself
> has to change, such as what Dimitry Andric reported separately
> as a working change to the Makefile.
> 
> (ALLOW_OPTIMIZATIONS_FOR_WITH_DEBUG would still have its uses
> for various other ports.)

I've now tried with both ALLOW_OPTIMIZATIONS_FOR_WITH_DEBUG and:

# svnlite diff /usr/ports/devel/llvm40/
Index: /usr/ports/devel/llvm40/Makefile
===
--- /usr/ports/devel/llvm40/Makefile	(revision 436747)
+++ /usr/ports/devel/llvm40/Makefile	(working copy)
@@ -236,6 +236,11 @@
 
 .include 
 
+.if defined(WITH_DEBUG)
+CMAKE_BUILD_TYPE=  RelWithDebInfo
+STRIP=
+.endif
+
 _CRTLIBDIR=	${LLVM_PREFIX:S|${PREFIX}/||}/lib/clang/${LLVM_RELEASE}/lib/freebsd
 .if ${ARCH} == "amd64"
 _COMPILER_RT_LIBS= \



pkg delete after the build reports:

Installed packages to be REMOVED:
llvm40-4.0.0

Number of packages to be removed: 1

The operation will free 42 GiB.

So down by 7 GiBytes from 49 GiBytes.

(I did not actually delete it.)

Also:

# du -sg /usr/obj/portswork/usr/ports/devel/llvm40
102 /usr/obj/portswork/usr/ports/devel/llvm40

which is down by 16 GiBytes from 118 GiBytes.

Reminder: These are from portmaster -DK so no
cleanup after the build, which is what leaves
the source code and such around in case of
needing to look at a problem.

(102+42) GiBytes == 144 GiBytes.
vs.
(118+49) GiBytes == 167 GiBytes.

So a difference of 23 GiBytes (or so).

But that is for everything in each case (and
WITH_DEBUG= in use):

# more /var/db/ports/devel_llvm40/options
# This file is auto-generated by 'make config'.
# Options for llvm40-4.0.0.r4
_OPTIONS_READ=llvm40-4.0.0.r4
_FILE_COMPLETE_OPTIONS_LIST=CLANG DOCS EXTRAS LIT LLD LLDB
OPTIONS_FILE_SET+=CLANG
OPTIONS_FILE_SET+=DOCS
OPTIONS_FILE_SET+=EXTRAS
OPTIONS_FILE_SET+=LIT
OPTIONS_FILE_SET+=LLD
OPTIONS_FILE_SET+=LLDB

So avoiding WITH_DEBUG= and/or various build options
is still the major way of avoiding use of lots of space
if it is an issue.



Why no RAM+SWAP total report this time:

As far as I know FreeBSD does not track or report peak
swap-space usage since the last boot. And, unfortunately
I was not around to just sit and watch a top display this
time and I did not set up any periodic recording into a
file.

That is why I've not reported on the RAM+SWAP total
this time. It will have to be another experiment
some other time.

[I do wish FreeBSD had a way of reporting peak swap-space
usage.]
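
The periodic recording mentioned above could be approximated by polling
swapinfo(8) from a small script. This is only a sketch under assumptions:
it assumes `swapinfo -k` prints a header row followed by one row per
device with the used KiB in the third column (plus a "Total" row when
there are several devices). The log file name and polling interval are
hypothetical, and anything that happens between two polls is missed, so
the result is a lower bound on the true peak.

```python
import subprocess
import time


def used_kib(swapinfo_output):
    """Sum the 'Used' column (KiB) from `swapinfo -k` style output."""
    rows = [line.split() for line in swapinfo_output.splitlines()[1:] if line.strip()]
    # With several swap devices swapinfo adds a "Total" row; prefer it if present.
    totals = [row for row in rows if row[0] == "Total"]
    return sum(int(row[2]) for row in (totals or rows))


def record_peak(log_path="swap-peak.log", interval_s=10.0, samples=None):
    """Poll swapinfo, append each sample to log_path, and return the peak seen.

    samples=None polls until interrupted; a number stops after that many polls.
    """
    peak = 0
    taken = 0
    try:
        while samples is None or taken < samples:
            out = subprocess.run(
                ["swapinfo", "-k"], capture_output=True, text=True, check=True
            ).stdout
            used = used_kib(out)
            peak = max(peak, used)
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            with open(log_path, "a") as log:
                log.write(f"{stamp} used_kib={used} peak_kib={peak}\n")
            taken += 1
            time.sleep(interval_s)
    except KeyboardInterrupt:
        pass
    return peak
```

Starting record_peak() in a separate session before the portmaster run
and interrupting it afterwards would leave the samples in the log file,
with the last line carrying the largest value observed.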

===
Mark Millard
markmi at dsl-only.net

___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Re: GEOM has amnesia

2017-03-31 Thread Andrey V. Elsukov
On 01.04.2017 00:58, Chris H wrote:
> So. I spin up an old 11 server I have sitting in the closet, with
> this external drive attached to it. I do *NOT* get the corrupt GPT
> message. So I blank/partition/newfs the external drive &&
> mount the partitions individually to /mnt && restore again. When I
> reboot to the external drive still connected to the old 11 server,
> I do *NOT* receive the corrupt GPT message. WooHoo! I think.
> So I re-attach the drive to the new 12 server. Reboot, and can't
> boot to it && get the corrupt GPT message.
> 
> GEOM seems to be broken in 12, maybe even (recent) 11. As the 11
> server I used for testing is ~9 mos out.
> 
> What can I do to (help?) fix this mess?

Just a guess: the BIOS on the system where FreeBSD 12 is installed
overwrites the last sector of your disks.
I have seen such reports, and this was always the cause.

You can do the following steps to make sure:
* on the old 11 system with the sane GPT save the last sector to some file.
* reboot, save the sector again to another file and compare both files.
* attach the disk to your 12 system, GPT should become corrupted. Save
the last sector and compare with previous file.
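
The comparison in these steps can also be scripted instead of done by
eye. A minimal Python sketch, assuming the saved sectors have already
been read into memory; the function name and the file names in the note
below are hypothetical:

```python
def diff_sectors(a, b):
    """Return (offset, byte_a, byte_b) for every position where two dumps differ."""
    if len(a) != len(b):
        raise ValueError("dumps differ in length")
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]
```

For example, diff_sectors(open("sector.11", "rb").read(),
open("sector.12", "rb").read()) returns an empty list when nothing
touched the sector, and otherwise lists the offsets that changed.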

You can look at the hexdump of this file, and it should probably be
obvious what is extraneous in the data.

To save the last sector you need to know its number, it can be found by
this command:

 # diskinfo da0 | awk '{print $4-1}'

Then use dd to save it:

 # dd if=/dev/da0 of=./sector skip=`diskinfo da0 | awk '{print $4-1}'`
 # hexdump -C ./sector

You should see something like this:
  45 46 49 20 50 41 52 54  00 00 01 00 5c 00 00 00  |EFI PART....\...|
0010  d7 b2 b7 bc 00 00 00 00  af 32 cf 1d 00 00 00 00  |.........2......|
0020  01 00 00 00 00 00 00 00  28 00 00 00 00 00 00 00  |........(.......|
0030  87 32 cf 1d 00 00 00 00  a0 4a 4a e0 b0 0a e7 11  |.2.......JJ.....|
0040  ba c4 54 ee 75 ad 8c c7  8f 32 cf 1d 00 00 00 00  |..T.u....2......|
0050  80 00 00 00 80 00 00 00  22 88 eb 6d 00 00 00 00  |........"..m....|
0060  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
0200

The dump of correct GPT header should not have more lines.
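
The same check can be sketched in Python for a saved sector file. This
is an illustration under assumptions, not what GEOM itself does: it
assumes the standard 92-byte (0x5c) little-endian GPT header layout and
that the header CRC32 is computed over the header with the CRC field
zeroed. The function name is hypothetical.

```python
import struct
import zlib

# signature, revision, header size, header CRC32, reserved, my LBA,
# alternate LBA, first usable LBA, last usable LBA, disk GUID,
# partition entries LBA, entry count, entry size, entries CRC32
GPT_HEADER = struct.Struct("<8sIIIIQQQQ16sQIII")  # 92 bytes


def check_gpt_header(sector):
    """Return a list of problems found in a sector holding a GPT header."""
    if len(sector) < GPT_HEADER.size:
        return ["sector shorter than a GPT header"]
    fields = GPT_HEADER.unpack_from(sector)
    signature, header_size, header_crc = fields[0], fields[2], fields[3]
    problems = []
    if signature != b"EFI PART":
        problems.append("bad signature")
    if not GPT_HEADER.size <= header_size <= len(sector):
        problems.append("implausible header size")
        return problems
    # The CRC covers header_size bytes with the CRC field itself set to zero.
    zeroed = sector[:16] + b"\0\0\0\0" + sector[20:header_size]
    if zlib.crc32(zeroed) & 0xFFFFFFFF != header_crc:
        problems.append("header CRC mismatch")
    # Everything after the header should be zero ("no more lines" in hexdump).
    if any(sector[header_size:]):
        problems.append("non-zero bytes after header")
    return problems
```

Running check_gpt_header(open("sector", "rb").read()) on a healthy
backup header should return an empty list. A dump with data from offset
0x5c onward, like the one posted from the 12 server in this thread,
would at least trip the last check.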

-- 
WBR, Andrey V. Elsukov





GEOM has amnesia

2017-03-31 Thread Chris H
Hi, I brought this up earlier, but didn't have as
much to go on as I do now. So I'd like to try this again.
On a recent(ish) install of CURRENT followed by a new
kernel/world, I'm finding I can't depend on geom(8) for
anything but the primary (SATA3) drive it's installed on
(if even that). To the point:
Blanking/partitioning/formatting a usb memstick to
dump(8) this system to works fine *until* I reboot,
where I'm greeted with
GEOM: da0: the secondary GPT table is corrupt or invalid
..
GEOM: diskid/DISK-... : the secondary GPT table is corrupt or invalid
..
using the primary only --

gpart recover returns the status to OK, *until* I reboot. Where
I'm greeted by the same BS.
OK I can't live with this, so I grab a usb2 external drive off
the shelf, and try it again. blank/partition/newfs && fsck
mounted the partitions on /mnt and performed a restore.

Reboot; && get the corrupt GPT message.

So. I spin up an old 11 server I have sitting in the closet, with
this external drive attached to it. I do *NOT* get the corrupt GPT
message. So I blank/partition/newfs the external drive &&
mount the partitions individually to /mnt && restore again. When I
reboot to the external drive still connected to the old 11 server,
I do *NOT* receive the corrupt GPT message. WooHoo! I think.
So I re-attach the drive to the new 12 server. Reboot, and can't
boot to it && get the corrupt GPT message.

GEOM seems to be broken in 12, maybe even (recent) 11. As the 11
server I used for testing is ~9 mos out.

What can I do to (help?) fix this mess?

--Chris

See also: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=218026

Thanks!




Re: VNET branch destiny

2017-03-31 Thread Bjoern A. Zeeb

On 31 Mar 2017, at 13:57, Pavel Timofeev wrote:


> Hello, dear freebsd-current@!
>
> There was a FreeBSD Foundation report back in 2016Q2 that told us
> about the VNET (VIMAGE) update project sponsored by the Foundation.
> What is the current situation? Is it committed into base? If not,
> what's the plan?


Changes are in 12 and 11.   12 has seen more slight fixes due to other 
changes that other committers are tracking and I hope they merge to 11.


/bz


VNET branch destiny

2017-03-31 Thread Pavel Timofeev
Hello, dear freebsd-current@!

There was a FreeBSD Foundation report back in 2016Q2 that told us
about the VNET (VIMAGE) update project sponsored by the Foundation.
What is the current situation? Is it committed into base? If not,
what's the plan?


Re: New syscons bugs: shutdown -r doesn't execute rc.d sequence and others

2017-03-31 Thread Bruce Evans

On Fri, 31 Mar 2017, Andrey Chernov wrote:


> On 30.03.2017 21:53, Bruce Evans wrote:
>> I think it was the sizing.  The non-updated mode is 80x25, so the row
>> address can be out of bounds in the teken layer.
>
> I have text 80x30 mode set at rc stage, and _after_ that may have many
> kernel messages on console, all without causing reboot. How it is
> different from shutdown stage? Syscons mode is unchanged since rc stage.


Probably just because there weren't enough messages to go past row 24.
I had no difficulty reproducing the crash today for entering ddb and
reboot starting 80x30 and rows > 24, after removing just the window
size update in the fix.  I missed seeing it the other day because I
tested with 80x60 to see the smaller console window more clearly, but
must have only tried rebooting with row <= 24.

Another recent fix for sc reduced the problem a little.  Mode changes
are supposed to clear the screen and move the cursor to home, but they
only clear the screen.  You should have noticed the ugliness from that
after the switch to 80x30.  There are enough boot messages to
reach row 24 and messages continued from there.  Now they start at the
top of the screen again.  Clearing the messages is not ideal, but syscons
always did it.

Syscons also has new and old bugs preserving colors across mode changes:
- it never preserved changes to the palette (FBIO_SETPALETTE ioctl).
  Some mode changes should reset the palette, but some should not.
  Especially not ones for a vt switch
- BIOSes should reset the palette for mode changes (even to the same mode).
  Some BIOSes are confused by syscons setting the DAC to 8 bit mode and
  reset to a garbage (dark) palette then.  They always switch back to
  6 bit mode
- syscons used to maintain the current colors and didn't change them for
  mode changes.  This was slightly broken, since for a mode change from
  a mode with full color to one with less color, the interpretation of
  the color indexes might change.  The colors are now maintained by
  teken and syscons tells teken to do a full window size change which
  resets the entire teken state including colors.  This bug is normally
  hidden by vidcontrol refreshing the colors.

  vidcontrol could be held responsible for refreshing or resetting
  everything after a mode change ioctl, but I think this is backwards
  since there are many low-level details that are better handled in
  the driver.  Switching to graphics modes is already a complicated
  2-ioctl process with not enough options and poor error handling.
  Like a too-simple wrapper for fork-exec.

vt has some interesting related bugs.  It doesn't support mode switches
of course, and even changing the font seems to be unsupported in text
mode.  But in graphics mode, changing the font works and even redraws
the screen where syscons would clear it for the mode change.  But there
are bugs redrawing the screen -- often old history is redrawn.  This
should work like in xterm or a general X window refresh where the
redrawing must be done for lots of other events than resize (exposure,
etc.).


>> - sysctl debug.kdb.break_to_debugger.  This is documented in ddb(4), but
>>   only as equivalent to the unbroken BREAK_TO_DEBUGGER.
>
> Thanx. Setting debug.kdb.break_to_debugger=1 makes both Ctrl-Alt-ESC and
> Ctrl-PrtScr works in sc only mode and "c" exit don't cause all chars
> beeps like in vt. I.e. it works. But I don't understand why debugging
> via serial involved in sc case while not involved in vt case and fear
> that some serial noise may provoke break.


This is because only syscons has full conflation of serial line breaks
with entering the debugger via a breakpoint instruction.  Syscons does:

kdb_break();

for its KDB keys, while vt does:

kdb_enter(KDB_WHY_BREAK, ...)

for its KDB keys.  The latter bypasses KDB's permissions on entering
the debugger with a BREAK.  It is unclear if this is a layering violation
in vt or incorrect use of kdb_break() in syscons.  It is certainly wrong
for vt to use the KDB_WHY_BREAK code if it is avoiding using kdb_break()
to fix the conflation.


> Is there a chance to untie
> serial and sc console debuggers?


This is easy to do by copying vt's arguable layering violation.  A little
more is necessary to unconflate serial breaks:
- agree that kdb_break() and KDB_WHY_BREAK are only for serial line breaks
- don't use kdb_break() and KDB_WHY_BREAK for console KDB keys of course.
  vt already has a string saying that the entry is a "manual escape to
  debugger".  Here "to debugger" is redundant, "manual escape" means
  "DDB key hit manaually by the user" and the driver that saw the key
  is left out.  "vt KDB key" would be a more useful message.  syscons
  used to print a similar message, but it now calls kdb_break() which
  produces the conflated code KDB_WHY_BREAK and the consistently
  conflated message "Break to debugger".  This is also used for serial
  line breaks.  Capitalization is also inconsistent.
- remove kdb_break().  The only