Re: pipe read returning EAGAIN

2016-02-17 Thread David Laight
On Mon, Feb 08, 2016 at 11:47:44AM +0100, Manuel Bouyer wrote:
> 
> Now the question is why is the POLLIN flag set when there's no data to read ?
> zeroing out revents before calling poll(2) doesn't help.
> 
> The man page says:
>  This implementation differs from the historical one in that a given file
>  descriptor may not cause poll() to return with an error.  In cases where
>  this would have happened in the historical implementation (e.g. trying to
>  poll a revoke(2)d descriptor), this implementation instead copies the
>  events bitmask to the revents bitmask.  Attempting to perform I/O on this
>  descriptor will then return an error.  This behaviour is believed to be
>  more useful.

That sounds broken.
I think POLLERR should be set after revoke().
However, nothing in the pollfd[] array should cause the poll()
call itself to fail.

> Does it do so if the file descriptor's error is EAGAIN ?
> If so that's not very useful ...

You are confused, that'll be for errors looking up the relevant driver.
Any error from a previous system call is not remembered.
(or better not be).

It looks as though the poll support for pipes is somehow returning
'readable' when no data is available.
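
Until the cause is found, a reader can cope with the symptom by treating
EAGAIN after a POLLIN wakeup as "poll again". This is only a minimal sketch,
assuming a non-blocking pipe; read_when_ready() and demo_pipe_roundtrip()
are hypothetical names, not anything in the NetBSD tree:

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Wait until fd is readable, then read. If poll() reported the descriptor
 * ready but read() still fails with EAGAIN, go back to poll() instead of
 * giving up. Returns bytes read, 0 on EOF, -1 on any other error.
 */
ssize_t
read_when_ready(int fd, void *buf, size_t len)
{
	struct pollfd pfd;
	ssize_t n;

	assert(buf != NULL && len > 0);
	for (;;) {
		pfd.fd = fd;
		pfd.events = POLLIN;
		pfd.revents = 0;
		if (poll(&pfd, 1, -1) == -1) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		n = read(fd, buf, len);
		if (n == -1 && (errno == EAGAIN || errno == EINTR))
			continue;	/* spurious wakeup: poll again */
		return n;
	}
}

/* Hypothetical self-check: push two bytes through a non-blocking pipe. */
int
demo_pipe_roundtrip(void)
{
	int fds[2];
	char buf[8];
	ssize_t n;

	if (pipe(fds) == -1)
		return -1;
	if (fcntl(fds[0], F_SETFL, O_NONBLOCK) == -1 ||
	    write(fds[1], "hi", 2) != 2)
		n = -1;
	else
		n = read_when_ready(fds[0], buf, sizeof(buf));
	close(fds[0]);
	close(fds[1]);
	return (int)n;
}
```

If poll() keeps reporting 'readable' with nothing to read, this loop spins,
so it hides the bug rather than fixing it.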

Of course there might be some uninitialised memory lurking.

David

-- 
David Laight: da...@l8s.co.uk


Re: Bad sleep time resolution of nanosleep(2)

2016-01-14 Thread David Laight
On Tue, Nov 24, 2015 at 01:58:15AM +0100, Rhialto wrote:
> > 
> > Well, it is rounded up first to whole ticks, that's the easy part. Next
> > the callout is scheduled at the tick boundary and then the LWP is
> > unblocked and scheduled again. It will run in the next scheduling cycle
> > unless nothing else is running?
> 
> I tried it on some fairly idle machines, and the result was quite
> consistent. It really looks like there is something in there that
> inadvertently always causes an extra tick delay.

The extra tick is added to ensure that the minimum sleep time is met.
Otherwise the sleep will be too short if called just before a tick.
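
The arithmetic can be sketched as below. This is an illustration of the
rounding, not the kernel's code; sleep_ticks() and its parameters are
made-up names. With hz = 100 a 1ms request becomes 2 ticks, so it wakes
somewhere between 10ms and 20ms later depending on how much of the current
tick had already gone.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Number of clock ticks to wait so that at least `ns` nanoseconds elapse,
 * for a clock that ticks `hz` times per second.
 */
uint64_t
sleep_ticks(uint64_t ns, unsigned int hz)
{
	uint64_t tick_ns, ticks;

	assert(hz > 0 && hz <= 1000000000u);
	tick_ns = 1000000000u / hz;

	/* Round the request up to whole ticks ... */
	ticks = (ns + tick_ns - 1) / tick_ns;

	/* ... then add one, because the current tick is already part-used. */
	return ticks + 1;
}
```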

David

-- 
David Laight: da...@l8s.co.uk


Re: schizophrenic GCC versions in -current?

2015-09-16 Thread David Laight
On Wed, Sep 16, 2015 at 03:27:32PM -0500, John D. Baker wrote:
> On Wed, 16 Sep 2015, John D. Baker wrote:
> 
> > Just in case there was a snafu due to my preference for update builds,
> > I'm rebuilding in non-update mode to see if the two strings can be made
> > to agree.
> 
> This seems to have been the case.  Following a non-update build, the
> version string reported by the "--version" option matches the internal
> symbol.

For some other projects I add a make dependency for any object 
files that contain the version against all the other object files.
That ensures the version (and build date) is always correct.


David

-- 
David Laight: da...@l8s.co.uk


Re: Help needed with a stubborn Gateway box!

2015-03-27 Thread David Laight
On Fri, Mar 27, 2015 at 08:44:07PM +0800, Paul Goyette wrote:
 
 So, any suggestions on how to proceed?
 
 1) Is my plan to use the ubuntu-installed copy of grub to boot the 
 NetBSD boot.iso media successful?
 
 2) Is my plan to leave the ubuntu-installed copy of grub on the disk 
 (rather than writing new boot blocks) going to work?

ubuntu will have installed grub2.
AFAICT grub2 is only of any use if you want to do exactly what
'they' expect you to do with it.
Which basically assumes you are running linux.
 
 3) Is there some other way of getting this beast booted into NetBSD?

Boot from USB?
You might find the BIOS will boot a CD image written to a USB memory stick.

David

-- 
David Laight: da...@l8s.co.uk


Re: firefox eats threads

2015-03-18 Thread David Laight
On Wed, Mar 18, 2015 at 05:17:29PM +0000, Eric Haszlakiewicz wrote:
 
 
 On March 18, 2015 11:01:15 AM EDT, Tobias Nygren t...@netbsd.org wrote:
 Firefox names all its threads by type with pthread_setname_np(3).
 The following command is useful to find out what kind of threads are in
 use:
 
 $ ps -sp 12501 -O lname
 
 Firefox after startup pools 45 threads so that's one third of your
 available LWPs. My opinion is that the default ulimits on amd64 have not
 caught up with the times. 1024 would be a more reasonable figure than
 128 & 160 for open files and lwps.
 
 
 Fwiw, chrome/chromium fires up 69 threads, although I've only examined it 
 running on a linux box.
 Those limits are clearly inadequate.  They should probably be calculated 
 based on the machine resources, such as the amount of memory available.

The 'hard' ulimit values also need reducing.

But yes, most of the 'system wide' limits (even processes for root)
could be usefully replaced by checks against free kva, swap and physical
memory.
The problem is picking the values.

Look at what MAXUSERS does :-)


David

-- 
David Laight: da...@l8s.co.uk


Re: gpt booting status?

2014-12-30 Thread David Laight
On Wed, Dec 24, 2014 at 09:30:31PM -0500, Greg Troxel wrote:
 
 and a further question:
 
   I know /boot (with MBR) can skip the raidframe header.  So given a
   disk with a single MBR partition of type RAID, and an inside-the-RAID1
   disklabel with raid0a starting at 0, booting works.  (I'm sure because
   I do this all the time.)
 
   But, if I have gpt, with a RAID partition, and in the RAID1 have
   another gpt label, and in that a partition, is there any way to boot
   from that?  Basically I'm thinking
 
 sd0
   A=gpt partition 1, type raid, starts at 1024, big
 raid0  (so starts at 1024*64)
   B=gpt partition 1, starts at say 1024+64+64
 
and would like the bootxx_ffsv2 code written to the beginning of
A to see type 'raid', skip 64, and then interpret gpt vs mbr and find
the active inner gpt partition.

The code that reads /boot (last time I looked) doesn't inspect any inner
labels (of any type). It just had a nasty hack to look for a filesystem
a further 64 sectors down the disk if it doesn't find /boot in the
expected place.

Maybe a gpt disk has space for a larger boot image?
In which case it might be possible to have more code to find /boot.

David

-- 
David Laight: da...@l8s.co.uk


Re: 4k sector disks

2014-10-01 Thread David Laight
On Wed, Sep 03, 2014 at 11:07:32AM +0100, Robert Swindells wrote:
 
 Is there any special configuration needed to use 4k sector disks
 efficiently ?
 
 I have a couple of SATA drives with 4k sectors, the disklabels for
 them give a sector size of 512 bytes but 'atactl identify' shows
 the true sector size.
 
 I used 'newfs -S 4096' on one of them, a new SSD, but am wondering
 whether to copy stuff off the other one and repartition.

(as stated elsewhere, label it with 512 byte sectors)

If it is an SSD the actual sector size is likely to be much higher than 4k.

I can't actually imagine an SSD emulating 512 byte sectors in 4k ones
and then doing the required RMW cycles (with wear leveling) that the
actual memory requires.

OTOH doing larger (aligned) transfers will help.

I'd certainly ensure that everything is aligned and that the fragment
and block sizes are large.
But remember the boot code doesn't have enough memory for very large blocks.
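
Checking the alignment is simple arithmetic on the label values. A sketch,
assuming the label counts 512-byte sectors as suggested above;
partition_is_aligned() is a hypothetical helper, not an existing tool:

```c
#include <assert.h>
#include <stdint.h>

#define LABEL_SECTOR_SIZE 512u	/* sector size used in the label */

/*
 * Non-zero if a partition starting at `start` (in 512-byte label sectors)
 * begins on a boundary of `phys_bytes`, the size the drive really uses.
 */
int
partition_is_aligned(uint64_t start, uint32_t phys_bytes)
{
	assert(phys_bytes >= LABEL_SECTOR_SIZE &&
	    phys_bytes % LABEL_SECTOR_SIZE == 0);
	return start % (phys_bytes / LABEL_SECTOR_SIZE) == 0;
}
```

A partition starting at sector 63 fails the check for a 4096 byte
drive, one starting at 64 or 2048 passes. For an SSD a larger value of
phys_bytes is the safer guess.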

David

-- 
David Laight: da...@l8s.co.uk


Re: cpuctl panic(!)

2014-07-24 Thread David Laight
On Wed, Jun 18, 2014 at 12:43:57PM +0100, Patrick Welche wrote:
 Surprise (-current/amd64):
 
 # cpuctl identify 0
 cpu0: highest basic info 0000000d
 cpu0: highest extended info 80000008
 cpu0: Intel(R) Core(TM) i5-2520M CPU @ 2.50GHz
 cpu0: Intel Xeon E3-12xx, 2nd gen i7, i5, i3 2xxx (686-class), 2492.10 MHz
 cpu0: family 0x6 model 0x2a stepping 0x7 (id 0x206a7)
 ...
 cpu0: xsave features 0x7<x87,SSE,AVX>
 cpu0: xsave instructions 0x1<XSAVEOPT>
 cpu0: xsave area size: current 832, maximum 832, xgetbv enabled
 [1]   Segmentation fault (core dumped) cpuctl identify 0
 
 Program terminated with signal 11, Segmentation fault.
 #0  0x004053d0 in x86_xgetbv ()
 (gdb) bt
 #0  0x004053d0 in x86_xgetbv ()
 #1  0x0040467d in identifycpu (fd=3, cpuname=0x7f7fdb30 cpu0)
 at /usr/src/usr.sbin/cpuctl/arch/i386.c:1824
 #2  0x00401cd0 in cpu_identify (argv=0x7f7fdbd8)
 at /usr/src/usr.sbin/cpuctl/cpuctl.c:277
 #3  0x00401644 in main (argc=2, argv=0x7f7fdbd0)
 at /usr/src/usr.sbin/cpuctl/cpuctl.c:116

The cpu features indicate that xgetbv is available, but when it is executed
the cpu faults.
Clearly that shouldn't happen.
IIRC qemu is buggy - is that bare metal?
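
For reference, the usual guard before executing xgetbv looks at two bits of
CPUID leaf 1 ECX: bit 26 (XSAVE implemented) and bit 27 (OSXSAVE, i.e. the
OS has enabled it). A sketch only, not cpuctl's code; xgetbv_usable() is a
hypothetical name and the caller is assumed to have fetched the ECX value:

```c
#include <stdint.h>

#define CPUID1_ECX_XSAVE	(1u << 26)	/* cpu implements xsave/xgetbv */
#define CPUID1_ECX_OSXSAVE	(1u << 27)	/* OS has set CR4.OSXSAVE */

/* Non-zero if executing xgetbv should be safe, given CPUID leaf 1 ECX. */
int
xgetbv_usable(uint32_t cpuid1_ecx)
{
	uint32_t need = CPUID1_ECX_XSAVE | CPUID1_ECX_OSXSAVE;

	return (cpuid1_ecx & need) == need;
}
```

The output above reports 'xgetbv enabled', so a fault after a check like
this points at whatever is executing the instruction.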

David

-- 
David Laight: da...@l8s.co.uk


Re: USB 3.0 status in NetBSD-current?

2014-06-16 Thread David Laight
On Sun, Jun 15, 2014 at 06:30:22PM -0500, Jonathan A. Kollasch wrote:
 On Wed, Jun 11, 2014 at 05:47:28AM +0000, Thomas Mueller wrote:
  Is there, or is there supposed to be, USB 3.0 support in the current kernel?
  
  I see xhci in kernel config, but have not yet been able to access anything 
  on a USB 3.0 port.
 
 Use a USB 2.0 cable in between to force USB 2.0 speeds.

That may not help.
A USB2 cable should still leave you using the xhci driver - just at the
lower speed.

There is some 'magic' needed to hand over the port from ohci? to xhci
(which probably requires correct parsing of ACPI data to work out which
usb2 port the xhci port is linked to).

If the port isn't handed over (ie no xhci support in the kernel) the
USB port should still run at USB2 speeds.

There are also significant differences between the xhci hardware.
Some of which are definitely bugs, some are probably documented bugs,
others are just the hardware engineers making life extremely difficult
for the software engineers.

For example:
The xhci controller supports arbitrary scatter gather except:
1) The maximum fragment size is 64k.
2) Fragments can't cross 64k address boundaries.
3) The end of a ring segment must happen at the end of a USB packet.
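
Rules 1 and 2 can be sketched as a splitting loop. This is an illustration
with made-up names (max_fragment(), count_fragments()), not driver code,
and it assumes a physically contiguous buffer. Rule 3 is not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define XHCI_FRAG_LIMIT 0x10000u	/* 64k */

/*
 * Largest fragment that may start at `addr`: it has to stop at the next
 * 64k address boundary, which also keeps it to at most 64k in size.
 */
size_t
max_fragment(uint64_t addr, size_t remaining)
{
	size_t to_boundary;

	to_boundary = XHCI_FRAG_LIMIT - (size_t)(addr & (XHCI_FRAG_LIMIT - 1));
	return remaining < to_boundary ? remaining : to_boundary;
}

/* Number of fragments needed to describe len bytes starting at addr. */
unsigned int
count_fragments(uint64_t addr, size_t len)
{
	unsigned int n = 0;

	while (len > 0) {
		size_t frag = max_fragment(addr, len);

		assert(frag > 0);
		addr += frag;
		len -= frag;
		n++;
	}
	return n;
}
```

So an 8k buffer at 0x1f000 needs two fragments because it straddles
0x20000, while an aligned 64k buffer needs one.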

David

-- 
David Laight: da...@l8s.co.uk


Re: gcc48, drmkms issues with i386

2014-04-23 Thread David Laight
On Mon, Apr 14, 2014 at 11:45:27PM +0900, Masao Uebayashi wrote:
 On Thu, Apr 10, 2014 at 3:40 AM, David Laight da...@l8s.co.uk wrote:
  On Wed, Apr 09, 2014 at 09:10:42AM -0500, John D. Baker wrote:
  On Wed, 9 Apr 2014, John D. Baker wrote:
 
   disk, the last part of the display actually looked like:
  
   prot_to_real: can't return to 0001296DFn: Diskn
...
  All the calls to 'prot_to_real' have to reside in the first 64k of
  the code area.
  The code then bombs out back to the outer loader.
 
 s/prot_to_real/real_to_prot/

Doesn't matter, they always appear as a pair.

 http://nxr.netbsd.org/xref/src/sys/arch/i386/stand/boot/Makefile.boot#131
 
 This is quite a hack...

And one I'm proud of :-)

An alternative would be to put all the functions that call prot_to_real
into a separate code section, and then arrange for that to get placed
before the normal .code section.
Trouble is, that probably requires a linker script.

David

-- 
David Laight: da...@l8s.co.uk


Re: gcc48, drmkms issues with i386

2014-04-10 Thread David Laight
On Wed, Apr 09, 2014 at 09:50:02PM +0000, Christos Zoulas wrote:
 
 Plausibly prot_to_real could set the real mode $cs value to one
 appropriate for the return address.
 The calls are all from assembler and are followed by a bios call
 and then a call to real_to_prot.
 
 If that were done the /boot code itself could probably be linked with a
 virtual base address of 1MB and run with virtual == physical removing
 the confusing offset.
 
 Do you want to take a stab at fixing it? It would take me an order
 of magnitude longer to do the same.

Not for at least a couple of weeks.

David

-- 
David Laight: da...@l8s.co.uk


Re: gcc48, drmkms issues with i386

2014-04-09 Thread David Laight
On Wed, Apr 09, 2014 at 09:10:42AM -0500, John D. Baker wrote:
 On Wed, 9 Apr 2014, John D. Baker wrote:
 
  disk, the last part of the display actually looked like:
  
  prot_to_real: can't return to 0001296DFn: Diskn
 
 Should have been:
 
 prot_to_real: can't return to 000129CD Fn: Diskn
 
 The amd64-built version behaves the same.  The only difference was the
 address reported in the message above: 00012D19

All the calls to 'prot_to_real' have to reside in the first 64k of
the code area.
The code then bombs out back to the outer loader.
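
The restriction comes from real mode offsets being 16 bits wide. A sketch of
the test, assuming the address in the message is an offset from the start of
the code area (reachable_from_real_mode() is a hypothetical name, the real
check is done in assembler):

```c
#include <stdint.h>

/*
 * Non-zero if `offset` (bytes from the start of the code area) fits in a
 * 16-bit real mode offset, i.e. lies in the first 64k.
 */
int
reachable_from_real_mode(uint32_t offset)
{
	return offset <= 0xFFFFu;
}
```

Under that assumption both reported addresses, 000129CD and 00012D19, are
beyond 0xFFFF and so fail.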

The linker used to manage that, but it might have been relying
on the linker putting object files into a section in the order they
were specified on the command line.

Plausibly prot_to_real could set the real mode $cs value to one
appropriate for the return address.
The calls are all from assembler and are followed by a bios call
and then a call to real_to_prot.

If that were done the /boot code itself could probably be linked with a
virtual base address of 1MB and run with virtual == physical removing
the confusing offset.

David

-- 
David Laight: da...@l8s.co.uk


Re: fontconfig/freetype2 breaks amd64 build on netbsd-5/i386 host

2014-03-26 Thread David Laight
On Tue, Mar 25, 2014 at 07:53:12PM -0500, John D. Baker wrote:
...
 Here's the result of running objdump -dr against the object file as it
 exists on my filesystem (not extracted from library):
 
 /d0/build/current/obj/amd64/external/mit/xorg/lib/freetype/ftxf86.o: 
 file format elf64-x86-64
 
 
 Disassembly of section .text:
 
 0000000000000000 <FT_Get_X11_Font_Format>:
    0:   48 85 ff                test   %rdi,%rdi
    3:   74 1a                   je     1f <FT_Get_X11_Font_Format+0x1f>
    5:   48 8b bf b0 00 00 00    mov    0xb0(%rdi),%rdi
    c:   48 8b 07                mov    (%rdi),%rax
    f:   48 8b 40 40             mov    0x40(%rax),%rax
   13:   48 85 c0                test   %rax,%rax
   16:   74 07                   je     1f <FT_Get_X11_Font_Format+0x1f>
   18:   be 00 00 00 00          mov    $0x0,%esi
                         19: R_X86_64_32 .rodata.str1.1
   1d:   ff e0                   jmpq   *%rax
   1f:   31 c0                   xor    %eax,%eax
   21:   c3                      retq
 
 I extracted the module from the library and ran 'objdump -dr' on it.
 It's the same.

That isn't PIC code, the PIC version is in ftxf86.pico
I've not checked the .a from a working build, but since a .so is
being generated it ought to contain the .pico versions.

I wonder how that is supposed to happen?
Maybe a parallel make happened to leave the wrong file lurking?

David

-- 
David Laight: da...@l8s.co.uk


Re: Recent new atf test failures

2014-03-26 Thread David Laight
On Wed, Mar 26, 2014 at 12:25:53PM -0700, Paul Goyette wrote:
 Some time in the last two weeks, we've had a few new test cases failing 
 in my amd64 test-bed.
 
 Tests that used to pass, but currently failing
 
   lib/csu/t_crt0/initfini3
   atf/atf-c/macros_test/detect_unused_tests
   atf/atf-c++/macros_test/detect_unused_tests
 
 Tests that currently fail, but don't seem to exist in older builds
 
   lib/libm/t_exp/exp2_powers
   lib/libm/t_exp/exp2_values

Those are some more extensive tests for exp2().

The exp2_powers tests are failing to generate an 'overflow' result.
They work for me on a real system - so it might be a qemu issue?

Maybe qemu is using the x87 fpu (with 80 bit precision) to emulate
the 64bit (and 32bit) SSE2 double (and float) maths - so the large
multiplies used to generate overflow fail.

Actually, I wonder, have you rebuilt qemu since joerg changed the
default x87 precision to 80bits?
That might be the difference between your tests and gson's tests
(which only show some minor errors for exp2f(7.7) and exp2f(8.8)).

The exp2_values test is showing up something odd in FP maths.
I've not changed the exp2f() code, but I'm seeing different
errors in my own testing (native on amd64) from earlier tests.
However it might just be that the allowed error is too small.

I've a local version of exp2() that uses the x87 'f2xm1' and 'fscale'
instructions on both i386 and amd64.
I do need to do a clock-count comparison for 'f2xm1', but I expect it
to be faster than the table lookup and 5th degree polynomial.
Intel claim these functions are monotonic, I bet the polynomial
version isn't.

David

-- 
David Laight: da...@l8s.co.uk


Re: Recent new atf test failures

2014-03-26 Thread David Laight
On Wed, Mar 26, 2014 at 02:57:15PM -0700, Paul Goyette wrote:
 On Wed, 26 Mar 2014, David Laight wrote:
 
 Actually, I wonder, have you rebuilt qemu since joerg changed the
 default x87 precision to 80bits?
 That might be the difference between your tests and gson's tests
 (which only show some minor errors for exp2f(7.7) and exp2f(8.8)).
 
 No, I have not updated my qemu recently (several months).

That change was somewhere near the end of last year.
The behaviour depends on the binutils version at the time the program
was linked.
There isn't a sysctl to force 64 or 80 bit modes.

David

-- 
David Laight: da...@l8s.co.uk


Re: fontconfig/freetype2 breaks amd64 build on netbsd-5/i386 host

2014-03-25 Thread David Laight
On Tue, Mar 25, 2014 at 04:25:36PM -0500, John D. Baker wrote:
 Following the updates/fixes to fontconfig/freetype2 in -current, building
 for amd64 target on my netbsd-5/i386 host consistently fails as follows:
 
 [...]
 --- libfontconfig.so.2.2 ---
 # build  src/libfontconfig.so.2.2
 rm -f libfontconfig.so.2.2
 /d0/build/current/tools/i386/bin/x86_64--netbsd-gcc  -Wl,-x -shared 
 -Wl,-soname,libfontconfig.so.2 -Wl,--warn-shared-textrel 
 -Wl,-Map=libfontconfig.so.2.map   --sysroot=/d0/build/current/DEST/amd64 
 -Wl,-rpath,/usr/X11R7/lib -L=/usr/X11R7/lib  -o libfontconfig.so.2.2  
 -Wl,-rpath-link,/d0/build/current/DEST/amd64/lib  -L=/lib  
 -Wl,--whole-archive libfontconfig_pic.a  -Wl,--no-whole-archive 
 -L/d0/build/current/obj/amd64/external/mit/expat/lib/libexpat -lexpat 
 -L/d0/build/current/obj/amd64/external/mit/xorg/lib/freetype -lfreetype 
 /d0/build/current/tools/i386/lib/gcc/x86_64--netbsd/4.8.3/../../../../x86_64--netbsd/bin/ld:
  /d0/build/current/DEST/amd64/usr/X11R7/lib/libfreetype.a(ftxf86.o): 
 relocation R_X86_64_32 against `.rodata.str1.1' can not be used when making a 
 shared object; recompile with -fPIC
 /d0/build/current/DEST/amd64/usr/X11R7/lib/libfreetype.a: could not read 
 symbols: Bad value
 collect2: error: ld returned 1 exit status
 *** [libfontconfig.so.2.2] Error code 1
 nbmake[9]: stopped in /x/current/src/external/mit/xorg/lib/fontconfig/src
 1 error

Can you find the command line used to compile ftxf86.o ?
and/or extract the object file from the library and feed it through
'objdump -dr' to find the relocation (and to see if it looks
like PIC code at all).

David

-- 
David Laight: da...@l8s.co.uk


Re: i386 and amd64 AVX support

2014-03-12 Thread David Laight
On Tue, Mar 11, 2014 at 08:51:06PM +0000, Alexander Nasonov wrote:
 David Laight wrote:
  I've committed code to the amd64 and i386 kernels that enables
  AVX for userspace.
  In particular the high ymm registers should be saved on context switches.
  
  Any additional testing would be welcome.
 
 Thanks for working on it. I resumed playing with avx instructions
 and I haven't found any problem so far.

Still some stuff to tidy up.
Mostly:
- make avx registers available to signal handlers.
- and to process core dumps
- add to ptrace for gdb.

The process core dump code is particularly problematical (especially
for cpus that support avx512) since it relies on several on-stack
copies of the fpu state (over 2k with avx512).

Might require major rework of the core dump code (made more complicated
by the requirement to be able to write core dumps to pipes).

David

-- 
David Laight: da...@l8s.co.uk


Re: posix_memalign conflict between /usr/include files

2014-03-08 Thread David Laight
On Sat, Mar 08, 2014 at 05:17:33PM +0100, Martin Husemann wrote:
 On Sat, Mar 08, 2014 at 10:35:03PM +0900, Ryo ONODERA wrote:
  How to handle this issue?
 
 The throw() needs to be removed.

I remember a discussion about this before.
But I can't remember what the throw() is about - especially on
a function with a C interface.

David

-- 
David Laight: da...@l8s.co.uk


Re: Porting DTrace to ARM

2014-03-06 Thread David Laight
On Thu, Mar 06, 2014 at 03:34:18PM +0900, Ryota Ozaki wrote:
 On Thu, Mar 6, 2014 at 2:21 PM, Masao Uebayashi uebay...@gmail.com wrote:
  Ah.  I misread that schedstate_percpu uses percpu(9)'s fast path,
  which doesn't exist...
 
  Anyway if it's assumed that cpu is not attached at run-time, assigning
  struct cpu_data::void *cpu_dtraceinfo at module attachment would be
  just fine.
 
 void * is probably good, otherwise we have to pull out structure definitions
 (ok, there are two: solaris_cpu_t and cpu_core_t) from external/cddl.
 opensolaris_init in external/cddl/osnet/sys/kern/opensolaris.c
 is a good place to assign, I think.

Using 'void *' causes problems with knowing which pointer is valid
for a given call.
There is no problem using 'struct foo *' without the contents of
'struct foo' being visible.
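
A minimal sketch of the point, with hypothetical names throughout
(dtrace_cpu_info, cpu_data_sketch and the functions are made up for
illustration, they are not the structures discussed above):

```c
#include <assert.h>
#include <stddef.h>

/* "Header" view: the type is declared but its contents are not visible. */
struct dtrace_cpu_info;

struct cpu_data_sketch {
	struct dtrace_cpu_info *cpu_dtraceinfo;	/* typed, yet opaque */
};

/* "Module" view: only here is the structure actually defined. */
struct dtrace_cpu_info {
	int probes_fired;
};

void
dtrace_attach_sketch(struct cpu_data_sketch *cd, struct dtrace_cpu_info *info)
{
	assert(cd != NULL);
	/* The compiler diagnoses any other pointer type being stored here. */
	cd->cpu_dtraceinfo = info;
}

int
dtrace_probes_sketch(const struct cpu_data_sketch *cd)
{
	return cd->cpu_dtraceinfo != NULL ? cd->cpu_dtraceinfo->probes_fired : 0;
}

/* Hypothetical self-check tying the two views together. */
int
opaque_pointer_demo(void)
{
	struct dtrace_cpu_info info = { 3 };
	struct cpu_data_sketch cd = { NULL };

	dtrace_attach_sketch(&cd, &info);
	return dtrace_probes_sketch(&cd);
}
```

Code that only sees the first two declarations can store and pass the
pointer around, it just can't look inside.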

David

-- 
David Laight: da...@l8s.co.uk


i386 and amd64 AVX support

2014-02-26 Thread David Laight
I've committed code to the amd64 and i386 kernels that enables
AVX for userspace.
In particular the high ymm registers should be saved on context switches.

Any additional testing would be welcome.
At the moment there is no support for gdb and the ymm registers are not
written to core dumps, nor available to signal handlers.

Note that the ymm registers are caller-saved, so their contents are
'don't care' across all system calls. Context switches from interrupt
routines are therefore needed to actually test whether they are saved properly.

The code should also support the upcoming AVX-512, although stealing
another 2k from the kernel stack might cause problems!

David

-- 
David Laight: da...@l8s.co.uk


Re: Build break for port-hppa

2014-02-26 Thread David Laight
On Wed, Feb 26, 2014 at 01:13:34PM -0800, Paul Goyette wrote:
 Ooops - hit send too soon...
 
 With sources updated on 2014-02-26 at 15:29:31 UTC
 
 #create  ramdisk/ramdisk.fs
 Calculated size of `ramdisk.fs.tmp': 2560000 bytes, 1436 inodes
 Extent size set to 4096
 ramdisk.fs.tmp: 2.4MB (5000 sectors) block size 4096, fragment size 512
using 1 cylinder groups of 2.44MB, 625 blks, 1664 inodes.
 super-block backups (for fsck -b #) at:
 32,nbmakefs: Writing inode 1415 (work/./usr/mdec/boot), bytes 36864 + 4096:
 No space left on device
 Populating `ramdisk.fs.tmp'

Which architecture ?

David

-- 
David Laight: da...@l8s.co.uk


Re: 6.99.32: panic when starting X

2014-02-23 Thread David Laight
On Sun, Feb 23, 2014 at 09:56:55PM +0100, Thomas Klausner wrote:
 On Sun, Feb 23, 2014 at 10:34:32AM +0000, Nick Hudson wrote:
  On 02/23/14 09:41, Thomas Klausner wrote:
 
  Also, x/i in ddb/crash that address and show registers
 
 (gdb) x/i  usb_allocmem_flags+0x6c
    0x808dbe2c <usb_allocmem_flags+108>: cmp    %rbx,(%rcx)
 
  I assume usb_allocmem_flags+0x6c is 0x808dbe2c
 
 Correct!
 
 Does this help?
 
 I have the kernel (without symbols) and the crash dump if you want to
 know more or look at it.

The kernels I've built don't have a 'cmp' instruction anywhere near
that offset in usb_allocmem_flags.
The function isn't that big, so if you run 'objdump -d /netbsd > netbsd.dis'
and search for the function body you'll only have about 120 lines.
I can usually work out the source lines from that.
(gdb's 'disas usb_allocmem_flags' probably gives the same lines.)

David

-- 
David Laight: da...@l8s.co.uk


Re: 6.99.32: panic when starting X

2014-02-23 Thread David Laight
On Sun, Feb 23, 2014 at 10:26:21PM +0000, David Laight wrote:
 On Sun, Feb 23, 2014 at 09:56:55PM +0100, Thomas Klausner wrote:
  On Sun, Feb 23, 2014 at 10:34:32AM +0000, Nick Hudson wrote:
   On 02/23/14 09:41, Thomas Klausner wrote:
  
   Also, x/i in ddb/crash that address and show registers
  
  (gdb) x/i  usb_allocmem_flags+0x6c
     0x808dbe2c <usb_allocmem_flags+108>: cmp    %rbx,(%rcx)
  
   I assume usb_allocmem_flags+0x6c is 0x808dbe2c
  
  Correct!
  
  Does this help?
  
  I have the kernel (without symbols) and the crash dump if you want to
  know more or look at it.
 
 The kernels I've built don't have a 'cmp' instruction anywhere near
 that offset in usb_allocmem_flags.
 The function isn't that big, so if you run 'objdump -d /netbsd > netbsd.dis'
 and search for the function body you'll only have about 120 lines.
 I can usually work out the source lines from that.
 (gdb's 'disas usb_allocmem_flags' probably gives the same lines.)

Thomas sent me the disassembly.
It 'blew up' dereferencing block->tag in the loop:

1.53  mrg       313:    mutex_enter(&usb_blk_lock);
1.1   augustss  314:    /* Check for free fragments. */
1.44  matt      315:    LIST_FOREACH(f, &usb_frag_freelist, next) {
1.48  matt      316:        KDASSERTMSG(usb_valid_block_p(f->block, &usb_blk_fraglist),
1.50  jym       317:            "%s: usb frag %p: unknown block pointer %p",
                318:            __func__, f, f->block);
1.1   augustss  319:        if (f->block->tag == tag)
                320:            break;
1.41  matt      321:    }

I'd guess a 'use after free' or 'allocate too short a buffer'.

David

-- 
David Laight: da...@l8s.co.uk


Re: updates to ls(1), output, and Emacs dired mode

2014-02-22 Thread David Laight
On Sat, Feb 22, 2014 at 09:40:38PM +0000, Patrick Welche wrote:
 On Sat, Feb 22, 2014 at 09:55:48PM +0900, Ryo ONODERA wrote:
  From: chris...@astron.com (Christos Zoulas), Date: Fri, 21 Feb 2014 
  02:11:36 +0000 (UTC)
  
   In article 
   cabfrot8bczo+czrp-tffrc3j-qjdcp1grdkcjnujpq_jojt...@mail.gmail.com,
   B Harder  brad.har...@gmail.com wrote:
  I suspect that the recent changes to ls have affected its output,
  which affects Emacs dired mode (it parses ls output).
  
  1) Am I correct output has changed?
  2) if yes, is this expected behaviour?
   
   No, output should not have changed unless the new options are used.
  
  With ls.c 1.71, output of ls -w is broken.
  
  /usr/src/bin/ls% LANG=C ./ls -w
  . . . . . .
  . . . . . .
  . . . . .
 
 
 I noticed that as ls | more giving a different result to ls.

ls | more implies ls -1 | more.

I'm not sure you can actually get the terminal output into a file
(without using something like script).
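
The choice is made by asking whether stdout is a terminal. A simplified
sketch of that decision, not the code from ls.c; both function names are
hypothetical:

```c
#include <unistd.h>

/*
 * Non-zero if a listing written to `fd` should default to one entry per
 * line, i.e. when `fd` is not a terminal.
 */
int
single_column_default(int fd)
{
	return !isatty(fd);
}

/* Hypothetical self-check: the write end of a pipe is never a terminal. */
int
pipe_gets_single_column(void)
{
	int fds[2], r;

	if (pipe(fds) == -1)
		return -1;
	r = single_column_default(fds[1]);
	close(fds[0]);
	close(fds[1]);
	return r;
}
```

That is why anything reading the output through a pipe sees a different
layout from the one on the screen.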

David

-- 
David Laight: da...@l8s.co.uk


Re: amd64 build broken - npx.h not marked obsolete

2014-02-13 Thread David Laight
On Thu, Feb 13, 2014 at 07:28:07AM -0800, Paul Goyette wrote:
 With up-to-date sources I'm getting
 
 ==  1 missing files in DESTDIR  
 Files in flist but missing from DESTDIR.
 File wasn't installed ?
 --
 ./usr/include/i386/npx.h
   end of 1 missing files  ==
 
 Should this file be marked obsolete in src/distrib/sets/lists/comp/md.i386 
 and /md.amd64 ?

I'd obsoleted it for i386, I'd not realised it was released for amd64.
Marked obsolete now.

It must be possible to avoid having to edit so many files...

David

-- 
David Laight: da...@l8s.co.uk


Re: Dozens of new test failures on amd64!

2014-02-12 Thread David Laight
On Wed, Feb 12, 2014 at 09:59:12AM -0800, Paul Goyette wrote:
 It seems to correspond with the recent changes/commits to atf ...
 
 We used to have 11 test failures for amd64, now we have 65!
 
 Please see [1] for details...
 
 [1] http://whooppee.com/amd64-results/6327_1_atf.html#failed-tcs-summary

Something strange happens with this on my system as well.

*** Check failed: /test-bed/src/tests/lib/libm/t_fmod.c:53: fabs(fmod(1.0, 0.1) 
- 0.1) <= 55 * DBL_EPSILON not met

It might be my fault! I've been fiddling with the fpu code.

Except that it works on a bare-metal kernel I built 8pm on Sunday
just before committing the code but fails under qemu.
(Running on the same kernel.)

Mind you the generated code is very strange!
Ah that is because it uses the x87's 'partial remainder' instruction
in a loop, and under some other conditions falls back on the fmod()
library function.

David

-- 
David Laight: da...@l8s.co.uk


Re: Another 6.99.31 amd64 panic

2014-02-11 Thread David Laight
On Tue, Feb 11, 2014 at 05:28:11PM +0000, Christos Zoulas wrote:
 In article 
 CAG0OUxizzaDgjffmfKU1tSiPwYLi-+AUS+98mNgv=e6oqkc...@mail.gmail.com,
 Chavdar Ivanov  ci4...@gmail.com wrote:
 Same with a kernel from today.
 
 Chavdar
 
 On 10 February 2014 16:38, Chavdar Ivanov ci4...@gmail.com wrote:
  From a build at 2014/02/09 14:29 I get:
 
  ...
  boot device: raid0
  root on raid0a dumps on raid0b
  root file system type: ffs
  uvm_fault(0xfe8006d1ce60, 0x0, 4) -> e
  uvm_fault(0xfe8006d1ce60, 0x0, 4) -> e
  fatal page fault in supervisor mode
  trap type 6 code 0 rip 807d428e cs 8 rflags 10246 cr2 0 ilevel
  0 rsp fe8006d09560
  curlwp 0xfe8006d2fa00 pid 1.1 lowest kstack 0xfe8006d06000
  kernel: page fault trap, code=0
  Stopped in pid 1.1 (init) at   netbsd:trap+0x99b:  movzwl   
  0(%rax),%eax
  db{1} bt
  trap() at netbsd:trap+0x99b
  --- trap (number 6) ---
  ?() at 0
  execve_loadvm() at netbsd:execve_loadvm+0x1d6
  execve1() at netbsd:execve1+0x2d
  start_init() at netbsd:start_init+0x2a7
  db{1}
 
         movq    256(%rbx), %rdx
         movq    %rbx, %rsi
         movq    -88(%rbp), %rdi
         call    check_exec
 ->      movl    %eax, %r13d
         testl   %eax, %eax
 
 That does not look correct, can you use objdump --disassemble on kern_exec.o
 then compile kern_exec.c changing on the compile line s/-c/-S -gstabs/ and
 see which source line corresponds to your failing instruction by matching
 the offset from kern_exec.o to the instruction in kern_exec.s and then finding
 the source line to kern_exec.c?

objdump -r -d kern_exec.o will lookup the relocations for you.
But I'd guess that the backtrace has missed a function and the fault
is somewhere inside check_exec().
The address 807d428e printed by the fault code is probably correct.
Try 'objdump -d /netbsd' and sort out which function it is in.

David

-- 
David Laight: da...@l8s.co.uk


Re: Automated report: NetBSD-current/i386 build failure

2014-02-11 Thread David Laight
On Tue, Feb 11, 2014 at 10:37:03PM +0000, NetBSD Test Fixture wrote:
 This is an automatically generated notice of a NetBSD-current/i386
 build failure.
 
 The failure occurred on babylon5.NetBSD.org, a NetBSD/amd64 host,
 using sources from CVS date 2014.02.11.20.17.16.
 
 An extract from the build.sh output follows:
 File is obsolete or flist is out of date ?
 --
 ./usr/include/x86/fpu.h
 =  end of 1 extra files  ===

Gah, I forgot that would creep into the i386 build already.
I'll add it.

David

-- 
David Laight: da...@l8s.co.uk


Re: Re: compat linux exec arguments weirdness

2014-02-09 Thread David Laight
On Sun, Feb 09, 2014 at 08:41:56PM +0100, Onno van der Linden wrote:
 On Sun, Feb 09, 2014 at 08:45:40AM -0800, Chuck Silvers wrote:
   Looks like the implementation of AT_RANDOM messes up the
   argument stack (at least for the elf32 case, can't
   test the amd64 case myself).
  
  this should be fixed now, please update and give it a try.
 
 Works! Thanks very much, you can close the PR as far as I'm concerned.
 
 And now on to that firefox 27 compile error . :-)

If that is the one to do with fxsave64, I've committed a fix.

David

-- 
David Laight: da...@l8s.co.uk


Re: kernel crashes because crypto unloading?

2014-01-19 Thread David Laight
On Sun, Jan 19, 2014 at 09:49:42AM -0800, Paul Goyette wrote:
 On Sun, 19 Jan 2014, Paul Goyette wrote:
 
 I would have expected config_cfdata_detach() to fail (with EBUSY) if the 
 device was still open by someone.  So I'm not sure who/what still owns 
 allocations from the module's memory pool.
 
 Hmmm, I guess I misunderstood something.  It seems that there is no 
 protection against detaching a device even when it is currently open.
 
 A quick-and-dirty program that simply opens /dev/crypto and sleeps shows 
 that the module gets unloaded.
 
 I'm not sure at this point if the crypto(4) driver should implement a 
 ref-count, or if a more generic solution should be created within the 
 autoconf(9) framework.

The module can't do its own refcounting.
Think about what happens in the 'close' code on the last close.
The driver will decrement the ref count to zero.
The process gets pre-empted.
The driver gets unloaded.
The process resumes - executing driver code that is no longer loaded.

open/close (well probably the vnode) needs to hold a reference count
against the device.

There is a another race as well.
If a loadable kernel module creates a kernel thread, then it has to
request a module reference for that thread.
When the thread exits it must do so by calling into the kernel requesting
that the thread exit AND that the module reference count be reduced.

Most of the time you should be able to assume that the code calling
into the module holds a reference (possibly indirectly) that ensures
the module won't go away.

David

-- 
David Laight: da...@l8s.co.uk


Re: bootxx_ffsv1 compilation failure on amd64

2014-01-16 Thread David Laight
On Wed, Jan 15, 2014 at 02:37:17PM -0800, crazzybouy wrote:
 Hi All
 I stumbled upon this post while looking for ways to recompile bootxx_ffsv1. 
 I need to put some prints to the boot loader to debug an issue for loading
 netbsd kernel with ramdisk size bigger than 16mb that does not work for me
 on an AMD64 board.

Does it work for ramdisk + kernel > 15MB at all?
IIRC /boot is loaded at 64k (with a limit of 640k) and the kernel is
loaded at 1M.

The BIOS calls used to read the disk use a 16bit real mode seg:off address
so can only generate 20bit addresses - so a 16MB limit.
Loading any higher would require a low memory 'bounce' buffer.

David

-- 
David Laight: da...@l8s.co.uk


Re: evtchn_do_event: handler...didn't lower ipl (Was: Re: xl or xm for xen)

2013-12-05 Thread David Laight
On Tue, Dec 03, 2013 at 08:55:06AM +0700, Robert Elz wrote:
 
 When statclock() - and hardclock() before it - is (or are) called, the
 cpu (apparently) already holds a (spin) mutex (the ci_mtx_count field of
 the cpu_info struct is -1).  Given that, and the way spin mutexes work,
 statclock() (and then hardclock()) must return with the ipl higher.

I'd have thought that acquiring a mutex would increase the count.
So a count of -1 would indicate an extra release.
Or does this counter have silly values?

David

-- 
David Laight: da...@l8s.co.uk


Re: ld.elf_so i386 memcpy corruption - calligrawords hangs

2013-10-17 Thread David Laight
On Thu, Oct 17, 2013 at 10:45:08AM +0200, Martin Husemann wrote:
 You could uncomment the following lines in the src/libexec/ld.elf_so/Makefile
 
 #CPPFLAGS+= -DDEBUG
 #CPPFLAGS+= -DRTLD_DEBUG
 
 (re-)build and install ld.elf_so, and set LD_DEBUG=1 when starting the 
 program.
 

Better is to link your program with the alternate elf interpreter name.
Then you don't affect anything else.
If the filenames are the same length you should be able to find the
string in the elf program and patch it (technically it is a shared
string - but it is unlikely to be used twice).
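
The patching itself is a same-length overwrite. A sketch, assuming the ELF
image has already been read into memory; patch_string(), patch_demo() and
the replacement name ld.elf_db are all hypothetical. A proper tool would
locate the PT_INTERP segment instead of searching the whole file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Overwrite the first occurrence of the NUL-terminated string `old` in
 * buf[0..len) with `repl`. Both must have the same length so that nothing
 * after the string moves. Returns the offset patched, or -1 if not found.
 */
long
patch_string(unsigned char *buf, size_t len, const char *old, const char *repl)
{
	size_t n = strlen(old) + 1;	/* include the terminating NUL */
	size_t i;

	assert(strlen(repl) == strlen(old));
	for (i = 0; i + n <= len; i++) {
		if (memcmp(buf + i, old, n) == 0) {
			memcpy(buf + i, repl, n);
			return (long)i;
		}
	}
	return -1;
}

/* Hypothetical self-check on a fake 'image' holding an interpreter name. */
int
patch_demo(void)
{
	unsigned char image[] = "\177ELF..../usr/libexec/ld.elf_so\0.text";
	const char *alt = "/usr/libexec/ld.elf_db";
	long off;

	off = patch_string(image, sizeof(image), "/usr/libexec/ld.elf_so", alt);
	if (off < 0)
		return -1;
	return strcmp((const char *)image + off, alt) == 0 ? (int)off : -1;
}
```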

David

-- 
David Laight: da...@l8s.co.uk


Re: link problems

2013-10-11 Thread David Laight
On Fri, Oct 11, 2013 at 10:14:55AM +0200, Martin Husemann wrote:
 On Thu, Oct 10, 2013 at 06:42:54PM +0200, Martin Husemann wrote:
  You are right, but I can't find the initialization ;-)
 
 It is a bit hidden, but I think the patch below should do it - modulo
 the open question what defaults exactly we want changed.
 
 Joerg, do you mean to enable add_DT_NEEDED_for_regular as well by default?
 Do we have some simple test case for the whole issue?
 
 Martin

...
 +  input_flags.add_DT_NEEDED_for_dynamic = TRUE;
...

What does that change do?

If you link a program with -lcurses you don't want a DT_NEEDED entry
for libtermcap.so whether or not the program directly references
anything in libtermcap.so.

David

-- 
David Laight: da...@l8s.co.uk