from:"Andriy Gapon"

taskqueue_drain_all

2013-10-09 Thread Andriy Gapon


I would like to propose to extend taskqueue API with taskqueue_drain_all.
A potential use case: I have a private taskqueue, several kinds of tasks get
executed via it and then I want to make sure that all of them are completed.
Obviously, I have a way to ensure that no new ones get enqueued.

Is this a good addition?
Or should like consider looping over all of my tasks externally to taskqueue
implementation?

A quick prototype:

--- a/sys/kern/subr_taskqueue.c
+++ b/sys/kern/subr_taskqueue.c
@@ -316,6 +316,7 @@ taskqueue_run_locked(struct taskqueue *queue)
wakeup(task);
}
TAILQ_REMOVE(queue-tq_active, tb, tb_link);
+   wakeup(queue-tq_active);
 }

 void
@@ -402,6 +403,22 @@ taskqueue_drain(struct taskqueue *queue, struct task *task)
 }

 void
+taskqueue_drain_all(struct taskqueue *queue)
+{
+   struct task *task;
+
+   TQ_LOCK(queue);
+   while ((task = STAILQ_FIRST(queue-tq_queue)) != NULL) {
+   while (task-ta_pending != 0 || task_is_running(queue, task))
+   TQ_SLEEP(queue, task, queue-tq_mutex, PWAIT, -, 0);
+   }
+   while (TAILQ_FIRST(queue-tq_active) != NULL)
+   TQ_SLEEP(queue, queue-tq_active,
+   queue-tq_mutex, PWAIT, -, 0);
+   TQ_UNLOCK(queue);
+}
+
+void
 taskqueue_drain_timeout(struct taskqueue *queue,
 struct timeout_task *timeout_task)
 {


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: what's going on after upgrade to svn-latest 9.

2013-10-01 Thread Andriy Gapon

on 01/10/2013 22:29 Alexey Dokuchaev said the following:
 On Fri, Jun 21, 2013 at 01:17:31PM +0200, Wojciech Puchar wrote:
 i'm getting this on my computer.
 disk seems OK, smart shows nothing, dd reads whole partition without
 problem.

 no other errors from disk itself (AHCI timeout or so).

 started exactly after i rebooted with new kernel. everything
 unchanged. userland is in sync

 thanks for help

 g_vfs_done():ada0s3d.eli[WRITE(offset=30653186048,
 length=1048576)]error = 11
 
 i've just tried recent 10.0-alpha4 on my laptop (normally running 8-stable)
 and start to see this shit as well.  smart does not report anything wrong
 with my disks, and 8-stable never yielded anything like these messages, so
 it looks like some kind of false alarm.  any clues?
 

It looks like the error comes from unmapped I/O code in GEOM and is caused by
KVA shortage:
if (vmem_alloc(transient_arena, size, M_BESTFIT | M_NOWAIT, addr)) {
if (transient_map_retries != 0 
retried = transient_map_retries) {
g_io_deliver(bp, EDEADLK/* XXXKIB */);

You could try to experiment with kern.geom.transient_map_retries

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-30 Thread Andriy Gapon

on 30/08/2013 11:17 Patrick said the following:
 H...
 
 (kgdb) list *vdev_mirror_child_select+0x67
 No symbol table is loaded.  Use the file command.
 
 Do I need to build the kernel from source myself? This kernel is what
 freebsd-update installed during part 1 of the upgrade.

I don't have an exact recollection of what is installed by freebsd-update - are
*.symbols files installed?

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Fatal trap 12 going from 8.2 to 8.4 with ZFS

2013-08-29 Thread Andriy Gapon

on 29/08/2013 19:37 Patrick said the following:
 I've got a system running on a VPS that I'm trying to upgrade from 8.2
 to 8.4. It has a ZFS root. After booting the new kernel, I get:
 
 Fatal trap 12: page fault while in kernel mode
 cpuid = 0; apic id = 00
 fault virtual address   = 0x40
 fault code  = supervisor read data, page not present
 instruction pointer = 0x20:0x810d7691
 stack pointer   = 0x28:0xff81ba60
 frame pointer   = 0x28:0xff81ba90
 code segment= base 0x0, limit 0xf, type 0x1b
 = DPL 0, pres 1, long 1, def32 0, gran 1
 processor eflags= interrupt enabled, resume, IOPL = 0
 current process = 1 (kernel)
 trap number = 12
 panic: page fault
 cpuid = 0
 KDB: stack backtrace:
 #0 0x8066cb96 at kdb_backtrace+0x66
 #1 0x8063925e at panic+0x1ce
 #2 0x809c21d0 at trap_fatal+0x290
 #3 0x809c255e at trap_pfault+0x23e
 #4 0x809c2a2e at trap+0x3ce
 #5 0x809a9624 at calltrap+0x8
 #6 0x810df517 at vdev_mirror_child_select+0x67

If possible, please run 'kgdb /path/to/8.4/kernel' and then in kgdb do 'list
*vdev_mirror_child_select+0x67'

 #7 0x810dfacc at vdev_mirror_io_start+0x24c
 #8 0x810f7c52 at zio_vdev_io_start+0x232
 #9 0x810f76f3 at zio_execute+0xc3
 #10 0x810f77ad at zio_wait+0x2d
 #11 0x8108991e at arc_read+0x6ce
 #12 0x8109d9d4 at dmu_objset_open_impl+0xd4
 #13 0x810b4014 at dsl_pool_init+0x34
 #14 0x810c7eea at spa_load+0x6aa
 #15 0x810c90b2 at spa_load_best+0x52
 #16 0x810cb0ca at spa_open_common+0x14a
 #17 0x810a892d at dsl_dir_open_spa+0x2cd
 Uptime: 3s
 Cannot dump. Device not defined or unavailable.
 
 I've booted back into the 8.2 kernel without any problems, but I'm
 wondering if anyone can suggest what I should try to get this working?
 I used freebsd-update to upgrade, and this was after the first
 freebsd-update install where it installs the kernel.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [kde-freebsd] virtualbox file dialog problem

2013-08-28 Thread Andriy Gapon

on 18/07/2013 10:06 Andriy Gapon said the following:
 on 18/07/2013 03:25 Greg Rivers said the following:
 On Wed, 17 Jul 2013, Andriy Gapon wrote:

 I run virtualbox in KDE environment.  A while ago (can't say exactly when) I
 started to have a problem where any file opening dialog would fail with this
 message: Cannot talk to klauncher: Not connected to D-Bus server

 I found that setting KDE_FORK_SLAVES=1 in environment works around the 
 problem.

 I reported this same problem in this[1] thread on freebsd-ports@.  In that 
 post
 I provided a link to a similar report for KDE on openSUSE that required a 
 dbus
 patch to fix.

 I'm guessing that either the latest versions of VirtualBox have a bug in 
 their
 dbus interface, or the version of dbus we have needs to be updated.

 [1] http://lists.freebsd.org/pipermail/freebsd-ports/2013-July/084783.html
 
 I saw those OpenSUSE reports but I think that they were against the much older
 version of dbus.

I have done some more investigation and the problems turns out to be dbus
related indeed.

The problem has only a tangential relation to KDE, so I plan to drop kde@ from
this thread.  It has a relation to what VirtualBox does, so I am keeping
emulation@.  It is related to dbus and gnome@ is its maintainer(s).  It is also
related to how issetugid(2) works, so I am including standards@, security@ and
hackers@.
So, please excuse me for such a wide distribution list, but I think that the
solution should be negotiated among the parties involved.

Now a description of the problem.

1. VirtualBox executable is installed setuid root.  Apparently, when it is run
it does some privileged things and then drops all of the uids and gids (real,
effective and saved) back to what they should have been originally.
VirtualBox does not do any (re-)exec of itself after the above manipulations.

2. issetugid(2) (which is apparently a BSD extension) on FreeBSD does not
consider the above manipulations as sufficient to mark an executable as
untainted.  So it would return 1 for the VirtualBox process.

3. dbus code seems to impose some limitations on communication by such tainted
processes.  It has the following code:
http://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c#n4139
For web-impaired :) the gist is that on BSD systems the code uses issetugid but
on other systems (like Linux) it uses getresuid and getresgid and checks that
all 3 uids are the same and all 3 gids are the same.

As a result, on FreeBSD the dbus code would consider the VirtualBox process
tainted and that impairs its communication with KDE components.
On systems without issetugid or those that implement it differently, dbus would
work as for a normal process and all the communications are OK.

I've also verified this conclusion by forcing dbus to use the alternative logic
on FreeBSD.

So, possible solutions:

A. change how issetugid(2) works on FreeBSD; a comment in sys_issetugid hints
that other BSDs may have different behaviors
B. change VirtualBox to be friendly to FreeBSD issetugid(2) and exec itself
after dropping the privileges
C. patch dbus port to not use issetugid(2)
D. something else

What do you guys think?

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [kde-freebsd] virtualbox file dialog problem

2013-08-28 Thread Andriy Gapon

on 28/08/2013 15:09 Andriy Gapon said the following:
 Now a description of the problem.
 
 1. VirtualBox executable is installed setuid root.  Apparently, when it is run
 it does some privileged things and then drops all of the uids and gids (real,
 effective and saved) back to what they should have been originally.
 VirtualBox does not do any (re-)exec of itself after the above manipulations.
 
 2. issetugid(2) (which is apparently a BSD extension) on FreeBSD does not
 consider the above manipulations as sufficient to mark an executable as
 untainted.  So it would return 1 for the VirtualBox process.
 
 3. dbus code seems to impose some limitations on communication by such 
 tainted
 processes.  It has the following code:
 http://cgit.freedesktop.org/dbus/dbus/tree/dbus/dbus-sysdeps-unix.c#n4139
 For web-impaired :) the gist is that on BSD systems the code uses issetugid 
 but
 on other systems (like Linux) it uses getresuid and getresgid and checks that
 all 3 uids are the same and all 3 gids are the same.
 
 As a result, on FreeBSD the dbus code would consider the VirtualBox process
 tainted and that impairs its communication with KDE components.
 On systems without issetugid or those that implement it differently, dbus 
 would
 work as for a normal process and all the communications are OK.
 
 I've also verified this conclusion by forcing dbus to use the alternative 
 logic
 on FreeBSD.
 
 So, possible solutions:
[snip]
 B. change VirtualBox to be friendly to FreeBSD issetugid(2) and exec itself
 after dropping the privileges
[snip]

BTW, I've just found this interesting code in the VirtualBox sources (forgive
me a full paste, but I couldn't resist):

#if defined(RT_OS_DARWIN)
# include dlfcn.h
# include sys/mman.h
# include iprt/asm.h
# include iprt/system.h

/** Really ugly hack to shut up a silly check in AppKit. */
static void ShutUpAppKit(void)
{
/* Check for Snow Leopard or higher */
char szInfo[64];
int rc = RTSystemQueryOSInfo (RTSYSOSINFO_RELEASE, szInfo, sizeof(szInfo));
if (   RT_SUCCESS (rc)
 szInfo[0] == '1') /* higher than 1x.x.x */
{
/*
 * Find issetguid() and make it always return 0 by modifying the code.
 */
void *addr = dlsym(RTLD_DEFAULT, issetugid);
int rc = mprotect((void *)((uintptr_t)addr  ~(uintptr_t)0xfff), 0x2000,
PROT_WRITE|PROT_READ|PROT_EXEC);
if (!rc)
ASMAtomicWriteU32((volatile uint32_t *)addr, 0xccc3c031); /* xor
eax, eax; ret; int3 */
}
}
#endif /* DARWIN */


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: DTrace copyin with struct doesn't work?

2013-07-24 Thread Andriy Gapon

on 24/07/2013 21:13 Yuri said the following:
 This simple .d script fails:
 
 ---script begin---
 #!/usr/sbin/dtrace -s
 
 struct my_args {
int ii;
 };
 
 fbt::sys_select:entry
 {
   printf(sys_select %i, ((struct my_args*)copyin(arg1, sizeof (struct
 my_args)))-ii);
 }
 ---script end---
 
 dtrace: error on enabled probe ID 1 (ID 33598: fbt:kernel:sys_select:entry):
 invalid address (0xff82ff0799d8) in action #1 at DIF offset 40
 dtrace: error on enabled probe ID 1 (ID 33598: fbt:kernel:sys_select:entry):
 invalid address (0xff82fefb19d8) in action #1 at DIF offset 40
 
 Function sys_select is defined in kern/sys_generic.c:
 int
 sys_select(struct thread *td, struct select_args *uap)

From sys_select code it is clear that uap points to something that is already
copied in.  Unlike some fields within select_args.

 arg1 in DTrace script should correspond to uap argument of sys_select, and
 dereferencing should always produce an int.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: cpufreq not working as module on i386/amd64

2013-07-05 Thread Andriy Gapon

on 04/07/2013 08:37 Jia-Shiun Li said the following:
 ok anyone can help test and review this patch?

I can not bless this change, but I won't argue against it either.

My opinion is still that OS should advertise to ACPI the capabilities that it
actually has not those that it potentially may have.  So I prefer the status
quo.  I think that this is a minor issue and cpufreq should just be in kernel,
and that's it.

 It will allow cpufreq to be removed from kernel conf, loaded and
 function correctly as kernel module. I've tested it ok on my own
 i5-3450.
 
 It removes get_features method definition from acpi_if.m and
 corresponding implementations from est, p4tcc,  hwpstate. Feature
 flags are set directly in acpi_cpu.c omitting previous procedure of
 querying cpufreq drivers.
 
 
 After this, I'd like to find some ways to feed CPU loading info
 directly in kernel to cpufreq for finer  quicker control of
 frequencies.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

hwpmc with opteron 6128

2013-05-30 Thread Andriy Gapon


I am trying to do a very basic thing with hwpmc on this CPU:

CPU: AMD Opteron(tm) Processor 6128 (1999.05-MHz K8-class CPU)
hwpmc: SOFT/16/64/0x67INT,USR,SYS,REA,WRI TSC/1/64/0x20REA
K8/4/48/0x1ffINT,USR,SYS,EDG,THR,REA,WRI,INV,QUA

What I am trying is:
$ pmcstat -T -S instructions

What I am getting is just:
PMC: [FR_RETIRED_X86_INSTRUCTIONS] Samples: 0 (0.0%) , 0 unresolved

and nothing else on the screen.

Has anyone had a success with this class of processors?
Should it be supported?
Any ideas/suggestions/hints?

P.S. pmccontrol -L reports a whole bunch of K8 counters, just a small random
sub-sample:
BU_FILL_INTO_L2
IC_FETCH
IC_MISS
IC_REFILL_FROM_L2
IC_REFILL_FROM_SYSTEM
IC_L1_ITLB_MISS_AND_L2_ITLB_HIT
IC_L1_ITLB_MISS_AND_L2_ITLB_MISS
IC_MICROARCHITECTURAL_RESYNC_BY_SNOOP
IC_INSTRUCTION_FETCH_STALL
IC_RETURN_STACK_HIT
IC_RETURN_STACK_OVERFLOW
FR_RETIRED_X86_INSTRUCTIONS
FR_RETIRED_UOPS
FR_RETIRED_BRANCHES
FR_RETIRED_BRANCHES_MISPREDICTED
FR_RETIRED_TAKEN_BRANCHES
FR_RETIRED_TAKEN_BRANCHES_MISPREDICTED

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: hwpmc with opteron 6128

2013-05-30 Thread Andriy Gapon

on 30/05/2013 18:21 Andriy Gapon said the following:
 
 I am trying to do a very basic thing with hwpmc on this CPU:
 
 CPU: AMD Opteron(tm) Processor 6128 (1999.05-MHz K8-class CPU)
 hwpmc: SOFT/16/64/0x67INT,USR,SYS,REA,WRI TSC/1/64/0x20REA
 K8/4/48/0x1ffINT,USR,SYS,EDG,THR,REA,WRI,INV,QUA

I didn't realize that the system was running in a VM.
Sorry for the noise.

 What I am trying is:
 $ pmcstat -T -S instructions
 
 What I am getting is just:
 PMC: [FR_RETIRED_X86_INSTRUCTIONS] Samples: 0 (0.0%) , 0 unresolved
 
 and nothing else on the screen.
 
 Has anyone had a success with this class of processors?
 Should it be supported?
 Any ideas/suggestions/hints?
 
 P.S. pmccontrol -L reports a whole bunch of K8 counters, just a small random
 sub-sample:
 BU_FILL_INTO_L2
 IC_FETCH
 IC_MISS
 IC_REFILL_FROM_L2
 IC_REFILL_FROM_SYSTEM
 IC_L1_ITLB_MISS_AND_L2_ITLB_HIT
 IC_L1_ITLB_MISS_AND_L2_ITLB_MISS
 IC_MICROARCHITECTURAL_RESYNC_BY_SNOOP
 IC_INSTRUCTION_FETCH_STALL
 IC_RETURN_STACK_HIT
 IC_RETURN_STACK_OVERFLOW
 FR_RETIRED_X86_INSTRUCTIONS
 FR_RETIRED_UOPS
 FR_RETIRED_BRANCHES
 FR_RETIRED_BRANCHES_MISPREDICTED
 FR_RETIRED_TAKEN_BRANCHES
 FR_RETIRED_TAKEN_BRANCHES_MISPREDICTED
 


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: preemptive kernel

2013-05-27 Thread Andriy Gapon

on 27/05/2013 09:34 Konstantin Belousov said the following:
 Having both filter and ithread for the same interrupt is apparently
 possible but weird.  I do not see anything which would prevent interrupt
 filter from being executed while the ithread is running.  But again, this
 is very unusual setup.

I wouldn't call it weird, but, yes, it is rare.  It's a pretty normal
configuration when the filter acts as a filter and the handler acts as a
handler (in ithread).  In other words, it would be a replacement for a
configuration where a filter is used and the filter offloads actual work to
non-interrupt context via a e.g. taskqueue.
But, hmm, this functionality is probably locked under INTR_FILTER option.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: preemptive kernel

2013-05-27 Thread Andriy Gapon

on 27/05/2013 10:21 Orit Moskovich said the following:
 What is actually the difference between deferring a filter routine's work 
 using an ithread given to bus_setup_intr, or using the global taskqueue_swi 
 (implemented using interrupt thread)?

I think you mean taskqueue_fast.
The difference is only in how much code you need to write.  I do not think there
is any significant difference in the resulting functionality.

 What do you mean that the functionality is locked under INTR_FILTER?

Please see the code.  You have to use option INTR_FILTER to get the behavior I
described earlier.

 -Original Message-
 From: Andriy Gapon [mailto:a...@freebsd.org] 
 Sent: Monday, May 27, 2013 10:11 AM
 To: Konstantin Belousov
 Cc: Orit Moskovich; freebsd-hackers@freebsd.org
 Subject: Re: preemptive kernel
 
 on 27/05/2013 09:34 Konstantin Belousov said the following:
 Having both filter and ithread for the same interrupt is apparently 
 possible but weird.  I do not see anything which would prevent 
 interrupt filter from being executed while the ithread is running.  
 But again, this is very unusual setup.
 
 I wouldn't call it weird, but, yes, it is rare.  It's a pretty normal 
 configuration when the filter acts as a filter and the handler acts as a 
 handler (in ithread).  In other words, it would be a replacement for a 
 configuration where a filter is used and the filter offloads actual work to 
 non-interrupt context via a e.g. taskqueue.
 But, hmm, this functionality is probably locked under INTR_FILTER option.
 
 --
 Andriy Gapon
 

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: preemptive kernel

2013-05-27 Thread Andriy Gapon


[trimmed cc]

on 27/05/2013 15:29 Orit Moskovich said the following:
From what I've read in subr_taskqueue.c taskqueue_swi, taskqueue_swi_giant 
and taskqueue_fast are all implemented using swi_add which calls 
ithread_create().
 Is there any performance difference between them. Is one of the above or 
 ithread given to bus_setup_intr preferable on the other?

The differences are described in taskqueue(9) Predefined Task Queues section.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Rebooting from loader causes a fault in VMware Workstation

2013-04-23 Thread Andriy Gapon

on 23/04/2013 17:36 Dimitry Andric said the following:
 I have tried to ascertain it actually arrives at this code when
 rebooting from the loader, but it does not seem to ever make it there,
 at least not to the jump to f000:fff0.  Maybe VMware intercepts the
 switching back to real mode in the previous part, and dies on that, I am
 not sure.  It is of course rather tricky to print off any debug messages
 at that point. :-)

For the inquisitive minds here how last instructions (and CPU state) look
according to qemu log:

IN:
0xa030:  xor%eax,%eax
0xa032:  int$0x30


IN:
0x93e0:  cmp$0x1,%eax
0x93e3:  jne0x93ff


IN:
0x93ff:  orb$0x1,%ss:0x9007
0x9407:  jmp0x90d2


IN:
0x90d2:  cli
0x90d3:  mov$0x1800,%esp
0x90d8:  mov%cr0,%eax
0x90db:  and$0x7fff,%eax
0x90e0:  mov%eax,%cr0


IN:
0x90e3:  xor%ecx,%ecx
0x90e5:  mov%ecx,%cr3


IN:
0x90e8:  lgdtl  0x95d0
0x90ef:  ljmpw  $0x18,$0x90f5

Triple fault
CPU Reset (CPU 0)
ESI=0004503c EDI=3fe50968 EBP=00094a80 ESP=1800
EIP=90ef EFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0033 a000  00cff300 DPL=3 DS   [-WA]
CS =0008   00cf9a00 DPL=0 CS32 [-R-]
SS =0010   00cf9300 DPL=0 DS   [-WA]
DS =0033 a000  00cff300 DPL=3 DS   [-WA]
FS =0033 a000  00cff300 DPL=3 DS   [-WA]
GS =0033 a000  00cff300 DPL=3 DS   [-WA]
LDT=   8200 DPL=0 LDT
TR =0038 5f98 2067 8900 DPL=0 TSS32-avl
GDT= ff85c789 
IDT= 5e00 0197
CR0=0011 CR2= CR3= CR4=
DR0= DR1= DR2= 
DR3=
DR6=0ff0 DR7=0400
CCS=0001 CCD= CCO=LOGICL
EFER=

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Rebooting from loader causes a fault in VMware Workstation

2013-04-23 Thread Andriy Gapon

on 23/04/2013 19:09 Andriy Gapon said the following:
 
 IN:
 0x90d2:  cli
 0x90d3:  mov$0x1800,%esp
 0x90d8:  mov%cr0,%eax
 0x90db:  and$0x7fff,%eax
 0x90e0:  mov%eax,%cr0
 
 
 IN:
 0x90e3:  xor%ecx,%ecx
 0x90e5:  mov%ecx,%cr3
 
 
 IN:
 0x90e8:  lgdtl  0x95d0
 0x90ef:  ljmpw  $0x18,$0x90f5

Perhaps the problem is that lgdt is called after disabling paging?

 Triple fault
 CPU Reset (CPU 0)
 ESI=0004503c EDI=3fe50968 EBP=00094a80 ESP=1800
 EIP=90ef EFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0033 a000  00cff300 DPL=3 DS   [-WA]
 CS =0008   00cf9a00 DPL=0 CS32 [-R-]
 SS =0010   00cf9300 DPL=0 DS   [-WA]
 DS =0033 a000  00cff300 DPL=3 DS   [-WA]
 FS =0033 a000  00cff300 DPL=3 DS   [-WA]
 GS =0033 a000  00cff300 DPL=3 DS   [-WA]
 LDT=   8200 DPL=0 LDT
 TR =0038 5f98 2067 8900 DPL=0 TSS32-avl
 GDT= ff85c789 
 IDT= 5e00 0197
 CR0=0011 CR2= CR3= CR4=
 DR0= DR1= DR2= 
 DR3=
 DR6=0ff0 DR7=0400
 CCS=0001 CCD= CCO=LOGICL
 EFER=
 


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Rebooting from loader causes a fault in VMware Workstation

2013-04-23 Thread Andriy Gapon

on 23/04/2013 19:31 John Baldwin said the following:
 On Tuesday, April 23, 2013 12:09:28 pm Andriy Gapon wrote:
 on 23/04/2013 17:36 Dimitry Andric said the following:
 I have tried to ascertain it actually arrives at this code when
 rebooting from the loader, but it does not seem to ever make it there,
 at least not to the jump to f000:fff0.  Maybe VMware intercepts the
 switching back to real mode in the previous part, and dies on that, I am
 not sure.  It is of course rather tricky to print off any debug messages
 at that point. :-)

 For the inquisitive minds here how last instructions (and CPU state) look
 according to qemu log:

 IN:
 0xa030:  xor%eax,%eax
 0xa032:  int$0x30

 
 IN:
 0x93e0:  cmp$0x1,%eax
 0x93e3:  jne0x93ff

 
 IN:
 0x93ff:  orb$0x1,%ss:0x9007
 0x9407:  jmp0x90d2

 
 IN:
 0x90d2:  cli
 0x90d3:  mov$0x1800,%esp
 0x90d8:  mov%cr0,%eax
 0x90db:  and$0x7fff,%eax
 0x90e0:  mov%eax,%cr0

 
 IN:
 0x90e3:  xor%ecx,%ecx
 0x90e5:  mov%ecx,%cr3

 
 IN:
 0x90e8:  lgdtl  0x95d0
 0x90ef:  ljmpw  $0x18,$0x90f5

 Triple fault
 CPU Reset (CPU 0)
 ESI=0004503c EDI=3fe50968 EBP=00094a80 ESP=1800
 EIP=90ef EFL=0046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
 ES =0033 a000  00cff300 DPL=3 DS   [-WA]
 CS =0008   00cf9a00 DPL=0 CS32 [-R-]
 SS =0010   00cf9300 DPL=0 DS   [-WA]
 DS =0033 a000  00cff300 DPL=3 DS   [-WA]
 FS =0033 a000  00cff300 DPL=3 DS   [-WA]
 GS =0033 a000  00cff300 DPL=3 DS   [-WA]
 LDT=   8200 DPL=0 LDT
 TR =0038 5f98 2067 8900 DPL=0 TSS32-avl
 GDT= ff85c789 
 
 This seems wrong (address is way too high).  I wonder if the gdtdesc was 
 trashed by something?  Can you dump memory before the lgdtl instruction at 
 the 
 0x95d0 address?

Looks correct:
Breakpoint 1, 0x90e8 in ?? ()
(gdb) x/i $eip
0x90e8: lgdtl  0x95d0
(gdb) x/3xh 0x95d0
0x95d0: 0x003f  0x9590  0x
(gdb) x/16xh 0x9590
0x9590: 0x  0x  0x  0x  0x  0x  0x9a00  0x00cf
0x95a0: 0x  0x  0x9300  0x00cf  0x  0x  0x9a00  0x

Nevertheless doing stepi leads to exactly the same triple fault.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Rebooting from loader causes a fault in VMware Workstation

2013-04-23 Thread Andriy Gapon

on 24/04/2013 02:03 Dimitry Andric said the following:
 Indeed, the DS segment was incorrect, the GDT should be loaded from the
 CS segment instead. 

Very good catch!  Indeed the segments at this point were set up for user data
while the supervisor data is needed.
Thank you!

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

sdt panic

2013-04-19 Thread Andriy Gapon

#0  doadump (textdump=1) at pcpu.h:233
#1  0x80591d69 in kern_reboot (howto=260) at
/usr/src/sys/kern/kern_shutdown.c:444
#2  0x8059210c in panic (fmt=value optimized out) at
/usr/src/sys/kern/kern_shutdown.c:620
#3  0x80775fee in trap_fatal (frame=0xc, eva=18446744071571306953) at
/usr/src/sys/amd64/amd64/trap.c:872
#4  0x80776112 in trap_pfault (frame=0xff8236165030, usermode=0) at
/usr/src/sys/amd64/amd64/trap.c:730
#5  0x807768ee in trap (frame=0xff8236165030) at
/usr/src/sys/amd64/amd64/trap.c:463
#6  0x807601e3 in calltrap () at 
/usr/src/sys/amd64/amd64/exception.S:228
#7  0x81162c96 in dtrace_probe_lookup (prid=0, mod=0xff82361651e0 
,
func=0xff82361651a0 stat, name=0xff8236165160 reg)
at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7876
#8  0x810c2164 in sdt_probe_callback (probe=0x80b5e760, 
arg=value
optimized out) at 
/usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:133
#9  0x8058f9ec in sdt_probe_listall (prov=value optimized out,
callback_func=0x810c20e5 sdt_probe_callback, arg=0x0) at
/usr/src/sys/kern/kern_sdt.c:269
#10 0x810c224b in sdt_provider_entry (prov=value optimized out,
arg=value optimized out) at
/usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:145
#11 0x8058f26f in sdt_provider_listall_locked
(callback_func=0x810c2236 sdt_provider_entry, arg=0x0) at
/usr/src/sys/kern/kern_sdt.c:246
#12 0x8058facb in sdt_provider_listall (callback_func=0x810c2236
sdt_provider_entry, arg=0x0) at /usr/src/sys/kern/kern_sdt.c:230
#13 0x810c2234 in sdt_provide_probes (arg=0x0, desc=0xff82361651e0) 
at
/usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:154
#14 0x8116249c in dtrace_probe_provide (desc=0x0, 
prv=0xfe0112520800)
at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7984
#15 0x81162825 in dtrace_enabling_provide (prv=0xfe0112520800) at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:11607
#16 0x81167d1d in dtrace_register (name=0x808a3bc7 proc,
pap=0x810c2398, priv=2, cr=0x0, pops=0x810c23c0, arg=0x0,
idp=0x80b48c28)
at
/usr/src/sys/modules/dtrace/dtrace/../../../cddl/contrib/opensolaris/uts/common/dtrace/dtrace.c:7486
#17 0x810c21f4 in sdt_provider_reg_callback (prov=value optimized out,
arg=value optimized out) at
/usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:182
#18 0x8058f26f in sdt_provider_listall_locked
(callback_func=0x810c21be sdt_provider_reg_callback, arg=0x0) at
/usr/src/sys/kern/kern_sdt.c:246
#19 0x8058f7da in sdt_register_callbacks 
(register_prov=0x810c21be
sdt_provider_reg_callback, reg_prov_arg=0x0, 
deregister_prov=0x810c21af
sdt_provider_unreg_callback,
dereg_prov_arg=0x0, register_probe=0x810c20e5 sdt_probe_callback,
reg_probe_arg=0x0) at /usr/src/sys/kern/kern_sdt.c:320
#20 0x810c20a8 in sdt_load (dummy=value optimized out) at
/usr/src/sys/modules/dtrace/sdt/../../../cddl/dev/sdt/sdt.c:195
#21 0x80573e23 in linker_load_module (kldname=value optimized out,
modname=0xfe000a156800 sdt, parent=0x0, verinfo=0x0,
lfpp=0xff8236165928) at /usr/src/sys/kern/kern_linker.c:241
#22 0x8057436f in kern_kldload (td=0xfe012812b940, file=0x0,
fileid=0xff8236165974) at /usr/src/sys/kern/kern_linker.c:1041
#23 0x80574533 in sys_kldload (td=0xfe012812b940, uap=value 
optimized
out) at /usr/src/sys/kern/kern_linker.c:1075
#24 0x807754e4 in amd64_syscall (td=0xfe012812b940, traced=0) at
subr_syscall.c:135
#25 0x807604c7 in Xfast_syscall () at 
/usr/src/sys/amd64/amd64/exception.S:387

My understanding is that this happens if sdt.ko is loaded later than some other
dtrace module(s)/providers and there are any (non-sdt) probes active at the time
of loading.
It seems to be caused by sdt_provider_entry() being called on a sdt-based 
provider
that has not been assigned an id (yet).

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: call suspend_cpus() under smp_ipi_mtx

2013-04-06 Thread Andriy Gapon

on 04/04/2013 20:34 Andriy Gapon said the following:
 This seems to work without problems or any warnings with WITNESS 
 !WITNESS_SKIPSPIN, but it is very possible that I am not exercising all the
 relevant code paths.
 
 P.S. Looking through history it seems that in r169391 intr_table_lock
 was changed from a spinlock to a sx lock (later it was changed to the current
 mutex).  The commit message explains why spinlock was not needed, but it 
 doesn't
 seem to say why it was unacceptable:

Hmm, looks like I missed the elephant.
apic_free_vector is called with intr_table_lock held and it calls sched_bind().

I need to re-think all of think from the scratch...

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: close(2) while accept(2) is blocked

2013-04-04 Thread Andriy Gapon

on 30/03/2013 18:14 John-Mark Gurney said the following:
 As someone else pointed out in this thread, if a userland program
 depends upon this behavior, it has a race condition in it...
 
 Thread 1  Thread 2Thread 3
   enters routine to read
 enters routine to close
 calls close(3)
   open() returns 3
   does read(3) for orignal fd
 
 How can the original threaded program ensure that thread 2 doesn't
 create a new fd in between?  So even if you use a lock, this won't
 help, because as far as I know, there is no enter read and unlock
 mutex call yet...
 
 I decided long ago that this is only solvable by proper use of locking
 and ensuring that if you call close (the syscall), that you do not have
 any other thread that may use the fd.  It's the close routine's (not
 syscall) function to make sure it locks out other threads and all other
 are out of the code path that will use the fd before it calls close..
 
 If someone could describe how this new eject a person from read could
 be done in a race safe way, then I'd say go ahead w/ it...  Otherwise
 we're just moving the race around, and letting people think that they
 have solved the problem when they haven't...
 
 I think I remeber another thread about this from a year or two ago,
 but I couldn't find it...  If someone finds it, posting a link would
 be nice..
 

I wish to abstract as much as possible from how an application may use, misuse 
or
even abuse the close+ interaction.  But I think that the behavior that
provides more information / capabilities is preferable over the behavior that 
does
not.  E.g. your example above does not apply to a utility that has only two
threads.  The three threads problem can also be solved if all the threads
cooperate.  But as I've said.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: call suspend_cpus() under smp_ipi_mtx

2013-04-04 Thread Andriy Gapon

on 01/04/2013 17:52 John Baldwin said the following:
 Hmm, I think intr_table_lock used to be a spin lock at some point.  I don't 
 remember
 why we changed it to a regular mutex.  It may be that there was a lock order 
 reason
 for that. :(

I came up with the following patch:
http://people.freebsd.org/~avg/intr_table_lock.diff

Please note the witness change.  Part of it is to prepare for smp_ipi_mtx -
intr_table_lock order, but I also had to move entropy harvest mutex because it
is used with msleep_spin. Also intr_table_lock interacts with sched lock.

This seems to work without problems or any warnings with WITNESS 
!WITNESS_SKIPSPIN, but it is very possible that I am not exercising all the
relevant code paths.

P.S. Looking through history it seems that in r169391 intr_table_lock
was changed from a spinlock to a sx lock (later it was changed to the current
mutex).  The commit message explains why spinlock was not needed, but it doesn't
seem to say why it was unacceptable:

 Minor fixes and tweaks to the x86 interrupt code: - Split the intr_table_lock
into an sx lock used for most things, and a spin lock to protect intrcnt_index.
Originally I had this as a spin lock so interrupt code could use it to lookup
sources. However, we don't actually do that because it would add a lot of 
overhead
to interrupts, and if we ever do support removing interrupt sources, we can use
other means to safely do so w/o locking in the interrupt handling code. - 
Replace
is_enabled (boolean) with is_handlers (a count of handlers) to determine if a
source is enabled or not. This allows us to notice when a source is no longer in
use. When that happens, we now invoke a new PIC method (pic_disable_intr()) to
inform the PIC driver that the source is no longer in use. The I/O APIC driver
frees the APIC IDT vector when this happens. The MSI driver no longer needs to
have a hack to clear is_enabled during msi_alloc() and msix_alloc() as a result 
of
this change as well. - Add an apic_disable_vector() to reset an IDT vector back 
to
Xrsvd to complement apic_enable_vector() and use it in the I/O APIC and MSI code
when freeing an IDT vector. - Add a new nexus hook: nexus_add_irq() to ask the
nexus driver to add an IRQ to its irq_rman. The MSI code uses this when it 
creates
new interrupt sources to let the nexus know about newly valid IRQs. Previously 
the
msi_alloc() and msix_alloc() passed some extra stuff back to the nexus methods
which then added the IRQs. This approach is a bit cleaner. - Change the MSI sx
lock to a mutex. If we need to create new sources, drop the lock, create the
required number of sources, then get the lock and try the allocation again.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: close(2) while accept(2) is blocked

2013-04-04 Thread Andriy Gapon

on 01/04/2013 18:22 John Baldwin said the following:
 I think you need to split the 'struct file' reference count into two different
 counts similar to the how we have vref/vrele vs vhold/vdrop for vnodes.  The
 fget for accept and probably most other system calls should probably be 
 equivalent
 to vhold, whereas things like open/dup (and storing an fd in a cmsg) should be
 more like vref.  close() should then be a vrele().

This model makes perfect sense.
Unfortunately, ENOTIME to work on this.

Meanwhile I am using the following patch specific to local domain sockets,
accept(2) and shutdown(2).  Turns out that the problematic application does both
shutdown(RDWR) and close(2), but that doesn't help on FreeBSD.
BTW, this is the application:
http://thread.gmane.org/gmane.os.freebsd.devel.office/1754
The patch does help.

Author: Andriy Gapon a...@icyb.net.ua
Date:   Thu Mar 28 20:08:13 2013 +0200

[test!] uipc_shutdown: use soisdisconnected instead of socantsendmore

So that in addition to disabling sends we also wake up threads blocked
in accept (on so_timeo in general).

diff --git a/sys/kern/uipc_usrreq.c b/sys/kern/uipc_usrreq.c
index 9b60eab..e93d46c 100644
--- a/sys/kern/uipc_usrreq.c
+++ b/sys/kern/uipc_usrreq.c
@@ -1074,7 +1074,7 @@ uipc_shutdown(struct socket *so)

UNP_LINK_WLOCK();
UNP_PCB_LOCK(unp);
-   socantsendmore(so);
+   soisdisconnected(so);
unp_shutdown(unp);
UNP_PCB_UNLOCK(unp);
UNP_LINK_WUNLOCK();


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

clang: -mno-omit-leaf-frame-pointer

2013-03-30 Thread Andriy Gapon


It seems that, unlike gcc, for clang -fno-omit-frame-pointer does not imply
-mno-omit-leaf-frame-pointer.  This is probably a bug.

Meanwhile I would like to propose the following amd64-specific patch.  Perhaps
the same type of change would be useful for powerpc as well.

I would like this change primarily for DTrace (fbt), but other
debugging/profiling code may benefit from it as well.

I chose to make -mno-omit-leaf-frame-pointer not conditional on clang, because
with gcc it is just a nop (i.e. it doesn't hurt anything).

--- a/sys/conf/Makefile.amd64
+++ b/sys/conf/Makefile.amd64
@@ -32,7 +32,7 @@ S=../../..
 .include $S/conf/kern.pre.mk

 .if !empty(DDB_ENABLED) || !empty(DTR_ENABLED) || !empty(HWPMC_ENABLED)
-CFLAGS+=   -fno-omit-frame-pointer
+CFLAGS+=   -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
 .endif

 MKMODULESENV+= MACHINE=amd64
diff --git a/sys/conf/kmod.mk b/sys/conf/kmod.mk
index f0d3d4d..7eaba85 100644
--- a/sys/conf/kmod.mk
+++ b/sys/conf/kmod.mk
@@ -122,7 +122,7 @@ LDFLAGS+=   -d -warn-common

 CFLAGS+=   ${DEBUG_FLAGS}
 .if ${MACHINE_CPUARCH} == amd64
-CFLAGS+=   -fno-omit-frame-pointer
+CFLAGS+=   -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer
 .endif

 .if ${MACHINE_CPUARCH} == powerpc

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

big change to devfs rules path matching

2013-03-28 Thread Andriy Gapon

on 25/03/2013 22:44 Andriy Gapon said the following:
 
 Would like to ask for opinions on this topic...
 Please read this PR for context:
 http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/122838
 Especially Jaakko's insightful description of the problem.

So I would like to commit the following patch sooner rather than later:
http://people.freebsd.org/~avg/devfs_rule.diff
The only difference from the Jaakko's patch in the PR is FNM_PATHNAME.

Please review and test.
Especially if you rely on any non-trivial devfs rules.

Thank you.
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

close(2) while accept(2) is blocked

2013-03-28 Thread Andriy Gapon


So, this started as a simple question, but the answer was quite unexpected to 
me.

Let's say we have an opened and listen-ed socket and let's assume that we know
that one thread is blocked in accept(2) and another thread is calling close(2).
What is going to happen?

Turns out that practically nothing.  For kernel the close call would be almost 
a nop.
My understanding is this:
- when socket is created, its reference count is 1
- when accept(2) is called, fget in kernel increments the reference count (kept 
in
an associated struct file)
- when close(2) is called, the reference count is decremented

The reference count is still greater than zero, so fdrop does not call fo_close.
That means that in the case of a socket soclose is not called.

I am sure that the reference counting in this case is absolutely correct with
respect to managing kernel side structures.  But I am not that it is correct 
with
respect to hiding the explicit close(2) call from other threads that may be
waiting on the socket.
In other words, I am not sure if fo_close is supposed to signify that there are 
no
uses of a file, or that userland close-d the file.  Or perhaps these should be 
two
different methods.

Additional note is that shutdown(2) doesn't wake up the thread in accept(2)
either.  At least that's true for unix domain sockets.
Not sure if this is a bug.

But the summary seems to be is that currently it is not possible to break a 
thread
out of accept(2) (at least without resorting to signals).

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Debugging kernel cores without a stack

2013-03-25 Thread Andriy Gapon

on 25/03/2013 02:19 Joshua Isom said the following:
 I thought the debugger would have worked regardless.

No, kgdb and libkvm have to be in sync with kernel.
Unfortunately.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Fwd: kern/122838: [devfs] devfs doesn't handle complex paths (like zvol/pool/vms) good

2013-03-25 Thread Andriy Gapon


Would like to ask for opinions on this topic...
Please read this PR for context:
http://www.freebsd.org/cgi/query-pr.cgi?pr=kern/122838
Especially Jaakko's insightful description of the problem.


 Original Message 
Message-ID: 5150b598.7050...@freebsd.org
Date: Mon, 25 Mar 2013 22:37:44 +0200
From: Andriy Gapon a...@freebsd.org
Subject: Re: kern/122838: [devfs] devfs doesn#39;t handle complex paths (like
zvol/pool/vms) good


Can't believe that we are still where we were more than two years ago...

I think that we have to make this change even if it _might_ break some existing
rulesets.

Rationale:
- current behavior is contrary to any documentation
- current behavior is contrary to common sense
- current behavior is very hard to describe and account for
- I presume that very few people actually fully understand the current behavior
- I presume that even fewer people made a conscious choice to depend or make use
of its non-trivial features of the current behavior

So, we should make the behavior of devfs pattern consistent with the
documentation and the common sense.

In addition to Jaakko's patch I propose that we pass FNM_PATHNAME to fnmatch(9),
so that the matching is indeed consistent with glob(3) / shell glob-ing rules
for filesystem paths.

-- 
Andriy Gapon


___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: call suspend_cpus() under smp_ipi_mtx

2013-03-23 Thread Andriy Gapon


Looks like this issue needs more thinking and discussing.

The basic idea is that suspend_cpus() must be called with smp_ipi_mtx held (on
SMP systems).
This is for exactly the same reasons as to why we first take smp_ipi_mtx before
calling stop_cpus() in the shutdown path.  Essentially one CPU could be holding
smp_ipi_mtx (and thus with interrupts disabled[*]) and waiting for an
acknowledgement from other CPUs (e.g. in smp_rendezvous or in a TLB shootdown),
while another CPU could be with interrupts disabled (explicitly - like in the
shutdown or ACPI suspend paths) and trying to deliver an IPI to other CPUs.

In my opinion, we must consistently use the same lock, smp_ipi_mtx, for all
regular (non-NMI) synchronous IPI-based communication between CPUs.  Otherwise a
deadlock is quite possible.

Some obstacles for just going ahead and making the suggested change:

- acpi_sleep_machdep() calls intr_suspend() with interrupts disabled; currently
witness(9) is not aware of that, but if smp_ipi_mtx spin-lock is used, then we
would have to make intr_table_lock and msi_lock the spin-locks as well;
- AcpiLeaveSleepStatePrep() (from ACPICA) is called with interrupts disabled and
currently it performs an action that requires memory allocation; again, with
interrupts disabled via intr_disable() this fact is not visible to witness, etc,
but with smp_ipi_mtx it needs to be somehow handled.

I talked to ACPICA guys about the last issue and they told me that what is
currently done in AcpiLeaveSleepStatePrep does not need to be with interrupts
disabled and can be moved to AcpiLeaveSleepState.  This is after the _BFS and
_GTS support was removed.

What do you think?
Thank you.
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

mountroot event

2013-03-20 Thread Andriy Gapon


I would like to propose the following change.

My understanding is that it was never a true intention to post 'mountroot' event
on every set_rootvnode() call, but rather an accident in the original commit.
But I could be wrong here.  In either case I think that it is more appropriate
to post the event only once.  I do not expect that there could be any consumers
interested in all the details of root fs manipulations.

It looks like there is just a single subscriber to this event in
subr_firmware.c.  I am not familiar with that code, so I would like to ask for
confirmation that the proposed change won't break anything there.

Thank you.

commit 9dc8eaa50afa6ac88c44fbaad82509721e106f1a
Author: Andriy Gapon a...@icyb.net.ua
Date:   Wed Mar 6 08:57:35 2013 +0200

post mountroot event after a real/final root is mounted

not every time an intermediate root (including the first devfs) is mounted.

diff --git a/sys/kern/vfs_mountroot.c b/sys/kern/vfs_mountroot.c
index 147926e..162738d 100644
--- a/sys/kern/vfs_mountroot.c
+++ b/sys/kern/vfs_mountroot.c
@@ -199,8 +199,6 @@ set_rootvnode(void)
VREF(rootvnode);

FILEDESC_XUNLOCK(p-p_fd);
-
-   EVENTHANDLER_INVOKE(mountroot);
 }

 static int
@@ -991,6 +989,8 @@ vfs_mountroot(void)
atomic_store_rel_int(root_mount_complete, 1);
wakeup(root_mount_complete);
mtx_unlock(mountlist_mtx);
+
+   EVENTHANDLER_INVOKE(mountroot);
 }

 static struct mntarg *

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

SI_SUB_DTRACE_PROVIDER

2013-03-19 Thread Andriy Gapon


Does anyone know what is the point of SYSINIT(SI_SUB_DTRACE_PROVIDER)?
Specifically, as opposed to normal MOD_LOAD/MOD_UNLOAD handling.
I am thoroughly confused about DTrace modules using DEV_MODULE() with the dummy
handlers and then using SYSINIT/SYSUNINIT when they could use
DECLARE_MODULE(SI_SUB_DTRACE_PROVIDER).  Especially this is concerning because
return values from SYSUNINIT functions are ignored.  Unlike MOD_UNLOAD.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: dtrace: operands have incompatible types: dmu_buf_t = dmu_buf_t

2013-03-14 Thread Andriy Gapon


First, this link
http://docs.oracle.com/cd/E37670_01/E38608/html/dt_typcondef_dlang.html has a
rather good description in section 2.13.4 of how types are managed in DTrace and
of the special C and D namespaces/modules.

on 12/03/2013 16:14 Andriy Gapon said the following:
 
 For your amusement:
 
 = testcase.d =
 dmu_buf_t **buf;

CTF data in my kernel contains these types:
1964 STRUCT dmu_buf (32 bytes)
1965 TYPEDEF dmu_buf_t refers to 1964
1966 POINTER __anon__ refers to 1965
[2501] POINTER __anon__ refers to 1966

Type 2501 is dmu_buf_t ** and DTrace uses this type for 'buf'.

 /* Remove the following line to defuse. */
 bpobj_t *bpobj;

Now, there are the following types related to bpobj_t [*footnote]:
8775 STRUCT bpobj (80 bytes)
8776 TYPEDEF bpobj_t refers to 8775
but there is no type for POINTER that refers to 8776.

So, as described in the referenced document, dtrace creates bpobj_t* type in the
D module.  For that it also has to copy bpobj_t and struct bpobj types and also
types for each member in struct bpobj and so on.
As a result, copies of struct dmu_buf and typedef dmu_buf_t are also created in
D namespace.

These copied types have different type IDs, but all other of their properties
are supposed to be such that the CTF handling code should be able to determine
that the copies and the originals are equivalent types.

 fbt::dmu_bonus_hold:entry
 {
   buf = args[3]; /* the error is about this line */
 }

When DTrace resolves types for arguments of 'dmu_bonus_hold' it first looks in C
and D namespaces and only then it looks in the kernel CTF data.
So, now dmu_buf_t ** is resolved to a copy of the original type.

Unfortunately, there is a quirk that may lead to an error like incompatible
types dmu_buf_t ** and dmu_buf_t **.
The CTF code assumes that the types like a pointer to another type or a
qualifier plus another type are always anonymous.  And in fact they are in C
language (note that I am talking about raw things like char* or const int, and
not about typedefs).  Also, the DWARF specification mandates that anonymous
types either should not have a name attribute at all or its value should be an
empty string (a single zero byte).
For the above reasons, when the CTF code creates a copy of a type such as a
pointer to another type, it simply sets a name of the copy to an empty string.
And here is the quirk: in a violation of the DWARF specification our libdwarf
sets names of all anonymous types to __anon__.  This name surely looks special
to a human, but unfortunately there is nothing special about it to the code.
In the end, the original type with __anon__ name and the copied type with
empty name are no longer considered to be equivalent.


The following patch seems to resolve the problem:
--- a/lib/libdwarf/dwarf_die.c
+++ b/lib/libdwarf/dwarf_die.c
@@ -29,8 +29,6 @@
 #include stdlib.h
 #include _libdwarf.h

-static const char *anon_name = __anon__;
-
 int
 dwarf_die_add(Dwarf_CU cu, int level, uint64_t offset, uint64_t abnum,
Dwarf_Abbrev a, Dwarf_Die *diep, Dwarf_Error *err)
 {
@@ -57,7 +55,7 @@ dwarf_die_add(Dwarf_CU cu, int level, uint64_t offset,
uint64_t abnum, Dwarf_Abb
die-die_abnum  = abnum;
die-die_a  = a;
die-die_cu = cu;
-   die-die_name   = anon_name;
+   die-die_name   = ;

/* Initialise the list of attribute values. */
STAILQ_INIT(die-die_attrval);


[*footnote] In fact I see that for some, unknown to me, reason there are more
than one entries for struct bpobj and typedef bpobj_t in the ctf data.  For one
of those entries there is a related bpobj_t pointer entry.  But as far as I can
tell from the code, it's always the last entry with a given name that is
actually used as a type definition for that name.
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

dtrace: operands have incompatible types: dmu_buf_t = dmu_buf_t

2013-03-12 Thread Andriy Gapon


For your amusement:

= testcase.d =
dmu_buf_t **buf;
/* Remove the following line to defuse. */
bpobj_t *bpobj;

fbt::dmu_bonus_hold:entry
{
buf = args[3]; /* the error is about this line */
}
===

I think I know what's going on, but I am still making sure that that's true.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

dtrace -c patch

2013-03-07 Thread Andriy Gapon


$ dtrace -n 'blah blah' -c 'touch /'
dtrace: failed to control pid 16473: process exited with status 0

I think that we need the following change:

--- a/cddl/contrib/opensolaris/lib/libdtrace/common/dt_open.c
+++ b/cddl/contrib/opensolaris/lib/libdtrace/common/dt_open.c
@@ -1115,7 +1115,7 @@ alloc:
 #if defined(sun)
dtp-dt_prcmode = DT_PROC_STOP_PREINIT;
 #else
-   dtp-dt_prcmode = DT_PROC_STOP_MAIN;
+   dtp-dt_prcmode = DT_PROC_STOP_POSTINIT;
 #endif
dtp-dt_linkmode = DT_LINK_KERNEL;
dtp-dt_linktype = DT_LTYP_ELF;


Rationale:
- we don't use DT_PROC_STOP_PREINIT like solaris does, because for some reason
that I haven't investigated we the following line ifdef-ed out in 
dt_proc_attach:
dt_proc_rdwatch(dpr, RD_PREINIT, RD_PREINIT);

- 'main' symbol can't always be reliably resolved:
$ readelf -a -W /usr/bin/touch | fgrep main
$

So, it seems that postinit is where we can reliable catch a process.
What do you think?

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: clang generated code sometimes confuses fbt

2013-03-05 Thread Andriy Gapon

on 04/03/2013 13:50 Dimitry Andric said the following:
 Actually, that way of fixing breaks it for gcc, so it should really be
 fixed in sort_iidescs() instead.  See attached diff; I verified it works
 for both gcc-compiled and clang-compiled objects.

Impossible!  It looks like the attached diff was a nop change :-)

BTW, so changing clang to match gcc is not an option?
I do not insist, just curious.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: clang generated code sometimes confuses fbt

2013-03-04 Thread Andriy Gapon

on 04/03/2013 09:57 Matt Burke said the following:
 On 03/02/13 17:35, Andriy Gapon wrote:
 
 To summarize: I would be glad of either clang generated code was
 fbt-friendly or if ctf information was generated for
 bpobj_iterate_impl. Either is perfect for me.
 
 Apologies if this is a silly suggestion,

Apologies, but yes, it is :-)

 but are you building with -O0?
 I've noticed backtraces being hard to follow (i.e. skipping intermediate
 and wrapper functions) with a kernel built with the default optimisation
 level, and it seems to me that the same thing may be happening here - the
 wrappers are being optimised out.
 

If they were optimized out, then I wouldn't be able to provide their disassembly
in my original email, would I?


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: clang generated code sometimes confuses fbt

2013-03-03 Thread Andriy Gapon

on 02/03/2013 22:23 Dimitry Andric said the following:
 
 Have you verified that ctfconvert does the right thing, if you modify
 the FILE symbol to have just the filename?

No, I haven't.
How can I test that?

However my reading of the code makes me believe that that would help.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: clang generated code sometimes confuses fbt

2013-03-03 Thread Andriy Gapon

on 03/03/2013 17:45 Dimitry Andric said the following:
 Debug log of ctfconvert operating on a gcc-compiled bpobj.o:
 
   http://www.andric.com/freebsd/ctfconvert-bpobj-gcc.log
 
 The same, but on a clang-compiled bpobj.o:
 
   http://www.andric.com/freebsd/ctfconvert-bpobj-clang.log
 
 This latter log does not change at all, if you change the FILE symbol to
 just bpobj.c.  So there seems to something else in (probably) the
 debug information that confuses ctfconvert.

Sorry, but these two logs look so different to each other and your clang log is
so different from a log that I get here with that I believe that something is
wrong in your test.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

clang generated code sometimes confuses fbt

2013-03-02 Thread Andriy Gapon


I observe the following problem.

There are two tiny wrapper functions around a larger implementation function:
int
bpobj_iterate(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx)
{
return (bpobj_iterate_impl(bpo, func, arg, tx, B_TRUE));
}
int
bpobj_iterate_nofree(bpobj_t *bpo, bpobj_itor_t func, void *arg, dmu_tx_t *tx)
{
return (bpobj_iterate_impl(bpo, func, arg, tx, B_FALSE));
}

On a clang compiled system:
$ dtrace -l | fgrep bpobj_iterate
 1483fbtkernelbpobj_iterate_impl entry
 1484fbtkernelbpobj_iterate_impl return

On a gcc compiled system:
dtrace -l | fgrep bpobj_iterate
  647fbtkernelbpobj_iterate_impl entry
  648fbtkernelbpobj_iterate_impl return
20656fbtkernel bpobj_iterate entry
20657fbtkernel bpobj_iterate return
28426fbtkernel  bpobj_iterate_nofree entry
28427fbtkernel  bpobj_iterate_nofree return

Examination reveals why that is so.
clang:
Dump of assembler code for function bpobj_iterate:
0x802d5a90 bpobj_iterate+0:   mov$0x1,%r8d
0x802d5a96 bpobj_iterate+6:   jmp0x802d5aa0
bpobj_iterate_impl

gcc:
Dump of assembler code for function bpobj_iterate:
0x802d3f43 bpobj_iterate+0:   push   %rbp
0x802d3f44 bpobj_iterate+1:   mov%rsp,%rbp
0x802d3f47 bpobj_iterate+4:   mov$0x1,%r8d
0x802d3f4d bpobj_iterate+10:  callq  0x802d3787
bpobj_iterate_impl
0x802d3f52 bpobj_iterate+15:  pop%rbp
0x802d3f53 bpobj_iterate+16:  retq

So quite obviously fbt can not really entry/return points for the clang 
function.

This is not a big problem on its own, of course, but here is a bad twist.
On the clang system:
$ ctfdump -f /boot/kernel/kernel | fgrep bpobj_iterate
  [8975] FUNC (bpobj_iterate) returns: 24 args: (2601, 4824, 34, 2221)
  [13093] FUNC (bpobj_iterate_nofree) returns: 24 args: (2601, 4824, 34, 2221)

Now that's the problem: fbt sees only bpobj_iterate_impl, but CTF data is
generated/present only for bpobj_iterate and bpobj_iterate_nofree.

On the gcc system:
ctfdump -f /boot/kernel/kernel | fgrep bpobj_iterate
  [323] FUNC (bpobj_iterate_impl) returns: 1 args: (5153, 5661, 6, 5078, 1350)
  [10439] FUNC (bpobj_iterate) returns: 1 args: (5153, 5661, 6, 5078)
  [14377] FUNC (bpobj_iterate_nofree) returns: 1 args: (5153, 5661, 6, 5078)

To summarize: I would be glad of either clang generated code was fbt-friendly
or if ctf information was generated for bpobj_iterate_impl.  Either is perfect
for me.

Now, I am not quite sure why ctfconvert skips bpobj_iterate_impl in the
clang-generated code.  Seems like some sort of a bug in ctfconvert.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: clang generated code sometimes confuses fbt

2013-03-02 Thread Andriy Gapon

on 02/03/2013 19:35 Andriy Gapon said the following:
 Now, I am not quite sure why ctfconvert skips bpobj_iterate_impl in the
 clang-generated code.  Seems like some sort of a bug in ctfconvert.

It seems that gcc and clang put different names for symbol of type FILE:
clang:
readelf -a -W /usr/obj/usr/src/sys/TRANT/bpobj.o| fgrep -w FILE
 1:  0 FILELOCAL  DEFAULT  ABS
/usr/src/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/bpobj.c

gcc:
readelf -a -W /usr/obj/usr/src/sys/ODYSSEY/bpobj.o| fgrep -w FILE
 1:  0 FILELOCAL  DEFAULT  ABS bpobj.c

ctfconvert seems to compare this value with bpobj.c and so in the clang case
it doesn't recognize the static symbols.

Does my analysis seem reasonable?
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

memory allocation in spinlock context

2013-03-01 Thread Andriy Gapon


I am trying to understand if it is possible to allow memory allocations 
(M_NOWAIT,
of course) in a spinlock context.
I do not see any obvious architectural obstacles.
But the fact that all of the uma locks, system map lock, object locks, page 
queue
locks and so on are regular mutexes makes it impossible to allocate memory 
without
violating the fundamental lock ordering rules.

Could all of the above mentioned locks potentially be converted to spin mutexes?
(And that seems to be a large nasty change)
Are there any alternative possibilities?

BTW, currently we have at least one place where a memory allocation of this kind
is done stealthily (and thus dangerously?).  ACPI resume code must execute
AcpiLeaveSleepStatePrep with interrupts disabled and ACPICA code performs memory
allocations in that code path.  Since the interrupts are disabled by means of
intr_disable(), witness(9) and similar are completely oblivious of the fact.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: memory allocation in spinlock context

2013-03-01 Thread Andriy Gapon

on 01/03/2013 16:22 Matthew Jacob said the following:
 On 3/1/2013 5:50 AM, Andriy Gapon wrote:
 I am trying to understand if it is possible to allow memory allocations 
 (M_NOWAIT,
 of course) in a spinlock context.

 There are mechanisms to do just this- essentially by creating private pools 
 that
 are organized in a way to allow for spinlock (and thus possible interrupt 
 level)
 safe allocations (and failure if the pool is empty). Are you trying to make a
 general mechanism for this?

I am just pondering what would it take to develop such a general mechanism.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: kgdb modules

2013-02-13 Thread Andriy Gapon

on 04/02/2013 21:58 John Baldwin said the following:
 You can also load modules manually by
 using the add-kld command (give it a full path to an individual module).  You
 may need to use 'nosharedlibrary' to unload symbols from the wrong module
 before add-kld will be useful however.

I think that this approach is superior to what I suggested.
BTW, thank you for 'nosharedlibrary' - I learned a new thing about gdb :-)

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: Reviewing a FAQ change about LORs

2013-02-09 Thread Andriy Gapon

on 09/02/2013 01:51 Eitan Adler said the following:
 On 8 February 2013 16:31, Ian Lepore i...@freebsd.org wrote:
 On Thu, 2013-02-07 at 19:32 -0500, Eitan Adler wrote:
 Does someone here mind reviewing
 http://www.freebsd.org/cgi/query-pr.cgi?pr=174226 for correctness.

 Please feel free to post alternate diffs as a reply as well.


 Does it make sense to reference a web page on LOR status that hasn't
 been updated in four years?
 
 I was unaware of this, which is the reason I asked for review. ;)
 
 Is there an updated page or is there no such service anymore?

I suspect that the list of LORs doesn't get updated because we don't get many
new LORs that here are to stay.  Those old LORs are well known, harmless and
hard to fix.  We try to not introduce any new LORs of that kind.  So the new
LORs are either not introduced or getting fixed.
Hence no strong need for an up-to-date list.

It also seems that the interest in LORs diminished over the years as FreeBSD SMP
/ locking stabilized to the point of being taken for granted (as opposed to the
early SMP days).  So nobody (except developers adding new locks) really looks at
LORs until a deadlock/livelock is really hit.

On the other hand, the referenced page looks like new reports are welcome and
get actually processes, which is not true.  Also, the list of fixed/patched LORs
has no practical use.  Additionally, many LORs there are duplicates (e.g. a LOR
between devfs and any other fs during unmount is replicated for many values of
other fs).  There also seem to be some fixed LORs, etc.

It probably would make sense to reference some static page with a list of some
well known LORs.  But that page doesn't seem to be very useful.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: dynamically calculating NKPT [was: Re: huge ktr buffer]

2013-02-06 Thread Andriy Gapon

on 05/02/2013 01:05 Neel Natu said the following:
 Hi,
 
 I have a patch to dynamically calculate NKPT for amd64 kernels. This
 should fix the various issues that people pointed out in the email
 thread.
 
 Please review and let me know if there are any objections to committing this.
 
 Also, thanks to Alan (alc@) for reviewing and providing feedback on
 the initial version of the patch.
 
 Patch (also available at 
 http://people.freebsd.org/~neel/patches/nkpt_diff.txt):

It seems I am a little bit late with a review, but not too late :-)
Some comments below:

 Index: sys/amd64/include/pmap.h
 ===
 --- sys/amd64/include/pmap.h  (revision 246277)
 +++ sys/amd64/include/pmap.h  (working copy)
 @@ -113,13 +113,7 @@
   ((unsigned long)(l2)  PDRSHIFT) | \
   ((unsigned long)(l1)  PAGE_SHIFT))
 
 -/* Initial number of kernel page tables. */
 -#ifndef NKPT
 -#define  NKPT32
 -#endif

I think that we could still keep this, if the below code is done slightly 
different:

[snip]
 +/* number of kernel PDP slots */
 +#define  NKPDPE(ptpgs)   howmany((ptpgs), NPDEPG)
 +
  static void
 +nkpt_init(vm_paddr_t addr)
 +{
 + int pt_pages;
 + 
 +#ifdef NKPT
 + pt_pages = NKPT;
 +#else
 + pt_pages = howmany(addr, 1  PDRSHIFT);

A very minor cosmetic note: perhaps NBPDR would look more concise here.

 + pt_pages += NKPDPE(pt_pages);
 +
 + /*
 +  * Add some slop beyond the bare minimum required for bootstrapping
 +  * the kernel.
 +  *
 +  * This is quite important when allocating KVA for kernel modules.
 +  * The modules are required to be linked in the negative 2GB of
 +  * the address space.  If we run out of KVA in this region then
 +  * pmap_growkernel() will need to allocate page table pages to map
 +  * the entire 512GB of KVA space which is an unnecessary tax on
 +  * physical memory.
 +  */
 + pt_pages += 4;  /* 8MB additional slop for kernel modules */
 +#endif
 + nkpt = pt_pages;
 +}

I would slightly re-organize this code so that it uses NKPT, if defined, as a
default value for nkpt.  Then, only if the calculated value is greater then it
would override the default.
There are tradeoffs, of course.  So I am just voicing my opinion/preference.

The slack thing is a little bit imperfect, but I am not a perfectionist :-)

Thank you very much for this great feature.

 +static void
  create_pagetables(vm_paddr_t *firstaddr)
  {
 - int i, j, ndm1g;
 + int i, j, ndm1g, nkpdpe;
 
 - /* Allocate pages */
 - KPTphys = allocpages(firstaddr, NKPT);
 - KPML4phys = allocpages(firstaddr, 1);
 - KPDPphys = allocpages(firstaddr, NKPML4E);
 - KPDphys = allocpages(firstaddr, NKPDPE);
 -
 + /* Allocate page table pages for the direct map */
   ndmpdp = (ptoa(Maxmem) + NBPDP - 1)  PDPSHIFT;
   if (ndmpdp  4) /* Minimum 4GB of dirmap */
   ndmpdp = 4;
 @@ -517,6 +546,22 @@
   DMPDphys = allocpages(firstaddr, ndmpdp - ndm1g);
   dmaplimit = (vm_paddr_t)ndmpdp  PDPSHIFT;
 
 + /* Allocate pages */
 + KPML4phys = allocpages(firstaddr, 1);
 + KPDPphys = allocpages(firstaddr, NKPML4E);
 +
 + /*
 +  * Allocate the initial number of kernel page table pages required to
 +  * bootstrap.  We defer this until after all memory-size dependent
 +  * allocations are done (e.g. direct map), so that we don't have to
 +  * build in too much slop in our estimate.
 +  */
 + nkpt_init(*firstaddr);
 + nkpdpe = NKPDPE(nkpt);
 +
 + KPTphys = allocpages(firstaddr, nkpt);
 + KPDphys = allocpages(firstaddr, nkpdpe);
 +
   /* Fill in the underlying page table pages */
   /* Read-only from zero to physfree */
   /* XXX not fully used, underneath 2M pages */


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: kgdb modules

2013-02-04 Thread Andriy Gapon

on 04/02/2013 12:36 Matt Burke said the following:
 How do I get kgdb to load kernel modules from somewhere other than
 /boot/kernel?
 
 Googling tells me I need to use asf to create a file, but I haven't managed
 to figure out how to get kgdb use the output.

Research in the direction of set sysroot, solib-absolute-prefix,
solib-search-path.  I would not be surprised if the ancient gdb version on which
kgdb is based does not support some of these settings.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [clang] NMI while trying to read acpi timer register

2013-02-04 Thread Andriy Gapon

on 04/02/2013 21:11 Adrian Chadd said the following:
 On 30 January 2013 13:03, Andriy Gapon a...@freebsd.org wrote:
 on 28/01/2013 16:30 Andriy Gapon said the following:
 is there any reasonable explanation for getting an NMI while trying to read 
 acpi
 timer register?
 Note: this happens only after ACPI suspend/resume.

 An update.
 This happens only with clang compiled kernel, gcc compiled kernel is OK.
 Also, this happens only in the depth of fwohci driver (where it calls DELAY).
 If firewire is not loaded, then there is no problem.

 I suspect that perhaps there is some miscompilation that results in some
 incorrect I/O access that later leads to NMI.  Too many unknowns and guesses
 here, obviously.
 
 Do you have stack traces showing where it's happening?
 
 Posting that and the disassembly from those areas may shed a clue.

The information should be available from a user who got this issue.
Are you willing to take a look?  I'll connect you.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [clang] NMI while trying to read acpi timer register

2013-02-04 Thread Andriy Gapon

on 04/02/2013 23:06 Adrian Chadd said the following:
 I'm not the right person for it, but I think it's worth wrapping up
 all my requested details in a PR so Those Who Know can take a peek.
 
 Especially if it boils down to the choice of compiler. Who knows what
 other weird corner issues people will see with clang compiling their
 drivers?

OK, I'll ask the user to open a PR.
I'll just note that the problem seems to be too strange...
There is a huge distance from compiler to nmi.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: scheduler-swapper, SI_SUB_RUN_SCHEDULER-SI_SUB_LAST

2013-02-03 Thread Andriy Gapon

on 02/02/2013 16:50 Konstantin Belousov said the following:
 On Sat, Feb 02, 2013 at 01:50:40PM +0200, Andriy Gapon wrote:

 I would like to propose the following mostly cosmetic change:
 http://people.freebsd.org/~avg/scheduler-swapper.diff

 This is something that bit me early in my FreeBSD days, so I am kind of 
 obsessed
 with it.
 The current naming is confusing/misleading indeed.
 And magic properties of SI_SUB_RUN_SCHEDULER:SI_ORDER_LAST is a hidden gem.
 
 You may remove the Giant unlock from the scheduler()/swapper() as well
 then, doing it before the swapper() call in the mi_startup().
 
 Note that the wait chain for the idle swapper is still called sched.

Thank you for the review.  I am fixing both issues.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

detect mwait capabilities and extensions

2013-02-03 Thread Andriy Gapon


Guys,

could you please the following change?
It is amd64-centric now, but obviously I plan equivalent changes for i386.
I am mostly concerned about proper header files for various definitions and
proper names for them.  Especially I am not sure where to put STATE_RUNNING,
STATE_MWAIT, STATE_SLEEPING...

Thank you.

diff --git a/sys/amd64/amd64/identcpu.c b/sys/amd64/amd64/identcpu.c
index 2517498..d831f95 100644
--- a/sys/amd64/amd64/identcpu.c
+++ b/sys/amd64/amd64/identcpu.c
@@ -513,6 +513,13 @@ identify_cpu(void)
}
}

+   if (cpu_high = 5  (cpu_feature2  CPUID2_MON) != 0) {
+   do_cpuid(5, regs);
+   cpu_mon_mwait_flags = regs[2];
+   cpu_mon_min_size = regs[0]   CPUID5_MON_MIN_SIZE;
+   cpu_mon_max_size = regs[1]   CPUID5_MON_MAX_SIZE;
+   }
+
if (cpu_high = 7) {
cpuid_count(7, 0, regs);
cpu_stdext_feature = regs[1];
diff --git a/sys/amd64/amd64/initcpu.c b/sys/amd64/amd64/initcpu.c
index 4abed4c..f7574b1 100644
--- a/sys/amd64/amd64/initcpu.c
+++ b/sys/amd64/amd64/initcpu.c
@@ -75,6 +75,9 @@ u_int cpu_mxcsr_mask; /* Valid bits in mxcsr */
 u_int  cpu_clflush_line_size = 32;
 u_int  cpu_stdext_feature;
 u_int  cpu_max_ext_state_size;
+u_int  cpu_mon_mwait_flags;/* MONITOR/MWAIT flags (CPUID.05H.ECX) */
+u_int  cpu_mon_min_size;   /* MONITOR minimum range size, bytes */
+u_int  cpu_mon_max_size;   /* MONITOR minimum range size, bytes */

 SYSCTL_UINT(_hw, OID_AUTO, via_feature_rng, CTLFLAG_RD,
via_feature_rng, 0, VIA RNG feature available in CPU);
diff --git a/sys/amd64/include/md_var.h b/sys/amd64/include/md_var.h
index 5d7cb74..ddc5b9f 100644
--- a/sys/amd64/include/md_var.h
+++ b/sys/amd64/include/md_var.h
@@ -58,6 +58,9 @@ externu_int   cpu_procinfo;
 extern u_int   cpu_procinfo2;
 extern charcpu_vendor[];
 extern u_int   cpu_vendor_id;
+extern u_int   cpu_mon_mwait_flags;
+extern u_int   cpu_mon_min_size;
+extern u_int   cpu_mon_max_size;
 extern charctx_switch_xsave[];
 extern charkstack[];
 extern charsigcode[];
diff --git a/sys/x86/include/specialreg.h b/sys/x86/include/specialreg.h
index dbf9ba0..af64c1b 100644
--- a/sys/x86/include/specialreg.h
+++ b/sys/x86/include/specialreg.h
@@ -240,6 +240,29 @@
 #defineCPUID_LOCAL_APIC_ID 0xff00

 /*
+ * CPUID instruction 5 info
+ */
+#defineCPUID5_MON_MIN_SIZE 0x  /* eax */
+#defineCPUID5_MON_MAX_SIZE 0x  /* ebx */
+#defineCPUID5_MON_MWAIT_EXT0x0001  /* ecx */
+#defineCPUID5_MWAIT_INTRBREAK  0x0002  /* ecx */
+
+/*
+ * MWAIT cpu power states.  Lower 4 bits are sub-states.
+ */
+#defineMWAIT_C00xf0
+#defineMWAIT_C10x00
+#defineMWAIT_C20x10
+#defineMWAIT_C30x20
+#defineMWAIT_C40x30
+
+/*
+ * MWAIT extensions.
+ */
+/* Interrupt breaks MWAIT even when masked. */
+#defineMWAIT_INTRBREAK 0x0001
+
+/*
  * CPUID instruction 6 ecx info
  */
 #defineCPUID_PERF_STAT 0x0001

--- a/sys/amd64/amd64/machdep.c
+++ b/sys/amd64/amd64/machdep.c
@@ -665,10 +665,6 @@ TUNABLE_INT(machdep.idle_mwait, idle_mwait);
 SYSCTL_INT(_machdep, OID_AUTO, idle_mwait, CTLFLAG_RW, idle_mwait,
 0, Use MONITOR/MWAIT for short idle);

-#defineSTATE_RUNNING   0x0
-#defineSTATE_MWAIT 0x1
-#defineSTATE_SLEEPING  0x2
-
 static void
 cpu_idle_acpi(int busy)
 {
diff --git a/sys/amd64/include/cpu.h b/sys/amd64/include/cpu.h
index 1c2871f..dc29a37 100644
--- a/sys/amd64/include/cpu.h
+++ b/sys/amd64/include/cpu.h
@@ -43,8 +43,14 @@
 #include machine/frame.h
 #include machine/segments.h

+/*
+ * CPU states for the purpose of communication using MONITOR+MWAIT. */
+#defineSTATE_RUNNING   0x0
+#defineSTATE_MWAIT 0x1
+#defineSTATE_SLEEPING  0x2
+
 #definecpu_exec(p) /* nothing */
 #definecpu_swapin(p)   /* nothing */
 #definecpu_getstack(td)((td)-td_frame-tf_rsp)
 #definecpu_setstack(td, ap)((td)-td_frame-tf_rsp = (ap))
 #definecpu_spinwait()  ia32_pause()



-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: some questions on kern_linker and pre-loaded modules

2013-02-02 Thread Andriy Gapon

on 28/01/2013 17:40 John Baldwin said the following:
 Yes, I think it is too hard at present to safely allow a linker file to
 override the same module in a kernel, so the duplicate linker file should
 just be rejected entirely.  I'm not sure if your change is completely
 correct for that.

Incidentally that particular patch was broken.
Here is a working patch:
http://people.freebsd.org/~avg/linker_file_register_modules-linker_kernel_file.diff

But I really need to get a better understanding of the bigger picture and then
discuss what has to be done in that scope.

I plan to write a larger response to your complete followup.
Thank you.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

scheduler-swapper, SI_SUB_RUN_SCHEDULER-SI_SUB_LAST

2013-02-02 Thread Andriy Gapon


I would like to propose the following mostly cosmetic change:
http://people.freebsd.org/~avg/scheduler-swapper.diff

This is something that bit me early in my FreeBSD days, so I am kind of obsessed
with it.
The current naming is confusing/misleading indeed.
And magic properties of SI_SUB_RUN_SCHEDULER:SI_ORDER_LAST is a hidden gem.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: base gcc and _GLIBCXX_USE_C99

2013-02-01 Thread Andriy Gapon


[ping]

on 28/01/2013 17:11 Andriy Gapon said the following:
 
 Guys,
 
 I wonder why the following is the case for the base gcc.
 /usr/include/c++/4.2/bits/c++config.h:
 
 /* Define if C99 functions or macros from wchar.h, math.h, complex.h,
stdio.h, and stdlib.h can be used or exposed. */
 /* #undef _GLIBCXX_USE_C99 */
 
 Because of this undef there is no e.g. std::strtoll().
 Ditto for other things in stdlib.h.
 


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: base gcc and _GLIBCXX_USE_C99

2013-02-01 Thread Andriy Gapon

on 01/02/2013 15:08 Dimitry Andric said the following:
 On 2013-02-01 14:01, Andriy Gapon wrote:
 on 28/01/2013 17:11 Andriy Gapon said the following:
 I wonder why the following is the case for the base gcc.
 /usr/include/c++/4.2/bits/c++config.h:

 /* Define if C99 functions or macros from wchar.h, math.h, complex.h,
 stdio.h, and stdlib.h can be used or exposed. */
 /* #undef _GLIBCXX_USE_C99 */

 Because of this undef there is no e.g. std::strtoll().
 Ditto for other things in stdlib.h.
 
 Maybe this support can't be enabled, because we don't expose all the
 required functions yet?  Or maybe it is just something that was
 committed years ago, and then forgotten.
 
 If we are sure that all the C99 functions libstdc++ requires are now
 available and working, I see no problem in turning on _GLIBCXX_USE_C99.

Having looked into the source code of a recent GCC I get an impression that this
is a silliness on GCC's part (plus incompleteness of FreeBSD C99 support, it 
seems).

cstdlib would provide e.g. std::strtoull only when _GLIBCXX_USE_C99 is defined.

Now looking at libstdc++-v3/acinclude.m4 we can see that there is a dedicated
check for ISO C99 support in stdlib.h.  That check sets variable
glibcxx_cv_c99_stdlib.

But, _GLIBCXX_USE_C99 is set only if all of glibcxx_cv_c99_math,
glibcxx_cv_c99_complex, glibcxx_cv_c99_stdio, glibcxx_cv_c99_stdlib and
glibcxx_cv_c99_wchar are set.

So if glibcxx_cv_c99_stdlib is yes, but something like glibcxx_cv_c99_complex is
no, then no std::strtoull for me.

Not sure why GCC couldn't have a dedicated macro _GLIBCXX_USE_C99_STDLIB like
e.g. _GLIBCXX_USE_C99_MATH that it does have.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

[clang] NMI while trying to read acpi timer register

2013-01-30 Thread Andriy Gapon

on 28/01/2013 16:30 Andriy Gapon said the following:
 is there any reasonable explanation for getting an NMI while trying to read 
 acpi
 timer register?
 Note: this happens only after ACPI suspend/resume.

An update.
This happens only with clang compiled kernel, gcc compiled kernel is OK.
Also, this happens only in the depth of fwohci driver (where it calls DELAY).
If firewire is not loaded, then there is no problem.

I suspect that perhaps there is some miscompilation that results in some
incorrect I/O access that later leads to NMI.  Too many unknowns and guesses
here, obviously.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

NMI while trying to read acpi timer register

2013-01-28 Thread Andriy Gapon


Guys,

is there any reasonable explanation for getting an NMI while trying to read acpi
timer register?
Note: this happens only after ACPI suspend/resume.
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

dtrace vs module unloading

2013-01-27 Thread Andriy Gapon


It seems that FreeBSD DTrace currently does not track module loading / unloading
at all.  dtrace_module_loaded/dtrace_module_unloaded are both under ifdef sun.

I think that this is a root cause of e.g. fbt probes for some functions
remaining after a module that provides the functions is unloaded.

It looks like currently we do not post any event when a module gets loaded /
unloaded.  Perhaps this is one of the factors in current situation.
OTOH, in Solaris they just have some dtrace hooks in the form of function
pointers directly in the module handling code (equivalent of our kern_linker).

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

some questions on kern_linker and pre-loaded modules

2013-01-26 Thread Andriy Gapon


I.
It seems that linker_preload checks a  module for being a duplicate module only
if the module has MDT_VERSION.

This is probably designed to allow different version of the same module to
co-exist (for some definition of co-exist)?

But, OTOH, this doesn't work well if the module is version-less (no
MODULE_VERSION in the code) and is pre-loaded twice (e.g. once in kernel and
once in a preloaded file).

At present a good example of this is zfsctrl module, which could be present both
in kernel (options ZFS) and in zfs.ko.

I haven't thought about any linker-level resolution for this issue.
I've just tried a plug the ZFS hole for now.

commit ed8b18f2d6c4d1be915bff94cdec0c51a479529f
Author: Andriy Gapon a...@icyb.net.ua
Date:   Wed Dec 19 23:29:23 2012 +0200

[bugfix] zfs: add MODULE_VERSION for zfsctrl

This should allow the kernel linker to easily detect a situation
when the module is present both in a kernel and in a preloaded file 
(zfs.ko).

diff --git a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
index 10d28c2..5721010 100644
--- a/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
+++ b/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_ioctl.c
@@ -5599,6 +5599,7 @@ static moduledata_t zfs_mod = {
0
 };
 DECLARE_MODULE(zfsctrl, zfs_mod, SI_SUB_VFS, SI_ORDER_ANY);
+MODULE_VERSION(zfsctrl, 1);
 MODULE_DEPEND(zfsctrl, opensolaris, 1, 1, 1);
 MODULE_DEPEND(zfsctrl, krpc, 1, 1, 1);
 MODULE_DEPEND(zfsctrl, acl_nfs4, 1, 1, 1);


II.
It seems that linker_file_register_modules() for the kernel is called after
linker_file_register_modules() is called for all the pre-loaded files.
linker_file_register_modules() for the kernel is called from
linker_init_kernel_modules via SYSINIT(SI_SUB_KLD, SI_ORDER_ANY) and that
happens after linker_preload() which is executed via SYSINIT(SI_SUB_KLD,
SI_ORDER_MIDDLE).

Perhaps this is designed to allow modules in the preloaded files to override
modules compiled into the kernel?

But this doesn't seem to work well.
Because modules from the kernel are not registered yet,
linker_file_register_modules() would be successful for the duplicate modules in
a preloaded file and thus any sysinits present in the file will also be 
registered.
So, if the module is present both in the kernel and in the preloaded file and
the module has a module event handler (modeventhand_t), then the handler will
registered and called twice.

I cobbled together the following hack, but I am not sure if it is OK or if it
violates fundamental architecture/design of this subsystem.

commit 14ebf07633d0f0ea393801c3e4161d6c37393661
Author: Andriy Gapon a...@icyb.net.ua
Date:   Wed Dec 19 23:27:46 2012 +0200

[wip][experiment] kernel linker: register kernel modules before preloaded
modules...

Also, skip adding sysinit and sysctl stuff from preloaded modules if module
registration fails.

This should result in much saner behavior if a module is present in both
the kernel and a preloaded file.
Perhaps, the original intent was to allow the preloaded files to override
modules present in kernel, but that was extremly fragile because of double
sysinit registration.

diff --git a/sys/kern/kern_linker.c b/sys/kern/kern_linker.c
index b3ab4df..be46cdf 100644
--- a/sys/kern/kern_linker.c
+++ b/sys/kern/kern_linker.c
@@ -365,6 +365,7 @@ linker_file_register_modules(linker_file_t lf)
return (first_error);
 }

+#if 0
 static void
 linker_init_kernel_modules(void)
 {
@@ -374,6 +375,7 @@ linker_init_kernel_modules(void)

 SYSINIT(linker_kernel, SI_SUB_KLD, SI_ORDER_ANY, linker_init_kernel_modules,
 0);
+#endif

 static int
 linker_load_file(const char *filename, linker_file_t *result)
@@ -1599,7 +1601,11 @@ restart:
printf(KLD file %s is missing dependencies\n, lf-filename);
linker_file_unload(lf, LINKER_UNLOAD_FORCE);
}
-
+#if 1
+   error = linker_file_register_modules(linker_kernel_file);
+   if (error)
+   printf(linker_file_register_modules(linker_kernel_file) 
failed: %d\n, error);
+#endif
/*
 * We made it. Finish off the linking in the order we determined.
 */
@@ -1642,13 +1648,15 @@ restart:
 * Now do relocation etc using the symbol search paths
 * established by the dependencies
 */
+   error = linker_file_register_modules(lf);
+   if (error)
+   goto fail;
error = LINKER_LINK_PRELOAD_FINISH(lf);
if (error) {
printf(KLD file %s - could not finalize loading\n,
lf-filename);
goto fail;
}
-   linker_file_register_modules(lf);
if (linker_file_lookup_set(lf, sysinit_set, si_start,
si_stop, NULL) == 0

Re: [GIANT-LOCKED] even without D_NEEDGIANT

2013-01-18 Thread Andriy Gapon

on 18/01/2013 13:39 Jimmy Olgeni said the following:
 
 Hello list,
 
 At $DAILY_JOB I got involved with an ASI board that didn't have any kind of
 FreeBSD support, so I ended up writing a driver for it.
 
 If you try to ignore the blatant style(9) violations (of which there are many,
 hopefully on the way to be cleaned up) it seems to work fine.
 
 However, I noticed that when loading the driver I always get a
 message about the giant lock being used, even if D_NEEDGIANT is not
 specified anywhere.
 
 The actual output when loading is this (FreeBSD 9-STABLE i386):
 
 dektec0: DekTec DTA-145 mem 0xfeaff800-0xfeaf irq 16 at device 13.0 on 
 pci0
 dektec0: [GIANT-LOCKED]
 dektec0: [ITHREAD]
 dektec0: board model 145, firmware version 2 (tx: 0, rx: 2), tx fifo 16384 MB
 
 Source code here:
 
   https://github.com/olgeni/freebsd-dektec/blob/master/dektec.c
 
 Can anybody offer a clue about what could be triggering the GIANT
 requirement? Could I be doing something that has this, and possibly
 other, unintended side effects?
 

See INTR_MPSAFE in bus_setup_intr(9).

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

uart: add resume method

2012-12-23 Thread Andriy Gapon


Guys,

do you think that the following change is useful/needed?

I needed it with UART emulated in qemu, but I have no experience with real 
hardware.

commit ff1cc9b33c34fec4f3d1d3cb675ec3f8cfbc96de
Author: Andriy Gapon a...@icyb.net.ua
Date:   Wed Dec 19 23:31:49 2012 +0200

uart: add resume method and enable it for attachments on the most common x86
buses

Otherwise the uart hardware could be in such a state after the resume
where IER is cleared and thus no interrupts are generated.

diff --git a/sys/dev/uart/uart_bus.h b/sys/dev/uart/uart_bus.h
index 01752d0..322e9a8 100644
--- a/sys/dev/uart/uart_bus.h
+++ b/sys/dev/uart/uart_bus.h
@@ -138,6 +138,7 @@ extern char uart_driver_name[];

 int uart_bus_attach(device_t dev);
 int uart_bus_detach(device_t dev);
+int uart_bus_resume(device_t dev);
 serdev_intr_t *uart_bus_ihand(device_t dev, int ipend);
 int uart_bus_ipend(device_t dev);
 int uart_bus_probe(device_t dev, int regshft, int rclk, int rid, int chan);
diff --git a/sys/dev/uart/uart_bus_acpi.c b/sys/dev/uart/uart_bus_acpi.c
index 756f7f2..ccd213d 100644
--- a/sys/dev/uart/uart_bus_acpi.c
+++ b/sys/dev/uart/uart_bus_acpi.c
@@ -47,6 +47,7 @@ static device_method_t uart_acpi_methods[] = {
DEVMETHOD(device_probe, uart_acpi_probe),
DEVMETHOD(device_attach,uart_bus_attach),
DEVMETHOD(device_detach,uart_bus_detach),
+   DEVMETHOD(device_resume,uart_bus_resume),
{ 0, 0 }
 };

diff --git a/sys/dev/uart/uart_bus_isa.c b/sys/dev/uart/uart_bus_isa.c
index 1836663..1fda15f 100644
--- a/sys/dev/uart/uart_bus_isa.c
+++ b/sys/dev/uart/uart_bus_isa.c
@@ -50,6 +50,7 @@ static device_method_t uart_isa_methods[] = {
DEVMETHOD(device_probe, uart_isa_probe),
DEVMETHOD(device_attach,uart_bus_attach),
DEVMETHOD(device_detach,uart_bus_detach),
+   DEVMETHOD(device_resume,uart_bus_resume),
{ 0, 0 }
 };

diff --git a/sys/dev/uart/uart_bus_pci.c b/sys/dev/uart/uart_bus_pci.c
index 307ee340..4237fa4 100644
--- a/sys/dev/uart/uart_bus_pci.c
+++ b/sys/dev/uart/uart_bus_pci.c
@@ -51,6 +51,7 @@ static device_method_t uart_pci_methods[] = {
DEVMETHOD(device_probe, uart_pci_probe),
DEVMETHOD(device_attach,uart_bus_attach),
DEVMETHOD(device_detach,uart_bus_detach),
+   DEVMETHOD(device_resume,uart_bus_resume),
{ 0, 0 }
 };

diff --git a/sys/dev/uart/uart_core.c b/sys/dev/uart/uart_core.c
index 8234d63..aa0d792 100644
--- a/sys/dev/uart/uart_core.c
+++ b/sys/dev/uart/uart_core.c
@@ -590,3 +590,12 @@ uart_bus_detach(device_t dev)

return (0);
 }
+
+int
+uart_bus_resume(device_t dev)
+{
+   struct uart_softc *sc;
+
+   sc = device_get_softc(dev);
+   return (UART_ATTACH(sc));
+}

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: uart: add resume method

2012-12-23 Thread Andriy Gapon

on 23/12/2012 23:53 Garrett Cooper said the following:
 On Sun, Dec 23, 2012 at 5:36 AM, Andriy Gapon a...@freebsd.org wrote:

 Guys,

 do you think that the following change is useful/needed?

 I needed it with UART emulated in qemu, but I have no experience with real 
 hardware.
 
 I assume that you suspended the emulator and the UART didn't come
 back to life?

No, I executed acpiconf -s 3 in the guest and then system_wakeup in the qemu
monitoring console.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

huge ktr buffer

2012-12-06 Thread Andriy Gapon


So I configured a kernel with the following option:
options   KTR_ENTRIES=(1024UL*1024)
then booted the kernel and did
$ sysctl debug.ktr.clear=1
and got an insta-reboot.

No panic, nothing, just a reset.

I suspect that the huge static buffer resulting from the above option could be a
cause.  But I would like to understand the details, if possible.

Also, perhaps ktr could be a little bit more sophisticated with its buffer than
just using a static array.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: sleepq problem

2012-12-05 Thread Andriy Gapon

on 05/12/2012 17:55 Alexandr Matveev said the following:
 Hello,
 
 I'm writing a storage controller driver for 9.0-RELEASE-p4 and i'm using
 sleepq at initialization to sleep until command is processed by controller:
 
 struct command {
 ...
 uint8_tdone;
 };
 
 void send_command_and_wait(struct command *cmd)
 {
 command-done = 0;
 
 send_command(cmd);
 
 for (;;) {
 sleepq_lock(command-done);
 if (command-done)
 break;
 sleepq_add(command-done, NULL, wait for completion,
 SLEEPQ_SLEEP, 0);
 sleepq_wait(command-done, 0);
 }
 sleepq_release(command-done);
 }
 
 Interrupt handler calls special function when command is processed:
 
 void command_finish(struct command *cmd)
 {
 sleepq_lock(command-done);
 command-done = 1;
 sleepq_signal(command-done, SLEEPQ_SLEEP, 0, 0);
 sleepq_release(command-done);
 }
 
 This code panics very often with following messages:
 
 Sleeping thread (tid 100248, pid 1859) owns a non-sleepable lock
 sched_switch() at sched_switch+0xf1
 mi_switch() at mi_switch+0x170
 sleepq_wait() at sleepq_wait+0x44
 send_command_and_wait() at send_command_with_retry+0x77
 ...
 panic: sleeping thread
 cpuid = 1
 KDB: stack backtrace:
 db_trace_self_wrapper() at db_trace_self_wrapper+0x2a
 kdb_backtrace() at kdb_backtrace+0x37
 panic() at panic+0x187
 propagate_priority() at propagate_priority+0x161
 turnstile_wait() at turnstile_wait+0x1b8
 _mtx_lock_sleep() at _mtx_lock_sleep+0xb0
 _mtx_lock_flags() at _mtx_lock_flags+0x96
 softclock() at softclock+0x25e
 intr_event_execute_handlers() at intr_event_execute_handlers+0x66
 ithread_loop() at ithread_loop+0x96
 fork_exit() at fork_exit+0x11d
 fork_trampoline() at fork_trampoline+0xe
 --- trap 0, rip = 0, rsp = 0xff80002fad00, rbp = 0 ---
 
 Where tid 100248 is my driver thread which is sleeping  waiting for command
 completion:
 db show thread 100248
 Thread 100243 at 0xfe0146aa98c0:
  proc (pid 1859): 0xfe02a6815488
  name: kldload
  stack: 0xff8464bf2000-0xff8464bf5fff
  flags: 0x4  pflags: 0
  state: INHIBITED: {SLEEPING}
  wmesg: wait for completion  wchan: 0xff8464c1e244
  priority: 127
  container lock: sleepq chain (0x81101af8)
 
 But I can't understand what goes wrong. Sleepq chain lock is owned by
 the other thread:
 db show lock 0x81101af8
  class: spin mutex
  name: sleepq chain
  flags: {SPIN, RECURSE}
  state: {OWNED}
  owner: 0xfe0008377000 (tid 100019, pid 12, swi4: clock)
 
 Unfortunately, I can't find any examples of using sleepq in drivers.
 What am I missing or don't understand?
 
You should not use sleepq, it's too low level.
See locking(9) and follow references from there.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-11-29 Thread Andriy Gapon

on 16/11/2012 16:42 Andriy Gapon said the following:
 on 15/11/2012 23:44 Attilio Rao said the following:
 Do you think you can test this patch?:
 http://www.freebsd.org/~attilio/lockmgr_forcerec.patch
 
 I will use this patch in my tree, but I think that it is effectively already 
 quite
 well tested by using INVARIANTS+WITNESS.
 

I've been using this patch in both debug and non-debug environments and I have 
not
run into any issues.  Please commit when you get a chance.
Thank you.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: lib for working with graphs

2012-11-29 Thread Andriy Gapon

on 28/11/2012 17:02 Jonathan Anderson said the following:
 On Wednesday, 28 November 2012 at 14:37, Andriy Gapon wrote:
 Graphs as in vertices, edges, etc :) And things like graph basics: BFS, DFS,
 connected components, topological sort, etc
 
 
 I've used igraph in my research: http://igraph.sourceforge.net/. It's very
 full-featured, with attention to efficiency and sensible choices for (at least
 some) algorithms, but it is GPL'ed rather than BSD-licenced.

Yeah, a bummer for me.

 And, big oops sorry, forgot one very important detail - it has to be C.
 
 
 
 Does it have to *be* C, or does it have to be *interoperable with* C? For
 instance, igraph has a core C library to do the heavy lifting, but I'd never
 want to use it directly when exploring data sets because the Python wrapper 
 API
 is so very convenient (and I can pop the resulting data into matplotlib).

It has to be C because I need to use it from C code in a C runtime.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: lib for working with graphs

2012-11-29 Thread Andriy Gapon

on 28/11/2012 18:36 Mehmet Erol Sanliturk said the following:
 
 
 On Wed, Nov 28, 2012 at 6:37 AM, Andriy Gapon a...@freebsd.org
 mailto:a...@freebsd.org wrote:
 
 on 28/11/2012 16:31 David Wolfskill said the following:
  On Wed, Nov 28, 2012 at 04:20:28PM +0200, Andriy Gapon wrote:
 
  Does anyone know a light-weight BSD-licensed (or analogous) library / 
 piece of
  code for doing useful things with graphs?
  Thank you.
  
 
  Errr graphs is fairly ambiguous, and things with graphs covers a
  very wide range of activities.
 
 Graphs as in vertices, edges, etc :)
 And things like graph basics: BFS, DFS, connected components, topological
 sort, etc
 
  ports/math/R may be useful for this -- I use it to generate graphs (and
  perform statistical analyses).
 
  ports/graphics/plotmtv is possibly of some interest, as well, as it
  allows a certain level of interactivity (though the code hasn't been
  updated in quite some time -- but it still works).
 
  If neither of those suits your intent, perhaps you could expand a bit on
  what that intent is?
 
 And, big oops sorry, forgot one very important detail - it has to be C.
 
 http://en.wikipedia.org/wiki/JUNG
 http://en.wikipedia.org/wiki/Xfig
 http://en.wikipedia.org/wiki/SVG-edit
 
 
 http://en.wikipedia.org/wiki/Category:Graph_drawing_software
 http://en.wikipedia.org/wiki/Comparison_of_vector_graphics_editors
 http://en.wikipedia.org/wiki/Category:Free_diagramming_software
 
 
 Thank you very much .

Thank you, but all of these appear to be off-mark.
They all are end-user oriented applications for drawing/editing graphs, etc.
While I need a light-weight library for embedding graph analysis.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: lib for working with graphs

2012-11-29 Thread Andriy Gapon

on 28/11/2012 17:09 Dan Nelson said the following:
 In the last episode (Nov 28), Andriy Gapon said:
 on 28/11/2012 16:31 David Wolfskill said the following:
 On Wed, Nov 28, 2012 at 04:20:28PM +0200, Andriy Gapon wrote:

 Does anyone know a light-weight BSD-licensed (or analogous) library /
 piece of code for doing useful things with graphs?  Thank you.
 

 Errr graphs is fairly ambiguous, and things with graphs covers a
 very wide range of activities.

 Graphs as in vertices, edges, etc :) And things like graph basics: BFS,
 DFS, connected components, topological sort, etc
 
 Graphviz would be the most popular package for stuff like this, I think, and
 it includes a C API.  It's licensed under the Eclipse Public License.
 
 http://www.graphviz.org/
 http://www.graphviz.org/Gallery.php
 http://www.graphviz.org/doc/libguide/libguide.pdf

The library sounds interesting, but I need to evaluate the license and
light-weight-ness of it.  EPL is not as long as GPL, but is not as short as BSDL
unfortunately.

Thank you!

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

lib for working with graphs

2012-11-28 Thread Andriy Gapon


Does anyone know a light-weight BSD-licensed (or analogous) library / piece of
code for doing useful things with graphs?
Thank you.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: lib for working with graphs

2012-11-28 Thread Andriy Gapon

on 28/11/2012 16:31 David Wolfskill said the following:
 On Wed, Nov 28, 2012 at 04:20:28PM +0200, Andriy Gapon wrote:

 Does anyone know a light-weight BSD-licensed (or analogous) library / piece 
 of
 code for doing useful things with graphs?
 Thank you.
 
 
 Errr graphs is fairly ambiguous, and things with graphs covers a
 very wide range of activities.

Graphs as in vertices, edges, etc :)
And things like graph basics: BFS, DFS, connected components, topological sort, 
etc

 ports/math/R may be useful for this -- I use it to generate graphs (and
 perform statistical analyses).
 
 ports/graphics/plotmtv is possibly of some interest, as well, as it
 allows a certain level of interactivity (though the code hasn't been
 updated in quite some time -- but it still works).
 
 If neither of those suits your intent, perhaps you could expand a bit on
 what that intent is?

And, big oops sorry, forgot one very important detail - it has to be C.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Andriy Gapon

on 25/11/2012 14:29 Attilio Rao said the following:
 I think the patch you propose makes such effects even worse, because
 it disables interrupts in generic_stop_cpus().
 What I suggest to do, is the following:
 - The CPU which wins the race for generic_stop_cpus also signals the
 CPUs it is willing to stop on a global mask
 - Another CPU entering generic_stop_cpus() and loosing the race,
 checks the mask of cpus which might be stopped and stops itself if
 necessary (ie. not yet done). We must be careful with races here, but
 I'm confindent this can be done easily enough.

I think that you either misunderstood my patch or I misunderstand your
suggestion, because my patch does exactly what you wrote above.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Andriy Gapon

on 25/11/2012 16:01 Attilio Rao said the following:
 On Sun, Nov 25, 2012 at 12:55 PM, Andriy Gapon a...@freebsd.org wrote:
 on 25/11/2012 14:29 Attilio Rao said the following:
 I think the patch you propose makes such effects even worse, because
 it disables interrupts in generic_stop_cpus().
 What I suggest to do, is the following:
 - The CPU which wins the race for generic_stop_cpus also signals the
 CPUs it is willing to stop on a global mask
 - Another CPU entering generic_stop_cpus() and loosing the race,
 checks the mask of cpus which might be stopped and stops itself if
 necessary (ie. not yet done). We must be careful with races here, but
 I'm confindent this can be done easily enough.

 I think that you either misunderstood my patch or I misunderstand your
 suggestion, because my patch does exactly what you wrote above.
 
 The patch is someway incomplete:
 - I don't think that we need specific checks in cpustop_handler() (and
 if you have added them to prevent races, I don't think they are
 enough, see below)

The check is to cover the following scenario:
- two CPUs race in hard-stop scenario, and both are in NMI contexts
- one CPU wins and the other does spinning right in generic_stop_cpus
- NMI for the spinning CPU is blocked and remains pending
- the winning CPU releases all other CPUs
- the spinning CPU exits the NMI context and get the NMI which was pending

 - setting of stopping_cpus map must happen atomically/before the
 stopper_cpu cpuid setting, otherwise some CPUs may end up using a NULL
 mask in the check

Not NULL, but empty or stale.  But a very good point, I agree.
The logic must be redone.

 - Did you consider the races about when a stop and restart request
 happen just after the CPU_ISSET() check? I think CPUs can deadlock
 there.

Yeah, good point again.  This seems to be a different side of the issue above.
stopping_cpus is probably a bad idea.

 - I'm very doubious about the spinlock_enter() stuff, I think I can
 just make the problem worse atm.

Well, this is where I disagree.  I think that cpu_stop must already be called in
context which effectively disable pre-emption and interrupts.
The added spinlock_enter() stuff is kind of a safety belt to make things more
explicit, but it could be changed into some sort of an assert if that's 
possible.

 However you are right, the concept of your patch is the same I really
 wanted to get, we maybe need to just lift it up a bit.

I agree.

To add some gas to the fire.  Do you recall my wish to drop the mask parameter
from the stop calls?  If that was done then it would be simpler to handle these
things.  In that case only stopper_cpu ID (master/winner) would be needed.

 In the while I also double-checked suspended_cpus and I don't think
 there are real showstoppers to have it in stopped_cpus map.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-25 Thread Andriy Gapon

on 25/11/2012 20:05 Attilio Rao said the following:
 If you really want to do something like that please rename
 s/generic_stop_cpus/generic_stop_butself() or similar convention and I
 may be not opposed to it.

As we discussed before, anything else besides all but self does not make
sense.  So I am not sure what's the point of being that verbose in the naming.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: FreeBSD needs Git to ensure repo integrity [was: 2012 incident]

2012-11-19 Thread Andriy Gapon

on 18/11/2012 16:17 Chris Rees said the following:
 On 18 November 2012 14:04, Adrian Chadd adr...@freebsd.org wrote:
 On 18 November 2012 02:48, Andriy Gapon a...@freebsd.org wrote:

 What you describe is not a workflow issue, but a local development
 environment(s) setup issue.

 Which is a workflow issue.

 I mean, we could bang heads on semantics for hours on end, or we can
 realise that git isn't a magic bullet for FreeBSD development.
 
 Also... did I mention git is GPL?

Yes, you did, twice now.
I do not see any relevance of Git license to FreeBSD's choice of SCM solution.
But, of course, I am not:
a) a zealot
b) proposing Git (or Subversion, or Mercurial) to be bundled with a base system


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: FreeBSD needs Git to ensure repo integrity [was: 2012 incident]

2012-11-19 Thread Andriy Gapon

on 19/11/2012 03:53 Nathan Whitehorn said the following:
 git would be a huge step backward from svn for the central repo in lots of 
 ways.

Dramatic statements (huge, lots) require dramatic evidence.

 Besides being (in my experience) extremely fragile and error-prone and the

Ditto (extremely).

 issues of workflow that have been brought up, the loss of monotonic revision
 numbers is a really major problem.

Monotonic revision numbers are nice to have, but again, are they really of that
major importance?

 Switching SCMs as a result of a security
 problem is also an action totally disproportionate with the issue that should
 not be made in a panic. Having more [cryptographic] verifiability in the 
 release
 process is a good thing; it is not strictly related to the choice of version
 control system.

With this part I entirely agree.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: FreeBSD needs Git to ensure repo integrity [was: 2012 incident]

2012-11-19 Thread Andriy Gapon

on 19/11/2012 15:08 Chris Rees said the following:
 
 On 19 Nov 2012 13:05, Andriy Gapon a...@freebsd.org 
 mailto:a...@freebsd.org
 wrote:

 on 18/11/2012 16:17 Chris Rees said the following:
  On 18 November 2012 14:04, Adrian Chadd adr...@freebsd.org
 mailto:adr...@freebsd.org wrote:
  On 18 November 2012 02:48, Andriy Gapon a...@freebsd.org
 mailto:a...@freebsd.org wrote:
 
  What you describe is not a workflow issue, but a local development
  environment(s) setup issue.
 
  Which is a workflow issue.
 
  I mean, we could bang heads on semantics for hours on end, or we can
  realise that git isn't a magic bullet for FreeBSD development.
 
  Also... did I mention git is GPL?

 Yes, you did, twice now.
 I do not see any relevance of Git license to FreeBSD's choice of SCM 
 solution.
 But, of course, I am not:
 a) a zealot
 b) proposing Git (or Subversion, or Mercurial) to be bundled with a base 
 system
 
 I'm disappointed that you choose to call me a zealot- this is a valid concern,
 especially since we're trying to move away from GPL stuff.  Whether it's in 
 base
 or not, you'd still be forcing me to install it for development.  Why are 
 people
 so dismissive of this point?

a. I haven't called you a zealot
b. You now _sound_ like a zealot if even just installing GPL-licensed software
on your local/private machine sounds like a problem to you

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: FreeBSD needs Git to ensure repo integrity [was: 2012 incident]

2012-11-18 Thread Andriy Gapon

on 18/11/2012 10:15 Adrian Chadd said the following:
 On 17 November 2012 23:31, Konstantin Belousov kostik...@gmail.com wrote:
 
 Git would work well with our workflow. It supports the centralized
 repository model, which the project employs right now.
 
 It may work with your workflow, but it doesn't work with mine. :-)
 
 Right now the source tree isn't very good at building drivers from a
 full HEAD checkout on a -9 or -8 running system.
 
 The include paths end up pulling from the local sys/net directory, for
 example, rather than falling through to the specified kernel build and
 kernel source path.
 
 So at least for me, working almost exclusively in driver/stack land, I
 can do  sparse check out of only the bits that I'm working on. It lets
 me get work done without having to run an up to date -HEAD (and keep
 said install up to date.) I also do development on little old netbooks
 with SSDs that would make it prohibitive to checkout multiple git
 trees. No, using git on a USB/CF/etc card doesn't work very well
 either I'm afraid.
 
 I'm sure there are other use cases.

What you describe is not a workflow issue, but a local development
environment(s) setup issue.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: FreeBSD needs Git to ensure repo integrity [was: 2012 incident]

2012-11-18 Thread Andriy Gapon

on 18/11/2012 16:04 Adrian Chadd said the following:
 On 18 November 2012 02:48, Andriy Gapon a...@freebsd.org wrote:
 
 What you describe is not a workflow issue, but a local development
 environment(s) setup issue.
 
 Which is a workflow issue.

Well, this is what I understand as workflow:
google://git workflow
http://git-scm.com/book/en/Distributed-Git-Distributed-Workflows
http://www.kernel.org/pub/software/scm/git/docs/gitworkflows.html

Local workflow is up to each developer to set up.

 I mean, we could bang heads on semantics for hours on end,

Indeed.

 or we can
 realise that git isn't a magic bullet for FreeBSD development.

I haven't heard anyone saying that.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-16 Thread Andriy Gapon

on 16/11/2012 14:30 Attilio Rao said the following:
 On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 00:58 Ryan Stone said the following:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered is that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
 wins, and the rest spin forever in this code:

 /*
  * We don't want multiple CPU's to panic at the same time, so we
  * use panic_cpu as a simple spinlock.  We have to keep checking
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 while (atomic_cmpset_int(panic_cpu, NOCPU,
 PCPU_GET(cpuid)) == 0)
 while (panic_cpu != NOCPU)
 ; /* nothing */

 The system wedges when stop_cpus_hard() is called, which sends NMIs to all
 of the other CPUs and waits for them to acknowledge that they are stopped
 before returning.  However the CPU will not deliver an NMI to a CPU that is
 already handling an NMI, so the other CPUs that got a watchdog NMI and are
 spinning will never go into the NMI handler and acknowledge that they are
 stopped.

 I thought about this issue and fixed (in my tree) in a different way:
 http://people.freebsd.org/~avg/cpu_stop-race.diff

 The need for spinlock_enter in the patch in not entirely clear.
 The main idea is that a CPU which calls cpu_stop and loses a race should
 voluntary enter cpustop_handler.
 I am also not sure about MI-cleanness of this patch.
 
 It is similar to what I propose but with some differences:
 - It is not clean from MI perspective

OK.

 - I don't think we need to treact it specially, I would just
 unconditionally stop all the CPUs entering in the spinlock zone,
 making the patch simpler.

I couldn't understand this part.

 So I guess you are really fine with the proposal I made?

I definitely agree with with the MI cpustop_handler part.  I couldn't understand
the rest of the proposal.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-16 Thread Andriy Gapon

on 16/11/2012 16:41 Attilio Rao said the following:
 On Fri, Nov 16, 2012 at 1:18 PM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 14:30 Attilio Rao said the following:
 On Fri, Nov 16, 2012 at 7:54 AM, Andriy Gapon a...@freebsd.org wrote:
 on 16/11/2012 00:58 Ryan Stone said the following:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered is that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the 
 CPUs
 wins, and the rest spin forever in this code:

 /*
  * We don't want multiple CPU's to panic at the same time, so we
  * use panic_cpu as a simple spinlock.  We have to keep checking
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 while (atomic_cmpset_int(panic_cpu, NOCPU,
 PCPU_GET(cpuid)) == 0)
 while (panic_cpu != NOCPU)
 ; /* nothing */

 The system wedges when stop_cpus_hard() is called, which sends NMIs to all
 of the other CPUs and waits for them to acknowledge that they are stopped
 before returning.  However the CPU will not deliver an NMI to a CPU that 
 is
 already handling an NMI, so the other CPUs that got a watchdog NMI and are
 spinning will never go into the NMI handler and acknowledge that they are
 stopped.

 I thought about this issue and fixed (in my tree) in a different way:
 http://people.freebsd.org/~avg/cpu_stop-race.diff

 The need for spinlock_enter in the patch in not entirely clear.
 The main idea is that a CPU which calls cpu_stop and loses a race should
 voluntary enter cpustop_handler.
 I am also not sure about MI-cleanness of this patch.

 It is similar to what I propose but with some differences:
 - It is not clean from MI perspective

 OK.

 - I don't think we need to treact it specially, I would just
 unconditionally stop all the CPUs entering in the spinlock zone,
 making the patch simpler.

 I couldn't understand this part.
 
 I'm sorry, I think I misread your patch.
 
 I was basically suggesting to fix Ryan case by calling
 cpustop_handler() (or the new MI interface) into the panic() function,
 in the case the CPU don't win the race for panic_cpu.
 Basically doing:
 Index: sys/kern/kern_shutdown.c
 ===
 --- sys/kern/kern_shutdown.c(revision 243154)
 +++ sys/kern/kern_shutdown.c(working copy)
 @@ -568,15 +568,11 @@ panic(const char *fmt, ...)
  #ifdef SMP
 /*
  * We don't want multiple CPU's to panic at the same time, so we
 -* use panic_cpu as a simple spinlock.  We have to keep checking
 -* panic_cpu if we are spinning in case the panic on the first
 -* CPU is canceled.
 +* use panic_cpu as a simple lock.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 -   while (atomic_cmpset_int(panic_cpu, NOCPU,
 -   PCPU_GET(cpuid)) == 0)
 -   while (panic_cpu != NOCPU)
 -   ; /* nothing */
 +   if (atomic_cmpset_int(panic_cpu, NOCPU, PCPU_GET(cpuid)) == 
 0)
 +   cpustop_handler();
 
 if (stop_scheduler_on_panic) {
 if (panicstr == NULL  !kdb_active) {
 
 
 Infact it seems to me that the comment is outdated and no longer
 represent truth.

Ah, I see.  Thank you.

My older plan was to get rid of stop_scheduler_on_panic, that is to make the
behavior unconditional.  And then to use stop_cpus_hard instead of the 
hand-rolled
'panic_cpu' spinlock.  This way the whichever CPU wins stop_cpus_hard would be 
the
only CPU to enter panic.

Sorry, if I was not fully clear when I posted my patch.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-11-16 Thread Andriy Gapon

on 15/11/2012 23:44 Attilio Rao said the following:
 Do you think you can test this patch?:
 http://www.freebsd.org/~attilio/lockmgr_forcerec.patch

I will use this patch in my tree, but I think that it is effectively already 
quite
well tested by using INVARIANTS+WITNESS.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-11-15 Thread Andriy Gapon


To people knowing the code,

do the following documentation changes look correct?

--- a/share/man/man9/lock.9
+++ b/share/man/man9/lock.9
@@ -148,7 +148,9 @@ Flags indicating what action is to be taken.
 .Bl -tag -width .Dv LK_CANRECURSE
 .It Dv LK_SHARED
 Acquire a shared lock.
-If an exclusive lock is currently held, it will be downgraded.
+If an exclusive lock is currently held,
+.Dv EDEADLK
+will be returned.
 .It Dv LK_EXCLUSIVE
 Acquire an exclusive lock.
 If an exclusive lock is already held, and
@@ -158,7 +160,8 @@ is not set, the system will
 .It Dv LK_DOWNGRADE
 Downgrade exclusive lock to a shared lock.
 Downgrading a shared lock is not permitted.
-If an exclusive lock has been recursed, all references will be downgraded.
+If an exclusive lock has been recursed, the system will
+.Xr panic 9 .
 .It Dv LK_UPGRADE
 Upgrade a shared lock to an exclusive lock.
 If this call fails, the shared lock is lost.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [RFQ] make witness panic an option

2012-11-15 Thread Andriy Gapon

on 15/11/2012 19:56 Warner Losh said the following:
 It sounds like he's more worried about introducing LoRs into his wireless 
 code.

Mere LORs do not result in panic, by default.
Only more serious lock-related issues lead to panics.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: LK_SHARED/LK_DOWNGRADE adjustments to lock.9 manual page

2012-11-15 Thread Andriy Gapon

on 15/11/2012 20:46 Attilio Rao said the following:
 On 11/15/12, Andriy Gapon a...@freebsd.org wrote:

 To people knowing the code,

 do the following documentation changes look correct?
 
 The latter chunk is not correct.
 It will panic only if assertions are on.

But the current content is not correct too?

 I was thinking that however
 it would be good idea to patch lockmgr to panic also in non-debugging
 kernel situation.

It would make sense indeed, IMO.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [RFQ] make witness panic an option

2012-11-15 Thread Andriy Gapon

on 15/11/2012 22:00 Adrian Chadd said the following:
 But I think my change is invaluable for development, where you want to
 improve and debug the locking and lock interactions of a subsystem.

My practical experience was that if you mess up one lock in one place, then it
is a total mess further on.  but apparently you've got a different practical
experience :-)

What would indeed be invaluable to _me_ - if the LOR messages also produced the
stack(s) where a supposedly correct lock order was learned.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [RFQ] make witness panic an option

2012-11-15 Thread Andriy Gapon

on 16/11/2012 01:38 Attilio Rao said the following:
 On Thu, Nov 15, 2012 at 8:51 PM, Andriy Gapon a...@freebsd.org wrote:
 on 15/11/2012 22:00 Adrian Chadd said the following:
 But I think my change is invaluable for development, where you want to
 improve and debug the locking and lock interactions of a subsystem.

 My practical experience was that if you mess up one lock in one place, then 
 it
 is a total mess further on.  but apparently you've got a different practical
 experience :)
 What would indeed be invaluable to _me_ - if the LOR messages also produced 
 the
 stack(s) where a supposedly correct lock order was learned.
 
 Please note that the supposedly correct lock order, as for the
 definition that it is correct, can be used in several different
 stacks. I don't see the point of saving it somewhere.
 The only helpful case would be if the wrong order is catched first.
 If this is really the case, I suggest you to force the order you
 expect in the static table so that the first time the wrong order
 happens it yells.

Exactly my point - if I am a user of some code, not its developer, and I don't
know which one is the correct order I would have had the complete information
from the very start instead of having to jump through the hoops.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: [RFQ] make witness panic an option

2012-11-15 Thread Andriy Gapon

on 16/11/2012 01:20 Alfred Perlstein said the following:
 We need to enable developers to skip these areas and test their own code.

I wish that there was a magic knob to ignore build breakages, so that the
developers could test how their own code compiles :-)

On a serious note, why stop here?  E.g. Solaris seems to have knob to ignore all
asserts (just to print a message, but not panic).

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: stop_cpus_hard when multiple CPUs are panicking from an NMI

2012-11-15 Thread Andriy Gapon

on 16/11/2012 00:58 Ryan Stone said the following:
 At work we have some custom watchdog hardware that sends an NMI upon
 expiry.  We've modified the kernel to panic when it receives the watchdog
 NMI.  I've been trying the stop_scheduler_on_panic mode, and I've
 discovered that when my watchdog expires, the system gets completely
 wedged.  After some digging, I've discovered is that I have multiple CPUs
 getting the watchdog NMI and trying to panic concurrently.  One of the CPUs
 wins, and the rest spin forever in this code:
 
 /*
  * We don't want multiple CPU's to panic at the same time, so we
  * use panic_cpu as a simple spinlock.  We have to keep checking
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 if (panic_cpu != PCPU_GET(cpuid))
 while (atomic_cmpset_int(panic_cpu, NOCPU,
 PCPU_GET(cpuid)) == 0)
 while (panic_cpu != NOCPU)
 ; /* nothing */
 
 The system wedges when stop_cpus_hard() is called, which sends NMIs to all
 of the other CPUs and waits for them to acknowledge that they are stopped
 before returning.  However the CPU will not deliver an NMI to a CPU that is
 already handling an NMI, so the other CPUs that got a watchdog NMI and are
 spinning will never go into the NMI handler and acknowledge that they are
 stopped.

I thought about this issue and fixed (in my tree) in a different way:
http://people.freebsd.org/~avg/cpu_stop-race.diff

The need for spinlock_enter in the patch in not entirely clear.
The main idea is that a CPU which calls cpu_stop and loses a race should
voluntary enter cpustop_handler.
I am also not sure about MI-cleanness of this patch.

P.S.  I also have the hard stop and the soft stop separate in my tree.  Just
in case there is a simultaneous occurrence of the soft stop happening for
whatever reason and the hard stop for panic or kdb entry, I always want the hard
stop to override the soft stop.

 I've been able to work around this with the following hideous hack:
 
 --- kern_shutdown.c 2012-08-17 10:25:02.0 -0400
 +++ kern_shutdown.c 2012-11-15 17:04:10.0 -0500
 @@ -658,11 +658,15 @@
  * panic_cpu if we are spinning in case the panic on the first
  * CPU is canceled.
  */
 -   if (panic_cpu != PCPU_GET(cpuid))
 +   if (panic_cpu != PCPU_GET(cpuid)) {
 while (atomic_cmpset_int(panic_cpu, NOCPU,
 -   PCPU_GET(cpuid)) == 0)
 +   PCPU_GET(cpuid)) == 0) {
 +   atomic_set_int(stopped_cpus, PCPU_GET(cpumask));
 while (panic_cpu != NOCPU)
 ; /* nothing */
 +   }
 +   atomic_clear_int(stopped_cpus, PCPU_GET(cpumask));
 +   }
 
 if (stop_scheduler_on_panic) {
 if (panicstr == NULL  !kdb_active)
 
 
 But I'm hoping that somebody has some ideas on a better way to fix this
 kind of problem.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

dtrace anonymous tracing

2012-11-14 Thread Andriy Gapon


Do we have support for DTrace anonymous tracing (boot-time tracing) on FreeBSD?

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

_mtx_lock_spin: obsolete historic handling of kdb_active and panicstr?

2012-10-17 Thread Andriy Gapon


_mtx_lock_spin has the following check in its retry loop:
if (i  6000 || kdb_active || panicstr != NULL)
DELAY(1);
else
_mtx_lock_spin_failed(m);

Which means that in the (!kdb_active  panicstr == NULL) case we will make at
most 6000 iterations and then call _mtx_lock_spin_failed (which proceeds to
panic).  When either kdb_active or panicstr is set, then we are going to loop
forever.

I've done some digging through the lengthy history and many evolutions of the
code (unfortunately I haven't kept records during the research), and my
conclusion is that the kdb_active and panicstr checks were added at the quite
early era of FreeBSD SMP, where we didn't have a mechanism to stop/block other
CPUs when kdb or panic were entered.  We didn't even prevent parallel execution
of panic.
So the above code was a sort of defense where we hoped that other CPUs would
eventually stumble upon some held spinlock and would be captured there.  Maybe
there was a specific set of spinlocks, which were supposed to help.

Nowadays, we do try to stop other CPUs during panic and kdb activation and there
are good chances that they are indeed stopped.  In this circumstances, should
the main CPU be so unlucky as to run into the held spinlock, the above check
would do more harm than good - the main CPU would just spin there forever,
because a lock owner is also spinning in the stop loop and so can't release the
lock.
Actually, this is only true for the kdb case.  In the panic case we make a check
earlier and simply ignore/skip/bust all the locks.  That makes the panicstr
check in the code in question both harmless and useless.

So I'd like to propose to remove those checks altogether.  Or perhaps to
reverse them and immediately induce a (possibly secondary) panic if we ever
get to that wait loop and kdb_active || panicstr != NULL.

What do you think?
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: _mtx_lock_spin: obsolete historic handling of kdb_active and panicstr?

2012-10-17 Thread Andriy Gapon

on 17/10/2012 15:07 m...@freebsd.org said the following:
 On Wed, Oct 17, 2012 at 4:20 AM, Andriy Gapon a...@freebsd.org wrote:

 _mtx_lock_spin has the following check in its retry loop:
 if (i  6000 || kdb_active || panicstr != NULL)
 DELAY(1);
 else
 _mtx_lock_spin_failed(m);

 [snip analysis]

 So I'd like to propose to remove those checks altogether.  Or perhaps to
 reverse them and immediately induce a (possibly secondary) panic if we ever
 get to that wait loop and kdb_active || panicstr != NULL.
 
 The panicstr can clearly be removed.  I think there can be race
 conditions with entering kdb and taking a spinlock, because the
 spinlock acquire will block interrupts.  I don't remember if we always
 NMI for kdb enter or if that's configurable.  The old code was clearer
 (or maybe I'm just remembering an Isilon hack); looking at
 stop_cpus_hard() I don't see that it uses an NMI.

kdb always uses stop_cpus_hard and stop_cpus_hard always uses NMI on x86.
From sys/x86/x86/local_apic.c:
if (vector == IPI_STOP_HARD)
icrlo |= APIC_DELMODE_NMI | APIC_LEVEL_ASSERT;

  So a CPU can block
 interrupts, then if it sees kdb_active it will spin until we leave
 kdb, rather than panic.  Of course this would only be relevant if the
 CPU it's trying to acquire is already held; otherwise it should find
 the lock unowned and this isn't relevant.  And if the lock is owned by
 the thread entering kdb, that would be a real panic, not a recoverable
 kdb entry.
 
 So I think maybe the kdb_active check is also not helpful after all.
 
 Cheers,
 matthew
 


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: curcpu false positive?

2012-10-11 Thread Andriy Gapon

on 11/10/2012 16:06 Erik Cederstrand said the following:
 Hello,
 
 I'm looking at some Clang Static Analyzer reports in the kernel, and a lot of 
 them point back to a null pointer dereference in __pcpu_type 
 (sys/amd64/include/pcpu.h:102) which is defined as:
 
 102/*
 103* Evaluates to the type of the per-cpu variable name.
 104*/
 105   #define __pcpu_type(name)   
 \
 106   __typeof(((struct pcpu *)0)-name)
 
 
 Which indeed looks like a NULL pointer dereference. Looking at the latest 
 commit message there, I'm sure the code is correct, but I'm unsure why the 
 null pointer is OK. I'd appreciate an explanation :-)

Read about __typeof [1].
It's evaluated at compile time, so actual value of an expression does not matter
at all.

[1] http://gcc.gnu.org/onlinedocs/gcc/Typeof.html
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

machine/cpu.h in userland

2012-10-07 Thread Andriy Gapon


I noticed that couple of our userland file include machine/cpu.h for no apparent
good reason.  It looks like at least amd64 cpu.h has no userland serviceable
parts inside.
Maybe its content should be put under _KERNEL ?

Also:

--- a/sbin/adjkerntz/adjkerntz.c
+++ b/sbin/adjkerntz/adjkerntz.c
@@ -51,7 +51,6 @@ __FBSDID($FreeBSD$);
 #include syslog.h
 #include sys/time.h
 #include sys/param.h
-#include machine/cpu.h
 #include sys/sysctl.h

 #include pathnames.h
diff --git a/usr.bin/w/w.c b/usr.bin/w/w.c
index 8441ce5..9674fc2 100644
--- a/usr.bin/w/w.c
+++ b/usr.bin/w/w.c
@@ -57,7 +57,6 @@ static const char sccsid[] = @(#)w.c 8.4 (Berkeley) 4/16/94;
 #include sys/socket.h
 #include sys/tty.h

-#include machine/cpu.h
 #include netinet/in.h
 #include arpa/inet.h
 #include arpa/nameser.h

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: ule+smp: small optimization for turnstile priority lending

2012-10-03 Thread Andriy Gapon

on 20/09/2012 16:14 Attilio Rao said the following:
 On 9/20/12, Andriy Gapon a...@freebsd.org wrote:
[snip]
 The patch works well as far as I can tell.  Thank you!
 There is one warning with full witness enables but it appears to be harmless
 (so
 far):
 
 Andriy,
 thanks a lot for your testing and reports you made so far.
 Unfortunately I'm going off for 2 weeks now and I won't work on
 FreeBSD for that timeframe. I will get back to those in 2 weeks then.
 If you want  to play more with this idea feel free to extend/fix/etc.
 this patch.

Unfortunately I haven't found time to work on this further, but I have some
additional thoughts.

First, I think that the witness message was not benign and it actually warns 
about
the same kind of deadlock that I originally had.
The problem is that sched_lend_prio() is called with target thread's td_lock 
held,
which is a lock of tdq on the thread's CPU.  Then, in your patch, we acquire
current tdq's lock to modify its load.  But if two tdq locks are to be held at 
the
same time, then they must be locked using tdq_lock_pair, so that lock order is
maintained.  With the patch no tdq lock order can be maintained (can arbitrary)
and thus it is possible to run into a deadlock.

I see two possibilities here, but don't rule out that there can be more.

1. Do something like my patch does.  That is, manipulate current thread's tdq in
advance before any other thread/tdq locks are acquired (or td_lock is changed to
point to a different lock and current tdq is unlocked).  The API can be made 
more
generic in nature.  E.g. it can look like this:
void
sched_thread_block(struct thread *td, int inhibitor)
{
struct tdq *tdq;

THREAD_LOCK_ASSERT(td, MA_OWNED);
KASSERT(td == curthread,
(sched_thread_block: only curthread is supported));
tdq = TDQ_SELF();
TDQ_LOCK_ASSERT(tdq, MA_OWNED);
MPASS(td-td_lock == TDQ_LOCKPTR(tdq));
TD_SET_INHIB(td, inhibitor);
tdq_load_rem(tdq, td);
tdq_setlowpri(tdq, td);
}


2. Try to do things from sched_lend_prio based on curthread's state.  This, as 
it
seems, would require completely lock-less manipulations of current tdq.  E.g. 
for
tdq_load we could use atomic operations (it is already accessed locklessly, but
not modified).  But for tdq_lowpri a more elaborate trick would be required like
having a separate field for a temporary value.

In any case, I'll have to revisit this later.

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

kvm_proclist: gnore processes in PRS_NEW

2012-10-03 Thread Andriy Gapon


I believe that the following patch does the right thing that is repeated in a 
few
other places.
I would like to ask for a review just in case.

commit cf0f573a1dcbc09cb8fce612530afeeb7f1b1c62
Author: Andriy Gapon a...@icyb.net.ua
Date:   Sun Sep 23 22:49:26 2012 +0300

kvm_proc: ignore processes in larvae state

diff --git a/lib/libkvm/kvm_proc.c b/lib/libkvm/kvm_proc.c
index 8fc415c..d1daf77 100644
--- a/lib/libkvm/kvm_proc.c
+++ b/lib/libkvm/kvm_proc.c
@@ -144,6 +144,8 @@ kvm_proclist(kvm_t *kd, int what, int arg, struct proc *p,
_kvm_err(kd, kd-program, can't read proc at %p, p);
return (-1);
}
+   if (proc.p_state == PRS_NEW)
+   continue;
if (proc.p_state != PRS_ZOMBIE) {
if (KREAD(kd, (u_long)TAILQ_FIRST(proc.p_threads),
mtd)) {

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

kvm_getprocs: gracefully handle errors in kvm_deadprocs

2012-10-03 Thread Andriy Gapon


kvm_deadprocs returns -1 to signify an error.
Current kvm_getprocs code would pass this return code as 'cnt' out parameter and
would not reset return value to NULL.
This confuses some callers, most prominently procstat_getprocs, into believing
that kvm_getprocs was successful.  Moreover, the code tried to use cnt=-1 as an
unsigned number to allocate some additional memory.  As a result fstat -M could
try to allocate huge amount of memory e.g. when used with a kernel that didn't
match userland.

With the proposed change the error code should be handled properly.
Additionally it should now be possible to enable realloc code, which previously
contained a bug and was called even after kvm_deadprocs error.

commit 6ddf602409119eded40321e5bb349b464f24e81a
Author: Andriy Gapon a...@icyb.net.ua
Date:   Sun Sep 23 22:52:28 2012 +0300

kvm_proc: gracefully handle errors in kvm_deadprocs, don't confuse callers

Plus fix a bug under 'notdef' (sic) section.

diff --git a/lib/libkvm/kvm_proc.c b/lib/libkvm/kvm_proc.c
index d1daf77..31258d7 100644
--- a/lib/libkvm/kvm_proc.c
+++ b/lib/libkvm/kvm_proc.c
@@ -593,9 +593,15 @@ liveout:

nprocs = kvm_deadprocs(kd, op, arg, nl[1].n_value,
  nl[2].n_value, nprocs);
+   if (nprocs = 0) {
+   _kvm_freeprocs(kd);
+   nprocs = 0;
+   }
 #ifdef notdef
-   size = nprocs * sizeof(struct kinfo_proc);
-   (void)realloc(kd-procbase, size);
+   else {
+   size = nprocs * sizeof(struct kinfo_proc);
+   kd-procbase = realloc(kd-procbase, size);
+   }
 #endif
}
*cnt = nprocs;

P.S. it might may sense to change 'count' parameter of procstat_getprocs to 
signed
int so that it matches kvm_getprocs interface.  Alternatively, an intermediate
variable could be used to insulate signedness difference:

index 56562e1..11a817e 100644
--- a/lib/libprocstat/libprocstat.c
+++ b/lib/libprocstat/libprocstat.c
@@ -184,15 +184,18 @@ procstat_getprocs(struct procstat *procstat, int what, 
int arg,
struct kinfo_proc *p0, *p;
size_t len;
int name[4];
+   int cnt;
int error;

assert(procstat);
assert(count);
p = NULL;
if (procstat-type == PROCSTAT_KVM) {
-   p0 = kvm_getprocs(procstat-kd, what, arg, count);
-   if (p0 == NULL || count == 0)
+   *count = 0;
+   p0 = kvm_getprocs(procstat-kd, what, arg, cnt);
+   if (p0 == NULL || cnt = 0)
return (NULL);
+   *count = cnt;
len = *count * sizeof(*p);
p = malloc(len);
if (p == NULL) {



-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

monitor+mwait and volatile-ish

2012-09-26 Thread Andriy Gapon


Typical x86 MONITOR+MWAIT is like this (taken from Intel manual):

EAX = Logical Address(Trigger)
ECX = 0 (*Hints *)
EDX = 0 (* Hints *)
IF ( !trigger_store_happened) {
MONITOR EAX, ECX, EDX
IF ( !trigger_store_happened ) {
MWAIT EAX, ECX
}
}

In FreeBSD we have this helper function for MONITOR:
static __inline void
cpu_monitor(const void *addr, u_long extensions, u_int hints)
{

__asm __volatile(monitor
: : a (addr), c (extensions), d (hints));
}

Now, let's assume that 'Trigger' is a global variable and
'trigger_store_happened' is a check for a particular value in this variable.

Then, with current state of matters, we must either declare the global variable
to be volatile or use a volatile cast in the second IF.  Otherwise, a compiler
is free to assume that the variable doesn't change between the first and the
second IF and drop the second IF altogether.  And that would make MONITOR+MWAIT
to miss an event if it happens between the first check and MONITOR.

So what's my point.
- using volatile variable with cpu_monitor requires DEVOLATILE to silence
compiler warning about discarding volatile; this is unnecessary code bloat
- adding volatile cast in the checks is easy to forget and adds code bloat

Possible improvements:
- make the argument of cpu_monitor be 'const volatile void *', the most
permissive type; this would also be a hint that variable should be volatile
- add some magic dust to cpu_monitor that would tell compiler to not cache
  the variable; right now I can only think of the memory constraint, but it
  seems to be too big of a hummer

What do you think about this?

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: monitor+mwait and volatile-ish

2012-09-26 Thread Andriy Gapon

on 26/09/2012 12:10 Konstantin Belousov said the following:
 On Wed, Sep 26, 2012 at 11:14:41AM +0300, Andriy Gapon wrote:
[snip]
 So what's my point. - using volatile variable with cpu_monitor requires
 DEVOLATILE to silence compiler warning about discarding volatile; this is
 unnecessary code bloat - adding volatile cast in the checks is easy to
 forget and adds code bloat
 
 Possible improvements: - make the argument of cpu_monitor be 'const
 volatile void *', the most permissive type; this would also be a hint
 that variable should be volatile - add some magic dust to cpu_monitor
 that would tell compiler to not cache the variable; right now I can only
 think of the memory constraint, but it seems to be too big of a hummer
 
 What do you think about this?
 
 You might claim that the asm writes to *addr by specifying it in the output
 constraint. This should fool the compiler into reload *addr after the
 monitor execution.
 

You mean something like:
static __inline void
cpu_monitor(const void *addr, u_long extensions, u_int hints)
{

__asm __volatile(monitor
:  =m (*(char *)addr)
: a (addr), c (extensions), d (hints));
}

This seems to do the job with base gcc, 4.6, 4.7 and clang.
Thank you!
-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: serial console detection during boot

2012-09-22 Thread Andriy Gapon

on 22/09/2012 11:40 Olivier Cochard-Labbé said the following:
 On Mon, Sep 17, 2012 at 10:23 PM, Andriy Gapon a...@freebsd.org wrote:

 Guys,

 With this patch I am able to boot with
 boot_multicons=YES
 console=vidconsole,comconsole
 in loader.conf on hardware where serial ports are disabled in BIOS.
 Previously loader would just hang trying to apply the console setting.
 
 I confirm that I can boot under Virtualbox without serial port
 configured with your patch and dual console configuration in
 /boot/loader.conf.
 
 But, I've still the same hang problem with a system configured for
 booting with dual console configuration like that:
 echo -D  /boot.config

This is because -D affects behavior of boot2 as well and boot2 uses different
serial console code (in BTX).

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: serial console detection during boot

2012-09-22 Thread Andriy Gapon


Here is an update on the changes:
http://people.freebsd.org/~avg/boot-comconsole.diff

Please note that the file is actually a patchset that consists of three
individual changes:
- a generic change in common boot code
- libi386 comconsole change
- BTX and boot2-ish comconsole change

All the code is lightly tested.
As I am not an expert in the assembly code and also because boot2 is quite
size-sensitive I would like to ask for a special attention to the last change.

Thank you!!

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

kern_exec: check p_tracecred instead of p_cred

2012-09-22 Thread Andriy Gapon


Currently even if root ktraces an unprivileged process that execs a sguid
executable, then tracing aborted at that point.  I think that that happens
because wrong credentials are checked at that point.

What do you think about the following patch?

commit 956a80783bc39162b1d64383188c5037f9767413
Author: Andriy Gapon a...@icyb.net.ua
Date:   Sat Sep 22 18:17:46 2012 +0300

kern_exec: check p_tracecred instead of p_cred

... when deciding whether to continue tracing across suid/sgid exec

diff --git a/sys/kern/kern_exec.c b/sys/kern/kern_exec.c
index 90f7311..8d62c1e 100644
--- a/sys/kern/kern_exec.c
+++ b/sys/kern/kern_exec.c
@@ -694,7 +694,8 @@ interpret:
setsugid(p);

 #ifdef KTRACE
-   if (priv_check_cred(oldcred, PRIV_DEBUG_DIFFCRED, 0))
+   if (p-p_tracecred != NULL 
+   priv_check_cred(p-p_tracecred, PRIV_DEBUG_DIFFCRED, 0))
ktrprocexec(p, tracecred, tracevp);
 #endif
/*

-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

Re: cpu_spinwait in cngetc

2012-09-22 Thread Andriy Gapon

on 18/09/2012 16:25 Andriy Gapon said the following:
 
 (Why[*]) Would anyone object to a change like this?
 
 diff --git a/sys/kern/kern_cons.c b/sys/kern/kern_cons.c
 index 5346bc3..d17846a 100644
 --- a/sys/kern/kern_cons.c
 +++ b/sys/kern/kern_cons.c
 @@ -384,7 +384,7 @@ cngetc(void)
   if (cn_mute)
   return (-1);
   while ((c = cncheckc()) == -1)
 - ;
 + cpu_spinwait();
   if (c == '\r')
   c = '\n';   /* console input is always ICRNL */
   return (c);
 
 [*] :-)
 

I would to re-ping with this question.
Is there any architecture where a cpu_spinwait could cause a surprise?
Or is universally safe?

The most visible place which this change can affect is ddb prompt.


-- 
Andriy Gapon
___
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers
To unsubscribe, send any mail to freebsd-hackers-unsubscr...@freebsd.org

1 2 3 4 5 6 7 >

1 - 100 of 613 matches

Mail list logo