Re: pciconf -l doesn't show any devices on Hyper-V VM

2018-05-23 Thread Sepherosa Ziehau
Gen1 or Gen2?  On Gen2 Hyper-V, there is _no_ pcib0 or pci0 at all,
unless you use PCIe pass-through or SR-IOV.

Why do you care about the existence of the PCI bus?

On Thu, May 24, 2018 at 9:33 AM, Yuri Pankov  wrote:
> Hi,
>
> Running FreeBSD-12.0-CURRENT-amd64-20180514-r333606 snapshot as a VM on
> Hyper-V, `pciconf -l` (and `lspci`, expected) doesn't return any
> information.
>
> Hyper-V Version: 10.0.14393 [SP2] (that's Windows 2016 with Hyper-V role).
>
> dmesg.boot is attached in case it's useful.
>
> ___
> freebsd-current@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>



-- 
Tomorrow Will Never Die
___
freebsd-current@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-current
To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"


Where should I put devd related scripts?

2017-07-17 Thread Sepherosa Ziehau
Hi all,

Where should I put devd related scripts?

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: Add support for ACPI Module Device ACPI0004?

2017-05-01 Thread Sepherosa Ziehau
On Tue, May 2, 2017 at 11:28 AM, Sepherosa Ziehau  wrote:
> On Tue, May 2, 2017 at 12:25 AM, John Baldwin  wrote:
>> On Sunday, April 30, 2017 09:02:30 AM Sepherosa Ziehau wrote:
>>> On Sat, Apr 29, 2017 at 12:01 AM, John Baldwin  wrote:
>>> > On Friday, April 28, 2017 05:38:32 PM Sepherosa Ziehau wrote:
>>> >> On Thu, Apr 27, 2017 at 12:14 AM, John Baldwin  wrote:
>>> >> > On Wednesday, April 26, 2017 09:18:48 AM Sepherosa Ziehau wrote:
>>> >> >> On Wed, Apr 26, 2017 at 4:36 AM, John Baldwin  
>>> >> >> wrote:
>>> >> >> > On Thursday, April 20, 2017 02:29:30 AM Dexuan Cui wrote:
>>> >> >> >> > From: John Baldwin [mailto:j...@freebsd.org]
>>> >> >> >> > Sent: Thursday, April 20, 2017 02:34
>>> >> >> >> > > Can we add the support of "ACPI0004" with the below one-line 
>>> >> >> >> > > change?
>>> >> >> >> > >
>>> >> >> >> > >  acpi_sysres_probe(device_t dev)
>>> >> >> >> > >  {
>>> >> >> >> > > -static char *sysres_ids[] = { "PNP0C01", "PNP0C02", NULL 
>>> >> >> >> > > };
>>> >> >> >> > > +static char *sysres_ids[] = { "PNP0C01", "PNP0C02", 
>>> >> >> >> > > "ACPI0004", NULL };
>>> >> >> >> > >
>>> >> >> >> > Hmm, so the role of C01 and C02 is to reserve system resources, 
>>> >> >> >> > though we
>>> >> >> >> > in turn allow any child of acpi0 to suballocate those ranges 
>>> >> >> >> > (since historically
>>> >> >> >> > c01 and c02 tend to allocate I/O ranges that are then used by 
>>> >> >> >> > things like the
>>> >> >> >> > EC, PS/2 keyboard controller, etc.).  From my reading of 
>>> >> >> >> > ACPI0004 in the ACPI
>>> >> >> >> > 6.1 spec it's not quite clear that ACPI0004 is like that?  In 
>>> >> >> >> > particular, it
>>> >> >> >> > seems that 004 should only allow direct children to suballocate? 
>>> >> >> >> >  This
>>> >> >> >> > change might work, but it will allow more devices to allocate 
>>> >> >> >> > the ranges in
>>> >> >> >> >  _CRS than otherwise.
>>> >> >> >> >
>>> >> >> >> > Do you have an acpidump from a guest system that contains an 
>>> >> >> >> > ACPI0004
>>> >> >> >> > node that you can share?
>>> >> >> >> >
>>> >> >> >> > John Baldwin
>>> >> >> >>
>>> >> >> >> Hi John,
>>> >> >> >> Thanks for the help!
>>> >> >> >>
>>> >> >> >> Please see the attached file, which is got by
>>> >> >> >> "acpidump -dt | gzip -c9 > acpidump.dt.gz"
>>> >> >> >>
>>> >> >> >> In the dump, we can see the "ACPI0004" node (VMOD) is the parent of
>>> >> >> >> "VMBus" (VMBS).
>>> >> >> >> It looks the _CRS of ACPI0004 is dynamically generated. Though we 
>>> >> >> >> can't
>>> >> >> >> see the length of the MMIO range in the dumped asl code, it does 
>>> >> >> >> have
>>> >> >> >> a 512MB MMIO range [0xFE000, 0xF].
>>> >> >> >>
>>> >> >> >> It looks FreeBSD can't detect ACPI0004 automatically.
>>> >> >> >> With the above one-line change, I can first find the child device
>>> >> >> >> acpi_sysresource0 of acpi0, then call AcpiWalkResources() to get
>>> >> >> >> the _CRS of acpi_sysresource0, i.e. the 512MB MMIO range.
>>> >> >> >>
>>> >> >> >> If you think we shouldn't touch acpi_sysresource0 here, I guess
>>> >> >> >> we can add a new small driver for 

Re: Add support for ACPI Module Device ACPI0004?

2017-05-01 Thread Sepherosa Ziehau
On Tue, May 2, 2017 at 12:25 AM, John Baldwin  wrote:
> On Sunday, April 30, 2017 09:02:30 AM Sepherosa Ziehau wrote:
>> On Sat, Apr 29, 2017 at 12:01 AM, John Baldwin  wrote:
>> > On Friday, April 28, 2017 05:38:32 PM Sepherosa Ziehau wrote:
>> >> On Thu, Apr 27, 2017 at 12:14 AM, John Baldwin  wrote:
>> >> > On Wednesday, April 26, 2017 09:18:48 AM Sepherosa Ziehau wrote:
>> >> >> On Wed, Apr 26, 2017 at 4:36 AM, John Baldwin  wrote:
>> >> >> > On Thursday, April 20, 2017 02:29:30 AM Dexuan Cui wrote:
>> >> >> >> > From: John Baldwin [mailto:j...@freebsd.org]
>> >> >> >> > Sent: Thursday, April 20, 2017 02:34
>> >> >> >> > > Can we add the support of "ACPI0004" with the below one-line 
>> >> >> >> > > change?
>> >> >> >> > >
>> >> >> >> > >  acpi_sysres_probe(device_t dev)
>> >> >> >> > >  {
>> >> >> >> > > -static char *sysres_ids[] = { "PNP0C01", "PNP0C02", NULL };
>> >> >> >> > > +static char *sysres_ids[] = { "PNP0C01", "PNP0C02", 
>> >> >> >> > > "ACPI0004", NULL };
>> >> >> >> > >
>> >> >> >> > Hmm, so the role of C01 and C02 is to reserve system resources, 
>> >> >> >> > though we
>> >> >> >> > in turn allow any child of acpi0 to suballocate those ranges 
>> >> >> >> > (since historically
>> >> >> >> > c01 and c02 tend to allocate I/O ranges that are then used by 
>> >> >> >> > things like the
>> >> >> >> > EC, PS/2 keyboard controller, etc.).  From my reading of ACPI0004 
>> >> >> >> > in the ACPI
>> >> >> >> > 6.1 spec it's not quite clear that ACPI0004 is like that?  In 
>> >> >> >> > particular, it
>> >> >> >> > seems that 004 should only allow direct children to suballocate?  
>> >> >> >> > This
>> >> >> >> > change might work, but it will allow more devices to allocate the 
>> >> >> >> > ranges in
>> >> >> >> >  _CRS than otherwise.
>> >> >> >> >
>> >> >> >> > Do you have an acpidump from a guest system that contains an 
>> >> >> >> > ACPI0004
>> >> >> >> > node that you can share?
>> >> >> >> >
>> >> >> >> > John Baldwin
>> >> >> >>
>> >> >> >> Hi John,
>> >> >> >> Thanks for the help!
>> >> >> >>
>> >> >> >> Please see the attached file, which is got by
>> >> >> >> "acpidump -dt | gzip -c9 > acpidump.dt.gz"
>> >> >> >>
>> >> >> >> In the dump, we can see the "ACPI0004" node (VMOD) is the parent of
>> >> >> >> "VMBus" (VMBS).
>> >> >> >> It looks the _CRS of ACPI0004 is dynamically generated. Though we 
>> >> >> >> can't
>> >> >> >> see the length of the MMIO range in the dumped asl code, it does 
>> >> >> >> have
>> >> >> >> a 512MB MMIO range [0xFE000, 0xF].
>> >> >> >>
>> >> >> >> It looks FreeBSD can't detect ACPI0004 automatically.
>> >> >> >> With the above one-line change, I can first find the child device
>> >> >> >> acpi_sysresource0 of acpi0, then call AcpiWalkResources() to get
>> >> >> >> the _CRS of acpi_sysresource0, i.e. the 512MB MMIO range.
>> >> >> >>
>> >> >> >> If you think we shouldn't touch acpi_sysresource0 here, I guess
>> >> >> >> we can add a new small driver for ACPI0004, just like we added VMBus
>> >> >> >> driver as a child device of acpi0?
>> >> >> >
>> >> >> > Hmmm, so looking at this, the "right" thing is probably to have a 
>> >> >> > device
>> >> >> > driver for the ACPI0004 device that parses its _CRS and then allows 

Re: Add support for ACPI Module Device ACPI0004?

2017-04-29 Thread Sepherosa Ziehau
On Sat, Apr 29, 2017 at 12:01 AM, John Baldwin  wrote:
> On Friday, April 28, 2017 05:38:32 PM Sepherosa Ziehau wrote:
>> On Thu, Apr 27, 2017 at 12:14 AM, John Baldwin  wrote:
>> > On Wednesday, April 26, 2017 09:18:48 AM Sepherosa Ziehau wrote:
>> >> On Wed, Apr 26, 2017 at 4:36 AM, John Baldwin  wrote:
>> >> > On Thursday, April 20, 2017 02:29:30 AM Dexuan Cui wrote:
>> >> >> > From: John Baldwin [mailto:j...@freebsd.org]
>> >> >> > Sent: Thursday, April 20, 2017 02:34
>> >> >> > > Can we add the support of "ACPI0004" with the below one-line 
>> >> >> > > change?
>> >> >> > >
>> >> >> > >  acpi_sysres_probe(device_t dev)
>> >> >> > >  {
>> >> >> > > -static char *sysres_ids[] = { "PNP0C01", "PNP0C02", NULL };
>> >> >> > > +static char *sysres_ids[] = { "PNP0C01", "PNP0C02", 
>> >> >> > > "ACPI0004", NULL };
>> >> >> > >
>> >> >> > Hmm, so the role of C01 and C02 is to reserve system resources, 
>> >> >> > though we
>> >> >> > in turn allow any child of acpi0 to suballocate those ranges (since 
>> >> >> > historically
>> >> >> > c01 and c02 tend to allocate I/O ranges that are then used by things 
>> >> >> > like the
>> >> >> > EC, PS/2 keyboard controller, etc.).  From my reading of ACPI0004 in 
>> >> >> > the ACPI
>> >> >> > 6.1 spec it's not quite clear that ACPI0004 is like that?  In 
>> >> >> > particular, it
>> >> >> > seems that 004 should only allow direct children to suballocate?  
>> >> >> > This
>> >> >> > change might work, but it will allow more devices to allocate the 
>> >> >> > ranges in
>> >> >> >  _CRS than otherwise.
>> >> >> >
>> >> >> > Do you have an acpidump from a guest system that contains an ACPI0004
>> >> >> > node that you can share?
>> >> >> >
>> >> >> > John Baldwin
>> >> >>
>> >> >> Hi John,
>> >> >> Thanks for the help!
>> >> >>
>> >> >> Please see the attached file, which is got by
>> >> >> "acpidump -dt | gzip -c9 > acpidump.dt.gz"
>> >> >>
>> >> >> In the dump, we can see the "ACPI0004" node (VMOD) is the parent of
>> >> >> "VMBus" (VMBS).
>> >> >> It looks the _CRS of ACPI0004 is dynamically generated. Though we can't
>> >> >> see the length of the MMIO range in the dumped asl code, it does have
>> >> >> a 512MB MMIO range [0xFE000, 0xF].
>> >> >>
>> >> >> It looks FreeBSD can't detect ACPI0004 automatically.
>> >> >> With the above one-line change, I can first find the child device
>> >> >> acpi_sysresource0 of acpi0, then call AcpiWalkResources() to get
>> >> >> the _CRS of acpi_sysresource0, i.e. the 512MB MMIO range.
>> >> >>
>> >> >> If you think we shouldn't touch acpi_sysresource0 here, I guess
>> >> >> we can add a new small driver for ACPI0004, just like we added VMBus
>> >> >> driver as a child device of acpi0?
>> >> >
>> >> > Hmmm, so looking at this, the "right" thing is probably to have a device
>> >> > driver for the ACPI0004 device that parses its _CRS and then allows its
>> >> > child devices to sub-allocate resources from the ranges in _CRS.  
>> >> > However,
>> >> > this would mean make VMBus be a child of the ACPI0004 device.  Suppose
>> >> > we called the ACPI0004 driver 'acpi_module' then the 'acpi_module0' 
>> >> > device
>> >> > would need to create a child device for all of its child devices.  Right
>> >> > now acpi0 also creates devices for them which is somewhat messy (acpi0
>> >> > creates child devices anywhere in its namespace that have a valid _HID).
>> >> > You can find those duplicates and remove them during acpi_module0's 
>> >> > attach
>> >> > routine before creating its own ch

Re: Add support for ACPI Module Device ACPI0004?

2017-04-28 Thread Sepherosa Ziehau
On Thu, Apr 27, 2017 at 12:14 AM, John Baldwin  wrote:
> On Wednesday, April 26, 2017 09:18:48 AM Sepherosa Ziehau wrote:
>> On Wed, Apr 26, 2017 at 4:36 AM, John Baldwin  wrote:
>> > On Thursday, April 20, 2017 02:29:30 AM Dexuan Cui wrote:
>> >> > From: John Baldwin [mailto:j...@freebsd.org]
>> >> > Sent: Thursday, April 20, 2017 02:34
>> >> > > Can we add the support of "ACPI0004" with the below one-line change?
>> >> > >
>> >> > >  acpi_sysres_probe(device_t dev)
>> >> > >  {
>> >> > > -static char *sysres_ids[] = { "PNP0C01", "PNP0C02", NULL };
>> >> > > +static char *sysres_ids[] = { "PNP0C01", "PNP0C02", "ACPI0004", 
>> >> > > NULL };
>> >> > >
>> >> > Hmm, so the role of C01 and C02 is to reserve system resources, though 
>> >> > we
>> >> > in turn allow any child of acpi0 to suballocate those ranges (since 
>> >> > historically
>> >> > c01 and c02 tend to allocate I/O ranges that are then used by things 
>> >> > like the
>> >> > EC, PS/2 keyboard controller, etc.).  From my reading of ACPI0004 in 
>> >> > the ACPI
>> >> > 6.1 spec it's not quite clear that ACPI0004 is like that?  In 
>> >> > particular, it
>> >> > seems that 004 should only allow direct children to suballocate?  This
>> >> > change might work, but it will allow more devices to allocate the 
>> >> > ranges in
>> >> >  _CRS than otherwise.
>> >> >
>> >> > Do you have an acpidump from a guest system that contains an ACPI0004
>> >> > node that you can share?
>> >> >
>> >> > John Baldwin
>> >>
>> >> Hi John,
>> >> Thanks for the help!
>> >>
>> >> Please see the attached file, which is got by
>> >> "acpidump -dt | gzip -c9 > acpidump.dt.gz"
>> >>
>> >> In the dump, we can see the "ACPI0004" node (VMOD) is the parent of
>> >> "VMBus" (VMBS).
>> >> It looks the _CRS of ACPI0004 is dynamically generated. Though we can't
>> >> see the length of the MMIO range in the dumped asl code, it does have
>> >> a 512MB MMIO range [0xFE000, 0xF].
>> >>
>> >> It looks FreeBSD can't detect ACPI0004 automatically.
>> >> With the above one-line change, I can first find the child device
>> >> acpi_sysresource0 of acpi0, then call AcpiWalkResources() to get
>> >> the _CRS of acpi_sysresource0, i.e. the 512MB MMIO range.
>> >>
>> >> If you think we shouldn't touch acpi_sysresource0 here, I guess
>> >> we can add a new small driver for ACPI0004, just like we added VMBus
>> >> driver as a child device of acpi0?
>> >
>> > Hmmm, so looking at this, the "right" thing is probably to have a device
>> > driver for the ACPI0004 device that parses its _CRS and then allows its
>> > child devices to sub-allocate resources from the ranges in _CRS.  However,
>> > this would mean make VMBus be a child of the ACPI0004 device.  Suppose
>> > we called the ACPI0004 driver 'acpi_module' then the 'acpi_module0' device
>> > would need to create a child device for all of its child devices.  Right
>> > now acpi0 also creates devices for them which is somewhat messy (acpi0
>> > creates child devices anywhere in its namespace that have a valid _HID).
>> > You can find those duplicates and remove them during acpi_module0's attach
>> > routine before creating its own child device_t devices.  (We associate
>> > a device_t with each Handle when creating device_t's for ACPI handles
>> > which is how you can find the old device that is a direct child of acpi0
>> > so that it can be removed).
>>
>> The remove/reassociate vmbus part seems kinda "messy" to me.  I'd just
>> hook up a new acpi0004 driver, and let vmbus parse the _CRS like what
>> we did to the hyper-v's pcib0.
>
> The acpi_pci driver used to do the remove/reassociate part.  What acpi0
> should probably be doing is only creating device_t nodes for immediate
> children.  This would require an ACPI-aware isa0 for LPC devices below
> the ISA bus in the ACPI namespace.  We haven't done that in part because
> BIOS vendors are not always consistent in placi

Re: Add support for ACPI Module Device ACPI0004?

2017-04-25 Thread Sepherosa Ziehau
On Wed, Apr 26, 2017 at 4:36 AM, John Baldwin  wrote:
> On Thursday, April 20, 2017 02:29:30 AM Dexuan Cui wrote:
>> > From: John Baldwin [mailto:j...@freebsd.org]
>> > Sent: Thursday, April 20, 2017 02:34
>> > > Can we add the support of "ACPI0004" with the below one-line change?
>> > >
>> > >  acpi_sysres_probe(device_t dev)
>> > >  {
>> > > -static char *sysres_ids[] = { "PNP0C01", "PNP0C02", NULL };
>> > > +static char *sysres_ids[] = { "PNP0C01", "PNP0C02", "ACPI0004", 
>> > > NULL };
>> > >
>> > Hmm, so the role of C01 and C02 is to reserve system resources, though we
>> > in turn allow any child of acpi0 to suballocate those ranges (since 
>> > historically
>> > c01 and c02 tend to allocate I/O ranges that are then used by things like 
>> > the
>> > EC, PS/2 keyboard controller, etc.).  From my reading of ACPI0004 in the 
>> > ACPI
>> > 6.1 spec it's not quite clear that ACPI0004 is like that?  In particular, 
>> > it
>> > seems that 004 should only allow direct children to suballocate?  This
>> > change might work, but it will allow more devices to allocate the ranges in
>> >  _CRS than otherwise.
>> >
>> > Do you have an acpidump from a guest system that contains an ACPI0004
>> > node that you can share?
>> >
>> > John Baldwin
>>
>> Hi John,
>> Thanks for the help!
>>
>> Please see the attached file, which is got by
>> "acpidump -dt | gzip -c9 > acpidump.dt.gz"
>>
>> In the dump, we can see the "ACPI0004" node (VMOD) is the parent of
>> "VMBus" (VMBS).
>> It looks the _CRS of ACPI0004 is dynamically generated. Though we can't
>> see the length of the MMIO range in the dumped asl code, it does have
>> a 512MB MMIO range [0xFE000, 0xF].
>>
>> It looks FreeBSD can't detect ACPI0004 automatically.
>> With the above one-line change, I can first find the child device
>> acpi_sysresource0 of acpi0, then call AcpiWalkResources() to get
>> the _CRS of acpi_sysresource0, i.e. the 512MB MMIO range.
>>
>> If you think we shouldn't touch acpi_sysresource0 here, I guess
>> we can add a new small driver for ACPI0004, just like we added VMBus
>> driver as a child device of acpi0?
>
> Hmmm, so looking at this, the "right" thing is probably to have a device
> driver for the ACPI0004 device that parses its _CRS and then allows its
> child devices to sub-allocate resources from the ranges in _CRS.  However,
> this would mean make VMBus be a child of the ACPI0004 device.  Suppose
> we called the ACPI0004 driver 'acpi_module' then the 'acpi_module0' device
> would need to create a child device for all of its child devices.  Right
> now acpi0 also creates devices for them which is somewhat messy (acpi0
> creates child devices anywhere in its namespace that have a valid _HID).
> You can find those duplicates and remove them during acpi_module0's attach
> routine before creating its own child device_t devices.  (We associate
> a device_t with each Handle when creating device_t's for ACPI handles
> which is how you can find the old device that is a direct child of acpi0
> so that it can be removed).

The remove/reassociate vmbus part seems kinda "messy" to me.  I'd just
hook up a new acpi0004 driver and let vmbus parse the _CRS, as we did
for Hyper-V's pcib0.
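
For reference, the probe-side change quoted above amounts to matching a
device's _HID against a NULL-terminated ID list.  Below is a minimal
user-space sketch of that matching, assuming plain string comparison;
hid_in_list() is a hypothetical helper for illustration, not the
kernel's actual ACPI matching routine.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* ID list as it would look with the proposed one-line change applied. */
static const char *sysres_ids[] = { "PNP0C01", "PNP0C02", "ACPI0004", NULL };

/*
 * Hypothetical helper: return 1 if hid appears in the NULL-terminated
 * list ids, 0 otherwise.  User-space illustration only.
 */
static int
hid_in_list(const char *hid, const char **ids)
{
	assert(hid != NULL && ids != NULL);
	for (; *ids != NULL; ids++) {
		if (strcmp(hid, *ids) == 0)
			return (1);
	}
	return (0);
}
```

A separate acpi0004 driver would carry its own one-entry list instead of
extending sysres_ids, which keeps the PNP0C01/PNP0C02 behavior untouched.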

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: Please test EARLY_AP_STARTUP

2016-12-01 Thread Sepherosa Ziehau
On Fri, Dec 2, 2016 at 12:53 PM, Sepherosa Ziehau  wrote:
> On Fri, Dec 2, 2016 at 1:49 AM, John Baldwin  wrote:
>> On Thursday, December 01, 2016 01:53:29 PM Sepherosa Ziehau wrote:
>>> On Wed, Nov 30, 2016 at 9:59 AM, Sepherosa Ziehau  
>>> wrote:
>>> >>> After fdc is disabled and hyperv/storvsc is fixed, it seems to boot
>>> >>> fine, except a long delay (28~30seconds) here:
>>> >>> 
>>> >>> Timecounters tick every 1.000 msec
>>> >>> -
>>> >>> 28 ~ 30 seconds delay
>>> >>> -
>>> >>> vlan: initialized, using hash tables with chaining
>>> >>> 
>>> >>>
>>> >>> I have the bootverbose dmesg here:
>>> >>> https://people.freebsd.org/~sephe/dmesg_earlyap.txt
>>> >>>
>>> >>> I booted 10 times, only one boot does not suffer this 30 seconds
>>> >>> delay.  It sounds like some races to me.  Any hints?
>>> >>
>>> >> It is likely a race as we start running things sooner now, yes.  Can you
>>> >> break into DDB during the hang and see what thread0 is waiting on?  If
>>> >> it is in the interrupt hooks you can use 'show conifhk' in DDB to see the
>>> >> list of pending interrupt hooks.  That provides a list of candidate 
>>> >> drivers
>>> >> to inspect (e.g. stack traces of relevant kthreads) for what is actually
>>> >> waiting (and what it is waiting on)
>>> >
>>> > Just tried, but I failed to break into DDB during the 30 seconds
>>> > delay.  DDB was entered after the 30 seconds delay, though I press the
>>> > break key when the delay started.
>>>
>>> I tried add VERBOSE_SYSINIT option in order to get a rough location of
>>> this delay, but the system boots just fine w/ VERBOSE_SYSINIT option,
>>> sigh.
>>
>> You could add KTR_PROC tracing and use 'show ktr' in DDB when you break in 
>> after the
>> 30 second delay to see what it was doing during the delay perhaps?
>
> I have narrowed it down by patching the VERBOSE_SYSINIT: the
> kthread_start(&deadlkres_kd) introduces the 30 seconds delay, i.e.
> SYSINIT(deadlkres, SI_SUB_CLOCKS, SI_ORDER_ANY, kthread_start,
> &deadlkres_kd) blocks for 30 seconds.

I commented out the DEADLKRES option; now the delay happens at random
points, sometimes even before a VERBOSE_SYSINIT entry is logged.  But
the delay always happens after inittimecounter(0) and before
SI_SUB_INT_CONFIG_HOOKS.

I didn't notice any useful information in DDB's 'show ktr' output once
the system thawed.  Is there anything I should look for in the KTR_PROC
ktrlog?

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: Please test EARLY_AP_STARTUP

2016-12-01 Thread Sepherosa Ziehau
On Fri, Dec 2, 2016 at 1:49 AM, John Baldwin  wrote:
> On Thursday, December 01, 2016 01:53:29 PM Sepherosa Ziehau wrote:
>> On Wed, Nov 30, 2016 at 9:59 AM, Sepherosa Ziehau  
>> wrote:
>> >>> After fdc is disabled and hyperv/storvsc is fixed, it seems to boot
>> >>> fine, except a long delay (28~30seconds) here:
>> >>> 
>> >>> Timecounters tick every 1.000 msec
>> >>> -
>> >>> 28 ~ 30 seconds delay
>> >>> -
>> >>> vlan: initialized, using hash tables with chaining
>> >>> 
>> >>>
>> >>> I have the bootverbose dmesg here:
>> >>> https://people.freebsd.org/~sephe/dmesg_earlyap.txt
>> >>>
>> >>> I booted 10 times, only one boot does not suffer this 30 seconds
>> >>> delay.  It sounds like some races to me.  Any hints?
>> >>
>> >> It is likely a race as we start running things sooner now, yes.  Can you
>> >> break into DDB during the hang and see what thread0 is waiting on?  If
>> >> it is in the interrupt hooks you can use 'show conifhk' in DDB to see the
>> >> list of pending interrupt hooks.  That provides a list of candidate 
>> >> drivers
>> >> to inspect (e.g. stack traces of relevant kthreads) for what is actually
>> >> waiting (and what it is waiting on)
>> >
>> > Just tried, but I failed to break into DDB during the 30 seconds
>> > delay.  DDB was entered after the 30 seconds delay, though I press the
>> > break key when the delay started.
>>
>> I tried add VERBOSE_SYSINIT option in order to get a rough location of
>> this delay, but the system boots just fine w/ VERBOSE_SYSINIT option,
>> sigh.
>
> You could add KTR_PROC tracing and use 'show ktr' in DDB when you break in 
> after the
> 30 second delay to see what it was doing during the delay perhaps?

I have narrowed it down by patching VERBOSE_SYSINIT:
kthread_start(&deadlkres_kd) introduces the 30-second delay, i.e.
SYSINIT(deadlkres, SI_SUB_CLOCKS, SI_ORDER_ANY, kthread_start,
&deadlkres_kd) blocks for 30 seconds.
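
The narrowing-down approach, timing each init entry to find the one
that stalls, can be sketched in user space as below.  This is only an
illustration under stated assumptions: struct init_entry, fast_init(),
slow_init() and find_slowest() are hypothetical names, not the kernel's
SYSINIT machinery or the actual VERBOSE_SYSINIT patch.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical stand-in for one SYSINIT entry; not the kernel's struct. */
struct init_entry {
	const char *name;
	void (*func)(void *);
	void *arg;
};

/* Wall-clock time in seconds, using C11 timespec_get(). */
static double
now_sec(void)
{
	struct timespec ts;

	timespec_get(&ts, TIME_UTC);
	return ((double)ts.tv_sec + (double)ts.tv_nsec / 1e9);
}

/* Example entries: one returns immediately, one spins for about 20 ms. */
static void
fast_init(void *arg)
{
	(void)arg;
}

static void
slow_init(void *arg)
{
	double start = now_sec();

	(void)arg;
	while (now_sec() - start < 0.02)
		;
}

/*
 * Run every entry in table order, timing each call, and return the index
 * of the entry that took longest.  Its duration is stored in *secs.
 */
static size_t
find_slowest(const struct init_entry *tab, size_t n, double *secs)
{
	size_t i, worst = 0;
	double t0, dt;

	assert(tab != NULL && n > 0 && secs != NULL);
	*secs = -1.0;
	for (i = 0; i < n; i++) {
		t0 = now_sec();
		tab[i].func(tab[i].arg);
		dt = now_sec() - t0;
		if (dt > *secs) {
			*secs = dt;
			worst = i;
		}
	}
	return (worst);
}
```

Measuring around each call, rather than only logging before it, is what
makes the blocking entry stand out even when the log output looks normal.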

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: Please test EARLY_AP_STARTUP

2016-11-30 Thread Sepherosa Ziehau
On Wed, Nov 30, 2016 at 9:59 AM, Sepherosa Ziehau  wrote:
> On Tue, Nov 29, 2016 at 2:27 AM, John Baldwin  wrote:
>> On Monday, November 28, 2016 02:35:07 PM Sepherosa Ziehau wrote:
>>> Hi John,
>>>
>>> fdc seems to cause panic on Hyper-V:
>>> https://people.freebsd.org/~sephe/fdc_panic.png
>>
>> You shouldn't get this panic in latest HEAD (post-r309148).
>
>
> The base of my kernel tree is ~20 days old :)
>
>
>>
>>> I then commented out device fdc, and I fixed one panic on Hyper-V here:
>>> https://reviews.freebsd.org/D8656
>>
>> Replied to the review.
>>
>>> After fdc is disabled and hyperv/storvsc is fixed, it seems to boot
>>> fine, except a long delay (28~30seconds) here:
>>> 
>>> Timecounters tick every 1.000 msec
>>> -
>>> 28 ~ 30 seconds delay
>>> -
>>> vlan: initialized, using hash tables with chaining
>>> 
>>>
>>> I have the bootverbose dmesg here:
>>> https://people.freebsd.org/~sephe/dmesg_earlyap.txt
>>>
>>> I booted 10 times, only one boot does not suffer this 30 seconds
>>> delay.  It sounds like some races to me.  Any hints?
>>
>> It is likely a race as we start running things sooner now, yes.  Can you
>> break into DDB during the hang and see what thread0 is waiting on?  If
>> it is in the interrupt hooks you can use 'show conifhk' in DDB to see the
>> list of pending interrupt hooks.  That provides a list of candidate drivers
>> to inspect (e.g. stack traces of relevant kthreads) for what is actually
>> waiting (and what it is waiting on)
>
> Just tried, but I failed to break into DDB during the 30 seconds
> delay.  DDB was entered after the 30 seconds delay, though I press the
> break key when the delay started.

I tried adding the VERBOSE_SYSINIT option to get a rough location of
this delay, but the system boots just fine with VERBOSE_SYSINIT enabled,
sigh.

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: Please test EARLY_AP_STARTUP

2016-11-29 Thread Sepherosa Ziehau
On Tue, Nov 29, 2016 at 2:27 AM, John Baldwin  wrote:
> On Monday, November 28, 2016 02:35:07 PM Sepherosa Ziehau wrote:
>> Hi John,
>>
>> fdc seems to cause panic on Hyper-V:
>> https://people.freebsd.org/~sephe/fdc_panic.png
>
> You shouldn't get this panic in latest HEAD (post-r309148).


The base of my kernel tree is ~20 days old :)


>
>> I then commented out device fdc, and I fixed one panic on Hyper-V here:
>> https://reviews.freebsd.org/D8656
>
> Replied to the review.
>
>> After fdc is disabled and hyperv/storvsc is fixed, it seems to boot
>> fine, except a long delay (28~30seconds) here:
>> 
>> Timecounters tick every 1.000 msec
>> -
>> 28 ~ 30 seconds delay
>> -
>> vlan: initialized, using hash tables with chaining
>> 
>>
>> I have the bootverbose dmesg here:
>> https://people.freebsd.org/~sephe/dmesg_earlyap.txt
>>
>> I booted 10 times, only one boot does not suffer this 30 seconds
>> delay.  It sounds like some races to me.  Any hints?
>
> It is likely a race as we start running things sooner now, yes.  Can you
> break into DDB during the hang and see what thread0 is waiting on?  If
> it is in the interrupt hooks you can use 'show conifhk' in DDB to see the
> list of pending interrupt hooks.  That provides a list of candidate drivers
> to inspect (e.g. stack traces of relevant kthreads) for what is actually
> waiting (and what it is waiting on)

I just tried, but failed to break into DDB during the 30-second delay.
DDB was entered only after the delay ended, even though I pressed the
break key when the delay started.

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: Please test EARLY_AP_STARTUP

2016-11-27 Thread Sepherosa Ziehau
Hi John,

fdc seems to cause a panic on Hyper-V:
https://people.freebsd.org/~sephe/fdc_panic.png

I then commented out device fdc, and I fixed one panic on Hyper-V here:
https://reviews.freebsd.org/D8656

After fdc is disabled and hyperv/storvsc is fixed, it seems to boot
fine, except for a long delay (28~30 seconds) here:

Timecounters tick every 1.000 msec
-
28 ~ 30 seconds delay
-
vlan: initialized, using hash tables with chaining


I have the bootverbose dmesg here:
https://people.freebsd.org/~sephe/dmesg_earlyap.txt

I booted 10 times; only one boot did not suffer this 30-second
delay.  It sounds like a race to me.  Any hints?

Thanks,
sephe


On Sat, Nov 26, 2016 at 2:20 AM, John Baldwin  wrote:
> I plan to enable EARLY_AP_STARTUP on x86 in a week on HEAD.  Some folks
> have been testing it for the last week or so which has exposed some
> additional things to fix.  I think I've resolved most of those in one
> way or another, but it will make things smoother if other folks can
> start testing this over the next few days before it is enabled by default.
>
> (To enable, add 'options EARLY_AP_STARTUP' to your kernel config.)
>
> Note that non-x86 platforms should eventually adopt this, but I don't
> think any of them are ready yet.
>
> --
> John Baldwin



-- 
Tomorrow Will Never Die


Re: (boost::)asio and kqueue problem

2016-07-19 Thread Sepherosa Ziehau
On Wed, Jul 20, 2016 at 12:38 PM, Konstantin Belousov
 wrote:
> On Tue, Jul 19, 2016 at 05:35:59PM +0200, Hartmut Brandt wrote:
>> Hi,
>>
>> I'm trying to use asio (that's boost::asio without boost) to handle
>> listening sockets asynchronously. This appears not to work. There are also
>> some reports on the net about this problem. I was able to reproduce the
>> problem with a small C program that does the same steps as asio. The
>> relevant sequence of system calls is:
>>
>> kqueue()   = 3 (0x3)
>> socket(PF_INET,SOCK_STREAM,6)  = 4 (0x4)
>> setsockopt(0x4,0x,0x800,0x7fffea2c,0x4)= 0 (0x0)
>> kevent(3,{ 4,EVFILT_READ,EV_ADD|EV_CLEAR,0x0,0x0,0x0 
>> 4,EVFILT_WRITE,EV_ADD|EV_CLEAR,0x0,0x0,0x0 },2,0x0,0,0x0) = 0 (0x0)
>> setsockopt(0x4,0x,0x4,0x7fffea2c,0x4)  = 0 (0x0)
>> bind(4,{ AF_INET 0.0.0.0:8080 },16)= 0 (0x0)
>> listen(0x4,0x80)   = 0 (0x0)
>> ioctl(4,FIONBIO,0xea2c)= 0 (0x0)
>> kevent(3,{ 4,EVFILT_READ,EV_ADD|EV_CLEAR,0x0,0x0,0x0 
>> 4,EVFILT_WRITE,EV_ADD|EV_CLEAR,0x0,0x0,0x0 },2,0x0,0,0x0) = 0 (0x0)
>> kevent(3,0x0,0,0x7fffe5a0,32,0x0)  ERR#4 'Interrupted system 
>> call'
>>
>> The problem here is that asio registers each file descriptor with
>> EVFILT_READ and EVFILT_WRITE as soon as it is opened (first kevent call).
>> After bringing the socket into the listening state and when async_accept()
>> is called it registers the socket a second time. According to the man page
>> this is perfectly legal and can be used to modify the registration.
>>
>> With this sequence of calls kevent() does not return when a connection is
>> established successfully.
>>
>> I tracked down the problem and the reason is in soo_kqfilter(). This is
>> called for the first EVFILT_READ registration and decides based on the
>> SO_ACCEPTCONN flag which filter operations to use solisten_filtops or
>> soread_filtops. In this case it chooses soread_filtops.
>>
>> The second EVFILT_READ registration does not call soo_kqfilter() again,
>> but just updates the filter from the data and fflags field so the
>> listening socket ends up with the wrong filter operations.
>>
>> The attached patch fixes this (kind of) by using the f_touch
>> operation (currently used only by the user filter). The filt_sotouch()
>> function changes the operation pointer in the knote when the socket is now
>> in the listening state. I suppose that the required locking is already
>> done in kqueue_register(), but I'm not sure. Asynchronous accepting now 
>> works.
>>
>> A better fix would probably be to change the operation vector on all
>> knotes attached to the socket in solisten(), but I fear I don't have the
>> necessary understanding of the locking that is required for this.
>>
>> Could somebody with enough kqueue() knowledge check whether the patch is
>> correct lock-wise?
> I find it odd that the fix still requires re-registering the socket
> event to get it working after the socket is marked as listening.  In other
> words, until the re-registration is done, the events for the registered
> filter are lost.
>
> IMO a more correct solution would be to merge filt_solisten and
> filt_soread, deciding which path to take by testing the SO_ACCEPTCONN
> flag in the f_event() op.

This is reasonable.
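
For illustration only, the difference between the two designs can be
modelled in a few lines of Python.  This is a toy model, not kernel code:
ToySocket, Knote, filt_read, filt_listen and filt_merged are hypothetical
stand-ins for the socket, the knote and the filt_soread/filt_solisten
handlers, and the "accepting" attribute stands in for SO_ACCEPTCONN.

```python
# Toy model of the two designs discussed above. Nothing here is FreeBSD
# kernel code; every class and function name is a hypothetical stand-in.

class ToySocket:
    def __init__(self):
        self.accepting = False      # stands in for the SO_ACCEPTCONN flag
        self.pending_conns = 0      # completed connections waiting for accept()
        self.readable_bytes = 0     # bytes waiting in the receive buffer

    def listen(self):
        self.accepting = True


def filt_read(sock):
    # Data-socket filter: fires when there is data to read.
    return sock.readable_bytes > 0


def filt_listen(sock):
    # Listening-socket filter: fires when a connection can be accepted.
    return sock.pending_conns > 0


def filt_merged(sock):
    # Merged filter: picks the path every time an event is evaluated.
    return filt_listen(sock) if sock.accepting else filt_read(sock)


class Knote:
    def __init__(self, sock, merged):
        self.sock = sock
        if merged:
            self.f_event = filt_merged
        else:
            # Decided once, at registration time, as described in the report.
            self.f_event = filt_listen if sock.accepting else filt_read

    def ready(self):
        return self.f_event(self.sock)


def register_then_listen(merged):
    sock = ToySocket()
    kn = Knote(sock, merged)    # EVFILT_READ registered before listen()
    sock.listen()
    sock.pending_conns = 1      # a client connects
    return kn.ready()


print("filter fixed at registration:", register_then_listen(merged=False))
print("merged filter:", register_then_listen(merged=True))
```

In this model the filter chosen at registration time never reports the
pending connection, while the merged filter does so without any
re-registration.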

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: FreeBSD-11.0-BETA1-amd64-disc1.iso is too big for my 700MB CD-r

2016-07-15 Thread Sepherosa Ziehau
On Fri, Jul 15, 2016 at 7:13 AM, Glen Barber  wrote:
> With additional tweaks, I was able to get the CD to boot both with
> a real internal CD-ROM drive, as well as USB CD-ROM.
>
> I have uploaded a disc1.iso image here:
>
>  https://people.freebsd.org/~gjb/disc1_uzip.iso
>
> Could people try this on various hardware, KVM setups, and so on?  I'm

It works for me on Hyper-V.

Thanks,
sephe

-- 
Tomorrow Will Never Die


Re: FreeBSD_HEAD_i386 - Build #2847 - Still Failing

2016-04-13 Thread Sepherosa Ziehau
On Wed, Apr 13, 2016 at 8:46 PM, David Somayajulu
 wrote:
> Hi All,
> I would appreciate it if you could let me know how to back out the following
> commits.
>
> https://svnweb.freebsd.org/changeset/base/297916
> https://svnweb.freebsd.org/changeset/base/297909
> https://svnweb.freebsd.org/changeset/base/297898
>
> I tried  the following but it fails:
>  cd 
>  svn merge -c -r297916

svn merge -c -r .

You missed the '.'

Thanks,
sephe


>
> Thanks
> David S.
>
> -Original Message-
> From: owner-freebsd-curr...@freebsd.org 
> [mailto:owner-freebsd-curr...@freebsd.org] On Behalf Of 
> jenkins-ad...@freebsd.org
> Sent: Wednesday, April 13, 2016 5:14 AM
> To: m...@freebsd.org; and...@freebsd.org; se...@freebsd.org; 
> jenkins-ad...@freebsd.org; freebsd-current@FreeBSD.org; 
> freebsd-i...@freebsd.org
> Subject: FreeBSD_HEAD_i386 - Build #2847 - Still Failing
>
> FreeBSD_HEAD_i386 - Build #2847 - Still Failing:
>
> Build information: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/2847/
> Full change log: 
> https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/2847/changes
> Full build log: https://jenkins.FreeBSD.org/job/FreeBSD_HEAD_i386/2847/console
>
> Change summaries:
>
> 297915 by mav:
> Filter Port Database Changed notifications.
>
> For some reason firmware sends Port Database Changed notifications in case of 
> explicit login requests from the driver when target port is unavailable.
> Those notifications don't give driver any new information, but only cause 
> infinite scan loop.
>
> 297914 by andrew:
> Increase the arm64 kernel address space to 512GB, and the DMAP region to 2TB. 
> The latter can be increased in 512GB chunks by adjusting the lower address, 
> however more work will be needed to increase the former.
>
> There is still some work needed to only create a DMAP region for the RAM 
> address space as on ARM architectures all mappings should have the same 
> memory attributes, and these will be different for device and normal memory.
>
> Reviewed by:kib
> Obtained from:  ABT Systems Ltd
> Relnotes:   yes
> Sponsored by:   The FreeBSD Foundation
> Differential Revision:  https://reviews.freebsd.org/D5859
>
> 297913 by sephe:
> hyperv: device_get_softc does not return NULL
>
> MFC after:  1 week
> Sponsored by:   Microsoft OSTC
>
>
>
> The end of the build log:
>
> [...truncated 155318 lines...]
> cc  -O2 -pipe  -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc  
> -I. -I/usr/src/sys/modules/ath/../../dev/ath 
> -I/usr/src/sys/modules/ath/../../dev/ath/ath_hal -I. 
> -I/usr/src/sys/modules/ath/../../contrib/dev/ath/ath_hal/ 
> -DHAVE_KERNEL_OPTION_HEADERS -include 
> /usr/obj/usr/src/sys/GENERIC/opt_global.h -I. -I/usr/src/sys -fno-common -g 
> -I/usr/obj/usr/src/sys/GENERIC  -MD  -MF.depend.ah_eeprom_v14.o 
> -MTah_eeprom_v14.o -mno-mmx -mno-sse -msoft-float -ffreestanding -fwrapv 
> -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs 
> -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline 
> -Wcast-qual  -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__  
> -Wmissing-include-dirs -fdiagnostics-show-option  -Wno-unknown-pragmas  
> -Wno-error-tautological-compare -Wno-error-empty-body  
> -Wno-error-parentheses-equality -Wno-error-unused-function  
> -Wno-error-pointer-sign -Wno-error-shift-negative-value  -mno-aes -mno-avx  
> -std=iso9899:1999 -c
> /usr/src/sys/modules/ath/../../dev/ath/ath_hal/ah_eeprom_v14.c -o ah_eeprom_v14.o
> ctfconvert -L VERSION -g ah_eeprom_v14.o
> --- ah_eeprom_v4k.o ---
> cc  -O2 -pipe  -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc  
> -I. -I/usr/src/sys/modules/ath/../../dev/ath 
> -I/usr/src/sys/modules/ath/../../dev/ath/ath_hal -I. 
> -I/usr/src/sys/modules/ath/../../contrib/dev/ath/ath_hal/ 
> -DHAVE_KERNEL_OPTION_HEADERS -include 
> /usr/obj/usr/src/sys/GENERIC/opt_global.h -I. -I/usr/src/sys -fno-common -g 
> -I/usr/obj/usr/src/sys/GENERIC  -MD  -MF.depend.ah_eeprom_v4k.o 
> -MTah_eeprom_v4k.o -mno-mmx -mno-sse -msoft-float -ffreestanding -fwrapv 
> -fstack-protector -gdwarf-2 -Wall -Wredundant-decls -Wnested-externs 
> -Wstrict-prototypes  -Wmissing-prototypes -Wpointer-arith -Winline 
> -Wcast-qual  -Wundef -Wno-pointer-sign -D__printf__=__freebsd_kprintf__  
> -Wmissing-include-dirs -fdiagnostics-show-option  -Wno-unknown-pragmas  
> -Wno-error-tautological-compare -Wno-error-empty-body  
> -Wno-error-parentheses-equality -Wno-error-unused-function  
> -Wno-error-pointer-sign -Wno-error-shift-negative-value  -mno-aes -mno-avx  
> -std=iso9899:1999 -c
> /usr/src/sys/modules/ath/../../dev/ath/ath_hal/ah_eeprom_v4k.c -o ah_eeprom_v4k.o
> ctfconvert -L VERSION -g ah_eeprom_v4k.o
> --- all_subdir_cam ---
> ctfconvert -L VERSION -g scsi_ch.o
> --- all_subdir_ath ---
> --- ar5416_ani.o ---
> cc  -O2 -pipe  -fno-strict-aliasing -Werror -D_KERNEL -DKLD_MODULE -nostdinc  
> -I. -I/usr/src/sys/modules/ath/../../dev/ath 
> -I/usr/src/sys/modules/ath/../../dev/ath/ath_ha

Re: Revision 297176 - hyperv/evttimer: Use an independent message slot so that it can work

2016-04-08 Thread Sepherosa Ziehau
I have reverted this change.  It will be brought back, after some code
refactoring.

On Fri, Apr 8, 2016 at 4:22 PM,   wrote:
>
> Hello
>
> I recently updated one of my many VMs from an older CURRENT revision, r297196,
> to r297659, and on reboot it just panics with the following:
>
> FreeBSD clang version 3.8.0 (tags/RELEASE_380/final 262564) (based on LLVM
> 3.8.0)
> VT(vga): text 80x25
> Timecounter "Hyper-V" frequency 1000 Hz quality 1000
> Kernel trap 9 with interrupts disabled
>
>
> Fatal trap 9: general protection fault while in kernel mode
> cpuid = 0: apic id = 00
> instruction pointer = 0x20:0x8100d6?9
> stack pointer   = 0x28:0x820d5c30
> frame pointer   = 0x28:0x820d5c40
> code segment= base 0x0, limit 0xf, type 0x1b
> = DPL 0, pres 1, long 1, def32 0, gran 1
> processor eflags= IOPL = 0
> current process = 0 ()
> [ thread pid 0 tid 0 ]
> stopped at  hv_get_timecount+0x9:   rdmsr
> db> wh
> Tracing pid 0 tid 0 td 0x81d0eff0
> hv_get_timecount() at hv_get_timecount+0x9/frame 0x820d5c40
> tc_init() at tc_init+0x251/frame 0x820d5c90
> mi_startup() at mi_startup+0x118/frame 0x820d5cb0
> btext() at btext+0x2c
> db>
>
> I changed hv_hv.c back to the previous revision (297176) and no panics under
> Xen VM.
>
> Thanks!
>
> p.s. not sure why Xen gets detected as HyperV
>



-- 
Tomorrow Will Never Die


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-12-05 Thread Sepherosa Ziehau
On Tue, Dec 3, 2013 at 5:41 AM, Adrian Chadd  wrote:
>
> On 2 December 2013 03:45, Sepherosa Ziehau  wrote:
> >
> > On Mon, Dec 2, 2013 at 1:02 PM, Adrian Chadd  wrote:
> >
> >> Ok, so given this, how do you guarantee the UTHREAD stays on the given
> >> CPU? You assume it stays on the CPU that the initial listen socket was
> >> created on, right? If it's migrated to another CPU core then the
> >> listen queue still stays in the original hash group that's in a netisr
> >> on a different CPU?
> >
> > As I wrote in the above brief introduction, Dfly currently relies on the
> > scheduler doing the proper thing (the scheduler did a very good job
> > during my tests).  I need to export a certain kind of socket option to make
> > that information available to user space programs.  Forcing UTHREAD binding
> > in the kernel is not helpful, given that things are different in a reverse
> > proxy application.  And even if that kind of binding information were
> > exported to user space, a user space program would still have to poll it
> > periodically (in Dfly at least), since other programs binding to the same
> > addr/port could come and go, which causes the inp localgroup to be
> > reorganized in the current Dfly implementation.
>
> Right. I kinda gathered that. It's fine, I was conceptually thinking
> of doing some thread pinning for this anyway.
>
> How do you see this scaling on massively multi-core machines? Like 32,
> 48, 64, 128 cores? I had some vague handwav-y notion of maybe limiting

We do have a 48-core box.  It is mainly used for package building and
other things.  I didn't run network stress tests on it.  However, we
have addressed some message-passing problems on it that would not show
up on 8-CPU boxes.

> the concept of pcbgroup hash / netisr threads to a subset of CPUs, or
> have them be able to float between sockets but only have 1 (or n,

Floating around may be good, but by pinning netisr to a specific CPU
you could enjoy lockless per-cpu data.

> maybe) per socket. Or just have a fixed, smaller pool. The idea then

We used to have dedicated threads for UDP and TCP processing, but it
turns out that one netisr per cpu works best in Dfly.  You probably
need to try and measure before deciding to move to 1 or N netisrs per
cpu.

Best Regards,
sephe

> is the scheduler would need to be told that a given userland
> thread/process belongs to a given netisr thread, and to schedule them
> on the same CPU when possible.
>
> Anyway, thanks for doing this work. I only wish that you'd do it for
> FreeBSD. :-)
>
>
>
> -adrian




-- 
Tomorrow Will Never Die


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-12-02 Thread Sepherosa Ziehau
On Mon, Dec 2, 2013 at 1:02 PM, Adrian Chadd  wrote:

> Hi! Thanks for the writeup!
>
> On 1 December 2013 20:17, Sepherosa Ziehau  wrote:
>
> > I also put up a brief description of SO_REUSEPORT in dfly; may be useful
> to
> > you:
> > http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>
> Ok, so given this, how do you guarantee the UTHREAD stays on the given
> CPU? You assume it stays on the CPU that the initial listen socket was
> created on, right? If it's migrated to another CPU core then the
> listen queue still stays in the original hash group that's in a netisr
> on a different CPU?
>
>
As I wrote in the above brief introduction, Dfly currently relies on the
scheduler doing the proper thing (the scheduler did a very good job
during my tests).  I need to export a certain kind of socket option to make
that information available to user space programs.  Forcing UTHREAD binding
in the kernel is not helpful, given that things are different in a reverse
proxy application.  And even if that kind of binding information were
exported to user space, a user space program would still have to poll it
periodically (in Dfly at least), since other programs binding to the same
addr/port could come and go, which causes the inp localgroup to be
reorganized in the current Dfly implementation.

Best Regards,
sephe

-- 
Tomorrow Will Never Die


Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-12-02 Thread Sepherosa Ziehau
On Mon, Dec 2, 2013 at 12:29 PM, Oleg Moskalenko wrote:

> Sepherosa, while reading your description I noticed another long-standing
> problem for UDP application developers: the UDP sockets are always hashed
> with 2-tuple. But UDP sockets can be "connected", too, to a remote address,
> with connect(...)
>

Connected UDP sockets will be in the connect hash, which is hashed using
faddr/laddr/fport/lport.  SO_REUSEPORT only affects wildcard sockets.
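
As a rough sketch of that lookup order (a toy model only; the table layout
and the names connected, wildcard, bind_wildcard, connect_udp and lookup
are hypothetical and are not the Dfly or FreeBSD data structures): an
exact 4-tuple match is tried first, and only unmatched datagrams are
spread over the wildcard sockets that share the port.

```python
# Toy model of the lookup order described above; names are hypothetical
# and this is not the Dfly or FreeBSD data structure.
import zlib

connected = {}   # (faddr, fport, laddr, lport) -> socket name
wildcard = {}    # lport -> list of socket names sharing the port


def bind_wildcard(name, lport):
    # Several wildcard sockets may share one local port (SO_REUSEPORT).
    wildcard.setdefault(lport, []).append(name)


def connect_udp(name, faddr, fport, laddr, lport):
    # A connected socket is keyed by the full 4-tuple.
    connected[(faddr, fport, laddr, lport)] = name


def lookup(faddr, fport, laddr, lport):
    # Exact 4-tuple match first; only then fall back to wildcard sockets.
    exact = connected.get((faddr, fport, laddr, lport))
    if exact is not None:
        return exact
    group = wildcard.get(lport, [])
    if not group:
        return None
    # Spread unmatched peers over the sockets sharing the port.
    index = zlib.crc32(f"{faddr}:{fport}".encode()) % len(group)
    return group[index]


bind_wildcard("w0", 5000)
bind_wildcard("w1", 5000)
connect_udp("c0", "192.0.2.7", 40000, "198.51.100.1", 5000)

print(lookup("192.0.2.7", 40000, "198.51.100.1", 5000))   # the connected socket
print(lookup("192.0.2.8", 40001, "198.51.100.1", 5000))   # one of the wildcard sockets
```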


> function. Unfortunately, with 2-tuple hashing, that pattern is useless for
> large-scale applications: if a large number of UDP sockets on the same
> local port are "connected" to remote addresses, then the kernel has to go
> through the long list of UDP sockets with the same hash value.
>
> If the connected UDP sockets would use 4-tuples, then it would be very
> helpful for the new generation of the UDP-based media applications. For
> example, servers which use DTLS protocol would become simpler and more
> efficient.
>
>
If you are talking about RSS, then igb, ixgbe and mxge (and maybe other
drivers) support an RSS extension (mxge does not use RSS, but still uses a
4-tuple hash) that includes the UDP fport/lport in the Toeplitz hash
calculation.  However, for fragments of a UDP datagram, if the ports are
taken into consideration, the RSS hash of the leading fragment will differ
from that of the remaining fragments; I think that's why MS didn't include
ports for UDP.
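
To make the fragment problem concrete, here is a small sketch of the
Toeplitz hash as it is commonly described for RSS.  The 40-byte key and
the addresses/ports are arbitrary sample values picked for this example,
not taken from any of the drivers above, and toeplitz(), two_tuple() and
four_tuple() are names made up here.  Only the leading fragment carries
the UDP header, so a port-aware hash sees the 4-tuple there and only the
addresses on the later fragments.

```python
# Sketch of the Toeplitz hash as commonly described for RSS. The key is
# arbitrary sample data for this example, not a key from any driver.
import socket
import struct

KEY = bytes(range(1, 41))   # 40-byte sample key (assumption)


def toeplitz(key, data):
    key_bits = len(key) * 8
    if key_bits < len(data) * 8 + 31:
        raise ValueError("key too short for this input")
    key_int = int.from_bytes(key, "big")
    result = 0
    for i in range(len(data) * 8):
        # For every set input bit, XOR in the 32-bit key window that
        # starts at the same bit position.
        if data[i // 8] & (0x80 >> (i % 8)):
            result ^= (key_int >> (key_bits - 32 - i)) & 0xFFFFFFFF
    return result


def two_tuple(src, dst):
    # Hash input when only the addresses are used.
    return socket.inet_aton(src) + socket.inet_aton(dst)


def four_tuple(src, sport, dst, dport):
    # Hash input when the ports are included as well.
    return two_tuple(src, dst) + struct.pack("!HH", sport, dport)


leading = toeplitz(KEY, four_tuple("192.0.2.1", 1234, "198.51.100.2", 5678))
later = toeplitz(KEY, two_tuple("192.0.2.1", "198.51.100.2"))
print(hex(leading), hex(later))
```

The two printed values are the hashes a port-aware scheme would compute
for the leading fragment and for the remaining fragments of the same
datagram; hashing on the addresses alone gives every fragment the second
value.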

Best Regards,
sephe


> Thanks
> Oleg
>
>
>
> On Sun, Dec 1, 2013 at 8:17 PM, Sepherosa Ziehau wrote:
>
>>
>>
>>
>> On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi  wrote:
>>
>>> Well seems Dragonfly has some version of it already from commit [1].
>>>
>>>
>> The distribution algorithm was changed a little bit after initial commit
>> to gain more idle time (bnx(4) output has already been maxed out):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6
>>
>> I also addressed a reasonable concern from the nginx folks (I am not
>> quite sure about Linux's position on it; Linux's original implementation of
>> SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
>> message):
>>
>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272
>>
>> As for nginx, the SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
>> dports; it should be easy to backport to FreeBSD's ports.  I failed to
>> convince the nginx folks to merge it into mainline and I am currently busy
>> with other things; I will come back to it later.  If FreeBSD is going to
>> implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
>> mainline will be easier.
>>
>> I also put up a brief description of SO_REUSEPORT in dfly; may be useful
>> to you:
>> http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt
>>
>> Best Regards,
>> sephe
>>
>>
>>> In FreeBSD there is a framework for this, enabled by defining PCBGROUP.
>>> It is also explained at [2] and [3].
>>> It can achieve approximately the same features as Linux's SO_REUSEPORT.
>>> The only things missing are the marketing behind it and, I think, better
>>> RSS support.
>>> Judging by the dates, the support was there before Linux's, so all you guys
>>> looking for it can experiment with it.
>>>
>>> What I was trying to accomplish was something other than a performance
>>> improvement, and maybe to put a sysctl behind it to make it more
>>> acceptable.
>>>
>>> [1]
>>>
>>> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
>>> [2]
>>> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
>>> [3]
>>> http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>>>
>>>
>>> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko >> >wrote:
>>>
>>> > Tim, you are wrong. Read what is "multicast" definition, and read how
>>> UDP
>>> > and TCP sockets work in Linux 3.9+ kernels.
>>> >
>>> > Oleg .
>>> >
>>> >
>>> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle >> >wrote:
>>> >
>>> >>
>>> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:
>>> >>
>>> >> > Hello,
>>> >> >
>>> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two
>>> da

Re: [PATCH] SO_REUSEADDR and SO_REUSEPORT behaviour

2013-12-01 Thread Sepherosa Ziehau
On Sat, Nov 30, 2013 at 2:42 AM, Ermal Luçi  wrote:

> Well seems Dragonfly has some version of it already from commit [1].
>
>
The distribution algorithm was changed a little bit after initial commit to
gain more idle time (bnx(4) output has already been maxed out):
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/c275f18d832361be28b150d3f4fd518914bdeba6

I also addressed a reasonable concern from the nginx folks (I am not
quite sure about Linux's position on it; Linux's original implementation of
SO_REUSEPORT from Google had this drawback, which I mentioned in the commit
message):
http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/02ad2f0b874fb0a45eb69750219f79f5e8982272

As for nginx, the SO_REUSEPORT patch for nginx (both 1.4.x and 1.5.x) is in
dports; it should be easy to backport to FreeBSD's ports.  I failed to
convince the nginx folks to merge it into mainline and I am currently busy
with other things; I will come back to it later.  If FreeBSD is going to
implement Linux's style of SO_REUSEPORT, pushing the patch to the nginx
mainline will be easier.
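
For readers who have not used it, the user-space side of this style of
SO_REUSEPORT looks roughly like the sketch below: each worker opens its
own listening socket on the same address and port.  This only shows the
socket calls; how the kernel distributes connections between the sockets
is not visible from here.  It assumes a platform where Python exposes
socket.SO_REUSEPORT, and reuseport_listener() and demo() are names made
up for the example.

```python
# Sketch of two listeners sharing one port via SO_REUSEPORT. Assumes the
# platform defines socket.SO_REUSEPORT; helper names are made up here.
import socket


def reuseport_listener(port=0):
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # The option has to be set on every socket before bind().
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    s.bind(("127.0.0.1", port))
    s.listen(16)
    return s


def demo():
    first = reuseport_listener()            # kernel picks a free port
    port = first.getsockname()[1]
    second = reuseport_listener(port)       # second listener, same port
    same = first.getsockname() == second.getsockname()
    first.close()
    second.close()
    return same


print("both listeners share the port:", demo())
```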

I also put up a brief description of SO_REUSEPORT in dfly; may be useful to
you:
http://leaf.dragonflybsd.org/~sephe/netisr_so_reuseport.txt

Best Regards,
sephe


> In FreeBSD there is a framework for this, enabled by defining PCBGROUP.
> It is also explained at [2] and [3].
> It can achieve approximately the same features as Linux's SO_REUSEPORT.
> The only things missing are the marketing behind it and, I think, better
> RSS support.
> Judging by the dates, the support was there before Linux's, so all you guys
> looking for it can experiment with it.
>
> What I was trying to accomplish was something other than a performance
> improvement, and maybe to put a sysctl behind it to make it more
> acceptable.
>
> [1]
>
> http://gitweb.dragonflybsd.org/dragonfly.git/commitdiff/740d1d9f7b7bf9c9c021abb8197718d7a2d441c9
> [2]
> http://fxr.watson.org/fxr/source/netinet/in_pcbgroup.c?im=bigexcerpts#L51
> [3] http://lists.freebsd.org/pipermail/svn-src-head/2011-June/028190.html
>
>
> On Fri, Nov 29, 2013 at 7:03 PM, Oleg Moskalenko  >wrote:
>
> > Tim, you are wrong. Read the definition of "multicast", and read how UDP
> > and TCP sockets work in Linux 3.9+ kernels.
> >
> > Oleg .
> >
> >
> > On Fri, Nov 29, 2013 at 9:59 AM, Tim Kientzle  >wrote:
> >
> >>
> >> On Nov 29, 2013, at 4:04 AM, Ermal Luçi  wrote:
> >>
> >> > Hello,
> >> >
> >> > since SO_REUSEADDR and SO_REUSEPORT are supposed to allow two daemons
> to
> >> > share the same port and possibly listening ip …
> >>
> >> These flags are used with TCP-based servers.
> >>
> >> I’ve used them to make software upgrades go more smoothly.
> >> Without them, the following often happens:
> >>
> >> * Old server stops.  In the process, all of its TCP connections are
> >> closed.
> >>
> >> * Connections to old server remain in the TCP connection table until the
> >> remote end can acknowledge.
> >>
> >> * New server starts.
> >>
> >> * New server tries to open port but fails because that port is “still in
> >> use” by connections in the TCP connection table.
> >>
> >> With these flags, the new server can open the port even though
> >> it is “still in use” by existing connections.
> >>
> >>
> >> > This is not the case today.
> >> > Only multicast sockets seem to have the behaviour of broadcasting the
> >> data
> >> > to all sockets sharing the same properties through these options!
> >>
> >> That is what multicast is for.
> >>
> >> If you want the same data sent to all listeners, then
> >> that is multicast behavior and you should be using
> >> a multicast socket.
> >>
> >> > The patch at [1] implements/corrects the behaviour for UDP sockets.
> >>
> >> You’re trying to turn all UDP sockets with those options
> >> into multicast sockets.
> >>
> >> If you want a multicast socket, you should ask for one.
> >>
> >> Tim
> >>
> >> ___
> >> freebsd-...@freebsd.org mailing list
> >> http://lists.freebsd.org/mailman/listinfo/freebsd-net
> >> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"
> >>
> >
> >
>
>
> --
> Ermal
> ___
> freebsd-current@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-current
> To unsubscribe, send any mail to "freebsd-current-unsubscr...@freebsd.org"
>



-- 
Tomorrow Will Never Die