Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-11 Thread Yamagi Burmeister
Hello,
I've done some tests to verify that the problem only occurs when SU+J
is used, but not SU without J. In fact, I did run the following two
loops on different TTYs in parallel:

while 1
 cp -r /usr/src /root
 rm -Rf /root/src
end

while 1
 mksnap_ffs / /.snap/snap
 rm -f /.snap/snap
end

With SU without J the system survives this for at least 1 hour. But as
soon as SU+J is used it most likely deadlocks or even panics in the
first 1 or 2 minutes. What exactly happens seems to vary... In most
cases the system just deadlocks, sometimes like al...@bsdgate.org
describes and sometimes it's completely unresponsive to any input.
I've seen kernel messages like "fsync: giving up on dirty".

Several times the system panicked. In most cases it printed the generic
"panic: page fault while in kernel mode" and one time it printed
"panic: snapacct_ufs2: bad block". I've never seen the same
backtrace twice. One time the system suddenly rebooted, as if a triple
fault or something like that happened.

Since it's much more likely that the problems described above arise
when the filesystem is loaded (for example by the first loop) while
taking the snapshot, this looks like some kind of race condition or
something like that.

Some more information from an older debug session can be found at:
http://deponie.yamagi.org/freebsd/debug/snapshots_panic/

On Tue, 10 Jan 2012 10:30:13 -0800
Kirk McKusick mckus...@mckusick.com wrote:

  Date: Mon, 9 Jan 2012 18:30:51 +0100
  From: Yamagi Burmeister li...@yamagi.org
  To: j...@freebsd.org, mckus...@freebsd.org
  Cc: freebsd-current@freebsd.org, br...@bryce.net
  Subject: Re: FS hang when creating snapshots on a UFS SU+J setup
  
  Hello,
  
  I'm sorry to bother you, but you may not be aware of this thread and
  this problem. We are several people experiencing deadlocks, kernel
  panics and other problems when creating snapshots on file systems
  with SU+J. It would be nice to get some feedback, e.g. how can we
  help debugging and / or fixing this problem.
  
  Thank you,
  Yamagi
 
 First step in debugging is to find out if the problem is SU+J
 specific. To find out, turn off SU+J but leave SU. This change
 is done by running:
 
   umount filesystem
   tunefs -j disable filesystem
   mount filesystem
   cd filesystem
   rm .sujournal
 
 You may want to run `fsck -f' on the filesystem while you have
 it unmounted just to be sure that it is clean. Then run your
 snapshot request to see if it still fails. If it works, then
 we have narrowed the problem down to something related to SU+J.
 If it fails then we have a broader issue to deal with.
 
 If you wish to go back to using SU+J after the test, you can
 reenable SU+J by running:
 
   umount filesystem
   tunefs -j enable filesystem
   mount filesystem
 
 When responding to me, it is best to use my mckus...@mckusick.com
 email as I tend to read it more regularly.
 
   Kirk McKusick
 


-- 
Homepage:  www.yamagi.org
XMPP:  yam...@yamagi.org
GnuPG/GPG: 0xEFBCCBCB




Re: bus dma: a flag/quirk for page zero

2012-01-11 Thread John Baldwin
On Tuesday, January 10, 2012 3:18:28 pm Andriy Gapon wrote:
 
 Some hardware interfaces may reserve a special meaning for a (physical) memory
 address value of zero.  One example is the OHCI specification, where a zero value
 in CurrentBufferPointer doesn't mean a physical address but has a reserved
 meaning.  To be honest I don't have another example :) but don't preclude its
 existence.
 
 To deal with this peculiarity we could use a special flag/quirk that would
 instruct the bus dma code to never use the page zero for communication with
 the hardware.
 Here's a proof of concept patch that implements the idea:
 http://people.freebsd.org/~avg/usb-dma-pagezero.diff
 
 Some concerns:
 - not sure if BUS_DMA_NO_PAGEZERO is the best name for the flag
 - the patch implements the flag only for x86 at the moment
 - usb code uses the flag regardless of the actual controller type
 
 What do you think?

I think this is fine, but you should just always exclude page zero when
allocating bounce pages.  Bounce pages are assigned to zones that can be
shared by multiple tags, so other tags that map to the same zone can
allocate bounce pages that ohci will use (add_bounce_page() should probably
take the bounce zone as an arg instead of a tag).  I think it's not worth
creating a separate zone just for ohci; better to forbid page zero in all
zones instead.  Also, please change this:

-   if (newtag->lowaddr < ptoa((vm_paddr_t)Maxmem)
-    || newtag->alignment > 1)
+   if (newtag->lowaddr < ptoa((vm_paddr_t)Maxmem) ||
+   newtag->alignment > 1)
+   newtag->flags |= BUS_DMA_COULD_BOUNCE;
+
+   if ((newtag->flags & BUS_DMA_NO_PAGEZERO) != 0)
    newtag->flags |= BUS_DMA_COULD_BOUNCE;

To just be one if.
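
For clarity, the merged single test being asked for would read roughly as
follows (a sketch, with the comparison operators reconstructed from the
patch context; BUS_DMA_NO_PAGEZERO is the flag proposed in the patch):

    if (newtag->lowaddr < ptoa((vm_paddr_t)Maxmem) ||
        newtag->alignment > 1 ||
        (newtag->flags & BUS_DMA_NO_PAGEZERO) != 0)
            newtag->flags |= BUS_DMA_COULD_BOUNCE;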

-- 
John Baldwin


Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread John Baldwin
On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
 On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
  On 10 January 2012 13:37, Luigi Rizzo ri...@iet.unipi.it wrote:
   I was glancing through manpages and implementations of bus_dma(9)
   and i am a bit unclear on what this API (in particular, bus_dmamap_sync())
   does in terms of memory barriers.
  
   I see that the x86/amd64 and ia64 code only does the bounce buffers.

That is because x86 in general does not need memory barriers.  Other platforms
have them (alpha had them in bus_dmamap_sync()).

   The mips seems to do some coherency-related calls.
  
   How do we guarantee, say, that a recently built packet is in
   memory before issuing the tx command to the NIC ?
  
  The drivers should be good examples of doing the right thing. You just
  do pre-map and post-map calls as appropriate.
  
  Some devices don't bother with this on register accesses and this is a
  bug. (eg, ath/ath_hal.) Others (eg iwn) do explicit flushes where
  needed.
 
 so you are saying that drivers are correct unless they are buggy :)

For bus_dma, just use bus_dmamap_sync() and you will be fine.
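
The canonical driver pattern is roughly this (a sketch; sc, desc_tag and
desc_map are hypothetical softc fields):

    /* The CPU filled in the descriptors; make them visible to the device. */
    bus_dmamap_sync(sc->desc_tag, sc->desc_map, BUS_DMASYNC_PREWRITE);
    /* ... write the tail/doorbell register to start the DMA ... */

    /* Later, before reading device-written completion data: */
    bus_dmamap_sync(sc->desc_tag, sc->desc_map, BUS_DMASYNC_POSTREAD);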

 Anyways... i see that some drivers use wmb() and rmb() and redefine
 their own version, usually based on lfence/sfence even on i386
 
   #define rmb()   __asm volatile("lfence" ::: "memory")
   #define wmb()   __asm volatile("sfence" ::: "memory")
 
 whereas the standard definitions are slightly different, e.g.
 sys/i386/include/atomic.h:
 
 #define  rmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
 #define  wmb()   __asm __volatile("lock; addl $0,(%%esp)" : : : "memory")
 
 and our bus_space API in sys/x86/include/bus.h is a bit unclear to
 me (other than the fact that having 4 unused arguments doesn't really
 encourage its use...)

We could use lfence/sfence on amd64, but on i386 not all processors support
those.  The broken drivers doing it by hand don't work on early i386 CPUs.
Also, I personally don't like using membars like rmb() and wmb() by hand.
If you are operating on normal memory I think atomic_load_acq() and
atomic_store_rel() are better.
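
As a minimal illustration on normal memory (a sketch; the softc fields
msg and msg_ready are hypothetical):

    /* Producer: publish the data, then release-store the flag. */
    sc->msg = payload;
    atomic_store_rel_int(&sc->msg_ready, 1);

    /* Consumer: acquire-load the flag before reading the data. */
    if (atomic_load_acq_int(&sc->msg_ready) != 0)
            payload = sc->msg;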

 static __inline void
 bus_space_barrier(bus_space_tag_t tag __unused, bus_space_handle_t bsh __unused,
     bus_size_t offset __unused, bus_size_t len __unused, int flags)
 {
 #ifdef __GNUCLIKE_ASM
   if (flags & BUS_SPACE_BARRIER_READ)
 #ifdef __amd64__
     __asm __volatile("lock; addl $0,0(%%rsp)" : : : "memory");
 #else
     __asm __volatile("lock; addl $0,0(%%esp)" : : : "memory");
 #endif
   else
     __asm __volatile("" : : : "memory");
 #endif
 }

This is only for use with something accessed via bus_space(9).  Often these
are not needed however.  For example, on x86 all bus_space memory is mapped
uncached, so no actual barrier is needed except for a compiler barrier.

-- 
John Baldwin


Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Scott Long

On Jan 10, 2012, at 2:37 PM, Luigi Rizzo wrote:

 I was glancing through manpages and implementations of bus_dma(9)
 and i am a bit unclear on what this API (in particular, bus_dmamap_sync())
 does in terms of memory barriers.
 
 I see that the x86/amd64 and ia64 code only does the bounce buffers.
 The mips seems to do some coherency-related calls.
 
 How do we guarantee, say, that a recently built packet is in
 memory before issuing the tx command to the NIC ?
 

In short, i386 and amd64 architectures do bus snooping between the cpu cache 
and the memory and bus controllers, and coherency is implicit and guaranteed.  
No explicit barriers or flushes are needed for device mastered DMA.  Other CPU 
architectures have appropriate cache flushes and memory barriers built into 
their busdma implementations.  Note that this is a different scenario than 
device register accesses, which is essentially host mastered DMA.

Scott




Re: bus dma: a flag/quirk for page zero

2012-01-11 Thread Scott Long
An old controller in the aac driver family had a variation of this problem back 
when the FreeBSD contigmalloc algorithm started from the bottom of memory 
instead of the top.  I worked around it at driver init time by basically 
assuring that page 0 (and page 1) were allocated and thrown away; it seemed 
easier to leak 8k of memory than to jump through expensive hoops in busdma.

The busdma filter is expensive, and is used so rarely that I'm not even sure it 
works.  It was created for an old SCSI controller that had a buggy DMA 
controller which aliased a repeating pattern of address ranges; in other words 
it was a hack.  It's expensive to use, since it forces every bus_dmamap_load() 
request through the slow path and possibly bouncing.

With that said, your idea of a flag is probably a reasonable change for now.  
Alternatively, the ability to specify multiple DMA exclusion ranges has come up 
in the past, and would be a more complete answer to your problem; just treating 
page0 as special might not be enough (and I know for a fact that this is true 
with old i960RX pci processors).  That'll involve an API change, so is 
something that I'd rather not happen on a whim.

Scott

On Jan 10, 2012, at 1:18 PM, Andriy Gapon wrote:

 
 
 Some hardware interfaces may reserve a special meaning for a (physical) memory
 address value of zero.  One example is the OHCI specification, where a zero value
 in CurrentBufferPointer doesn't mean a physical address but has a reserved
 meaning.  To be honest I don't have another example :) but don't preclude its
 existence.
 
 To deal with this peculiarity we could use a special flag/quirk that would
 instruct the bus dma code to never use the page zero for communication with
 the hardware.
 Here's a proof of concept patch that implements the idea:
 http://people.freebsd.org/~avg/usb-dma-pagezero.diff
 
 Some concerns:
 - not sure if BUS_DMA_NO_PAGEZERO is the best name for the flag
 - the patch implements the flag only for x86 at the moment
 - usb code uses the flag regardless of the actual controller type
 
 What do you think?
 
 -- 
 Andriy Gapon



Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Luigi Rizzo
On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
 On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
  On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
   On 10 January 2012 13:37, Luigi Rizzo ri...@iet.unipi.it wrote:
I was glancing through manpages and implementations of bus_dma(9)
 and i am a bit unclear on what this API (in particular, bus_dmamap_sync())
 does in terms of memory barriers.
   
I see that the x86/amd64 and ia64 code only does the bounce buffers.
 
 That is because x86 in general does not need memory barriers. ...

maybe they are not called memory barriers but for instance
how do i make sure, even on the x86, that a write to the NIC ring
is properly flushed before the write to the 'start' register occurs ?

Take for instance the following segment from

head/sys/ixgbe/ixgbe.c::ixgbe_xmit() :

txd->read.cmd_type_len |=
    htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
txr->tx_avail -= nsegs;
txr->next_avail_desc = i;

txbuf->m_head = m_head;
/* Swap the dma map between the first and last descriptor */
txr->tx_buffers[first].map = txbuf->map;
txbuf->map = map;
bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);

/* Set the index of the descriptor that will be marked done */
txbuf = txr->tx_buffers[first];
txbuf->eop_index = last;

bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
    BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
/*
 * Advance the Transmit Descriptor Tail (Tdt), this tells the
 * hardware that this frame is available to transmit.
 */
++txr->total_packets;
IXGBE_WRITE_REG(&adapter->hw, IXGBE_TDT(txr->me), i);

the descriptor is allocated without any caching constraint,
the bus_dmamap_sync() are effectively NOPs on i386 and amd64,
and IXGBE_WRITE_REG has no implicit guard. 

 We could use lfence/sfence on amd64, but on i386 not all processors support

ok then we can make machine-specific versions... this is kernel
code so we do have a list of supported CPUs.

 those.  The broken drivers doing it by hand don't work on early i386 CPUs.
 Also, I personally don't like using membars like rmb() and wmb() by hand.
 If you are operating on normal memory I think atomic_load_acq() and
 atomic_store_rel() are better.

is it just a matter of names ?

My complaint was mostly about how many
unused parameters you need to pass to bus_space_barrier().
They make life hard for both the programmer and the
compiler, which might become unable to optimize them out.

I understand that more parameters may help parallelism,
but i wonder if it is worth the effort.

cheers
luigi


Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread John Baldwin
On Wednesday, January 11, 2012 11:29:44 am Luigi Rizzo wrote:
 On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
  On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
   On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
On 10 January 2012 13:37, Luigi Rizzo ri...@iet.unipi.it wrote:
 I was glancing through manpages and implementations of bus_dma(9)
  and i am a bit unclear on what this API (in particular, bus_dmamap_sync())
  does in terms of memory barriers.

 I see that the x86/amd64 and ia64 code only does the bounce buffers.
  
  That is because x86 in general does not need memory barriers. ...
 
 maybe they are not called memory barriers but for instance
 how do i make sure, even on the x86, that a write to the NIC ring
 is properly flushed before the write to the 'start' register occurs ?
 
 Take for instance the following segment from
 
 head/sys/ixgbe/ixgbe.c::ixgbe_xmit() :
 
 txd->read.cmd_type_len |=
     htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
 txr->tx_avail -= nsegs;
 txr->next_avail_desc = i;
 
 txbuf->m_head = m_head;
 /* Swap the dma map between the first and last descriptor */
 txr->tx_buffers[first].map = txbuf->map;
 txbuf->map = map;
 bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
 
 /* Set the index of the descriptor that will be marked done */
 txbuf = txr->tx_buffers[first];
 txbuf->eop_index = last;
 
 bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
     BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
 /*
  * Advance the Transmit Descriptor Tail (Tdt), this tells the
  * hardware that this frame is available to transmit.
  */
 ++txr->total_packets;
 IXGBE_WRITE_REG(&adapter->hw, IXGBE_TDT(txr->me), i);
 
 the descriptor is allocated without any caching constraint,
 the bus_dmamap_sync() are effectively NOPs on i386 and amd64,
 and IXGBE_WRITE_REG has no implicit guard.

x86 doesn't need a guard as its stores are ordered.  The bus_dmamap_sync()
would be sufficient for platforms where stores can be reordered in this
case (as those platforms should place memory barriers in their implementation
of bus_dmamap_sync()).
 
  We could use lfence/sfence on amd64, but on i386 not all processors support
 
 ok then we can make it machine-specific versions... this is kernel
 code so we do have a list of supported CPUs.

It is not worth it to add the overhead for i386 to do that when all modern
x86 CPUs are going to run amd64 anyway.

  those.  The broken drivers doing it by hand don't work on early i386 CPUs.
  Also, I personally don't like using membars like rmb() and wmb() by hand.
  If you are operating on normal memory I think atomic_load_acq() and
  atomic_store_rel() are better.
 
 is it just a matter of names ?

For regular memory when you are using memory barriers you often want to tie
the barrier to a specific operation (e.g. it is the store in IXGBE_WRITE_REG()
above that you want ordered after any other stores).  Having the load/store
and membar in the same call explicitly notes that relationship.

 My complaint was mostly on how many
 unused parameters you need to pass to bus_space_barrier().
 They make life hard for both the programmer and the
 compiler, which might become unable to optimize them out.

Yes, it seems overly abstracted.  In NetBSD, bus_dmamap_sync() actually takes
extra parameters to say which portion of the map should be sync'd.  We removed
those in FreeBSD to make the API simpler.  bus_space_barrier() could probably
use similar simplification (I believe we also adopted that API from NetBSD).

-- 
John Baldwin


Data corruption over NFS in -current

2012-01-11 Thread Martin Cracauer
I'm sorry for the unspecific bug report but I thought a heads-up is
better than none.

$ uname -a
FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec
28 12:19:21 EST 2011
craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS  amd64

I see filesystem corruption on NFS filesystems here.  I am running a
heavy shell script that is noodling around with ASCII files, assembling
them with awk and whatnot.  Some actions are concurrent with up to 21
forks doing full-CPU load scripting.  This machine is a K8 with a
total of 8 cores, diskless NFS and memory filesystem for /tmp.

I observe two problems:
- for no reason whatsoever, some files change from my 
  (user/group) cracauer/wheel to root/cracauer
- the same files will later be corrupted.  The beginning of the file
  is normal but then it has what looks like parts of /usr/ports,
  including our CVS files and binary junk, mostly zeros

I did do some ports building lately but not at the same time that this
problem manifested itself.  I speculate some ports blocks were still
resident in the filesystem buffer cache.

Server is Linux.

Martin
-- 
%%%
Martin Cracauer craca...@cons.org   http://www.cons.org/cracauer/


Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Scott Long

On Jan 11, 2012, at 9:29 AM, Luigi Rizzo wrote:

 On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
 On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
 On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
 On 10 January 2012 13:37, Luigi Rizzo ri...@iet.unipi.it wrote:
 I was glancing through manpages and implementations of bus_dma(9)
 and i am a bit unclear on what this API (in particular, bus_dmamap_sync())
 does in terms of memory barriers.
 
 I see that the x86/amd64 and ia64 code only does the bounce buffers.
 
 That is because x86 in general does not need memory barriers. ...
 
 maybe they are not called memory barriers but for instance
 how do i make sure, even on the x86, that a write to the NIC ring
 is properly flushed before the write to the 'start' register occurs ?
 

Flushed from where?  The CPU's cache or the device memory and pci bus?  I
already told you that x86/64 is fundamentally designed around bus snooping, and
John already told you that we map device memory to be uncached.  Also, PCI
guarantees that reads and writes are retired in order, and that reads are
therefore flushing barriers.  So let's take two scenarios.  In the first
scenario, the NIC descriptors are in device memory, so the driver has to do
bus_space accesses to write them.

Scenario 1
1.  driver writes to the descriptors.  These may or may not hang out in the 
cpu's cache, though they probably won't because we map PCI device memory as 
uncachable.  But let's say for the sake of argument that they are cached.
2. driver writes to the 'go' register on the card.  This may or may not be in 
the cpu's cache, as in step 1.
3. The writes get flushed out of the cpu and onto the host bus.  Again, the 
x86/64 architecture guarantees that these writes won't be reordered.
4. The writes get onto the PCI bus and buffered at the first bridge.
5. PCI ordering rules keep the writes in order, and they eventually make it to 
the card in the same order that the driver executed them.

Scenario 2
1. driver writes to the descriptors in host memory.  This memory is mapped as 
cache-able, so these writes hang out in the CPU.
2. driver writes to the 'go' register on the card.  This may or may not hang 
out in the cpu's cache, but likely won't as discussed previously.
3. The 'go' write eventually makes its way down to the card, and the card 
starts its processing.
4. the card masters a PCI read for the descriptor data, and the request goes up 
the pci bus to the host bridge
5. thanks to the fundamental design guarantees on x86/64, the pci host bridge, 
memory controller, and cpu all snoop each other.  In this case, the cpu sees 
the read come from the pci host bridge, knows that its for data that's in its 
cache, and intercepts and fills the request.  Coherency is preserved!

Explicit barriers aren't needed in either scenario; everything will retire
correctly and in order.  The only caveat is the buffering that happens on the
PCI bus.  A write by the host might take a relatively long and indeterminate
time to reach the card thanks to this buffering and the bus being busy.  To
guarantee that you know when the write has been delivered and retired, you can
do a read immediately after the write.  On some systems, this might also boost
the transaction priority of the write and get it down faster, but that's really
not a reliable guarantee.  All you'll know is that when the read completes, the
write prior to it has also completed.
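
In code, that flushing read is simply (a sketch; sc, bst, bsh and
FOO_DOORBELL are hypothetical names):

    /* Post the doorbell write to the card... */
    bus_space_write_4(sc->bst, sc->bsh, FOO_DOORBELL, val);
    /* ...then read any register back: per PCI ordering, once this
     * read completes, the doorbell write has also retired. */
    (void)bus_space_read_4(sc->bst, sc->bsh, FOO_DOORBELL);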

Where barriers _are_ needed is in interrupt handlers, and I can discuss that if 
you're interested.

Scott



Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Ian Lepore
On Wed, 2012-01-11 at 11:49 -0500, John Baldwin wrote:
 On Wednesday, January 11, 2012 11:29:44 am Luigi Rizzo wrote:
  On Wed, Jan 11, 2012 at 10:05:28AM -0500, John Baldwin wrote:
   On Tuesday, January 10, 2012 5:41:00 pm Luigi Rizzo wrote:
On Tue, Jan 10, 2012 at 01:52:49PM -0800, Adrian Chadd wrote:
 On 10 January 2012 13:37, Luigi Rizzo ri...@iet.unipi.it wrote:
  I was glancing through manpages and implementations of bus_dma(9)
  and i am a bit unclear on what this API (in particular, bus_dmamap_sync())
  does in terms of memory barriers.
 
  I see that the x86/amd64 and ia64 code only does the bounce buffers.
   
   That is because x86 in general does not need memory barriers. ...
  
  maybe they are not called memory barriers but for instance
  how do i make sure, even on the x86, that a write to the NIC ring
  is properly flushed before the write to the 'start' register occurs ?
  
  Take for instance the following segment from
  
  head/sys/ixgbe/ixgbe.c::ixgbe_xmit() :
  
  txd->read.cmd_type_len |=
      htole32(IXGBE_TXD_CMD_EOP | IXGBE_TXD_CMD_RS);
  txr->tx_avail -= nsegs;
  txr->next_avail_desc = i;
  
  txbuf->m_head = m_head;
  /* Swap the dma map between the first and last descriptor */
  txr->tx_buffers[first].map = txbuf->map;
  txbuf->map = map;
  bus_dmamap_sync(txr->txtag, map, BUS_DMASYNC_PREWRITE);
  
  /* Set the index of the descriptor that will be marked done */
  txbuf = txr->tx_buffers[first];
  txbuf->eop_index = last;
  
  bus_dmamap_sync(txr->txdma.dma_tag, txr->txdma.dma_map,
      BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
  /*
   * Advance the Transmit Descriptor Tail (Tdt), this tells the
   * hardware that this frame is available to transmit.
   */
  ++txr->total_packets;
  IXGBE_WRITE_REG(&adapter->hw, IXGBE_TDT(txr->me), i);
  
  the descriptor is allocated without any caching constraint,
  the bus_dmamap_sync() are effectively NOPs on i386 and amd64,
  and IXGBE_WRITE_REG has no implicit guard.
 
 x86 doesn't need a guard as its stores are ordered.  The bus_dmamap_sync()
 would be sufficient for platforms where stores can be reordered in this
 case (as those platforms should place memory barriers in their implementation
 of bus_dmamap_sync()).
  
   We could use lfence/sfence on amd64, but on i386 not all processors 
   support
  
  ok then we can make it machine-specific versions... this is kernel
  code so we do have a list of supported CPUs.
 
 It is not worth it to add the overhead for i386 to do that when all modern
 x86 CPUs are going to run amd64 anyway.
 

Harumph.  I run i386 on all my x86 CPUs.  For our embedded systems
products it's because they're small wimpy old CPUs, and for my desktop
system it's because I need to run builds for the embedded systems and
avoid all the cross-build problems of trying to create i386 ports on a
64 bit host.

   those.  The broken drivers doing it by hand don't work on early i386 CPUs.
   Also, I personally don't like using membars like rmb() and wmb() by hand.
   If you are operating on normal memory I think atomic_load_acq() and
   atomic_store_rel() are better.
  
  is it just a matter of names ?
 
 For regular memory when you are using memory barriers you often want to tie
 the barrier to a specific operation (e.g. it is the store in IXGBE_WRITE_REG()
 above that you want ordered after any other stores).  Having the load/store
 and membar in the same call explicitly notes that relationship.
 
  My complaint was mostly on how many
  unused parameters you need to pass to bus_space_barrier().
  They make life hard for both the programmer and the
  compiler, which might become unable to optimize them out.
 
 Yes, it seems overly abstracted.  In NetBSD, bus_dmamap_sync() actually takes
 extra parameters to say which portion of the map should be sync'd.  We removed
 those in FreeBSD to make the API simpler.  bus_space_barrier() could probably
 use similar simplification (I believe we also adopted that API from NetBSD).

I've wished (in the ARM world) for the ability to sync a portion of a
map.  I've even kicked around the idea of proposing an API extension to
do so, but I guess if FreeBSD went out of its way to remove that
functionality that idea probably won't fly. :)

-- Ian




Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Scott Long

On Jan 11, 2012, at 10:00 AM, Ian Lepore wrote:

 
 I've wished (in the ARM world) for the ability to sync a portion of a
 map.  I've even kicked around the idea of proposing an API extension to
 do so, but I guess if FreeBSD went out of its way to remove that
 functionality that idea probably won't fly. :)

It's been discussed numerous times since mips and arm became relevant in 
FreeBSD, and I'm frankly surprised that it hasn't happened yet.  Go forth and 
code, it won't be opposed.
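
For reference, a sketch of what such an extension could look like (the
function name and five-argument shape are hypothetical, modeled on NetBSD's
bus_dmamap_sync; struct tx_desc and the txr fields are made up):

    void bus_dmamap_sync_range(bus_dma_tag_t dmat, bus_dmamap_t map,
            bus_addr_t offset, bus_size_t len, bus_dmasync_op_t op);

    /* e.g. flush a single descriptor out of a large ring: */
    bus_dmamap_sync_range(txr->txtag, txr->txmap,
        i * sizeof(struct tx_desc), sizeof(struct tx_desc),
        BUS_DMASYNC_PREWRITE);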

Scott



Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Ian Lepore
On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
 
 Where barriers _are_ needed is in interrupt handlers, and I can
 discuss that if you're interested.
 
 Scott
 

I'd be interested in hearing about that (and in general I'm loving the
details coming out in your explanations -- thanks!).

-- Ian




Re: bus dma: a flag/quirk for page zero

2012-01-11 Thread Andriy Gapon
on 11/01/2012 17:01 John Baldwin said the following:
 I think this is fine, but you should just always exclude page zero when
 allocating bounce pages.  Bounce pages are assigned to zones that can be
 shared by multiple tags, so other tags that map to the same zone can
 allocate bounce pages that ohci will use (add_bounce_page() should probably
 take the bounce zone as an arg instead of a tag).  I think it's not worth
 creating a separate zone just for ohci; better to forbid page zero in all
 zones instead.

Thank you for the explanation.
Actually, I think that on x86 we don't have to do anything special for any
memory allocations that we do, including the bounce pages, as page zero is
excluded from phys_avail and is not available for normal use.
The only thing we have to do on x86 is to bounce page zero if it gets passed
to us.  (And that can happen only in very special situations, obviously.  I am
not sure if anything besides the system dump would do that.)

And I would prefer to defer any changes to !x86 bus dma to the respective
platform maintainers, obviously ;-)

 Also, please change this:
 
 - if (newtag->lowaddr < ptoa((vm_paddr_t)Maxmem)
 -  || newtag->alignment > 1)
 + if (newtag->lowaddr < ptoa((vm_paddr_t)Maxmem) ||
 + newtag->alignment > 1)
 + newtag->flags |= BUS_DMA_COULD_BOUNCE;
 +
 + if ((newtag->flags & BUS_DMA_NO_PAGEZERO) != 0)
   newtag->flags |= BUS_DMA_COULD_BOUNCE;
 
 To just be one if.

Will do.

-- 
Andriy Gapon


Re: bus dma: a flag/quirk for page zero

2012-01-11 Thread Andriy Gapon
on 11/01/2012 18:02 Scott Long said the following:
 An old controller in the aac driver family had a variation of this problem
 back when the FreeBSD contigmalloc algorithm started from the bottom of
 memory instead of the top.  I worked around it at driver init time by
 basically assuring that page 0 (and page 1) were allocated and thrown away;
 it seemed easier to leak 8k of memory than to jump through expensive hoops
 in busdma.
 
 The busdma filter is expensive, and is used so rarely that I'm not even
 sure it works.  It was created for an old SCSI controller that had a buggy
 DMA controller which aliased a repeating pattern of address ranges; in
 other words it was a hack.  It's expensive to use, since it forces every
 bus_dmamap_load() request through the slow path and possibly bouncing.
 
 With that said, your idea of a flag is probably a reasonable change for
 now.  Alternatively, the ability to specify multiple DMA exclusion ranges
 has come up in the past, and would be a more complete answer to your
 problem; just treating page0 as special might not be enough (and I know for
 a fact that this is true with old i960RX pci processors).  That'll involve
 an API change, so is something that I'd rather not happen on a whim.
 

Scott,

thank you very much for the explanation and the insight.

As I've written in some other email, on x86 page 0 is already an unavailable
page and the only way it can get into the dma layer is during a system dump.
I am not sure about all other platforms; probably there is at least one where
page 0 is just another normal page.  Maybe excluding page 0 from both normal
use and the dump is the simplest hammer for this nail...

The problem with trying to deal with page zero at the bus dma level is that it
pessimizes the cases where previously no bouncing was expected, as page zero
may pop up anywhere.  That's why I decided to go with the flag instead of
handling page 0 in all dma tags unconditionally as Matthew has suggested.

It feels like there could be a better solution than the flag, but I just can't
come up with it.  To be fair, I haven't come up with the flag either; it's
John's idea.

-- 
Andriy Gapon


Re: FS hang when creating snapshots on a UFS SU+J setup

2012-01-11 Thread Gautam Mani
On Wed, Jan 11, 2012 at 10:30:39AM +0100, Yamagi Burmeister wrote:
 Hello,
 I've done some tests to verify that the problem only occurs when SU+J
 is used, but not SU without J. In fact, I did run the following two
 loops on different TTYs in parallel:

I also confirm this using a similar technique. The panic is only seen
with SU+J and not with just SU. 

I did a similar cp -R /root /var/tmp ; rm -rf /var/tmp/root and the
panic was triggered by dump -0L...
I got the panic (again in less than a minute of issuing the dump command)
-- I also got the "giving up on dirty" kind of message.

I took a picture of the screen -- I am not sure if that helps!

http://picpaste.com/11012012519-LF0sWlpw.jpg

 Since it's much more likely that the problems described above arise
 when the filesystem is loaded (for example by the first loop) while
 taking the snapshot, this looks like some kind of race condition or
 something like that. 
 

Earlier I have seen this happen with dump without any high load -- or
at least very minimal -- again with /var, because some logs were being
written or a cronjob was running writing to it.  That didn't panic as I
indicated in my previous email -- it hogged the CPU and forced a
power-cycle.

Do let me know if I can try something further.

Thanks
Gautam




Re: Data corruption over NFS in -current

2012-01-11 Thread Stefan Bethke
Am 11.01.2012 um 17:57 schrieb Martin Cracauer:

 I'm sorry for the unspecific bug report but I thought a heads-up is
 better than none.
 
 $ uname -a
 FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec
 28 12:19:21 EST 2011
 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS  amd64

I'm sure Rick will want to know which NFS version, which client code (default 
new code I'm assuming) and which mount options...

 I see filesystem corruption on NFS filesystems here.  I am running a
 heavy shellscript that is noodling around with ascii files assembling
 them with awk and whatnot.  Some actions are concurrent with up to 21
 forks doing full-CPU load scripting.  This machine is a K8 with a
 total of 8 cores, diskless NFS and memory filesystem for /tmp.
 
 I observe two problems:
 - for no reason whatsoever, some files change from my 
  (user/group) cracauer/wheel to root/cracauer
 - the same files will later be corrupted.  The beginning of the file
  is normal but then it has what looks like parts of /usr/ports,
  including our CVS files and binary junk, mostly zeros
 
 I did do some ports building lately but not at the same time that this
 problem manifested itself.  I speculate some ports blocks were still
 resident in the filesystem buffer cache.
 
 Server is Linux.
 
 Martin
 -- 
 %%%
 Martin Cracauer craca...@cons.org   http://www.cons.org/cracauer/

-- 
Stefan Bethke s...@lassitu.de   Fon +49 151 14070811





Re: Data corruption over NFS in -current

2012-01-11 Thread Martin Cracauer
Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100: 
 Am 11.01.2012 um 17:57 schrieb Martin Cracauer:
 
  I'm sorry for the unspecific bug report but I thought a heads-up is
  better than none.
  
  $ uname -a
  FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec
  28 12:19:21 EST 2011
  craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS  amd64
 
 I'm sure Rick will want to know which NFS version, which client code (default 
 new code I'm assuming) and which mount options...

It's all default both in fstab and as reported by mount(8).

This is a diskless PXE boot but the mount affected (usr) is not the
root filesystem, so this should come in via fstab.

BTW, my /usr/ports is another mount so the corruption is cross-mount
(garbage from /usr/ports entering /usr).

Appending nfsstat output.

I am re-running things continuously to see how reproducible this is.
This machine was recently updated from a -current almost a year old,
so it's its first time with the new NFS client code.

Martin

  I see filesystem corruption on NFS filesystems here.  I am running a
  heavy shellscript that is noodling around with ascii files assembling
  them with awk and whatnot.  Some actions are concurrent with up to 21
  forks doing full-CPU load scripting.  This machine is a K8 with a
  total of 8 cores, diskless NFS and memory filesystem for /tmp.
  
  I observe two problems:
  - for no reason whatsoever, some files change from my 
   (user/group) cracauer/wheel to root/cracauer
  - the same files will later be corrupted.  The beginning of the file
   is normal but then it has what looks like parts of /usr/ports,
   including our CVS files and binary junk, mostly zeros
  
  I did do some ports building lately but not at the same time that this
  problem manifested itself.  I speculate some ports blocks were still
  resident in the filesystem buffer cache.
  
  Server is Linux.
  
  Martin
  -- 
  %%%
  Martin Cracauer craca...@cons.org   http://www.cons.org/cracauer/
 
 -- 
 Stefan Bethke s...@lassitu.de   Fon +49 151 14070811
 
 
 

-- 
%%%
Martin Cracauer craca...@cons.org   http://www.cons.org/cracauer/
Client Info:
Rpc Counts:
  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
 94392942513117   3637266  2577  40227237   2824593333832304567
   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
32522  5121  4856 20363 13954179035 0   3534382
MknodFsstatFsinfo  PathConfCommit
5  21127240 3  2999521782
Rpc Info:
 TimedOut   Invalid X Replies   Retries  Requests
0 0 0 0 167678419
Cache Info:
Attr HitsMisses Lkup HitsMisses BioR HitsMisses BioW HitsMisses
1933340911  73265447 1123380719   3636242  90975094450509   4917135   
2824593
BioRLHitsMisses BioD HitsMisses DirE HitsMisses Accs HitsMisses
 54732346  2577599049142917352394 0 733726346   3534382

Server Info:
  Getattr   SetattrLookup  Readlink  Read WriteCreateRemove
0 0 0 0 0 0 0 0
   Rename  Link   Symlink Mkdir Rmdir   Readdir  RdirPlusAccess
0 0 0 0 0 0 0 0
MknodFsstatFsinfo  PathConfCommit
0 0 0 0 0
Server Ret-Failed
0
Server Faults
0
Server Cache Stats:
   Inprog  Idem  Non-idemMisses
0 0 0 0
Server Write Gathering:
 WriteOps  WriteRPC   Opsaved
0 0 0

Re: memory barriers in bus_dmamap_sync() ?

2012-01-11 Thread Scott Long

On Jan 11, 2012, at 10:10 AM, Ian Lepore wrote:

 On Wed, 2012-01-11 at 09:59 -0700, Scott Long wrote:
 
 Where barriers _are_ needed is in interrupt handlers, and I can
 discuss that if you're interested.
 
 Scott
 
 
 I'd be interested in hearing about that (and in general I'm loving the
 details coming out in your explanations -- thanks!).
 
 -- Ian
 
 

Well, I unfortunately wasn't as clear as I should have been.  Interrupt 
handlers need bus barriers, not cpu cache/instruction barriers.  This is 
because the interrupt signal can arrive at the CPU before data and control
words are finished being DMA'd up from the controller.  Also, many controllers
require an acknowledgement write to be performed before leaving the interrupt 
handler, so the driver needs to do a bus barrier to ensure that the write 
flushes.  But these are two different topics, so let me start with the 
interrupt handler.

Legacy interrupts in PCI are carried on discrete pins and are level triggered.  
When the device wants to signal an interrupt, it asserts the pin.  That 
assertion is seen at the IOAPIC on the host bridge and converted to an 
interrupt message, which is then sent immediately to the CPU's lAPIC.  This all 
happened very, very quickly.  Meanwhile, the interrupt condition could have 
been predicated on the device DMA'ing bytes up to host memory, and those DMA 
writes could have gotten stalled and buffered on the way up the PCI topology.  
The end result is often that the driver interrupt handler runs before those 
writes have hit host memory.  To fix this, drivers do a read of a card register 
as the first step in the interrupt handler, even if the read is just a dummy 
and the result is thrown away.  Thanks to PCI ordering, the read will ensure 
that any pending writes from the card have flushed all the way up, and 
everything will be coherent by the time the read completes.

MSI and MSIX interrupts on modern PCI and PCIe fix this.  These interrupts are 
sent as byte messages that are DMA'd to the host bridge.  Since they are 
in-band data, they are subject to the same ordering rules as all other data on 
the bus, and thus ordering for them is implicit.  When the MSI message reaches 
the host bridge, it's converted into an lAPIC message just like before.  
However, the driver doesn't need to do a flushing read because it knows that 
the MSI message was the last write on the bus, therefore everything prior to it 
has arrived and everything is coherent.  Since reads are expensive in PCI, this 
saves a considerable amount of time in the driver.  Unfortunately, it adds 
non-deterministic latency to the interrupt since the MSI message is in-band and 
has no way to force priority flushing on a busy bus.  So while MSI/MSIX save 
some time in the interrupt handler, they actually make the overall latency 
situation potentially worse (thanks Intel!).

The acknowledgement write issue is a little more straight forward.  If the card 
requires an acknowledgment write from the driver to know that the interrupt has 
been serviced (so that it'll then know to de-assert the interrupt line), that 
write has to be flushed to the hardware before the interrupt handler completes. 
 Otherwise, the write could get stalled, the interrupt remain asserted, and
the interrupt erroneously re-trigger on the host CPU.  I've seen cases where
this devolves into the card getting out of sync with the driver to the point 
that interrupts get missed.  Also, this gets a little weird sometimes with 
buggy MSI hacks in both device and PCI bridge hardware.
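
Putting both patterns together, a legacy-INTx handler looks roughly like
this (a sketch; all names are hypothetical):

    static void
    foo_intr(void *arg)
    {
            struct foo_softc *sc = arg;
            uint32_t status;

            /* Read a status register first: per PCI ordering, this
             * flushes any in-flight DMA writes from the card up to
             * host memory before we look at the completion data. */
            status = bus_space_read_4(sc->bst, sc->bsh, FOO_INTR_STATUS);
            if ((status & FOO_INTR_PENDING) == 0)
                    return; /* not ours (shared legacy interrupt) */

            /* ... process completed descriptors ... */

            /* Acknowledge, then read back so the de-assert write is
             * known to have reached the card before we return. */
            bus_space_write_4(sc->bst, sc->bsh, FOO_INTR_ACK, status);
            (void)bus_space_read_4(sc->bst, sc->bsh, FOO_INTR_STATUS);
    }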

Scott





[RFT] Major snd_hda rewrite

2012-01-11 Thread Alexander Motin

Hi.

I would like to request testing of my work on further HDA sound driver
improvement.


List of changes done this time:
 - Huge old hdac driver was split into three independent pieces: HDA
controller driver (hdac), HDA CODEC driver (hdacc) and HDA audio
function driver (hdaa). All drivers are completely independent and talk
to each other only via NewBus interfaces. Using more NewBus bells and
whistles makes it possible to properly see the HDA structure with
standard system instruments, such as `devinfo -v`. The biggest driver
file is now 150K, instead of 240K before, and the code is much cleaner.
 - Support for multichannel recording was added. While I've never seen
it configured by default, the UAA specification says that it is possible.
Now, as the specification defines, the driver checks input associations
for pins with sequence numbers 14 and 15, and if they are found (the
usual case) it works as before, mixing the signals together. If not, it
configures the input association as multichannel. I've found some CODECs
doing strange things when configured for multichannel recording, but
I've also found successfully working examples.
 - Signal tracer was improved to look for cases where several DACs/ADCs
in the CODEC can work with the same audio signal. If such a case is
found, the driver registers an additional playback/record stream
(channel) for the pcm device. Having more than one stream avoids the use
of vchans, and so avoids extra conversion to the pre-configured vchan
rate and sample format. Not many CODECs allow this, especially on
playback, but some do.
 - New controller streams reservation mechanism was implemented. It
makes it possible to have more pcm devices than streams supported by the
controller (usually 4 in each direction). Now only the number of
_simultaneously_ transferred audio streams is limited, which is rarely
reached and properly reported if it happens.
 - Codec pins and GPIO signals configuration was exported via a set of
writable sysctls. Another sysctl, dev.hdaa.X.reconfig, allows triggering
driver reconfiguration at run-time. The only requirement is that all pcm
devices be closed at that moment, as they will be destroyed and
recreated. This should significantly simplify the process of fixing
CODEC configurations. It should now even be possible to write a GUI to
do it with a few mouse clicks (see the example after the dmesg output
below).
 - Driver now decodes pin location and connector type names. In some
cases this lets it hint to the user where on the system case the
connectors related to the pcm device are located. The number of channels
supported by a pcm device is now reported (if it is not 2), which should
also make the search easier.

 - Added fix for digital mic recording on some Asus laptops/netbooks.

That is how it may look now in dmesg:

hdac0: <Intel 5 Series/3400 Series HDA Controller> mem
0xf7ef4000-0xf7ef7fff irq 22 at device 27.0 on pci0
hdacc0: <VIA VT1708S_0 HDA CODEC> at cad 0 on hdac0
hdaa0: <VIA VT1708S_0 HDA CODEC Audio Function Group> at nid 1 on hdacc0
hdacc1: <Intel Ibex Peak HDA CODEC> at cad 3 on hdac0
hdaa1: <Intel Ibex Peak HDA CODEC Audio Function Group> at nid 1 on hdacc1
pcm0: <VIA VT1708S_0 HDA CODEC PCM (Analog)> at nid 28,29 and 26,30,27
on hdaa0
pcm1: <VIA VT1708S_0 HDA CODEC PCM (Digital)> at nid 32 on hdaa0
pcm2: <Intel Ibex Peak HDA CODEC PCM (DisplayPort 8ch)> at nid 6 on hdaa1
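
An example of run-time reconfiguration from the command line (only
dev.hdaa.X.reconfig is named above; the pin-config sysctl name is a
placeholder to be read from the actual sysctl tree):

  sysctl -a | grep dev.hdaa.0             (list the writable config sysctls)
  sysctl dev.hdaa.0.<pin_sysctl>=<value>  (adjust a pin configuration)
  sysctl dev.hdaa.0.reconfig=1            (apply it, with all pcm devices closed)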

Patch can be found here:
http://people.freebsd.org/~mav/hda.rewrite.patch

Patch was generated for 10-CURRENT, but should apply to fresh 9-STABLE 
and 8-STABLE branches also.


Special thanks to iXsystems, Inc. for supporting this work.

Comments and tests results are welcome!

--
Alexander Motin


Re: bus dma: a flag/quirk for page zero

2012-01-11 Thread Andriy Gapon
on 11/01/2012 19:18 Andriy Gapon said the following:
 Actually, I think that on x86 we don't have to do anything special for any 
 memory
 allocations that we do, including the bounce pages, as the page zero is 
 excluded
 from phys_avail and is not available for normal use.

After some additional thinking there is probably no reason to take advantage of
this fact.  First, it would increase differences with other platforms.  Second,
it would add a hidden dependency.  So it's better to be explicit here.

-- 
Andriy Gapon


Re: Very fresh (two days ago) 10-current becomes completely unresponsive under load

2012-01-11 Thread Lev Serebryakov
Hello, Chuck.
You wrote on 11 January 2012, 3:47:08:

 If it were me, I would also try with the older 44BSD scheduler, just to
 see what happens.
It helps both with mpd5.5 and mpd5.6.
Now under network load the top lines in `top' are

  PID USERNAME  PRI NICE   SIZE    RES STATE   TIME   WCPU COMMAND
   10 root      155 ki31     0K     8K RUN     2:19 60.74% idle
   11 root      -72    -     0K   112K WAIT    1:47 32.03% intr{swi1: netisr 0}

And the system is very responsive.

  ng_queue is not in the top 17 (one screen) lines of `top' any more; it
 looks normal to me.

 I'll try to find the revision that breaks ULE + NetGraph by binary
search, but it will take some time, as there are 590 revisions in head/sys
between the previous version I used (which works OK with ULE) and the
current version (which doesn't). So it should be ~9 iterations, and every
iteration takes ~1 hour, so I cannot spend 9 hours in a row on this
task. One step would look like the sketch below.
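
  svn update -r 227900 /usr/src      (midpoint revision; number illustrative only)
  cd /usr/src && make buildkernel installkernel KERNCONF=GENERIC
  shutdown -r now
  (after reboot, run the mpd load test: a hang means the offending commit
   is at or before that revision, otherwise it is after it)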


-- 
// Black Lion AKA Lev Serebryakov l...@freebsd.org



Re: ImageMagick: tests fail on freebsd 10

2012-01-11 Thread Andriy Gapon
on 12/01/2012 00:22 Andriy Gapon said the following:
[snip]
 /usr/include/xlocale.h:160:3: error: unknown type name 'va_list'
 /usr/include/xlocale.h:162:3: error: unknown type name 'va_list'
[snip]
 Back to the main problem.  I am not sure where the difference between the base
 GCC and GCC 4.6 with respect to 'va_list' in xlocale.h comes from.

Changing those two instances of 'va_list' to '__va_list' (which is used a lot
throughout the header) seems to fix the problem with GCC 4.6.

David, what do you think?

-- 
Andriy Gapon


Re: couldn't log on to my -CURRENT machine after upgrade to latest PAM

2012-01-11 Thread Don Lewis
On 11 Jan, Dag-Erling Smørgrav wrote:
 Could you please try this:
 
 # cd /usr/src/contrib
 # mv openpam openpam.orig
 # svn export svn://svn.des.no/openpam/trunk@526 openpam
 # cd ../lib/libpam
  # make depend && make all && make install

[snip]
building shared library libpam.so.5
make: don't know how to make openpam.3. Stop
*** Error code 2

Other than that, it works great and doesn't get tripped up by my
obsolete /etc/pam.conf.  Thanks!



Re: Data corruption over NFS in -current

2012-01-11 Thread Rick Macklem
Martin Cracauer wrote:
 Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100:
  Am 11.01.2012 um 17:57 schrieb Martin Cracauer:
 
   I'm sorry for the unspecific bug report but I thought a heads-up
   is
   better than none.
  
   $ uname -a
   FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed
   Dec
   28 12:19:21 EST 2011
   craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64
 
  I'm sure Rick will want to know which NFS version, which client code
  (default new code I'm assuming) and which mount options...
 
 It's all default both in fstab and as reported by mount(8).
 
I assume that by the above statement, you mean that you don't specify any
mount options in your /etc/fstab entry except rw? (If this isn't correct,
please post your /etc/fstab entries for the NFS mounts.)

- If I am correct, in that you just specify rw, the main difference
  between the old and new NFS client will be the rsize/wsize used. The
  new NFS client will use MAXBSIZE (64Kb) decreased to whatever the
  server says is the largest it can handle. This should be fine, unless
  the server says it can handle >= 64Kb, but actually only works correctly
  for 32Kb (which is what the old NFS client will default to, I think?).

A few things to try/check:
- Look locally on the server to see if the file is corrupted there.
- Try the old NFS client. (Set the fs type to oldnfs instead of nfs
  on the lines in your /etc/fstab.)
  - If switching to the old client helps, it might be a bug in the way the
new client generates the create verifier. I just looked at the code and
    I'm not certain the code in the new client would work correctly on
    amd64. (I only have i386 to test with.)
- I can easily generate a patch that changes the new client to do this
  the same way as the old client, but there is no point, unless the old
  client doesn't have the problem.
-- Exclusive create problems might explain the incorrect ownership,
since it first does a create that will fill in user/group in whatever
default way the Linux server chooses to and then does a Setattr RPC
to change them to the correct values. If the Setattr RPC fails, then
the file exists owned by whatever the server chooses. (I don't know
if Linux servers use the gid of the directory or the gid of the
requestor or ???)
- If you have a non-Linux NFS server, try running against that to see if it
  is a Linux server specific problem. (Since I haven't seen any other reports
  like this, I suspect it might be an interoperability problem related to the
  Linux server.)

Also, if you can reproduce the problem fairly easily, capture a packet trace via
# tcpdump -s 0 -w xxx host server 
running on the client (or similar). Then email me xxx as an attachment and
I can look at it in wireshark. (If you choose to look at it in wireshark, I
would suggest you look for Create RPCs to see if they are Exclusive Creates,
plus try and see where the data for the corrupt file is written.)

Even if the capture is pretty large, it should be easy to find the interesting
part, so long as you know the name of the corrupt file and search for that.

 This is a diskless PXE boot but the mount affected (usr) is not the
 root filesystem, so this should come in via fstab.
 
 BTW, my /usr/ports is another mount so the corruption is cross-mount
 (garbage from /usr/ports entering /usr).
 
 Appending nfsstat output.
 
nfsstat output is pretty useless for this kind of situation. I did find
it interesting that you do so many Fsstat RPCs, but that shouldn't be
a problem, it's just weird to see that.

rick
 I am re-running things continuously to see how reproducible this is.
 This machine was recently updated from a -current almost a year old,
 so it's its first time with the new NFS client code.
 
 Martin
 
   I see filesystem corruption on NFS filesystems here. I am running
   a
   heavy shellscript that is noodling around with ascii files
   assembling
   them with awk and whatnot. Some actions are concurrent with up to
   21
   forks doing full-CPU load scripting. This machine is a K8 with a
   total of 8 cores, diskless NFS and memory filesystem for /tmp.
  
   I observe two problems:
   - for no reason whatsoever, some files change from my
(user/group) cracauer/wheel to root/cracauer
   - the same files will later be corrupted. The beginning of the
   file
is normal but then it has what looks like parts of /usr/ports,
including our CVS files and binary junk, mostly zeros
  
   I did do some ports building lately but not at the same time that
   this
   problem manifested itself. I speculate some ports blocks were
   still
   resident in the filesystem buffer cache.
  
   Server is Linux.
  
   Martin
   --
   %%%
   Martin Cracauer craca...@cons.org http://www.cons.org/cracauer/
   

Re: Data corruption over NFS in -current

2012-01-11 Thread Martin Cracauer
Rick Macklem wrote on Wed, Jan 11, 2012 at 08:42:25PM -0500: 
 Martin Cracauer wrote:
  Stefan Bethke wrote on Wed, Jan 11, 2012 at 07:14:44PM +0100:
   On 11.01.2012 at 17:57, Martin Cracauer wrote:
  
I'm sorry for the unspecific bug report but I thought a heads-up
is better than none.
   
$ uname -a
FreeBSD wings.cons.org 10.0-CURRENT FreeBSD 10.0-CURRENT #2: Wed Dec
28 12:19:21 EST 2011 craca...@wings.cons.org:/usr/src/sys/amd64/compile/WINGS amd64
  
   I'm sure Rick will want to know which NFS version, which client code
   (default new code I'm assuming) and which mount options...
  
  It's all default both in fstab and as reported by mount(8).
  
 I assume that by the above statement, you mean that you don't specify any
 mount options in your /etc/fstab entry except rw? (If this isn't correct,
 please post your /etc/fstab entries for the NFS mounts.)

172.18.30.2:/home/diskless/freebsd-current-usr  /usr        nfs  rw  0  0
172.18.30.2:/home/diskless/usr-ports            /usr/ports  nfs  rw  0  0

 - If I am correct, in that you just specify rw, the main difference
   between the old and new NFS client will be the rsize/wsize used. The
   new NFS client will use MAX_BSIZE (64Kb) decreased to whatever the
   server says is the largest it can handle. This should be fine, unless
   the server says it can handle >= 64Kb, but actually only works correctly
   for 32Kb (which is what the old NFS client will default to, I think?).

I'll try 32 KB.
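
For reference, forcing 32Kb on the new client and switching to the old
client are both one-line fstab changes, roughly like this (rsize/wsize
spelling per mount_nfs(8); pick one variant per mount):

172.18.30.2:/home/diskless/freebsd-current-usr  /usr  nfs     rw,rsize=32768,wsize=32768  0  0
172.18.30.2:/home/diskless/freebsd-current-usr  /usr  oldnfs  rw                          0  0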

 A few things to try/check:
 - Look locally on the server to see if the file is corrupted there.

Yes, it has the corrupted version of the file. In a new run I had
another file changed to root ownership, and that file looks the same
from both the server and the client standpoint.

The good news is that this seems fairly reproducible; the root
ownership is back.  This time I stopped the script when ownership
changed, so I don't know whether it would have gone on to corrupt
the file afterwards.
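
A quick way to compare the client's and the server's view of a file is
to checksum it on both sides, for instance (the path and the host name
"server" are placeholders):

 md5 /usr/path/to/file                 # on the FreeBSD client
 ssh server md5sum /home/diskless/freebsd-current-usr/path/to/file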

 - Try the old NFS client. (Set the fs type to oldnfs instead of nfs
   on the lines in your /etc/fstab.)
   - If switching to the old client helps, it might be a bug in the way the
 new client generates the create verifier. I just looked at the code and
  I'm not certain the code in the new client would work correctly for
  amd64. (I only have i386 to test with.)
 - I can easily generate a patch that changes the new client to do this
   the same way as the old client, but there is no point, unless the old
   client doesn't have the problem.
 -- Exclusive create problems might explain the incorrect ownership,
 since it first does a create that will fill in user/group in whatever
 default way the Linux server chooses to and then does a Setattr RPC
 to change them to the correct values. If the Setattr RPC fails, then
 the file exists owned by whatever the server chooses. (I don't know
 if Linux servers use the gid of the directory or the gid of the
 requestor or ???)
 - If you have a non-Linux NFS server, try running against that to see if it
   is a Linux server specific problem. (Since I haven't seen any other reports
   like this, I suspect it might be an interoperability problem related to the
   Linux server.)

I should mention that I also updated the server to Linux-3.1.5 two
weeks ago.  I'm not sure I've put heavy load on it since then.

I will have a Linux NFS client do the same thing and try the FreeBSD
things you mention.

 Also, if you can reproduce the problem fairly easily, capture a packet trace 
 via
 # tcpdump -s 0 -w xxx host server 
 running on the client (or similar). Then email me xxx as an attachment and
 I can look at it in wireshark. (If you choose to look at it in wireshark, I
 would suggest you look for Create RPCs to see if they are Exclusive Creates,
 plus try and see where the data for the corrupt file is written.)
 
 Even if the capture is pretty large, it should be easy to find the interesting
 part, so long as you know the name of the corrupt file and search for that.

That's probably not practical; we are talking about hammering the NFS
server with several CPU hours' worth of parallel activity in a
shellscript, but I'll do my best :-)

Martin

  This is a diskless PXE boot but the mount affected (usr) is not the
  root filesystem, so this should come in via fstab.
  
  BTW, my /usr/ports is another mount so the corruption is cross-mount
  (garbage from /usr/ports entering /usr).
  
  Appending nfsstat output.
  
 nfsstat output is pretty useless for this kind of situation. I did find
 it interesting that you do so many Fsstat RPCs, but that shouldn't be
 a problem, it's just weird to see that.
 
 rick
  I am re-running things continuously to see how reproducible this is.
  This machine was recently updated from a -current almost a year old,
  so it's its first time with the new NFS client code.
  
  Martin
  
I see filesystem corruption on NFS 

Re: CAM Target Layer available

2012-01-11 Thread Kenneth D. Merry
On Wed, Jan 04, 2012 at 21:53:11 -0700, Kenneth D. Merry wrote:
 
 The CAM Target Layer (CTL) is now available for testing.  I am planning to
 commit it to head next week, barring any major objections.
 
 CTL is a disk and processor device emulation subsystem originally written
 for Copan Systems under Linux starting in 2003.  It has been shipping in
 Copan (now SGI) products since 2005.
 
 It was ported to FreeBSD in 2008, and thanks to an agreement between SGI
 (who acquired Copan's assets in 2010) and Spectra Logic in 2010, CTL is
 available under a BSD-style license.  The intent behind the agreement was
 that Spectra would work to get CTL into the FreeBSD tree.
 
 The patches are against FreeBSD/head as of SVN change 229516 and are
 located here:
 
 http://people.freebsd.org/~ken/ctl/ctl_diffs.20120104.4.txt.gz
 
 The code is not perfect (few pieces of software are), but is in good
 shape from a functional standpoint.  My intent is to get it out there for
 other folks to use, and perhaps help with improvements.
 
 There are a few other CAM changes included with these diffs, some of which
 will be committed separately from CTL, some concurrently.  This is a quick
 summary:
 
  - Fix a panic in the da(4) driver when a drive disappears on boot.
  - Fix locking in the CAM EDT traversal code.
  - Add an optional sysctl/tunable (disabled by default) to suppress
duplicate devices.  This most frequently shows up with dual ported SAS
drives.
  - Add some very basic error injection into the da(4) driver.
  - Bump the length field in the SCSI INQUIRY CDB to 2 bytes to line up with
more recent SCSI specs.
 
 CTL Features:
 =============
 
  - Disk and processor device emulation.
  - Tagged queueing
  - SCSI task attribute support (ordered, head of queue, simple tags)
  - SCSI implicit command ordering support.  (e.g. if a read follows a mode
select, the read will be blocked until the mode select completes.)
  - Full task management support (abort, LUN reset, target reset, etc.)
  - Support for multiple ports
  - Support for multiple simultaneous initiators
  - Support for multiple simultaneous backing stores
  - Persistent reservation support
  - Mode sense/select support
  - Error injection support
  - High Availability support (1)
  - All I/O handled in-kernel, no userland context switch overhead.
 
 (1) HA Support is just an API stub, and needs much more to be fully
 functional.  See the to-do list below.
 
 Configuring and Running CTL:
 ============================
 
  - After applying the CTL patchset to your tree, build world and install it
on your target system.
 
  - Add 'device ctl' to your kernel configuration file.
 
  - If you're running with an 8Gb or 4Gb Qlogic FC board, add
'options ISP_TARGET_MODE' to your kernel config file.  'device ispfw'
or loading the ispfw module is also recommended.
 
  - Rebuild and install a new kernel.
 
  - Reboot with the new kernel.
 
  - To add a LUN with the RAM disk backend:
 
   ctladm create -b ramdisk -s 10485760
   ctladm port -o on
 
  - You should now see the CTL disk LUN through camcontrol devlist:
 
 scbus6 on ctl2cam0 bus 0:
 <FREEBSD CTLDISK 0001>   at scbus6 target 1 lun 0 (da24,pass32)
 <>                       at scbus6 target -1 lun -1 ()
 
This is visible through the CTL CAM SIM.  This allows using CTL without
any physical hardware.  You should be able to issue any normal SCSI
commands to the device via the pass(4)/da(4) devices.
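
For instance, a couple of harmless read-only commands to poke at the
emulated LUN (the device name da24 is taken from the listing above):

    camcontrol inquiry da24
    camcontrol readcap da24 -h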
 
If any target-capable HBAs are in the system (e.g. isp(4)), and have
target mode enabled, you should now also be able to see the CTL LUNs via
that target interface.
 
Note that all CTL LUNs are presented to all frontends.  There is no
LUN masking or separate per-port configuration.
 
  - Note that the ramdisk backend is a fake ramdisk.  That is, it is
backed by a small amount of RAM that is used for all I/O requests.  This
is useful for performance testing, but not for any data integrity tests.
 
  - To add a LUN with the block/file backend:
 
   truncate -s +1T myfile
   ctladm create -b block -o file=myfile
   ctladm port -o on
 
  - You can also see a list of LUNs and their backends like this:
 
 # ctladm devlist
 LUN Backend       Size (Blocks)   BS Serial Number    Device ID
   0 block            2147483648  512 MYSERIAL   0     MYDEVID   0
   1 block            2147483648  512 MYSERIAL   1     MYDEVID   1
   2 block            2147483648  512 MYSERIAL   2     MYDEVID   2
   3 block            2147483648  512 MYSERIAL   3     MYDEVID   3
   4 block            2147483648  512 MYSERIAL   4     MYDEVID   4
   5 block            2147483648  512 MYSERIAL   5     MYDEVID   5
   6 block            2147483648  512 MYSERIAL   6     MYDEVID   6
   7 block            2147483648  512 MYSERIAL   7     MYDEVID   7
   8 block            2147483648  512 MYSERIAL   8     MYDEVID   8
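
A listing like the one above can be produced by creating a number of
file-backed LUNs in a loop, roughly like this (the file names and the
1TB size are arbitrary; the serial and device IDs are the defaults
ctladm assigns):

 # for i in 0 1 2 3 4 5 6 7 8; do
 >   truncate -s +1T myfile.$i
 >   ctladm create -b block -o file=myfile.$i
 > done
 # ctladm port -o on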

Re: Data corruption over NFS in -current

2012-01-11 Thread Dan Nelson
In the last episode (Jan 11), Martin Cracauer said:
 Rick Macklem wrote on Wed, Jan 11, 2012 at 08:42:25PM -0500: 
  Also, if you can reproduce the problem fairly easily, capture a packet
  trace via
  # tcpdump -s 0 -w xxx host server 
  running on the client (or similar). Then email me xxx as an attachment
  and I can look at it in wireshark.  (If you choose to look at it in
  wireshark, I would suggest you look for Create RPCs to see if they are
  Exclusive Creates, plus try and see where the data for the corrupt file
  is written.)
  
  Even if the capture is pretty large, it should be easy to find the
  interesting part, so long as you know the name of the corrupt file and
  search for that.
 
 That's probably not practical; we are talking about hammering the NFS
 server with several CPU hours' worth of parallel activity in a shellscript,
 but I'll do my best :-)

The tcpdump options -C and -W can help here.  For example, -C 1000 -W 10
will keep the most recent 10-GB of traffic by circularly writing to 10 1-GB
capture files.  All you need to do is kill the tcpdump when you discover the
corruption, and work backwards through the logs until you find your file.
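
Combined with Rick's options, a full invocation on the client could
look like this (em0 is a placeholder for the client's interface;
tcpdump appends a number to the rotated files):

 # tcpdump -i em0 -s 0 -C 1000 -W 10 -w xxx host server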

-- 
Dan Nelson
dnel...@allantgroup.com


Can you use a USB3.0 hub?

2012-01-11 Thread Kohji Okuno
Hi,

Can you use a USB3.0 hub?

I tried a USB3.0 hub (BUFFALO BSH4A04U3BK).
And I used 8-stable and a PCI-E card (BUFFALO IFC-PCIE2U3).

The hub is sold only in the Japanese market.
The card uses NEC's 720200 chip:
http://www.buffalotech.com/products/accessories/interface-card-adapters/usb-30-pci-express-interface-card/


The kernel could not recognize a USB 3.0 HDD connected to this hub,
as the following log shows. But the kernel could recognize a USB 2.0
HDD connected to the same hub.
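
In case it helps, the hub's descriptors can be dumped from the running
system with usbconfig(8), e.g. (the ugen0.2 address is taken from the
log below):

 usbconfig -d ugen0.2 dump_device_desc
 usbconfig -d ugen0.2 dump_curr_config_desc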

Regards,
 Kohji Okuno

-- log 
xhci0: XHCI (generic) USB 3.0 controller mem 0xf7ffe000-0xf7ff irq 28 at device 0.0 on pci1
xhci0: [ITHREAD]
xhci0: 32 byte context size.
usbus0 on xhci0
  ...

ugen0.2: VIA Labs, Inc. at usbus0
uhub11: VIA Labs, Inc. 4-Port USB 3.0 Hub, class 9/0, rev 3.00/3.74, addr 1 on usbus0
uhub11: 4 ports with 4 removable, self powered
usb_alloc_device: set address 3 failed (USB_ERR_IOERROR, ignored)
usbd_req_re_enumerate: addr=3, set address failed! (USB_ERR_IOERROR, ignored)
usbd_req_re_enumerate: addr=3, set address failed! (USB_ERR_IOERROR, ignored)
ugen0.3: Unknown at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub_reattach_port: device problem (USB_ERR_STALLED), disabling port 4
ugen0.3: vendor 0x2109 at usbus0
uhub12: vendor 0x2109 USB2.0 Hub, class 9/0, rev 2.00/2.74, addr 2 on usbus0
uhub12: 4 ports with 4 removable, self powered
usb_alloc_device: set address 4 failed (USB_ERR_IOERROR, ignored)
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_IOERROR, ignored)
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_IOERROR, ignored)
ugen0.4: Unknown at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device
uhub_reattach_port: device problem (USB_ERR_STALLED), disabling port 4
usb_alloc_device: set address 4 failed (USB_ERR_IOERROR, ignored)
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_IOERROR, ignored)
usbd_req_re_enumerate: addr=4, set address failed! (USB_ERR_IOERROR, ignored)
ugen0.4: Unknown at usbus0 (disconnected)
uhub_reattach_port: could not allocate new device

Build Option Survey results

2012-01-11 Thread Bjoern A. Zeeb
Hey,

after two years I had the opportunity to run the build option survey,
initially done by phk, again.  The number of options seems to have grown
quite a bit it felt.  I have not even looked at the results yet but here
they are fresh off the machine:

   http://people.freebsd.org/~bz/build_option_survey_20120106/

Special thanks go to np, sbruno and bhaga for bringing worm back to life.

/bz

PS: the last run from 2010 can still be found here:

  http://people.freebsd.org/~bz/build_option_survey_20100104/

-- 
Bjoern A. Zeeb You have to have visions!
   It does not matter how good you are. It matters what good you do!


kernel config files outside of sys/${ARCH}/conf ?

2012-01-11 Thread Luigi Rizzo
usr.sbin/config assumes that the kernel config file
lives in ${src_base}/sys/${arch}/conf, which means that
to build a custom kernel one needs RW
access to that directory.

Any idea on how we can enable config to work in a
generic directory?

I scanned the usr.sbin/config source code and found that
it uses hardwired paths -- specifically, it looks for
the kernel source tree in ../.. and has multiple
hardwired paths such as ../../conf/.
There is also somewhat undocumented access to a
file called DEFAULTS that extends the configuration you pass.

Any objections to the addition of a -s option to config(8)
to specify the location of the source tree?
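
As a stopgap, unless I am mistaken the top-level buildkernel target
already accepts a KERNCONFDIR override (see build(7)), so a config
file outside the tree can be used without write access to it, e.g.:

 cd /usr/src
 make buildkernel KERNCONF=MYKERNEL KERNCONFDIR=/home/luigi/kernels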

cheers
luigi