[linuxkernelnewbies] taskset - retrieve or set a process’s CPU affinity

2009-04-26 Thread Peter Teoh





TASKSET(1)   Linux User’s Manual   TASKSET(1)

NAME
   taskset - retrieve or set a process’s CPU affinity

SYNOPSIS
   taskset [options] mask command [arg]...
   taskset [options] -p [mask] pid

DESCRIPTION
   taskset is used to set or retrieve the CPU affinity of a running
   process given its PID or to launch a new COMMAND with a given CPU
   affinity.  CPU affinity is a scheduler property that "bonds" a
   process to a given set of CPUs on the system.  The Linux scheduler
   will honor the given CPU affinity and the process will not run on
   any other CPUs.  Note that the Linux scheduler also supports natural
   CPU affinity: the scheduler attempts to keep processes on the same
   CPU as long as practical for performance reasons.  Therefore,
   forcing a specific CPU affinity is useful only in certain
   applications.

   The CPU affinity is represented as a bitmask, with the lowest order
   bit corresponding to the first logical CPU and the highest order bit
   corresponding to the last logical CPU.  Not all CPUs may exist on a
   given system but a mask may specify more CPUs than are present.  A
   retrieved mask will reflect only the bits that correspond to CPUs
   physically on the system.  If an invalid mask is given (i.e., one
   that corresponds to no valid CPUs on the current system) an error is
   returned.  The masks are typically given in hexadecimal.  For
   example,

   0x0001
  is processor #0

   0x0003
  is processors #0 and #1

   0xFFFFFFFF
  is all processors (#0 through #31)

   When taskset returns, it is guaranteed that the given program
has been scheduled to a legal CPU.
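
As an illustration of the bitmask layout described above, the following shell sketch builds a hexadecimal mask from a list of logical CPU numbers. The function name cpu_list_to_mask is hypothetical and is not part of taskset; the sketch assumes a POSIX shell with 64-bit arithmetic and does no input validation.

```sh
# Convert a CPU list such as "0,5,7,9-11" into a hexadecimal affinity mask.
# cpu_list_to_mask is a hypothetical helper, not something taskset provides.
cpu_list_to_mask() (
    mask=0
    IFS=,                       # split the argument on commas (subshell only)
    for item in $1; do
        case $item in
            *-*) first=${item%-*}; last=${item#*-} ;;   # a range "a-b"
            *)   first=$item; last=$item ;;             # a single CPU
        esac
        cpu=$first
        while [ "$cpu" -le "$last" ]; do
            mask=$(( mask | (1 << cpu) ))   # bit N stands for logical CPU N
            cpu=$(( cpu + 1 ))
        done
    done
    printf '0x%x\n' "$mask"
)
```

For example, cpu_list_to_mask 0,1 prints 0x3, which matches the "processors #0 and #1" example above.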

OPTIONS
   -p, --pid
          operate on an existing PID and not launch a new task

   -c, --cpu-list
          specify a numerical list of processors instead of a bitmask.
          The list may contain multiple items, separated by commas, and
          ranges.  For example, 0,5,7,9-11.

   -h, --help
          display usage information and exit

   -V, --version
          output version information and exit

USAGE
   The default behavior is to run a new command with a given affinity
   mask:
  taskset mask command [arguments]

   You can also retrieve the CPU affinity of an existing task:
  taskset -p pid

   Or set it:
  taskset -p mask pid
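
When an affinity is read back with taskset -p, the mask is reported in hexadecimal. A minimal sketch for decoding such a mask into CPU numbers follows. mask_to_cpus is a hypothetical helper rather than a taskset feature, and it assumes the mask fits in a signed 64-bit shell integer.

```sh
# Print the CPU numbers whose bits are set in a hexadecimal affinity mask.
# mask_to_cpus is a hypothetical helper; the "0x" prefix is optional.
mask_to_cpus() (
    mask=$(( 0x${1#0x} ))       # strip an optional 0x, then parse as hex
    cpu=0
    list=
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            list="${list:+$list,}$cpu"   # append, comma-separated
        fi
        mask=$(( mask >> 1 ))
        cpu=$(( cpu + 1 ))
    done
    printf '%s\n' "$list"
)
```

For example, mask_to_cpus 3 prints 0,1.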

PERMISSIONS
   A user must possess CAP_SYS_NICE to change the CPU affinity of a
   process.  Any user can retrieve the affinity mask.

AUTHOR
   Written by Robert M. Love.


===


Description of problem:

Install Fedora 9 (sorry, I cannot find the entry for an F9 bug report).
Set the affinity of one task to a single CPU core, then take that CPU core
offline. After that, the CPU core cannot be brought online again.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Run the test.sh script: test.sh 1 (1 is the logical CPU number)
test.sh script:
 
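#!/bin/bash
# Reproducer: pin a busy task to one CPU, then offline and re-online that CPU.

main() {
    typeset -i CPU=$1

    # Start the busy-loop task in the background and remember its PID.
    ./task.sh > /dev/null &
    PID=$!

    # Make sure the target CPU is online before pinning the task to it.
    if [ "$(cat /sys/devices/system/cpu/cpu${CPU}/online)" = "0" ]; then
        echo "1" > /sys/devices/system/cpu/cpu${CPU}/online
    fi

    # taskset parses the mask as hexadecimal, so format it that way.
    MASK=$(printf '%x' $((1 << CPU)))

    taskset -p "${MASK}" "${PID}" > /dev/null 2>&1

    # Take the CPU offline, then bring it back online.
    echo "0" > /sys/devices/system/cpu/cpu${CPU}/online

    # This write is where the script was observed to block.
    echo "1" > /sys/devices/system/cpu/cpu${CPU}/online

    disown $PID
    kill -9 $PID > /dev/null 2>&1

    echo "PASS"
}

typeset -i TEST_CPU=$1
main $TEST_CPU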

2. The task.sh script is as follows:

#!/bin/bash

while :
do
  NOOP=1
done

3.
  
Actual results:

test.sh blocks when bringing the CPU back online ( echo "1" >
/sys/devices/system/cpu/cpu${CPU}/online ).

Expected results:


Additional info:
Happened on an Intel Bensley platform (2xXeon 2.83G Harpertown C0, chipset
Blackford G1, 160 SATA)
  --- Comment #1 From Bill Nottingham 2008-02-18 13:07:03 EDT ---
Does this happen on the upstream kernel as well?  
 
  --- Comment #2 From Song, Youquan 2008-02-21 03:59:48 EDT ---
Yes. Kernels 2.6.24 through 2.6.25-rc2 also exhibit the bug.
However, the bug does not exist in kernel 2.6.18.
 
  --- Comment #3 From Chuck Ebbert 2008-02-25 17:33:21 EDT ---
Does the CPU mask of the running process get changed when the processor is
offlined?

And can you get a system state (alt-sysrq-t) when the script hangs?  
 
  --- Comment #4 From Song, Youquan 2008-02-27 04:15:50 EDT ---
Yes. After I take the CPU offline, the commands "taskset -p $PID" and
"ps --pid=$PID -o psr" show that the process CPU mask has changed and
the process has migrated to another CPU correctly.
The attachment is Screenshot.png.
 
  --- Comment #5 From Song, Youquan 2008-02-27 04:18:13 EDT ---
Created an attachment (id=296037) [details]
CPU can not  do hotplu

[linuxkernelnewbies] Direct Memory Access (DMA) and Interrupt Handling

2009-04-26 Thread Peter Teoh





http://www.eventhelix.com/RealtimeMantra/FaultHandling/dma_interrupt_handling.htm

DMA and Interrupt Handling
In this series on hardware basics, we have already looked at read and
write bus cycles. In this article we will cover Direct Memory Access
(DMA) and Interrupt Handling. Knowledge of DMA and interrupt handling
would be useful in writing code that interfaces directly with IO
devices (DMA based serial port design pattern is a good example of such
a device).
We will discuss the following topics:

  

  Direct Memory Access (DMA)
  A typical DMA operation is described here. Interactions between the
  main CPU and DMA device are covered. The impact of DMA on the
  processor's internal cache is also covered.

  Interrupt Handling
  Processor handling of hardware interrupts is described in this
  section.

  Interrupt Acknowledge Cycle
  Many processors allow the interrupting hardware device to identify
  itself. This speeds up interrupt handling as the processor can
  directly invoke the interrupt service routine for the right device.

  Synchronization Requirements for DMA and Interrupts
  Software designers need to keep in mind that DMA operations can be
  triggered at a bus cycle boundary while interrupts can only be
  triggered at an instruction boundary.

  

Direct Memory Access (DMA)

  1. The device wishing to perform DMA asserts the processor's bus
     request signal.
  2. The processor completes the current bus cycle and then asserts the
     bus grant signal to the device.
  3. The device then asserts the bus grant ack signal.
  4. The processor senses the change in the state of the bus grant ack
     signal and starts listening to the data and address bus for DMA
     activity.
  5. The DMA device performs the transfer from the source to the
     destination address.
  6. During these transfers, the processor monitors the addresses on
     the bus and checks if any location modified during DMA operations
     is cached in the processor. If the processor detects a cached
     address on the bus, it can take one of two actions:
       - The processor invalidates the internal cache entry for the
         address involved in the DMA write operation.
       - The processor updates the internal cache when a DMA write is
         detected.
  7. Once the DMA operations have been completed, the device releases
     the bus by asserting the bus release signal.
  8. The processor acknowledges the bus release and resumes its bus
     cycles from the point it left off.
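
The two cache actions described above can be sketched with a toy model of a single cached line. This is only an illustration under stated assumptions: the function snoop_dma_write, its argument order, and the "valid"/"invalid" output format are all hypothetical and do not correspond to any real processor interface.

```sh
# Model one cached line reacting to a snooped DMA write.
# Arguments: policy ("invalidate" or "update"), cached address, cached
# value, DMA write address, DMA write value. All names are illustrative.
snoop_dma_write() {
    policy=$1 cached_addr=$2 cached_val=$3 dma_addr=$4 dma_val=$5
    if [ "$cached_addr" != "$dma_addr" ]; then
        # The DMA write touched an address that is not cached: no action.
        echo "valid $cached_addr $cached_val"
    elif [ "$policy" = "invalidate" ]; then
        # Drop the stale line; the next read has to go to memory.
        echo "invalid $cached_addr"
    else
        # Update policy: take the new value seen on the bus.
        echo "valid $cached_addr $dma_val"
    fi
}
```

For example, snoop_dma_write update 0x1000 7 0x1000 9 prints "valid 0x1000 9", while the invalidate policy with the same arguments prints "invalid 0x1000".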

Interrupt Handling
Here we describe interrupt handling in a scenario where the hardware
does not support identifying the device that initiated the interrupt.
In such cases, the possible interrupting devices need to be polled in
software.

  1. A device asserts the interrupt signal at a hardwired interrupt
     level.
  2. The processor registers the interrupt and waits to finish the
     current instruction execution.
  3. Once the current instruction execution is completed, the processor
     initiates the interrupt handling by saving the current register
     contents on the stack.
  4. The processor then switches to supervisor mode and initiates an
     interrupt acknowledge cycle.
  5. No device responds to the interrupt acknowledge cycle, so the
     processor fetches the vector corresponding to the interrupt level.
  6. The address found at the vector is the address of the interrupt
     service routine (ISR).
  7. The ISR polls all the devices to find the device that caused the
     interrupt. This is accomplished by checking the interrupt status
     registers on the devices that could have triggered the interrupt.
  8. Once the device is located, control is transferred to the handler
     specific to the interrupting device.
  9. After the device specific ISR routine has performed its job, the
     ISR executes the "return from interrupt" instruction.
  10. Execution of the "return from interrupt" instruction results in
      restoring the processor state. The processor is restored back to
      user mode.
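
The polling and dispatch steps of the sequence above can be sketched in shell as a simulation. Everything here is an assumption made for illustration: the "name:status" argument format, the choice of bit 0 as the "interrupt pending" flag, and the function names find_interrupt_source and shared_isr. A real ISR would read hardware status registers instead.

```sh
# Poll hypothetical device status registers to find the interrupt source.
# Each argument is "name:status"; bit 0 of status is assumed to mean
# "interrupt pending". Prints the first pending device, or fails.
find_interrupt_source() {
    for dev in "$@"; do
        name=${dev%%:*}
        status=${dev#*:}
        if [ $(( $status & 1 )) -ne 0 ]; then
            printf '%s\n' "$name"
            return 0
        fi
    done
    return 1
}

# Shared ISR: locate the device, then hand over to its specific handler
# (represented here by a message only).
shared_isr() {
    if src=$(find_interrupt_source "$@"); then
        echo "dispatching to handler for $src"
    else
        echo "spurious interrupt: no device has a pending status bit"
    fi
}
```

For example, shared_isr uart:0 nic:1 disk:1 reports the nic handler, because polling stops at the first device with its pending bit set.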

Interrupt Acknowledge Cycle
Here we describe interrupt handling in a scenario where the hardware
does support identifying the device that initiated the interrupt. In
such cases, the exact source of the interrupt can be identified at
hardware level.

  1. A device asserts the interrupt signal at a hardwired interrupt
     level.
  2. The processor registers the interrupt and waits to finish the
     current instruction execution.
  3. Once the current instruction execution is completed, the processor
     initiates the interrupt handling by saving the current register
     contents on the stack.
  4. The processor then switches to supervisor mode and initiates an
     interrupt acknowledge cycle.
  5. The interrupting device responds to the interrupt acknowledge
     cycle with the vector number for the interrupt.
  6. The processor uses the vector number obtained above and fetches
     the vector.
  7. The address found at the vector is the address of the interrupt
     service routine (ISR) for the interrupting device.
  8. After the ISR routine has performed its job, the ISR executes the
     "return from interrupt" instruction.
  9. Execution of the "retur

[linuxkernelnewbies] Kernel Log: What's new in 2.6.29 - Part 3: Kernel controlled graphics modes - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh





http://www.h-online.com/news/Kernel-Log-What-s-new-in-2-6-29-Part-3-Kernel-controlled-graphics-modes--/112431

Kernel Log: What's new in 2.6.29 - Part 3: Kernel controlled
graphics modes
With the release of 2.6.29-rc1
last weekend, Linus Torvalds concluded the first phase, called the
merge window, of the development cycle. This phase allows for
incorporating the substantial changes intended for the next kernel
version into the source code management system
of the Linux kernel. As a result, 2.6.29 is now in the second,
stabilising phase, which usually takes eight to ten weeks and gives the
kernel developers the opportunity to correct mistakes and make minor
changes that are unlikely to cause further flaws. As major changes are
only rarely discarded during the stabilising phase, the kernel log can
already discuss the most important changes expected for 2.6.29 in the
"What's new in 2.6.29" series.
Kernel-based mode setting
Almost 21 months after its first major announcement, the support for
kernel-based mode setting (KMS) for recent Intel graphics hardware has
been integrated into the main development branch of Linux (for example
1, 2, 3).
This technology gives the kernel noticeably more control over the
graphics hardware. When KMS is active, the kernel sets the graphics
mode suitable for a monitor as soon as all the required hardware
components (ACPI, PCI, graphics hardware etc.) have been initialised.
From a user's perspective, this approach is initially no different
from framebuffer graphics with suitable drivers. However, in contrast
to framebuffer graphics, the kernel also sets the screen resolution
during operation, taking over this, and other tasks, from the X server.
If the X server and a text console, managed with KMS, use the same
screen resolution, the kernel no longer needs to reset the graphics
chip and screen resolution when switching between the graphics
interface and the console; this was previously required every time the
user switched to X and VGA text or framebuffer consoles, because the
kernel didn't know the X Server's configuration of the graphics chip.
As a result, switching with KMS – for example while booting, when the X
server first starts up – is considerably faster and is no longer
afflicted by screen flickering or short display disruptions.
Because the kernel controls the graphics hardware in KMS, problems
that arise when the VGA console and framebuffer driver, the Direct
Rendering Manager (DRM) and various userspace programs, including the X
server, compete for access to the graphics hardware, can be eliminated.
With KMS, when waking up from suspend mode, the kernel also handles the
entire graphics hardware re-initialisation, which is designed to solve
some of the problems with using the suspend modes.
With KMS, X servers will reportedly also operate without root
privileges; this and several other improvements associated with KMS are
to facilitate the parallel operation of several X servers, allowing
users to switch backwards and forwards (fast user switching). KMS will
also allow Linux to snatch control from the X server in case of a
serious kernel problem (kernel panic) and display troubleshooting
instructions similar to those displayed for the dreaded blue screen in
Windows – some developers have talked about a "Blue
Penguin Of Death", but this isn't possible with the code
incorporated in 2.6.29.
To avoid hardware access disagreements between the X server and the
kernel, the X server and its graphics driver must also support KMS.
However, X and kernel hacker Dave Airlie, who is responsible for the
kernel's DRM code, explicitly says in his patch integration request
that these parts are still being developed and currently are only
intended for developers; therefore, KMS should not be enabled during
kernel configuration, without the required userspace support.
It is likely to be some time until the kernel is ready for KMS with
Radeon hardware. Although the KMS code for Radeon GPUs is already
available, it is based on the TTM Memory Manager rather than the more
recent Graphics Execution Manager (GEM), which was incorporated in
2.6.28 and so far is geared to work with Intel hardware. However,
according to Dave Airlie, the TTM code is not mature enough to be
integrated into the official kernel yet. It will probably be even
longer until KMS becomes available with a standard kernel and Nvidia
hardware, unless the developers of the Nouveau driver, which was
created using reverse engineering, can pull some mature KMS code out of
their hats, or Nvidia decides to provide KMS support. The latter is
particularly likely to improve the reliability of the suspend modes,
which often malfunction with the open source drivers for Nvidia
hardware.
More graphics
The Graphics Execution Manager (GEM), which is still set to work
with Intel hardware and manages the main memory as well as the access
to the GPU's processing units, has been extended to include new
features in 2.6.29 (1, 2).
Several further

[linuxkernelnewbies] Kernel Log: main development phase for 2.6.29 ends, new X.org drivers - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh





http://www.h-online.com/news/Kernel-Log-main-development-phase-for-2-6-29-ends-new-X-org-drivers--/112399

Kernel Log: main development phase for 2.6.29 ends, new X.org drivers
With the release of 2.6.29-rc1 on Saturday night, Linus Torvalds has
closed the 2.6.29 merge window and brought to a close the development
phase, during which the major new features for the next version of
Linux are adopted. All significant changes in 2.6.29 should now be in
the Linux source code management system, including new features
previously discussed on heise open such as WiMAX, access point support
and the Btrfs and Squashfs file systems.
These changes are just some of the more conspicuous changes adopted
by the kernel hackers for 2.6.29. Support has been added for
kernel-based mode setting on Intel graphics hardware and improvements
have been made to the Graphics Execution Manager (GEM), which was
integrated with
2.6.28.
The SCSI subsystem now supports Fibre Channel over Ethernet (FCoE) and
there are fixes to, and new functions in, the eCryptfs, Ext4, OCFS2 and
XFS file systems. There are also numerous new and revised drivers,
including new or revised audio drivers from the Alsa project and over
600 changes to the V4L/DVB drivers. These are now joined by various, in
some cases very large, staging drivers, such as the Comedi
framework, or support for Google's Android. heise open's Kernel Log
will carry detailed reports on these and other changes over the next
few weeks as part of our "What's coming in 2.6.29" series.
The realtime defragmenter (online ext4 defragmentation) has
not made it into 2.6.29 – Theodore Tso explains why on LKML. Also left out, for the time
being, are support for operation as a primary Xen domain (Dom0) and compression
of the kernel image with bzip2/lzma. It looks
like it could also be a while before support for kernel-based mode
setting with AMD hardware meets the kernel development team's quality
standards.
All about X.org
AMD developer Alex Deucher has released version 6.10
of the xf86-video-ati driver package, usually known simply as ati or
radeon. It includes support for the RV710 (Radeon HD 4300/HD 4500) and
RV730 (Radeon HD 4600) Radeon chips. The new version also reduces
tearing during video playback and supports Bicubic Xv scaling on
r3xx/r4xx/r5xx/rs690 Radeon chips. The developer discusses further
changes on his blog.
Matthias Hopf has now released the AtomBIOS disassembler previously
used for programming the alternative Radeon graphics driver radeonhd.
He describes some of the background to the tool on his
blog.
The X.org developers have also released version 1.4.0 of the
xf86-input-mouse mouse driver. This driver handles many of the tasks
previously dealt with by the X server, and the code responsible for
this has been removed from the X server. As a result, users of X server
1.6, currently under development, will need at least version 1.4.0 of
xf86-input-mouse unless their systems use Evdev.
In Brief:

  Following LWN.net's occasional publication of analysis of which
kernel developers have, for instance, introduced the most or the
largest changes into a kernel version (e.g. 1, 2,
3), Wang Chen has been trying his hand at a similar set
of online statistics.


  SELinux hacker James Morris has announced the creation of the Kernel Security Wiki on his
blog, where he has also recently summarised
all the most significant security-related changes in Linux 2.6.28.


  As part of the discussion on the adoption of Squashfs,
Greg Kroah-Hartman has declared that he will in future accept file
systems into the staging directory, as long as they do not require
changes in other parts of the kernel.


  Daniel Phillips is continuing to work on Tux3 and is keeping the
developer community updated on new features or internal matters in his
"Tux3 Report" – a recent e-mail to LKML, for example, elucidates the
current structure of the file system.


  The kernel development team are planning to hold a "Linux Storage and Filesystem Summit 2009" in San
Francisco in early April.


  A group of developers are working on open source firmware
for some of the Broadcom WLAN chips supported under Linux by the b43
driver; this firmware does not, however, appear to work for all
testers. Marvell has made WLAN firmware for the GSPI-88W8686 available to download, but has not released the
source code.


  As reported elsewhere, Nvidia has
released version 180.22 of its proprietary graphics driver for x86-32 and x86-64 Linux.

Further background and information about developments in the
Linux kernel and its environment can also be found in previous issues
of the kernel log at heise open:

  Kernel Log: What's new in 2.6.29 - Part 2: WiMax
  Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
  Kernel Log: 2.6.29 development kicks off, improved 3D support
  Kernel Log: Higher and Further, The innovations of Linux 2.6.28
  Kernel Log: What's coming in 2.6.28 - Part 9: Fastboot and other 

[linuxkernelnewbies] Kernel Log: What's new in 2.6.29 - Part 2: WiMAX - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh




http://www.h-online.com/news/Kernel-Log-What-s-new-in-2-6-29-Part-2-WiMAX--/112393

Kernel Log: What's new in 2.6.29 - Part 2: WiMAX



 

 In Part 2 of the Kernel Log's coverage of the major changes
happening in the main development branch for the Linux kernel 2.6.29
release, we look at a major new addition to Linux's networking
capability, WiMAX support. 
USB sub-system maintainer Greg Kroah-Hartman has brought the WiMAX stack,
developed primarily by Intel developers in the framework of the Linux
WiMAX project, into the Linux main development branch.
The stack gives Linux 2.6.29 a basic infrastructure for WiMAX
wireless broadband networking technology based on the i2400m USB
driver, which was also developed by the WiMAX project and concurrently
integrated into the kernel. The WiMAX stack communicates with the WiMAX Connection 2400 chip in Intel Wireless WiMAX/WiFi Link 5150 and 5350
(codename: Echo Peak) WLAN/WiMAX modules, found mainly in newer
Centrino notebooks.
As the change log in the ultimately successful e-mail request for integration
shows, Linux WiMAX developers made a number of attempts before the
network and USB sub-system administrators were satisfied with the code
and gave it the green light for integration into the kernel. Numerous
details and background information on the Linux kernel's new WiMAX
infrastructure can be found in the e-mail mentioned above, by following
the links at the end of this article to commits in the source code
administration system, and on the Linux WiMAX website.
Also, on the website you can download the
i2400m firmware
and the corresponding userspace software. However, the Intel WiMAX
binary supplicant needed for authentication with the remote host, as
well as the Intel WiMAX binary OMADM client are only available online
as a pre-compiled archive (license, FAQ).
Therefore, distributions based solely on open source software, such as
Debian, Fedora and OpenSuse, will not yet include these parts of the
userspace stack in their core distributions. However, in the e-mail
mentioned above, Intel developers do say "For networks that require
authentication (most), the Intel device requires a supplicant in user
space – because of a set of issues we are working to resolve, it cannot
be made open source yet, but it will".
See also Part 1 of What's new in 2.6.29.
The WiMAX Changes in detail

  i2400m: debugfs controls
  i2400m: documentation and instructions for usage
  i2400m: firmware loading and bootrom initialization
  i2400m: Generic probe/disconnect, reset and message passing
  i2400m: host/device procotol and core driver definitions
  i2400m: linkage to the networking stack
  i2400m: Makefile and Kconfig
  i2400m: RX and TX data/control paths
  i2400m/SDIO: firmware upload backend
  i2400m/SDIO: header for the SDIO subdriver
  i2400m/SDIO: probe/disconnect, dev init/shutdown and reset backends
  i2400m/SDIO: TX and RX path backends
  i2400m/USB: firmware upload backend
  i2400m/USB: header for the USB bus driver
  i2400m/USB: probe/disconnect, dev init/shutdown and reset backends
  i2400m/USB: TX and RX path backends
  i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM
  i2400m: various functions for device management
  wimax: basic API: kernel/user messaging, rfkill and reset
  wimax: debugfs controls
  wimax: debug macros and debug settings for the WiMAX stack
  wimax: documentation for the stack
  wimax: export linux/wimax.h and linux/wimax/i2400m.h with headers_install
  wimax: fix kconfig interactions with rfkill and input layers
  wimax: generic device management (registration, deregistration, lookup)
  wimax: headers for kernel API and user space interaction
  wimax/i2400m: add CREDITS and MAINTAINERS entries
  wimax: internal API for the kernel space WiMAX stack
  wimax: Makefile, Kconfig and docbook linkage for the stack

Further background and information about developments in the
Linux kernel and its environment can also be found in previous issues
of the kernel log at heise open:

  Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
  Kernel Log: 2.6.29 development kicks off, improved 3D support
  Kernel Log: Higher and Further, The innovations of Linux 2.6.28
  Kernel Log: What's coming in 2.6.28 - Part 9: Fastboot and other remainders
  Kernel Log: What's coming in 2.6.28 - Part 7: architecture support, memory subsystem and virtualisation
  Kernel Log: What's coming in 2.6.28 - Part 6: Changes to the audio drivers

Older Kernel logs can be found in the
archives or by using the search function at heise open.
(thl/c't)






[linuxkernelnewbies] Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh




http://www.h-online.com/news/Kernel-Log-What-s-new-in-2-6-29-Part-1-Dodgy-Wifi-drivers-and-AP-support--/112392

Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
See also Part 2 of What's new in 2.6.29.



 

 Scarcely two weeks after the release of Linux 2.6.28,
Linus Torvalds has integrated comprehensive changes for kernel version
2.6.29 into the main development branch. As of Friday morning, he had
added a whopping 7550 patches that changed 8388 files and included more
than 1,061,513 new, changed, or moved, lines of code. Over the weekend,
the merge window closed and the second phase of the development cycle,
which usually lasts some eight to ten weeks, started with the
release of 2.6.29-rc1. In the second phase, only corrections or small
changes that do not threaten the code will find their way into the
main development branch.
A significant part of the changes integrated up to now includes a
long list of improvements in the kernel's network support features.
Since the network support features are the most significant additions
to the kernel, a round-up of these changes kicks off this "What's new
in 2.6.29" series. Should any other noteworthy patches for the network
subsystem find their way into the main development branch in the coming
weeks, we will sum them up in the final instalment of this series –
shortly before Torvalds releases 2.6.29. On the occasion of the
release, a comprehensive Kernel Log will again sum up the most
important changes reported in the course of the "What's new in 2.6.29"
series.
Dodgy network drivers
Following the integration in 2.6.28 of Greg Kroah-Hartman's staging
kernel branch into the Linux main development branch, the self-styled
"maintainers of crap" have added numerous additional drivers to the
kernel's staging area that do not meet the kernel developers' quality
standards. Among the offenders were the rt2860 and rt2870 Wifi drivers
for the Ralink Wifi chips found in some of the new netbooks and low-end
notebooks. Other new entries in the staging area are the otus driver,
released in October, for the Atheros UB81, UB82 and UB83 WLAN chips, as
well as the agnx and rtl8187se drivers for the Airgo AGNX00 WLAN chip
and the Realtek RTL8187SE WLAN chip. There was also a comprehensive
scrub and restructuring done on the code for the wlan-ng framework
included in 2.6.28. Developers swapped the at76_usb staging driver,
also included in 2.6.28, for a variant based on the mac80211 Linux WLAN
stack. The author of the driver actually had another solution in mind,
so it would be no surprise if additional changes are made to the patch,
or if it is withdrawn entirely.
Also, the benet network driver for ServerEngines' BladeEngine (EC 3210)
10Gb network adapter is a new addition to the staging area. Whether
users of the mainstream distributions and their kernels will see
tangible advantages from the inclusion of all these staging drivers
depends on which distribution they are using. Maintainers of the
distributions' kernels activate only some of the staging drivers, or
only partially activate them, since they do not meet the normal quality
standards of the kernel developers. The drivers' failure to meet these
standards is also the reason the kernel is marked with a "TAINT_CRAP"
flag when it is loading them. This makes it clear in users' error
reports that a "crappy" driver has "besmirched" the kernel and may have
been responsible, or partly responsible, for problems. However, in the
absence of other drivers, users who simply want to use their hardware
may not give a hoot about the driver, as long as it does not cause any
serious problems. NetworkManager developer Dan Williams made it known
in a recent Fedora list post that he does not think much of the staging
drivers (1, 2). He said that he would ignore bugs involving
staging drivers, "Basically,
I'm going to ignore any issues that come in from these drivers because
they aren't accepted upstream wireless drivers, despite what gregkh
(who's not a wireless developer) tries to make them."
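
As a rough sketch of how the taint marking described above can be inspected, the function below tests a numeric taint value for the staging-driver bit. It assumes that TAINT_CRAP corresponds to bit 10 (value 1024) of the value exposed in /proc/sys/kernel/tainted, which is how the kernel documentation describes it; the function name has_staging_taint is hypothetical.

```sh
# Succeed if the given kernel taint value has the staging-driver bit set.
# Assumption: TAINT_CRAP is bit 10 (value 1024) of the taint bitmask.
# has_staging_taint is a hypothetical helper name.
has_staging_taint() {
    [ $(( $1 & (1 << 10) )) -ne 0 ]
}
```

On a running system it might be used as: has_staging_taint "$(cat /proc/sys/kernel/tainted)" && echo "a staging driver has been loaded".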
More than a thousand other changes
Network subsystem maintainer David S. Miller did not leave it to
Greg Kroah-Hartman alone to submit all of the network updates; he
collected more than a thousand network-related patches himself and sent
them to Torvalds (1, 2, 3).
New and removed WLAN drivers, AP mode
Support for operation as an access point (AP), which has been in the
kernel's Wifi stack for some time, albeit deactivated, has now been activated (documentation, support in nl80211).
However, the kernel does not handle the actual AP administration
functions itself, but rather leaves them to the current versions of hostapd.
The WLAN drivers have to support AP mode as well, although this is not the case
with the kernel's drivers for the Intel WLAN modules found in Centrino
notebooks and others. Developers are expanding the ath5k and p54 WLAN
drivers, to support AP mode (1, 2).
The kernel hackers have extended th

[linuxkernelnewbies] Kernel Log: Morton questions acceptance of Xen Dom0 code; file systems for SSDs - News - The H Open Source: News and Features

2009-04-26 Thread Peter Teoh




http://www.h-online.com/open/Kernel-Log-Morton-questions-acceptance-of-Xen-Dom0-code-file-systems-for-SSDs--/news/112784

Kernel Log: Morton questions acceptance of Xen Dom0 code; file
systems for SSDs



 

 In his response to the invitation on the Linux Kernel
Mailing list (LKML) for comments on the recently submitted Xen Dom0
patches, Andrew Morton
asks whether accepting these kernel extensions into the main Linux
development tree to operate as the leading Xen domain (Dom0) still
makes sense. He has suggested that Xen may be the "old" way to achieve
virtualisation, whereas the world is moving in a "new" direction,
towards KVM. He also suggests that Linux developers could
regret accepting Xen Dom0 support in three years' time ("I
hate to be the one to say it, but we should sit down and work out
whether it is justifiable to merge any of this into Linux. I think it's
still the case that the Xen technology is the "old" way and that the
world is moving off in the "new" direction, KVM? In three years' time,
will we regret having merged this?").
This has prompted a debate
on the pros and cons, and the relative advantages and drawbacks of Xen
and KVM. Jeremy Fitzhardinge, a long-standing Xen developer who sent
Xen Dom0 patches developed by him and others to the LKML, campaigned strongly for Xen, but was to some extent
rebuffed by other well known Linux developers, including Nick Piggin and Ingo Molnar.
As one of the maintainers of the kernel code supporting the x86
architecture, Molnar could have an important say in the decision on
whether to accept Xen support. A decision has probably not been made
yet, but in spite of the discussion stimulated by Morton and the
objections of other kernel hackers, it's perfectly possible that the
next-but-one Linux version (2.6.30) will incorporate Xen Dom0 code,
based on these patches.
How the situation arose
In any case, it's difficult to make any predictions for the
development model of the Linux kernel, because many developers, not
least Linus Torvalds, can considerably speed up or hold back the
acceptance of patches. Four years ago, Morton predicted
that acceptance of Xen support into the Linux kernel was imminent. At
that time, however, kernel developers were already dissatisfied with
some aspects of integrating it into the kernel sources, and asked for
changes before it could be accepted.
While Xen developers were working on this, other Linux-specific
virtualisation solutions appeared, such as KVM
(Kernel-based Virtual Machine) and Lguest (originally called Lhype).
The kernel developers consequently pressed for an interface that
lets the Linux kernel work as efficiently as possible as a
paravirtualised guest under all of these and other virtualisation
solutions, without large quantities of special code having to be
included in the kernel for each hypervisor. The paravirt_ops
abstraction layer then emerged, largely under the leadership of the
Lguest developer, and found its way into the main Linux development
tree with Linux 2.6.20.
That same version also saw the developers accept the KVM
virtualisation framework into Linux. Though only a few months old at
the time, it fitted into the kernel much better than Xen support and,
in the opinion of many kernel hackers, was clearly the technically more
elegant solution, since it used the kernel itself as hypervisor and
thus had recourse to the infrastructure of the kernel (scheduler,
memory management, drivers), while the Xen hypervisor is positioned
upstream of the Linux kernel.
On the other hand, KVM requires CPUs with virtualisation functions,
such as AMD-V and Intel VT. Xen can also use these functions in order
to virtualise unmodified guest systems, but if the CPU doesn't provide
these functions, an operating system adapted to Xen can alternatively
run as a guest under the Xen hypervisor using paravirtualisation.
Fitzhardinge is now citing this difference as one of the advantages of
Xen, though all recent x86 server processors and many desktop and
notebook CPUs now provide virtualisation functions.
Second attempt
While KVM underwent constant and rapid development
within normal work on the kernel, acquiring functions like migration
and PCI device pass-through for guests, Xen developers were slow to
move ahead with integrating Xen into the Linux kernel. Instead, they
paid a lot of attention to further development of the Xen code, which
is also used in commercial Xen products. It sits on top of Linux kernel
2.6.18, and doesn't satisfy the quality requirements of kernel
developers. The 2.6.18 kernel however lacks many drivers for more
recent PC components, so the distribution developers are porting this
Xen code to later kernels. This was and still is extremely laborious
and, in practice, the result only works after a fashion. This is
probably one of the reasons that motivated Red Hat to buy Qumranet, a
company specialising in KVM, and subsequently (according to recently divulged plans

[linuxkernelnewbies] pNFS and the Future of File Systems

2009-04-26 Thread Peter Teoh





http://www.enterprisestorageforum.com/sans/features/article.php/3793301


pNFS and the Future of File Systems
December 24, 2008
By Drew Robb


High-performance
file systems such as Panasas PanFS, Sun QFS, Quantum StorNext, IBM GPFS
and HP File Services can add plenty of value to storage implementations
(see Choosing
the Right High-Performance File System).
Take the case of
DigitalFilm Tree, a company based in Hollywood that provides
post-production and visual effects (VFX) services for the entertainment
industry. It recently had to ramp up its operations to deal with VFX
for Showtime's "Weeds," CW's "Everybody Hates Chris," NBC's "Scrubs," a
new TV pilot episode, and work on the Jet Li movie "The Forbidden
Kingdom."
The company harnesses a storage environment that includes Apple
(NASDAQ: AAPL) Xsan, HP (NYSE: HPQ) StorageWorks arrays, QLogic
(NASDAQ: QLGC) switches and gear from several other storage vendors.
It is also a mixed OS environment, with the workflow having to deal
with users on Macs and PCs.
"The velocity of our work on the TV shows demands a non-linear
workflow and the management of well over 100 TB of data," said Ramy
Katrib, founder and CEO of DigitalFilm Tree. "StorNext enabled us to
greatly expand our delivery without having to double our staff."
But with the
ongoing updates to file system protocols like NFS,
including parallel NFS (pNFS),
is there a possibility that NFS could eventually supplant the many
proprietary file systems out there? Let's first take a look at another
couple of high-performance offerings from Sun and NetApp (NASDAQ: NTAP)
before taking out our crystal ball to see what the future holds.

Sun Lustre
Sun Microsystems
(NASDAQ: JAVA) characterizes Lustre as "the most scalable parallel file
system in the world." As evidence of this, it serves six of the top 10
supercomputers and 40 percent of the top 100.
"We have Lustre file systems that scale to petabytes of data in one
cohesive name space and deliver in excess of 100 GB/s aggregate
performance to 25,000 clients or more," said Peter Bojanic, director
of Sun's Lustre Group. "This includes HPC applications at Livermore,
Oak Ridge and Sandia National Laboratories, where large-file I/O
and sustained high bandwidth are essential."
Adoption is also
growing in oil and gas, rich media and content distribution networks,
which all require mixed workloads with large and small files. One of
Lustre's differentiators is that it is available as open source
software based on Linux. That's why you find it integrated with storage
products from other HPC vendors, including SGI (NASDAQ: SGIC), Dell
(NASDAQ: DELL), HP, Cray (NASDAQ: CRAY) and Terascala.
Lustre is an object-based
cluster file system, but it is not T10 OSD-compliant, and the
underlying storage allocation management is block-based. It requires
the presence of a Lustre MetaData Server and Lustre Object Storage
Servers. File I/O bypasses the MetaData Server, utilizing parallel
data paths to Object Servers in the cluster. Servers are organized in
failover pairs. It runs on a variety of networks, including IP and InfiniBand.

NetApp WAFL
NetApp has a file
system called WAFL (Write Anywhere File Layout), which consolidates CIFS,
NFS, HTTP, FTP, Fibre
Channel and iSCSI
and works in conjunction with NetApp's Data ONTAP operating system.
WAFL is integrated with RAID-DP, NetApp's high-performance version of RAID-6,
so it can survive the loss of one or two disk drives.
Non-volatile memory
(NVRAM) is added to improve speed by allowing a storage access protocol
target to respond to modification requests before writing to disk.
Through WAFL, requests are logged to NVRAM and file system
modifications are saved in volatile memory. After several modifications
have accumulated in volatile memory, WAFL gathers the results into what
NetApp terms a "consistency point" (basically a snapshot) and writes
the consistency point to the RAID group assigned to the file system.
"If the consistency
point is not written to disk before hardware or software failure, then
once Data ONTAP reboots, the contents of the NVRAM log are replayed to
the WAFL, and the consistency point is written to disk," said Michael
Eisler, senior technical director of NFS at NetApp. "Most of NetApp's
competitors have snapshots, but NetApp has used its underlying snapshot
technology to build features like file system level mirroring, backup
integration, cloning, de-duplication,
data retention, striping across network storage devices, and flexible
volumes."
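
The logging, consistency-point and replay behaviour described above can be modelled in a few lines. This is only a toy sketch for illustration, not NetApp code: the class name ToyFiler, its method names, and the choice to empty the log after each consistency point are all assumptions made for the example.

```python
class ToyFiler:
    """Toy model of NVRAM logging and consistency points (illustrative only)."""

    def __init__(self):
        self.nvram_log = []  # assumed to survive a crash
        self.memory = {}     # volatile: lost on a crash
        self.disk = {}       # contents of the last consistency point

    def write(self, name, data):
        # Log the request to NVRAM and apply it in volatile memory,
        # then acknowledge without touching the disk.
        self.nvram_log.append((name, data))
        self.memory[name] = data
        return "ack"

    def consistency_point(self):
        # Flush the accumulated modifications to disk in one step.
        # Assumption: the log can be emptied once the flush is done.
        self.disk = dict(self.memory)
        self.nvram_log.clear()

    def crash_and_reboot(self):
        # Volatile memory is lost; restart from the last consistency point,
        # replay the NVRAM log, then write a new consistency point.
        self.memory = dict(self.disk)
        for name, data in self.nvram_log:
            self.memory[name] = data
        self.consistency_point()
```

A write acknowledged after the last consistency point is missing from the toy disk until either the next consistency point or a reboot replays the log.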
Flexible volumes (also called FlexVols) are volumes that can share a
single pool (or aggregate) of storage with other flexible volumes.
These volumes can be grown or contracted as needed; freed-up space is
returned to the storage pool to be used by other FlexVols.

The Future of File Systems
Not everyone needs
high performance, of course. There are the more common file system
protocols such as NFS and CIFS, as well as Sun's open-source ZFS file
system that runs on

[linuxkernelnewbies] FAQ - VNUML-WIKI

2009-04-26 Thread Peter Teoh





http://www.dit.upm.es/vnumlwiki/index.php/FAQ
VNUML
Frequently Asked Questions



Authors:
David Fernández (david at dit.upm.es)
Fermín Galán (galan at dit.upm.es)
version 1.7, June 4th, 2004




  

  
  
  Contents
1 Writing the VNUML specification
2 Limitations
3 Linux Kernels for VNUML
4 About root filesystems
5 Starting the simulation (-t option)
6 VNUML over different Linux distributions
  
  

  







  Writing the VNUML specification 
How can I check if my VNUML XML specification is correct?

Whenever the vnuml tool is executed, it checks the specification
and prints error messages if the specification is not
correct. Alternatively, you can check your specification using the xmlwf
command that comes with the expat distribution (needed to run VNUML). The xmllint
command (that comes with the libxml package) could also be used for the
same task.
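
As a rough illustration of the kind of check xmlwf performs, the following Python sketch uses the standard library's binding to the expat parser. It only tests well-formedness, not conformance to the VNUML language itself. The function name check_well_formed and the element names in the sample strings are made up for this example.

```python
import xml.parsers.expat


def check_well_formed(xml_text):
    """Return None if xml_text is well-formed XML, else a short error string."""
    parser = xml.parsers.expat.ParserCreate()
    try:
        # True marks this as the final (and only) chunk of input.
        parser.Parse(xml_text, True)
    except xml.parsers.expat.ExpatError as err:
        return "line %d, column %d: %s" % (
            err.lineno,
            err.offset,
            xml.parsers.expat.ErrorString(err.code),
        )
    return None


# Hypothetical fragments, used only to exercise the function.
good = '<vnuml><net name="Net0"/></vnuml>'
bad = '<vnuml><net name="Net0"></vnuml>'  # <net> is never closed

print(check_well_formed(good))
print(check_well_formed(bad))
```

To check a real specification, read the file contents into a string and pass it to check_well_formed.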


  Limitations 
What is the maximum number of virtual networks?

There are two hard limits on the number of simultaneous virtual
networks (i.e., how many networks
vnumlparser.pl can manage).


   64 maximum networks, if using host kernel version < 2.6.5
  
   32 maximum networks, if using bridge-utils version < 0.9.7
  

So, if you want to use as many virtual networks as your physical
host can cope with, use at least bridge-utils 0.9.7 (available only as a
tarball at http://sourceforge.net/projects/bridge/ at the time of
this writing) and Linux kernel 2.6.5.



  Linux Kernels for VNUML 
How can I know which kernel options were used when compiling a
UML linux kernel?

Just execute the kernel with "--showconfig" option. For example, to
know if a UML linux kernel has IPv6 support just type: 
> linux --showconfig | grep IPV6
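
If you need to test for an option from a script rather than by eye, the same lookup can be sketched in Python. This assumes the "--showconfig" output uses the usual kernel configuration format (lines such as CONFIG_FOO=y, with disabled options shown as comments). The function name config_value and the sample text are made up for this example.

```python
def config_value(config_text, option):
    """Return the value assigned to option in config_text, or None if unset."""
    for line in config_text.splitlines():
        line = line.strip()
        # Match the exact option name; "# CONFIG_X is not set" lines
        # and longer names such as CONFIG_X_EXTRA are skipped.
        if line.startswith(option + "="):
            return line.split("=", 1)[1]
    return None


# Made-up sample in the assumed format, not real kernel output.
sample = "CONFIG_NET=y\n# CONFIG_IPV6 is not set\nCONFIG_INET=y\n"

print(config_value(sample, "CONFIG_NET"))
print(config_value(sample, "CONFIG_IPV6"))
```

The text to parse would come from capturing the output of the command shown above.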


  About root filesystems 
I have changed the filesystem used by a virtual machine, but
when I start the simulation it seems to use the old one.

If you are using "COW" filesystems as recommended, you have to
delete the old cow file before starting the simulation with the new
filesystem. The reason is that cow files save a reference to the root
file system they are derived from. To delete a cow file you can use the
"purge" option ("-P") of vnumlparser.pl.


I am using "root_fs_tutorial" root filesystem and I see that
Apache web server (or any other service) is not automatically started
when the virtual machine boots, why? How can I make it start from the
beginning? 

Most services are not started at boot, in order to speed
up the virtual machines' boot process during scenario start-up (-t
option). It is recommended to start the services you need using
"" commands inside your VNUML specification. For example,
to start apache2, you can include the following command: 
/etc/init.d/apache2 start

Alternatively, you can use the "update-rc.d" command to restore the
scripts that start apache2 during the boot process. Just start the
root filesystem in direct mode as described in the update rootfilesystem example, log in
to the virtual machine through the console or using ssh and type the
following command: 
update-rc.d apache2 defaults


  Starting the simulation (-t option) 
When I build a scenario, I get the following message when
booting each virtual machine:

 Checking for the skas3 patch in the host:
 - /proc/mm...not found
 - PTRACE_FAULTINFO...not found
 - PTRACE_LDT...not found
 UML running in SKAS0 mode

Then the process stops, apparently hanging, but if I press
CTRL+C it continues and, finally, the scenario is set up properly. Can
this be avoided?

This is a known problem that happens with some combinations of guest UML
kernels and host kernels. If you are using a modern UML guest
kernel (like 2.6.21.5) the problem does not usually occur, but otherwise
you can try one of the following:


  This problem usually happens during VNUMLization, so if you
avoid it using the -Z switch, it won't happen. However, older VNUML versions do
not implement -Z and, anyway, avoiding VNUMLization could be problematic
if you are not using an official root filesystem. See the user manual for more
information on VNUMLization.
  
  This problem seems related to the configuration of the host
kernel. In particular, if you are using a host kernel version earlier than
2.6.20.2, the problem may happen if CONFIG_COMPAT_VDSO=y. If you are using
CONFIG_COMPAT_VDSO=n, the problem won't occur.
  
  It seems that when using host kernel 2.6.20.2 or newer, the
problem does not happen at all, even if you are using CONFIG_COMPAT_VDSO=y
(from the 2.6.20.2
changelog: "Fix broken CONFIG_COMPAT_VDSO on i386")
  

The recommended solution is the third one. As evidence, I'm using a
2.6.21 kernel with CONFIG_COMPAT_VDSO=y and the
hang does not occur. However, further confirmation by other users
would be helpful :)
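
The rule given in the second and third points can be written down as a small check. This is only a sketch of the FAQ's rule of thumb: the function names are made up for this example, and it assumes plain dotted numeric version strings (no suffixes such as "-rc1").

```python
def parse_version(text):
    """Turn a dotted version string such as '2.6.20.2' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def hang_possible(host_kernel, compat_vdso):
    """Apply the rule above: the hang may occur only on host kernels
    earlier than 2.6.20.2 that were built with CONFIG_COMPAT_VDSO=y."""
    return bool(compat_vdso) and parse_version(host_kernel) < (2, 6, 20, 2)


print(hang_possible("2.6.18", True))
print(hang_possible("2.6.21", True))
```

Tuples compare element by element, so "2.6.20" sorts before "2.6.20.2" and "2.6.21" sorts after it.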

I'm trying to build the simple.xml example that comes with the
VNUML software, but I'm getting the following error:

 Checking for the skas3 patch in the host:
 - /proc/mm...not 

[linuxkernelnewbies] The world won’t listen » User-mode Linux and skas0

2009-04-26 Thread Peter Teoh





http://blogs.igalia.com/berto/2006/09/13/user-mode-linux-and-skas0/

User-mode Linux and skas0

User-mode Linux
(UML) is a port of Linux to its own system call interface. In short,
it’s a system that allows running Linux inside Linux.
UML is integrated in the standard Linux tree, so it’s possible to
compile a UML kernel from any recent kernel sources (using ‘make
ARCH=um’).
Traditionally, UML had a working mode which was both slow and
insecure, as each process inside the UML had write access to the kernel
data. This mode is known as Tracing Thread (tt mode).
A new mode was added in order to solve those issues. It was called
skas (for Separate Kernel Address Space).
Now the UML kernel was totally inaccessible to UML processes, resulting
in a far more secure environment. In skas mode the system ran
noticeably faster too.
To enable skas mode the host kernel had to be patched. As of
September 2006, the latest version of the patch is called skas3. The
patch is small but hasn’t been merged in the standard Linux tree. The
official UML site has a page about
skas mode that explains all these issues more thoroughly.
However, by July 2005 a new mode called skas0 was added to UML in Linux 2.6.13
(which, for some reason, isn’t explained in the above page). This new
mode is very close to skas3: it provides the same security model and
most of its speed gains. The main difference is that you don’t
need to patch the host kernel,
so you can use a skas-enabled UML in your Linux system without having
to mess with the host kernel. The patch is explained in the 2.6.13
changelog or in this article.
A skas0-enabled kernel boots like this:
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...OK
Checking advanced syscall emulation patch for ptrace...OK
Checking for tmpfs mount on /dev/shm...OK
Checking PROT_EXEC mmap in /dev/shm/...OK
Checking for the skas3 patch in the host:
  - /proc/mm...not found
  - PTRACE_FAULTINFO...not found
  - PTRACE_LDT...not found
UML running in SKAS0 mode
...


Posted: September 13th, 2006 under Planet Igalia, English, Software, Planet GPUL.






[linuxkernelnewbies] C/C++ Thread Safety Annotatio...

2009-04-26 Thread Peter Teoh





http://docs.google.com/Doc?id=ddqtfwhb_0c49t6zgr


C/C++ Thread Safety Annotations
Le-Chun Wu

Modified: June 9, 2008


Objective

This project creates a set of C/C++ program annotations
that (1) allow developers to document multi-threaded code so
that maintainers can avoid introducing thread safety bugs, and
(2) help program analysis tools identify potential thread safety
issues. We add a new GCC analysis pass that uses the source
annotations to identify thread safety issues and emit compiler warnings.

Background

Multi-threading is an increasingly important technique to boost
performance on multi-core/multiprocessor systems. Unfortunately,
multi-threaded programming is hard: timing-dependent bugs, such as
data races and deadlocks, are very difficult to expose in testing and
hard to reproduce and isolate once discovered. Proper documentation of
synchronization policies and thread safety guarantees is probably one
of the most useful techniques to manage multi-threaded code and avoid
concurrency bugs.
In practice, programmers' intended synchronization policies,
such as lock acquisition order and lock requirement for shared
variables and functions, are often documented in comments. Comments help
maintainers avoid introducing errors, but it is hard for program
analysis tools to use the information to tell programmers when they have
violated their synchronization policies and identify potential thread
safety issues.
Therefore this project creates program annotations for C/C++ to help
developers document locks and how they need to be used to safely read
and write shared variables. We design and implement a new GCC pass that
uses the annotations to identify and warn about the issues that could
potentially result in race conditions and deadlocks.

Overview

There are many styles of synchronization in multi-threaded
programming. The annotations used here only focus on
mutex lock-based synchronization. The annotations are implemented in
GCC's "attribute" language extension. The following is a list of C macro
definitions using the proposed new attributes. We define these macros
here to simplify the examples and discussions in this document. It is
also a common practice for people to use the macros instead of the raw
GCC attributes for code portability and compatibility.

#define GUARDED_BY(x)                   __attribute__ ((guarded_by(x)))
#define GUARDED_VAR                     __attribute__ ((guarded))
#define PT_GUARDED_BY(x)                __attribute__ ((point_to_guarded_by(x)))
#define PT_GUARDED_VAR                  __attribute__ ((point_to_guarded))
#define ACQUIRED_AFTER(...)             __attribute__ ((acquired_after(__VA_ARGS__)))
#define ACQUIRED_BEFORE(...)            __attribute__ ((acquired_before(__VA_ARGS__)))
#define LOCKABLE                        __attribute__ ((lockable))
#define SCOPED_LOCKABLE                 __attribute__ ((scoped_lockable))
#define EXCLUSIVE_LOCK_FUNCTION(...)    __attribute__ ((exclusive_lock(__VA_ARGS__)))
#define SHARED_LOCK_FUNCTION(...)       __attribute__ ((shared_lock(__VA_ARGS__)))
#define EXCLUSIVE_TRYLOCK_FUNCTION(...) __attribute__ ((exclusive_trylock(__VA_ARGS__)))
#define SHARED_TRYLOCK_FUNCTION(...)    __attribute__ ((shared_trylock(__VA_ARGS__)))
#define UNLOCK_FUNCTION(...)            __attribute__ ((unlock(__VA_ARGS__)))
#define LOCK_RETURNED(x)                __attribute__ ((lock_returned(x)))
#define LOCKS_EXCLUDED(...)             __attribute__ ((locks_excluded(__VA_ARGS__)))
#define EXCLUSIVE_LOCKS_REQUIRED(...)   __attribute__ ((exclusive_locks_required(__VA_ARGS__)))
#define SHARED_LOCKS_REQUIRED(...)      __attribute__ ((shared_locks_required(__VA_ARGS__)))
#define NO_THREAD_SAFETY_ANALYSIS       __attribute__ ((no_thread_safety_analysis))


Note that the annotations proposed here are not expressive enough to
handle fine-grained locking relationships between locks and the guarded
variables, e.g. when each individual element (or a group of elements) of
a linked list/hash table is guarded by a different lock. While most of the
proposed annotations are designed for documenting synchronization policies,
some are simply created to help program analysis tools.
Detailed explanations of the annotations and their usage are given
in the next section.

Detailed Design
Variable Annotations

The following annotations are used to specify synchronization policies,
such as which variables are guarded by which locks and the acquisition
order of locks.



  GUARDED_BY(lock) and GUARDED_VAR
These two annotations document a shared variable/field that
needs to be protected by a lock. GUARDED_BY specifies that
a particular lock should be held when accessing the
annotated variable. GUARDED_VAR only indicates that a shared
variable should be guarded (by any lock). GUARDED_VAR is
primarily used when the client cannot express the name of the lock.
The lock argument in GUARDED_BY (or in any other annotations
mentioned below that take lock arguments) can be a variable, a class
member, or even an expression specifying an