[linuxkernelnewbies] C/C++ Thread Safety Annotatio...

2009-04-26 Thread Peter Teoh


Thread Safety Annotations 
Le-Chun Wu
lcwu at google.com
Modified: June 9, 2008


This project creates a set of C/C++ program annotations
that (1) allow developers to document multi-threaded code so
that maintainers can avoid introducing thread
safety bugs, and (2) help program analysis tools identify potential
safety issues. We add a new GCC analysis pass that uses the source
annotations to identify thread safety issues and emit compiler warnings.


Multi-threading is an increasingly important technique to boost
performance on multi-core/multiprocessor systems. Unfortunately, multi-threaded
programming is hard: timing-dependent bugs, such as data races and
deadlocks, are very difficult to expose in testing and hard to reproduce and
fix once discovered. Proper documentation of synchronization policies and
thread safety guarantees is probably one of the most useful techniques
to manage multi-threaded code and avoid concurrency bugs.
In practice, programmers' intended synchronization policies,
such as lock acquisition order and lock requirements for shared
variables and functions, are often documented in comments. Comments help
maintainers avoid introducing errors, but it is hard for program
analysis tools to use the information to tell programmers when they have
violated their synchronization policies and to identify potential thread safety
issues. Therefore this project creates program annotations for C/C++ to help
developers document locks and how they need to be used to safely read and write
shared variables. We design and implement a new GCC pass that uses the
annotations to identify and warn about the issues that could potentially result in
race conditions and deadlocks.


There are many styles of synchronization in multi-threaded
programming. The annotations used here focus only on
mutex lock-based synchronization. The annotations are implemented in
GCC's "attribute" language extension. The following is a list of C macro
definitions using the proposed new attributes. We define these macros
to simplify the examples and discussions in this document. It is also
common practice to use macros instead of the raw GCC
attributes for code portability and compatibility.

#define GUARDED_BY(x)                   __attribute__ ((guarded_by(x)))
#define GUARDED_VAR                     __attribute__ ((guarded))
#define PT_GUARDED_BY(x)                __attribute__ ((point_to_guarded_by(x)))
#define PT_GUARDED_VAR                  __attribute__ ((point_to_guarded))
#define ACQUIRED_AFTER(...)             __attribute__ ((acquired_after(__VA_ARGS__)))
#define ACQUIRED_BEFORE(...)            __attribute__ ((acquired_before(__VA_ARGS__)))
#define LOCKABLE                        __attribute__ ((lockable))
#define SCOPED_LOCKABLE                 __attribute__ ((scoped_lockable))
#define EXCLUSIVE_LOCK_FUNCTION(...)    __attribute__ ((exclusive_lock(__VA_ARGS__)))
#define SHARED_LOCK_FUNCTION(...)       __attribute__ ((shared_lock(__VA_ARGS__)))
#define EXCLUSIVE_TRYLOCK_FUNCTION(...) __attribute__ ((exclusive_trylock(__VA_ARGS__)))
#define SHARED_TRYLOCK_FUNCTION(...)    __attribute__ ((shared_trylock(__VA_ARGS__)))
#define UNLOCK_FUNCTION(...)            __attribute__ ((unlock(__VA_ARGS__)))
#define LOCK_RETURNED(x)                __attribute__ ((lock_returned(x)))
#define LOCKS_EXCLUDED(...)             __attribute__ ((locks_excluded(__VA_ARGS__)))
#define EXCLUSIVE_LOCKS_REQUIRED(...)   __attribute__ ((exclusive_locks_required(__VA_ARGS__)))
#define SHARED_LOCKS_REQUIRED(...)      __attribute__ ((shared_locks_required(__VA_ARGS__)))
#define NO_THREAD_SAFETY_ANALYSIS       __attribute__ ((no_thread_safety_analysis))

Note that the annotations proposed here are not expressive enough to
describe fine-grained locking relationships between locks and the guarded
variables, e.g. when each individual element (or a group of elements) of a linked
list/hash table is guarded by a different lock. While most of the
annotations are designed for documenting synchronization policies,
some are simply created to help program analysis tools.
Detailed explanations of the annotations and their usage are given
in the next section.

Detailed Design
Variable Annotations

The following annotations are used to specify synchronization policies,
such as which variables are guarded by which locks and the acquisition
order of locks.

GUARDED_BY and GUARDED_VAR document a shared
variable/field that
needs to be protected by a lock. GUARDED_BY specifies that
a particular lock should be held when accessing the
annotated variable. GUARDED_VAR only indicates that a shared
variable should be guarded (by any lock). GUARDED_VAR is
primarily used when the client cannot express the name of the lock.
The lock argument in GUARDED_BY (or in any other annotation
mentioned below that takes lock arguments) can be a variable, a class
member, or even an expression specifying an
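
As an illustration of the GUARDED_BY description above, here is a minimal C sketch. It assumes a POSIX mutex as the lock; the names counter_mu, counter and counter_increment, and the THREAD_SAFETY_ANNOTATIONS build switch, are hypothetical and not part of the proposal. The macro expands to nothing unless that switch is defined, so the code also builds with compilers that do not know the proposed attribute.

```c
#include <pthread.h>

/* Assumption: only a compiler carrying the thread-safety pass understands
 * the proposed attribute. THREAD_SAFETY_ANNOTATIONS is a hypothetical
 * switch for such a build; otherwise the macro expands to nothing. */
#ifdef THREAD_SAFETY_ANNOTATIONS
#define GUARDED_BY(x) __attribute__ ((guarded_by(x)))
#else
#define GUARDED_BY(x)
#endif

/* The lock that protects the shared counter (hypothetical name). */
static pthread_mutex_t counter_mu = PTHREAD_MUTEX_INITIALIZER;

/* Documents the policy: counter may only be accessed with counter_mu held. */
static int counter GUARDED_BY(counter_mu);

/* Follows the documented policy: lock, update, unlock. */
int counter_increment(void)
{
    int value;

    pthread_mutex_lock(&counter_mu);
    value = ++counter;
    pthread_mutex_unlock(&counter_mu);
    return value;
}
```

Even without tool support the annotation serves as machine-readable documentation. Whether an analysis pass treats pthread_mutex_lock as acquiring counter_mu would depend on how the lock functions themselves are annotated, which this sketch does not show.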

[linuxkernelnewbies] The world won’t listen » User-mode Linux and skas0

2009-04-26 Thread Peter Teoh


User-mode Linux and skas0

User-mode Linux
(UML) is a port of Linux to its own system call interface. In short,
it’s a system that allows you to run Linux inside Linux.
UML is integrated in the standard Linux tree, so it’s possible to
compile a UML kernel from any recent kernel sources (using ‘make
Traditionally, UML had a working mode which was both slow and
insecure, as each process inside the UML had write access to the kernel
data. This mode is known as Tracing Thread (tt mode).
A new mode was added in order to solve those issues. It was called
skas (for Separate Kernel Address Space).
Now the UML kernel was totally inaccessible to UML processes, resulting
in a far more secure environment. In skas mode the system ran
noticeably faster too.
To enable skas mode the host kernel had to be patched. As of
September 2006, the latest version of the patch is called skas3. The
patch is small but hasn’t been merged in the standard Linux tree. The
official UML site has a page about
skas mode that explains all these issues more thoroughly.
However, by July 2005 a new mode was added to UML in Linux 2.6.13
called skas0
(which, for some reason, isn’t explained in the above page). This new
mode is very close to skas3: it provides the same security model and
most of its speed gains. The main difference is that you don’t
need to patch the host kernel,
so you can use a skas-enabled UML in your Linux system without having
to mess with the host kernel. The patch is explained in the 2.6.13
changelog or in this article.
A skas0-enabled kernel boots like this:
Checking that ptrace can change system call numbers...OK
Checking syscall emulation patch for ptrace...OK
Checking advanced syscall emulation patch for ptrace...OK
Checking for tmpfs mount on /dev/shm...OK
Checking PROT_EXEC mmap in /dev/shm/...OK
Checking for the skas3 patch in the host:
  - /proc/mm...not found
  - PTRACE_FAULTINFO...not found
  - PTRACE_LDT...not found
UML running in SKAS0 mode
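
The first skas3 check in the log above looks for /proc/mm on the host. A minimal C sketch of that single check follows. It is not UML's actual detection code, and it ignores the PTRACE_FAULTINFO and PTRACE_LDT probes; the function names are hypothetical.

```c
#include <unistd.h>

/* Returns 1 if the given path exists on the host, 0 otherwise. */
int host_file_exists(const char *path)
{
    return access(path, F_OK) == 0;
}

/* Hypothetical helper: a host without /proc/mm lacks at least one piece of
 * the skas3 patch, so a UML kernel would be expected to fall back to SKAS0. */
int host_has_proc_mm(void)
{
    return host_file_exists("/proc/mm");
}
```

On an unpatched host host_has_proc_mm() is expected to return 0, matching the "/proc/mm...not found" line in the boot log.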

Posted: September 13th, 2006

[linuxkernelnewbies] FAQ - VNUML-WIKI

2009-04-26 Thread Peter Teoh

Frequently Asked Questions

David Fernández (david at dit.upm.es)
Fermín Galán (galan at dit.upm.es)
version 1.7, June 4th, 2004


1 Writing the VNUML specification
2 Limitations
3 Linux Kernels for VNUML
4 About root filesystems
5 Starting the simulation (-t option)
6 VNUML over different Linux distributions


  Writing the VNUML specification 
How can I check if my VNUML XML specification is correct?

Whenever the vnuml tool is executed, the specification is checked
and error messages are printed if it is not
correct. Alternatively, you can check your specification using the xmlwf
command that comes with the expat distribution (needed to run VNUML). The xmllint
command (that comes with the libxml package) can also be used for the
same task.

What is the maximum number of virtual networks?

There are two hard limits on the number of simultaneous virtual
networks (i.e., how many nets
vnumlparser.pl can manage):

   64 networks maximum, if using a host kernel version earlier than 2.6.5
   32 networks maximum, if using a bridge-utils version earlier than 0.9.7

So, if you want to use as many virtual networks as your physical
host can cope with, use at least bridge-utils 0.9.7 (available only as a
tarball at http://sourceforge.net/projects/bridge/ at the time of
this writing) and Linux kernel 2.6.5.
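
The answer above recommends a host kernel of at least 2.6.5. A hedged C sketch of checking the running host against that version is shown below. It uses the standard uname() call; the helper names are hypothetical and VNUML itself does not ship this code.

```c
#include <stdio.h>
#include <sys/utsname.h>

/* Returns 1 if a release string such as "2.6.21-smp" is at least
 * maj.min.patch, 0 otherwise (including when it cannot be parsed). */
int version_at_least(const char *release, int maj, int min, int patch)
{
    int a = 0, b = 0, c = 0;

    /* Require at least "major.minor"; a missing patch level counts as 0. */
    if (sscanf(release, "%d.%d.%d", &a, &b, &c) < 2)
        return 0;
    if (a != maj)
        return a > maj;
    if (b != min)
        return b > min;
    return c >= patch;
}

/* Hypothetical helper: does the running host kernel meet the
 * 2.6.5 recommendation from the FAQ answer above? */
int host_kernel_at_least_2_6_5(void)
{
    struct utsname u;

    if (uname(&u) != 0)
        return 0;
    return version_at_least(u.release, 2, 6, 5);
}
```

The bridge-utils version is not covered here because it would have to be obtained from the package manager or the tool itself.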

  Linux Kernels for VNUML 
How can I know which kernel options were used when compiling a
UML linux kernel?

Just execute the kernel with "--showconfig" option. For example, to
know if a UML linux kernel has IPv6 support just type: 
 linux --showconfig | grep IPV6

  About root filesystems 
I have changed the filesystem used by a virtual machine, but
when I start the simulation it seems to use the old one.

If you are using "COW" filesystems as recommended, you have to
delete the old cow file before starting the simulation with the new
filesystem. The reason is that cow files save a reference to the root
file system they are derived from. To delete a cow file you can use the
"purge" option ("-P") of vnumlparser.pl.

I am using the "root_fs_tutorial" root filesystem and I see that
the Apache web server (or any other service) is not automatically started
when the virtual machine boots. Why? How can I make it start from the
beginning?
Most of the services are not started from the beginning, in order to speed
up the virtual machines' boot process during scenario start-up (-t
option). It is recommended to start the services you need using
"exec" commands inside your VNUML specification. For example,
to start apache2, you can include the following command: 
<exec seq="start" type="verbatim">/etc/init.d/apache2 start</exec>

Alternatively, you can use the "update-rc.d" command to restore the
scripts that start apache2 during the boot process. Just start the
root filesystem in direct mode as described in the update rootfilesystem example, log
in to the virtual machine through the console or using ssh and type the
following command: 
update-rc.d apache2 defaults

  Starting the simulation (-t option) 
When I build a scenario, I get the following message when
booting each virtual machine:

 Checking for the skas3 patch in the host:
 - /proc/mm...not found
 - PTRACE_FAULTINFO...not found
 - PTRACE_LDT...not found
 UML running in SKAS0 mode

Then the process stops, apparently hanging, but if I press
CTRL+C it continues and, finally, the scenario is set up properly. Can
this be avoided?

This is a known problem that happens with some combinations of guest UML
kernel and host kernel. If you are using a modern UML guest
kernel (like the problem does not usually occur, but otherwise
you can try some of the following:

  This problem usually happens during VNUMLization, so if you avoid
it using the -Z switch, it won't happen. However, older VNUML versions do
not implement -Z and, anyway, avoiding VNUMLization could be problematic
if you are not using an official root filesystem. See the user manual for more
information on VNUMLization.
  This problem seems related to the configuration of the host
kernel. In
particular, if you are using a host kernel version previous to,
the problem may happen if CONFIG_COMPAT_VDSO=y. If you are using
CONFIG_COMPAT_VDSO=n, the problem won't occur.
  It seems that when using hostfs kernel or newer, the
problem does
not happen at all, even if you are using CONFIG_COMPAT_VDSO=y
(from the
changelog: "Fix broken CONFIG_COMPAT_VDSO on i386")

The recommended solution is the third one. As evidence, I'm using a
2.6.21 kernel with CONFIG_COMPAT_VDSO=y and the
hang does not occur. However, further confirmation by other users
would be helpful :)

I'm trying to build the simple.xml example that comes with the
VNUML software, but I'm getting the following error:

 Checking for the skas3 

[linuxkernelnewbies] pNFS and the Future of File Systems

2009-04-26 Thread Peter Teoh


pNFS and the Future of
File Systems
December 24, 2008
By Drew

file systems such as Panasas PanFS, Sun QFS, Quantum StorNext, IBM GPFS
and HP File Services can add plenty of value to storage implementations
(see Choosing
the Right High-Performance File System).
Take the case of
DigitalFilm Tree, a company based in Hollywood that provides
post-production and visual effects (VFX) services for the entertainment
industry. It recently had to ramp up its operations to deal with VFX
for Showtime's "Weeds," CW's "Everybody Hates Chris," NBC's "Scrubs," a
new TV pilot episode, and work on the Jet Li movie "The Forbidden
The company
a storage environment that includes Apple (NASDAQ: AAPL) Xsan, HP
(NYSE: HPQ) StorageWorks arrays, QLogic (NASDAQ: QLGC) switches and
gear from several other storage vendors. It is also a mixed OS
environment, with the workflow having to deal with users on Macs and
"The velocity of
work on the TV shows demands a non-linear workflow and the management
of well over 100 TB of data," said Ramy Katrib, founder and CEO of
DigitalFilm Tree. "StorNext enabled us to greatly expand our delivery
without having to double our staff."
But with the
ongoing updates to file system protocols like NFS,
including parallel NFS (pNFS),
is there a possibility that NFS could eventually supplant the many
proprietary file systems out there? Let's first take a look at another
couple of high-performance offerings from Sun and NetApp (NASDAQ: NTAP)
before taking out our crystal ball to see what the future holds.

Sun Lustre
Sun Microsystems
(NASDAQ: JAVA) characterizes Lustre as "the most scalable parallel file
system in the world." In evidence of this, it serves six of the top 10
supercomputers and 40 percent of the top 100.
"We have Lustre
systems that scale to petabytes of data in one cohesive name space and
deliver in excess of 100 GB/s aggregate performance to 25,000 clients
or more," said Peter Bojanic, director of Sun's Lustre Group. "This
includes HPC applications at Livermore, Oak Ridge and Sandia National
Laboratories, where large-file I/O
and sustained high bandwidth are essential."
Adoption is also
growing in oil and gas, rich media and content distribution networks,
which all require mixed workloads with large and small files. One of
Lustre's differentiators is that it is available as open source
software based on Linux. That's why you find it integrated with storage
products from other HPC vendors, including SGI (NASDAQ: SGIC), Dell
(NASDAQ: DELL), HP, Cray (NASDAQ: CRAY) and Terascala.
Lustre is an object-based
cluster file system, but it is not T10 OSD-compliant, and the
underlying storage allocation management is block-based. It requires
the presence of a Lustre MetaData Server and Lustre Object Storage
Servers. File operations bypass the MetaData Server, utilizing parallel
data paths to Object Servers in the cluster. Servers are organized in
failover pairs. It runs on a variety of networks, including IP and InfiniBand.

NetApp has a file
system called WAFL (Write Anywhere File Layout), which consolidates CIFS,
Channel and iSCSI
and works in conjunction with NetApp's Data ONTAP operating system.
WAFL is integrated with RAID-DP, NetApp's high-performance version of RAID-6,
so it can survive the loss of one or two disk drives.
Non-volatile memory
(NVRAM) is added to improve speed by allowing a storage access protocol
target to respond to modification requests before writing to disk.
Through WAFL, requests are logged to NVRAM and file system
modifications are saved in volatile memory. After several modifications
have accumulated in volatile memory, WAFL gathers the results into what
NetApp terms a "consistency point" (basically a snapshot) and writes
the consistency point to the RAID group assigned to the file system.
"If the consistency
point is not written to disk before hardware or software failure, then
once Data ONTAP reboots, the contents of the NVRAM log are replayed to
the WAFL, and the consistency point is written to disk," said Michael
Eisler, senior technical director of NFS at NetApp. "Most of NetApp's
competitors have snapshots, but NetApp has used its underlying snapshot
technology to build features like file system level mirroring, backup
integration, cloning, de-duplication,
data retention, striping across network storage devices, and flexible
volumes."
Flexible volumes
(called FlexVols) are volumes that can share a single pool (or
aggregate) of storage with other flexible volumes. These volumes can be
grown or contracted as needed; freed-up space is returned to the
storage pool to be used by other FlexVols.
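
The NVRAM logging and consistency-point behaviour described above can be pictured with a toy model in C. This is only a sketch of the general idea (log first, apply in memory, flush later, replay the log after a crash). It is not NetApp's implementation, and every name and size in it is hypothetical.

```c
#include <string.h>

#define NBLOCKS 8   /* toy "disk" size, hypothetical */
#define LOGCAP  16  /* toy NVRAM log capacity, hypothetical */

struct log_entry { int block; int value; };

struct toy_fs {
    int disk[NBLOCKS];               /* last consistency point on disk */
    int memory[NBLOCKS];             /* volatile state with pending changes */
    struct log_entry nvram[LOGCAP];  /* request log that survives a crash */
    int nvram_len;
};

void toy_init(struct toy_fs *fs)
{
    memset(fs, 0, sizeof *fs);
}

/* Log the request to NVRAM and apply it in volatile memory only. The caller
 * can be answered now, before anything reaches disk. Returns 0 on success,
 * -1 if the block is out of range or the log is full. */
int toy_write(struct toy_fs *fs, int block, int value)
{
    if (block < 0 || block >= NBLOCKS || fs->nvram_len == LOGCAP)
        return -1;
    fs->nvram[fs->nvram_len].block = block;
    fs->nvram[fs->nvram_len].value = value;
    fs->nvram_len++;
    fs->memory[block] = value;
    return 0;
}

/* Write the accumulated modifications to disk and empty the log. */
void toy_consistency_point(struct toy_fs *fs)
{
    memcpy(fs->disk, fs->memory, sizeof fs->disk);
    fs->nvram_len = 0;
}

/* Crash before a consistency point: volatile memory is lost, so reload it
 * from disk, replay the NVRAM log, then write the consistency point. */
void toy_crash_and_recover(struct toy_fs *fs)
{
    int i;

    memcpy(fs->memory, fs->disk, sizeof fs->memory);
    for (i = 0; i < fs->nvram_len; i++)
        fs->memory[fs->nvram[i].block] = fs->nvram[i].value;
    toy_consistency_point(fs);
}
```

In this model a write acknowledged by toy_write() is never lost, because it is either already on disk or still present in the log when recovery runs.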

The Future of File Systems
Not everyone needs
high performance, of course. There are the more common file system
protocols such as NFS and CIFS, as well as Sun's open-source ZFS file
system that runs on 

[linuxkernelnewbies] Kernel Log: What's new in 2.6.29 - Part 2: WiMAX - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh


Kernel Log: What's new in 2.6.29 - Part 2: WiMAX


 In Part 2 of the Kernel Log's coverage of the major changes
happening in the main development branch for the Linux kernel 2.6.29
release, we look at a major new addition to Linux's networking
capability, WiMAX support. 
USB sub-system maintainer Greg Kroah-Hartman has brought the WiMAX stack,
developed primarily by Intel developers in the framework of the Linux
WiMAX project, into the Linux main development branch.
The stack gives Linux 2.6.29 a basic infrastructure for WiMAX
wireless broadband networking technology based on the i2400m USB
driver, which was also developed by the WiMAX project and concurrently
integrated into the kernel. The WiMAX stack communicates with the WiMAX Connection 2400 chip in Intel Wireless WiMAX/WiFi Link 5150 and 5350
(codename: Echo Peak) WLAN/WiMAX modules, found mainly in newer
Centrino notebooks.
As the change log in the ultimately successful e-mail request for integration
shows, Linux WiMAX developers made a number of attempts before the
network and USB sub-system administrators were satisfied with the code
and gave it the green light for integration into the kernel. Numerous
details and background information on the Linux kernel's new WiMAX
infrastructure can be found in the e-mail mentioned above, by following
the links at the end of this article to commits in the source code
administration system, and on the Linux WiMAX website.
Also, on the website you can download the
i2400m firmware
and the corresponding userspace software. However, the Intel WiMAX
binary supplicant needed for authentication with the remote host, as
well as the Intel WiMAX binary OMADM client are only available online
as a pre-compiled archive (license, FAQ).
Therefore, distributions based solely on open source software, such as
Debian, Fedora and OpenSuse, will not yet include these parts of the
userspace stack in their core distributions. However, in the e-mail
mentioned above, Intel developers do say "For networks that require
authentication (most), the Intel device requires a supplicant in user
space - because of a set of issues we are working to resolve, it cannot
be made open source yet, but it will".
See Part 1 of What's new in 2.6.29.
The WiMAX Changes in detail

  i2400m: debugfs controls
  i2400m: documentation and instructions for usage
  i2400m: firmware loading and bootrom initialization
  i2400m: Generic probe/disconnect, reset and message
  i2400m: host/device procotol and core driver definitions
  i2400m: linkage to the networking stack
  i2400m: Makefile and Kconfig
  i2400m: RX and TX data/control paths
  i2400m/SDIO: firmware upload backend
  i2400m/SDIO: header for the SDIO subdriver
  i2400m/SDIO: probe/disconnect, dev init/shutdown and
reset backends
  i2400m/SDIO: TX and RX path backends
  i2400m/USB: firmware upload backend
  i2400m/USB: header for the USB bus driver
  i2400m/USB: probe/disconnect, dev init/shutdown and
reset backends
  i2400m/USB: TX and RX path backends
  i2400m/usb: wrap USB power saving in #ifdef CONFIG_PM
  i2400m: various functions for device management
  wimax: basic API: kernel/user messaging, rfkill and
  wimax: debugfs controls
  wimax: debug macros and debug settings for the WiMAX
  wimax: documentation for the stack
  wimax: export linux/wimax.h and linux/wimax/i2400m.h
with headers_install
  wimax: fix kconfig interactions with rfkill and input
  wimax: generic device management (registration,
deregistration, lookup)
  wimax: headers for kernel API and user space interaction
  wimax/i2400m: add CREDITS and MAINTAINERS entries
  wimax: internal API for the kernel space WiMAX stack
  wimax: Makefile, Kconfig and docbook linkage for the

Further background and information about developments in the
Linux kernel and its environment can also be found in previous issues
of the kernel log at heise open:

Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
Kernel Log: 2.6.29 development kicks off, improved 3D support
Kernel Log: Higher and Further, The innovations of Linux 2.6.28
Kernel Log: What's coming in 2.6.28 - Part 9: Fastboot and other remainders
Kernel Log: What's coming in 2.6.28 - Part 7: architecture support, memory
subsystem and virtualisation
Kernel Log: What's coming in 2.6.28 - Part 6: Changes to the audio drivers
Older Kernel logs can be found in the
archives or by using the search function at heise open.

[linuxkernelnewbies] Kernel Log: main development phase for 2.6.29 ends, new X.org drivers - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh


Kernel Log: main development phase for 2.6.29 ends, new X.org drivers
With the release of
on Saturday night, Linus Torvalds has closed the 2.6.29 merge window
and brought to a close the development phase, during which the major
new features for the next version of Linux are adopted. All significant
changes in 2.6.29 should now be in the Linux source code management
system, including new features previously discussed on heise open such
as WiMAX, access
point support and the Btrfs and Squashfs
file systems.
These changes are just some of the more conspicuous changes adopted
by the kernel hackers for 2.6.29. Support has been added for
kernel-based mode setting on Intel graphics hardware and improvements
have been made to the Graphics Execution Manager (GEM), which was
integrated with
The SCSI subsystem now supports Fibre Channel over Ethernet (FCoE) and
there are fixes to, and new functions in, the eCryptfs, Ext4, OCFS2 and
XFS file systems. There are also numerous new and revised drivers,
including new or revised audio drivers from the Alsa project and over
600 changes to the V4L/DVB drivers. These are now joined by various, in
some cases very large, staging drivers, such as the Comedi
framework, or support for Google's Android. heise open's Kernel Log
will carry detailed reports on these and other changes over the next
few weeks as part of our "What's coming in 2.6.29" series.
The realtime defragmenter (online ext4 defragmentation) has
not made it into 2.6.29; Theodore Tso explains why on LKML. Also left out, for the time
being, are support for operation as a primary Xen domain (Dom0) and compression
of the kernel image with bzip2/lzma. It looks
like it could also be a while before support for kernel-based mode
setting with AMD hardware meets the kernel development team's quality
All about X.org
AMD developer Alex Deucher has released version 6.10
of the xf86-video-ati driver package, usually known simply as ati or
radeon. It includes support for the RV710 (Radeon HD 4300/HD 4500) and
RV730 (Radeon HD 4600) Radeon chips. The new version also reduces
tearing during video playback and supports Bicubic Xv scaling on
r3xx/r4xx/r5xx/rs690 Radeon chips. The developer discusses further
changes on his blog.
Matthias Hopf has now released the AtomBIOS disassembler previously
used for programming the alternative Radeon graphics driver radeonhd.
He describes some of the background to the tool on his
The X.org developers have also released
version 1.4.0 of the xf86-input-mouse mouse driver. This driver deals
with many of the tasks previously dealt with by X server, and the code
responsible for this has been removed from X server, with the result
that in X server 1.6, currently under development, users will, unless
their systems use Evdev, need at least version 1.4.0 of
In Brief:

  Following LWN.net's occasional publication of analysis of which
kernel developers have, for instance, introduced the most or the
largest changes into a kernel version (e.g. 1, 2,
3), Wang Chen has been trying his hand at a similar set
of online statistics.

  SELinux hacker James Morris has announced the creation of the Kernel Security Wiki on his
blog, where he has also recently summarised
all the most significant security-related changes in Linux 2.6.28.

  As part of the discussion on the adoption of Squashfs,
Greg Kroah-Hartman has declared that he will in future accept file
systems into the staging directory, as long as they do not require
changes in other parts of the kernel.

  Daniel Phillips is continuing to work on Tux3 and is keeping the
developer community updated on new features or internal matters in his
"Tux3 Report"; a recent e-mail to LKML, for example, elucidates the
current structure of the file system.

  The kernel development team are planning to hold a "Linux Storage and Filesystem Summit 2009" in San
Francisco in early April.

  A group of developers are working on open source firmware
for some of the Broadcom WLAN chips supported under Linux by the b43
driver; this firmware does not, however, appear to work for all
testers. Marvell has made WLAN firmware for the GSPI-88W8686 available to download, but has not released the
source code.

  As reported elsewhere, Nvidia has
released version 180.22 of its proprietary graphics driver for x86-32 and x86-64 Linux.

Further background and information about developments in the
Linux kernel and its environment can also be found in previous issues
of the kernel log at heise open:

Kernel Log: What's new in 2.6.29 - Part 2: WiMax
Kernel Log: What's new in 2.6.29 - Part 1: Dodgy Wifi drivers and AP support
Kernel Log: 2.6.29 development kicks off, improved 3D support
Kernel Log: Higher and Further, The innovations of Linux 2.6.28
Kernel Log: What's coming in 2.6.28 - Part 9: Fastboot and other

[linuxkernelnewbies] Kernel Log: What's new in 2.6.29 - Part 3: Kernel controlled graphics modes - The H: Security news and Open source developments

2009-04-26 Thread Peter Teoh


Kernel Log: What's new in 2.6.29 - Part 3: Kernel controlled
graphics modes
With the release of 2.6.29-rc1
last weekend, Linus Torvalds concluded the first phase, called the
merge window, of the development cycle. This phase allows for
incorporating the substantial changes intended for the next kernel
version into the source code management system
of the Linux kernel. As a result, 2.6.29 is now in the second,
stabilising phase, which usually takes eight to ten weeks and gives the
kernel developers the opportunity to correct mistakes and make minor
changes that are unlikely to cause further flaws. As major changes are
only rarely discarded during the stabilising phase, the kernel log can
already discuss the most important changes expected for 2.6.29 in the
"What's new in 2.6.29" series.
Kernel-based mode setting
Almost 21 months after its
first major announcement,
the support for kernel-based mode setting (KMS) for recent Intel
graphics hardware has been integrated into the main development branch
of Linux (for example 1, 2, 3).
This technology gives the kernel noticeably more control over the
graphics hardware. When KMS is active, the kernel sets the graphics
mode suitable for a monitor as soon as all the required hardware
components (ACPI, PCI, graphics hardware etc.) have been initialised.
From a user's perspective, this approach is initially no different
from framebuffer graphics with suitable drivers. However, in contrast
to framebuffer graphics, the kernel also sets the screen resolution
during operation, taking over this, and other tasks, from the X server.
If the X server and a text console, managed with KMS, use the same
screen resolution, the kernel no longer needs to reset the graphics
chip and screen resolution when switching between the graphics
interface and the console; this was previously required every time the
user switched to X and VGA text or framebuffer consoles, because the
kernel didn't know the X Server's configuration of the graphics chip.
As a result, switching with KMS (for example while booting, when the X
server first starts up) is considerably faster and is no longer
afflicted by screen flickering or short display disruptions.
Because the kernel controls the graphics hardware in KMS, problems
that arise when the VGA console and framebuffer driver, the Direct
Rendering Manager (DRM) and various userspace programs, including the X
server, compete for access to the graphics hardware, can be eliminated.
With KMS, when waking up from suspend mode, the kernel also handles the
entire graphics hardware re-initialisation, which is designed to solve
some of the problems with using the suspend modes.
With KMS, X servers will reportedly also operate without root
privileges; this and several other improvements associated with KMS are
to facilitate the parallel operation of several X servers, allowing
users to switch backwards and forwards (fast user switching). KMS will
also allow Linux to snatch control from the X server in case of a
serious kernel problem (kernel panic) and display troubleshooting
instructions similar to those displayed for the dreaded blue screen in
Windows; some developers have talked about a "Blue
Penguin Of Death", but this isn't possible with the code
incorporated in 2.6.29.
To avoid hardware access disagreements between the X server and the
kernel, the X server and its graphics driver must also support KMS.
However, X and kernel hacker Dave Airlie, who is responsible for the
kernel's DRM code, explicitly says in his patch integration request
that these parts are still being developed and currently are only
intended for developers; therefore, KMS should not be enabled during
kernel configuration, without the required userspace support.
It is likely to be some time until the kernel is ready for KMS with
Radeon hardware: Although the KMS code for Radeon GPUs is already
available, it is based on the TTM Memory Manager rather than the more
recent Graphics Execution Manager (GEM) incorporated
with 2.6.28, which so far is geared to work with Intel hardware.
However, according
to Dave Airlie
the TTM code is not mature enough to be integrated into the official
kernel yet. It will probably be even longer until KMS becomes available
with a standard kernel and Nvidia hardware, unless the developers of
the Nouveau driver,
which was created using reverse engineering, can pull some mature KMS
code out of their hats, or Nvidia decides to provide KMS support. The
latter is particularly likely to improve the reliability of the suspend
modes, which often malfunction with the open source drivers for Nvidia
More graphics
The Graphics Execution Manager (GEM), which is still set to work
with Intel hardware and manages the main memory as well as the access
to the GPU's processing units, has been extended to include new
features in 2.6.29 (1, 2).
Several further 

[linuxkernelnewbies] Direct Memory Access (DMA) and Interrupt Handling

2009-04-26 Thread Peter Teoh


DMA and Interrupt Handling
In this series on hardware basics, we have already looked at read
and write bus cycles. In this article we will cover Direct Memory Access
(DMA) and Interrupt Handling. Knowledge of DMA and interrupt handling
would be
useful in writing code that interfaces directly with IO devices (DMA
based serial port design pattern is a good example of such a
We will discuss the following topics:


Direct Memory Access (DMA)
  A typical DMA operation is described here.
Interactions between the main CPU and DMA device are covered. The
impact of DMA on the processor's internal cache is also covered.

Interrupt Handling
  Processor handling of hardware interrupts is
described in this section.

Interrupt Acknowledge Cycle
  Many processors allow the interrupting hardware
device to identify itself. This speeds up interrupt handling as the
processor can directly invoke the interrupt service routine for the
right device.

Requirements for DMA and Interrupts
  Software designers need to keep in mind that DMA
operations can be triggered at a bus cycle boundary while interrupts can
only be triggered at an instruction boundary.


Direct Memory Access (DMA)

  Device wishing to perform DMA asserts the processor's bus request
signal.
  Processor completes the current bus cycle and then asserts the
bus grant signal to the device.
  The device then asserts the bus grant ack signal.
  The processor senses the change in the state of bus grant ack
signal and starts listening to the data and address bus for DMA
  The DMA device performs the transfer from the source to
destination address.
  During these transfers, the processor monitors the addresses on
the bus and checks if any location modified during DMA operations is
cached in the processor. If the processor detects a cached address on
the bus, it can take one of the two actions:

  Processor invalidates the internal cache entry for the
address involved in DMA write operation
  Processor updates the internal cache when a DMA write is

  Once the DMA operations have been completed, the device releases
the bus by asserting the bus release signal.
  Processor acknowledges the bus release and resumes its bus cycles
from the point it left off.
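
The two cache actions in the list above can be illustrated with a small C model. This is only a sketch: the direct-mapped cache, its size, and every name in it (cpu_cache, cache_fill, snoop_dma_write, SNOOP_INVALIDATE, SNOOP_UPDATE) are assumptions made for illustration and do not describe any particular processor.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINES 8u   /* hypothetical tiny direct-mapped cache */

enum snoop_policy { SNOOP_INVALIDATE, SNOOP_UPDATE };

struct cache_line {
    bool     valid;
    uint32_t addr;   /* the full address doubles as the tag in this toy model */
    uint32_t data;
};

struct cpu_cache {
    struct cache_line line[CACHE_LINES];
};

/* CPU side: place a word in the cache, e.g. after a read miss. */
void cache_fill(struct cpu_cache *c, uint32_t addr, uint32_t data)
{
    assert(c != NULL);
    struct cache_line *l = &c->line[addr % CACHE_LINES];
    l->valid = true;
    l->addr  = addr;
    l->data  = data;
}

/* Returns true and stores the cached word in *out on a hit. */
bool cache_lookup(const struct cpu_cache *c, uint32_t addr, uint32_t *out)
{
    const struct cache_line *l = &c->line[addr % CACHE_LINES];
    if (!l->valid || l->addr != addr)
        return false;
    *out = l->data;
    return true;
}

/* Called for every DMA write the processor observes on the bus. */
void snoop_dma_write(struct cpu_cache *c, enum snoop_policy policy,
                     uint32_t addr, uint32_t new_data)
{
    struct cache_line *l = &c->line[addr % CACHE_LINES];
    if (!l->valid || l->addr != addr)
        return;                 /* address is not cached: nothing to do */
    if (policy == SNOOP_INVALIDATE)
        l->valid = false;       /* the next CPU read misses and refetches */
    else
        l->data = new_data;     /* keep the line and refresh its contents */
}
```

With SNOOP_INVALIDATE the stale line is dropped and memory is read again on the next access; with SNOOP_UPDATE the line stays valid and carries the value written by the DMA device.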

Interrupt Handling
Here we describe interrupt handling in a scenario where the hardware
does not
support identifying the device that initiated the interrupt. In such
cases, the
possible interrupting devices need to be polled in software.

  A device asserts the interrupt signal at a hardwired interrupt
level.
  The processor registers the interrupt and waits to finish the
current instruction execution.
  Once the current instruction execution is completed, the
processor initiates the interrupt handling by saving the current
register contents on the stack.
  The processor then switches to supervisor mode and initiates an
interrupt acknowledge cycle.
  No device responds to the interrupt acknowledge cycle, so the
processor fetches the vector corresponding to the interrupt level.
  The address found at the vector is the address of the interrupt
service routine (ISR).
  The ISR polls all the devices to find the device that caused the
interrupt. This is accomplished by checking the interrupt status
registers on the devices that could have triggered the interrupt.
  Once the device is located, control is transferred to the handler
specific to the interrupting device.
  After the device specific ISR routine has performed its job, the
ISR executes the "return from interrupt" instruction.
  Execution of the "return from interrupt" instruction results in
restoring the processor state. The processor is restored back to user
mode.
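
The polling step performed by the ISR might look roughly like the following C sketch. The device structure, the IRQ_PENDING status bit, and the function names are hypothetical; real hardware would be read through memory-mapped or port I/O registers instead of a plain struct field.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IRQ_PENDING 0x1u   /* hypothetical "interrupt pending" status bit */

struct device {
    volatile uint32_t status;             /* stands in for a status register */
    void (*handler)(struct device *dev);  /* device-specific service routine */
    int serviced;                         /* bookkeeping for this example */
};

/* Example device handler: acknowledge the interrupt and count it. */
void generic_handler(struct device *dev)
{
    dev->status &= ~IRQ_PENDING;
    dev->serviced++;
}

/*
 * Shared ISR for one interrupt level: poll every candidate device and
 * pass control to the handler of each device that reports a pending
 * interrupt. Returns the number of devices serviced.
 */
int shared_isr(struct device *devs, size_t ndevs)
{
    int handled = 0;
    assert(devs != NULL || ndevs == 0);
    for (size_t i = 0; i < ndevs; i++) {
        if (devs[i].status & IRQ_PENDING) {
            devs[i].handler(&devs[i]);
            handled++;
        }
    }
    return handled;
}
```

The cost of this scheme grows with the number of devices sharing the interrupt level, since each status register has to be read before the right handler is found.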

Interrupt Acknowledge Cycle
Here we describe interrupt handling in a scenario where the hardware
supports identifying the device that initiated the interrupt. In such
cases, the
exact source of the interrupt can be identified at hardware level.

  A device asserts the interrupt signal at a hardwired interrupt
level.
  The processor registers the interrupt and waits to finish the
current instruction execution.
  Once the current instruction execution is completed, the
processor initiates the interrupt handling by saving the current
register contents on the stack.
  The processor then switches to supervisor mode and initiates an
interrupt acknowledge cycle.
  The interrupting device responds to the interrupt acknowledge
cycle with the vector number for the interrupt.
  Processor uses the vector number obtained above and fetches the
vector.
  The address found at the vector is the address of the interrupt
service routine (ISR) for the interrupting device.
  After the ISR routine has performed its job, the ISR executes
the "return from interrupt" instruction.
  Execution of the "return 
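
When the device supplies a vector number, dispatch reduces to a table lookup. The C sketch below models that idea in software; the table size of 256 entries and the names vector_table, install_isr, dispatch_interrupt, and uart_isr are assumptions for illustration, not part of any specific processor's interface.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_VECTORS 256u   /* assumed size of the vector table */

typedef void (*isr_t)(void);

static isr_t vector_table[NUM_VECTORS];

/* Example ISR for a hypothetical serial port; it only counts invocations. */
int uart_isr_calls;
void uart_isr(void)
{
    uart_isr_calls++;
}

/* Install an ISR for a vector number. Returns 0 on success, -1 if out of range. */
int install_isr(unsigned vector, isr_t isr)
{
    assert(isr != NULL);
    if (vector >= NUM_VECTORS)
        return -1;
    vector_table[vector] = isr;
    return 0;
}

/*
 * Models the step after the acknowledge cycle: the vector number returned
 * by the device selects the ISR directly, with no polling.
 * Returns 0 if an ISR ran, -1 for an unassigned or invalid vector.
 */
int dispatch_interrupt(unsigned vector)
{
    if (vector >= NUM_VECTORS || vector_table[vector] == NULL)
        return -1;
    vector_table[vector]();
    return 0;
}
```

Compared with the polled scheme, the lookup takes the same time regardless of how many devices can raise interrupts.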

[linuxkernelnewbies] taskset - retrieve or set a process’s CPU affinity

2009-04-26 Thread Peter Teoh

Linux User’s Manual   TASKSET(1)

   taskset - retrieve or set a process’s CPU affinity

   taskset [options] mask command [arg]...
   taskset [options] -p [mask] pid

   taskset  is used to set or retrieve the CPU affinity of a
running process given its PID or to launch a new COMMAND with a given
CPU affinity.  CPU affinity
   is a scheduler property that "bonds" a process to a given set of
CPUs on the system.  The Linux scheduler will honor the given CPU
affinity and the process
   will not run on any other CPUs.  Note that the Linux scheduler
also supports natural CPU affinity: the scheduler attempts to keep
processes on the same CPU
   as long as practical for performance reasons.  Therefore,
forcing a specific CPU affinity is useful only in certain applications.

   The CPU affinity is represented as a bitmask, with the lowest
order bit corresponding to the first logical CPU and the highest order
bit  corresponding  to
   the  last logical CPU.  Not all CPUs may exist on a given system
but a mask may specify more CPUs than are present.  A retrieved mask
will reflect only the
   bits that correspond to CPUs physically on the system.  If an
invalid mask is given (i.e., one that corresponds to no valid CPUs on
the current system)  an
   error is returned.  The masks are typically given in
hexadecimal.  For example,

   0x00000001
  is processor #0

   0x00000003
  is processors #0 and #1

   0xFFFFFFFF
  is all processors (#0 through #31)
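
The bit layout described above can be checked with a few lines of C. This is a minimal sketch limited to 32 CPUs; the helper names cpus_to_mask and cpu_range_to_mask are hypothetical and are not part of taskset.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build a 32-bit affinity mask with one bit set per listed CPU number. */
uint32_t cpus_to_mask(const unsigned *cpus, size_t n)
{
    uint32_t mask = 0;
    for (size_t i = 0; i < n; i++) {
        assert(cpus[i] < 32);            /* this sketch covers CPUs #0..#31 only */
        mask |= (uint32_t)1 << cpus[i];  /* bit 0 is the first logical CPU */
    }
    return mask;
}

/* Mask covering the inclusive CPU range first..last. */
uint32_t cpu_range_to_mask(unsigned first, unsigned last)
{
    uint32_t mask = 0;
    assert(first <= last && last < 32);
    for (unsigned cpu = first; cpu <= last; cpu++)
        mask |= (uint32_t)1 << cpu;
    return mask;
}
```

Processor #0 alone yields 0x00000001, processors #0 and #1 yield 0x00000003, and the range #0 through #31 yields 0xFFFFFFFF.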

   When taskset returns, it is guaranteed that the given program
has been scheduled to a legal CPU.

   -p, --pid
  operate on an existing PID and not launch a new task

   -c, --cpu-list
  specify  a  numerical  list  of processors instead of a
bitmask.  The list may contain multiple items, separated by comma, and
ranges.  For example, 0,5,7,9-11.

   -h, --help
  display usage information and exit

   -V, --version
  output version information and exit

   The default behavior is to run a new command with a given
affinity mask:
  taskset mask command [arguments]

   You can also retrieve the CPU affinity of an existing task:
  taskset -p pid

   Or set it:
  taskset -p mask pid

   A user must possess CAP_SYS_NICE to change the CPU affinity of a
process.  Any user can retrieve the affinity mask.
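
A program can also set and read its own affinity without going through taskset. The sketch below uses the Linux-specific glibc calls sched_setaffinity() and sched_getaffinity() with the CPU_* macros from <sched.h>; the wrapper names (pin_self_to_cpu, first_allowed_cpu, allowed_cpu_count) are hypothetical, and this is not a description of how taskset itself is implemented.

```c
#define _GNU_SOURCE          /* needed for cpu_set_t and the CPU_* macros */
#include <assert.h>
#include <sched.h>

/* Pin the calling process to a single CPU.
 * Returns 0 on success, -1 on error (errno is set by sched_setaffinity). */
int pin_self_to_cpu(int cpu)
{
    cpu_set_t set;
    assert(cpu >= 0 && cpu < CPU_SETSIZE);
    CPU_ZERO(&set);
    CPU_SET(cpu, &set);
    return sched_setaffinity(0, sizeof(set), &set);   /* pid 0 = caller */
}

/* Return the lowest-numbered CPU the caller may run on, or -1 on error. */
int first_allowed_cpu(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
        if (CPU_ISSET(cpu, &set))
            return cpu;
    return -1;
}

/* Number of CPUs in the caller's current affinity mask, or -1 on error. */
int allowed_cpu_count(void)
{
    cpu_set_t set;
    if (sched_getaffinity(0, sizeof(set), &set) != 0)
        return -1;
    return CPU_COUNT(&set);
}
```

Calling pin_self_to_cpu(first_allowed_cpu()) narrows the caller's mask to one CPU, after which allowed_cpu_count() should report 1.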

   Written by Robert M. Love.


Description of problem:

Install Fedora 9 (sorry, I cannot find the entry for filing an F9 bug report).
Set the affinity of one task to a single CPU core, then take that core offline.
After that, the core cannot be brought online again.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1.  run the test.sh script:   test.sh 1  ( 1 is the logical CPU )
test.sh script:


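#!/bin/bash
# The redirections and the lines marked "reconstructed" were lost in
# extraction and are restored here as assumptions.

main() {
    typeset -i CPU=$1
    ./task.sh > /dev/null &
    PID=$!                                  # reconstructed: PID of task.sh

    if [ "$(cat /sys/devices/system/cpu/cpu${CPU}/online)" = "0" ]; then
        echo "1" > /sys/devices/system/cpu/cpu${CPU}/online
    fi

    MASK=$(printf '%x' $((2 ** CPU)))       # reconstructed: hex mask for CPU

    taskset -p ${MASK} ${PID} > /dev/null 2>&1

    # Take the CPU offline, then try to bring it back online.
    echo "0" > /sys/devices/system/cpu/cpu${CPU}/online

    echo "1" > /sys/devices/system/cpu/cpu${CPU}/online

    disown $PID
    kill -9 $PID > /dev/null 2>&1

    echo "PASS"
}

typeset -i TEST_CPU=$1
main $TEST_CPU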

2. task.sh script as following


#!/bin/bash
# The loop body was lost in extraction; an empty busy loop is assumed here.
while :
do
    :
done

Actual results:

test.sh blocks when setting the CPU back online (echo "1" >
/sys/devices/system/cpu/cpu${CPU}/online).

Expected results:

Additional info:
Happened in Intel Bensley platform (2xXeon 2.83G Harpertown C0, chipset
Blackford G1, 160 SATA)  
  --- Comment #1 From Bill Nottingham 2008-02-18 13:07:03 EDT ---
Does this happen on the upstream kernel as well?  
  --- Comment #2 From Song, Youquan 2008-02-21 03:59:48 EDT ---
Yes. Kernels 2.6.24 to 2.6.25-rc2 also exhibit the bug,
but the bug does not exist in kernel 2.6.18.
  --- Comment #3 From Chuck Ebbert 2008-02-25 17:33:21 EDT ---
Does the CPU mask of the running process get changed when the processor is
taken offline?

And can you get a system state (alt-sysrq-t) when the script hangs?  
  --- Comment #4 From Song, Youquan 2008-02-27 04:15:50 EDT ---
Yes. After I take the CPU offline, the commands "taskset -p $PID" and
"ps --pid=$PID -o psr" show that the process's CPU mask has changed and that
the process has migrated to another CPU correctly.
The attachment is Screenshot.png.
  --- Comment #5 From Song, Youquan 2008-02-27 04:18:13 EDT ---
Created an attachment (id=296037) [details]
CPU can not  do hotplug when one