[linuxkernelnewbies] HotNets-VII: Technical Program

2009-08-26 Thread Peter Teoh




http://conferences.sigcomm.org/hotnets/2008/program.html


  

  
  Seventh ACM Workshop on Hot Topics in Networks (HotNets-VII)
  Calgary, Alberta, Canada
  October 6-7, 2008



Technical Program

The HotNets-VII workshop will have a two-day technical program,
similar to past workshops. The full program is provided below.


All technical sessions take place in the Husky Oil
Great Hall in the Rozsa Centre at the University of Calgary.
Wireless Internet access is available via the AirUC network.


Continental breakfast is available at
the Quality Inn hotel starting at 7:00am each day, and also
at the Husky Oil Great Hall starting at 8:00am each day (your choice!).
Morning and afternoon coffee breaks will be provided next
to the meeting area.
Daily lunches and the Monday evening reception will take place in the
Blue Room at the Dining Centre, University of Calgary.



All HotNets 2008 papers in one PDF (20 MB)

 Monday, October 6 
7:30am: Registration Desk Open (Husky Oil Great Hall, Rozsa Centre)

8:00am: Continental Breakfast

8:45am: Welcome and Opening Remarks

9:00am: Session 1: Router Hardware (Chair: Alex Snoeren)

Summary 

  

Rethinking Packet Forwarding Hardware

Martin Casado (Nicira Networks), Teemu Koponen (HIIT), Daekyeong Moon
(UCB), Scott Shenker (UCB and ICSI) 
  

API Design Challenges for Open Router Platforms on Proprietary Hardware

Jeffrey C. Mogul (HP Labs), Praveen Yalagandula (HP Labs), Jean
Tourrilhes (HP Labs), Rick McGeer (HP Labs), Sujata Banerjee (HP Labs),
Tim Connors (HP Labs), Puneet Sharma (HP Labs) 
Slides (PDF, 175 KB)
  

10:00am: Break

10:30am: Session 2: Wireless I (Chair: Majid Ghaderi)

Summary 

  

Interference Avoidance and Control

Ramakrishna Gummadi (MIT), Rabin Patra (UC Berkeley), Hari Balakrishnan
(MIT), Eric Brewer (UC Berkeley) 
Slides (PPT, 7.8 MB)
  
  

Wireless ACK Collisions Not Considered Harmful

Prabal Dutta (University of California, Berkeley), Razvan Musaloiu-E.
(Johns Hopkins University), Ion Stoica (University of California,
Berkeley), Andreas Terzis (Johns Hopkins University) 
Slides (PDF, 650 KB)
  
  

Message in Message (MIM): A Case for Shuffling Transmissions in
Wireless Networks

Naveen Kumar Santhapuri (University of South Carolina), Justin
Manweiler (Duke University), Souvik Sen (Duke University), Romit Roy
Choudhury (Duke University), Srihari Nelakuditi (University of South
Carolina), Kamesh Munagala (Duke University) 
Slides (PPT, 1.1 MB)
  

12 noon: Lunch (Blue Room, Dining Centre)

1:30pm: Session 3: Potpourri (Chair: Nick Feamster)

Summary 

  

Can You Fool Me? Towards Automatically Checking Protocol Gullibility

Milan Stanojevic (UCLA), Ratul Mahajan (Microsoft Research), Todd
Millstein (UCLA), Madanlal Musuvathi (Microsoft Research) 
Slides (PDF, 630 KB)
  
  

Revisiting Smart Dust with RFID Sensor Networks

Michael Buettner (University of Washington), Ben Greenstein (Intel
Research Seattle), Alanson Sample (University of Washington), Joshua R.
Smith (Intel Research Seattle), David Wetherall (Intel Research
Seattle, University of Washington) 
Slides (PDF, 1.7 MB)
  
  

Eat All You Can in an All-you-can-eat Buffet: A Case for Aggressive
Resource Usage

Ratul Mahajan (Microsoft Research), Jitu Padhye (Microsoft Research),
Ramya Raghavendra (Microsoft Research), Brian Zill (Microsoft Research)

Slides (PDF, 420 KB)
  

3:00pm: Break

3:30pm: Session 4: Cloud and Data Centers (Chair: Martin
Arlitt)

Summary 

  

Plugging Into Energy Market Diversity

Asfandyar Qureshi (MIT)

Slides (PDF, 2.7 MB)
  
  

On Delivering Embarrassingly Distributed Cloud Services

Kenneth Church (Microsoft), Albert Greenberg (Microsoft), James
Hamilton (Microsoft) 
Slides (PPTX, 3.2 MB)
  
  

Dr. Multicast: Rx for Datacenter Communication Scalability

Ymir Vigfusson (Cornell University), Hussam Abu-Libdeh (Cornell
University), Mahesh Balakrishnan (Cornell University), Ken Birman
(Cornell University), Yoav Tock (IBM Haifa Research Lab) 
Slides (PPT, 740 KB)
  

5:00pm: End of Technical Sessions for Day 1

5:30pm: Evening Reception (2 hours, Blue Room, Dining Centre)
 Tuesday, October 7 
8:00am: Continental Breakfast

8:30am: Session 5: Wireless II (Chair: Carey Williamson)

Summary 

  

How to Evaluate Exotic Wireless Routing Protocols?

Dimitrios Koutsonikolas (Purdue University), Y. Charlie Hu (Purdue
University), Konstantina Papagiannaki (Intel Research, Pittsburgh) 
Slides (PPT, 650 KB)
  
  

Wireless Networks Should Spread Spectrum Based on Demands

Ramakrishna

[linuxkernelnewbies] Thursday 2009-08-20: SIGCOMM CONFERENCE: Performance Optimization (Chair: Ratul Mahajan, Microsoft Research)

2009-08-26 Thread Peter Teoh





http://eurosys.org/blog/?p=246

A blog where winners of the EuroSys travel
grants (and a few others) can report on conferences they attended





Thursday 2009-08-20: SIGCOMM CONFERENCE: Performance Optimization
(Chair: Ratul Mahajan, Microsoft Research)

Session 9: Performance Optimization (Chair: Ratul Mahajan, Microsoft
Research)
———-
Safe and Effective Fine-grained TCP Retransmissions for Datacenter
Communication
Vijay Vasudevan (Carnegie Mellon University), Amar Phanishayee
(Carnegie Mellon University), Hiral Shah (Carnegie Mellon University),
Elie Krevat (Carnegie Mellon University), David Andersen (Carnegie
Mellon University), Greg Ganger (Carnegie Mellon University), Garth
Gibson (Carnegie Mellon University and Panasas, Inc), Brian Mueller
(Panasas, Inc.)
———-
TCP has a problem in data centers: a dropped packet takes 200 ms to be
retransmitted.
Some applications cannot tolerate that.
Solution: enable millisecond (and finer) retransmissions
- improves throughput/latency in the datacenter
- remains safe for the wide area
Datacenter environment: 10-100 microsecond RTTs, 1-10 Gbps links
Under heavy load, packet loss is common.
One TCP timeout is roughly 1000x the RTT.
The scenario involves the client sending a single request packet
once in a while. This runs contrary to TCP's design assumption of a
full window of packets in flight, so fast retransmit does not get
triggered on packet loss.
Solution:
1) eliminate the long 200 ms minimum timeout
2) TCP must track RTT in microseconds
Interaction with delayed ACK
- the reduction is not as large
Stability? Could this cause congestion collapse?
- Today's TCP has mechanisms to cope with that
Q: Is this a problem for congestion control?
A: Exponential backoff takes care of that.
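
For reference, the standard SRTT/RTTVAR estimator (RFC 6298) can be run
in microseconds just as well as in coarse jiffies; a minimal sketch with
made-up sample values, not the authors' implementation:

#include <stdio.h>

/* Sketch of the usual RTO estimation carried out in microseconds --
 * the change the paper argues for. RTO = SRTT + 4 * RTTVAR, with no
 * 200 ms floor. Names and sample values here are illustrative. */
struct rtt_state {
	double srtt_us;
	double rttvar_us;
};

static double update_rto_us(struct rtt_state *s, double sample_us)
{
	if (s->srtt_us == 0.0) {                    /* first sample */
		s->srtt_us   = sample_us;
		s->rttvar_us = sample_us / 2.0;
	} else {
		double err = sample_us > s->srtt_us ? sample_us - s->srtt_us
						    : s->srtt_us - sample_us;
		s->rttvar_us = 0.75  * s->rttvar_us + 0.25  * err;
		s->srtt_us   = 0.875 * s->srtt_us   + 0.125 * sample_us;
	}
	return s->srtt_us + 4.0 * s->rttvar_us;     /* can now be a few hundred us */
}

int main(void)
{
	struct rtt_state s = { 0.0, 0.0 };
	double samples[] = { 120, 95, 110, 400, 130 };  /* microseconds */

	for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
		printf("RTO after sample %u: %.0f us\n", i, update_rto_us(&s, samples[i]));
	return 0;
}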
Tags: Sigcomm09






[linuxkernelnewbies] Technical Training by Dashcourses - PCIe Fundamentals

2009-08-27 Thread Peter Teoh




http://www.dashcourses.com/public-courses/public-courses/pcie-fundamentals.html


  

  PCIe Fundamentals 
 
 

  

Public Training Course - Taught Live OnLine

  

  
  Course Details

  Date:     Nov 2nd-5th
  Time:     Noon - 5pm EST
  Location: Online Webinar
  Tuition:  $995

Description
This online PCI Express technical
training course covers the PCI-SIG's PCI Express Base
Specification, including the version 2.0 changes and enhancements. Emphasized
material includes the details of the PCI Express protocol stack
for Express devices: protocol layer functions and formats,
transaction details, and configuration requirements. Also presented are
the 2.0 additions of the Trusted Configuration Environment, Trusted
Configuration Access Method, Trusted Computing Group, and Trusted
Platform Module (TCE, TCAM, TCG, and TPM). Legacy and native PCI
Express devices, power management, and fabric topology, which
provide the ability to allocate bandwidth and support isochronous
applications, will also be discussed.
PCI Express is a next-generation
PCI enhancement, and it is here to stay. Express is a serial
interconnect I/O technology with speed, protocol,
and capability enhancements well beyond PCI and PCI-X. Express is an
extension of the PCI Base Specification and maintains binary backwards
compatibility with previous versions of the PCI and PCI-X
specifications. 
Objectives 
In this course you will:

  Learn about PCI Express fabric topology, the terms and
definitions 
  Understand the PCI Express protocol including layer definitions
and layer relationships 
  Traffic types defined by PCI Express and the meaning and usage of
isochronous traffic 
  The PCI Express definition of configuration 
  Compatibility requirements with PCI and PCI-X, and PCI Express
new enhanced features 
  Understand
the parallel/serial paradigm shift and PCI Express capabilities
relative to other serial hardware/software architects 
  Serial protocol analyzer usage in design and debug, validation,
and testing 

Prerequisites 
Participants should have attended the Dashcourses
three-day PCI/PCI-X
course or have a good working understanding of PCI 2.2 or later.
Knowledge of the related PCI supporting specifications as defined by
the PCI Special Interest Group (PCI SIG) is helpful but not required. 
Outline 
PCI Express Architectural Overview
This
chapter will provide the only review of PCI/PCI-X; only the essential
concepts relative to PCI Express will be presented. Performance
enhancements (real and hyped) will be discussed and explained. All the
PCI Express architectural components will be defined. The chapter will
serve as an overview of the rest of the class with specific details
explained in related chapters.

  Next Generation PCI 
  
Compatibility with Existing PCI Specification 

  PCI 
  PCI-X 
  PCI Compatibility Software 
  
Ability to enumerate and configure PCI Express hardware
using PCI system configuration software with no modifications
  

Performance Enhancements 

  Low-overhead, low-latency point-to-point communications 
  Support for different traffic types 
  Isochronous traffic support 
  Support for differentiated services 
  Hot Plug and Hot Swap Support 
  Multi-hierarchy topology Support

PCI Express System Architecture 

  High speed serial interconnect 
  PCI Express Protocol Stack 
  Device types 
  Legacy and native Power Management Support 
  INTx Emulation and MSI Support 
  Error Signaling and Logging 
  Virtual Channel Support 

  

Serial/Parallel Paradigm
This
chapter is not part of the specification, but an explanation of the major
differences between parallel and high-speed serial interconnects.
Clocking and signal differences, and diagnostic tools, are emphasized. A
comparison between different serial protocols is presented.

  Serial vs. Parallel Communications 
  
Signal and Clocking 
PCI Express Protocol Stack 
OSI Model and Layered Protocols 

  Ethernet, Fibre Channel, TCP/IP, InfiniBand, and PCI Express
Comparison

Serial Switches 
Serial Protocol Analyzers
  

Physical Layer
This
chapter begins the explanation of the PCI Express Protocol Stack,
starting at the bottom of the stack with the physical layer. Bits,
bytes, symbols, clocking, wire bit rate and effective bit rate, and
basic electrical/mechanical requirements are presented.

  Physical Layer 
  
Physical Layer Function and Services 
Logical Sub-Block 

  Symbol Encoding 
  
Symbols, Symbol Types, and Special Character Sets
  
  8B/10B Decode Rules 
  Framing and Application of Symbols to Lanes 
   

[linuxkernelnewbies] Startup state of a Linux/i386 ELF binary

2009-08-27 Thread peter teoh





http://asm.sourceforge.net/articles/startup.html

Startup state of a Linux/i386 ELF binary
Copyright (C) 1999-2000 by Konstantin
Boldyshev

All information provided here has been derived
from my own research,
so mistakes and deficiencies could exist.
If you find any -- please contact me.

Contents

  1. Introduction
  2. Overview
  3. Stack layout
  4. Registers
     4.1 Linux 2.0
     4.2 Linux 2.2
  5. Other info
  6. Summary
  7. Contact

1. Introduction

The objective of this document is to describe several startup process
details
and the initial state of the stack & registers of an ELF binary
program,
for Linux Kernel 2.2.x and 2.0.x on i386.

Portions of material represented here may be
applicable
to any ELF-based IA-32 OS (FreeBSD, NetBSD, BeOS, etc).

Please note that in the general case you can apply this
information
only to plain assembly programs (gas/nasm);
some things described here (stack/register state) are not true
for anything compiled/linked with gcc (C as well as assembly) --
gcc inserts its own startup code which is executed before control
is passed to the main() function.

The main source and authority for the information provided below
is the Linux kernel's fs/binfmt_elf.c file.
If you want all the details of the startup process -- go read it.

All assembly code examples use nasm syntax.

You can download the program suite that was used while writing this
document at Linux Assembly (binaries, source).

2. Overview

Every program is executed by means of the sys_execve() system call;
usually one just types the program name at the shell prompt.
In fact a lot of interesting things happen after you press enter.
Briefly, the startup process of an ELF binary can be represented
with the following step-by-step figure:

  

  Function                   Kernel file                    Comments
  shell                      ...                            on user side one types in program name and strikes enter
  execve()                   ...                            shell calls libc function
  sys_execve()               ...                            libc calls kernel...
  sys_execve()               arch/i386/kernel/process.c     arrive to kernel side
  do_execve()                fs/exec.c                      open file and do some preparation
  search_binary_handler()    fs/exec.c                      find out type of executable
  load_elf_binary()          fs/binfmt_elf.c                load ELF (and needed libraries) and create user segment
  start_thread()             include/asm-i386/processor.h   and finally pass control to program code

Figure 1. Startup process of an ELF binary.

The layout of the segment created for an ELF binary can be represented
with Figure 2.
The yellow parts represent the corresponding program sections.
Shared libraries are not shown here; their layout duplicates the layout
of the program,
except that they reside at earlier addresses.
0x08048000
  code           .text section
  data           .data section
  bss            .bss section
  ...            free space
  stack          stack (described later)
  arguments      program arguments
  environment    program environment
  program name   filename of program (duplicated in arguments section)
  null (dword)   final dword of zero
0xBFFFFFFF

Figure 2. Segment layout of an ELF binary.

A program takes at least two pages of memory (1 page == 4 KB),
even if it consists of a single sys_exit():
at least one page for the ELF data (yellow color),
and one for the stack, arguments, and environment.
The stack grows toward .bss;
you can also use memory beyond the .bss section for dynamic data
allocation.

Note: this information was gathered from
fs/binfmt_elf.c, include/linux/sched.h
(task_struct.addr_limit),
and core dumps investigated with the ultimate binary viewer
biew.

3. Stack layout
The initial stack layout is very important, because it provides access
to the command line and environment of a program.

Here is a picture of what is on the stack when a program is launched:

  

  argc                             [dword] argument counter (integer)
  argv[0]                          [dword] program name (pointer)
  argv[1] ... argv[argc-1]         [dword] program args (pointers)
  NULL                             [dword] end of args (integer)
  env[0] env[1] ... env[n]         [dword] environment variables (pointers)
  NULL                             [dword] end of environment (integer)

Figure 3. Stack layout of an ELF binary.
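
A quick user-space way to confirm this layout (a minimal C sketch, not
part of the original article; the third argument to main() is a common
extension on Linux):

#include <stdio.h>

/* On Linux the environment pointers sit immediately after the NULL
 * that terminates argv, exactly as Figure 3 shows, so envp should
 * equal argv + argc + 1. */
int main(int argc, char **argv, char **envp)
{
	printf("argc        = %d\n", argc);
	printf("argv        = %p\n", (void *)argv);
	printf("argv[argc]  = %p (terminating NULL)\n", (void *)argv[argc]);
	printf("envp        = %p (should be argv + argc + 1 = %p)\n",
	       (void *)envp, (void *)(argv + argc + 1));
	return 0;
}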
Here is the piece of source from kernel that proves it:
fs/binfmt_elf.c create_elf_tables()
	...

	put_user((unsigned long) argc, --sp);
	current->mm->arg_start = (unsigned long) p;
	while (argc-- > 0) {
		put_user(p, argv++);
		while (get_user(p++))	/* nothing */
			;
	}
	put_user(0, argv);
	current->mm->arg_end = current->mm->env_start = (unsigned long) p;
	while (envc-- > 0) {
		put_user(p, envp++);
		while (get_user(p++))	/* nothing */
			

[linuxkernelnewbies] core dump generator in linux kernel

2009-08-27 Thread peter teoh





http://www.isec.pl/vulnerabilities/isec-0023-coredump.txt

Synopsis:  Linux kernel ELF core dump privilege elevation
Product:   Linux kernel
Version:   2.2 up to and including 2.2.27-rc2, 2.4 up to and including
           2.4.29, 2.6 up to and including 2.6.11
Vendor:    http://www.kernel.org/
URL:       http://isec.pl/vulnerabilities/isec-0023-coredump.txt
CVE:       CAN-2005-1263
Severity:  local(9)
Author:    Paul Starzetz
Date:      May 11, 2005
Updated:   May 12, 2005


Issue:
==

A locally exploitable flaw has been found in the Linux ELF binary format
loader's core dump  function  that  allows  local  users  to  gain  root
privileges and also execute arbitrary code at kernel privilege level.


Details:


The Linux kernel contains a binary format loader layer to load (execute)
programs in different binary formats like ELF  or  a.out.  Some  of  the
binary  format  modules  like  ELF provide an additional function to the
kernel layer named core_dump(). The kernel may call this function  if  a
fault  (e.g.  memory  access  error)  occurs during the execution of the
binary. The core_dump() function will be called by the  kernel,  if  the
process's limit for the core file (RLIMIT_CORE) is sufficiently high and
the process's binary format supports core dumping.

The regular task of the core_dump() function is to  create  an  on  disk
image  of  the  faulty  binary  at the moment of the execution fault for
debugging purposes. In the case of an ELF binary, the image will contain
a  memory  fingerprint  of  the  binary, its registers and moreover some
kernel level structures  containing  the  kernel  state  of  the  faulty
process.

An analysis of the ELF loader's elf_core_dump() function from binfmt_elf.c
revealed a flaw in the handling of the argument area of an ELF process.
The argument area is the memory region of the process (in user space)
that contains the program arguments at the time of its initial execution
(the argc and argv arguments to the C main() function, and the arg_start
and arg_end fields in the process's memory descriptor).


Discussion:
===========

The vulnerable code resides in fs/binfmt_elf.c in your preferred
version of the Linux kernel source code tree:

static int elf_core_dump(long signr, struct pt_regs * regs, struct file * file)
{
   struct elf_prpsinfo psinfo; /* NT_PRPSINFO */

   /* first copy the parameters from user space */
   memset(&psinfo, 0, sizeof(psinfo));
   {
[*]   int i, len;

  len = current->mm->arg_end - current->mm->arg_start;
[**]  if (len >= ELF_PRARGSZ)
 len = ELF_PRARGSZ-1;
[1167]copy_from_user(&psinfo.pr_psargs,
   (const char *)current->mm->arg_start, len);

where  the  line numbers are all valid for the 2.4.30 kernel version. As
can be seen from [*] the len variable supplied to  the  copy_from_user()
function  is signed and can potentially take a negative value. That will
let the check [**] pass  (since  the  ELF_PRARGSZ  constant  is  defined
signed  the  check will be performed with signed arithmetic) and cause a
kernel stack buffer overflow. Note that a negative  length  provided  to
copy_from_user()  will  be interpreted as a very high positive byte copy
count, since the length argument of  the  copy_from_user()  function  is
defined unsigned itself.
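
To see the promotion at work outside the kernel, a tiny stand-alone C
sketch (the constant and variable names mirror the kernel code, but
this is only an illustration, not the kernel path itself):

#include <stdio.h>

#define ELF_PRARGSZ 80   /* stand-in for the kernel constant */

int main(void)
{
	/* Imagine arg_end < arg_start, as the advisory describes. */
	unsigned long arg_start = 0x1000, arg_end = 0x0f00;
	int len = arg_end - arg_start;    /* wraps to a small negative value */

	if (len >= ELF_PRARGSZ)           /* signed compare: negative len slips through */
		len = ELF_PRARGSZ - 1;

	/* copy_from_user() takes an unsigned count, so the negative
	 * value becomes an enormous number of bytes to copy. */
	printf("len as int    = %d\n", len);
	printf("len as size_t = %zu\n", (size_t)len);
	return 0;
}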

However,  there  is at least one difficulty - how could the len argument
become negative? A fast grep through the source code  reveals  that  the
arg_start/end  fields are set only during execution of a new program. In
case of ELF this is performed in the create_elf_tables() subroutine from
binfmt_elf.c,  so  that  in theory those fields are always reset to safe
values. Paradoxically,  there  is  a  flaw  in  the  create_elf_tables()
function,  that  can  permit  a  binary to "inherit" old values from the
preceding binary (during binary execution the task descriptor as well as
the memory descriptor are kept). A look at the code in question reveals:

static elf_addr_t *
create_elf_tables(char *p, int argc, int envc,
struct elfhdr * exec,
unsigned long load_addr,
unsigned long load_bias,
unsigned long interp_load_addr, int ibcs)
{
   current->mm->arg_start = (unsigned long) p;
   while (argc-->0) {
  __put_user((elf_caddr_t)(unsigned long)p,argv++);
  len = strnlen_user(p, PAGE_SIZE*MAX_ARG_PAGES);
  if (!len || len > PAGE_SIZE*MAX_ARG_PAGES)
[239]return NULL;
  p += len;
   }
   __put_user(NULL, argv);
   current->mm->arg_end = current->mm->env_start = (unsigned long) p;

Obviously it is possible  to  return  from  create_elf_tables()  without
setting  arg_end  (but  with  arg_start  set  to  a  new  value), if the
strnlen_user()  function  fails  to  count  the  length  of  the  binary
argument(s)  supplied.  If  the  arg_start value becomes hig

[linuxkernelnewbies] SourceForge.net: pyvix - Project Web Hosting - Open Source Software

2009-08-28 Thread Peter Teoh





http://pyvix.sourceforge.net/

 Project Information 
 About this project: 
 This is the  pyvix  project ("pyvix") 
pyvix is a Python wrapper for the VMWare(R) VIX C API that allows
Python to programmatically control VMWare(R) virtual machines. Example
operations include: powering on; suspending; creating, reverting to,
and removing snapshots; and running programs. 





[linuxkernelnewbies] Somniloquy: Maintaining network connectivity while your computer sleeps - Microsoft Research

2009-08-29 Thread Peter Teoh




http://research.microsoft.com/apps/pubs/default.aspx?id=70560

Somniloquy: Maintaining network connectivity while
your computer sleeps
Yuvraj Agarwal, Steve Hodges, James Scott, Ranveer
Chandra, Victor Bahl, and Rajesh Gupta
1 March 2008
Reducing
the energy consumption of computers is becoming increasingly important
with rising energy costs and environmental concerns. It is doubly
important for mobile devices, whose battery lifetime is always an
issue. Sleep states such as S3 (suspend) save energy but make it
impossible to communicate directly with a device across a network.
Therefore, many people do not use S3 and instead leave their computers
plugged in and active. Somniloquy enables devices to be configured so
that they may be awoken from S3 based on specified network traffic,
such as remote-desktop sessions and file-transfer requests. With
Somniloquy, remote servers, the network, and even applications running
on a device do not have to be modified or specially configured. We
present a prototype implementation of Somniloquy using a USB
peripheral, which is therefore easily retrofitted to existing
computers. Our prototype achieves a ten-fold increase in battery
lifetime compared to an idle computer not in S3, while only adding 4-7s
of latency to respond to applicationlayer events. Our system allows
computers to appear alwayson when they are in fact talking in their
sleep. 
 
 Please cite the subsequent version of this paper at
http://research.microsoft.com/apps/pubs/default.aspx?id=79419





[linuxkernelnewbies] Index of /1/items/RECON2008/

2009-08-29 Thread Peter Teoh





http://ia311337.us.archive.org/1/items/RECON2008/

Index of /1/items/RECON2008/

  
Name
Last Modified
Size
Type
  
  



  RECON2008.thumbs/
  2008-Dec-01 21:43:50
  -  
  Directory


  RECON2008-T01_Pierre-Marc_Bureau-How_I_learned_Reverse_Engineering_with_Storm.avi
  2008-Jul-19 17:08:03
  208.3M
  video/x-msvideo


  RECON2008-T02-Bruce_Dang-Methods_for_analyzing_malicious_Office_documents.avi
  2008-Jul-19 18:05:59
  191.4M
  video/x-msvideo


  RECON2008-T03-Ilfak_Guilfanov-Building_plugins_for_IDA_Pro.avi
  2008-Jul-19 19:39:57
  312.5M
  video/x-msvideo


  RECON2008-T04-Thomas_Garnier-Windows_privilege_escalation_through_LPC_and_ALPC_interfaces.avi
  2008-Jul-19 20:33:58
  165.9M
  video/x-msvideo


  RECON2008-T05-Nicolas_Pouvesle-NetWare_kernel_stack_overflow_exploitation.avi
  2008-Jul-19 21:33:00
  195.7M
  video/x-msvideo


  RECON2008-T06-Cameron_Hotchkies-Under_the_iHood.avi
  2008-Jul-19 22:09:46
  120.6M
  video/x-msvideo


  RECON2008-T07-Jason_Raber-Helikaon_Linux_Debuger.avi
  2008-Jul-19 23:16:40
  207.3M
  video/x-msvideo


  RECON2008-T08-Craig_Smith-Creating_Code_Obfuscation_Virtual_Machines.avi
  2008-Jul-20 00:03:48
  158.4M
  video/x-msvideo


  RECON2008-T09-Eric_D_Laspe-The_Deobfuscator.avi
  2008-Jul-20 00:37:18
  112.7M
  video/x-msvideo


  RECON2008-T09-Eric_D_Laspe-The_Deobfuscator_512kb.mp4
  2008-Dec-02 09:52:44
  40.6M
  video/mp4


  RECON2008-T10-Nicolas_Brulez-Polymorphic_Virus_Analysis.avi
  2008-Jul-20 01:31:32
  182.8M
  video/x-msvideo


  RECON2008-T11-Michael_Strangelove-Hacking_Culture.avi
  2008-Jul-20 04:26:15
  582.1M
  video/x-msvideo


  RECON2008-T12-Anthony_de_Almeida_Lopes-Bypassing_Security_Protections_by_Backdooring_libc.avi
  2008-Jul-20 04:58:38
  107.8M
  video/x-msvideo


  RECON2008-T13-Alexander_Sotirov-Blackbox_Reversing_Of_XSS_Filters.avi
  2008-Jul-20 06:28:13
  300.1M
  video/x-msvideo


  RECON2008-T14-Aaron_Portnoy_and_Ali_Rizvi-Santiago-Reverse_Engineering_Dynamic_Languages_a_Focus_on_Python.avi
  2008-Jul-20 14:01:55
  173.4M
  video/x-msvideo


  RECON2008-T15-Sharon_Conheady_and_Alex_Bayly-Social_Engineering_for_the_Socially_Inept.avi
  2008-Jul-20 07:51:17
  271.7M
  video/x-msvideo


  RECON2008-T15-Sharon_Conheady_and_Alex_Bayly-Social_Engineering_for_the_Socially_Inept.gif
  2008-Dec-01 20:30:30
  640.4K
  image/gif


  RECON2008-T16-Pablo_Sole-RE_over_Adobe_Acrobat_Reader_using_Immunity_Debugger.avi
  2008-Jul-20 08:35:37
  147.7M
  video/x-msvideo


  RECON2008-T16-Pablo_Sole-RE_over_Adobe_Acrobat_Reader_using_Immunity_Debugger_512kb.mp4
  2008-Dec-01 21:14:44
  38.2M
  video/mp4


  RECON2008-T17-Gera-Two_very_small_reverse_engineering_tools.avi
  2008-Jul-20 09:18:25
  143.9M
  video/x-msvideo


  RECON2008-T18-Tiller_Beauchamp-RE_Trace-Applied_Reverse_Engineering_on_OS_X.avi
  2008-Jul-20 10:46:24
  294.7M
  video/x-msvideo


  RECON2008-T18-Tiller_Beauchamp-RE_Trace-Applied_Reverse_Engineering_on_OS_X.gif
  2008-Dec-01 21:47:35
  650.6K
  image/gif


  RECON2008-T19-Sebastien_Doucet-64-bit_Imports_Rebuilding_and_Unpacking-Part1.avi
  2008-Jul-20 12:13:31
  292.5M
  video/x-msvideo


  RECON2008-T19-Sebastien_Doucet-64-bit_Imports_Rebuilding_and_Unpacking-Part2.avi
  2008-Jul-20 12:59:27
  153.4M
  video/x-msvideo


  RECON2008_files.xml
  2008-Dec-02 23:18:18
  19.9K
  application/xml


  RECON2008_meta.xml
  2008-Jul-28 21:34:33
  4.2K
  application/xml


  RECON2008_reviews.xml
  2008-Oct-13 17:09:50
  0.5K
  application/xml

  






[linuxkernelnewbies] Linux Kernel Power Management

2009-08-30 Thread Peter Teoh






Linux Kernel Power Management 

29 April 2003

Patrick Mochel


Abstract

Power management is the process by which the overall consumption of
power by a computer is limited based on user requirements and
policy. Power management has become a hot topic in the computer world
in recent years, as laptops have become more commonplace and users
have become more conscious of the environmental and financial effects
of limited power resources. 

While there is no such thing as perfect power management, since all
computers must use some amount of power to run, there have been many
advances in system and software architectures to conserve the amount
of power being used. Exploiting these features is key to providing
good system- and device-level power management. 

This paper discusses recent advances in the power management
infrastructure of the Linux kernel that will allow Linux to fully
exploit the power management capabilities of the various platforms
that it runs on. These advances will allow the kernel to provide
equally great power management, using a simple interface, regardless
of the underlying architecture. 

This paper covers the two broad areas of power management - System
Power Management (SPM) and Device Power Management (DPM). It describes
the major concepts behind both subjects and describes the new kernel
infrastructure for implementing both. It also discusses the mechanism for
implementing hibernation, otherwise known as suspend-to-disk, support
for Linux. 


Overview


Benefits of Power Management

A sane power management infrastructure provides many benefits to the
kernel, and not only in the obvious areas. 

Battery-powered devices, such as embedded devices, handhelds, and
laptops reap most of the rewards of power management, since the more
conservative the draw on the battery is, the longer it will last.

System power management decreases the boot time of a system by restoring
previously saved state instead of reinitializing the entire
system. This conserves battery life on mobile devices and spares the user
the annoying wait for the computer to boot into a usable state.

Recently, power management concepts have begun to filter into less
obvious places, like the enterprise. In a rack of servers, some
servers may power down during idle times, and power back up when
needed again to fulfill network requests. While the power consumption
of a single server is but a drop in the water, being able to conserve
the power draw of dozens or hundreds of computers could save a company
a significant amount of money. 

Also, at the lower-level, power management may be used to provide
emergency reaction to a critical system state, such as crossing a
pre-defined thermal threshold or reaching a critically low battery
state. The same concept can be applied when triggering a critical
software state, like an Oops or a BUG() in the kernel. 



System and Device Power Management

There are two types of power management that the OS must handle -
System Power Management and Device Power Management. 

Device Power Management deals with the process of placing individual
devices into low-power states while the system is running. This allows
a user to conserve power on devices that are not currently being used,
such as the sound device in my laptop while I write this paper. 

Individual device power management may be invoked explicitly on
devices, or may happen automatically after a device has been idle for
a set amount of time. Not all devices support run-time power
management, but those that do must export some mechanism for
controlling it in order to execute the user's policy decisions. 
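
In today's kernels that mechanism takes the form of per-device
suspend/resume callbacks. A minimal sketch using the current struct
dev_pm_ops interface (the interface has changed since the 2.5 kernel
this paper targets, so this is a present-day illustration rather than
the paper's code; the mydev_* names are hypothetical):

#include <linux/device.h>
#include <linux/pm.h>

/* Hypothetical driver callbacks: quiesce the device on suspend,
 * bring it back on resume. Returning 0 reports success. */
static int mydev_suspend(struct device *dev)
{
	/* stop DMA, save volatile registers, put the hardware to sleep */
	return 0;
}

static int mydev_resume(struct device *dev)
{
	/* re-program registers and wake the hardware back up */
	return 0;
}

static const struct dev_pm_ops mydev_pm_ops = {
	.suspend = mydev_suspend,
	.resume  = mydev_resume,
};
/* a platform/PCI/USB driver would then point its .driver.pm at &mydev_pm_ops */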


System Power Management is the process by which the entire system is
placed into a low-power state. There are several power states that a
system may enter, depending on the platform it is running on. Many are
similar across platforms, and will be discussed in detail later. The
general concept is that the state of the running system is saved
before the system is powered down, and restored once the system has
regained power. This prevents the system from performing an entire
shutdown and startup sequence.

System power management may be invoked for a number of reasons. It may
automatically enter a low-power state after it has been idle for some
amount of time, after a user closes a lid on a laptop, or when some
critical state has been reached. These are also policy decisions that
are up to the user to configure and require some global mechanism for
controlling.



Device Power Management


Device power management in the kernel is made possible by the new
driver model in the 2.5 kernel. In fact, the driver model was inspired
by the requirement to implement decent power management in the kernel.
The new driver model allows generic kernel code to communicate with every
device in the system, regardless of the bus the device resides on, or
the class it belongs to. 

The driver model also provides a hierarchical representation of the
devices in the system. This is

[linuxkernelnewbies] The Old Joel on Software Forum: Part 1 (of 5) - AVL Trees vs. Red-Black Trees?

2009-09-01 Thread peter teoh




http://discuss.fogcreek.com/joelonsoftware1/default.asp?cmd=show&ixPost=22948&ixReplies=15

AVL Trees vs. Red-Black Trees? 
Hi,
I worked through the chapter in Introduction to Algorithms by Cormen et
al on red-black trees. One of the problems at the end discusses AVL
trees.

Which, overall, is better to use and why? I didn't see much comparison
between the two, and a Google search didn't yield anything satisfying.

If I'm missing something very obvious, I apologize.
 Warren Henning 
Tuesday, December 17, 2002 
I'm
not sure that it'll be much info for you but trees in Java Collections
are implemented using red-black trees. Which probably doesn't mean a
lot. 
 Evgeny Goldin 
Tuesday, December 17, 2002 
So are std::map and std::set in the STL. 
 Ivan-Assen Ivanov 
Tuesday, December 17, 2002 
Creating a tree which balances nodes will tend to improve the search
time for any random term within the tree.

So the red-black tree is meant to provide a O(log n) time.  This is
true so long as it remains balanced, and is true only from the root of
the tree.

Any insertions, unless also balanced, will tilt the tree or subtree. 
Re-balancing the tree is supposed to also take O(log n) time, which I
find dubious, or at any rate counter-intuitive.

Regardless of that, a red-black tree's performance will degrade
depending upon the degree to which insertion and rotation about a node take
place, so I would not use it for a dynamic index.  In situations such as
a view of fixed data, or a search to fixed data it would be the optimum
layout. 
 Simon Lucy 
Tuesday, December 17, 2002 
RB-Trees are, as well as AVL trees, self-balancing. Both of them
provide O(log n) lookup and insertion performance.
The difference is that RB-Trees guarantee O(1) rotations per insert
operation. That is what actually costs performance in real
implementations.
Simplified, RB-Trees gain this advantage from conceptually being 2-3
trees without carrying around the overhead of dynamic node structures.
Physically RB-Trees are implemented as binary trees, the
red/black-flags simulate 2-3 behaviour. 
 Alex 
Tuesday, December 17, 2002 
It's
been a long time since any data structure lectures, but my
understanding is that in an AVL tree the difference between the
shortest and longest path from the root to any leaf is at most one. 
In a red-black tree the difference can be a factor of 2.

Both of these give O(log n) for look up, but balancing an AVL tree can
require O(log n) rotations, whilst a red black tree will take at most
two rotations to bring it into balance (though it may have to examine
O(log n) nodes to decide where the rotations are necessary).  The
rotations themselves are O(1) operations since you are just moving
pointers around.

You might also want to look up 2-3-4 trees, or more generally B trees
for more balanced tree data structures. 
 Rob Walker 
Tuesday, December 17, 2002 
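
For reference, the rotations being discussed really are just a constant
number of pointer updates; a minimal left-rotation sketch in C (generic
binary-search-tree code, not tied to either balancing scheme):

#include <stddef.h>

struct node {
	struct node *left, *right, *parent;
	int key;
};

/* Rotate the subtree rooted at x to the left: x's right child y takes
 * x's place and x becomes y's left child. Only a constant number of
 * pointers change, which is why a single rotation is O(1). */
static void rotate_left(struct node **root, struct node *x)
{
	struct node *y = x->right;

	x->right = y->left;
	if (y->left)
		y->left->parent = x;

	y->parent = x->parent;
	if (!x->parent)
		*root = y;              /* x was the root */
	else if (x == x->parent->left)
		x->parent->left = y;
	else
		x->parent->right = y;

	y->left = x;
	x->parent = y;
}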
Actually,
AVL trees only guarantee that, for each node in the tree, the heights
of its subtrees differ by at most one. Since the height is defined by
the longest path to a leaf, it makes no guarantees about the ratio
between the shortest path to a leaf and the longest. In fact, it is
possible to generate a tree with the AVL property whose
shortest/longest path ratio is arbitrarily bad.

In the worst case, the constant factors for search can be much worse
for an AVL tree than a comparably sized RB Tree. On the other hand, AVL
trees tend to be simpler to implement. 
 Devil's Advocate 
Tuesday, December 17, 2002 
"Since
the height is defined by the longest path to a leaf, it makes no
guarantees about the ratio between the shortest path to a leaf and the
longest. In fact, it is possible to generate a tree with the AVL
property whose shortest/longest path ratio is arbitrarily bad."

Actually, I think I am wrong about this part. At the very least, I want
to think about it more before I feel comfortable asserting it. 
 Devil's Advocate 
Tuesday, December 17, 2002 
If
you require a balanced tree type data structure I recommend taking a
look at Splay trees.  Splay trees are cool because they are relatively
easy to implement, and they don't require storing any additional
information at each node.

They offer O(logN) amortised performance.  This means that a single
action (insert/update/delete) may require more than O(logN) operations,
but in the long run the average number of operations per action is
O(logN).

You could say that Splay trees adapt to the input data, which I think
is pretty neat. 
 Peter McKenzie 
Tuesday, December 17, 2002 
In
fact, we need not compare between the longest and the shortest path of a
tree to measure its performance. What we need, instead, is the tree's
longest path, or height, relative to the number of its elements, say N.

We all know that a perfectly balanced tree has (log N) height. It may
be optimal to search, but is however too rigid to use if we wanted to
insert or delete an item.

A Red-Black tree has 2(log N) height, considering the fact 

[linuxkernelnewbies] jcm’s blog » Cloning a Fedora rawhide virtual machine

2009-09-02 Thread Peter Teoh





http://www.jonmasters.org/blog/category/general/posts-related-to-ongoing-work-on-the-fedora-project/


Cloning
a Fedora rawhide virtual machine
Saturday, August
8th, 2009
 
Setting up a clone of
a Fedora rawhide virtual machine is so simple…


  Create a new
virtual machine instance
  Stop and then copy
the disk image file for the previous VM
  Boot the new VM in
single user mode
  Edit the
/etc/sysconfig/network file to change the hostname
  Edit the
/etc/sysconfig/network-scripts/ifcfg-eth0 file to change the networking
  Do exactly the
same thing in /etc/udev/rules.d/70-persistent-net.rules
  grep through the
filesystem to see where else network data is duplicated.


Notice how more and
more abstraction of network configuration does
not a simpler system make. At least I don’t care about sound on my
virtual machines, so to avoid that fun I simply delete the sound device
whenever I create a new VM. I never use NetworkManager on boxes with
fixed IPs - somehow I don’t think cloning would get any easier (unless
I used DHCP, which does work here but I prefer being certain the box
has a fixed configuration when used for testing) with that turned on.

Jon.
 






[linuxkernelnewbies] Re: Query related with Book Linux Device Drivers

2009-09-03 Thread Peter Teoh





http://mail.nl.linux.org/kernelnewbies/2004-10/msg00089.html



> Memory Barrier. It enforced in-order execution of
> memory accesses
> either side of this function call - so you guarantee
> everything got
> done that you asked for. Intel people don't (I
> think, but I don't do
> Intel stuff very often) generally have to worry
> about this even on the
> latest cores because any instruction re-ordering is
> transparent to the
> programmer. On RISC based machines such as Sparc
> International (Sun)
> SPARC, AIM PowerPC, IBM POWER, and so on, we have to
> handle the fact
> that memory accesses may be re-ordered by both the
> compiler and/or the
> micro itself in order to streamline the processor
> instruction
> pipeline.
> 

Thanks a lot for explaining this and for your
guidance.
 
> You'll have to provide a reference to the C source
> file where you saw
> these functions. Looks like a test to see what kind
> of memory is at a
> certain address - if you write data to a memory
> location then it will
> change unless it is ROM, but if it is then it will
> return the same
> value every time that you read it. If there is
> nothing at a particular
> address then you will read only random data
> according to whatever
> electical state the memory bus is in - apparently
> sometimes you can
> actually read a valid value back if your memory bus
> has some capacitve
> issue where it acts like a register itself (but
> that's rarely a
> problem these days - it is however enough for me/us
> to take special
> measures in our memory test routines at work).
> 

The source code refers to the examples which comes
along Linux Device Drivers Book [ I downloaded from
site ] and file is skull_init.c.

if ((oldval^newval) == 0xff) {  /* we re-read our change: it's ram */
	printk(KERN_INFO "%lx: RAM\n", add);
	continue;
}

if ((oldval^newval) != 0) {  /* random bits changed: it's empty */
	printk(KERN_INFO "%lx: empty\n", add);
	continue;
}
 
I am getting what you are trying to say. As far as
ROM is concerned, we can never write to ROM, so how
can we detect that the memory is part of ROM?

unsigned char oldval, newval;   /* values read from memory */
unsigned long flags;            /* used to hold system flags */
unsigned long add, i;
void *base;

/* Use ioremap to get a handle on our region */
base = ioremap(ISA_REGION_BEGIN, ISA_REGION_END - ISA_REGION_BEGIN);
base -= ISA_REGION_BEGIN;       /* Do the offset once */

/* probe all the memory hole in 2KB steps */
for (add = ISA_REGION_BEGIN; add < ISA_REGION_END; add += STEP) {
	/*
	 * Check for an already allocated region.
	 */
	if (check_mem_region(add, 2048)) {
		printk(KERN_INFO "%lx: Allocated\n", add);
		continue;
	}
	/*
	 * Read and write the beginning of the region and see what happens.
	 */
	save_flags(flags);
	cli();
	oldval = readb(base + add);   /* Read a byte */
	writeb(oldval^0xff, base + add);
	mb();
	newval = readb(base + add);
	writeb(oldval, base + add);
	restore_flags(flags);

	if ((oldval^newval) == 0xff) {  /* we re-read our change: it's ram */
		printk(KERN_INFO "%lx: RAM\n", add);
		continue;
	}
	if ((oldval^newval) != 0) {  /* random bits changed: it's empty */
		printk(KERN_INFO "%lx: empty\n", add);
		continue;
	}
}

I got the point: if the value we read back matches what we wrote
(oldval ^ newval == 0xff), then our change stuck and hence it is RAM.

if ((oldval^newval) == 0xff) {  /* we re-read our change: it's ram */
   printk(KERN_INFO "%lx: RAM\n", add);
   continue;
}

And if oldval ^ newval is non-zero (but not 0xff), then random bits
changed and hence the region is empty [ which has been explained by you ].

if ((oldval^newval) != 0) {  /* random bits changed: it's empty */
   printk(KERN_INFO "%lx: empty\n", add);
   continue;
}

Jon, could you please suggest a book on assembly
language, as that will help me play with bits and
bytes.

Regards
Dinesh






[linuxkernelnewbies] Re: NMI between switch_mm and switch_to: msg#00603 linux-kernel

2009-09-03 Thread Peter Teoh





http://osdir.com/ml/linux-kernel/2009-08/msg00603.html


Ingo Molnar writes:

> * Peter Zijlstra  wrote:
> 
> > On Tue, 2009-07-28 at 14:49 +1000, Paul Mackerras wrote:
> >
> > > Ben H. suggested there might be a problem if we get a
PMU 
> > > interrupt and try to do a stack trace of userspace in
the 
> > > interval between when we call switch_mm() from 
> > > sched.c:context_switch() and when we call
switch_to(). If we 
> > > get an NMI in that interval and do a stack trace of
userspace, 
> > > we'll see the registers of the old task but when we
peek at user 
> > > addresses we'll see the memory image for the new
task, so the 
> > > stack trace we get will be completely bogus.
> > > 
> > > Is this in fact also a problem on x86, or is there
some subtle 
> > > reason why it can't happen there?
> > 
> > I can't spot one, maybe Ingo can when he's back :-)
> > 
> > So I think this is very good spotting from Ben.
> 
> Yeah.
> 
> > We could use preempt notifiers (or put in our own hooks)
to 
> > disable callchains during the context switch I suppose.
> 
> I think we should only disable user call-chains i think - the 
> in-kernel call-chain is still reliable.
> 
> Also, i think we dont need preempt notifiers, we can use a
simple 
> check like this:
> 
> if (current->mm &&
>     cpu_isset(smp_processor_id(), &current->mm->cpu_vm_mask)) {

On x86, do you clear the current processor's bit in cpu_vm_mask when
you switch the MMU away from a task? We don't on powerpc, which would
render the above test incorrect. (But then we don't actually have the
problem on powerpc since interrupts get hard-disabled in switch_mm and
stay hard-disabled until they get soft-enabled.)






[linuxkernelnewbies] Robust futexes - a new approach [LWN.net]

2009-09-05 Thread peter teoh





http://lwn.net/Articles/172149/


Robust futexes - a new approach
[Posted February 15, 2006 by corbet]
 


One of the many features added during the 2.5 development series was
the
"futex" - a sort of fast, user-space mutual exclusion primitive. In the
non-contended case, futexes can be obtained and released with no kernel
involvement at all, making them quite fast. When contention does happen
(one process tries to obtain a futex currently owned by another), the
kernel is called in to queue any waiting processes and wake them up
when
the futex becomes available. When queueing is not needed, however, the
kernel maintains no knowledge of the futex, keeping its overhead low.
There is one problem with keeping the kernel out of the picture,
however.
If a process comes to an untimely end while holding a futex, there is
no
way to release that futex and let other processes know about the
problem.
The SYSV semaphore mechanism - a much more heavyweight facility - has
an
"undo" mechanism which can be called into play in this sort of
situation,
but there is no such provision for futexes. As a result, a few
different
"robust futex" patches have been put together over the past years; LWN looked at one of them in
January,
2004. These patches have tended to greatly increase the cost of
futexes,
however, and none have been accepted into the mainline.

Ingo Molnar, working with Thomas Gleixner and Ulrich Drepper, has
tossed
aside those years' worth of work and, in a couple of days, produced a new robust futex patch
which,
he hopes, will find its way into the mainline. The new patch has the
advantage of being fast, but, as Ingo notes:


 Be warned though - the patchset does things we
normally dont do in Linux, so some might find the approach disturbing.
Parental advice recommended ;-)


The fundamental problem to solve is that the kernel must, somehow, know
about all futexes held by an exiting process in order to release them.
A
past solution has been the addition of a system call to notify the
kernel
of lock acquisitions and releases. That approach defeats one of the
main
features of futexes - their speed. It also adds a record-keeping and
resource limiting problem to the kernel, and suffers from some
problematic
race conditions.

So Ingo's patch takes a different approach. A list of held futexes
is
maintained for each thread, but that list lives in user space. All the
thread has to do is to make a single call to a new system call:


long set_robust_list(struct robust_list_head *head, size_t size);


That call informs the kernel of the location of a linked list of held
futexes in the calling process's address space; there is also a
get_robust_list() call for retrieving that information.
Typically, this call would be made by glibc, and never seen by the
application. Glibc would also take on the task of maintaining the list
of
futexes.
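
As a rough illustration of what that registration looks like at the
system-call level (normally hidden inside glibc, which has already
registered its own per-thread list; the empty-list setup below is
purely illustrative, not code from the patch):

#define _GNU_SOURCE
#include <stdio.h>
#include <linux/futex.h>      /* struct robust_list_head */
#include <sys/syscall.h>
#include <unistd.h>

int main(void)
{
	/* An empty, circular robust list: glibc normally builds and
	 * registers one of these per thread and links held mutexes
	 * into it as they are acquired. */
	static struct robust_list_head head = {
		.list            = { &head.list },
		.futex_offset    = 0,
		.list_op_pending = NULL,   /* the "about to acquire" slot */
	};
	struct robust_list_head *cur;
	size_t len;

	if (syscall(SYS_set_robust_list, &head, sizeof(head)) != 0) {
		perror("set_robust_list");
		return 1;
	}

	/* read it back for the current thread (pid 0 == self) */
	if (syscall(SYS_get_robust_list, 0, &cur, &len) == 0)
		printf("robust list head at %p, length %zu\n",
		       (void *)cur, len);
	return 0;
}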

When a process dies, the kernel looks for a pointer to a user-space
futex
list. Should that pointer be found, the kernel will carefully walk
through
it, bearing in mind that, as a user-space data structure, it could be
accidentally or maliciously corrupt. For each held futex, the kernel
will
release the lock and set it to a special value indicating that the
previous
holder made a less-than-graceful exit. It will then wake a waiting
process, if one exists. That process will be able to see that it has
obtained the lock under dubious circumstances (user-space functions
like
pthread_mutex_lock() are able to return that information) and
take
whatever action it deems to be necessary. The kernel will release a
maximum of one million locks; that keeps the kernel from looping
forever on
a circular list. Given the practical difficulties of making a
million-lock
application work at all, that limit should not constrain anybody for
quite
some time.

There is still a race condition here: if a process dies between the
time it
acquires a lock and when it updates the list, that lock might not be
released by the kernel. Getting around that problem involves a bit of
poor
kernel hacker's journaling. The head of the held futex list contains a
single-entry field which can be used to point to a lock which the
application is about to acquire. The kernel will check that field on
exit,
and, if it points to a lock actually held by the application, that lock
will be released with the others. So, if glibc sets that field before
acquiring a lock (and clears it after the list is updated), all locks
held
by the application will be covered.

The discussion on this patch was just beginning when this article
was
written. There is some concern about having the kernel walking through
user-space data structures; the chances of trouble and security
problems
are certainly higher when that is going on. Other issues may yet come
up
as well. But, since this is clearly not a 2.6.16 feature in any case,
there will be time to talk about them.





[linuxkernelnewbies] Scratchbox

2009-09-05 Thread peter teoh





http://www.scratchbox.org/

Scratchbox
Welcome to the scratchbox.org website, the home of the
cross-compilation toolkit project.
Scratchbox is a cross-compilation toolkit designed to make embedded
Linux application development easier.
It also provides a full set of tools to integrate and cross-compile an
entire Linux distribution.
To find out what it can do, take a look at some of the documentation.
Scratchbox is licensed under GNU General Public
License (GPL).
A brief summary of features:

  
Scratchbox is used by Maemo
development platform (Nokia 770).
But it is not restricted to that use.
  
  
Supports ARM and x86 targets (PowerPC, MIPS and CRIS targets are
experimental)
  
  
Especially Debian is
supported, but Scratchbox has also been used to
cross-compile eg. Slackware for
ARM.
  
  
Provides glibc and uClibc
as C-library choices
  
  
Uses either QEMU
or a real target hardware to execute cross-compiled binaries (extremely
useful when cross-compiling software which uses autoconf & co.)
  


News:
2009-08-25 New releases:
scratchbox 1.0.16
Fixes a configuration issue with customized targets. Downloads from Apophis
download page.


2009-08-05 New releases:
scratchbox 1.0.15
Fixes an issue with copying of host files and upgrades gettext to a
less ancient version. Downloads from Apophis
download page.


2009-08-03 New releases:
apt-https 1.0.9
Includes a fix for the extra long dependency information. Downloads
from Apophis
download page.


2009-06-02 New releases:
doctools 1.0.13
Includes a fix for the tetex/fmtutil issue with five year old source
files. Downloads from Apophis
download page.


2009-06-01 New releases:
doctools 1.0.12, cs2007q3-glibc2.5-* 1.0.12-*
Minor upgrades to doctools devkit, recompiled cs2007q3-glibc2.5-* to
include additional profiling patch for arm-linux-gnueabi. Downloads
from Apophis
download page.


2009-04-29 New release:
apt-https devkit 1.0.7
Yet more fixes for the new apt, upgrading from 1.0.5 and/or 1.0.6 is
strongly recommended. Downloads from Apophis
download page.


2009-04-22 New release:
apt-https devkit 1.0.6
Apt does not want to write log anymore by default. Fixes issue with
returning an error code when target has no log directory. Downloads
from Apophis
download page.


2009-04-09 Scratchbox 1.0.14 and
apt-https 1.0.5, doctools 1.0.11, qemu 0.10.0-0sb5 devkits
Various fixes included for apt, fakeroot and the documentation
tools. A new, separate qemu devkit containing up-to-date user space
qemu only. Downloads from Apophis
download page.





[linuxkernelnewbies] [Fwd: Re: question on sched-rt group allocation cap: sched_rt_runtime_us]

2009-09-05 Thread Peter Teoh







--- Begin Message ---

Hi again:

I am copying my test code here. I am really hoping to get some answers/ 
pointers. If there are whitespace/formatting issues in this mail,  
please let me know. I am using an alternate mailer.


Cheers,

Ani


/* Test code to experiment with the CPU allocation cap for a FIFO RT thread
 * spinning on a tight loop. Yes, you read it right: an RT thread on a
 * tight loop.
 */
#define _GNU_SOURCE

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <limits.h>
#include <assert.h>
#include <pthread.h>
#include <sched.h>

unsigned long reg_count;

void *fifo_thread(void *arg)
{
int core = (int) arg;
int i, j;
cpu_set_t cpuset;
struct sched_param fifo_schedparam;
int fifo_policy;
unsigned long start, end;
unsigned long fifo_count = 0;

CPU_ZERO(&cpuset);
CPU_SET(core, &cpuset);

assert(sched_setaffinity(0, sizeof cpuset, &cpuset) == 0);

/* RT priority 1 - lowest */
fifo_schedparam.sched_priority = 1;
assert(pthread_setschedparam(pthread_self(), SCHED_FIFO,  
&fifo_schedparam) == 0);

start = reg_count;
printf("start reg_count=%lu\n", start);

for(i = 0; i < 5; i++) {
for(j = 0; j < UINT_MAX/10; j++) {
  fifo_count++;
}
}
printf("\nRT thread has terminated\n");
end = reg_count;
printf("end reg_count=%lu\n", end);
printf("delta reg count = %lu\n", end-start);
printf("fifo count = %lu\n", fifo_count);
printf("%% = %f\n", ((float)(end-start)*100)/(float)fifo_count);

return NULL;
}

void *reg_thread(void *arg)
{
int core = (int) arg;
int i, j;
int new_nice;
cpu_set_t cpuset;
struct sched_param fifo_schedparam;
int fifo_policy;
/* let's renice it to highest priority level */
new_nice = nice(-20);
printf("new nice value for regular thread=%d\n", new_nice);
printf("regular thread dispatch(%d)\n", core);

CPU_ZERO(&cpuset);
CPU_SET(core, &cpuset);

assert(sched_setaffinity(0, sizeof cpuset, &cpuset) == 0);

for(i = 0; i < 5; i++) {
  for(j = 0; j < UINT_MAX/10; j++) {
reg_count++;
  }
}
printf("\nregular thread has terminated\n");

return NULL;
}


int main(int argc, char *argv[])
{
char *core_str = NULL;
int core;
pthread_t tid1, tid2;
pthread_attr_t attr;

if(argc != 2) {
fprintf(stderr, "Usage: %s <core>\n", argv[0]);
return -1;
}
reg_count = 0;

core = atoi(argv[1]);

pthread_attr_init(&attr);
assert(pthread_attr_setschedpolicy(&attr, SCHED_FIFO) == 0);
assert(pthread_create(&tid1, &attr, fifo_thread, (void*)core) ==  
0);


assert(pthread_attr_setschedpolicy(&attr, SCHED_OTHER) == 0);
assert(pthread_create(&tid2, &attr, reg_thread, (void*)core) == 0);

pthread_join(tid1, NULL);
pthread_join(tid2, NULL);

return 0;
}

-

From: Anirban Sinha
Sent: Fri 9/4/2009 5:55 PM
To:
Subject: question on sched-rt group allocation cap: sched_rt_runtime_us

Hi Ingo and rest:

I have been playing around with the sched_rt_runtime_us cap that can  
be used to limit the amount of CPU time allocated towards scheduling  
rt group threads. I am using 2.6.26 with CONFIG_GROUP_SCHED disabled  
(we use only the root user in our embedded setup). I have no other CPU  
intensive workloads (RT or otherwise) running on my system. I have  
changed no other scheduling parameters from /proc.


I have written a small test program that:

(a) forks two threads, one SCHED_FIFO and one SCHED_OTHER (this thread  
is reniced to -20) and ties both of them to a specific core.
(b) runs both the threads in a tight loop (same number of iterations  
for both threads) until the SCHED_FIFO thread terminates.
(c) calculates the number of completed iterations of the regular  
SCHED_OTHER thread against the fixed number of iterations of the  
SCHED_FIFO thread. It then calculates a percentage based on that.


I am running the above workload against varying sched_rt_runtime_us  
values (200 ms to 700 ms) keeping the sched_rt_period_us constant at  
1000 ms. I have also experimented a little bit by decreasing the value  
of sched_rt_period_us (thus increasing the sched granularity) with no  
apparent change in behavior.


My observations are listed in tabular form. The two columns are
(rt_runtime_us / rt_period_us) versus (completed iterations of the regular
thread / all iterations of the RT thread, in %):


0.2   100 % (reg thread completed all its iterations).
0.3   73 %
0.4   45 %
0.5   17 %
0.6   0 % (reg thr completely throttled. Never ran)
0.7   0 %

This result kind of baffles me. Even when we cap the RT group to a  
fraction of 0.6 of overall CPU time, the rest 0.4 \should\ still be  
available for running regular threads. So my SCHED_OTHER \should\ make  
some progress as opposed to being completely throttled. Similarly,  
with any fraction less than 0.5, the SCHED_OTHER should complete  
before SCHED_FIFO.


I do not have an easy way to verify my results over the latest kernel  
(2.6.31). 

[linuxkernelnewbies] USB networking - maemo.org wiki

2009-09-05 Thread peter teoh





http://wiki.maemo.org/USB_networking

USB networking

This page describes how the maemo platform can be turned into a USB
network device. The first part describes how to configure the Nokia
tablet as a USB pluggable network device. The second part describes how
to configure various platforms to use the Nokia tablet as a network
device. This article is based loosely on the Maemo 3.x configuring USB networking HOWTO.

You might want to use the tablet as a USB network device to log
into your tablet remotely, or to transfer data from your tablet to
another computer, in a situation where wifi or bluetooth are not an
option. If you wish to connect your tablet to a Linux machine over
TCP/IP, the PC connectivity section in the Maemo SDK
documentation also contains useful information.

WARNING
Currently there is a bug in the g_ether.ko driver of OS2008 (both 4.0.1
and 4.1) which prevents USB networking from working correctly with
Windows machines (but not with Linux machines). See bug
#3243
for details. The bug was introduced somewhere between kernels 2.6.18
and 2.6.21, so Maemo versions based on 2.6.18 kernels (e.g. OS2007 and
earlier) will work.


  

  
  
Contents

1 Tablet USB network configuration
  1.1 USB statusbar plugin
  1.2 Behind the scenes
  1.3 Starting and stopping USB network mode
2 Host USB Network Configuration
  2.1 Windows
    2.1.1 Requirements
    2.1.2 Preparing the Windows host
  2.2 Linux
    2.2.1 Kernel Configuration
    2.2.2 Fedora
    2.2.3 Debian
    2.2.4 Configuring the host as a gateway
    2.2.5 Configuring the host as a bridge
    2.2.6 Configuring the host firewall
  2.3 Testing the connection
  2.4 Known issues
  2.5 Frequently asked questions

Tablet USB network configuration

USB statusbar plugin

The usb-otg-plugin applet lets you set up USB networking tablet-side, and
switch between host and client mode. This is the easy way to do things and
no other tablet-side configuration is required.

Alternatively you could try the usb networking applet found here:

 http://repository.maemo.org/extras-devel/pool/diablo/free/m/maemo-control/


Behind the scenes

In normal circumstances, the USB mass storage driver has control of the
USB hardware. USBNet allows the g_ether network driver to take control of
the USB interface instead.

After installing USB networking, set up a dummy access point by
running the following:

gconftool -s -t string /system/osso/connectivity/IAP/DEFAULT/type DUMMY

You should see a "DEFAULT" connection appear in the connection
manager.

WARNING: Currently there is a bug in Diablo that causes DUMMY connections
not to show up in the connection manager; a semi-official fix is outlined
in bug #3306.


Starting and stopping USB network mode
To easily start & stop USB network mode, place the following
script in /etc/init.d/usbnet on your tablet. To do this, you will need root
access to the device.

While switching between modes by running the script, it is important
to disconnect the USB cable.

#! /bin/sh
#
# Startup script for USBnet (networking, instead of USB Mass Storage behaviour)
# Author: Michael Mlivoncic

PATH=/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin
NAME=usbnet
DESC="USB Networking for Nokia Internet Tablets"
INITFILE=/etc/init.d/$NAME

case "$1" in
start)
umount /media/mmc1
umount /media/mmc2
sleep 2
USBNET="`lsmod | grep g_ether`"
KERNEL_VERSION="`uname -r`"
if [ "$USBNET" = "" ]
then
   echo "Entering Ethernet via USB mode (g_ether)..."
   insmod /mnt/initfs/lib/modules/$KERNEL_VERSION/g_ether.ko
   echo "Waiting, then bringing up the usb0 interface.."
   sleep 1
   /sbin/ifup usb0
else
  echo "Already in Ethernet-via-USB mode..."
  echo "Try ping 192.168.2.15"
fi
    ;;
stop)
 echo "switching back to USB Mass Storage mode..."
 echo "removing module g_ether"
 /sbin/ifdown usb0
 sleep 2
 rmmod g_ether
    ;;
*)
printf "Usage: $INITFILE {start|stop}\n" >&2
exit 1
    ;;
esac

exit 0

This script is quite basic, but can be run automatically at
start-up, or manually using the command:

sudo /etc/init.d/usbnet [start|stop]

to activate or deactivate USB networking.

There is a default USB network interface configuration on Nokia
N800 and 770 tablets. In the file /etc/network/interfaces, you should
see a section which looks like this:

auto usb0
iface usb0 inet static
   address 192.168.2.15
   netmask 255.255

[linuxkernelnewbies] mpatrol

2009-09-06 Thread peter teoh





http://mpatrol.sourceforge.net/


 mpatrol 
 
Overview
 The mpatrol
library is a powerful debugging tool that attempts to diagnose run-time
errors that are caused by the wrong use of dynamically allocated
memory. It acts as a malloc() debugger for debugging dynamic memory
allocations, although it can also trace and profile calls to malloc()
and free() too. If you don't know what the malloc() function or
operator new[] do then this library is probably not for you. You have
to have a certain amount of programming expertise and a knowledge of
how to run a command line compiler and linker before you should attempt
to use this. 
 Along with providing a
comprehensive and configurable log of all dynamic memory operations
that occurred during the lifetime of a program, the mpatrol
library performs extensive checking to detect any misuse of dynamically
allocated memory. All of this functionality can be integrated into
existing code through the inclusion of a single header file at
compile-time. On UNIX and Windows platforms (and AmigaOS when using
GCC) this may not even be necessary as the mpatrol library can
be linked with existing object files at link-time or, on some
platforms, even dynamically linked with existing programs at run-time. 
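As a concrete illustration of the single-header integration described above,
a minimal sketch (assuming the header is installed as mpatrol.h and the
program is linked against the mpatrol library; exact link flags vary by
platform, and the deliberate overflow is only there to give the library
something to report):

/* build sketch (flags may differ per platform): cc -g demo.c -lmpatrol -lbfd */
#include <stdlib.h>
#include <string.h>
#include <mpatrol.h>	/* replaces malloc()/free() with mpatrol's checked versions */

int main(void)
{
	char *buf = malloc(8);

	if (!buf)
		return 1;
	strcpy(buf, "12345678");	/* writes 9 bytes (including the NUL) into an 8-byte block */
	free(buf);
	return 0;	/* the overflow should show up in mpatrol's log file */
}

Run-time behaviour can then be tuned through an environment variable, as
described below, without recompiling.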
 All logging and
tracing output from the mpatrol library is sent to a separate
log file in order to keep its diagnostics separate from any that the
program being tested might generate. A wide variety of library settings
can also be changed at run-time via an environment variable, thus
removing the need to recompile or relink in order to change the
library's behaviour. 
 A file containing a
summary of the memory allocation profiling statistics for a particular
program can be produced by the mpatrol library. This file can
then be read by a profiling tool which will display a set of tables
based upon the accumulated data. The profiling information includes
summaries of all of the memory allocations listed by size and the
function that allocated them and a list of memory leaks with the call
stack of the allocating function. It also includes a graph of all
memory allocations listed in tabular form, and an optional graph
specification file for later processing by the dot graph visualisation
package. 
 A file containing a
concise encoded trace of all memory allocations and deallocations made
by a program can also be produced by the mpatrol library. This
file can then be read by a tracing tool which will decode the trace and
display the events in tabular or graphical form, and also display any
relevant statistics that could be calculated. 
 The mpatrol
library has been designed with the intention of replacing calls to
existing C and C++ memory allocation functions as seamlessly as
possible, but in many cases that may not be possible and slight code
modifications may be required. However, a preprocessor macro containing
the version of the mpatrol library is provided for the purposes
of conditional compilation so that release builds and debug builds can
be easily automated. 
 
Releases
 The mpatrol
library is freely distributable software and is covered by the GNU
Lesser General Public License. The latest version is 1.5.1 and was
released on the 16th of December, 2008. 
 The source code for all
of the mpatrol releases is available for download from the SourceForge
Subversion repository. Prebuilt binaries for specific platforms are
not available here but may be found at other sites. 
 A gzipped tar archive of
the source code for the older 1.4.8 release (including formatted
documentation) can be downloaded from the SourceForge
downloads area. In addition, several people have contributed code
to mpatrol and some of that code has not made it into the
mpatrol distribution. They are available from the same location as
patches that can be applied to the mpatrol 1.4.8 distribution,
but they are normally only useful for specific situations and are
untested. See the mpatrol manual for a description of the
patches. 
 A (not very up-to-date)
list summarising the platforms that the mpatrol library has
been built on (and the major features that are supported on each
platform) is shown here.
Note that it should be fairly easy to build mpatrol on a system
that is not currently supported, but some work may be required to
enable many of the advanced features that mpatrol might have
support for on such a system. 
 
Documentation
 Online documentation for
the latest 1.5.1 release of mpatrol is available here.
Alternatively, you may prefer to download the PDF versions of the mpatrol
manual
and quick
reference card for the 1.5.1 release. Each release of mpatrol
also contains UNIX manual pages for the library functions and
associated command line tools. 






[linuxkernelnewbies] User-space device drivers [LWN.net]

2009-09-06 Thread Peter Teoh





http://lwn.net/Articles/66829/


User-space device drivers
[Posted January 20, 2004 by corbet]
 


Peter Chubb works with the Gelato project, which works toward better Linux
performance on the IA-64 architecture. Among other things, Peter is
responsible for the 64-bit sector support which went into the 2.5 kernel.
At Linux.Conf.Au, Peter discussed device drivers. He pointed out that
drivers, while making up roughly 50% of the code in the kernel, are
responsible for 85% of all kernel bugs. Drivers tend to be written by
people who would not normally be considered kernel hackers: hardware
engineers, for example. These people tend to have a hard time dealing with
the special nature of kernel programming, where interfaces are fluid, bugs
are lethal, and many normal development tools are not available.
Driver authors - and their users - might have a much easier time if
drivers could be written to run in user space. In addition to
mitigating
the above-mentioned kernel programming issues, user-space driver
development would allow the creation of a stable ABI; it also,
presumably,
would eliminate any licensing issues associated with closed-source
drivers. User-space driver writers could also use any language they
choose, "even Python."

Peter and company have set out to make user-space drivers possible. Some
of the necessary pieces are already in place. Standard Linux will allow a
suitably privileged process to access I/O ports, for example. Low-address
memory-mapped I/O registers can be accessed via a mmap() of /dev/mem.
There is also an interface which gives user-space processes access to the
PCI configuration space; this interface works via ioctl() calls on /proc
files, though, thus upsetting the sensibilities of most kernel hackers.
These facilities are enough to allow some user-space drivers (particularly
XFree86) to work, but they are not sufficient to enable a wider range of
drivers to move out of the kernel.

One of the big gaps is interrupts; there is no way, currently, for
user-space processes to register and respond to device interrupts. A
patch
from the Gelato project addresses this gap by creating a set of files
under
/proc. A process wanting to deal with interrupt 11, say, would
open /proc/irq/11/irq. Reading the resulting file descriptor
enables the interrupt and blocks the process until a device interrupt
happens; control then returns to user-space, which can figure out what
to
do. A typical user-space driver will set up a separate thread to wait
for
interrupts in this manner; the actual work can be handed off to a
different
thread within the program.
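A minimal sketch of that thread structure, assuming the /proc/irq/<n>/irq
interface described above (handle_device() is a hypothetical stand-in for
whatever work would be handed to another thread):

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical placeholder for the real device servicing work. */
static void handle_device(void)
{
}

static void *irq_wait_thread(void *unused)
{
	int fd = open("/proc/irq/11/irq", O_RDONLY);
	char buf[16];

	if (fd < 0) {
		perror("open /proc/irq/11/irq");
		return NULL;
	}
	for (;;) {
		/* enables IRQ 11 and blocks until the device interrupts */
		if (read(fd, buf, sizeof(buf)) < 0)
			break;
		handle_device();	/* or hand the work off to another thread */
	}
	close(fd);
	return NULL;
}

The driver would start this function with pthread_create() and do the rest
of its work elsewhere.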

Peter presented some graphs showing that interrupt response times
suffer
very little when interrupt handlers run in user space. The main
limitation
at the moment seems to be the fact that shared interrupts are not
supported.

Another thing that user-space processes cannot normally do is set up
DMA
operations. To enable DMA, a new set of system calls has been added.
The
interface appears to be in a bit of flux, but it will be something like
the
following. The driver starts by opening a special file for device
operations:


int usr_pci_open(int bus, int slot, int function);


There is then a function for setting up DMA mappings:


int usr_pci_map(int fd, int cmd, struct mapping_info *info);


The cmd argument can be USR_ALLOC_CONSISTENT to set
up a
long-lived consistent mapping, or USR_MAP to create a
streaming,
scatter/gather mapping. In either case, the info argument is
used
to pass in the relevant information, and to get the necessary
address(es).
There is also, of course, a USR_UNMAP operation for when the
DMA
is complete.

Many user-space drivers will be able to obtain their requests
directly from
user space; the X server works in this way. Many other drivers,
however,
will need to hook into the kernel for this information. The current
patch
includes a mechanism (Peter described it as ugly) for a user-space
block
driver to register itself with the kernel and get I/O requests. It
works
by opening another special file and using it to communicate requests
and
responses back and forth. A similar interface apparently exists for
network drivers.

Getting a user-space driver patch into the kernel could be an
interesting
challenge. Many kernel hackers, certainly, resist changes that look
like
they are pushing Linux toward something that looks like a microkernel
architecture - or which might legitimize binary-only drivers. On the
other
hand, some drivers bring a great deal of baggage into the kernel with
them
which might be better kept in user space; think of some of the code
required by some sound drivers or the modulation software needed by
"linmodem"
drivers. The ability to run these drivers in user space could be a nice
thing to have.

See the
Gelato user-level drivers page for more information.

  

  
   
  

  






[linuxkernelnewbies] Handling interrupts in user space [LWN.net]

2009-09-06 Thread Peter Teoh





http://lwn.net/Articles/127698/


Handling interrupts in user space
[Posted March 15, 2005 by corbet]
 


Peter Chubb has long been working on a project to move device drivers
into
user space. Getting drivers out of the kernel, he points out, would
have a
number of benefits. Faults in drivers (the source of a large percentage
of
kernel bugs) would be less likely to destabilize the entire system.
Drivers could be easily restarted and upgraded. And a user-space
implementation would make it possible to provide a relatively stable
driver
API, which would appeal to many vendors.
Much of the support needed for user-space drivers is already in
place. A
process can communicate with hardware by mapping the relevant I/O
memory
directly into its address space, for example; that is how the X server
works with video adaptors. One piece, however, is missing:
user-space drivers cannot handle device interrupts. In many cases, a
proper driver cannot be written without using interrupts, so a
user-space
implementation is not possible.

Peter has now posted his
user-space interrupts
patch for review and possible inclusion. The mechanism that he
ended
up with is simple and easy to work with, but it suffers from an
important
limitation. 
The mechanism is this: a process wishing to respond to interrupts
opens a
new /proc file; for IRQ 10, the file would be
/proc/irq/10/irq. A read on that file will yield the number of
interrupts which have occurred since the last read. If no interrupts
have
occurred, the read() call will block until the next interrupt
happens. The select() and poll() system calls are
properly supported, so it is possible to include interrupt handling as
just
another thing to do in an event loop. 
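To make that concrete, a small sketch of such an event loop (assuming the
/proc/irq/10/irq file behaves as described above; the exact data returned by
read() is not spelled out in the article, so the buffer below is treated as
an opaque counter):

#include <fcntl.h>
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
	int fd = open("/proc/irq/10/irq", O_RDONLY);
	struct pollfd pfd = { .fd = fd, .events = POLLIN };
	char buf[16];

	if (fd < 0) {
		perror("open");
		return 1;
	}
	for (;;) {
		/* interrupt handling becomes just another pollable event source */
		if (poll(&pfd, 1, -1) < 0)
			break;
		if (pfd.revents & POLLIN) {
			read(fd, buf, sizeof(buf));	/* count of interrupts since last read */
			/* ... service the device here ... */
		}
	}
	close(fd);
	return 0;
}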
On the kernel side, the real interrupt handler looks like this:


static irqreturn_t irq_proc_irq_handler(int irq, void *vidp,
					struct pt_regs *regs)
{
	struct irq_proc *idp = (struct irq_proc *)vidp;

	BUG_ON(idp->irq != irq);
	disable_irq_nosync(irq);
	atomic_inc(&idp->count);
	wake_up(&idp->q);
	return IRQ_HANDLED;
}


In other words, all it does is count the interrupt and wake up any
process
that might be waiting to handle it.

The handler also disables the interrupt before returning. There is
an
important reason for this action: since the
handler knows nothing of the device which is actually interrupting, it
is
unable to acknowledge or turn off the interrupt. So, when the handler
returns, the device will still be signalling an interrupt. If the
interrupt were not disabled in the processor (or the APIC), the
processor
would be interrupted (and the handler called) all over again,
repeatedly -
at least, when level-triggered interrupts are in use. Disabling the
interrupt allows life to go on until the user-space process gets
scheduled
and is able to tend to the interrupting device.

There is a problem here, however: interrupt lines are often shared
between
devices. Disabling a shared interrupt shuts it off for all devices
using
that line, not just the one being handled by a user-space driver. It is
entirely possible that masking that interrupt will block a device which
is
needed by the user-space handler - a disk controller, perhaps. In that
case, the system may well deadlock. For this reason, the patch does not
allow user-space drivers to work with shared interrupts. This
restriction
avoids problems, but it also reduces the utility of the whole thing.

One possible solution was posted
by Alan
Cox. He would require user-space processes to pass a small structure
into
the kernel describing the hardware's IRQ interface. It would be just
enough for the kernel to tell if a particular device is interrupting,
acknowledge that interrupt, and tell the device to shut
up. With that in place, the kernel could let user space deal with what
the
device really needs while leaving the interrupt enabled. It has been pointed out that this
simple scheme would not
work with some of the more complicated hardware, but it would be a step
in
the right direction regardless.

Meanwhile, Michael Raymond described
a
different user-space interrupt implementation (called "User Level
Interrupt" or ULI) done at SGI. This patch is significantly more
complicated. In this scheme, a user-space driver would register an
interrupt handler function directly with the kernel. When an interrupt
happens, the ULI code performs some assembly-code black magic so that
its
"return from interrupt" instruction jumps directly into the user-space
handler, in user mode. Once that handler returns, the ULI library
writes a
code to a magic device which causes the kernel stack and related data
structures to be restored to their pre-interrupt state. The
implementation
is more complex, and it currently only works on the ia-64 architecture,
but
it could conceivably offer better performance than the /proc
method.





[linuxkernelnewbies] Gigabit Ethernet Jumbo Frames

2009-09-06 Thread peter teoh




http://sd.wareonearth.com/~phil/jumbo.html

See also Jumbo
Frame Information

Gigabit Ethernet Jumbo Frames
And why you should care


Phil Dykstra
Chief Scientist
WareOnEarth Communications, Inc.
p...@wareonearth.com
20 December 1999



Whether or not Gigabit Ethernet (and beyond) should support frame
sizes (i.e. packets) larger than 1500 bytes has been a topic of
great debate. With the explosive growth of Gigabit ethernet, the
impact of this decision is critically important and will affect
Internet performance for years to come.

Most of the debate about jumbo frames has focused on local
area network performance and the impact that frame size has on
host processing requirements, interface cards, memory, etc.
But what is less well known, and of critical concern for high
performance computing, is the impact that frame size has on
wide area network performance. This document discusses why
you should care, and about the largely ignored but important impact
that frame size has on the wide area performance of TCP.

How jumbo is a jumbo frame anyway?
Ethernet has used 1500 byte
frame sizes since it was created (around 1980). To maintain
backward compatibility, 100 Mbps ethernet used the same size,
and today "standard" gigabit ethernet is also using 1500 byte
frames. This is so a packet to/from any combination of 10/100/1000
Mbps ethernet devices can be handled without any layer two
fragmentation or reassembly.
"Jumbo frames" extends ethernet to 9000 bytes. Why 9000? First
because ethernet uses a 32 bit CRC that loses its effectiveness
above about 12000 bytes. And secondly, 9000 was large enough
to carry an 8 KB application datagram (e.g. NFS) plus packet
header overhead. Is 9000 bytes enough? It's a lot better than
1500, but for pure performance reasons there is little reason
to stop there. At 64 KB we reach the limit of an IPv4
datagram, while IPv6 allows for packets up to 4 GB in size.
For ethernet however, the 32 bit CRC limit is hard to change, so
don't expect to see ethernet frame sizes above 9000 bytes anytime
soon.

How can jumbo frames and 1500 byte frames coexist?
Two basic approaches exist:

  On a port by port basis, where everything "downstream" from
a given port is known to support jumbo frames.
  
  Using 802.1q Virtual LANs, where jumbo frame and non-jumbo
frame devices are segregated to different VLANs.
  

What frame sizes are actually being used?


The above graph is from a study[1] of traffic on the InternetMCI
backbone in 1998. It shows the distribution of packet sizes
flowing over a particular backbone OC3 link. There is clearly
a wall at 1500 bytes (the ethernet limit), but there is
also traffic up to the 4000 byte FDDI MTU. But here is a more
surprising fact: while the number of packets larger than
1500 bytes appears small, more than 50% of the bytes were
carried by such packets because of their larger size.
Also, the above traffic was limited by FDDI interfaces (thus the
4000 byte limit). Many high performance flows have been achieved
over ATM WAN's offering 9180 byte MTU paths.
Local performance issues


Smaller frames usually mean more CPU interrupts and more
processing overhead for a given data transfer size. Often
the per-packet processing overhead sets the limit of TCP
performance in the LAN environment. The above graph, from
a white paper[2] by Alteon is an often cited study showing
an example where jumbo frames provided 50% more throughput
with 50% less CPU load than 1500 byte frames.
Such local overhead can be reduced by improved
system design, offloading work to the NIC interface cards,
etc. But however you feel about these often debated local
performance issues, it is the WAN that we are most concerned
about here.

WAN TCP performance issues
The performance of TCP over wide area networks (the Internet)
has been extensively studied and modeled. One landmark paper
by Matt Mathis et al.[3] explains how TCP throughput has an upper
bound based on the following parameters:
	Throughput <= ~0.7 * MSS / (rtt * sqrt(packet_loss))


So maximum TCP throughput is directly proportional to the
Maximum Segment Size (MSS, which is MTU minus TCP/IP headers).
All other things being equal,
you can double your throughput by doubling the packet size!
This relationship seems to have escaped most of the arguments
surrounding jumbo frames.
[Packet_loss may also increase with MSS size, but does so at
a sub-linear rate, and in any case has an inverse square effect
on throughput, i.e. MSS size still dominates throughput.]
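To see what this bound means in practice, here is a small back-of-the-envelope
calculation; the 70 ms RTT and 0.1% loss figures are purely illustrative:

/* build: cc mathis.c -lm */
#include <math.h>
#include <stdio.h>

int main(void)
{
	const double rtt = 0.070;	/* 70 ms round-trip time (illustrative) */
	const double loss = 0.001;	/* 0.1% packet loss (illustrative) */
	const double mss[] = { 1460.0, 8960.0 };	/* 1500- and 9000-byte MTUs minus 40 header bytes */
	int i;

	for (i = 0; i < 2; i++) {
		double bps = 8.0 * 0.7 * mss[i] / (rtt * sqrt(loss));
		printf("MSS %5.0f bytes -> TCP bound of about %.1f Mbit/s\n",
		       mss[i], bps / 1e6);
	}
	return 0;
}

With those numbers the 1500-byte MTU caps a single TCP flow at roughly
3.7 Mbit/s, while the 9000-byte MTU allows roughly 22.7 Mbit/s, no matter
how fast the underlying links are.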
In the local area network or campus environment, rtt and packet
loss are both usually small enough that factors other than the
above equation set your performance limit (e.g. raw available
link bandwidths, packet forwarding speeds, host CPU limitations,
etc.). In the WAN however, rtt and packet loss are often rather
large and something that the end systems can not control.
Thus their only hope for improved performance in the wide area
is to use larger packet sizes.

Let's take an exam

[linuxkernelnewbies] NSDI 2009: Day 3: Wireless #2: Programming and Transport

2009-09-07 Thread Peter Teoh





http://eurosys.org/blog/?p=217



« NSDI
2009: Day 2: Green Networked Systems
NSDI
09: Day 3: Routing »


NSDI 2009: Day 3: Wireless #2: Programming and Transport

Session Chair: Dina Katabi, Massachusetts
Institute of Technology
Softspeak: Making VoIP Play Well in Existing 802.11
Deployments
Patrick Verkaik, Yuvraj Agarwal, Rajesh Gupta, and
Alex C. Snoeren, University of California, San Diego
——-
We have many VoIP users over 802.11
- However, 802.11 is designed for data traffic.
- What is the call quality for them?
- What is its impact on users who only transfer data?
There are problems which lead to degradation of call quality:
- Exponential back-off when contention happens
- Framing overhead per packet
Possible solutions:
- Decrease the packet rate
- Use higher speed networks like 802.11g
- It would not help; the degradation problem still exists
- Use 802.11e, which prioritizes VoIP traffic
- It would increase contention
Our solution:
- For uplink: prioritized TDMA
- Establish a schedule for sending packets
- Sync clocks between nodes
- Compete with non-TDMA traffic
- Use several levels of prioritization
- For downlink: use an aggregator
- Send the aggregated traffic towards only one station
- The other stations must overhear that
Evaluation:
- On 802.11b and 802.11g
- The improvement on 802.11g is smaller
Q: Your evaluation did not cover the case of multiple TCP flows.
A: We had experiments with Web traffic, which is the case of multiple
TCP flows.
Q: How does it work in practice? In practice we have collisions, and
assigning slots across multiple hops is a big challenge!
A: -
Q: What is the delay performance?
A: -
Q: Why not just give higher priority to short packets?
A: -
Q: Why not use a separate single channel for all VoIP traffics?
A: Yes, why not?


Block-switched Networks: A New Paradigm for
Wireless Transport
Ming Li, Devesh Agrawal, Deepak Ganesan, and Arun Venkataramani, University
of Massachusetts Amherst
—
TCP performs badly over wireless because:
1) Its end-to-end rate control mechanism is too conservative
- It leads to redundant retransmissions
2) It uses packets as the unit of control
3) It has complex cross-layer interactions
Re-design:
1) End-to-end -> hop by hop
2) Packets -> Blocks
3) Complexity -> minimalism
Techniques:
- virtual retransmission
- use data in cache for retransmission
- Back pressure
- limit # of outstanding blocks per flow at forwarder node
- ACK withholding
Q: What about connections to wired networks?
A: Future work
Q: What about responsiveness? For SSH traffic, responsiveness
is very important.
A: In the evaluation, we have results related to small files.
Q: Your solution does not cover the case that two routers are
feeding a downstream router.
A: We treat them similarly.

[linuxkernelnewbies] L1/2/3 cache information - how to derive it?

2009-09-07 Thread Peter Teoh





http://www.cs.fsu.edu/~baker/devices/lxr/http/source/linux/arch/x86/kernel/cpu/common.c


void __cpuinit display_cacheinfo(struct cpuinfo_x86 *c)
{
	unsigned int n, dummy, ecx, edx, l2size;

	n = cpuid_eax(0x80000000);

	if (n >= 0x80000005) {
		cpuid(0x80000005, &dummy, &dummy, &ecx, &edx);
		printk(KERN_INFO "CPU: L1 I Cache: %dK (%d bytes/line), D cache %dK (%d bytes/line)\n",
			edx >> 24, edx & 0xFF, ecx >> 24, ecx & 0xFF);
		c->x86_cache_size = (ecx >> 24) + (edx >> 24);
	}

	if (n < 0x80000006)	/* Some chips just has a large L1. */
		return;

	ecx = cpuid_ecx(0x80000006);
	l2size = ecx >> 16;

	/* do processor-specific cache resizing */
	if (this_cpu->c_size_cache)
		l2size = this_cpu->c_size_cache(c, l2size);

	/* Allow user to override all this if necessary. */
	if (cachesize_override != -1)
		l2size = cachesize_override;

	if (l2size == 0)
		return;		/* Again, no L2 cache is possible */

	c->x86_cache_size = l2size;

	printk(KERN_INFO "CPU: L2 Cache: %dK (%d bytes/line)\n",
		l2size, ecx & 0xFF);
}
/* Naming convention should be: <Name> [(<Codename>)] */
/* This table only is used unless init_<vendor>() below doesn't set it; */
/* in particular, if CPUID levels 0x80000002..4 are supported, this isn't used */

/* Look up CPU names by table lookup. */
static char __cpuinit *table_lookup_model(struct cpuinfo_x86 *c)
{
	struct cpu_model_info *info;

	if (c->x86_model >= 16)
		return NULL;	/* Range check */

	if (!this_cpu)
		return NULL;

	info = this_cpu->c_models;

	while (info && info->family) {
		if (info->family == c->x86)
			return info->model_names[c->x86_model];
		info++;
	}
	return NULL;		/* Not found */
}
static void __cpuinit get_cpu_vendor(struct cpuinfo_x86 *c, int early)
{
	char *v = c->x86_vendor_id;
	int i;
	static int printed;

	for (i = 0; i < X86_VENDOR_NUM; i++) {
		if (cpu_devs[i]) {
			if (!strcmp(v, cpu_devs[i]->c_ident[0]) ||
			    (cpu_devs[i]->c_ident[1] &&
			     !strcmp(v, cpu_devs[i]->c_ident[1]))) {
				c->x86_vendor = i;
				if (!early)
					this_cpu = cpu_devs[i];
				return;
			}
		}
	}
	if (!printed) {
		printed++;
		printk(KERN_ERR "CPU: Vendor unknown, using generic init.\n");
		printk(KERN_ERR "CPU: Your system may be unstable.\n");
	}
	c->x86_vendor = X86_VENDOR_UNKNOWN;
	this_cpu = &default_cpu;
}

static int __init x86_fxsr_setup(char *s)
{
	setup_clear_cpu_cap(X86_FEATURE_FXSR);
	setup_clear_cpu_cap(X86_FEATURE_XMM);
	return 1;
}
__setup("nofxsr", x86_fxsr_setup);

static int __init x86_sep_setup(char *s)
{
	setup_clear_cpu_cap(X86_FEATURE_SEP);
	return 1;
}
__setup("nosep", x86_sep_setup);




/* Do minimum CPU detection early.
   Fields really needed: vendor, cpuid_level, family, model, mask, cache alignment.
   The others are not touched to avoid unwanted side effects.

   WARNING: this function is only called on the BP.  Don't add code here
   that is supposed to run on all CPUs. */
static void __init early_cpu_detect(void)
{
	struct cpuinfo_x86 *c = &boot_cpu_data;

	c->x86_cache_alignment = 32;
	c->x86_clflush_size = 32;

	if (!have_cpuid_p())
		return;

	cpu_detect(c);

	get_cpu_vendor(c, 1);

	switch (c->x86_vendor) {
	case X86_VENDOR_AMD:
		early_init_amd(c);
		break;
	case X86_VENDOR_INTEL:
		early_init_intel(c);
		break;
	}

	early_get_cap(c);
}



Notice how the above function have evolved to the following (2.6.31-rc9):

void __cpuinit cpu_detect(struct cpuinfo_x86 *c)
{
	/* Get vendor name */
	cpuid(0x00000000, (unsigned int *)&c->cpuid_level,
	      (unsigned int *)&c->x
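For quick experimentation from user space, the same extended CPUID leaves can
be queried with the compiler's helpers; a minimal sketch assuming GCC's
<cpuid.h> and AMD-style cache reporting in leaves 0x80000005/0x80000006
(not every vendor reports caches this way):

#include <cpuid.h>
#include <stdio.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	/* Leaf 0x80000005: L1 sizes live in the top byte of ECX (D) and EDX (I). */
	if (__get_cpuid(0x80000005, &eax, &ebx, &ecx, &edx))
		printf("L1 I: %uK (%u bytes/line), L1 D: %uK (%u bytes/line)\n",
		       edx >> 24, edx & 0xFF, ecx >> 24, ecx & 0xFF);

	/* Leaf 0x80000006: ECX[31:16] is the L2 size in KB, ECX[7:0] the line size. */
	if (__get_cpuid(0x80000006, &eax, &ebx, &ecx, &edx))
		printf("L2: %uK (%u bytes/line)\n", ecx >> 16, ecx & 0xFF);

	return 0;
}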

[linuxkernelnewbies] MIT OpenCourseWare | Electrical Engineering and Computer Science | 6.046J Introduction to Algorithms (SMA 5503), Fall 2005 | Video Lectures

2009-09-07 Thread peter teoh





http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-046JFall-2005/VideoLectures/

Video Lectures
Audio/video for lectures 20 and 21 are not available.


Lecture notes files.

SES #  TOPICS

 1  Administrivia - Introduction - Analysis of Algorithms, Insertion Sort, Mergesort
 2  Asymptotic Notation - Recurrences - Substitution, Master Method
 3  Divide-and-Conquer: Strassen, Fibonacci, Polynomial Multiplication
 4  Quicksort, Randomized Algorithms
 5  Linear-time Sorting: Lower Bounds, Counting Sort, Radix Sort
 6  Order Statistics, Median
 7  Hashing, Hash Functions
 8  Universal Hashing, Perfect Hashing
 9  Relation of BSTs to Quicksort - Analysis of Random BST
10  Red-black Trees, Rotations, Insertions, Deletions
11  Augmenting Data Structures, Dynamic Order Statistics, Interval Trees
12  Skip Lists
13  Amortized Algorithms, Table Doubling, Potential Method
14  Competitive Analysis: Self-organizing Lists
15  Dynamic Programming, Longest Common Subsequence
16  Greedy Algorithms, Minimum Spanning Trees
17  Shortest Paths I: Properties, Dijkstra's Algorithm, Breadth-first Search
18  Shortest Paths II: Bellman-Ford, Linear Programming, Difference Constraints
19  Shortest Paths III: All-pairs Shortest Paths, Matrix Multiplication, Floyd-Warshall, Johnson
20  Quiz 2 Review (no audio or video available for this session)
21  Ethics, Problem Solving (Mandatory Attendance) (no audio or video available for this session)
22  Advanced Topics
23  Advanced Topics (cont.)
24  Advanced Topics (cont.)
25  Advanced Topics (cont.) - Discussion of Follow-on Classes






[linuxkernelnewbies] Lecture 1: Administrivia, Introduction, Analysis of Algorithms, Insertion Sort, Mergesort

2009-09-07 Thread peter teoh





http://videolectures.net/mit6046jf05_leiserson_lec01/




[linuxkernelnewbies] ACPI power-management

2009-09-07 Thread Peter Teoh







/*
 * ACPI power-managed devices may be controlled in two ways:
 * 1. via "Device Specific (D-State) Control"
 * 2. via "Power Resource Control".
 * This module is used to manage devices relying on Power Resource Control.
 *
 * An ACPI "power resource object" describes a software controllable power
 * plane, clock plane, or other resource used by a power managed device.
 * A device may rely on multiple power resources, and a power resource
 * may be shared by multiple devices.
 */

#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/init.h>
#include <linux/types.h>
#include <linux/proc_fs.h>
#include <linux/seq_file.h>
#include <acpi/acpi_bus.h>
#include <acpi/acpi_drivers.h>

#define _COMPONENT            ACPI_POWER_COMPONENT
ACPI_MODULE_NAME("power");
#define ACPI_POWER_CLASS        "power_resource"
#define ACPI_POWER_DEVICE_NAME        "Power Resource"
#define ACPI_POWER_FILE_INFO        "info"
#define ACPI_POWER_FILE_STATUS        "state"
#define ACPI_POWER_RESOURCE_STATE_OFF    0x00
#define ACPI_POWER_RESOURCE_STATE_ON    0x01
#define ACPI_POWER_RESOURCE_STATE_UNKNOWN 0xFF

int acpi_power_nocheck;
module_param_named(power_nocheck, acpi_power_nocheck, bool, 000);

static int acpi_power_add(struct acpi_device *device);
static int acpi_power_remove(struct acpi_device *device, int type);
static int acpi_power_resume(struct acpi_device *device);
static int acpi_power_open_fs(struct inode *inode, struct file *file);

static struct acpi_device_id power_device_ids[] = {
    {ACPI_POWER_HID, 0},
    {"", 0},
};
MODULE_DEVICE_TABLE(acpi, power_device_ids);

static struct acpi_driver acpi_power_driver = {
    .name = "power",
    .class = ACPI_POWER_CLASS,
    .ids = power_device_ids,
    .ops = {
        .add = acpi_power_add,
        .remove = acpi_power_remove,
        .resume = acpi_power_resume,
        },
};

struct acpi_power_reference {
    struct list_head node;
    struct acpi_device *device;
};

struct acpi_power_resource {
    struct acpi_device * device;
    acpi_bus_id name;
    u32 system_level;
    u32 order;
    struct mutex resource_lock;
    struct list_head reference;
};

static struct list_head acpi_power_resource_list;

static const struct file_operations acpi_power_fops = {
    .owner = THIS_MODULE,
    .open = acpi_power_open_fs,
    .read = seq_read,
    .llseek = seq_lseek,
    .release = single_release,
};

/* --------------------------------------------------------------------------
                           Power Resource Management
   -------------------------------------------------------------------------- */

static int
acpi_power_get_context(acpi_handle handle,
           struct acpi_power_resource **resource)
{
    int result = 0;
    struct acpi_device *device = NULL;


    if (!resource)
        return -ENODEV;

    result = acpi_bus_get_device(handle, &device);
    if (result) {
        printk(KERN_WARNING PREFIX "Getting context [%p]\n", handle);
        return result;
    }

    *resource = acpi_driver_data(device);
    if (!*resource)
        return -ENODEV;

    return 0;
}

static int acpi_power_get_state(acpi_handle handle, int *state)
{
    acpi_status status = AE_OK;
    unsigned long long sta = 0;
    char node_name[5];
    struct acpi_buffer buffer = { sizeof(node_name), node_name };


    if (!handle || !state)
        return -EINVAL;

    status = acpi_evaluate_integer(handle, "_STA", NULL, &sta);
    if (ACPI_FAILURE(status))
        return -ENODEV;

    *state = (sta & 0x01)?ACPI_POWER_RESOURCE_STATE_ON:
              ACPI_POWER_RESOURCE_STATE_OFF;

    acpi_get_name(handle, ACPI_SINGLE_NAME, &buffer);

    ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Resource [%s] is %s\n",
              node_name,
                *state ? "on" : "off"));

    return 0;
}

static int acpi_power_get_list_state(struct acpi_handle_list *list, int
*state)
{
    int result = 0, state1;
    u32 i = 0;


    if (!list || !state)
        return -EINVAL;

    /* The state of the list is 'on' IFF all resources are 'on'. */
    /* */

    for (i = 0; i < list->count; i++) {
        /*
         * The state of the power resource can be obtained by
         * using the ACPI handle. In such case it is unnecessary to
         * get the Power resource first and then get its state again.
         */
        result = acpi_power_get_state(list->handles[i], &state1);
        if (result)
            return result;

        *state = state1;

        if (*state != ACPI_POWER_RESOURCE_STATE_ON)
            break;
    }

    ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Resource list is %s\n",
              *state ? "on" : "off"));

    return result;
}

static int acpi_power_on(acpi_handle handle, struct acpi_device *dev)
{
    int result = 0;
    int found = 0;
    acpi_status status = AE_OK;
    struct acpi_power_resource *resource = NULL;
    struct list_head *node, *next;
    struct acpi_power_reference *ref;


    result = acpi_power_get_context(handle, &resource);
    if (result)
        return result;

    mutex_lock(&resource->resource_lock);
    list_for_each_safe(node, next, &resource->reference) {
        ref = 

[linuxkernelnewbies] Dos and Don't when writing a Linux Device Driver

2009-09-07 Thread peter teoh





http://janitor.kernelnewbies.org/docs/driver-howto.html

HOWTO: Linux Device Driver Dos and Don'ts
 Linux Device Drivers DOs and DON'Ts A guide to
writing Robust Linux Device Drivers
Version 1.1
-


Index

  License, author and version
  Introduction
  1-Overview
    1.1 Why this document
    1.2 What's a "Hardened Device Driver"
    1.3 Robust Device Drivers
  2.0 Where do I start?
  3.0 OK, I'm ready...
    3.1 Efficient error handling, reporting and recovery
    3.2 Up-to-date with kernel APIs/Interfaces
      3.2.1 Module interface changes
      3.2.2 Sysfs and new driver model
      3.2.3 Cli() and Sti()
    3.3 Managing resources
      3.3.1 String functions
      3.3.2 Variable declaration and initialization
      3.3.3 Balancing functions
    3.4 I/O operations
      3.4.1 I/O space access
      3.4.2 Memory mapped I/O access
    3.5 Obvious DON'Ts
  








--

License, author and version
Copyright (c) 2003 by Intel Corporation.  This material may be
distributed only subject to the terms and conditions set forth
in the Open Publication License, v1.0 or later.

The latest version is presently available at:  
http://www.opencontent.org/openpub. 

Author: Tariq Shureih 

Version: 1.1
Date updated:$Date: 2004/03/18 21:42:47 $



Introduction
This document is not a real driver HOWTO -- there are books out there on
how to write a linux kernel driver.
Writing a linux kernel driver can be as simple as writing three lines
of code, or an extensive task that requires an understanding of how Linux
addresses hardware on the various architectures it supports, as well
as an understanding of PC concepts (all references in this document are x86
architecture-centric, yet not specific).

What this document will address is the DOs and DON'Ts when writing a linux
kernel device driver.
These DOs and DON'Ts are based on the kernel-janitor project's TODO list.
Further, concepts introduced by the original "Hardened Drivers" spec published
at http://hardeneddrivers.sf.net are also present in this document.

For more information on the Kernel-Janitor project, visit http://kernel-janitor.sourceforge.net/.


1-Overview
1.1 Why this document?
I wanted to collect the information I learned when I got involved in kernel development
in a single document and hopefully make it a guide to newbies and/or people looking for
those little tricks that go into writing a robust device driver.

This document is rather a simple guide to known methods and techniques when writing
	a Linux device driver and it could be regarded as a companion to other available resources.


1.2 What's a "Hardened Device Driver"?
The answer to this question depends on who you ask.  To some, a hardened
device driver is a stable, reliable, instrumented and highly available device
driver.  In a previous effort to specify what constitutes a hardened driver, a
hardened driver was described as consisting of three levels:

1-Stability and Reliability:  The use of best-known coding practices within the driver to detect
and report errant conditions in hardware and software, to protect the driver, kernel, and other
software components from corruption, to provide for fault injection testing, and to make the
code clear and easy to read for accurate maintenance.

2-Instrumentation:  Statistics reporting interface, diagnostic test interface, and POSIX error
logging for monitoring and management by higher-level system management components.

3-High Availability:  Special features, such as device redundancy, hot swap, and fault
recovery, to enhance availability.

This document will attempt to describe "hardened drivers" with a slightly different
approach, building on some of the highlights above: what a hardened (robust) device
driver should mean and how it should be implemented and measured.

To avoid confusion with previous efforts to specify the requirements of a hardened driver,
this document will refer to the ideal driver as a "robust" driver.


1.3 Robust device drivers
A robust driver is really just a robust, bug free and maintainable example of kernel level code.
	As Linux evolves, the specifics of what makes up a robust device driver will change, but the 
	following general attributes will probably hold consistent:

-Follows the Linux CodingStyle,
making it easy to maintain and consistent with the kernel coding style.
Reference: /Documentation/CodingStyle
Reference: http://www.kroah.com/linux/talks/ols_2002_kernel_codingstyle_talk/html/

Note: no more is discussed on this topic since the references above cover all aspects.

-Efficient in managing faults: handling, reporting and recovering from
errors (printk levels, enumerated return codes, etc.). Also not panic()
happy; a short sketch of this style follows below.
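As an illustration of that last point, a hedged sketch of the usual pattern in
a PCI probe routine (the BAR number is made up for the example; the point is
the dev_err()/dev_info() reporting, the propagated error codes, and the
goto-based unwinding instead of panic()):

#include <linux/pci.h>

static int example_probe(struct pci_dev *pdev, const struct pci_device_id *id)
{
	void __iomem *regs;
	int err;

	err = pci_enable_device(pdev);
	if (err) {
		dev_err(&pdev->dev, "cannot enable device: %d\n", err);
		return err;			/* propagate the real error code */
	}

	regs = pci_iomap(pdev, 0, 0);		/* map BAR 0 (illustrative) */
	if (!regs) {
		dev_err(&pdev->dev, "cannot map registers\n");
		err = -ENOMEM;
		goto out_disable;		/* unwind, do not panic() */
	}

	dev_info(&pdev->dev, "example device initialized\n");
	return 0;

out_disable:
	pci_disable_device(pdev);
	return err;
}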

[linuxkernelnewbies] Eliminating tasklets [LWN.net]

2009-09-07 Thread peter teoh





http://lwn.net/Articles/239633/


Eliminating tasklets
[Posted June 24, 2007 by corbet]
 


Tasklets are a deferred-execution method used within the kernel; they
were
added in the 2.3 development series as a way for interrupt handlers to
schedule work to be done in the very near future. Essentially, a
tasklet
is a function to be called (with a data pointer) in a software
interrupt as
soon as the kernel is able to do so.
In practice, a tasklet which is scheduled will (probably) be executed
when
the kernel either (1) finishes running an interrupt handler, or
(2) returns to user space. Since tasklets run in software interrupt
mode, they must be atomic - no sleeping, references to user space, etc.
So
the work that can be done in tasklets is limited, but they are still
heavily used within the kernel.
There is another problem with tasklets: since they run as software
interrupts, they have a higher priority than any process on the system.
Tasklets can, thus, create unbounded latencies - something which the
low-latency developers have been long working to eliminate. Some
efforts
have been made to mitigate this problem; if the kernel has a hard time
keeping up with software interrupts it will eventually dump them into
the
ksoftirqd process and let them fight it out in the scheduler.
Specific tasklets which have been shown to create latency problems -
the
RCU callback handler, for example - have been made to behave better.
And
the realtime tree pushes all software interrupt handling into separate
processes which can be scheduled (and preempted) like anything else.

Recently, Steven Rostedt came up with a different approach: why
not
just get rid of tasklets altogether? Since the development of tasklets,
the kernel has acquired other, more flexible ways of deferring work; in
particular, workqueues function much like tasklets, but without many of
the
disadvantages of tasklets. Since workqueues use dedicated worker
processes, they can be preempted and do not present the same latency
problems as tasklets; as a bonus, they provide a process context which
allows work functions to sleep
if need be. Workqueues, argues Steven, are sufficiently capable that
there
is no need for tasklets anymore.
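For readers who have not used either mechanism, the two deferral APIs compare
roughly as follows (a simplified sketch; my_tasklet_fn, my_work_fn and the
empty bodies are illustrative, and the workqueue half assumes the post-2.6.20
work_struct interface):

#include <linux/interrupt.h>
#include <linux/workqueue.h>

/* Tasklet style: runs in softirq context, must be atomic (no sleeping). */
static void my_tasklet_fn(unsigned long data)
{
	/* ... atomic-only work ... */
}
static DECLARE_TASKLET(my_tasklet, my_tasklet_fn, 0);

/* Workqueue style: runs in a worker process, may sleep and be preempted. */
static void my_work_fn(struct work_struct *work)
{
	/* ... work that is allowed to block ... */
}
static DECLARE_WORK(my_work, my_work_fn);

/* Deferring work from an interrupt handler is one call either way. */
static irqreturn_t my_irq_handler(int irq, void *dev_id)
{
	tasklet_schedule(&my_tasklet);	/* softirq deferral */
	schedule_work(&my_work);	/* workqueue deferral */
	return IRQ_HANDLED;
}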

So Steven's patch cleans up the interface in a few ways, and turns
the RCU
tasklet into a separate software interrupt outside of the tasklet
mechanism. Then the tasklet code is torn out and replaced with a
wrapper
interface which conceals a workqueue underneath. The end result is a
tasklet-free kernel without the need to rewrite all of the code which
uses
tasklets.

There is little opposition to the idea of eliminating tasklets,
though it
is clear that quite a bit of performance testing will be required
before
such a change could go into the mainline kernel. But almost nobody
likes
the wrapper interface; it is just the sort of compatibility glue that
the
"no stable internal API" policy tries to avoid. So there is a lot of
pressure to dump the wrapper and simply convert all tasklet users
directly
to workqueues. Needless to say, this is a rather larger job; it's not
surprising that somebody might be tempted to try to avoid it. In any
case,
the current patch is good for testing; if the replacement of tasklets
will
cause trouble, this patch should turn it up before anybody has gone to
the
trouble of converting all the tasklet users.

Another question needs to be answered here, though: does the
conversion of
tasklets to workqueues lead to a better interrupt handling path, or
should
wider changes be considered? Rather than doing a context switch into a
workqueue process, the system might get better performance by simply
running the interrupt handler as a thread as well. As it happens, the
realtime tree has long done exactly that: all (OK, almost all)
interrupt
handlers run in their own threads. The realtime developers have plans
to
merge this work within the next few kernel cycles.

Under the current plans, threaded interrupt handlers would probably
be a
configuration-time option. But if developers knew that
interrupt
handlers would run in process context, they could simply do the
necessary
processing in the handler and do away with deferred work mechanisms
altogether. This approach might not work in every driver - for some
devices, it might risk adding unacceptable interrupt response latency -
but, in many cases, it has the potential to simplify and streamline the
situation considerably. The code would not just be simpler - it might
just
perform better as well.

Either way, the removal of tasklets would appear to be in the works.
As a
step in that direction, Ingo Molnar is looking
for potential performance problems:


 So how about the following, different approach:
anyone who has a tasklet in any performance-sensitive codepath, please
yell now. We'll also do a proactive search for such places. We can
convert those places to softirqs, or move them back into hardirq
context. Once this is done - and i doubt it will go beyond 1-2 places 

[linuxkernelnewbies] Jeff Erickson's Algorithms Course Materials

2009-09-07 Thread peter teoh




Thank you Jeff Erickson for the free materials

http://compgeom.cs.uiuc.edu/~jeffe//teaching/algorithms/

Algorithms Course Materials
by Jeff
Erickson
August 2009 revision
This page contains all my lecture notes for the algorithms classes
required for all computer science undergraduate and graduate students
at the University of Illinois, Urbana-Champaign. I have taught
incarnations of this course eight times: Spring 1999, Fall 2000, Spring
2001, Fall 2002, Spring 2004, Fall 2005, Fall 2006, Spring 2007, Fall
2008, and Spring 2009. These notes are numbered roughly in the order I
used them in my undergraduate class, with supplementary (lettered) notes
sprinkled in at reasonable places. More advanced material is indicated by
the symbol ※. More information about these notes is available after the
notes themselves.
A large collection of old
homeworks and exams
follows the lecture notes. Most of these homework and exam problems
also appear at the ends of the appropriate lecture notes. Except for
various Homework Zeros, which are solely my fault, most of the homework
problems were written by my teaching assistants:

Aditya Ramani,
Alina Ene,
Amir Nayyeri,
Asha Seetharam,
Ben Moseley,
Brian Ensink,
Chris Neihengen,
Dan Bullok,
Dan Cranston,
Johnathon Fischer,
Ekta Manaktala,
Erin Wolf Chambers,
Igor Gammer, Gio Kao,
Kevin Milans,
Kevin Small,
Lan Chen,
Michael Bond,
Mitch Harris,
Nick Hurlburt,
Nitish Korula,
Reza Zamani-Nasab,
Rishi Talreja,
Rob McCann,
Shripad Thite, and
Yasu Furakawa.

Please do not ask me for solutions. If you're a student, you
will (usually) learn more from trying to solve a problem and failing
than by reading the answer. If you really need help, ask your
instructor, your TAs, and/or your fellow students. If you're an
instructor, you really shouldn't assign problems that you can't solve
yourself! (Because I don't always follow my own advice, some of the
problems are buggy, especially in older homeworks and exams. I've tried
to keep the buggy problems out of the lecture notes themselves.)
More recent version of these notes, along with current homework and
exams, may be available at the official sites for the undergraduate and/or graduate algorithms
classes at UIUC.

Feedback of any kind is always welcome, especially bug reports.
I would particularly appreciate hearing from anyone outside UIUC who
finds these notes useful (or useless)!

Copyright.
Except as indicated otherwise, all material linked from this web page
is Copyright © 1999–2009 Jeff
Erickson.
Anyone may freely download, print, copy, and/or distribute anything on
this page, either electronically or on paper. However, nothing on this
page may be sold in any form for more than the actual cost of
printing and/or reproduction. For example, you are welcome to make
local copies on your own web server, but you are not allowed to require
an access fee (such as tuition) to view your local copy; that's the
same as selling it. If you distribute these notes, please give me
proper credit and please include a pointer to this web
page (http://www.cs.uiuc.edu/~jeffe/teaching/algorithms).
If you're a lawyer, read the
legalese.


This work is
licensed under a Creative
Commons Attribution-Noncommercial-Share Alike 3.0 United States License.




  

  You know,
I could write a book.
  
And this book would be thick enough to stun an ox.
  


  — Laurie
Anderson, "Let X=X", Big Science (1982)
  

  


  Everything
  
Everything
in one file (765 pages)
  

Cover
material (6 pages)

All
lecture notes in one file (379 pages)

All
homework, head-banging, and exam problems in one file (386 pages)

  



  

  If we are
ready to tolerate everything as understood, there is
nothing left to explain; while if we sourly refuse to take anything,
even tentatively, as clear, no explanation can be given. What intrigues
us as a problem, and what will satisfy us as a solution, will depend
upon the line we draw between what is already clear and what needs to
be clarified.
  


  — Nelson
Goodman, Fact, Fiction & Forecast (1955)
  

  


Lecture Notes

0. Introduction, history, and course goals

Recursion
  1. Simplify and delegate
  A. Fast Fourier transforms ※
  2. Backtracking
  B. Fast exponential-time algorithms ※
  3. Dynamic programming
  C. Advanced dynamic programming tricks ※
  4. Greedy algorithms
  D. Matroids ※

Randomization
  5. Nuts and bolts (randomized quicksort)
  6. Treaps and skip lists
  E. Tail inequalities ※
  7. Uniform and universal hashing
  F. Randomized minimum cut

Amortized analysis
  8. Aggregation, taxation, potential
  9. Scapegoat trees and spla

[linuxkernelnewbies] LKML: "Rafael J. Wysocki": Re: [RFC][PATCH 1/2] PCI PM: Introduce __pci_[start|complete]_power_transition()

2009-09-07 Thread Peter Teoh





http://lkml.org/lkml/2009/3/26/125


On Wednesday 25 March 2009, Rafael J. Wysocki wrote:
> From: Rafael J. Wysocki 
> 
> The radeonfb driver needs to program the device's PMCSR directly due
> to some quirky hardware it has to handle (see
> http://bugzilla.kernel.org/show_bug.cgi?id=12846 for details) and
> after doing that it needs to call the platform (usually ACPI) to
> finish the power transition of the device.  Currently it uses
> pci_set_power_state() for this purpose, however making a specific
> assumption about the internal behavior of this function, which has
> changed recently so that this assumption is no longer satisfied.
> For this reason, introduce __pci_complete_power_transition() that may
> be called by the radeonfb driver to complete the power transition of
> the device.  For symmetry, introduce __pci_start_power_transition().
> 
> Signed-off-by: Rafael J. Wysocki 

Sorry, with this version of the patch the following failure scenario is
possible:
* a caller of pci_set_power_state() wants to put a device into D3
* pci_raw_set_power_state() fails and returns error code
* device is not power manageable by the platform, so
  __pci_complete_power_transition() returns 0
* pci_set_power_state() returns 0, although it should return the error code
  from pci_raw_set_power_state().

Also, if the device doesn't support the native PM and is not power manageable
by the platform, we should always fall back to D0.

The updated patch below has these problems fixed.

Thanks,
Rafael

---
From: Rafael J. Wysocki 
Subject: PCI PM: Introduce __pci_[start|complete]_power_transition() (rev. 2)
The radeonfb driver needs to program the device's PMCSR directly due
to some quirky hardware it has to handle (see
http://bugzilla.kernel.org/show_bug.cgi?id=12846 for details) and
after doing that it needs to call the platform (usually ACPI) to
finish the power transition of the device.  Currently it uses
pci_set_power_state() for this purpose, however making a specific
assumption about the internal behavior of this function, which has
changed recently so that this assumption is no longer satisfied.
For this reason, introduce __pci_complete_power_transition() that may
be called by the radeonfb driver to complete the power transition of
the device.  For symmetry, introduce __pci_start_power_transition().

Signed-off-by: Rafael J. Wysocki 
---
 drivers/pci/pci.c   |   69 ++--
 include/linux/pci.h |1 
 2 files changed, 52 insertions(+), 18 deletions(-)
Index: linux-2.6/drivers/pci/pci.c
===
--- linux-2.6.orig/drivers/pci/pci.c
+++ linux-2.6/drivers/pci/pci.c
@@ -540,6 +540,53 @@ void pci_update_current_state(struct pci
 }
 
 /**
+ * pci_platform_power_transition - Use platform to change device power state
+ * @dev: PCI device to handle.
+ * @state: State to put the device into.
+ */
+static int pci_platform_power_transition(
+	struct pci_dev *dev, pci_power_t state)
+{
+	int error;
+
+	if (platform_pci_power_manageable(dev)) {
+		error = platform_pci_set_power_state(dev, state);
+		if (!error)
+			pci_update_current_state(dev, state);
+	} else {
+		error = -ENODEV;
+		pci_update_current_state(dev, PCI_D0);
+	}
+
+	return error;
+}
+
+/**
+ * __pci_start_power_transition - Start power transition of a PCI device
+ * @dev: PCI device to handle.
+ * @state: State to put the device into.
+ */
+static void __pci_start_power_transition(struct pci_dev *dev, pci_power_t state)
+{
+	if (state == PCI_D0)
+		pci_platform_power_transition(dev, PCI_D0);
+}
+
+/**
+ * __pci_complete_power_transition - Complete power transition of a PCI device
+ * @dev: PCI device to handle.
+ * @state: State to put the device into.
+ *
+ * This function should not be called directly by device drivers.
+ */
+int __pci_complete_power_transition(struct pci_dev *dev, pci_power_t state)
+{
+	return state > PCI_D0 ?
+			pci_platform_power_transition(dev, state) : -ENODEV;
+}
+EXPORT_SYMBOL_GPL(__pci_complete_power_transition);
+
+/**
  * pci_set_power_state - Set the power state of a PCI device
  * @dev: PCI device to handle.
  * @state: PCI power state (D0, D1, D2, D3hot) to put the device into.
@@ -575,16 +622,8 @@ int pci_set_power_state(struct pci_dev *
 	if (dev->current_state == state)
 		return 0;
 
-	if (state == PCI_D0) {
-		/*
-		 * Allow the platform to change the state, for example via ACPI
-		 * _PR0, _PS0 and some such, but do not trust it.
-		 */
-		int ret = platform_pci_power_manageable(dev) ?
-			platform_pci_set_power_state(dev, PCI_D0) : 0;
-		if (!ret)
-			pci_update_current_state(dev, PCI_D0);
-	}
+	__pci_start_power_transition(dev, state);
+
 	/* This device is quirked not to be put into D3, so
 	   don't put it in D3 */
 	if (state == PCI_D3hot && (dev->dev_flags & PCI_DEV_FLAGS_NO_D3))
@@ -592,14 +631,8 @@ int pci_set_power_state(struct pci_dev *
 
 	error = pci_raw_set_power_state(dev, state);
 
-	if (state > PCI_D0 

[linuxkernelnewbies] LKML: Len Brown: [PATCH 05/85] ACPI: Add "acpi.power_nocheck=1" to disable power state check in power transition

2009-09-07 Thread Peter Teoh





Qs:   how to set the hardware ACPI states?   What are the states
available?   How to get alerted when the hardware initiates a state
transition (e.g., some specific button pressed)?   How to set a callback
for a power state transition when a timer has run out?
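
One way to answer the first question from kernel code, sketched against the
acpi_bus_set_power() interface that the patch below modifies (error handling
is minimal and the acpi_handle is assumed to have been obtained elsewhere,
e.g. from a struct acpi_device):

#include <linux/kernel.h>
#include <acpi/acpi_bus.h>

/* Hedged sketch: ask the ACPI layer to put a device into D3. */
static int example_put_device_in_d3(acpi_handle handle)
{
	int ret = acpi_bus_set_power(handle, ACPI_STATE_D3);

	if (ret)
		printk(KERN_WARNING "ACPI: D3 transition failed: %d\n", ret);
	return ret;
}
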

http://lkml.org/lkml/2008/10/11/12


From: Zhao Yakui 

   Maybe an incorrect power state is returned on a bogus BIOS, one that
differs from the real power state. For example: the BIOS returns the D0
state while the real power state is D3, and the OS expects to set the device
to D0. In such a case, if the OS uses the power state returned by the BIOS
and checks the device power state very strictly during the power transition,
the device can't be transitioned to the correct power state.

   So the boot option "acpi.power_nocheck=1" is added to avoid checking
the device power state in the course of a device power transition.

http://bugzilla.kernel.org/show_bug.cgi?id=8049
http://bugzilla.kernel.org/show_bug.cgi?id=11000

Signed-off-by: Zhao Yakui 
Signed-off-by: Zhang Rui 
Signed-off-by: Li Shaohua 
Signed-off-by: Andi Kleen 
---
 Documentation/kernel-parameters.txt |8 ++
 drivers/acpi/bus.c  |   14 ++-
 drivers/acpi/power.c|   42 +-
 include/acpi/acpi_drivers.h |1 +
 4 files changed, 53 insertions(+), 12 deletions(-)
diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
index 44d1bd1..99cf83f 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -251,6 +251,14 @@ and is between 256 and 4096 characters. It is defined in the file
  			Warning: Many of these options can produce a lot of
  			output and make your system unusable. Be very careful.
 
+ 	acpi.power_nocheck=	[HW,ACPI]
+ 			Format: 1/0 enable/disable the check of power state.
+ 			On some bogus BIOS the _PSC object/_STA object of
+ 			power resource can't return the correct device power
+ 			state. In such case it is unneccessary to check its
+ 			power state again in power transition.
+ 			1 : disable the power state check
+
 	acpi_pm_good	[X86-32,X86-64]
 			Override the pmtimer bug detection: force the kernel
 			to assume that this machine's pmtimer latches its value
diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
index ccae305..91bdeb3 100644
--- a/drivers/acpi/bus.c
+++ b/drivers/acpi/bus.c
@@ -223,7 +223,19 @@ int acpi_bus_set_power(acpi_handle handle, int state)
 	/*
 	 * Get device's current power state
 	 */
-	acpi_bus_get_power(device->handle, &device->power.state);
+	if (!acpi_power_nocheck) {
+		/*
+		 * Maybe the incorrect power state is returned on the bogus
+		 * bios, which is different with the real power state.
+		 * For example: the bios returns D0 state and the real power
+		 * state is D3. OS expects to set the device to D0 state. In
+		 * such case if OS uses the power state returned by the BIOS,
+		 * the device can't be transisted to the correct power state.
+		 * So if the acpi_power_nocheck is set, it is unnecessary to
+		 * get the power state by calling acpi_bus_get_power.
+		 */
+		acpi_bus_get_power(device->handle, &device->power.state);
+	}
 	if ((state == device->power.state) && !device->flags.force_power_state) {
 		ACPI_DEBUG_PRINT((ACPI_DB_INFO, "Device is already at D%d\n",
   state));
diff --git a/drivers/acpi/power.c b/drivers/acpi/power.c
index e7bab75..7ff7349 100644
--- a/drivers/acpi/power.c
+++ b/drivers/acpi/power.c
@@ -54,6 +54,14 @@ ACPI_MODULE_NAME("power");
 #define ACPI_POWER_RESOURCE_STATE_OFF	0x00
 #define ACPI_POWER_RESOURCE_STATE_ON	0x01
 #define ACPI_POWER_RESOURCE_STATE_UNKNOWN 0xFF
+
+#ifdef MODULE_PARAM_PREFIX
+#undef MODULE_PARAM_PREFIX
+#endif
+#define MODULE_PARAM_PREFIX "acpi."
+int acpi_power_nocheck;
+module_param_named(power_nocheck, acpi_power_nocheck, bool, 000);
+
 static int acpi_power_add(struct acpi_device *device);
 static int acpi_power_remove(struct acpi_device *device, int type);
 static int acpi_power_resume(struct acpi_device *device);
@@ -228,12 +236,18 @@ static int acpi_power_on(acpi_handle handle, struct acpi_device *dev)
 	if (ACPI_FAILURE(status))
 		return -ENODEV;
 
-	result = acpi_power_get_state(resource->device->handle, &state);
-	if (result)
-		return result;
-	if (state != ACPI_POWER_RESOURCE_STATE_ON)
-		return -ENOEXEC;
-
+	if (!acpi_power_nocheck) {
+		/*
+		 * If acpi_power_nocheck is set, it is unnecessary to check
+		 * the power state after power transition.
+		 */
+		result = acpi_power_get_state(resource->device->handle,
+&state);
+		if (result)
+			return result;
+		if (state != ACPI_POWER_RESOURCE_STATE_ON)
+			return -ENOEXEC;
+	}
 	/* Update the power resource's _device_ power state */
 	resource->device->power.state = ACPI_STATE_D0;
 
@@ -279,11 +293,17 @@ static int acpi_power_off_device(acpi_handle handle, struct acpi_device *dev)
 	if (ACPI_FAILURE(status))
 		return -ENODEV;
 
-	result = acpi_power_get_state(handle, &state);
-	if (r

[linuxkernelnewbies] Ben Pfaff: GNU libavl

2009-09-07 Thread peter teoh




http://adtinfo.org/



GNU libavl
Binary search trees provide O(lg n) performance on average
for important operations such as item insertion, deletion, and search
operations. Balanced trees provide O(lg n) even in the worst
case.
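
To make the O(lg n) claim concrete, here is a tiny plain binary search tree
in C. This is not libavl's API, just an illustration of the structure the
library manages; the balanced variants (AVL, red-black) add rebalancing so
the logarithmic bound also holds in the worst case.

#include <stdio.h>
#include <stdlib.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Insert a key; duplicates are ignored.  O(lg n) on average, O(n) in
 * the worst case for an unbalanced tree. */
static struct node *bst_insert(struct node *root, int key)
{
	if (root == NULL) {
		struct node *n = malloc(sizeof(*n));
		if (n == NULL)
			abort();
		n->key = key;
		n->left = n->right = NULL;
		return n;
	}
	if (key < root->key)
		root->left = bst_insert(root->left, key);
	else if (key > root->key)
		root->right = bst_insert(root->right, key);
	return root;
}

static struct node *bst_find(struct node *root, int key)
{
	while (root != NULL && root->key != key)
		root = (key < root->key) ? root->left : root->right;
	return root;
}

int main(void)
{
	struct node *root = NULL;
	int keys[] = { 42, 7, 99, 23 };
	size_t i;

	for (i = 0; i < sizeof(keys) / sizeof(keys[0]); i++)
		root = bst_insert(root, keys[i]);

	printf("23 %s\n", bst_find(root, 23) ? "found" : "missing");
	return 0;
}
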
GNU libavl is the most complete, well-documented collection of
binary search tree and balanced tree library routines anywhere. It
supports these kinds of trees:

  
Plain binary trees:
  - Binary search trees
  - AVL trees
  - Red-black trees

Threaded binary trees:
  - Threaded binary search trees
  - Threaded AVL trees
  - Threaded red-black trees

Right-threaded binary trees:
  - Right-threaded binary search trees
  - Right-threaded AVL trees
  - Right-threaded red-black trees

Binary trees with parent pointers:
  - Binary search trees with parent pointers
  - AVL trees with parent pointers
  - Red-black trees with parent pointers

Visit the online HTML
version of libavl.
libavl's name is a historical accident: it originally implemented
only AVL trees. Its name may change to something more appropriate in
the future, such as “libsearch”. You should also expect this page to
migrate to www.gnu.org sometime in the indefinite future.
Version 2.0
Version 2.0 of libavl was released on January 6, 2002. It is a
complete rewrite of earlier versions implemented in a “literate
programming” fashion, such that in addition to being a useful library,
it is also a book describing in full the algorithms behind the library.
Version 2.0.1 of libavl was released on August 24, 2002. It fixes
some typos in the text and introduces an HTML output format. No bugs in
the libavl code were fixed, because none were reported. Unlike 2.0,
this version is compatible with recent releases of Texinfo. dvipdfm is
now used for producing the PDF version.
Version 2.0.2 of libavl was released on December 28, 2004. It fixes
a bug in tavl_delete() reported by Petr Silhavy a long
time ago. This is the same fix posted here
earlier.
This version (again) works with recent versions of Texinfo, fixes a few
typos in the text, and slightly enhances the HTML output format.
You can download the preformatted book or a source distribution that
will allow you to use the library in your own programs. You can also
use the source distribution to format the book yourself:

  
Preformatted book:
  - Online HTML or gzip'd tar archive (1.7 MB)
  - gzip'd PDF (1.4 MB)
  - gzip'd PostScript (746 kB)
  - gzip'd plain text (224 kB)

  The PostScript and PDF versions are 432 U.S. letter-size pages in
  length. The plain text version is 26,968 lines long, or about 409
  pages at 66 lines per page.

Source distribution as a gzip'd tar archive (1.4 MB)

For an overview of the ideas behind libavl 2.0, see the poster presentation
made on April 6, 2001, at Michigan State University. This presentation
is also available in the original PostScript.
Older Versions
Version 1.4.0 is the predecessor to 2.0. It implemented only the
following types of trees:

  AVL tree.
  Threaded AVL tree.
  Right-threaded AVL tree.
  Red-black tree.

Version 1.4.0 is no longer being actively developed, but any
reported bugs that affect its behavior will be fixed. Source code for
libavl 1.4.0 can be obtained from ftp://ftp.gnu.org/pub/gnu/avl.
Other AVL resources
Several AVL tree libraries are available on the net. The following
is a list of the ones that I consider to be well-written and generally
useful in other code. Let me know of any others and I'll add them to
the list after checking them out.

  
kazlib.
This is a red-black tree implementation by Kaz Kylheku. I'm listing
it here because I always forget where it is and have to look it up.
Free license.
  
  
libdict.
This library implements AVL and red-black trees and several other kinds
of dictionary data structures. BSD-style license with advertising
clause.
  
  
avlmap. A
library in C by Phil Howard that provides convenient implementations
for several variable types and voluminous documentation in HTML format.
Very large code; e.g., one included header file is 68 kilobytes. GNU
Lesser General Public License.
  
  
glib. GTK+ includes a library
named glib, which has an unoptimized recursive C implementation. GNU
Lesser General Public License.
  
  
cprops. AVL trees,
red-black tree, splay trees, and more, in a recursive implementation
designed for multithreaded applications. GNU Lesser General Public
License.
  
  
Python
avllib.
Iterative C implementation including all the usual routines. Although I
haven't tested it, it looks very well-written. Includes Python bindings
as well as some unusual but useful features (Knuth's RANK field, for
instance).

[linuxkernelnewbies] Binghamton Operating Systems and Networks Lab: Systems Research Seminar

2009-09-08 Thread Peter Teoh




http://osnet.cs.binghamton.edu/seminar/index.html


Following is a
tentative list of papers from recent conferences:

Virtual Machines

  Architectural
support for shadow memory in multiprocessors, Vijay Nagarajan and Rajiv
Gupta, VEE 2009

  Memory buddies:
Exploiting page sharing for smart colocation in virtualized data
centers,
Timothy Wood, Gabriel Tarasuk-Levin, Prashant Shenoy, Peter Desnoyers,
Emmanuel Cecchet, and Mark Corner, VEE 2009

  Entropy: A
consolidation manager for clusters,
Fabien Hermenier, Xavier Lorca, Jean-Marc Menaud, Gilles Muller, and
Julia Lawall, VEE 2009

  Achieving 10Gbps
using safe and transparent network interface virtualization,
Kaushik Kumar Ram, Jose Renato Santos, Yoshio Turner, Alan L Cox, and
Scott Rixner, VEE 2009

  Task-aware virtual
machine scheduling for I/O performance,
Hwanju Kim, Hyeontaek Lim, Jinkyu Jeong, Heeseung Jo, and Joonwon Lee,
VEE 2009

  The hybrid
scheduling framework for virtual machine systems, Chuliang Weng, Minglu
Li, and Xinda Lu, VEE 2009


Power 

  ClientVisor:
Leverage COTS OS functionalities for power management in virtualized
desktop environment,
Huacai Chen, Hai Jin, Zhiyuan Shao, Ke Yu, and Kevin Tian, VEE 2009

  Everest: Scaling
Down Peak Loads Through I/O Off-Loading,
Dushyanth Narayanan, Austin Donnelly, Eno Thereska, Sameh Elnikety, and
Antony Rowstron, Microsoft Research Cambridge, United Kingdom, OSDI
2008

  Quanto: Tracking
Energy in Networked Embedded Systems
Rodrigo Fonseca, University of California, Berkeley, and Yahoo!
Research; Prabal Dutta, University of California, Berkeley; Philip
Levis, Stanford University; Ion Stoica, University of California,
Berkeley, OSDI 2008

  Somniloquy:
Augmenting Network Interfaces to Reduce PC Energy
Usage,
Yuvraj Agarwal, University of California, San Diego; Ranveer Chandra,
Steve Hodges, James Scott, and Paramvir Bahl, Microsoft Research;
Rajesh Gupta, University of California, San Diego, NSDI 2009

  Skilled in the Art
of Being Idle: Reducing Energy Waste in
Networked Systems,
Sergiu Nedevschi, International Computer Science Institute; Sylvia
Ratnasamy, Intel Research Berkeley; Jaideep Chandrashekar, Intel
Research Santa Clara; Bruce Nordman, Lawrence Berkeley National
Laboratory; Nina Taft, Intel Research Berkeley


Wireless

  Sora: High
Performance Software Radio Using General Purpose
Multi-core Processors,
Kun Tan, Jiansong Zhang, and Haitao Wu, Microsoft Research Asia; Fang
Ji, Beijing Jiao Tong University; He Liu, Yusheng Ye, and Shen Wang,
Tsinghua University; Yongguang Zhang and Wei Wang, Microsoft Research
Asia; Geoffrey M. Voelker, University of California, San Diego, NSDI
2009

  Enabling MAC
Protocol Implementations on Software-defined Radios,
George Nychis, Srinivasan Seshan, Peter Steenkiste, Thibaud Hottelier,
and Zhuocheng Yang, Carnegie Mellon University, NSDI 2009

  Wishbone:
Profile-based Partitioning for Sensornet Applications
Ryan Newton, Sivan Toledo, Lewis Girod, Hari Balakrishnan, and Samuel
Madden, MIT CSAIL, NSDI 2009.

  Softspeak: Making
VoIP Play Well in Existing 802.11 Deployments
Patrick Verkaik, Yuvraj Agarwal, Rajesh Gupta, and Alex C. Snoeren,
University of California, San Diego, NSDI 2009

  Block-switched
Networks: A New Paradigm for Wireless Transport
Ming Li, Devesh Agrawal, Deepak Ganesan, Arun Venkataramani, and
Himanshu Agrawal, University of Massachusetts Amherst, NSDI 2009


Content Distribution

  AntFarm: Efficient
Content Distribution with Managed Swarms,
Ryan S. Peterson and Emin Gün Sirer, Cornell University, NSDI 2009

  HashCache: Cache
Storage for the Next Billion,
Anirudh Badam, KyoungSoo Park, Vivek S. Pai, and Larry L. Peterson,
Princeton University, NSDI 2009


Fault Tolerance

  CuriOS: Improving
Reliability through Operating System Structure,
Francis M. David, Ellick M. Chan, Jeffrey C. Carlyle, and Roy H.
Campbell, University of Illinois at Urbana-Champaign, OSDI 2008

  Tolerating Latency
in Replicated State Machines Through Client
Speculation
Benjamin Wester, University of Michigan; James Cowling, MIT CSAIL;
Edmund B. Nightingale, Microsoft Research; Peter M. Chen and Jason
Flinn, University of Michigan; Barbara Liskov, MIT CSAIL, NSDI 2009

  Making Byzantine
Fault Tolerant Systems Tolerate Byzantine Faults,
Lorenzo Alvisi, Allen Clement, and Mike Dahlin, The University of Texas
at Austin; Mirco Marchetti, University of Modena and Reggio Emilia;
Edmund Wong, The University of Texas at Austin, NSDI 2009

  Zeno: Eventually
Consistent Byzantine Fault Tolerance,
Atul Singh, MPI-SWS and Rice University; Pedro Fonseca, Petr Kuznetsov,
and Rodrigo Rodrigues, MPI-SWS; Petros Maniatis, Intel Research
Berkeley, NSDI 2009


Multi-Core

  Corey: An Operating
System for Many Cores,
Silas Boyd-Wickizer, Massachusetts Institute of Technology; Haibo Chen,
Rong Chen, and Yandong Mao, Fudan University; Frans Kaashoek, Robert
Morris, and Aleksey Pesterev, 

[linuxkernelnewbies] jprobes for 'enqueue_entity' and 'dequeue_entity' causes kernel hang on spinlock

2009-09-09 Thread peter teoh





http://mail.nl.linux.org/kernelnewbies/2009-03/msg00097.html



  To: kernelnewbies
  Subject: jprobes for 'enqueue_entity' and 'dequeue_entity' causes kernel hang on spinlock
  From: Sukanto Ghosh
  Date: Fri, 6 Mar 2009 15:27:31 +0530





Hi,

I have to log when a process gets in/out of the runqueue. For the same
I took the jprobes example code from Documentation/kprobes.txt and
modified it to probe enqueue_entity() and dequeue_entity() . When I
insmod my module the kernel hangs. On running with a remote gdb I
could figure out that it is spinning on a spinlock.

I have used printks in my module code.  It is interesting to note that
wake_up_klogd() is being called twice in this chain and I think it is
causing the trouble, as it is spinning on a lock which it has already
locked earlier.  Is my analysis right ?  Am I doing anything wrong
here ?
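
The analysis above looks plausible: frames #15 and #2 in the backtrace below
show __wake_up() entered twice on the same wait queue (q=0xc0740ea4), first
from the printk() in the module init, then again from the printk() inside
the probed enqueue path via wake_up_klogd(), while the queue's spinlock is
still held.  A hedged sketch (not from the original thread) of one way
around it: record the event quietly in the probe handler and do the printing
somewhere outside the scheduler path (a timer, a debugfs read, or module
exit).  It reuses the entity_is_task()/task_of() helpers copied into the
module code further down.

#include <linux/kprobes.h>
#include <linux/percpu.h>
#include <linux/sched.h>

static DEFINE_PER_CPU(pid_t, last_enqueued_pid);

/* Same signature as the probed enqueue_entity(); no printk() here, so
 * wake_up_klogd() is never re-entered from inside the wakeup path. */
static void j_enqueue_entity_quiet(struct cfs_rq *cfs_rq,
				   struct sched_entity *se, int wakeup)
{
	if (entity_is_task(se))
		__get_cpu_var(last_enqueued_pid) = task_of(se)->pid;

	jprobe_return();	/* every jprobe handler must end with this */
}
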


#0  0xc04180fa in __ticket_spin_lock (lock=0xc0740ea4) at
include/asm/spinlock.h:75
#1  0xc0639667 in _spin_lock_irqsave (lock=0x1e1d) at
include/asm/paravirt.h:1401
#2  0xc041dfc3 in __wake_up (q=0xc0740ea4, mode=7709, nr_exclusive=1,
key=0x0) at kernel/sched.c:4585
#3  0xc0429eb5 in wake_up_klogd () at kernel/printk.c:988
#4  0xc042a03d in release_console_sem () at kernel/printk.c:1036
#5  0xc042a49e in vprintk (fmt=0xe0a370d0 "<6>probe[enqueue_entity]:
[pid:%u]\n", args=0xde98edd4 "\215\006") at kernel/printk.c:771
#6  0xc063762d in printk (fmt=0xe0a370d0 "<6>probe[enqueue_entity]:
[pid:%u]\n") at kernel/printk.c:604
#7  0xe0a37041 in ?? ()
#8  0xc042041a in enqueue_task_fair (rq=0xc140a580, p=, wakeup=1) at kernel/sched_fair.c:928
#9  0xc041c540 in enqueue_task (rq=0xc140a580, p=0xdecb8000, wakeup=1)
at kernel/sched.c:1644
#10 0xc041c67d in activate_task (rq=0xc140a580, p=0x1e1d, wakeup=1) at
kernel/sched.c:1715
#11 0xc0424133 in try_to_wake_up (p=0xdecb8000, state=1, sync=0) at
kernel/sched.c:2281
#12 0xc04241bc in default_wake_function (curr=,
mode=7709, sync=1, key=0x0) at kernel/sched.c:4546
#13 0xc043a1ea in autoremove_wake_function (wait=0xc0740ea4,
mode=7709, sync=1, key=0x0) at kernel/wait.c:132
#14 0xc041cb30 in __wake_up_common (q=, mode=1,
nr_exclusive=1, sync=0, key=0x0) at kernel/sched.c:4567
#15 0xc041dfd6 in __wake_up (q=0xc0740ea4, mode=1, nr_exclusive=1,
key=0x0) at kernel/sched.c:4586
#16 0xc0429eb5 in wake_up_klogd () at kernel/printk.c:988
#17 0xc042a03d in release_console_sem () at kernel/printk.c:1036
#18 0xc042a49e in vprintk (fmt=0xe0a3712d "<6>Planted jprobe at %p,
handler addr %p\n", args=0xde98ef3c "�\001B�%p\230�%\021@�")
at kernel/printk.c:771
#19 0xc063762d in printk (fmt=0xe0a3712d "<6>Planted jprobe at %p,
handler addr %p\n") at kernel/printk.c:604
#20 0xe089802f in ?? ()
#21 0xc0401125 in do_one_initcall (fn=0xe0898000) at init/main.c:715
#22 0xc0449167 in sys_init_module (umod=0x9e2c018, len=94352,
uargs=0x9e2c008 "") at kernel/module.c:2291
#23 
#24 0xb7f66424 in ?? ()
#25 0x00a595d6 in ?? ()
#26 0x08048631 in ?? ()


My code:
---
/* Headers needed for the jprobes API and the scheduler structures used
 * below. */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/kprobes.h>
#include <linux/sched.h>

/* An entity is a task if it doesn't "own" a runqueue */
#define entity_is_task(se)	(!se->my_q)

static inline struct task_struct *task_of(struct sched_entity *se)
{
	return container_of(se, struct task_struct, se);
}

static void j_enqueue_entity(struct cfs_rq *cfs_rq, struct
sched_entity *se, int wakeup)
{
	if (entity_is_task(se)) {
			struct task_struct *tsk = task_of(se);
			printk(KERN_INFO "probe[enqueue_entity]: [pid:%u]\n", tsk->pid);
	}
	/* Always end with a call to jprobe_return(). */
	jprobe_return();
}

static struct jprobe enqueue_entity_jprobe = {
	.entry			= j_enqueue_entity,
	.kp = {
		.symbol_name	= "enqueue_entity",
	},
};

static void j_dequeue_entity(struct cfs_

[linuxkernelnewbies] Linux for the Nios II Processor - Nios Community Wiki

2009-09-09 Thread peter teoh





http://www.nioswiki.com/linux


Linux for the Nios II Processor

 



  
This is the community supported version of Nios II Linux with MMU. This
package will work only on Linux. You will need a virtual Linux machine to
run it on Windows. This is GPL software, and comes with absolutely NO
warranty.

You may get support on the Nios forum or the nios2-dev mailing list.

http://forum.niosforum.com/forum/ind...p?showforum=18
http://sopc.et.ntust.edu.tw/cgi-bin/...info/nios2-dev
  
Install the required development packages on your Linux desktop, as root
or via sudo. For RHEL5/CentOS5, enable EPEL at
https://fedoraproject.org/wiki/EPEL.


# for RHEL5/Centos5 only
wget http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
rpm -Uvh epel-release-5-3.noarch.rpm

# for RHEL5/Centos5/Fedora11
yum install git-all git-gui tcsh make gcc ncurses-devel bison libglade2-devel \
byacc flex gawk gettext ccache zlib-devel gtk2-devel lzo-devel pax-utils

  
For new users who haven't used nios2 git before, please download the
tarball (1.7 GB), as a normal user,

wget http://www.niosftp.com/pub/linux/nios2-linux-20090825.tar

sha1sum nios2-linux-20090825.tar
c156d21b1b6adf1b47102a5f37c4d1d9acdb637f  nios2-linux-20090825.tar
tar xf nios2-linux-20090825.tar
cd nios2-linux
./checkout 

 

For existing nios2 git users,
(with nios2-linux-20080619.tar or nios2-linux-20090730.tar) there is no
need to download the tarball. You may add a new branch to track nios2
mmu kernel and clone to get the binary toolchain.

cd nios2-linux
git clone git://sopc.et.ntust.edu.tw/git/toolchain-mmu.git
cd linux-2.6
git fetch origin
git branch nios2mmu origin/nios2mmu
git checkout -f nios2mmu
git clean -f -x -d
cd ..
cd uClinux-dist
git fetch origin
git branch trunk origin/trunk
git checkout -f trunk
git clean -f -x -d
In
short, to build with MMU, use nios2mmu branch on linux-2.6, trunk
branch on uClinux-dist. To build without MMU, use test-nios2 branch on
linux-2.6, test-nios2 branch on uClinux-dist.

 

QUICK START

1. Add the binary toolchain to your PATH in .bash_profile or .profile,
like this,
PATH=$PATH:/home/hippo/nios2-linux/toolchain-mmu/x86-linux2/bin

2. Build the Linux image in uClinux-dist dir,

  
cd nios2-linux/uClinux-dist
make menuconfig   # or make xconfig
  
In the menuconfig, make sure it is selected as follows:

Vendor/Product Selection --->   # select
    --- Select the Vendor you wish to target
        Vendor (Altera)  --->   # select Altera 
    --- Select the Product you wish to target 
        Altera Products (nios2)   --->  # select nios2

Kernel/Library/Defaults Selection --->  # select
    --- Kernel is linux-2.6.x
Libc Version (None)  --->   # should default to None - very important.
    [*] Default all settings (lose changes) # select
    [ ] Customize Kernel Settings 
    [ ] Customize Vendor/User Settings 
    [ ] Update Default Vendor Settings 
Then compile the kernel and apps:

make

(This will take a while. If you are asked an option like "Build faac
(LIB_FAAC) [N/y/?] (NEW)", just press Enter to accept the default; this
will be fixed.)

3. The images created are,

images/linux.initramfs.gz is the elf image with initramfs built-in
images/zImage.initramfs.gz is the compressed elf image with initramfs
built-in

images/vmImage is compressed u-boot image
images/rootfs.initramfs.gz is compressed initramfs to be used as
initrd by u-boot
images/rootfs.jffs2
is the jffs2 image, eg, cp rootfs.jffs2 /dev/mtd0. This is available when
jffs2 is selected in the kernel. Please note the flash erase sector size on
the 3c120 dev board is 128KB, so you will have to specify "MKFS_JFFS2_FLAGS =
-l -p -e 128" at the beginning of your product Makefile.

Connect USB Blaster cable to 3C120 dev board, download the sof and elf.

nios2-configure-sof ../3c120_default/nios2_linux_3c120_125mhz_top.sof
nios2-download -g images/linux.initramfs.gz
nios2-terminal


There is a prebuild linux.initramfs.gz elf image in the 3c120_default 
dir, which you may try out first.

4. Get source updates from community server.

Normally you will use the "git" protocol to get updates from the server if
your PC is directly connected to the Internet. In that case you may skip to
step 5.

Only if you are behind a proxy firewall and cannot use the git protocol,
you can change git to use ssh tunneling through port 443 to get updates
from the community server with this command, "./use_ssh443_for_update" .

You should have an ssh tunneling package installed, such as "corkscrew".
Add the following 3 lines to your ~/.ssh/config, which should have no
public access, "chmod og-rwx config". Replace 

to that of your http proxy server. Change the nios2-linux path to yours.


IdentityFile ~/.ssh/id_rsa
IdentityFile ~/nios2-linux/sshkey/id_rsa
ProxyCommand corkscrew   %h %p



If you failed to use ssh tunneling as above, you may try the dumb http
protocol
with this command, "./use_

[linuxkernelnewbies] [Nios2-dev] nios2 mmu toolchain rebuilt

2009-09-09 Thread peter teoh





http://sopc.et.ntust.edu.tw/pipermail/nios2-dev/2009-September/003066.html

Dear nios2 developers,

Please note that I have rebuilt the gcc4 toolchain. The compiler is renamed to nios2-linux-gnu-gcc. Please pull toolchain-mmu/master and 
uClinux-dist/trunk to get updates. You may find patches and rebuild instructions in http://sopc.et.ntust.edu.tw/pub/gnutools/nios2gcc4/README.

Cheers,
Thomas



Build nios2 gcc4 toolchain

0. you need gcc3 and makeinfo to build. install these packages,

yum install compat-gcc-34 texinfo tetex*

1. get the source

http://www.niosftp.com/pub/gnutools/wrs-linux-4.1-176-nios2-wrs-linux-gnu.src.tar.bz2

2. get the patches

ftp://sopc.et.ntust.edu.tw/pub/gnutools/nios2gcc4

0001-binutils-fix-makeinfo-version-check-in-configure.patch
0001-gcc-fix-makeinfo-version-check-in-configure.patch
0001-gdb-fix-makeinfo-version-check-in-configure.patch
0001-kbuild-fix-C-libary-confusion-in-unifdef.c-due-to-g.patch
build.sh
README (this file)

3. untar the source

tar jxf wrs-linux-4.1-176-nios2-wrs-linux-gnu.src.tar.bz2

4. in the wrs-linux-4.1-176-nios2-wrs-linux-gnu dir, there are source tarballs. extract them to build dir, /opt/nios2gcc4/src. 

cd wrs-linux-4.1-176-nios2-wrs-linux-gnu

tar jxf binutils-4.1-176.tar.bz2	-C /opt/nios2gcc4/src
tar jxf expat-4.1-176.tar.bz2		-C /opt/nios2gcc4/src
tar jxf gcc-4.1-176.tar.bz2		-C /opt/nios2gcc4/src
tar jxf gdb-4.1-176.tar.bz2		-C /opt/nios2gcc4/src
tar jxf glibc-4.1-176.tar.bz2		-C /opt/nios2gcc4/src
tar jxf glibc_localedef-4.1-176.tar.bz2	-C /opt/nios2gcc4/src
tar jxf glibc_ports-4.1-176.tar.bz2	-C /opt/nios2gcc4/src
tar jxf linux-4.1-176.tar.bz2		-C /opt/nios2gcc4/src

5. apply the patches

cd /opt/nios2gcc4/src/binutils-2.17.50
patch -p1 < 0001-binutils-fix-makeinfo-version-check-in-configure.patch
cd /opt/nios2gcc4/src/gcc-4.1
patch -p1 < 0001-gcc-fix-makeinfo-version-check-in-configure.patch
cd /opt/nios2gcc4/src/gdb-wrs
patch -p1 < 0001-gdb-fix-makeinfo-version-check-in-configure.patch
cd /opt/nios2gcc4/src/linux-2.6.21-wrs-nios2
patch -p1 < 0001-kbuild-fix-C-libary-confusion-in-unifdef.c-due-to-g.patch

6. run the build script, you can change the build dir, /opt/nios2gcc4 and install dir /opt/nios2.

sh build.sh

7. you will need mkimage from u-boot, or install it from the debian/ubuntu package uboot-mkimage.








[linuxkernelnewbies] Who-T: Re-designing input methods with XKB

2009-09-09 Thread peter teoh




http://who-t.blogspot.com/


Thursday, August 20, 2009

Re-designing
input methods with XKB


I've had an interesting meeting with Jens Petersen yesterday about
input methods. Jens is one of the i18n guys working for Red Hat.

Input
methods are a way of merging several typed symbols into one actual
symbol. Western languages rarely use them (the compose key isn't quite
the same), but many eastern languages rely on them. To give one (made
up) example, an IM setup allows you to type "qqq" and converts it into
the chinese symbol for tree. 

Unfortunately, IM implementations
are somewhat broken and rely on a multitude of hacks. Right now, IM
implementations often need to hook onto keycodes instead of keysyms.
Keycodes are a numerical value that is usually the same for a key
(except when it isn't). So "q" will always be the same keycode (except
when it isn't). In X, a keycode has no meaning other than being an
index into the keysym table.

Keysyms are the actual symbols that
are to be displayed. So while the "q" key may have a keycode of 24, it
will have the keysym for "q" in qwerty and the keysym for "a" in azerty.

And
here's where everything goes wrong for IM. If you listen for keycodes,
and you switch drivers, then keycode 24 isn't the same key anymore. If
you listen for keysyms and you switch layout, keysym "q" isn't the same
key anymore. Oops.
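
A hedged illustration (not from the blog post) of the keycode/keysym split
being described, using plain Xlib: the keycode is the layout-independent
index reported in the event, while the keysym you get back from it depends
on the currently configured layout.  Compile with -lX11; the window is only
there so that key events have somewhere to be delivered.

#include <stdio.h>
#include <X11/Xlib.h>
#include <X11/Xutil.h>

int main(void)
{
	Display *dpy = XOpenDisplay(NULL);
	Window win;
	XEvent ev;

	if (!dpy)
		return 1;

	win = XCreateSimpleWindow(dpy, DefaultRootWindow(dpy),
				  0, 0, 200, 200, 0, 0, 0);
	XSelectInput(dpy, win, KeyPressMask);
	XMapWindow(dpy, win);

	for (;;) {
		XNextEvent(dpy, &ev);
		if (ev.type == KeyPress) {
			/* Index 0: unshifted symbol in the current group. */
			KeySym sym = XLookupKeysym(&ev.xkey, 0);
			const char *name =
				(sym == NoSymbol) ? NULL : XKeysymToString(sym);

			printf("keycode %u -> keysym %s\n",
			       ev.xkey.keycode, name ? name : "NoSymbol");
		}
	}
}
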

During a previous meeting and the one yesterday, we came up with a
solution to fix them properly.

Let's
take a step back and look at keyboard input. The user hits a physical
key, usually because of what is printed on that key. That key generates
a keycode, which represents a keysym. That keysym is usually the same
symbol as what is printed on the keyboard. (Of course, there are
exceptions to that with the prime example being dvorak layout on a
qwerty physical keyboard)
In the end, IM should aim to provide the same functionality, with the
added step of combining multiple symbols into one.

For IM implementations, we can differ between two approaches:
In
the first approach, a set of keysyms should combine to a final symbol.
For example, typing "tree" should result in a tree symbol. This case
can be fixed easily by the IM implementation only ever dealing with
keysyms. Where the key is located doesn't matter and it works equally
well with us(qwerty) and fr(dvorak). As a mental bridge: if the symbols
come in via morse code and you can convert to the correct final symbol,
then your IM is in this category. This approach is easy to deal with,
so we can close the case on it.

In the second approach, a set of
key presses should combine to a final symbol. For example, typing the
top left key four times should result in a tree symbol. In this case,
we can't hook onto keysyms because they may change with the layout. But
we can't hook onto keycodes either because they are essentially random.

Wait. What? Why does the keysym change with the layout? 

Because we have the wrong layout selected.
If you're trying to type Chinese, you shouldn't have a us layout. If
you're trying to type Japanese, you shouldn't have a french layout.
Because these keysyms don't represent what the key is supposed to do.
The keysyms are supposed to represent what is printed on the keyboard,
and those symbols are Chinese, Japanese, Indic, etc. So the solution is
to fix up the keysyms. Instead of trying to listen for a "q", the
keyboard layout should generate a "tree" keysym. The IM implementation
can then listen for this symbol and combine to the final symbol as
required.

This essentially means that for each keyboard with
intermediate symbols there should be an appropriate keyboard layout -
just as there is for western languages. And once these keysyms are
available, the second approach becomes identical to the first approach
and it doesn't matter anymore where the physical key is located.

The
good thing about this approach is that users and developers can
leverage existing tools for selecting and changing between different
layouts. (bonus points for using the word "leverage") It also means
that a more unified configuration between standard DE tools and IM
tools is possible.

For the IM implementation, this simplifies
things by a bit. First of all, it can listen to the XKB group state to
adjust automatically whether IM is needed or not. For example, if
us(qwerty) and traditional chinese are configured as layouts, the IM
implementation can kick in whenever the group toggles to chinese. As
long as it is on us(qwerty), it can slumber in the background.

Second,
no layout-specific hacks are required. The physical location of the
key, the driver, they all don't matter anymore. Even morse-code is
supported now ;)

Talking to Jens, his main concern is that XKB
limits to 4 groups at a time. This restriction is built into the
protocol and won't disappear completely anytime soon. Though XI2 and
XKB2 address this issue, it will take a while to get a meaningful
adoption rate. Nonetheless, the approach above sh

[linuxkernelnewbies] Who-T: XI2 Recipies, Part 2

2009-09-09 Thread peter teoh




http://who-t.blogspot.com/2009/06/xi2-recipies-part-2.html

Tuesday, June 9, 2009

XI2
Recipies, Part 2


This post is part of a mini-series of various recipes on how to deal
with the new functionality in XI2. The examples here are merely
snippets, full example programs to summarize each part are available here.

Update 13.07.09: adjusted to the new cookie
event API

A word of warning: XI2 is still in flux and the code documented here
may change before the 2.0 release.

In Part
1 I covered how to initialise and select for events. In this part,
I will cover how to query and modify the device hierarchy.

What is the device hierarchy?

The
device hierarchy is the tree of master and slave devices. A master
pointer is represented by a visible cursor, a master keyboard is
represented by a keyboard focus. Slave pointers and keyboards are
(usually) physical devices attached to one master device.

The
distinction may sound odd first but we've been using it for years. A
computer has two sets of interfaces: Physical interfaces are what we
humans employ to interact with the computer (e.g. mouse, keyboard,
touchpad). Virtual interfaces is what applications actually see. Think
of it: if you have a laptop with two physical devices (a mouse and a
touchpad) you're still only controlling one virtual device (the
cursor). So although you have two very different physical interfaces,
the application isn't aware of it at all. 

This works mostly
fine as long as you have only one virtual interface per type but it
gets confusing really quickly if you have multiple users on the same
screen at the same time. Hence the explicit device hierarchy in XI2.

We
call virtual devices master devices, and physical devices slave
devices. Note that there are exceptions where a slave device is an
emulation of a physical device.

A device may be of one of five device types:

  
  Master
pointers are devices that represent a cursor on the screen. One master
pointer is always available (the "Virtual core pointer"). Master
pointers usually send core events, meaning they appear like a normal
pointer device to non-XI applications.
  
  Master keyboards are
devices that represent a keyboard focus. One master keyboard is always
available (the "Virtual core keyboard"). Master keyboards usually send
core events, meaning they appear like a normal keyboard device to
non-XI applications.
  
  Slave pointers are pointer devices
that are attached to a master pointer. Slave pointers never send core
events, they are invisible to non-XI applications and can only interact
with a core application if they are attached to a master device (in
which case it's actually the master device that interacts)
  
  Slave
keyboards are keyboard devices that are attached to a master keyboard.
Slave keyboards never send core events, they are invisible to non-XI
applications and can only interact with a core application if they are
attached to a master device (in which case it's actually the master
device that interacts)
  
  Floating slaves are slave devices
that are currently not attached to a master device. They can only be
used by XI or XI2 applications and do not have a visible cursor or
keyboard focus.
  



So what does attachment mean? A
master device cannot generate events by itself. If a slave device is
attached to a master device, then each event that the slave device
generates is also passed through the master device. This is how the X
server works since 1.4, if you click a mouse button, the server sends a
click event from the mouse and from the "virtual core pointer".

A
floating device on the other hand does not send events through the
master device. They don't control a visible cursor or keyboard focus
and any application listening to a floating slave device needs to
control focus and cursor manually. One example where floating slaves
are useful is the use of graphics tablets in the GIMP (where the area
of the tablet is mapped to the canvas).

For most applications, you will only ever care about master devices.

Querying the device hierarchy


At some point, clients may need to know which devices are actually
present in the system right now. 


int i, ndevices;
XIDeviceInfo *devices, *device;

devices = XIQueryDevice(display, XIAllDevices, &ndevices);

for (i = 0; i < ndevices; i++) {
    device = &devices[i];
    printf("Device %s (id: %d) is a ", device->name, device->deviceid);

    switch(device->use) {
       case XIMasterPointer: printf("master pointer\n"); break;
       case XIMasterKeyboard: printf("master keyboard\n"); break;
       case XISlavePointer: printf("slave pointer\n"); break;
       case XISlaveKeyboard: printf("slave keyboard\n"); break;
       case XIFloatingSlave: printf("floating slave\n"); break;
    }

    printf("Device is attached to/paired with %d\n", device->attachment);
}

XIFreeDeviceInfo(devices);




As
with event selection, XIAllDevices and XIAllMasterDevices are valid as the
device ID parameter. Alternatively, just supply the device I
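
As a small follow-up to the (truncated) paragraph above, restricting the
query to master devices only, which is all most applications care about,
is just a matter of the device ID parameter (same setup as the example
above):

int i, nmasters;
XIDeviceInfo *masters = XIQueryDevice(display, XIAllMasterDevices, &nmasters);

for (i = 0; i < nmasters; i++)
    printf("master device %s (id %d)\n", masters[i].name, masters[i].deviceid);

XIFreeDeviceInfo(masters);
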

[linuxkernelnewbies] kernel threads enumeration in 2.6.31-rc9

2009-09-09 Thread Peter Teoh






root 2 0  0 10:42 ?    00:00:00 [kthreadd]
root 3 2  0 10:42 ?    00:00:00 [migration/0]
root 4 2  0 10:42 ?    00:00:00 [ksoftirqd/0]
root 5 2  0 10:42 ?    00:00:00 [watchdog/0]
root 6 2  0 10:42 ?    00:00:00 [migration/1]
root 7 2  0 10:42 ?    00:00:00 [ksoftirqd/1]
root 8 2  0 10:42 ?    00:00:00 [watchdog/1]
root 9 2  0 10:42 ?    00:00:00 [events/0]
root    10 2  0 10:42 ?    00:00:00 [events/1]
root    11 2  0 10:42 ?    00:00:00 [khelper]
root    17 2  0 10:42 ?    00:00:00 [async/mgr]
root   155 2  0 10:42 ?    00:00:00 [kintegrityd/0]
root   156 2  0 10:42 ?    00:00:00 [kintegrityd/1]
root   158 2  0 10:42 ?    00:00:00 [kblockd/0]
root   159 2  0 10:42 ?    00:00:00 [kblockd/1]
root   161 2  0 10:42 ?    00:00:00 [kacpid]
root   162 2  0 10:42 ?    00:00:00 [kacpi_notify]
root   163 2  0 10:42 ?    00:00:00 [kacpi_hotplug]
root   253 2  0 10:42 ?    00:00:00 [ata/0]
root   254 2  0 10:42 ?    00:00:01 [ata/1]
root   255 2  0 10:42 ?    00:00:00 [ata_aux]
root   257 2  0 10:42 ?    00:00:00 [ksuspend_usbd]
root   261 2  0 10:42 ?    00:00:00 [khubd]
root   264 2  0 10:42 ?    00:00:00 [kseriod]
root   333 2  0 10:42 ?    00:00:00 [khungtaskd]
root   356 2  0 10:42 ?    00:00:00 [pdflush]
root   357 2  0 10:42 ?    00:00:00 [pdflush]
root   358 2  0 10:42 ?    00:00:00 [kswapd0]
root   412 2  0 10:42 ?    00:00:00 [aio/0]
root   413 2  0 10:42 ?    00:00:00 [aio/1]
root   424 2  0 10:42 ?    00:00:00 [crypto/0]
root   425 2  0 10:42 ?    00:00:00 [crypto/1]
root   636 2  0 10:42 ?    00:00:00 [kpsmoused]
root   642 2  0 10:42 ?    00:00:00 [kstriped]
root   645 2  0 10:42 ?    00:00:00 [ksnapd]
root   674 2  0 10:42 ?    00:00:00 [usbhid_resumer]
root   807 2  0 10:42 ?    00:00:01 [scsi_eh_0]
root   808 2  0 10:42 ?    00:00:00 [scsi_eh_1]
root   819 2  0 10:42 ?    00:00:00 [scsi_eh_2]
root   820 2  0 10:42 ?    00:00:00 [scsi_eh_3]
root   822 2  0 10:42 ?    00:00:00 [scsi_eh_4]
root   823 2  0 10:42 ?    00:00:00 [scsi_eh_5]
root   830 2  0 10:42 ?    00:00:00 [kjournald]
root  1275 2  0 10:42 ?    00:00:00 [kgameportd]
root  1426 2  0 10:42 ?    00:00:00 [scsi_eh_6]
root  1427 2  0 10:42 ?    00:00:00 [scsi_eh_7]
root  1628 2  0 10:42 ?    00:00:00 [kauditd]
root  1784 2  0 10:42 ?    00:00:00 [kmpathd/0]
root  1786 2  0 10:42 ?    00:00:00 [kmpathd/1]
root  1787 2  0 10:42 ?    00:00:00 [kmpath_handlerd]
root  1829 2  0 10:43 ?    00:00:00 [kjournald2]
root  1830 2  0 10:43 ?    00:00:00 [kjournald]
root  1831 2  0 10:43 ?    00:00:00 [kjournald]
root  1832 2  0 10:43 ?    00:00:00 [kjournald]
root  1833 2  0 10:43 ?    00:00:00 [kjournald]
root  1834 2  0 10:43 ?    00:00:00 [kjournald]
root  1835 2  0 10:43 ?    00:00:00 [kjournald]
root  1836 2  0 10:43 ?    00:00:00 [kjournald]
root  1841 2  0 10:43 ?    00:00:00 [kjournald2]
root  2060 2  0 10:43 ?    00:00:00 [rpciod/0]
root  2062 2  0 10:43 ?    00:00:00 [rpciod/1]
root  2411 2  0 10:43 ?    00:00:00 [bluetooth]
root  3042  2822  0 10:46 ?    00:00:00 [Xsession]







[linuxkernelnewbies] Easy way to convert OGG Vorbis to mp3 on Fedora - LinuxQuestions.org

2009-09-09 Thread peter teoh





http://www.linuxquestions.org/questions/linux-software-2/easy-way-to-convert-ogg-vorbis-to-mp3-on-fedora-484356/
 Easy way to convert OGG Vorbis to mp3
on Fedora

First off, I'd like to say that this thread isn't a question, but
instead an answer to many questions Fedora users may have.

Recently I have searched for a way to convert OGG Vorbis to mp3
so that I could send mp3 files to Windows users so that they don't have
to download an OGG player. I read that SoX (Sound Exchange) was a great
program, but it was used only in the Terminal and you needed to enable
mp3 support for it, which was a hassle to many users. So, I stumbled
upon a simpler solution: Sound Converter.

Downloading Sound Converter

In this solution, I am assuming that you have GNOME as the default
desktop environment. Go to Applications, Add/Remove Software. If you
are not logged in as root, you will need to type in your password to
open the program. After the Package Manager has loaded, go to
Applications, Sound and Video, and click on the Optional Packages
button. A new window will popup showing you all the available sound and
video packages. Scroll down until you see soundconverter.
A description of the program will appear beside it. Place a checkmark
beside its name and click close. Then, click Apply and wait for the
program to download and install. Once the installation is complete, you
may now use Sound Converter.

Using Sound Converter

Sound Converter is a nice, easy to use program with a simple user
interface. You can convert a single file, a multitude of files, or even
a folder of files at once. To use, click on one of the buttons that say
Add File or Add Folder. An open file dialog will appear. Choose a file
and click Open. Then, to change your conversion options, go to Edit,
Preferences. The Preferences dialog will appear. Under the heading
"Type of Result?," you may choose from OGG Vorbis, MP3, FLAC, and WAV.
Once you have chosen your conversion type, you may change your bitrate
mode (MP3 only) or quality (OGG Vorbis and MP3 only). You can change
several other options if you wish, but when you're finished, click
Close. When you are ready to convert, simply click the Convert button.
When you are finished, you may convert another file or folder or simply
close Sound Converter.







[linuxkernelnewbies] Large Blocksize Support V4

2009-09-09 Thread peter teoh





http://lkml.org/lkml/2007/6/20/238

V3->V4
- It is possible to transparently make filesystems support larger
  blocksizes by simply allowing larger blocksizes in set_blocksize.
  Remove all special modifications for mmap etc from the filesystems.
  This now makes 3 disk based filesystems that can use larger blocks
  (reiser, ext2, xfs). Are there any other useful ones to make work?
- Patch against 2.6.22-rc4-mm2 which allows the use of Mel's antifrag
  logic to avoid fragmentation.
- More page cache cleanup by applying the functions to filesystems.
- Disable bouncing when the gfp mask is setup.
- Disable mmap directly in mm/filemap.c to avoid filesystem changes
  while we have no mmap support for higher order pages.

RFC V2->V3
- More restructuring
- It actually works!
- Add XFS support
- Fix up UP support
- Work out the direct I/O issues
- Add CONFIG_LARGE_BLOCKSIZE. Off by default which makes the inlines revert
  back to constants. Disabled for 32bit and HIGHMEM configurations.
  This also allows a gradual migration to the new page cache
  inline functions. LARGE_BLOCKSIZE capabilities can be
  added gradually and if there is a problem then we can disable
  a subsystem.

RFC V1->V2
- Some ext2 support
- Some block layer, fs layer support etc.
- Better page cache macros
- Use macros to clean up code.

This patchset modifies the Linux kernel so that larger block sizes than
page size can be supported. Larger block sizes are handled by using
compound pages of an arbitrary order for the page cache instead of
single pages with order 0.
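
The changelog above notes that filesystems only need set_blocksize() to
accept larger sizes.  As a hedged sketch (not part of the patchset) of the
call site involved, this is roughly how a disk filesystem picks its block
size at mount time; today sb_set_blocksize()/set_blocksize() reject anything
larger than PAGE_SIZE, and the patchset lifts that limit:

#include <linux/errno.h>
#include <linux/fs.h>

static int example_fill_super(struct super_block *sb, int disk_blocksize)
{
	/* sb_set_blocksize() returns the block size actually set,
	 * or 0 if the requested size was rejected. */
	if (!sb_set_blocksize(sb, disk_blocksize))
		return -EINVAL;
	return 0;
}
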

Rationales:

1. We have problems supporting devices with a higher blocksize than
   page size. This is for example important to support CD and DVDs that
   can only read and write 32k or 64k blocks. We currently have a shim
   layer in there to deal with this situation which limits the speed
   of I/O. The developers are currently looking for ways to completely
   bypass the page cache because of this deficiency.

2. 32/64k blocksize is also used in flash devices. Same issues.

3. Future harddisks will support bigger block sizes that Linux cannot
   support since we are limited to PAGE_SIZE. Ok the on board cache
   may buffer this for us but what is the point of handling smaller
   page sizes than what the drive supports?

4. Reduce fsck times. Larger block sizes mean faster file system checking.
   Using 64k block size will reduce the number of blocks to be managed
   by a factor of 16 and produce much denser and contiguous metadata.

5. Performance. If we look at IA64 vs. x86_64 then it seems that the
   faster interrupt handling on x86_64 compensate for the speed loss due to
   a smaller page size (4k vs 16k on IA64). Supporting larger block sizes
   on all allows a significant reduction in I/O overhead and increases
   the size of I/O that can be performed by hardware in a single request
   since the number of scatter gather entries are typically limited for
   one request. This is going to become increasingly important to support
   the ever growing memory sizes since we may have to handle excessively
   large amounts of 4k requests for data sizes that may become common
   soon. For example to write a 1 terabyte file the kernel would have to
   handle 256 million 4k chunks.

6. Cross arch compatibility: It is currently not possible to mount
   a 16k blocksize ext2 filesystem created on IA64 on an x86_64 system.
   With this patch this becomes possible. Note that this also means that
   some filesystems are already capable of working with blocksizes of
   up to 64k (ext2, XFS) which is currently only available on a select
   few arches. This patchset enables that functionality on all arches.
   There are no special modifications needed to the filesystems. The
   set_blocksize() function call will simply support a larger blocksize.

7. VM scalability
   Large block sizes mean less state keeping for the information being
   transferred. For a 1TB file one needs to handle 256 million page
   structs in the VM if one uses 4k page size. A 64k page size reduces
   that amount to 16 million. If the limitation in existing filesystems
   are removed then even higher reductions become possible. For very
   large files like that a page size of 2 MB may be beneficial which
   will reduce the number of page struct to handle to 512k. The variable
   nature of the block size means that the size can be tuned at file
   system creation time for the anticipated needs on a volume.

8. IO scalability
   The IO layer will receive large blocks of contiguous memory with
   this patchset. This means that less scatter gather elements are needed
   and the memory used is guaranteed to be contiguous. Instead of having
   to handle 4k chunks we can f.e. handle 64k chunks in one go.

   Dave Chinner measures a performance increase of 50% when going to 64k
   blocksize with XFS.

How to make this work:

1. Apply this patchset on top of 2.6.22-rc4-mm2
2. Enable LARGE_BLOCKSIZE Support
3. co

[linuxkernelnewbies] 2.6.25-rc8-mm2: IP: [] __kmalloc+0x69/0x110

2009-09-09 Thread peter teoh





http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-04/msg04749.html

2.6.25-rc8-mm2: IP: [] __kmalloc+0x69/0x110


  From: Alexey Dobriyan 
  Date: Mon, 14 Apr 2008 00:44:22 +0400


Grrr, I was hunting for oopses in dup_fd and near that were plaguing one
box here for far too long, and hit below.

What happened if freshly booted box (probably not all init scripts
finished),
X already started. ssh from another box and reboot from session.


(gdb) p __kmalloc
$1 = {void *(size_t, gfp_t)} 0x80286890 <__kmalloc>
(gdb) l *(0x80286890 + 0x69)
0x802868f9 is in __kmalloc (mm/slub.c:1663).
1658
1659 object = __slab_alloc(s, gfpflags, node, addr, c);
1660
1661 else {
1662 object = c->freelist;
1663 ===> c->freelist = object[c->offset]; <===
1664 stat(c, ALLOC_FASTPATH);
1665 }
1666 local_irq_restore(flags);



BUG: unable to handle kernel paging request at 00050500
IP: [] __kmalloc+0x69/0x110
PGD 17e04a067 PUD 0 
Oops:  [1] SMP DEBUG_PAGEALLOC
last sysfs file:
/sys/devices/pci:00/:00:1e.0/:05:02.0/resource
CPU 1 
Modules linked in: nf_conntrack_irc ipt_MASQUERADE iptable_nat nf_nat
nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables
x_tables usblp ehci_hcd uhci_hcd usbcore sr_mod cdrom
Pid: 4966, comm: depscan.sh Not tainted 2.6.25-rc8-mm2 #20
RIP: 0010:[] []
__kmalloc+0x69/0x110
RSP: 0018:81017cba9c68 EFLAGS: 00010006
RAX:  RBX: 805c3950 RCX: 81017e7bb278
RDX: 81017c868000 RSI: 0001 RDI: 802868db
RBP: 81017cba9c98 R08:  R09: 0001
R10: 05050561 R11: 036c00b1 R12: 00050500
R13: 0282 R14: 80d0 R15: 810001070360
FS: 7fc9d17276f0() GS:81017fc44600()
knlGS:
CS: 0010 DS:  ES:  CR0: 8005003b
CR2: 00050500 CR3: 00017c9c2000 CR4: 06e0
DR0:  DR1:  DR2: 
DR3:  DR6: 0ff0 DR7: 0400
Process depscan.sh (pid: 4966, threadinfo 81017cba8000, task
81017c868000)
Stack: 802d4a42 81017e7bb278 81017e7bb278
fe5c5c7c
0cb4c2b8 81017efdc8c0 81017cba9cd8 802d4a42
81017cba9cd8 81017e7bb278 81017f82e2a0 81017cba9da8
Call Trace:
[] ? ext3_htree_store_dirent+0x32/0x120
[] ext3_htree_store_dirent+0x32/0x120
[] htree_dirblock_to_tree+0x105/0x170
[] ext3_htree_fill_tree+0x7d/0x220
[] ? trace_hardirqs_on_caller+0xc9/0x150
[] ? ext3_readdir+0x5c4/0x630
[] ext3_readdir+0x144/0x630
[] ? filldir+0x0/0xe0
[] ? __mutex_lock_common+0x22a/0x330
[] ? vfs_readdir+0x71/0xc0
[] ? filldir+0x0/0xe0
[] ? filldir+0x0/0xe0
[] vfs_readdir+0xa3/0xc0
[] sys_getdents+0x92/0xd0
[] system_call_after_swapgs+0x7b/0x80


Code: 48 89 45 d0 9c 41 5d fa e8 f5 a5 fc ff 65 8b 04 25 24 00 00 00 48
98 4c 8b bc c3 c8 00 00 00 4d 8b 27 4d 85 e4 74 7a 41 8b 47 14
<49> 8b 04 c4 49 89 07 41 f7 c5 00 02 00 00 75 37 41 55 9d e8 bf 
RIP [] __kmalloc+0x69/0x110
RSP 
CR2: 00050500
---[ end trace f513ce88520d2ac0 ]---
BUG: sleeping function called from invalid context at kernel/rwsem.c:21
in_atomic():0, irqs_disabled():1
INFO: lockdep is turned off.
irq event stamp: 19250
hardirqs last enabled at (19249): []
trace_hardirqs_on+0xd/0x10
hardirqs last disabled at (19250): []
trace_hardirqs_off+0xd/0x10
softirqs last enabled at (14334): []
__do_softirq+0xee/0x110
softirqs last disabled at (14329): []
call_softirq+0x1c/0x30
Pid: 4966, comm: depscan.sh Tainted: G D 2.6.25-rc8-mm2 #20

Call Trace:
[] ? print_irqtrace_events+0x110/0x120
[] __might_sleep+0xc7/0xe0
[] down_read+0x1d/0x50
[] exit_mm+0x2e/0xf0
[] do_exit+0x189/0x760
[] ? __wake_up+0x4e/0x70
[] oops_end+0x85/0x90
[] do_page_fault+0x3fc/0x890
[] ? __lock_acquire+0x645/0xc50
[] error_exit+0x0/0xa9
[] ? __kmalloc+0x4b/0x110
[] ? __kmalloc+0x69/0x110
[] ? __kmalloc+0x4b/0x110
[] ? ext3_htree_store_dirent+0x32/0x120
[] ? ext3_htree_store_dirent+0x32/0x120
[] ? htree_dirblock_to_tree+0x105/0x170
[] ? ext3_htree_fill_tree+0x7d/0x220
[] ? trace_hardirqs_on_caller+0xc9/0x150
[] ? ext3_readdir+0x5c4/0x630
[] ? ext3_readdir+0x144/0x630
[] ? filldir+0x0/0xe0
[] ? __mutex_lock_common+0x22a/0x330
[] ? vfs_readdir+0x71/0xc0
[] ? filldir+0x0/0xe0
[] ? filldir+0x0/0xe0
[] ? vfs_readdir+0xa3/0xc0
[] ? sys_getdents+0x92/0xd0
[] ? system_call_after_swapgs+0x7b/0x80

BUG: unable to handle kernel paging request at 00050500
IP: [] kmem_cache_alloc+0x52/0xd0
PGD 17e277067 PUD 0 
Oops:  [2] SMP DEBUG_PAGEALLOC
last sysfs file:
/sys/devices/pci:00/:00:1e.0/:05:02.0/resource
CPU 1 
Modules linked in: nf_conntrack_irc ipt_MASQUERADE iptable_nat nf_nat
nf_conntrack_ipv4 xt_state nf_conntrack iptable_filter ip_tables
x_tables usblp ehci_hcd uhci_hcd usbcore sr_mod cdrom
Pid: 4951, comm: bash Tainted: G D 2.6.25-rc8-mm2 #20
RIP: 0010:[] []
kmem_cache_alloc+0x52/0xd0
RSP: 0018:81017d76

[linuxkernelnewbies] 2.6.25-rc8-mm2 - ftraced chews 100% of a CPU

2009-09-09 Thread peter teoh





http://linux.derkeiler.com/Mailing-Lists/Kernel/2008-04/msg04304.html

2.6.25-rc8-mm2 -
ftraced chews 100% of a CPU


  From: valdis.kletni...@xx
  Date: Sat, 12 Apr 2008 06:47:35 -0400


On Thu, 10 Apr 2008 20:33:54 PDT, Andrew Morton said:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.25-rc8/2.6.25-rc8-mm2/
  

(Man, everything I look at tonight falls over.. I'm jinxed :)

So I built a kernel with:

% zgrep FTRACE /proc/config.gz 
CONFIG_HAVE_FTRACE=y
CONFIG_FTRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_SELFTEST=y
CONFIG_FTRACE_STARTUP_TEST=y

It ran pretty much OK for about 12 minutes, and then gkrellm reported
tons of
system time, and 'top' confirms it:

top - 06:27:28 up 25 min, 3 users, load average: 1.21, 1.20, 1.02
Tasks: 132 total, 3 running, 128 sleeping, 0 stopped, 1 zombie
Cpu(s): 0.7%us, 50.7%sy, 0.0%ni, 48.7%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 2054764k total, 773648k used, 1281116k free, 32008k buffers
Swap: 2031608k total, 0k used, 2031608k free, 425696k cached

 PID USER  PR  NI VIRT RES SHR S %CPU %MEM    TIME+ COMMAND
  17 root  15  -5    0   0   0 R 99.8  0.0 13:04.53 ftraced
2717 root  20   0 169m 60m 11m S  2.0  3.0  1:14.71 X

And the CPUs are sitting at around 58C, which is their usual temp when
running flat-out, so there's actual looping happening.

I see this in the dmesg from right around when it went belly-up:

[ 725.345544] hm, dftrace overflow: 127 changes (0 total) in 273 usecs
[ 725.345562] [ cut here ]
[ 725.345568] WARNING: at kernel/trace/ftrace.c:658
ftraced+0x138/0x1f0()

(no I have no idea what the system was doing at that instant)

and 'echo t > /proc/sysrq-trigger' tells us:

[ 2368.256762] ftraced R running task 5952 17 2
[ 2368.256769] 81000101b940 00010101f800 81007f957d10
802302c9
[ 2368.256777] 81007f954480  ab636be3
805e35ec
[ 2368.256784] 81007f957d50 80213b72 0246
8022ea19
[ 2368.256791] Call Trace:
[ 2368.256796] [] ? schedule+0x3e/0x6a9
[ 2368.256801] [] ?
pick_next_task_fair+0xa0/0xc2
[ 2368.256806] [] ? trace_preempt_on+0x1c/0x32
[ 2368.256811] [] ? sub_preempt_count+0x49/0x73
[ 2368.256817] [] ? _spin_lock_irqsave+0x35/0x69
[ 2368.256826] [] ? __mod_timer+0xce/0xf4
[ 2368.256830] [] ? del_timer_sync+0x28/0x4d
[ 2368.256836] [] ? schedule_timeout+0xac/0xdc
[ 2368.256840] [] ? process_timeout+0x0/0x37
[ 2368.256844] [] ? __mod_timer+0x30/0xf4
[ 2368.256851] [] ? ftraced+0x52/0x1f0
[ 2368.256855] [] ? kthread+0x0/0xa4
[ 2368.256859] [] ? ftraced+0x0/0x1f0
[ 2368.256863] [] ? kthread+0x61/0xa4
[ 2368.256868] [] ? child_rip+0xa/0x12
[ 2368.256873] [] ? restore_args+0x0/0x30
[ 2368.256878] [] ? kthread+0x0/0xa4
[ 2368.256882] [] ? child_rip+0x0/0x12

Hopefully this tells you something?











[linuxkernelnewbies] Unify interface to persistent CMOS/RTC/whatever clock: hook function usage

2009-09-09 Thread peter teoh





http://lkml.org/lkml/2006/8/19/84


  

  
  
  
  
  
  

  
  From: David Brownell <>
  Subject: Re: [RFC][PATCH] Unify interface to persistent CMOS/RTC/whatever clock
  Date: Sat, 19 Aug 2006 09:39:10 -0700

On Friday 18 August 2006 4:36 pm, john stultz wrote:

> No, I'm sorry, I realize the RTC interface has the potential to be more
> widely used. I'm just a bit frustrated that in order to utilize the RTC
> framework for what I'm trying to do, I have to first implement RTC
> drivers for 90% of the arches. :(

Maybe not ... you could do a lot with just a hook function.  Your original
patch provided one hook (an arch-level function call) assuming the RTC is
always accessible with IRQs blocked.  The RTC framework could provide a
similar hook too ... best done through a function pointer, though.  It
shouldn't be a kernel build error if there's no RTC; userspace can use an
external one via NTP, load a module later than you'd like, etc.

Then be sure to call that hook from some can-sleep context, and you're as
set for the boot/init issue as possible.  (That is, no luck if the RTC is
in a module loaded after init starts.)
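
A minimal sketch of that hook-function idea (illustrative names only, not
the real kernel interface): the framework keeps one function pointer that a
driver fills in once its clock is readable from a can-sleep context, and
callers simply cope when nothing has registered yet.

/* Sketch only -- invented names, not the mainline API. */
#include <linux/errno.h>
#include <linux/time.h>

static int (*persistent_clock_read)(struct timespec *ts);

/* An RTC driver registers this after probing, from process context. */
void register_persistent_clock_hook(int (*read_fn)(struct timespec *ts))
{
	persistent_clock_read = read_fn;
}

/* Timekeeping calls this from a can-sleep context (e.g. late in resume). */
int read_persistent_clock_via_hook(struct timespec *ts)
{
	if (!persistent_clock_read)
		return -ENODEV;	/* no RTC yet; userspace/NTP can correct later */
	return persistent_clock_read(ts);
}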


> Also the "RTC might not be available when you need it" issue makes it
> uh.. difficult to use. :)

I can't see that "when you need it" thing.  Especially since you had
proposed that "late" wall clock init was not a problem ...


> So if we go w/  the "it may not be available, so always assume it isn't"
> way of thinking, it forces us to rely upon the RTC driver(s) to resume
> time (which means every RTC, no matter how simple has to have
> suspend/resume hooks and call settimeofday at least). 

No, that was the point of my comment about using the new class level
suspend/resume calls.  The RTC drivers wouldn't be responsible for
that; the RTC framework would be.  RTC drivers may still want the
suspend/resume hooks to make sure they issue system wakeup events,
and so on, but no longer for maintaining the wall clock.  I'll send
a patch (of the "it compiles" type) later.


> I don't really like that uses-graph (I can imagine someone's system not
> resuming properly because they forgot to compile in the right RTC
> driver).

Without the right RTC support, they'd not be getting initial clock
setup right either ... same difference, same userspace fix (using
external RTC, via NTP etc).


> Also it doesn't resolve the timekeeping resume ordering issue 
> I'm trying to address.

Worth exploring that a bit more.  Exactly what is the issue?  I'm
not sure your first explanation came across in enough depth...


> But we might have to deal with it. Just to make sure we can balance this
> properly, what is the percentage of RTC drivers where they might not be
> available at timekeeping_resume()?  If it is small, it might be
> reasonable to special case them (and by "special case", i *don't* mean
> ignore :)

Basically 100%.  It's because timekeeping_resume() applies to a (fake)
sys_device, and the sys_device resume phase kicks in before "real"
drivers resume, and with IRQs blocked ... ergo before sleeping calls
can be issued (e.g. waiting for I2C or SPI access to complete).

The RTC drivers are all "real" device drivers, and the RTC framework
itself issues sleeping calls, like mutex_lock.


> > The RTC framework is no more ARM-only than the generic TOD framework
> > is x86-only.  But those changes did start from different corners of the
> > Linux market, which likely explains some of the surprise associated with
> > this little collision.  (If rtc-acpi got a bit more attention, that'd
> > surely help raise awareness outside the embedded space...)
> 
> Looking at the rtc-acpi code, it describes itself as being AT compatible
> (ie: The old CMOS clock), but it's not clear if it requires ACPI or not
> to work. Further, does it work on x86_64, or ia64 as well?

It's called "rtc-acpi" since it requires ACPI ... in fact, PNPACPI is
what provides the driver model device it binds to.  Plus it hooks into
ACPI to get the wakeup function, and the extra register options.  I'm
running it on both i386 and x86_64.  So like I said, most modern PCs
should be able to run it just fine.  IA64 uses ACPI, so presumably it
should be able to use the driver too.

The core of the driver could be reused on some non-PC platforms, some
not-modern PCs, and on ACPI platforms without PNPACPI ... given the
addition of platform code to register a platform_device.  And for the
ACPI-but-not-PNPACPI configs, wakeup and the extra registers could
still be used.

Someone would have to provide the relevant patches; I'd not mind if
the driver were renamed to e.g. "rtc-cmos" at that point.


> Another comment, drivers/rtc/ is a bit overwhelming. Same with the
> Kconfig. Is there any way it could be broken up so arch-specific RTCs
> aren't shown on arches

[linuxkernelnewbies] NVIDIA Delivers Comprehensive OpenCL Support Under Snow Leopard

2009-09-10 Thread peter teoh





http://www.hardwarezone.com.sg/news/view.php?id=14526&cid=1



  

  
   
NVIDIA Delivers Comprehensive OpenCL Support Under Snow Leopard

All NVIDIA CUDA-Enabled GPUs Shipped by Apple Supported under New Operating System

  SINGAPORE — September 4, 2009 — Apple’s new
Snow
Leopard operating system (OS) is the first OS to integrate OpenCL, a
cross-platform open standard that makes it possible for developers to
tap into the vast gigaflops of computing power currently in the
graphics processing unit (GPU) and use them for any application. 
  
OpenCL
on the NVIDIA® CUDA™ architecture enables applications to use the CPU
and the GPU together as co-processors. NVIDIA’s integration of the CUDA
architecture across its brands and segments enables it to offer Apple
users a broad selection of 10 GPU models officially supported by Snow
Leopard. These are: NVIDIA® GeForce® 9400M, GeForce 9600M GT, GeForce
8600M GT, GeForce GT 120, GeForce GT 130, GeForce GTX 285, GeForce 8800
GT, GeForce 8800 GS, NVIDIA® Quadro® FX 4800, and Quadro FX 5600. 
  
“NVIDIA
chairs the OpenCL working group and is the only company to have OpenCL
drivers for the GPU in the hands of thousands of Snow Leopard, Windows
and Linux developers today,” said Sanford Russell, general manager of
CUDA at NVIDIA. “We’re excited to see Snow Leopard ship, signaling the
arrival of GPU Computing for all Apple users.” NVIDIA has a range of
documentation and tools available for OpenCL including a detailed
programming guide, a Best Practices guide with tips and tricks for
tuning performance, SDK code samples and a soon-to-be-released Visual
Profiler for performance optimization. In addition to working closely
with Apple to integrate support for OpenCL into Snow Leopard, NVIDIA
has also released OpenCL conformant drivers for Windows and Linux. 
  
For more information, visit www.nvidia.com/opencl and for more information on
Apple’s Snow Leopard OS, go to www.apple.com/macosx   
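
As a rough illustration of what "using the GPU as a co-processor" means on
the developer side, a minimal OpenCL host-code sketch (standard OpenCL 1.0
API, not NVIDIA-specific) that simply lists the GPU devices the first
platform exposes:

/* List the GPU devices reported by the first OpenCL platform. */
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

int main(void)
{
	cl_platform_id platform;
	cl_device_id devices[8];
	cl_uint ndev, i;
	char name[256];

	if (clGetPlatformIDs(1, &platform, NULL) != CL_SUCCESS)
		return 1;
	if (clGetDeviceIDs(platform, CL_DEVICE_TYPE_GPU, 8,
			   devices, &ndev) != CL_SUCCESS)
		return 1;
	for (i = 0; i < ndev; i++) {
		clGetDeviceInfo(devices[i], CL_DEVICE_NAME,
				sizeof(name), name, NULL);
		printf("GPU %u: %s\n", i, name);
	}
	return 0;
}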
  
  
  
  


  
  


  
  


  
  

  
  
  
  
  
  
  

  






[linuxkernelnewbies] SoX - Sound eXchange | HomePage

2009-09-10 Thread peter teoh




http://sox.sourceforge.net/


  

  
  Welcome to the home of SoX, the Swiss Army knife of sound
processing programs.
  
  SoX is a cross-platform (Windows, Linux, MacOS
X,
etc.) command line utility that can convert various formats of computer
audio files into other formats. It can also apply various effects to
these sound files, and, as an added bonus, SoX can play and record
audio files on most platforms.
  
  The screen-shot to the right shows an example
of SoX
first being used to process some audio, then being used to play some
audio files.
  
  For the list of all file formats, device
drivers, and effects supported in the latest release, click
here.
To see the complete set of SoX documentation, click here.
  
If
you find SoX to be useful, please consider supporting the project with
a donation. We can accept PayPal donations through the SourceForge
donation system, although currently a SourceForge login ID (or an
openID), is required. Creating a SourceForge ID takes only a few
seconds—click on the Paypal logo below to make a donation.
  
   
  
  
   
[Screenshot: An example SoX session]
  

  


  

  
  Latest News
   SoX 14.3.0 was released on June 21, '09. Highlights include:
  
New filter effects: `sinc', `fir', `biquad'.

Other new effects: `stats', `overdrive', `vad'.

New audio device handler for OpenBSD.

Fixed problems with temporary file on Windows.

Can now enable automated clipping protection for most
effects.

Automatically `dither' as needed.

Improvements to AIFF, WAV, FLAC, MP3 handlers.

ALSA driver now supports 24-bit.

`spectrogram' effect enhancements including multi-channel
support.

`synth' effect enhancements including new `pluck' type.

More gain/normalise options.

Now uses CPU multi-core to speed up some effects.

SOX_OPTS environment variable for setting default options.

Interactive playback volume control (on some systems).

More `soxi' options, including multi-file total duration.

Can now auto-detect file-type even when inputting from a pipe.

  
  The complete list of changes can be viewed here.
  
   Bugs and workarounds associated with recent releases can be
found here.
  
  
  
  Download
  SoX 14.3.0 downloads:
  
  
 Source code distribution

 MS-Windows executable

 Mac OSX executable

  
   Source code and executables for older versions
of SoX are available here.
  
  Some third-party pre-built (usually older)
versions of SoX are available via the links page.
  
  
   
[Screenshot: Using gnuplot with SoX]
  

  

Technical Information

  

  
   
  
  
     
  
  
  
  SoX is often used to convert an audio file from one sampling
rate to
another rate (e.g. from DAT to CD rates). SoX's resampling algorithm is
highly configurable; some notes on selecting options for resampling are
available here.
  
   If you're interested in the format of various
audio files then you will be interested in the Audio File Format FAQ which I also maintain.
  
  

  

Support and Development
Most of the SoX project's resources are provided by
SourceForge. There is a SoX project
web site that can be of use when working with CVS and its mailing list.

There is a low volume mailing list set up that you
can subscribe to or read online located at the SoX-users Mailing List web site.

Development of SoX is done using CVS. It is possible
to view the files checked in to CVS using a CVS web
interface as well as find additional information on SourceForge's CVS webpage.

If you have CVS installed on your system then you may
obtain a snapshot of the latest source by performing the following
commands. The commands will log you in and check out a copy of the sox
module and place it in the subdirectory it was run from.

 cvs -d :pserver:anonym...@sox.cvs.sourceforge.net:/cvsroot/sox login
 cvs -z3 -d :pserver:anonym...@sox.cvs.sourceforge.net:/cvsroot/sox co -P sox

The anonymous CVS account does not need a
password. When prompted for a password by the cvs command, enter
nothing and hit Enter.

To merge in future updates you may run the following
command (from inside the sox directory that was created from your
checkout):

 cvs update -P 

If you make any changes to SoX that you would
like to be included in future releases then you may use the following
command to make an easy to read diff.

 cvs diff -uw






[linuxkernelnewbies] Our ACPI Core Software Implementation

2009-09-10 Thread Peter Teoh





http://www.usenix.org/events/usenix02/tech/freenix/full_papers/watanabe/watanabe_html/node10.html

Our ACPI Core Software
Implementation


In September 1999 we started writing our own ACPI core software
implementation, including an AML execution environment. The
implementation was based on Doug Rabson's ACPI disassembler and our
ACPI data analyzing tool.

We first wrote an ACPI memory recognition routine to detect and
preserve the ACPI tables. We then wrote a process that could run AML
methods manually (e.g. suspend and wakeup) based on somewhat
incomplete ASL output. This allowed us to enter power state S1 and
also to shutdown a machine by pushing the power button.

We also wrote an AML interpreter in user space by merging the
namespace functions from our analyzing tool into the ACPI disassembler
and adding a memory management module to it. After this was
implemented we merged the AML interpreter module into a kernel driver
and then we had a basic working version of power management.

While we were working out the bugs in our in-kernel AML interpreter,
we noticed that ACPI-CA software from Intel had a suitable license to
merge into FreeBSD. As we were preparing to merge our ACPI into the
main branch of the FreeBSD source repository we read the ACPI-CA
implementation. We then decided to switch to ACPI-CA using glue code
that we wrote. The reason we switched was that ACPI-CA is an
OS-independent implementation so we can share and benefit from
feedback from other groups. While the ACPI-CA implementation is
larger, it is also more complete and well documented. So our
implementation is no longer in the kernel, but it still remains in
user-level tools such as amldb(8) and acpidump(8).






[linuxkernelnewbies] bdget()

2009-09-11 Thread Peter Teoh





/* From fs/block_dev.c: bdget() looks up (or creates) the block_device
 * backing a given dev_t, using the inode cache of the internal blockdev
 * pseudo-filesystem (blockdev_superblock). */

static int bdev_set(struct inode *inode, void *data)
{
	/* iget5_locked() "set" callback: record the device number. */
	BDEV_I(inode)->bdev.bd_dev = *(dev_t *)data;
	return 0;
}

static LIST_HEAD(all_bdevs);

struct block_device *bdget(dev_t dev)
{
	struct block_device *bdev;
	struct inode *inode;

	inode = iget5_locked(blockdev_superblock, hash(dev),
			bdev_test, bdev_set, &dev);

	if (!inode)
		return NULL;

	bdev = &BDEV_I(inode)->bdev;

	if (inode->i_state & I_NEW) {
		/* Freshly allocated: initialise the block_device and its
		 * backing inode before unlocking the inode for other users. */
		bdev->bd_contains = NULL;
		bdev->bd_inode = inode;
		bdev->bd_block_size = (1 << inode->i_blkbits);
		bdev->bd_part_count = 0;
		bdev->bd_invalidated = 0;
		inode->i_mode = S_IFBLK;
		inode->i_rdev = dev;
		inode->i_bdev = bdev;
		inode->i_data.a_ops = &def_blk_aops;
		mapping_set_gfp_mask(&inode->i_data, GFP_USER);
		inode->i_data.backing_dev_info = &default_backing_dev_info;
		spin_lock(&bdev_lock);
		list_add(&bdev->bd_list, &all_bdevs);
		spin_unlock(&bdev_lock);
		unlock_new_inode(inode);
	}
	return bdev;
}

EXPORT_SYMBOL(bdget);






[linuxkernelnewbies] Linux for the Nios II Processor - Nios Community Wiki

2009-09-12 Thread peter teoh





http://www.nioswiki.com/linux


Linux for the Nios II Processor

 



  
This
is the community-supported version of Nios II Linux with MMU. This
package will work only on Linux. You will need a virtual Linux to run
it on Windows. This is GPL software, and comes with absolutely NO
warranty.

You may get support from the Nios forum or the nios2-dev mailing list.

http://forum.niosforum.com/forum/ind...p?showforum=18
http://sopc.et.ntust.edu.tw/cgi-bin/...info/nios2-dev
  
Install the required development packages on your Linux
desktop, as root or with sudo.
For RHEL5/CentOS5, enable EPEL
at https://fedoraproject.org/wiki/EPEL.


# for RHEL5/Centos5 only
wget http://download.fedora.redhat.com/pub/epel/5/i386/epel-release-5-3.noarch.rpm
rpm -Uvh epel-release-5-3.noarch.rpm

# for RHEL5/Centos5/Fedora11
yum install git-all git-gui tcsh make gcc ncurses-devel bison libglade2-devel \
byacc flex gawk gettext ccache zlib-devel gtk2-devel lzo-devel pax-utils

  
For new users who haven't used the nios2 git before, please download the
tarball (1.7 GB), as a normal user:

wget http://www.niosftp.com/pub/linux/nios2-linux-20090825.tar

sha1sum nios2-linux-20090825.tar
c156d21b1b6adf1b47102a5f37c4d1d9acdb637f  nios2-linux-20090825.tar
tar xf nios2-linux-20090825.tar
cd nios2-linux
./checkout 

 

For existing nios2 git users
(with nios2-linux-20080619.tar or nios2-linux-20090730.tar), there is no
need to download the tarball. You may add a new branch to track the nios2
MMU kernel and clone to get the binary toolchain.

cd nios2-linux
git clone git://sopc.et.ntust.edu.tw/git/toolchain-mmu.git
cd linux-2.6
git fetch origin
git branch nios2mmu origin/nios2mmu
git checkout -f nios2mmu
git clean -f -x -d
cd ..
cd uClinux-dist
git fetch origin
git branch trunk origin/trunk
git checkout -f trunk
git clean -f -x -d
In short, to build with MMU, use the nios2mmu branch of linux-2.6 and the
trunk branch of uClinux-dist. To build without MMU, use the test-nios2
branch of both linux-2.6 and uClinux-dist.

 

QUICK START

1. Add the binary toolchain to your PATH in .bash_profile or .profile, like
this:
PATH=$PATH:/home/hippo/nios2-linux/toolchain-mmu/x86-linux2/bin

2. Build the Linux image in uClinux-dist dir,

  
cd nios2-linux/uClinux-dist
make menuconfig   # or make xconfig
  
In the menuconfig, make sure it is selected as follows:

Vendor/Product Selection --->   # select
    --- Select the Vendor you wish to target
        Vendor (Altera)  --->   # select Altera 
    --- Select the Product you wish to target 
        Altera Products (nios2)   --->  # select nios2

Kernel/Library/Defaults Selection --->  # select
    --- Kernel is linux-2.6.x
Libc Version (None)  --->   # should default to None - very important.
    [*] Default all settings (lose changes) # select
    [ ] Customize Kernel Settings 
    [ ] Customize Vendor/User Settings 
    [ ] Update Default Vendor Settings 
Then, if you are asked an option like "Build faac (LIB_FAAC) [N/y/?]
(NEW)", just press Enter to use the default. (This will be fixed.)

  
Compile kernel and apps,

make

(this will take a while)

3. The images created are,

images/linux.initramfs.gz is the ELF image with initramfs built in
images/zImage.initramfs.gz is the compressed ELF image with initramfs
built in

images/vmImage is the compressed u-boot image
images/rootfs.initramfs.gz is the compressed initramfs to be used as an
initrd by u-boot
images/rootfs.jffs2 is the jffs2 image, e.g. cp rootfs.jffs2 /dev/mtd0. This
is available when jffs2 is selected in the kernel. Please note that the flash
erase sector size on the 3c120 dev board is 128KB; you will have to specify
"MKFS_JFFS2_FLAGS = -l -p -e 128" at the beginning of your product Makefile.

Connect USB Blaster cable to 3C120 dev board, download the sof and elf.

nios2-configure-sof ../3c120_default/nios2_linux_3c120_125mhz_top.sof
nios2-download -g images/linux.initramfs.gz
nios2-terminal


There is a prebuilt linux.initramfs.gz ELF image in the 3c120_default
dir, which you may try out first.

4. Get source updates from community server.

Normally you will use the "git" protocol to get updates from the server if
your PC
is directly connected to the Internet. Then you may skip to step 5.

Only if you are behind a proxy firewall and cannot use the git protocol,
you can change git to use ssh tunneling through port 443 to get updates
from the community server with this command: "./use_ssh443_for_update".

You should have an ssh tunneling package installed, such as "corkscrew".
Add the following 3 lines to your ~/.ssh/config, which should have no
public access ("chmod og-rwx config"). Replace the proxy host and port
with those of your HTTP proxy server, and change the nios2-linux path to
yours.


IdentityFile ~/.ssh/id_rsa
IdentityFile ~/nios2-linux/sshkey/id_rsa
ProxyCommand corkscrew   %h %p



If you failed to use ssh tunneling as above, you may try the dumb http
protocol
with this command, "./use_http_for_update". But this is very slow and
not
recomm







[linuxkernelnewbies] MIDI Decoder Chip? | Comp.Arch.Embedded | EmbeddedRelated.com

2009-09-14 Thread Peter Teoh




http://www.embeddedrelated.com/usenet/embedded/show/30356-1.php


>   I'm sorry for not being clear, but you guys misunderstand - I
> don't want to control an external MIDI synthesizer.  I want a MIDI
> synthesizer chip that I can build into a project.

Well, suffice it to say it might have helped if you had asked for a
synthesizer chip right away, instead of speaking about a "decoder"
chip.  Acquiring anything like that in qty less than 1000 might indeed
prove challenging, these days, when most applications, including
cellphones, would just use a DSP and a huge wave table memory instead
of dedicated chips.

=



MIDI Decoder Chip? - b...@jfcl.com - 2005-06-13 01:02:00



  Is there an IC that will decode MIDI data?  I need something that I
can use in an embedded system with a fairly simple (e.g. 8051 or PIC)
microcontroller.  Ideally the chip would take MIDI data directly and
output analog audio for an amplifier, run on 5V and be available retail
in a thru hole package, but I'll take what I can get :-)  It does have
to be something I can buy retail, though, in small quantities.

  I know there are MP3 decoder chips (e.g. the STA013) that approach
this ideal, but I really need to play MIDIs, not MP3s.

Thanks,
Bob Armstrong









Re: MIDI Decoder Chip? - jim dorey - 2005-06-13 02:12:00



On Mon, 13 Jun 2005 02:02:58 -0300, b...@jfcl.com  wrote:

>   Is there an IC that will decode MIDI data?  I need something that I
> can use in an embedded system with a fairly simple (e.g. 8051 or PIC)
> microcontroller.  Ideally the chip would take MIDI data directly and
> output analog audio for an amplifier, run on 5V and be available retail
> in a thru hole package, but I'll take what I can get :-)  It does have
> to be something I can by retail, though, in small quantities.
>
>   I know there are MP3 decoder chips (e.g. the STA013) that approach
> this ideal, but I really need to play MIDIs, not MP3s.
>
> Thanks,
> Bob Armstrong

i say either a basic synthesiser chip, or read the midi spec farther than  
the first half page.  try midi fanatics brainwashing center, it'll  
probably help you.

-- 
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/




Re: MIDI Decoder Chip? - Neil Bradley - 2005-06-13 02:36:00



b...@jfcl.com wrote:
>   Is there an IC that will decode MIDI data?  I need something that I
> can use in an embedded system with a fairly simple (e.g. 8051 or PIC)
> microcontroller.  Ideally the chip would take MIDI data directly and
> output analog audio for an amplifier, run on 5V and be available retail
> in a thru hole package, but I'll take what I can get :-)  It does have
> to be something I can by retail, though, in small quantities.
> 
>   I know there are MP3 decoder chips (e.g. the STA013) that approach
> this ideal, but I really need to play MIDIs, not MP3s.

MIDI is a standard of controls and notes. It does not dictate instrumentation.
MIDI is like having the sheet music to a song. It has no inherent sound. What
you need is a sound synthesizer that understands MIDI and can generate sound
based upon it.

There is a standard called "General MIDI" which puts specific instruments on 
specific channels and patches, but you really need to understand what your MIDI 
source is set up for to know what it's going to play. I have MIDI files I've 
created from my own sequences, but you can't play them back on anything other 
than my studio (no sounds will be correct).

-->Neil




Re: MIDI Decoder Chip? - Dr Justice - 2005-06-13 03:18:00



 wrote in message
news:1...@g43g2000cwa.googlegroups.com...
>   Is there an IC that will decode MIDI data?

Yes, it's called a UART.

> I need something that I
> can use in an embedded system with a fairly simple (e.g. 8051 or PIC)
> microcontroller.

Practically all microcontrollers have at least one UART built in,
so you're not likely to need anything but your microcontroller
and an electrical interface for the 5 mA current loop (use e.g.
a 6N138 optocoupler).

DJ
--
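
To make that concrete, a hypothetical sketch (plain C, not tied to any
particular microcontroller) of picking note-on/note-off messages out of the
byte stream arriving on the UART at 31250 baud; data bytes are 0-127 and
status bytes have the high bit set:

/* Feed each byte received from the MIDI UART into this parser.
 * Only note-on (0x9n) and note-off (0x8n) messages are handled;
 * system real-time bytes and other messages are ignored here. */
#include <stdint.h>
#include <stdio.h>

static uint8_t status, data[2], ndata;

void midi_in_byte(uint8_t b)
{
	if (b & 0x80) {			/* status byte: new message begins */
		status = b;
		ndata = 0;
		return;
	}
	if ((status & 0xf0) != 0x80 && (status & 0xf0) != 0x90)
		return;			/* not a note message: skip its data */
	data[ndata++] = b;
	if (ndata < 2)
		return;
	ndata = 0;			/* running status: keep 'status' */
	if ((status & 0xf0) == 0x90 && data[1])
		printf("note on  %u vel %u ch %u\n",
		       data[0], data[1], (status & 0x0f) + 1);
	else				/* note-off, or note-on with velocity 0 */
		printf("note off %u ch %u\n", data[0], (status & 0x0f) + 1);
}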






Re: MIDI Decoder Chip? - Gareth Magennis - 2005-06-13
05:43:00



 wrote in message 
news:1...@g43g2000cwa.googlegroups.com...
>  Is there an IC that will decode MIDI data?  I need something that I
> can use in an embedded system with a fairly simple (e.g. 8051 or PIC)
> microcontroller.  Ideally the chip would take MIDI data directly and
> output analog audio for an amplifier, run on 5V and be available retail
> in a thru hole package, but I'll take what I can get :-)  It does have
> to be something I can by retail, though, in small quantities.
>
>  I know there are MP3 decoder chips (e.g. the STA013) that approach
> this ideal, but I really need to play MIDIs, not MP3s.
>
> Thanks,
> Bob Armstrong
>

I think you completely misunderstand what MIDI is.  MIDI is just the 
information a MIDI keyboard generates when you play it - i.e. note on, note 
off inf

[linuxkernelnewbies] Fiemap, an extent mapping ioctl [LWN.net]

2009-09-14 Thread Peter Teoh





http://lwn.net/Articles/297696/


Fiemap, an extent mapping ioctl


  

  From: Mark Fasheh
  To: Andrew Morton
  Subject: [PATCH 0/3] Fiemap, an extent mapping ioctl
  Date: Wed, 10 Sep 2008 05:49:34 -0700
  Message-ID: <20080910124934.gb4...@wotan.suse.de>
  Cc: Andreas Dilger, Eric Sandeen, linux-e...@vger.kernel.org

  


Hello,

	The following patches are the latest attempt at implementing a
fiemap ioctl, which can be used by userspace software to get extent
information for an inode in an efficient manner.

	These patches are against 2.6.27-rc6. I hope we've incorporated
enough feedback by now that they're ready for inclusion in -mm.

	There is an ioctl wrapper program available for testing at:

   http://www.kernel.org/pub/linux/kernel/people/mfasheh/fie...


Changes from last posting:

* s/FIEMAP_FLAG_NO_DIRECT/FIEMAP_FLAG_NO_BYPASS/  

* Updated wording for FIEMAP_EXTENT_UNWRITTEN

* Ext4 patch temporarily dropped because it no longer applies. I'm sure we
  can rectify this quickly.


Below this I will include the contents of fiemap.txt to make it convenient
for folks to get details on the API.
	--Mark



Fiemap Ioctl


The fiemap ioctl is an efficient method for userspace to get file
extent mappings. Instead of block-by-block mapping (such as bmap), fiemap
returns a list of extents.


Request Basics
--

A fiemap request is encoded within struct fiemap:

struct fiemap {
	__u64	fm_start;	 /* logical offset (inclusive) at
  * which to start mapping (in) */
	__u64	fm_length;	 /* logical length of mapping which
  * userspace cares about (in) */
	__u32	fm_flags;	 /* FIEMAP_FLAG_* flags for request (in/out) */
	__u32	fm_mapped_extents; /* number of extents that were
* mapped (out) */
	__u32	fm_extent_count; /* size of fm_extents array (in) */
	__u32	fm_reserved;
	struct fiemap_extent fm_extents[0]; /* array of mapped extents (out) */
};


fm_start, and fm_length specify the logical range within the file
which the process would like mappings for. Extents returned mirror
those on disk - that is, the logical offset of the 1st returned extent
may start before fm_start, and the range covered by the last returned
extent may end after fm_length. All offsets and lengths are in bytes.

Certain flags to modify the way in which mappings are looked up can be
set in fm_flags. If the kernel doesn't understand some particular
flags, it will return EBADR and the contents of fm_flags will contain
the set of flags which caused the error. If the kernel is compatible
with all flags passed, the contents of fm_flags will be unmodified.
It is up to userspace to determine whether rejection of a particular
flag is fatal to its operation. This scheme is intended to allow the
fiemap interface to grow in the future but without losing
compatibility with old software.

fm_extent_count specifies the number of elements in the fm_extents[] array
that can be used to return extents.  If fm_extent_count is zero, then the
fm_extents[] array is ignored (no extents will be returned), and the
fm_mapped_extents count will hold the number of extents needed in
fm_extents[] to hold the file's current mapping.  Note that there is
nothing to prevent the file from changing between calls to FIEMAP.
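
As a usage illustration (not part of this patch posting), a minimal sketch
of calling the ioctl from userspace; it assumes the header and request name
the interface eventually shipped with in mainline (<linux/fiemap.h> and
FS_IOC_FIEMAP):

/* Print the extents backing a file, via the fiemap ioctl. */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#include <linux/fs.h>
#include <linux/fiemap.h>

int main(int argc, char **argv)
{
	unsigned int i, n = 32;		/* room for up to 32 extents */
	struct fiemap *fm;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		return 1;
	fm = calloc(1, sizeof(*fm) + n * sizeof(struct fiemap_extent));
	fm->fm_start = 0;
	fm->fm_length = FIEMAP_MAX_OFFSET;	/* map the whole file */
	fm->fm_flags = FIEMAP_FLAG_SYNC;
	fm->fm_extent_count = n;

	if (ioctl(fd, FS_IOC_FIEMAP, fm) < 0) {
		perror("FIEMAP");
		return 1;
	}
	for (i = 0; i < fm->fm_mapped_extents; i++)
		printf("extent %u: logical %llu physical %llu length %llu\n", i,
		       (unsigned long long)fm->fm_extents[i].fe_logical,
		       (unsigned long long)fm->fm_extents[i].fe_physical,
		       (unsigned long long)fm->fm_extents[i].fe_length);
	close(fd);
	return 0;
}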

Currently, there are three flags which can be set in fm_flags:

* FIEMAP_FLAG_SYNC
If this flag is set, the kernel will sync the file before mapping extents.

* FIEMAP_FLAG_XATTR
If this flag is set, the extents returned will describe the inode's
extended attribute lookup tree, instead of its data tree.


Extent Mapping
--

Extent information is returned within the embedded fm_extents array
which userspace must allocate along with the fiemap structure. The
number of elements in the fiemap_extents[] array should be passed via
fm_extent_count. The number of extents mapped by kernel will be
returned via fm_mapped_extents. If the number of fiemap_extents
allocated is less than would be required to map the requested range,
the maximum number of extents that can be mapped in the fm_extent[]
array will be returned and fm_mapped_extents will be equal to
fm_extent_count. In that case, the last extent in the array will not
complete the requested range and will not have the FIEMAP_EXTENT_LAST
flag set (see the next section on extent flags).

Each extent is described by a single fiemap_extent structure as
returned in fm_extents.

struct fiemap_extent {
	__u64	fe_logical;  /* logical offset in bytes for the start of
			  * the extent */
	__u64	fe_physical; /* physical offset in bytes for the start
			  * of the extent */
	__u64	fe_length;   /* length in bytes for the extent */
	__u32	fe_flags;/* FIEMAP_EXTENT_* flags f

[linuxkernelnewbies] SEEK_HOLE or FIEMAP? [LWN.net]

2009-09-14 Thread Peter Teoh





http://lwn.net/Articles/260795/


SEEK_HOLE or FIEMAP?


 By Jonathan Corbet
December 3, 2007 
Sparse files have an apparent size which is larger than the amount of
storage actually allocated to them. The usual way to create such
a file is to seek past its end and write some new data; Unix-derived
systems will traditionally not allocate disk blocks for the portion of
the
file past the previous end which was skipped over. The result is a
"hole,"
a piece of the file which logically exists, but which is not
represented on
disk. A read operation on a hole succeeds, with the returned data being
all zeroes. Relatively smart file archival and backup utilities will
recognize holes in files; these holes are not stored in the resulting
archive and will not be filled if the file is restored from that
archive.
The process of recognizing holes is relatively primitive, though:
about the
only way to do it in a portable way is to simply look for blocks filled
with zeroes. This technique works, but it requires making a pass over
the
data to obtain information which the lower levels of the system already
know. It seems like there should be a better way.

About two years ago, the Solaris ZFS developers proposed
an extension to lseek() which would allow an application
to
find the holes in sparse files more efficiently. This extension
works by adding two new "whence" options:



   SEEK_HOLE positions the file descriptor to the
beginning of the first hole which occurs after the given offset. For
the purposes of this operation, "hole" is defined as a region of all
zeros of any length, but the system is not required to actually detect
all holes. So, in practice, small ranges of zeroes will be skipped
over, as will, in all likelihood, large (multi-block) ranges which have
actually been written to disk.

  
   SEEK_DATA moves to the start of the next region (after the
given offset) which is not a hole.
  


This functionality has been part of Solaris for a while; the Solaris
developers would like to see it spread elsewhere and become something
more
than a Solaris-only extension. To that end, Josef Bacik has recently
posted an implementation
of
this extension for Linux. Internally, it adds a new member to the
file_operations structure (seek_hole_data()) intended
to
allow filesystems to efficiently implement the new operations.
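
For a sense of how an application would use the proposed interface, here is
a minimal sketch that walks the data regions of a sparse file, assuming a
libc that defines the SEEK_HOLE and SEEK_DATA constants:

/* Print the data (non-hole) regions of a sparse file. */
#define _GNU_SOURCE
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(int argc, char **argv)
{
	off_t data = 0, hole;
	int fd;

	if (argc < 2)
		return 1;
	fd = open(argv[1], O_RDONLY);
	if (fd < 0)
		return 1;
	for (;;) {
		data = lseek(fd, data, SEEK_DATA);
		if (data < 0)
			break;		/* no more data before EOF */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
		data = hole;	/* continue searching past this region */
	}
	close(fd);
	return 0;
}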

One might argue that anybody who wants to separate holes and data in
a file
can already do so with the FIBMAP ioctl() command.
While
that is true, FIBMAP is an inefficient way of getting
this sort of information, especially on filesystems which support
extents.
A FIBMAP call returns the mapping information for exactly one
block; mapping out a large file may require millions of calls when,
once
again, the filesystem should already know how to provide that
information
in a much more straightforward manner.

Even so, this patch looks relatively unlikely to make it into the
mainline. The API is unpopular, being seen as ugly and as a change in
the
semantics of the lseek() call. But, more to the point, it may
be
interesting to learn much more about the representation of a file than
just
where the holes are. And, as it turns out, there is already a proposed
ioctl() command which can provide all of that information.
That
interface is the FIEMAP
ioctl() specified by Andreas Dilger back in October.

A FIEMAP call takes the following structure as an
argument:


struct fiemap {
	__u64	fm_start;	 /* logical starting byte offset (in/out) */
	__u64	fm_length;	 /* logical length of map (in/out) */
	__u32	fm_flags;	 /* FIEMAP_FLAG_* flags for request (in/out) */
	__u32	fm_extent_count; /* number of extents in fm_extents (in/out) */
	__u64	fm_end_offset;	 /* end of mapping in last ioctl */
	struct fiemap_extent	fm_extents[0];
};


An application wanting to learn something about how a file is stored
will
put the starting offset into fm_start and the length
of the region of interest in fm_length. If fm_flags
contains FIEMAP_FLAG_NUM_EXTENTS, the system call will simply
set
fm_extent_count to the number of extents used to store the
specified range of bytes and return. In this form, FIEMAP can
be
used to determine how fragmented the file is on disk.

If the application is looking for more information than that, it
will
allocate enough space for one or more fm_extents structures:


struct fiemap_extent {
	__u64 fe_offset;/* offset in bytes for the start of the extent */
	__u64 fe_length;/* length in bytes for the extent */
	__u32 fe_flags; /* returned FIEMAP_EXTENT_* flags for the extent */
	__u32 fe_lun;   /* logical device number for extent(starting at 0)*/
};


In this case, fm_extent_count should be set to the number of
these
structures before making the FIEMAP call. On return, these
structures (as many as is indicated by the returned value of
fm_extent_count) will be filled in with information on the
actual
file extents; fe_offset says where (on disk) the extent
starts,
and fe_length is th

[linuxkernelnewbies] Lynx Wiki [misc:e1550]: Huawei E1550 Broadband Modem Howto

2009-09-16 Thread Peter Teoh





http://wiki.lynxworks.eu/misc/e1550





  
  

  


  




  




Huawei E1550

3G/HSDPA USB modem, used by many providers - mine is 3 Mobile (UK).


Ubuntu

Install udev-extras:

sudo apt-get install udev-extras

Create a custom udev rule to override the device's initial attempt to
present itself as removable storage.

gksu gedit /etc/udev/rules.d/15-huawei-e1550.rules

Paste the following:

SUBSYSTEM=="usb", SYSFS{idProduct}=="1446", SYSFS{idVendor}=="12d1",
RUN+="/usr/bin/modem-modeswitch --vendor 0x12d1 --product 0x1446 --type option-zerocd"

Now when the card is plugged in, Network Manager prompts to configure a
new connection - choose “3 (handset)”.


Fedora

Install usb_modeswitch, do these steps as root:

yum install usb_modeswitch

You need to add a configuration to /etc/usb_modeswitch.conf:

DefaultVendor = 0x12d1
DefaultProduct = 0x1446
MessageEndPoint = "0x01"
MessageContent = "55534243001106"

Create a custom udev rule to override the device's initial attempt to
present itself as removable storage. Note this is not the same as the Ubuntu rule.

vi /etc/udev/rules.d/15-huawei-e1550.rules

Paste the following:

SUBSYSTEM=="usb", SYSFS{idProduct}=="1446", SYSFS{idVendor}=="12d1",
RUN+="/lib/udev/usb_modeswitch"

When Network Manager prompts to configure a new connection, choose “3
(handset)”.


Arch

Download and compile usb_modeswitch. Copy it to /usr/bin and
usb_modeswitch.conf to /etc.


Add the following to /etc/usb_modeswitch.conf:

DefaultVendor = 0x12d1
DefaultProduct = 0x1446
MessageEndPoint = "0x01"
MessageContent = "55534243001106"

Create the udev rule /etc/udev/rules.d/15-huawei-e1550.rules:

SUBSYSTEM=="usb", SYSFS{idProduct}=="1446", SYSFS{idVendor}=="12d1",
RUN+="/usr/bin/usb_modeswitch"
Network Manager is not so intuitive here, because the Mobile Broadband
Assistant isn't packaged for Arch. Fortunately 3 doesn't have any odd
configuration, so it will work.







[linuxkernelnewbies] Lynx Blog: Huawei E1550 Broadband Modem Howto

2009-09-16 Thread Peter Teoh




http://blog.lynxworks.eu/20090830/huawei-e1550-on-ubuntu

I picked up a Huawei E1550 pre-pay mobile broadband dongle, £39.99
with 3
Mobile including 3Gb usage (note it’s not the device
they’re picturing).
I’m on a course next month so that’ll do fine, I have no reception
at home and am not away enough to warrant a contract.
It appears to identify itself as USB storage to install drivers on
Windows, then flip-flops to a modem.  Nice idea, terrible
implementation, even in Windows where it installs drivers every time
you use a different USB port (it’s often wise to try such devices in
Windows – so you don’t chase your tail with a faulty device).  Pretty
sure it’s the autorun program that’s flipping the device.
Anyway you need udev-extras:
sudo apt-get install udev-extras
Add a udev rule:
gksu gedit /etc/udev/rules.d/15-huawei-e1550.rules
What we’re doing is telling udev that when this device is plugged in
to switch its mode.  Paste this and save:
SUBSYSTEM=="usb",
SYSFS{idProduct}=="1446",
SYSFS{idVendor}=="12d1",
RUN+="/lib/udev/modem-modeswitch --vendor 0x12d1 --product 0x1446
--type option-zerocd"
On next insertion, Network Manager’s mobile broadband configuration
assistant will run – select “3 (handsets)”.
Also, the booklet that came with mine was fairly unhelpful, but a
flashing green light means powered, flashing blue means available
networks, and solid blue means connected to a network.
The differences with both Fedora and Arch are on my wiki pages.
Please don’t ask if it works in Linpus Linux Lite because I haven’t
had that installed in ages.  I suspect the Fedora guide will point the
way but I know Acer have their own mobile broadband software for Huawei
devices.  Whether that extends to this model I couldn’t say.




By Dougie | Posted in Arch, Computing, Fedora, Ubuntu | Tagged Arch Linux, Fedora, Linux, Planet Ubuntu, Ubuntu | Comments (4)





[linuxkernelnewbies] Draisberghof - Software - USB_ModeSwitch

2009-09-17 Thread peter teoh





http://www.draisberghof.de/usb_modeswitch/

USB_ModeSwitch - Activating Switchable USB Devices on Linux
 


Introduction

USB_ModeSwitch is (surprise!) a mode switching tool for controlling
"flip flop" (multiple device) USB gear.


Several new USB devices (especially high-speed wireless WAN stuff,
there seems to be a chipset
from Qualcomm offering that feature) have their MS Windows drivers
onboard; when plugged in for the first
time they act like flash storage and start installing the driver from
there.
After that (and on every consecutive plugging) this driver switches the
mode internally, the storage device vanishes (in most cases), and a new
device (like
a USB modem) shows up. The WWAN gear maker Option calls that feature
"ZeroCD (TM)".

As you may have guessed, none of this is documented in any form and
so far there are no
official Linux drivers available (with the notable exception of Option
products).
On the good side, most of the known devices work out of the box with
the available
Linux drivers like "usb-storage" or "usbserial" (in recent kernels
quite a lot of devices
are acknowledged by the "option" module). That leaves the problem of
the
mode switching from storage to modem or whatever the thing is supposed
to do.


Fortunately there are things like human reason, USB sniffing programs
and
"libusb". It is possible to eavesdrop the communication of the MS
Windows driver,
to isolate the command or action that does the switching, and to
reproduce the
same thing with Linux.


USB_ModeSwitch makes the last step considerably easier by taking the
important parameters
from a configuration file and doing all the initialization and
communication stuff.
Starting from version 0.9.7 it has an optional success check which
spares the
manual call of "lsusb" to note any changes after execution. This comes
at a price though: to
work properly, the success check needs a system/device dependent delay
to give the device
time to settle after the switch. This delay is configurable, but it
obviously prevents
USB_ModeSwitch from returning immediately. Thus it is mostly useful during
testing.


Please read the information on this page carefully before you go
around posting questions!
This is no tool that does it all for you automagically (yet). It really
helps to understand in
principle what is happening, which in turn makes it easy to adapt the
config file to your setup or
even do your own exploring of new devices.
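
Stripped to its bare bones, the principle looks roughly like this (a
simplified sketch against the old libusb-0.1 API, not the actual
usb_modeswitch source; the vendor ID, product ID, endpoint and message
bytes below are placeholders and must come from the config entry for your
device):

/* Find the device by vendor/product ID and write the "magic" message
 * to its message endpoint, which makes the firmware switch modes. */
#include <stdio.h>
#include <usb.h>

int main(void)
{
	struct usb_bus *bus;
	struct usb_device *dev;
	usb_dev_handle *h;
	char msg[] = { 0x55, 0x53, 0x42, 0x43 };	/* placeholder bytes */

	usb_init();
	usb_find_busses();
	usb_find_devices();

	for (bus = usb_get_busses(); bus; bus = bus->next)
		for (dev = bus->devices; dev; dev = dev->next) {
			if (dev->descriptor.idVendor != 0x1234 ||	/* placeholder IDs */
			    dev->descriptor.idProduct != 0x5678)
				continue;
			h = usb_open(dev);
			if (!h)
				return 1;
			usb_claim_interface(h, 0);
			usb_bulk_write(h, 0x01, msg, sizeof(msg), 1000);
			usb_release_interface(h, 0);
			usb_close(h);
			return 0;
		}
	fprintf(stderr, "device not found\n");
	return 1;
}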


For hints about doing your own sniffing, see the Contribute section
below


Breaking News: just found this humble tool in the source code of
the fine
Dovado UMR router, which
they publish in compliance
with the GPL. So if you want the power of your Wireless Broadband
across your local network, but
without the "fun" of setting up your own Linux router (as I did),
consider investing in such
a machine.

Download
The latest release version is 1.0.5. The tar archive contains
the source and an i586 binary
(32 bit, GCC 4.3.2). I used libusb-0.1.12.
There are changes and updates to the config file more often than new
releases; most of the
valuable knowledge about devices is contained in this file. So you
better use the latest version
linked here.


  Download usb_modeswitch-1.0.5.tar.bz2,
dated from 2009-08-26; a Debian (Xandros/Ubuntu) package should be
available soon at the  Debian Repository.
Many architectures are supported there (like amd64 or ia64).
  
  Load the latest usb_modeswitch.conf;
the default place is "/etc". Last updated 2009-08-26 
  Don't forget libusb
if you don't have it. In your distribution, there is most likely a
package named "libusb-dev" or "libusb-devel" 

How to install
Unpack and run "make install". Edit "/etc/usb_modeswitch.conf"
according to your
hardware (it's heavily commented and should tell you what to do).
If you want to compile it for yourself, just delete the binary, then
run "make" or type on
the shell:

$ gcc -l usb -o usb_modeswitch usb_modeswitch.c
That's as easy as it gets. And it should be as portable as libusb
itself (some limitations on
FreeBSD based systems are known).
If installing manually, take the executable "usb_modeswitch" and put
it into your path (preferably "/sbin" or "/usr/sbin").
Put "usb_modeswitch.conf" into "/etc" and don't forget to edit it.


Alternatively you can use the command line interface to tell
USB_ModeSwitch
the things it needs to know; try "usb_modeswitch -h" to list the
parameters.
This way you can handle multiple configurations. If any command line
parameters
except -W and -q are used, the default config file is NOT read.
For a command line parameter reference, you have to consult
/etc/usb_modeswitch.conf at the
moment. Until I decide t

[linuxkernelnewbies] Linux Tutorials - Using Udev To Manage Hardware In Linux | DreamInCode.net

2009-09-17 Thread peter teoh





http://www.dreamincode.net/forums/showtopic20020.htm

Using udev to manage hardware

Why?
The first
question is, why would you write a tutorial on using udev? Aren't there
enough out there already?! I'd say yes, but in order for me to REALLY
start getting the idea and writing my own rules, it required me to read
almost all of them, and piece together the good parts of each one. I
thought I could bring together what I had going on, and then possibly
share what I had figured out.
The second question is, why would I
need to learn to write custom udev rules? Well, take, for instance, my
standard desktop system. I've got a card reader that reads
5,195,638,923,579,823 different types of cards (that's a slight
exaggeration...), my iPod which has it's own dock, my digital camera,
my 6 different thumb drives, and my pocket hard drive. These all mount
as /dev/sda, /dev/sdb, /dev/sdc... and so on, all depending on the
order I plug them in at. This makes my /etc/fstab super messy,
and a little unwieldy. It would be nice if my SD card reader always
appeared as /dev/sdcardreader and I could mount accordingly there, etc.

Prerequisites
The /dev directory in your linux installation is a list of nodes, and
each node corresponds with a piece of hardware on your system. So say
you want to read input from the mouse, you'd just read input from
/dev/input/mice
Originally the /dev directory stored EVERY SINGLE
node that the kernel could possibly know about. This meant that if you
used a full kernel with everything enabled, you could have a node for a
device you have never even heard of, much less own. This made for a
bloated and rather lethargic /dev.
Along came devfs, which was a little smarter.
It populated /dev only with hardware it could find on the system. This
meant that if you didn't plug in a mouse, you'd have no /dev/input/mice.
This was a much better approach, because you only had the hardware that
was relevant to your system. However, there were still many
architectural problems that hadn't been thought out thoroughly
before implementation, and so as the system grew, the pain was felt.
Enter udev. udev's idea cleared up some of the problems with devfs,
including better kernel management (as of kernel 2.6.x), and the
implementation of sysfs, which is found at /sys. Using sysfs, udev knows
about the hardware plugged into your system. sysfs can be thought of as a
“middle layer” of sorts. When a device is plugged into a system, sysfs
provides udev with extensive information about that hardware.

Name Schemes and udev Rule Writing
I have a server I'm running that has a SCSI IDE controller in it. It
then has two 74GB hard drives that are mirrored using software
mirroring. udev automatically has some of the most generic of rules
written in for you already. For instance, for all of my storage
devices, udev puts a node record in /dev/disk. To see what udev has
found, I just do ls -lR /dev/disk, and it returns this:


CODE

drwxr-xr-x 2 root root 160 2006-10-19 13:27 by-id/
drwxr-xr-x 2 root root 180 2006-10-19 13:27 by-path/
drwxr-xr-x 2 root root  60 2006-10-19 13:27 by-uuid/

/dev/disk/by-id:
total 0
lrwxrwxrwx 1 root root  9 2006-10-19 13:27 scsi-20e09e00019ecd40e -> ../../sdb
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 scsi-20e09e00019ecd40e-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 scsi-20e09e00019ecd40e-part2 -> ../../sdb2
lrwxrwxrwx 1 root root  9 2006-10-19 13:27 scsi-20e09e00019ecd44c -> ../../sda
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 scsi-20e09e00019ecd44c-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 scsi-20e09e00019ecd44c-part2 -> ../../sda2

/dev/disk/by-path:
total 0
lrwxrwxrwx 1 root root  9 2006-10-19 13:27 pci-:00:0c.0-scsi-0:0:0:0 -> ../../sda
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 pci-:00:0c.0-scsi-0:0:0:0-part1 -> ../../sda1
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 pci-:00:0c.0-scsi-0:0:0:0-part2 -> ../../sda2
lrwxrwxrwx 1 root root  9 2006-10-19 13:27 pci-:00:0c.0-scsi-0:0:1:0 -> ../../sdb
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 pci-:00:0c.0-scsi-0:0:1:0-part1 -> ../../sdb1
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 pci-:00:0c.0-scsi-0:0:1:0-part2 -> ../../sdb2
lrwxrwxrwx 1 root root 10 2006-10-19 13:27 pci-:00:0c.1-scsi-0:0:6:0 -> ../../scd0

/dev/disk/by-uuid:
total 0
lrwxrwxrwx 1 root root 9 2006-10-19 13:27 6aae5be6-6003-4def-8837-9a6ea3489d5e -> ../../md0


So in /dev/disk are three folders, by-id, by-path, and by-uuid. by-id
and by-path both show me the scsi drives by their path or id (big
surprise!), but by-uuid doesn't show the drives, but the RAID array
they are actually in. I can then reference those drives by any of those
addresses provided there (although RAID arrays are a special case, and
you only want to write and read to the array). Notice, however, that
these nodes are actually just symlinks to the real node address. This
way, if /dev/sda and /dev/sdb are switched on boot one time (for some
unknown reason) I can still access

[linuxkernelnewbies] Writing udev rules

2009-09-17 Thread peter teoh





http://reactivated.net/writing_udev_rules.html

Writing udev rules
by Daniel Drake (dsd)
Version 0.74

The most recent version of this document can always be found at: 
http://www.reactivated.net/writing_udev_rules.html
Contents

  Introduction
    About this document
    History
  The concepts
    Terminology: devfs, sysfs, nodes, etc.
    Why?
    Built-in persistent naming schemes
  Rule writing
    Rule files and semantics
    Rule syntax
    Basic rules
    Matching sysfs attributes
    Device hierarchy
    String substitutions
    String matching
  Finding suitable information from sysfs
    The sysfs tree
    udevinfo
    Alternative methods
  Advanced topics
    Controlling permissions and ownership
    Using external programs to name devices
    Running external programs on certain events
    Environment interaction
    Additional options
  Examples
    USB Printer
    USB Camera
    USB Hard Disk
    USB Card Reader
    USB Palm Pilot
    CD/DVD drives
    Network interfaces
  Testing and debugging
    Putting your rules into action
    udevtest
  Author and contact

Introduction

About this document

udev is targeted at Linux kernels 2.6 and beyond to provide a userspace
solution for a dynamic /dev directory, with persistent device naming.
The previous /dev implementation, devfs, is now deprecated, and
udev is seen as the successor. udev vs devfs is a sensitive area of
conversation - you should read this
document before making comparisons.


Over the years, the things that you might use udev rules for have
changed, as well as the flexibility of rules themselves. On a modern
system, udev provides persistent naming for some device types
out-of-the-box, eliminating the need for custom rules for those
devices. However, some users will still require the extra level of
customisation.

This document assumes that you have udev installed and running OK
with default configurations. This is usually handled by your Linux
distribution.

This document does not cover every single detail of rule writing,
but does aim to introduce all of the main concepts. The finer details
can be found in the udev man page.

This document uses various examples (many of which are entirely
fictional) to illustrate ideas and concepts. Not all syntax is
explicitly described in the accompanying text, be sure to look at the
example rules to get a complete understanding.


History

  April 5th 2008 v0.74: Typo fixes.
  December 3rd 2007 v0.73: Update for new udev versions, and some
miscellaneous improvements.
  October 2nd 2006 v0.72: Fixed a typo in one of the example rules.
  June 10th 2006 v0.71: Misc changes based on recent feedback -
thanks!
  June 3rd 2006 v0.7: Complete rework, to be more suited for the
modern-day udev.
  May 9th 2005 v0.6: Misc updates, including information about
udevinfo, groups and permissions, logging, and udevtest.
  June 20th 2004 v0.55: Added info on multiple symlinks, and some
minor changes/updates.
  April 26th 2004 v0.54: Added some Debian info. Minor corrections.
Re-reverted information about what to call your rule file. Added info
about naming network interfaces.
  April 15th 2004 v0.53: Minor corrections. Added info about
NAME{all_partitions}. Added info about other udevinfo tricks.
  April 14th 2004 v0.52: Reverted to suggesting using "udev.rules"
until the udev defaults allow for other files. Minor work.
  April 6th 2004 v0.51: I now suggest users use their own
"local.rules" file rather than prepending "udev.rules".
  April 3rd 2004 v0.5: Minor cleanups and preparations for possible
inclusion in the udev distribution.
  March 20th 2004 v0.4: General improvements, clarifications, and
cleanups. Added more information about writing rules for usb-storage.
  February 23rd 2004 v0.3: Rewrote some parts to emphasise how
sysfs
naming works, and how it can be matched. Updated rule-writing parts to
represent udev 018s new SYSFS{filename} naming scheme. Improved
sectioning, and clarified many points. Added info about KDE.
  February 18th 2004 v0.2: Fixed a small omission in an example.
Updated section on identifying mass-storage devices. Updated section on
nvidia.
  February 15th 2004 v0.1: Initial publication.

The concepts

Terminology: devfs, sysfs, nodes, etc.

A basic introduction only, might not be totally accurate.


On typical Linux-based systems, the /dev directory is used to
store file-like device nodes
which refer to certain devices in the system. Each node points to a
part of the system (a device), which might or might not exist.
Userspace applications can use these device nodes to interface with the
system's hardware; for example, the X server "listens to"
/dev/input/mice so that it can relate the user's mouse movements to
moving the visual mouse pointer.
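
As a hedged illustration (not part of the original document), a small userspace
program can read that node directly; with the default PS/2 protocol each packet
is three bytes (button flags, then relative X and Y movement), and opening the
node usually requires root. Treat this as a sketch, not a reference:

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	signed char packet[3];
	int fd = open("/dev/input/mice", O_RDONLY);	/* the node the X server reads */

	if (fd < 0) {
		perror("open /dev/input/mice");
		return 1;
	}
	/* One PS/2 packet: byte 0 = button flags, bytes 1 and 2 = dx, dy */
	if (read(fd, packet, sizeof(packet)) == (ssize_t)sizeof(packet))
		printf("buttons=0x%02x dx=%d dy=%d\n",
		       (unsigned char)packet[0], packet[1], packet[2]);
	close(fd);
	return 0;
}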


The original /dev directories were just populated with every
device that mi

[linuxkernelnewbies] Using the Huawei E169G usb mobile internet modem on the EEE | greenhughes.com

2009-09-17 Thread peter teoh




http://www.greenhughes.com/content/using-huawei-e169g-usb-mobile-internet-modem-eee

Using the Huawei E169G usb mobile internet modem on the EEE
Posted June 15th, 2008 by Liam Green-Hughes
 Tags:

  3
  asuseee
  e169g
  eee
  huawei e169g
  mobile
  modem
  mtech
  three.co.uk
  udev
  usb


Note to Ubuntu and Easy Peasy users: Your Huawei E169G should now work
out-of-the-box with later versions of Ubuntu (8.10 onwards) and
derivatives. Lots of other modems work too, like the E160G.
Update: I've attempted to automate the steps below by using
a package, have a look at: Huawei E169G - the easy way
Yesterday I treated myself to a new mobile internet "dongle" to go
with my Asus EEE PC. I decided to go for the Huawei E169G usb modem as
it matches my black EEE; however, there is a small problem with getting
this device to work straight away. The problem is that the E169G is a composite
device, which basically means that it will act as a USB memory stick until it
is sent a command to tell it to be a modem. The EEE doesn't know about
this, so you can't use it straight away as a 3G modem in the connection
wizard. Fortunately, back in April Dale Lane
documented in his blog
how to send the modem the right command to be able to use it with the
EEE; his blog post on the topic is worth reading as it explains the
background to the issue. After experimenting with my friend Keren Mills'
E169G (thanks Keren!) to check that I could get this method to work, I
took the plunge and got my own one. Following the instructions on Dale
Lane's blog I was able to send some commands manually to the unit to
get it to switch, but what I really wanted was to get the EEE to
recognise the device automatically so I can start a 3G connection
without having to run any commands in the terminal. Fortunately this is
possible.
Here is how I got it all to work. The first thing to do is to get
hold of the program that sends the command to the E169G to get it to
switch to modem mode. This is called usb_modeswitch and is available
from: http://www.draisberghof.de/usb_modeswitch/
. This utility is supplied ready to go; after you download the file,
uncompress it by (opening a terminal and) typing:
tar xvzf [the name of the file you downloaded]
This should create a new directory containing the downloaded files.
If you look inside it (cd [the directory created]) you should see a
file named usb_modeswitch. This should be executable (i.e. the computer
can run it as a program; it should appear green in colour - if it is not,
fix this by typing chmod u+x usb_modeswitch). We now
need to copy this file into a location where it can be easily found by
the operating system, so type:
sudo cp usb_modeswitch /usr/sbin/
The next step is to make ourselves a small file that will actually
function as a command to switch the E169G into modem mode. We are
actually going to send the unit two commands: one to tell it to stop
being a storage medium and one to tell it to be a modem. For fun, let's
use VIM, a text mode editor, to do this. Type this to start the VIM text
editor:
sudo vim /usr/sbin/e169g_switch 
Press the 'i' key to start inputting and type:
#!/bin/sh
/usr/sbin/usb_modeswitch -v 12d1 -p 1001 -d 1
/usr/sbin/usb_modeswitch -v 12d1 -p 1001 -H 1
To save the file press the ESC key and then enter ':wq'
(this means write the file and quit). You now need to tell the EEE that
this is a program that it can run and not just a text file, so do this
by typing:
sudo chmod u+x /usr/sbin/e169g_switch
At this point we now have a command we can run manually to switch
the E169G into modem mode, but what would be ideal is to get the EEE to
run this automatically when the unit is plugged in. We can do this by
adding a configuration file as described under the "How to automate"
section of the usb_modeswitch documentation. So we need to create
a small file to do this, type:
cd /etc/udev/rules.d
This brings you to the place where we need to add a file to get the
EEE to send the switch command automatically. To start work on the file
type:
sudo vim 70-e169g.rules
Press the 'i' key to start inputting and type:
SUBSYSTEM=="usb" SYSFS{idProduct}=="1001",
SYSFS{idVendor}=="12d1", RUN+="/usr/sbin/e169g_switch"
.. and press ESC then enter ':wq' to save and quit.
Be careful to type this exactly as shown, making sure you enter the
right number of equals signs.
Hopefully by this point your EEE should now be capable of using the
E169G. You might have to restart the machine, but the next time you
plug the E169G in it should get picked up as a modem and you should be
able to use it to create a 3G connection using the wizard.
What I don't have yet is a way to send and receive SMSs and get the
signal strength. There are however some utilities out there which might
provide this functionality (of which this
one from Vodafone looks promising and UMTSmon
looks like it could help too). I'll be having 

[linuxkernelnewbies] Dani's Blog: usb_modeswitch

2009-09-17 Thread peter teoh





http://blog.pew.cc/tag/linux/

Debian for eee PC

        
Wed, 02 Sep 2009 15:53 - Daniel - Other - Comments (0)

        

    

I used fluxflux for quite some time on my eee PC, but I didn't really
like the PCLinuxOS it was based on. It worked quite well for some time,
but the last release is kinda aged and after upgrading the OS I had
some serious problems. Since I prefer Debian-based systems anyway I
took a quick look at Debian Lenny and Ubuntu.

Ubuntu has a special version for netbooks called Ubuntu Netbook Remix.
It's not as heavy as a full Ubuntu install, but it's still quite big.
Also I had a few problems with the eee 701.

Debian on the other hand can be installed with only a minimum system
that uses very little space (I know, Ubuntu can do that too with its
alternate installer CDs) and supports pretty much everything on the eee
701 thanks to the work of the DebianEeePC Project.

Installing it is very simple. Just follow the steps of the HowTo. I
installed just the standard system and later added x.org,
gnome-core, gdm, network-manager and iceweasel by hand. After that the
OS install is just about 1 GB, and some of that is even optional. For
example you can skip gdm and network-manager to free a few MB if you
don't need a fancy graphical login and manage your connections manually.

The only thing that doesn't work at the moment is changing the volume
with the function keys, but I don't really need that anyway.

To sum it up, Debian on the eee PC is very nice. Boot time is lightning
fast and the hardware works "out of the box". If you think about
installing another OS on your eee PC, Debian would be a good choice.


Tags: linux ubuntu fluxflux debian asus eee 

        



    
Songbird 1.0.0 deb

        
Wed, 03 Dec 2008 18:54 - Daniel - Other - Comments (1)

        

    

I found a deb file that works on my Ubuntu 8.04 box. Unter Hund
provides debs for Songbird 1.0rc1 through 1.0 final.

Unfortunately he hosts the deb for Songbird 1.0 on a one-click-hosting
site. If someone wants to grab this deb too, you can now get it from my
mirror as well.

Most noticeable change from songbird 0.7.0 to 1.0.0 is the improved
startup speed. Very nice.


Tags: linux songbird music player ubuntu 

        



    
Songbird 1.0.0

        
Tue, 02 Dec 2008 22:08 - Daniel - Other - Comments (0)

        

    

Today Songbird 1.0.0 has been released. This is the first final version
of Songbird. Songbird is a music player based on xulrunner (from
Mozilla, it also powers Firefox), gstreamer for playback and sqlite as
the database backend for the music library. It's free as in freedom and
runs on Windows, Linux, Mac OS X and even OpenSolaris.

I've been using it for a while now (since version 0.6.0), and it is a
really good music player. Unfortunately the tarball release won't
run on my Ubuntu PC. Maybe building it from source will help, but I
haven't tried that yet.


Tags: linux songbird music player ubuntu 

        



    
Huawei E160 and Linux

        
Sat, 27 Sep 2008 13:40 - Daniel - Other - Comments (3)

        

    

I recently got a Huawei E160 3G modem and use it on my Asus EEE 701 for
mobile browsing. The E160 has two modes: in the default mode it
acts like a read-only USB drive, and in the second mode the modem is
available. To change between these modes, you need a tool called
usb_modeswitch.



The solution I'm describing here works on my EEE with fluxflux (a
PCLinuxOS remaster). It should work with other distributions too, but I
haven't tested it. This solution is based on Thomas Schönhütl's post
about how to get the E169 working with fluxflux (German).



First, we need to get usb_modeswitch and compile it. You need
libusb-dev (or libusb-devel on some distros) installed for this.

wget
http://www.draisberghof.de/usb_modeswitch/usb_modeswitch-0.9.4.tar.bz2
tar -xjf usb_modeswitch-0.9.4.tar.bz2
cd usb_modeswitch-0.9.4
./compile.sh
cp usb_modeswitch /usr/local/bin/



Now install ivman. Ivman is a daemon to auto-mount and manage media
devices. We'll use ivman to run usb_modeswitch when the E160 is
connected. I tried to do this with udev but failed. If anyone is
successful in doing it with udev, let me know.

apt-get install ivman
ivman



Add ivman to your autostart. In Ubuntu you can do this by going to
System->Preferences->Sessions and adding a new Startup Program.

Now, let's adjust the ivman config file. Open
$HOME/.ivman/IvmConfigActions.xml and add this before the closing tag:


    
    
    
    



Now create a file named .e160.sh in your home folder and open it with
your favourite text editor. Paste this into the file:

#!/bin/bash

if [ -z "`/bin/ls /dev/ttyUSB0`" ]; then
    if [ "`/usr/sbin/lsusb | grep 12d1 | cut -d : -f3 | cut -b -4`"
= "1003" ]; then
    /usr/local/bin/usb_modeswitch -v 12d1 -p 1003 -d 1
    /usr/local/bin/usb_modeswitch -v 12d1 -p 1003 -H 1
    fi
fi



Save the file and make it executable.

chmod +x .e160.sh



Restar

[linuxkernelnewbies] Using Huawei E180 modem in linux (RHEL5) - slight insights

2009-09-17 Thread peter teoh





http://chankle.org/blog/index.php?/archives/24-Using-Huawei-E180-modem-in-linux-RHEL5.html

Using
Huawei E180 modem in linux (RHEL5)

 
 I had been trying to get my Huawei E180 wireless modem, which came
with my Singtel wireless Broadband plan, to work with my Linux laptop.
It wasn't easy and I had to search for information on how to do it.
I thought I'd write something on how I did it.

The main concept
First you need to get udev to create a device node for the modem at /dev/ttyUSB0.
At this point the modem behaves just like an analog modem.
Then you use a modem dialler program (such as wvdial) to dial in
to the Singtel network. If it works correctly, you'll see
a ppp0 link when you do an ifconfig at the console:


ppp0 Link encap:Point-to-Point Protocol 
inet addr:10.111.22.151 P-t-P:10.64.64.64 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:13 errors:0 dropped:0 overruns:0 frame:0
TX packets:15 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:3 
RX bytes:658 (658.0 b) TX bytes:688 (688.0 b)


Setting up USB_Modeswitch
The Huawei E180 is one of those new-style devices that comes with its
Windows device driver stored on it. When first plugged in, it appears as a
pseudo CD-ROM, and Windows will use that to install the device drivers
for it. This is convenient for Windows users but prevents it from being
recognized as a modem when you use it on a Linux system. So the first
thing to do is to install the USB_ModeSwitch utility (http://www.draisberghof.de/usb_modeswitch/)

Get the code, compile it, put the binary in your /sbin
directory and the usb_modeswitch.conf at /etc/. Then
you need to configure it. The utility
reads its configuration from /etc/usb_modeswitch.conf.
Take a look at the file and uncomment the section for your modem. There
is a section for Huawei modems in there.
The important thing is to get the Vendor and Product id right; to check,
do an lsusb like this:


# lsusb
Bus 007 Device 001: ID : 
Bus 002 Device 001: ID : 
Bus 005 Device 001: ID : 
Bus 004 Device 001: ID : 
Bus 001 Device 009: ID 12d1:1003 Huawei Technologies Co., Ltd.
E220 HSDPA Modem / E270 HSDPA/HSUPA Modem
Bus 001 Device 001: ID : 
Bus 003 Device 001: ID : 
Bus 006 Device 001: ID : 


Here the Vendor id for the Huawei E180 modem is 12d1 and the product ID
is 1003.



# Huawei E220 (aka "Vodafone EasyBox II", aka "T-Mobile wnw Box Micro")
# Huawei E270
# Huawei E870
#
# Two options: 1. removal of "usb-storage" 2. the special control
# message found by Miroslav Bobovsky
#
# Contributor: Hans Kurent, Denis Sutter

DefaultVendor= 0x12d1;
DefaultProduct= 0x1003

# choose one of these:
;DetachStorageOnly=1
HuaweiMode=1


For the Huawei modem, I uncommented the "HuaweiMode=1" line, which
should detach the CD-ROM mode and switch the device into modem mode.
For other modems you may have to experiment a bit. The utility takes
command line arguments which should make it easier to test changes.


Creating the /dev/ttyUSB0 node
First create a script for udev as /sbin/usb_modeswitch.sh:



 #!/bin/sh   

 # close these FDs to detach from udev

exec 1<&- 2<&- 5<&- 7<&-

sh -c "sleep 4; /usr/bin/usb_modeswitch" &

exit 0




The exec is a hack for RHEL 5 and UDEV 0.95 which I am using (see
http://www.draisberghof.de/usb_modeswitch/bb/viewtopic.php?t=11). 

Then you need to tell the udev process to execute the script when it
senses that the modem is plugged in, and after that to load the usbserial module:


# cat /etc/udev/rules.d/66-huawei.rules
SUBSYSTEM=="usb", SYSFS{idVendor}=="12d1", SYSFS{idProduct}=="1003",
RUN+="/sbin/usb_modeswitch.sh"
SUBSYSTEM=="usb", SYSFS{idVendor}=="12d1", SYSFS{idProduct}=="1003",
RUN+="/sbin/modprobe usbserial vendor=0x12d1 product=0x1003"


The important thing is to get the vendor and product id, the script and
the modprobe command correct.

Now if everything is ok, when you plug in the modem you should see a
/dev/ttyUSB0 node created.


Dialing the ISP
You can use any modem dialler to do this. I like simplicity, so I
prefer wvdial. To do this I add the following entry to /etc/wvdial.conf:


# http://ubuntuforums.org/showthread.php?t=426354&page=4
[Dialer singtel-wireless]
Phone = *99#
Username = 123
Password = 123
Stupid Mode = 1
Dial Command = ATDT
Modem = /dev/ttyUSB0
Baud = 115200
Init2 = ATZ
Init3 = ATQ0V1E1S0=0&C1&D2+FCLASS=0
INIT5 = AT+CGDCONT=1,"IP","internet"
ISDN = 0
Modem Type = Analog Modem


Then I execute "wvdial singtel-wireless" to starting
dialling. If everything works, then a ppp0 link in created and you can
start surfing. 
 Posted by chan kok leong in linux at 12:28

[linuxkernelnewbies] usb_modeswitch.conf

2009-09-17 Thread peter teoh





http://www.draisberghof.de/usb_modeswitch/usb_modeswitch.conf

# /etc/usb_modeswitch.conf
#
# Last modified: 2009-08-26
#
# Configuration for usb_modeswitch, a mode switching tool for controlling
# flip flop (multiple device) USB gear
#
# Main purpose is to trigger the switching of several known UMTS modems
# from storage device mode ("ZeroCD TM" for use with MS Windows) to modem
# (serial) device mode
#
# Detailed instructions and a friendly forum on the homepage:
# http://www.draisberghof.de/usb_modeswitch
#
# News update: you want to read the paragraph about troubleshooting there
# if you run into problems !!!


# Just set or remove the comment signs (# and ;) in order to activate
# your device. (Actual entries are further down, after the reference.)
#
# For custom settings:
# Numbers can be decimal or hexadecimal, MessageStrings MUST be
# hexadecimal without prepended "0x". Digits 9-16 in the known
# MessageStrings are arbitrary; I set them to "12345678"


# What it all means (short command line flags appended):
#
#
# * DefaultVendor       -v
# * DefaultProduct      -p
#
# This is the ID the USB device shows after having been plugged in.
# The program needs this; if not found -> no action.
#
#
# * TargetVendor -V 
# * TargetProduct       -P
#
# These are the IDs of the USB device after successful mode switching.
# They are optional, but I recommend to provide them for better analysis.
# You definitely need them if you enable CheckSuccess (see below)
#
#
# * TargetProductList      (file only)
#
# Like TargetProduct, but more than one possibility. Only used in automated
# config files (in /etc/usb_modeswitch.d). 
#
#
# * TargetClass  -C 
#
# Some weird devices don't change IDs. They only switch the device class.
# If the device has the target class -> no action (and vice versa)
#
#
# * MessageEndpoint  -m 
# 
# A kind of address inside the interface to which the "message"
# (the sequence that does the actual switching) is directed.
# Starting from version 0.9.7 the MessageEndpoint is autodetected
# if not given
#
#
# * MessageContent   -M 
#
# A hex string containing the "message" sequence; it will be
# sent as a USB bulk transfer.
# 
#
# * ResponseEndpoint -r 
# * NeedResponse <0/1>   -n
#
# Some devices were reported to require receiving the response of the
# bulk transfer to do the switching properly. Usually not needed.
# Starting from version 1.0.0 the ResponseEndpoint is autodetected
# if not given
#
#
# * DetachStorageOnly <0/1>  -d
#
# Some devices just need to be detached from the usb-storage
# driver to initiate the mode switching. Using this feature
# instead of removing the whole usbstorage module keeps other
# storage devices working.
#
#
# * HuaweiMode <0/1> -H
#
# Some Huawei devices can be switched by a special control
# message.
#
#
# * SierraMode <0/1> -S
#
# Some Sierra devices can be switched by a special control
# message.
#
#
# * SonyMode <0/1>   -O
#
# Some Sony-Ericsson devices can be switched by a special control
# message. This is experimental and might not have a stable result
#
#
# * ResetUSB <0/1>   -R
#
# Some devices need a rougher treatment. If the switching seems
# to do something (run udevmonitor), but your system does not reflect
# it, try this somewhat brutal method to do a reset after switching.
# Mind that if your device switched OK before, this will probably set
# it back to storage mode ...
#
#
# * Interface           -i
# * Configuration       -u
# * AltSetting          -a
#
# More USB parameter to help with tricky devices and for doing lots
# of cruel experiments ...
#
## Note:
## AltSetting/Configuration changes and ResetUSB are executed after all
## other steps and can be combined or used on their own (e.g. a reset
## might have the same effect as a manual replug)
#
#
# * InquireDevice <0|1>  -I (disables inquiry)
#
# The standard since 1.0.0 is to do a SCSI inquiry on the default device
# before other actions. This might be a future way to identify a device
# without ambiguities. If it causes trouble with your device, just disable.
#
#
# * CheckSuccess -s 
#
# Check continuously if the switch succeeded for max  seconds.
# First, an interface access test: most devices vanish after
# switching and can't be accessed anymore.
# Second, a recount of target devices: one more than at the initial
# count, at the same bus with a higher device number -> device
# switched fine.
# It's safe to give a higher value than needed; checking stops as
# soon as the target device is found
#
#
# -> All other entries are just ignored <-

# Additional command line flags:
# 
# Verbose output -W
# No output at all   -q
# Other config file  -c 

# For filling in all this information for an unknown device,
# see instructions and links on the homepage:
# http://www.draisberghof.de/usb_modeswitch
#
# If you find working 

[linuxkernelnewbies] How is Wikipedia's example of an unbalanced AVL tree really unbalanced? - Stack Overflow

2009-09-17 Thread peter teoh




http://stackoverflow.com/questions/230831/how-is-wikipedias-example-of-an-unbalanced-avl-tree-really-unbalanced


How is Wikipedia’s example of an unbalanced
AVL tree really unbalanced?




  


  

  
  The image above is from "Wikipedia's entry on AVL trees", showing a
tree that Wikipedia indicates is unbalanced.
How is this tree not balanced already? Here's a quote from the article:
  
The balance factor of a node is the height of its
right subtree minus the height of its left subtree and a node with
balance factor 1, 0, or -1 is considered balanced. A node with any
other balance factor is considered unbalanced and requires rebalancing
the tree. The balance factor is either stored directly at each node or
computed from the heights of the subtrees.
  
  Both the left and right subtrees have a height of 4. The right
subtree of the left tree has a height of 3 which is still only 1 less
than 4. Can someone explain what I'm missing?
  
   binary-trees
  avl-tree
  data-structures 
  

  

asked Oct 23 at 18:20 by somas1

3 Answers

  To be balanced, every node in the tree must either

have no children (be a "leaf" node), or
have two children.

  Or, if it has only one child, that child must be a leaf.
  In the chart you post, 9, 54 & 76 violate the last rule.

  
  Properly balanced, the tree would look like:
  Root: 23
(23) -> 14 & 67
(14) -> 12 & 17
(12) -> 9
(17) -> 19
(67) -> 50 & 72
(50) -> 54
(72) -> 76
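
For reference, the balance-factor definition quoted in the question translates
directly into a few lines of C. This sketch is not part of the question or the
answer above; the struct and function names are invented for illustration:

#include <stddef.h>

struct node {
	int key;
	struct node *left, *right;
};

/* Height of a subtree: 0 for an empty tree, 1 + max child height otherwise. */
static int height(const struct node *n)
{
	if (n == NULL)
		return 0;
	int lh = height(n->left);
	int rh = height(n->right);
	return 1 + (lh > rh ? lh : rh);
}

/* Balance factor as defined in the quoted article: height(right) - height(left).
 * A node is AVL-balanced when this is -1, 0 or 1. */
static int balance_factor(const struct node *n)
{
	return n ? height(n->right) - height(n->left) : 0;
}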

  
  

  






[linuxkernelnewbies] Linux-Kongress 2009 Program

2009-09-17 Thread Peter Teoh




http://www.linux-kongress.org/2009/program.html


  


  



  


   
  The International Linux System Technology Conference
  


  


Home
Call for Papers
Program   
Abstracts   
Sponsoring





  

  
  
  
  Program overview
  
  
  

  
 Tuesday,
2009/10/27 
  
  
 Wednesday,
2009/10/28 
  
  
 Thursday,
2009/10/29 
  
  
 Friday,
2009/10/30 
  
  
 

  

  
  

  



  

  
  

  
 Tutorial
Day 1: Tuesday, 2009/10/27

  
  
9:00-18:00
  Registration 
  
  
10:00-18:00
 Network
Monitoring With Open Source Tools, Day 1 by Timo Altenfeld, Wilhelm
Dolle, Robin Schröder and Christoph Wegener  
  
  
10:00-18:00
 Linux
im Netzwerk, Day 1 by Johannes Hubertz, Jens Link and Thomas
Martens  
  
  
10:00-18:00
 Building
a high available virtualization cluster based on iSCSI storage and XEN 
Day 1 by Thomas Groß 
  
  
10:00-18:00
 A Linux
Kernel Safari, by Wolfgang Mauerer 
  
  
10:00-18:00
 IKEv2-based
Virtual Private Networks using strongSwan by Andreas Steffen 
  
  
10:00-18:00
 High-Availability
Clustering with OpenAIS and Pacemaker by Lars Marowsky-Brée 
  
  
 Tutorial
Day 2: Wednesday, 2009/10/28

  
  
9:00-18:00
  Registration 
  
  
10:00-18:00
 Network
Monitoring With Open Source Tools, Day 2 by Timo Altenfeld, Wilhelm
Dolle, Robin Schröder and Christoph Wegener  
  
  
10:00-18:00
 Linux
im Netzwerk, Day 2 by Johannes Hubertz, Jens Link and Thomas
Martens  
  
  
10:00-18:00
 Building
a high available virtualization cluster based on iSCSI storage and XEN
Day 2 by Thomas Groß 
  
  
10:00-18:00
 Deploying
VoIP - Identifying and avoiding pitfalls by Heison Chak 
  
  
 Technical
Sessions, Day 1: Thursday,
2009/10/29 
  
  
9:00-18:00
  Registration 
  
  
09:30-09:45
 Opening
of Linux-Kongress 2009 Program 
  
  
09:45-10:45
 Keynote:
 Ts'o,
Theodore: Linux and Open Source in 2010 and Beyond 
  
  
10:45-11:15
  Coffee break 
  
  
11:15-12:00
 QEMU -
The building block of Open Source Virtualization 
Glauber Costa
 Compiler
Optimization Survey 
Felix von Leitner
  
  
12:00-12:45
 View-OS:
Change your View on Virtualization. 
Renzo Davoli
 A
generic architecture and extension of eCryptfs 
André Osterhues
  
  
12:45-14:15
  Lunch break 
  
  
14:15-15:00
 Linux
multi-core scalability 
Andi Kleen
 Samba
status report 
Volker Lendecke 
  
  
15:00-15:45
 Real-Time
performance comparisons and improvements between 2.6 Linux Kernels 
Gazment Gerdeci
 dmraid
update 
Heinz Mauelshagen
  
  
15:45-16:15
  Coffee break 
  
  
16:15-17:00
 The
Good, the Bad, and the Ugly? Structure and Trends of Open Unix Kernels

Wolfgang Mauerer 
 Userspace
Application Tracing 
Jan Blunck
  
  
17:00-17:45
 Fighting
regressions with git bisect 
Christian Couder
 System
call tracing overhead 
Jörg Zinke
  
  
 Technical
Sessions, Day 2: Friday,
2009/10/30 
  
  
9:00-14:00
  Registration 
  
  
9:45-10:45
 Keynote:

  
  
10:45-11:15
  Coffee break 
  
  
11:15-12:00
 ext4,
btrfs and the others 
 Jan Kára 
 EDE, a
light desktop environment, description and best practices 
Zukan, Sanel

[linuxkernelnewbies] Video4Linux2 part 3: Basic ioctl() handling [LWN.net]

2009-09-17 Thread peter teoh





http://lwn.net/Articles/206765/


Video4Linux2 part 3: Basic ioctl() handling
[Posted October 30, 2006 by corbet]
 



  

  The LWN.net
Video4Linux2
API series.
  

  

Anybody who has spent any amount of time working through the Video4Linux2 API
specification will have certainly noted that V4L2 makes heavy use
of
the ioctl() interface. Perhaps more than just about any other
type of peripheral, video hardware has a vast number of knobs to tweak.
Video streams have many parameters associated with them, and,
often, there is quite a bit of processing done in the hardware. Trying
to
operate video hardware outside of its well-supported modes can lead to
poor
performance at best, and often no performance at all. So there is no
alternative to exposing many of the hardware's features and quirks to
the
end application.
Traditionally, video drivers have included ioctl()
functions of
approximately the same length as a Neal Stephenson novel; while the
functions often come to more satisfying conclusions than the novels,
they
do tend to drag a lot in the middle. So the V4L2 API was changed in
2.6.18; the interminable ioctl() function has been replaced
with a
large set of callbacks which implement the individual ioctl()
functions. There are, in fact, 79 of them in 2.6.19-rc3. Fortunately,
most drivers need not implement all - or even most - of the possible
callbacks. 
What has really happened is that the long ioctl() function
has
been moved into drivers/media/video/videodev.c. This code
handles
the movement of data between user and kernel space and dispatches
individual ioctl() calls to the driver. To use it, the driver
need only use video_ioctl2() as its ioctl() method
in the
video_device structure. Actually, most drivers should be able
to
use it as unlocked_ioctl() instead; the locking within the
Video4Linux2 layer can handle it, and drivers should have proper
locking in
place as well.

The first callback your driver is likely to implement is:


int (*vidioc_querycap)(struct file *file, void *priv, 
   struct v4l2_capability *cap);


This function handles the VIDIOC_QUERYCAP ioctl(),
which
asks a simple "who are you and what can you do?" question. Implementing
it
is mandatory for V4L2 drivers. In this function, as with all other V4L2
callbacks, the priv argument is the contents of file->private_data
field; the usual practice is to point it at
the driver's internal structure representing the device at open()
time. 
The driver should respond by filling in the
structure cap and returning the usual "zero or negative error
code" value. On successful return, the V4L2 layer will take care of
copying the response back into user space.

The v4l2_capability structure (defined in <linux/videodev2.h>) looks like this:


struct v4l2_capability
{
	__u8	driver[16];	/* i.e. "bttv" */
	__u8	card[32];	/* i.e. "Hauppauge WinTV" */
	__u8	bus_info[32];	/* "PCI:" + pci_name(pci_dev) */
	__u32   version;/* should use KERNEL_VERSION() */
	__u32	capabilities;	/* Device capabilities */
	__u32	reserved[4];
};


The driver field should be filled in with the name of the
device
driver, while the card field should have a description of the
hardware behind this particular device. Not all drivers bother with the
bus_info field; those that do usually use something like:


sprintf(cap->bus_info, "PCI:%s", pci_name(&my_dev));


The version field holds a version number for the driver. The
capabilities field is a bitmask describing various things that
the
driver can do:



   V4L2_CAP_VIDEO_CAPTURE: The device can capture video
data.
  
   V4L2_CAP_VIDEO_OUTPUT: The device can perform video
output.
  
   V4L2_CAP_VIDEO_OVERLAY: It can do video overlay onto
the frame buffer.
  
   V4L2_CAP_VBI_CAPTURE: It can capture raw video
blanking interval data.
  
   V4L2_CAP_VBI_OUTPUT: It can do raw VBI output.
  
   V4L2_CAP_SLICED_VBI_CAPTURE: It can do sliced VBI
capture.
  
   V4L2_CAP_SLICED_VBI_OUTPUT: It can do sliced VBI
output.
  
   V4L2_CAP_RDS_CAPTURE: It can capture Radio Data System
(RDS) data.
  
   V4L2_CAP_TUNER: It has a computer-controllable tuner.
  
   V4L2_CAP_AUDIO: It can capture audio data.
  
   V4L2_CAP_RADIO: It is a radio device.
  
   V4L2_CAP_READWRITE: It supports the read()
and/or write() system calls; very few devices will support
both. It makes little sense to write to a camera, normally.
  
   V4L2_CAP_ASYNCIO: It supports asynchronous I/O.
Unfortunately, the V4L2 layer as a whole does not yet support
asynchronous I/O, so this capability is not meaningful.
  
   V4L2_CAP_STREAMING: It supports ioctl()-controlled
streaming I/O.
  


The final field (reserved) should be left alone. The V4L2
specification requires that reserved be set to zero, but,
since
video_ioctl2() sets the entire structure to zero, that is
nicely
taken care of.
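
Before the vivi code below, here is a minimal sketch of such a callback for a
hypothetical driver; the driver name ("mydrv"), the card string and the
capability set are invented for illustration (it assumes <linux/videodev2.h>
and <linux/version.h> are already included):

static int mydrv_vidioc_querycap(struct file *file, void *priv,
				 struct v4l2_capability *cap)
{
	/* Identify the (hypothetical) driver and hardware. */
	strlcpy(cap->driver, "mydrv", sizeof(cap->driver));
	strlcpy(cap->card, "Hypothetical capture board", sizeof(cap->card));
	strlcpy(cap->bus_info, "platform:mydrv", sizeof(cap->bus_info));
	cap->version = KERNEL_VERSION(0, 1, 0);

	/* Advertise only what the driver actually implements. */
	cap->capabilities = V4L2_CAP_VIDEO_CAPTURE |
			    V4L2_CAP_READWRITE |
			    V4L2_CAP_STREAMING;

	/* video_ioctl2() has already zeroed the structure, so the
	 * reserved[] field can be left alone. */
	return 0;
}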

A fairly typical implementation can be found in the "vivi" driver:


static int vidioc_querycap (struct file *file, void  *priv,
	struct v4l

[linuxkernelnewbies] - loop-use-unlocked_ioctl.patch removed from -mm tree

2009-09-18 Thread Peter Teoh





http://www.mail-archive.com/mm-comm...@vger.kernel.org/msg24059.html


- loop-use-unlocked_ioctl.patch
removed from -mm tree
The patch titled

 loop: use unlocked_ioctl
has been removed from the -mm tree.  Its filename was
 loop-use-unlocked_ioctl.patch

This patch was dropped because it isn't in the present -mm lineup

--
Subject: loop: use unlocked_ioctl
From: Andrew Morton <[EMAIL PROTECTED]>

The last lock_kernel() has disappeared from loop.c.  Switch it over to using
unlocked_ioctl.

Cc: Diego Woitasen <[EMAIL PROTECTED]>
Cc: Christoph Hellwig <[EMAIL PROTECTED]>
Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 drivers/block/loop.c |   12 +++-
 1 files changed, 7 insertions(+), 5 deletions(-)

diff -puN drivers/block/loop.c~loop-use-unlocked_ioctl drivers/block/loop.c
--- a/drivers/block/loop.c~loop-use-unlocked_ioctl
+++ a/drivers/block/loop.c
@@ -1124,12 +1124,14 @@ loop_get_status64(struct loop_device *lo
return err;
 }
 
-static int lo_ioctl(struct inode * inode, struct file * file,
-   unsigned int cmd, unsigned long arg)
+static long lo_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 {
-   struct loop_device *lo = inode->i_bdev->bd_disk->private_data;
+   struct inode *inode;
+   struct loop_device *lo;
int err;
 
+   inode = file->f_mapping->host;
+   lo = inode->i_bdev->bd_disk->private_data;
mutex_lock(&lo->lo_ctl_mutex);
switch (cmd) {
case LOOP_SET_FD:
@@ -1304,7 +1306,7 @@ static long lo_compat_ioctl(struct file 
arg = (unsigned long) compat_ptr(arg);
case LOOP_SET_FD:
case LOOP_CHANGE_FD:
-   err = lo_ioctl(inode, file, cmd, arg);
+   err = lo_ioctl(file, cmd, arg);
break;
default:
err = -ENOIOCTLCMD;
@@ -1340,7 +1342,7 @@ static struct block_device_operations lo
.owner =THIS_MODULE,
.open = lo_open,
.release =  lo_release,
-   .ioctl =lo_ioctl,
+   .unlocked_ioctl = lo_ioctl,
 #ifdef CONFIG_COMPAT
.compat_ioctl = lo_compat_ioctl,
 #endif
_

Patches currently in -mm which might be from [EMAIL PROTECTED] are

origin.patch
revert-ecryptfs-fix-lookup-error-for-special-files.patch
process_zones-fix-recovery-code.patch
remove-bdput-from-do_open-in-fs-block_devc.patch
slow-down-printk-during-boot.patch
slow-down-printk-during-boot-fix-2.patch
git-acpi.patch
acpi-add-reboot-mechanism.patch
git-alsa.patch
working-3d-dri-intel-agpko-resume-for-i815-chip.patch
revert-gregkh-driver-block-device.patch
revert-gregkh-driver-warn-when-statically-allocated-kobjects-are-used.patch
sysfs-crash-debugging.patch
git-dma.patch
git-dma-makefile-fix.patch
disable-ioat.patch
git-dvb.patch
git-dvb-fixup-2.patch
infiniband-work-around-gcc-slub-problem.patch
adbhid-produce-all-capslock-key-events.patch
iforce-warning-fix.patch
console-keyboard-events-and-accessibility-fix.patch
console-keyboard-events-and-accessibility-fix-2.patch
git-kvm.patch
git-libata-all.patch
ata-add-the-sw-ncq-support-to-sata_nv-for-mcp51-mcp55-mcp61patch.patch
ide-arm-hack.patch
git-mmc.patch
git-mmc-fixup.patch
gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct-vs-git-mmc.patch
git-mtd.patch
git-netdev-all.patch
e1000e-build-fix.patch
revert-8139too-clean-up-i-o-remapping.patch
git-net.patch
git-net-fixup.patch
git-backlight.patch
git-nfs-vs-git-unionfs.patch
git-nfsd.patch
revert-gregkh-pci-pci_bridge-device.patch
pci-remove-irritating-try-pci=assign-busses-warning.patch
fix-ide-legacy-mode-resources-fix.patch
git-s390.patch
git-scsi-misc.patch
advansys-printk-fix.patch
git-block-fixup.patch
git-block-fix-headers_check.patch
git-unionfs.patch
git-unionfs-build-fix.patch
git-unionfs-fix-2.patch
fix-gregkh-usb-usb-sisusb2vga-convert-printk-to-dev_-macros.patch
git-wireless.patch
git-wireless-fixup.patch
git-wireless-vs-gregkh-driver-driver-core-change-add_uevent_var-to-use-a-struct.patch
git-wireless-printk-fixes.patch
net-add-ath5k-wireless-driver-fix.patch
x86_64-get-mp_bus_to_node-as-early-v3.patch
ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable-fix.patch
ich-force-hpet-ich5-quirk-to-force-detect-enable-fix.patch
git-xfs.patch
git-kgdb-fixup.patch
vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch
sparsemem-record-when-a-section-has-a-valid-mem_map-fix.patch
readahead-combine-file_ra_stateprev_index-prev_offset-into-prev_pos-fix.patch
readahead-combine-file_ra_stateprev_index-prev_offset-into-prev_pos-fix-2.patch
vm-dont-run-touch_buffer-during-buffercache-lookups.patch
alloc_pages-permit-get_zeroed_pagegfp_atomic-from-interrupt-context.patch
fs-introduce-write_begin-write_end-and-perform_write-aops.patch
git-nfs-vs-nfs-convert-to-new-aops.patch
memoryless-nodes-introduce-mask-of-nodes-with-memory-fix.patch
categorize-gfp-flags-fix.patch
bias-the-location-of-pages-fre

[linuxkernelnewbies] Talking to Device Files (writes and IOCTLs)

2009-09-18 Thread Peter Teoh





http://tldp.org/LDP/lkmpg/2.6/html/x892.html



  

  The Linux Kernel Module Programming Guide
  Chapter 7. Talking To Device Files

  



7.1. Talking to Device Files
(writes and IOCTLs)
Device files are supposed to represent physical devices. Most
physical devices are used for output as well as input, so there has to
be some mechanism for device drivers in the kernel to get the output to
send to the device from processes. This is done by opening the device
file for output and writing to it, just like writing to a file. In the
following example, this is implemented by device_write.
This is not always enough. Imagine you had a serial port connected
to a modem (even if you have an internal modem, it is still implemented
from the CPU's perspective as a serial port connected to a modem, so
you don't have to tax your imagination too hard). The natural thing to
do would be to use the device file to write things to the modem (either
modem commands or data to be sent through the phone line) and read
things from the modem (either responses for commands or the data
received through the phone line). However, this leaves open the
question of what to do when you need to talk to the serial port itself,
for example to send the rate at which data is sent and received.
The answer in Unix is to use a special function called ioctl (short for Input Output ConTroL). Every
device can have its own ioctl commands,
which can be read ioctl's (to send
information from a process to the kernel), write ioctl's
(to return information to a process), [1] both or neither. The ioctl function is called with three parameters:
the file descriptor of the appropriate device file, the ioctl number,
and a parameter, which is of type long so you can use a cast to use it
to pass anything. [2]
The ioctl number encodes the major device number, the type of the
ioctl, the command, and the type of the parameter. This ioctl number is
usually created by a macro call (_IO, _IOR, _IOW or _IOWR --- depending on the type) in a header
file. This header file should then be included both by the programs
which will use ioctl (so they can generate
the appropriate ioctl's) and by the kernel
module (so it can understand it). In the example below, the header file
is chardev.h and the program which uses it
is ioctl.c.
If you want to use ioctls in your own
kernel modules, it is best to receive an official ioctl
assignment, so if you accidentally get somebody else's ioctls, or if they get yours, you'll know
something is wrong. For more information, consult the kernel source
tree at Documentation/ioctl-number.txt.
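
As an illustration of those macros, here is a sketch of what such a header
could contain; the magic number and request names below are hypothetical and
deliberately differ from the ones used by the guide's own chardev.h:

/* example_chardev.h - hypothetical ioctl numbers for a character device.
 * Shared between the kernel module and the userspace program that calls
 * ioctl(), so both sides agree on the request codes. */
#ifndef EXAMPLE_CHARDEV_H
#define EXAMPLE_CHARDEV_H

#include <linux/ioctl.h>

/* Arbitrary "magic" number identifying this hypothetical device. */
#define EXAMPLE_IOC_MAGIC 0xE1

/* Userspace -> kernel: set the message the device will return. */
#define IOCTL_SET_MSG      _IOW(EXAMPLE_IOC_MAGIC, 0, char *)

/* Kernel -> userspace: read the current message back. */
#define IOCTL_GET_MSG      _IOR(EXAMPLE_IOC_MAGIC, 1, char *)

/* Both directions: pass an offset in, get the byte at that offset back. */
#define IOCTL_GET_NTH_BYTE _IOWR(EXAMPLE_IOC_MAGIC, 2, int)

#endif /* EXAMPLE_CHARDEV_H */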

Example 7-1. chardev.c

  

  
  /*
 *  chardev.c - Create an input/output character device
 */

#include <linux/kernel.h>	/* We're doing kernel work */
#include <linux/module.h>	/* Specifically, a module */
#include <linux/fs.h>
#include <asm/uaccess.h>	/* for get_user and put_user */

#include "chardev.h"
#define SUCCESS 0
#define DEVICE_NAME "char_dev"
#define BUF_LEN 80

/* 
 * Is the device open right now? Used to prevent
 * concurent access into the same device 
 */
static int Device_Open = 0;

/* 
 * The message the device will give when asked 
 */
static char Message[BUF_LEN];

/* 
 * How far did the process reading the message get?
 * Useful if the message is larger than the size of the
 * buffer we get to fill in device_read. 
 */
static char *Message_Ptr;

/* 
 * This is called whenever a process attempts to open the device file 
 */
static int device_open(struct inode *inode, struct file *file)
{
#ifdef DEBUG
	printk(KERN_INFO "device_open(%p)\n", file);
#endif

	/* 
	 * We don't want to talk to two processes at the same time 
	 */
	if (Device_Open)
		return -EBUSY;

	Device_Open++;
	/*
	 * Initialize the message 
	 */
	Message_Ptr = Message;
	try_module_get(THIS_MODULE);
	return SUCCESS;
}

static int device_release(struct inode *inode, struct file *file)
{
#ifdef DEBUG
	printk(KERN_INFO "device_release(%p,%p)\n", inode, file);
#endif

	/* 
	 * We're now ready for our next caller 
	 */
	Device_Open--;

	module_put(THIS_MODULE);
	return SUCCESS;
}

/* 
 * This function is called whenever a process which has already opened the
 * device file attempts to read from it.
 */
static ssize_t device_read(struct file *file,	/* see include/linux/fs.h   */
			   char __user * buffer,	/* buffer to be
			 * filled with data */
			   size_t length,	/* length of the buffer */
			   loff_t * offset)
{
	/* 
	 * Number of bytes actually written to the buffer 
	 */
	int bytes_read = 0;

#ifdef DEBUG
	printk(KERN_INFO "device_read(%p,%p,%d)\n", file, buffer, length);
#endif

	/* 
	 * If we're at the end of the message, return 0
	 * (which signifies end of file) 
	 */
	if (*Message_Ptr == 0)
		return 0;

	/* 
	 * Actually put the data into the buffer 
	 */
	while (length && *Message_Ptr) {

		/* 
		 * Because the buffer is in the user data segment,
		 * not the kernel data segment, assignment wouldn't
		 * work. Instead, we have to use put_user which
		 

[linuxkernelnewbies] ioctl() and termios for "canonical" read - Dev Shed

2009-09-18 Thread Peter Teoh





http://forums.devshed.com/c-programming-42/ioctl-and-termios-for-canonical-read-60821.html


 
#include <termios.h>
#include <sys/ioctl.h>
#include <unistd.h>

int main(void)
{
	struct termios oldT, newT;
	char c;

	ioctl(0, TCGETS, &oldT);	/* get current terminal mode */

	newT = oldT;
	newT.c_lflag &= ~ECHO;		/* echo off */
	newT.c_lflag &= ~ICANON;	/* one char at a time */

	ioctl(0, TCSETS, &newT);	/* set new terminal mode */

	read(0, &c, 1);			/* read 1 char at a time from stdin */

	ioctl(0, TCSETS, &oldT);	/* restore previous terminal mode */

	return 0;
}
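
For comparison, here is a sketch (not from the forum thread) that achieves the
same single-character, no-echo read using the portable tcgetattr()/tcsetattr()
interface instead of the Linux-specific TCGETS/TCSETS ioctls:

#include <stdio.h>
#include <termios.h>
#include <unistd.h>

int main(void)
{
	struct termios oldT, newT;
	char c;

	tcgetattr(STDIN_FILENO, &oldT);		/* save current settings */

	newT = oldT;
	newT.c_lflag &= ~(ECHO | ICANON);	/* no echo, no line buffering */
	tcsetattr(STDIN_FILENO, TCSANOW, &newT);

	if (read(STDIN_FILENO, &c, 1) == 1)	/* one keypress */
		printf("got: %c\n", c);

	tcsetattr(STDIN_FILENO, TCSANOW, &oldT);	/* restore */
	return 0;
}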






[linuxkernelnewbies] compcache - Project Hosting on Google Code

2009-09-19 Thread Peter Teoh




http://code.google.com/p/compcache/


  

  
   compcache - compressed in-memory swap device for Linux

   Project Home | Downloads | Wiki | Issues | Source

   Code license: GNU General Public License v2
   Content license: Creative Commons 3.0 BY-SA
   Labels: linux, memory, compress, swap, virtualization, embedded,
   android, openmoko, beagleboard, moblin, kernel, performance, netbook

   Featured downloads: compcache-0.6.tar.gz
   Featured wiki pages: CompilingAndUsingNew, Performance, xvMalloc

   Links: LinkedIn Profile
   Blogs: vflare
   Feeds: Project feeds

   Project owners: nitingupta910

 
This project creates a RAM-based block device (named ramzswap)
which acts as a swap disk. Pages swapped to this disk are compressed and
stored in memory itself.
Compressing
pages and keeping them in RAM virtually increases its capacity. This
allows more applications to fit in a given amount of memory.
The usual argument I get is: memory is so cheap, so why
bother with compression? So I list here some of the use cases.
The rest depends on your imagination :)

  Netbooks:
The market is now getting flooded with these "lightweight laptops". These
are memory constrained but have enough CPU to drive compressed
memory (e.g. the Cloudbook features a 1.2 GHz processor!).


  Virtualization: With compcache at the hypervisor
level, we can compress any part of guest memory
transparently - this is true for any type of guest OS (Linux,
Windows etc.). This should allow running a larger number of VMs for a given
amount of total host memory.


  Embedded Devices:
Memory is scarce and adding more memory increases device cost. Also,
flash storage suffers from wear-leveling issues, so it's useful if we
can avoid using it as a swap device.

Contact/Mailing Lists
linux-mm-cc at laptop dot org (Info
Page) 
ngupta at vflare dot org (vflare.org) 
Help

  CompilingAndUsingNew
(for version 0.6.x) 

News

  Aug 20, 09 - compcache-0.6 is released. 
  Aug 09, 09 - multiple_rzs and default branch
merged. 
  Aug 06, 09 - compcache-0.6-pre3 released. It
includes experimental support for swap free notify feature
which eliminates any stale data from ramzswap -- README and CompilingAndUsingNew
has details on how to enable this feature. It also includes fix for
invalid stats reporting on ARM ( Issue
#34 ). See Changelog for details. 
  Jul 20, 09 - compcache-0.6-pre2 is out. It includes fix for
crashes on ARM (see  Issue
#33 ). See Changelog included for full list for changes. 
  Jul 14, 09 - compcache-0.6-pre1 released! New features include
creating multiple ramzswap devices, ability to have file as a
backing swap. See CompilingAndUsingNew
for help. This required major code changes, so it needs good amount of
testing before any serious use. 
  May 28, 09 - Project switches to mercurial. 
  May 26, 09 - compcache article featured at LWN.net.
  
  Apr 08, 09 - compcache 0.5.3 released. This includes major
cleanups. LZO modules are no longer included in package since most
distros now already include these. There is change in name for module
and parameters - README and CompilingAndUsing
page are updated to reflect these changes. 
  Apr 03, 09 - It could not make it into 2.6.30 due to lack of
supporting data. See this post on LKML. Now the point is, what data to
collect? Can we get it by next kernel release?? 
  Mar 29, 09 - Effort to push it to mainline has started.
I have uploaded the kernel patch in Downloads section. Please test it
with whatever benchmarks you can think of and let me know any
problems you get. Your support can help make it into mainline. Please
note that there is change in module name (compcache -> ramzswap) and
change in names of configuration parameters. So, please go through
Documentation/blockdev/ramzswap.txt. The patch applies against 2.6.29
(during menuconfig, select Device Drivers -> Block Devices ->
Compressed RAM swap device). Thanks! 
  Mar 11, 09 - compcache
0.5.2 released! Most notable features include ability to send
incompressible pages to physical swap and no memory allocation for
zero-filled pages. See Changelog included for all details. There are
also changes to module parameters 

[linuxkernelnewbies] VMchannel Requirements - KVM

2009-09-19 Thread Peter Teoh





http://www.linux-kvm.org/page/VMchannel_Requirements

 Requirements 

   We want an interface between the guest and the host
  


   The channel is to be used for simple communication, like
sharing of the clipboard between the user desktop and the guest desktop

   For relatively low rate of data transfer -- a few MB/s
  
   Events to be delivered to the guest, like 'shutdown',
'reboot', 'logoff'
  
   Queries to the guest, like 'which users are logged in'
  

  


   Survive live migration
  


   Support for multiple agents (consumers of the data) on the guest
  


   Multiple channels could be opened at the same time
  


   In multi-channels case, one blocked channel shouldn't block
communication between others (or one channel shouldn't hog all the
bandwidth)
  


   Stable ABI (for future upgrades)
  


   Channel addressing

   An agent in the guest should be able to find the channel
it's interested in
  

  


   Dynamic channel creation
  


   Security

   No threats to the host
  
   Unprivileged user should be able to use the channel
  

  


   Should work out of the box, without any configuration necessary
on the part of the guest
  


   Notifications of channels being added / removed (hotplugging)
  


   Notifications on connection / disconnection on guest as well as
host side
  


   An API inside qemu to communicate with agents in the guest
  




 History 
A few reasons why the obvious solutions do not work:


   via the fully emulated serial device.

   performance (exit per byte)
  
   scalability - only 4 serial ports per guest
  
   accessed by root only in the guest
  

  


   via TCP/IP network sockets

   The guest may not have networking enabled
  
   The guest firewall may block access to the host IPs
  
   Windows can't bind sockets to specific ethernet interfaces
  

  


   via user net 
http://thread.gmane.org/gmane.comp.emulators.qemu/35780
  


   via slirp 
This implementation does exist upstream as "-net channel" http://www.nabble.com/-PATCH--specify-vmchannel-as-a-net-option-td21911523.html

   Again, based on networking so same drawbacks mentioned above
apply
  
   Currently used by libguestfs
  

  




 Use Cases 

   Guest - Host clipboard copy/paste operations

   By a VMM or via an internal API within qemu
  

  
   libguestfs (offline
usage)

   For poking inside a guest to fetch the list of installed
apps, etc.
  

  
   Online usage

   Locking desktop session when vnc session is closed
  

  
   Cluster I/O Fencing aka STONITH

   Current models require networking between guest/host

   fence_virsh, xen0 -> ssh to defined host and to perform
fencing; no migration tracking; requires ssh key distribution to work.
  
   fence_xvm -> tracks migrations, but requires
multicast between guest/host; distributed key recommended but not
required
  

  
   Using VMChannel-Serial, the requirement of guest-host can be
avoided
  

Key distribution of any sort can be avoided, making this easier to
configure than existing solutions
  






[linuxkernelnewbies] Getting started with virtualization - FedoraProject

2009-09-19 Thread Peter Teoh




http://fedoraproject.org/wiki/Virtualization_Quick_Start

Getting started with virtualization
From FedoraProject
(Redirected from Virtualization Quick Start)


  

  
  
  Contents
  
1 Using virtualization
on fedora
2 Installing and
configuring fedora for virtualized guests
  
2.1 System requirements
  
2.1.1 Additional
requirements for para-virtualized guests
2.1.2 Additional
requirements for fully virtualized guests
  

2.2 Installing the
virtualization packages
2.3 Introduction to
virtualization with fedora
2.4 Creating a fedora
guest
  
2.4.1 Creating a
fedora guest with virt-install
2.4.2 Creating a
fedora guest with virt-manager
  

2.5 Remote management
2.6 Guest system
administration
  
2.6.1 Managing guests
with virt-manager
2.6.2 Managing guests
with virsh
2.6.3 Managing guests
with qemu-kvm
  

  

3 Troubleshooting
virtualization
  
3.1 SELinux
3.2 Log files
3.3 Serial console
access for troubleshooting and management
  
3.3.1 Host serial
console access
3.3.2 Para-virtualized
guest serial console access
3.3.3 Fully
virtualized guest serial console access
  

3.4 Accessing data on
guest disk images
3.5 Getting help
  
3.5.1 Resources
  

3.6 References
  

  
  

  


  Using virtualization on fedora 
Fedora provides virtualization with both the KVM and the Xen
virtualization platforms. For information on other virtualization
platforms, refer to http://virt.kernelnewbies.org/TechComparison.

Xen supports para-virtualized guests as well as fully
virtualized guests with para-virtualized drivers. Para-virtualization
is faster than full virtualization but does not work with non-Linux
operating systems or Linux operating systems without the Xen kernel
extensions. Xen fully virtualized guests are slower than KVM fully
virtualized guests.

KVM offers fast full virtualization, which requires the
virtualization instruction sets on your processor. KVM requires an x86
Intel or AMD processor with virtualization extensions enabled. Without
these extensions KVM uses QEMU software virtualization.

Other virtualization products and packages are available but are not
covered by this guide.

For information on Xen, refer to http://wiki.xensource.com/xenwiki/
and the Fedora  Xen pages.

For information on KVM, refer to http://kvm.qumranet.com/kvmwiki.

Fedora uses Xen version 3.0.x. Xen 3.0.0 was released in
December of 2005 and is incompatible with guests created using Xen
2.0.x versions.


  Installing and configuring fedora for
virtualized guests 
This section covers setting up Xen, KVM or both on your system.
After the successful completion of this section you will be able to
create virtualized guest operating systems.


  System requirements 
The common system requirements for virtualization on fedora are:


   At least 600MB of hard disk storage per guest. A minimal
command-line fedora system requires 600MB of storage. Standard fedora
desktop guests require at least 3GB of space.
  
   At least 256MB of RAM per guest plus 256MB for the base OS.
At least 756MB is recommended for each guest of a modern operating
system. A good rule of thumb is to think about how much memory is
required for the operating system normally and allocate that much to
the virtualized guest.
  
   Xen host or Domain-0 support requires Fedora 8. Support will
return once paravirt_ops features are implemented
in the upstream kernel.
  


  Additional requirements for
para-virtualized guests 

   Xen. KVM does not support para-virtualization at this time.
The kernel-xen package is required with versions of Fedora older than
10.
  
   Any x86-64 or Intel Itanium CPU or any x86 CPU with the PAE
extensions. Many older laptops (particularly those based on Pentium
Mobile / Centrino) do not have PAE support. To determine if a CPU has
PAE extensions, execute:
  

$ grep pae /proc/cpuinfo
flags   : fpu tsc msr pae mce cx8 apic mtrr mca cmov pat pse36 mmx fxsr sse syscall mmxext 3dnowext 3dnow up ts

The above output shows a CPU with the PAE extensions. If the command
returns nothing, then the CPU does not support para-virtualization.
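
As an alternative to grepping /proc/cpuinfo, the same information can be read
straight from CPUID. The following userspace sketch is not from the wiki; it
checks the PAE bit and, since the next subsection needs it, the Intel VT (vmx)
bit as well, and assumes GCC or Clang on x86 for <cpuid.h>:

#include <stdio.h>
#include <cpuid.h>

int main(void)
{
	unsigned int eax, ebx, ecx, edx;

	if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx)) {
		fprintf(stderr, "CPUID leaf 1 not supported\n");
		return 1;
	}

	/* CPUID leaf 1: EDX bit 6 = PAE, ECX bit 5 = VMX (Intel VT). */
	printf("PAE: %s\n", (edx & (1u << 6)) ? "yes" : "no");
	printf("VMX: %s\n", (ecx & (1u << 5)) ? "yes" : "no");
	return 0;
}

(AMD-V support is reported separately, in CPUID leaf 0x80000001; checking the
svm flag in /proc/cpuinfo is the simpler route on AMD hardware.)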


  Additional requirements for fully
virtualized guests 
Full virtualization with Xen or KVM requires a CPU with
virtualization extensions, that is, the Intel VT or AMD-V extensions. 
Verify whether your Intel CPU has Intel VT support (the 'vmx' flag):

$ grep vmx /proc/cpuinf

[linuxkernelnewbies] redhat.com | RHCE Prep Guide

2009-09-19 Thread peter teoh





http://www.redhat.com/certification/rhce/prep_guide/


RHCE Certification:
RHCE and RHCT Exam Preparation Guide



Overview
This guide provides information candidates may use in preparing to
take the RHCT or RHCE exam. Red Hat is not responsible for the content
or accuracy of other guides, books, online resources, or any other
information provided by organizations or individuals other than Red Hat
Global Learning Services. Red Hat reserves the right to change this
Guide when appropriate, and candidates who have enrolled in forthcoming
classes or exams are advised to check this guide periodically for
changes.
Performance-based Exams
The Red Hat Certified Engineer (RHCE) and Red Hat Certified
Technician (RHCT) exams are performance-based evaluations of Red Hat
Enterprise Linux system administration skills and knowledge. Candidates
perform a number of routine system administration tasks and are
evaluated on whether they have met specific objective criteria.
Performance-based testing means that candidates must perform tasks
similar to what they must perform on the job.
Prospective employers of RHCEs and RHCTs should verify any and all
claims by people claiming to hold one of these certificates by
requesting their certificate number and verifying
it here.
Authorized Training Partners
Only Red Hat and Red Hat Certified Training Partners administer the
RHCE and RHCT exams. Prospective candidates should exercise due
diligence when purchasing a seat in an RHCE or RHCT exam from a
provider other than Red Hat itself. They should verify that the
provider is, in fact, an authorized training partner in good standing.
Please notify greymar...@redhat.com
about organizations that purport to offer the RHCE or RHCT exams, but
who are not Red Hat Certified Training Partners.
Official scores for the RHCE and RHCT exams come exclusively from
Red Hat Certification Central. Red Hat does not authorize examiners or
training partners to report results to candidates directly. Scores on
the exam are usually reported within three (3) US business days.
Exam results are reported as section scores. Red Hat does not report
performance on individual items, nor will it provide additional
information upon request.
Preparation for the RHCT and RHCE Exams
Red Hat encourages all candidates for RHCT and RHCE to consider
taking one or more of its official training courses to help prepare for
the RHCE or RHCT exam. Attendance in these classes is not required, and
one can choose to take just an exam. Many successful candidates who
have come to class already possessing substantial skills and knowledge
have reported that the class made a positive difference for them.
To help you determine the best courses to take, Red Hat provides online skills
assessment.
While attending Red Hat's classes can be an important part of one's
preparation to take the RHCE or RHCT exam, attending class does not
guarantee success on the exam. Previous experience, practice, and
native aptitude are also important determinants of success.
Many books and other resources on system administration for Red
Hat's OS products are available. Red Hat does not officially endorse
any as preparation guides for the RHCT or RHCE exams. Nevertheless, you
may find additional reading deepens understanding and can prove helpful.
Format of the RHCE and RHCT Exams
The RHCT exam is a subset of the RHCE exam delivered separately.
Effective May 1, 2009, this exam is a single section lasting 2.0 hours.
Previously, it had been two sections lasting a combined 3.0 hours.
Consolidation and reorganization have made it possible to cover the
same material more efficiently.
Effective May 1, 2009, the RHCE exam is a single section lasting 3.5
hours. Previously, it had been two sections lasting a combined 5.5
hours. The content has been consolidated and reorganized into a single
section in which time is used more efficiently. The RHCE exam consists
of RHCT components (essentially the RHCT exam) plus RHCE-specific
components. It is possible to earn RHCT in an RHCE exam if one has met
the RHCT requirements but not the RHCE ones.
Study Points for the RHCE Exam
Prerequisite skills for RHCT and RHCE
Candidates should possess the following skills, as they may be
necessary in order to fulfill requirements of the RHCT and RHCE exams:

  use standard command line tools (e.g., ls, cp, mv, rm, tail,
cat, etc.) to create, remove, view, and investigate files and
directories
  use grep, sed, and awk to process text streams and files
  use a terminal-based text editor, such as vim or nano, to modify
text files
  use input/output redirection
  understand basic principles of TCP/IP networking, including IP
addresses, netmasks, and gateways for IPv4 and IPv6
  use su to switch user accounts
  use passwd to set passwords
  use tar, gzip, and bzip2 
  configure an email client on Red Hat Enterprise Linux
  use text and/or graphical browser to access HTTP/HTTPS URLs
  use lftp to access FTP URLs

RHCT skills
Troubleshooting and

[linuxkernelnewbies] Idle scan - Wikipedia, the free encyclopedia

2009-09-19 Thread peter teoh




http://en.wikipedia.org/wiki/Idle_scan

Idle scan
From Wikipedia, the free encyclopedia


  

  
  
  

  

The idle scan is a TCP port
scan method in which tools such as Nmap and Hping
send spoofed packets to a computer. This sophisticated exploit
serves both as a port scanner and as a way to map out trusted IP relationships
between machines. The attack involves sending forged packets to one
machine (the target) in an effort to find distinct characteristics
in another (the zombie) machine. Discovered by Salvatore Sanfilippo (also
known by his handle "Antirez") in 1998[1], the idle scan has
been used by many Black Hat
"hackers" to covertly identify open ports on a target computer in
preparation for attacking it. Although it was originally named 'dumb
scan', the term 'idle scan' was coined in 1999, after the publication
of a proof-of-concept 16-bit identification field (IPID) scanner named
"idlescan" by Filipe Almeida (aka LiquidK). This type of scan is also
referred to as a 'zombie scan'; all of these names derive from the
nature of one of the computers involved in the attack.

  

  
  
Contents
1 Basic mechanics
2 Hping method
3 Nmap method
4 Effectiveness
5 External links
  
  

  


Basic mechanics
Idle scans take advantage of predictable IP identification (IPID) values. An
attacker first scans for a host whose IPID increases sequentially and
predictably. The latest versions of Linux, Solaris, and OpenBSD
are not suitable candidates, since their IPID generation has been patched
to be unpredictable[2].
Computers chosen to be used in this stage are sometimes known as
"zombies". Once a suitable zombie is found, the next step is to
send a SYN packet
to the target computer, spoofing the IP address of the zombie. If the
port of the target computer is open, it will respond with a SYN/ACK
packet back to the zombie. The zombie computer will then send a RST
packet to the target computer because it did not actually send the SYN
packet in the first place. Since the zombie had to send the RST packet,
it will increment its IPID. By probing the zombie's IPID before and after
the spoofed SYN, the attacker can therefore tell whether the target's
port is open; if the IPID is not incremented, the port is closed.

  

  
  
  
  
  
[Figure: The first stage of an idle scan]
[Figure: The second stage of an idle scan]
  
  
  

  

Hping method
The hping method for idle scanning provides a lower-level example
of how idle scanning is performed. In this example the target host
(172.16.0.100) will be scanned using an idle host (172.16.0.105). An
open and a closed port will be tested to see how each scenario plays
out.
First, establish that the idle host is actually idle: send packets
using hping2 and observe that the id numbers increase incrementally by one.
If the id numbers increase haphazardly, the host is not actually idle.
[r...@localhost hping2-rc3]# ./hping2 -S 172.16.0.105
HPING 172.16.0.105 (eth0 172.16.0.105): S set, 40 headers + 0 data bytes
len=46 ip=172.16.0.105 ttl=128 id=1371 sport=0 flags=RA seq=0 win=0 rtt=0.3 ms
len=46 ip=172.16.0.105 ttl=128 id=1372 sport=0 flags=RA seq=1 win=0 rtt=0.2 ms
len=46 ip=172.16.0.105 ttl=128 id=1373 sport=0 flags=RA seq=2 win=0 rtt=0.3 ms
len=46 ip=172.16.0.105 ttl=128 id=1374 sport=0 flags=RA seq=3 win=0 rtt=0.2 ms
len=46 ip=172.16.0.105 ttl=128 id=1375 sport=0 flags=RA seq=4 win=0 rtt=0.2 ms
len=46 ip=172.16.0.105 ttl=128 id=1376 sport=0 flags=RA seq=5 win=0 rtt=0.2 ms
len=46 ip=172.16.0.105 ttl=128 id=1377 sport=0 flags=RA seq=6 win=0 rtt=0.2 ms
len=46 ip=172.16.0.105 ttl=128 id=1378 sport=0 flags=RA seq=7 win=0 rtt=0.2 ms
len=46 ip=172.16.0.105 ttl=128 id=1379 sport=0 flags=RA seq=8 win=0 rtt=0.4 ms
   

Send a spoofed syn packet to the target host on a port you expect to
be open. In this case, port 22 (ssh) is being tested.
# hping2 --spoof 172.16.0.105 -S 172.16.0.100 -p 22 -c 1
HPING 172.16.0.100 (eth0 172.16.0.100): S set, 40 headers + 0 data bytes

--- 172.16.0.100 hping statistic ---
1 packets tramitted, 0 packets received, 100% packet loss
round-trip min/avg/max = 0.0/0.0/0.0 ms

Since we spoofed the packet, we did not receive a reply and hping
reports 100% packet loss. The target host replied directly to the idle
host with a syn/ack packet. Now, check the idle host to see if the id
number has increased.
# hping2 -S 172.16.0.105 -p 445 -c 1
HPING 172.16.0.105 (eth0 172.16.0.105): S set, 40 headers + 0 data bytes
len=46 ip=172.16.0.105 ttl=128 DF id=1381 sport=44

[linuxkernelnewbies] XenParavirtOps - Xen Wiki

2009-09-20 Thread Peter Teoh





http://wiki.xensource.com/xenwiki/XenParavirtOps

Xen paravirt_ops
for x86 Linux


What is
paravirt_ops?

paravirt_ops
(pv-ops for short) is a piece of Linux kernel infrastructure that allows
the kernel to run paravirtualized on a hypervisor. It currently supports
VMWare's VMI, Rusty's lguest, and most interestingly, Xen. 
The
infrastructure allows you to compile a single kernel binary which will
either boot native on bare hardware (or in hvm mode under Xen), or boot
fully paravirtualized in any of the environments you've enabled in the
kernel configuration. 
It
uses various techniques, such as binary patching, to make sure that the
performance impact when running on bare hardware is effectively
unmeasurable when compared to a non-paravirt_ops kernel. 
At present paravirt_ops is available for x86_32,
x86_64 and ia64 architectures. 
Xen
support has been in mainline Linux since 2.6.23, and is the basis of
all on-going Linux/Xen development (the old Xen patches officially
ended with 2.6.18.x-xen, though various distros have their own
forward-ports of them). Red Hat has decided to base all their future
Xen-capable products on the in-kernel Xen support, starting with Fedora
9. 

Current state

Xen/paravirt_ops
has been in mainline Linux since 2.6.23, though it is probably first
usable in 2.6.24. Latest Linux kernels (2.6.27 and newer) are good for
domU use. Fedora 9, Fedora 10 and Fedora 11 distributions include
pv_ops based Xen domU kernel. 

  Features in 2.6.26: 

  x86-32 support 
  SMP 
  Console (hvc0) 
  Blockfront (xvdX) 
  Netfront 
  Balloon (reversible contraction only) 
  paravirtual framebuffer + mouse (pvfb) 
  2.6.26 onwards pv domU is PAE-only (on x86-32) 

  
  Features added in 2.6.27: 

  x86-64 support 
  Save/restore/migration 
  Further pvfb enhancements 

  
  Features added in 2.6.28: 

  ia64 (itanium) pv_ops xen domU support 
  Various bug fixes and cleanups 
  
Expand Xen blkfront for > 16 xvd devices 
  
  Implement CPU hotplugging 
  Add debugfs support 

  
  Features added in 2.6.29: 

  bugfixes 
  performance improvements 
  swiotlb (required for dom0 support) 

  
  Features added in 2.6.30: 

  bugfixes 

  
  Work in progress: 

  dom0
support, currently planned for Linux 2.6.32 or 2.6.33 (latest pv_ops
dom0 patches can be found from jeremy's git tree, see instructions
below) 
  pv-hvm driver support 
  Balloon expansion (using memory hotplug) to grow bigger than
initial domU memory size 

  
  To be done: 

  Device hotplug 
  Other device drivers 
  kdump/kexec 
  blktap support (dom0) 
  framebuffer backend (dom0) 
  ...? 

  


Using
Xen/paravirt_ops


Building with
domU support


  Get a current kernel. The latest kernel.org kernel is generally a
good choice. 
  Configure as normal; you can start with your current .config file

  If building 32 bit kernel make sure you have CONFIG_X86_PAE
enabled (which is set by selecting CONFIG_HIGHMEM64G) 

  non-PAE mode doesn't work in 2.6.25, and has been dropped
altogether from 2.6.26. 

  
  Enable these core options: 

  CONFIG_PARAVIRT_GUEST 
  CONFIG_XEN 

  
  And Xen pv device support 

  CONFIG_HVC_DRIVER and CONFIG_HVC_XEN 
  CONFIG_XEN_BLKDEV_FRONTEND 
  CONFIG_XEN_NETDEV_FRONTEND 

  
  And build as usual 


Running

The kernel
build process will build two kernel images: arch/x86/boot/bzImage and
vmlinux. They are two forms of the same kernel, and are functionally
identical. However, only relatively recent versions of the Xen tools
stack (post-Xen 3.2) support loading bzImage files, so with older tools
you must use the vmlinux form of the kernel (gzipped, if you prefer). If you've built a
modular kernel, then all the modules will be the same either way. Some
aspects of the kernel configuration have changed: 

  The console is now /dev/hvc0, so put "console=hvc0" on the kernel
command line 
  Disk
devices are always /dev/xvdX. If you want to dual-boot a system on both
Xen and native, then it's best that you use lvm, LABEL or UUID to refer
to your filesystems in your /etc/fstab. 


Testing

Xen/paravirt_ops
has not had wide use or testing, so any testing you do is extremely
valuable. If you have an existing Xen configuration, then updating the
kernel to a current pv-ops and trying to use it as you usually would,
then any feedback on how well that works (success or failure) would be
very interesting. In particular, information about: 

  performance: better/worse/same? 
  bugs: outright crash, or something just not right? 
  missing features: what can't you live without? 


Debugging

If you do
encounter problems, then getting as much information as possible is
very helpful. If the domain crashes very early, before any output
appears on the console, then booting with: "earlyprintk=xen" should
provide some useful inf

[linuxkernelnewbies] Memory Barriers Wrap-up - Kernel Mustard

2009-09-21 Thread peter teoh





http://msmvps.com/blogs/kernelmustard/archive/2004/09/20/13836.aspx

Memory Barriers Wrap-up
Hello blogosphere! I
hope everyone had a great time this weekend puzzling through the
mysteries of memory barriers. Personally, I spent the weekend coding
and reading about relativity (a recent post by Raymond Chen got me
re-re-re-re-re-started on physics again).
In addition to the above-mentioned nonsense, I got some time to drag
out the intel manuals to see what they had to say about x86 memory
barriers. For the curious, the details can be found in section 7.3 of
the 3rd volume of the Intel Pentium 4 manuals.

The situation is slightly different between the {i486, P5} and
P6+ (Pentium Pro, Pentium II, Xeon, etc.) processors. The first group
of chips enforces relatively strong program ordering of reads and
writes at all times, with one exception: read misses are allowed to go
ahead of write hits. In other words, if a program writes to memory
location 1 and then reads from memory location 2, the read is allowed
to hit the system bus before the write. This is because the execution
stream inside the processor is usually totally blocked waiting for
reads, whereas writes can be "queued" to the cache somewhat more
asynchronously in the core without blocking program flow.

The P6-based processors present a slightly different story,
adding support for out-of-order writes of long string data and
speculative read support. In order to control these features of the
processor, Intel has supplied a few instructions to enforce memory
ordering. There are three explicit fence instructions - LFENCE, SFENCE,
and MFENCE.


  LFENCE - Load fence - all pending load operations must be
completed by the time an LFENCE executes
  
  SFENCE - Store fence - all pending store operations must be
completed by the time an SFENCE executes
  
  MFENCE - Memory fence - all pending load and store operations
must be completed by the time an MFENCE executes
  

These instructions are in addition to the "synchronizing"
instructions, such as interlocked memory operations and the CPUID
instruction. The latter cause a total pipeline flush, leading to
less-efficient utilization of the CPU. It should be noted that the DDK
defines KeMemoryBarrier() using an interlocked store operation, so
KeMemoryBarrier() suffers from this performance issue.

This story changes on other architectures, as I've said before,
so the best practice is still to code defensively and use memory
barriers where you need them. However, it doesn't look like you're
likely to run into these situations in x86-land.
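
To make the reordering scenario above concrete, here is a minimal
user-space sketch (an illustration only, not code from the quoted post)
using the compiler intrinsics for the fence instructions discussed above.
Without the _mm_mfence() calls, x86 permits each thread's read of the
other flag to pass its own buffered write, so both threads could observe
0; with the fences, at least one of them must see the other's flag set:

#include <emmintrin.h>   /* _mm_mfence() */
#include <pthread.h>
#include <stdio.h>

volatile int flag0 = 0, flag1 = 0;
int saw0, saw1;

static void *thread0(void *arg)
{
	flag0 = 1;              /* store to "memory location 1" */
	_mm_mfence();           /* keep the following load behind the store */
	saw0 = flag1;           /* load from "memory location 2" */
	return NULL;
}

static void *thread1(void *arg)
{
	flag1 = 1;
	_mm_mfence();
	saw1 = flag0;
	return NULL;
}

int main(void)
{
	pthread_t t0, t1;

	pthread_create(&t0, NULL, thread0, NULL);
	pthread_create(&t1, NULL, thread1, NULL);
	pthread_join(t0, NULL);
	pthread_join(t1, NULL);
	printf("saw0=%d saw1=%d\n", saw0, saw1);
	return 0;
}

In kernel code the equivalent is smp_mb(); the intrinsics are shown here
only because they map directly onto the LFENCE/SFENCE/MFENCE instructions
named above.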






[linuxkernelnewbies] LFENCE--Load Fence

2009-09-21 Thread peter teoh





http://www.rz.uni-karlsruhe.de/rz/docs/VTune/reference/vc153.htm

LFENCE--Load
Fence


  

  
Opcode: 0F AE /5
Instruction: LFENCE
Description: Serializes load operations.
  

  

Description
Performs a serializing operation on all load instructions that were
issued prior to the LFENCE
instruction. This serializing operation guarantees that every load
instruction that precedes in program order the LFENCE instruction is
globally visible before any load instruction that follows the LFENCE
instruction is globally visible. The LFENCE instruction is ordered with
respect to load instructions, other LFENCE instructions, any MFENCE
instructions, and any serializing instructions (such as the CPUID
instruction). It is not ordered with respect to store instructions or
the SFENCE instruction.
Weakly ordered memory types can enable higher performance through
such techniques as out-of-order issue and speculative reads. The degree
to which a consumer of data recognizes or knows that the data is weakly
ordered varies among applications and may be unknown to the producer of
this data. The LFENCE instruction provides a performance-efficient way
of ensuring ordering between routines that produce weakly-ordered
results and routines that consume this data.
It should be noted that processors are free to speculatively fetch
and cache data from system memory regions that are assigned a
memory-type that permits speculative reads (that is, the WB, WC, and WT
memory types). The PREFETCHh
instruction is considered a hint to this speculative behavior. Because
this speculative fetching can occur at any time and is not tied to
instruction execution, the LFENCE instruction is not ordered with
respect to PREFETCHh or any of
the speculative fetching mechanisms (that is, data could be speculatively
loaded into the cache just before, during, or after the execution of an
LFENCE instruction).
Operation
Wait_On_Following_Loads_Until(preceding_loads_globally_visible);
Intel®
C++ Compiler Intrinsic Equivalent
void _mm_lfence(void)
Exceptions (All Modes of Operation)
None. 
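
Purely as an illustration (not part of the reference page above), the
intrinsic equivalent can be used like this when consuming data that may
live in a weakly-ordered memory region; as the LKML message later in this
digest points out, ordinary write-back loads are already ordered on x86,
so the fence mainly matters for WC/WT regions and speculative fetches:

#include <emmintrin.h>   /* _mm_lfence() */

/* Hypothetical consumer: 'ready' is set by another agent (another CPU,
 * or a device writing through a weakly-ordered mapping) after it has
 * finished filling 'payload'. */
volatile int ready;
volatile int payload;

int read_payload(void)
{
	while (!ready)
		;               /* wait for the producer's flag */
	_mm_lfence();           /* order the flag load before the payload load */
	return payload;
}

int main(void)
{
	payload = 42;
	ready = 1;              /* normally done by the other agent */
	return read_payload() == 42 ? 0 : 1;
}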





[linuxkernelnewbies] LFENCE instruction (was: [rfc][patch 3/3] x86: optimise barriers)

2009-09-21 Thread peter teoh





http://linux.derkeiler.com/Mailing-Lists/Kernel/2007-10/msg04948.html

LFENCE instruction (was: [rfc][patch 3/3] x86:
optimise barriers)


  From: Mikulas Patocka 
  Date: Mon, 15 Oct 2007 22:47:42 +0200 (CEST)


According
to latest memory
ordering specification documents from Intel 
and AMD, both manufacturers are committed
to in-order loads from 
cacheable memory for the x86 architecture. Hence, smp_rmb() may be a 
simple barrier.
  
  http://developer.intel.com/products/processor/manuals/318147.pdf
  
  http://www.amd.com/us-en/assets/content_type/white_papers_and_tech_docs/24593.pdf


Hi

I'm just wondering about one thing --- what is the LFENCE instruction
good for?

SFENCE is for enforcing ordering in write-combining buffers (it doesn't
make sense in write-back cache mode).
MFENCE is for preventing stores from being moved past loads.

But what is LFENCE for? I read the above documents and they already say
that CPUs have ordered loads.

In the Intel instruction reference, the description for LFENCE is copied
from SFENCE (with the word "store" replaced with the word "load"), so it
doesn't really give much insight into the operation of the instruction.

Or is LFENCE just a no-op reserved for the possibility that Intel would
relax ordering rules?

Mikulas




[linuxkernelnewbies] [PATCH 1/3] PM: Introduce new top level suspend and hibernation callbacks (rev. 7) | KernelTrap

2009-09-21 Thread peter teoh





http://kerneltrap.org/mailarchive/linux-kernel/2008/4/3/1345224

[PATCH 1/3] PM: Introduce new top level suspend and
hibernation callbacks (rev. 7)


From: Rafael
J. Wysocki 
To: Greg KH 
Cc: pm list , ACPI
Devel Maling List , Alan Stern ,
Len Brown , LKML , Alexey
Starikovskiy , David Brownell
, Pavel Machek , Benjamin
Herrenschmidt , Oliver Neukum , Nigel
Cunningham , Jesse Barnes 

Subject: [PATCH
1/3] PM: Introduce new top level suspend and hibernation callbacks
(rev. 7)
Date: Thursday, April 3, 2008 - 7:12 pm


From: Rafael J. Wysocki 

Introduce 'struct pm_ops' and 'struct pm_ext_ops' ('ext' meaning
'extended') representing suspend and hibernation operations for bus
types, device classes, device types and device drivers.

Modify the PM core to use 'struct pm_ops' and 'struct pm_ext_ops'
objects, if defined, instead of the ->suspend(), ->resume(),
->suspend_late(), and ->resume_early() callbacks (the old callbacks
will be considered as legacy and gradually phased out).

The main purpose of doing this is to separate suspend (aka S2RAM and
standby) callbacks from hibernation callbacks in such a way that the
new callbacks won't take arguments and the semantics of each of them
will be clearly specified.  This has been requested multiple times
by many people, including Linus himself, and the reason is that
within the current scheme if ->resume() is called, for example, it's
difficult to say why it's been called (ie. is it a resume from RAM or
from hibernation or a suspend/hibernation failure etc.?).

The second purpose is to make the suspend/hibernation callbacks more
flexible so that device drivers can handle more than they can within
the current scheme.  For example, some drivers may need to prevent
new children of the device from being registered before their
->suspend() callbacks are executed or they may want to carry out some
operations requiring the availability of some other devices, not
directly bound via the parent-child relationship, in order to prepare
for the execution of ->suspend(), etc.

Ultimately, we'd like to stop using the freezing of tasks for suspend
and therefore the drivers' suspend/hibernation code will have to take
care of the handling of the user space during suspend/hibernation.
That, in turn, would be difficult within the current scheme, without
the new ->prepare() and ->complete() callbacks.

Signed-off-by: Rafael J. Wysocki 
Acked-by: Pavel Machek 
Acked-by: Jesse Barnes 
---

 arch/x86/kernel/apm_32.c   |4 
 drivers/base/power/main.c  |  692 ++---
 drivers/base/power/power.h |2 
 drivers/base/power/trace.c |4 
 include/linux/device.h |9 
 include/linux/pm.h |  314 ++--
 kernel/power/disk.c|   20 -
 kernel/power/main.c|6 
 8 files changed, 847 insertions(+), 204 deletions(-)

Index: linux-2.6/include/linux/pm.h






[linuxkernelnewbies] Linux Device Driver Template/Skeleton with memory mapping of kernel memory to user-space (mmap)

2009-09-23 Thread peter teoh





http://www.captain.at/howto-linux-device-driver-mmap.php

Linux Device Driver Template/Skeleton with memory
mapping of kernel memory to user-space (mmap)






This is a modified version of the Linux
Device Driver Template/Skeleton.


UPDATE 20-JAN-2006: THE EXAMPLE NOW ALSO WORKS WITH KERNEL 2.6!

NOTE: While the original Linux
Device Driver Template/Skeleton
works for kernel 2.4 and kernel 2.6, this mmap example works only
with kernel 2.4 (the Makefile is the default
makefile for both kernel branches).


The interrupt handler was removed - but it is possible to access kernel
memory from userspace, e.g. for high-speed I/O
operations. The driver creates a device entry in /dev/, which can be
used to
communicate with the kernel module (read, write, ioctl, mmap).

Compile everything with the "Makefile" and before you load the kernel
module with
Kernel 2.4:
# insmod ./skeleton.o

-or-
Kernel 2.6:
# insmod ./skeleton.ko

you must create the device entry in /dev with
# mknod -m 666 /dev/skeleton c 240 0





After loading the module, start the userspace program "user". This
program opens the
device (/dev/skeleton), writes data to it and reads the data back
(blocking read is disabled here).

# ./user
String 'Skeleton Kernel Module Test' written to /dev/skeleton
String 'Skeleton Kernel Module Test' read from /dev/skeleton
buffer[0]=0
buffer[4]=4
buffer[8]=8
buffer[12]=12
buffer[16]=16
buffer[20]=20
buffer[24]=24
buffer[28]=28
buffer[32]=32
buffer[36]=36
IOCTL test: written: '' - received: ''
#

After starting "user", you will see "String 'Skeleton Kernel Module
Test' written to /dev/skeleton" and
the string is read back. Furthermore the userspace program performs a
mmap operation.
When the kernel module is loaded, some buffer is allocated with kmalloc
and some data is written to the buffer. The userspace application reads
from the buffer and displays the values
(the type of data in the buffer is controlled with "#define USEASCII").


The userspace program also does some basic IOCTL access - you can use
this to send commands to the module.
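
The full source of the "user" program is not reproduced on this page, but
a minimal sketch of its mmap part could look like the following (the
device name and the 64 KB length match the skeleton module above;
everything else is illustrative):

/* user-mmap.c - hypothetical sketch of the mmap test against /dev/skeleton */
#include <fcntl.h>
#include <stdio.h>
#include <sys/mman.h>
#include <unistd.h>

#define LEN (64*1024)           /* must match LEN in skeleton.c */

int main(void)
{
	unsigned int *p;
	int i;
	int fd = open("/dev/skeleton", O_RDWR);

	if (fd < 0) {
		perror("open /dev/skeleton");
		return 1;
	}
	p = mmap(NULL, LEN, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
	if (p == MAP_FAILED) {
		perror("mmap");
		return 1;
	}
	/* the module initialized the buffer so that kmalloc_ptr[i] == i */
	for (i = 0; i <= 36; i += 4)
		printf("buffer[%d]=%u\n", i, p[i]);
	munmap(p, LEN);
	close(fd);
	return 0;
}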


Another way to test the driver:
Write some data to the module:
# cat > /dev/skeleton
this is a pen.
[press CTRL-D]
#

Read the data back:
# cat < /dev/skeleton
this is a pen.
#

Note
about ClearPageReserved at the bottom





Also check the kernel log for messages:
# tail /var/log/kern.log
Jul 13 20:49:26 localhost kernel: initializing module
Jul 13 20:49:26 localhost kernel: kmalloc_area at 0xd568 (phys 0x81fa000)
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[0]=0
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[4]=4
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[8]=8
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[12]=12
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[16]=16
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[20]=20
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[24]=24
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[28]=28
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[32]=32
Jul 13 20:49:26 localhost kernel: kmalloc_ptr[36]=36
Jul 13 20:49:28 localhost kernel: skeleton_open
Jul 13 20:49:28 localhost kernel: skeleton_release
Jul 13 20:49:32 localhost kernel: cleaning up module

The device driver "skeleton.c":
// Linux Device Driver Template/Skeleton with mmap
// Kernel Module

#include <linux/version.h>
#include <linux/module.h>
#include <linux/kernel.h>
#include <linux/init.h>
#include <linux/fs.h>
#include <linux/mm.h>
#include <linux/slab.h>
#include <linux/sched.h>
#include <linux/wait.h>
#include <linux/ioport.h>
#include <linux/types.h>
#include <asm/io.h>
#include <asm/uaccess.h>

#if LINUX_VERSION_CODE < KERNEL_VERSION(2,6,0)
#include <linux/wrapper.h>
#endif

#define SKELETON_MAJOR 240
#define SKELETON_NAME "skeleton"
#define CASE1 1
#define CASE2 2

static unsigned int counter = 0;
static char string [128];
static int data;

//#define USEASCII

#ifdef USEASCII
static char *kmalloc_area = NULL;
static char *kmalloc_ptr = NULL;
#else
static unsigned int *kmalloc_area = NULL;
static unsigned int *kmalloc_ptr = NULL;
#endif

#define LEN (64*1024)
unsigned long virt_addr;

DECLARE_WAIT_QUEUE_HEAD(skeleton_wait);
static int data_not_ready = 0;

// open function - called when the "file" /dev/skeleton is opened in userspace
static int skeleton_open (struct inode *inode, struct file *file) {
	printk("skeleton_open\n");
	// we could do some checking on the flags supplied by "open"
	// i.e. O_NONBLOCK
	// -> set some flag to disable interruptible_sleep_on in skeleton_read
	return 0;
}

// close function - called when the "file" /dev/skeleton is closed in userspace  
static int skeleton_release (struct inode *inode, struct file *file) {
	printk("skeleton_release\n");
	return 0;
}

// read function - called when /dev/skeleton is read from userspace
static ssize_t skeleton_read (struct file *file, char *buf,
		size_t count, loff_t *ppos) {
	int len, err;
	
	// check if we have data - if not, sleep
	// wake up in interrupt_handler
	while (data_not_ready) {
		interruptible_sleep_on(&skeleton_wait);
	}
	//data_not_ready = 1;
	
	if( counter <= 0 ) 
		return 0;
	err = copy_to_user(buf,string,counter);
	if (err != 0)
		return -EFAULT;
	len  = counter;
	counter = 0;
	return 

[linuxkernelnewbies] Ext3 Data=Ordered vs Data=Writeback mode - Ext4

2009-09-23 Thread peter teoh





http://ext4.wiki.kernel.org/index.php/Ext3_data_mode_tradeoffs

Ext3 Data=Ordered vs
Data=Writeback mode
From Ext4
(Redirected from Ext3 data mode tradeoffs)

If
a filesystem does not explicitly specify a data ordering mode, and the
journal capability allowed it, ext3 used to historically default to
'data=ordered' (hereafter, 'Ordered Mode'). This became configurable in
2.6.30, when the configuration option CONFIG_EXT3_DEFAULTS_TO_ORDERED
was added. If this CONFIG option is disabled, it will change the
default to be 'data=writeback' (hereafter, 'Writeback Mode').

Ordered mode is the mode used by most distributions, but can
introduce latency problems in some workloads, especially if there is a
combination of high bandwidth background writes and foreground
processes calling fsync(). In worst case scenarios, the fsync() call
can take 500ms to multiple seconds to return. In applications such as
firefox, which called fsync() out of its main UI thread, the
application can appear to have crashed since it is no longer responsive
to the user's mouse or keyboard input.

However, the problem with using a default of Writeback Mode is
that after a system crash or a power failure, files that were written
right before the system went down could contain previously written data
or other garbage. With Ordered Mode, journal commits are deferred until
the data blocks get written to disk. This guarantees that any blocks in
the file will be data written by the application, avoiding a
possibility of a security breach, which is especially problematic on a
multi-user system. (Note, however, that Ordered Mode does not guarantee
that the file will be consistent at an application level; the
application must use fsync() at appropriate commit points in order to
guarantee application-level consistency.)

In addition, there are some applications which depend on
data=ordered to automatically force data blocks to be written to disk
soon after the file is written. Using Writeback Mode extends the time
from when a file is written to when it is pushed out to disk to 30
seconds. This can be surprising for some users; however, it should be
noted that such problems can still be an issue with Ordered Mode
(although they are much rarer). Again, a careful application or library
should always use fsync() at points where the application is at a
stable commit point.
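
To illustrate the "fsync() at commit points" advice (a generic sketch, not
code from the wiki page), a careful application that rewrites a file
writes a temporary file, fsync()s it, and only then renames it over the
original, so that a crash in either data mode leaves either the old
contents or the complete new contents:

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Replace 'path' with 'data'; the file names used here are only examples. */
static int safe_update(const char *path, const char *tmp, const char *data)
{
	int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);

	if (fd < 0)
		return -1;
	if (write(fd, data, strlen(data)) != (ssize_t)strlen(data) ||
	    fsync(fd) != 0) {           /* commit point: data is on disk */
		close(fd);
		return -1;
	}
	if (close(fd) != 0)
		return -1;
	return rename(tmp, path);       /* atomically publish the new file */
}

int main(void)
{
	if (safe_update("config.txt", "config.txt.tmp", "new contents\n") != 0)
		perror("safe_update");
	return 0;
}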

If you have been historically happy with ext3's performance,
using Ordered Mode will be a safe choice and you should enable
CONFIG_EXT3_DEFAULTS_TO_ORDERED. However, if the latency problems are
causing problems for you, and you understand the reliability and data
privacy issues of Writeback Mode, you can disable this option. 
Filesystems can be forced to use a specific data journalling
mode by specifying a mount option on the command line, or in
/etc/fstab, or by using the "tune2fs -o journal_data_ordered" or
"tune2fs -o journal_data_writeback" to specify a default mount option
in the filesystem superblock.






[linuxkernelnewbies] Major Events

2009-09-24 Thread peter teoh





http://lwn.net/Articles/353226/

Events: October 1, 2009 to November 30, 2009
The following event listing is taken from the
LWN.net Calendar.

 

  

  Date(s)
  Event
  Location


  September 28
October 2
  Sixteenth
Annual Tcl/Tk Conference (2009)
  Portland, OR 97232, USA


  October 1
October 2
  Open
World Forum
  Paris, France


  October 2
  Mozilla Public
DevDay/Open Web Camp 2009
  Prague, Czech Republic


  October 2
  LLVM
Developers' Meeting
  Cupertino, CA, USA


  October 2
October 3
  Open
Source Developers Conference France
  Paris, France


  October 2
October 4
  Ubuntu
Global Jam
  Online, Online


  October 2
October 4
  Linux
Autumn (Jesien Linuksowa) 2009
  Huta Szklana, Poland


  October 2
October 4
  7th
International Conference on Scalable Vector Graphics
  Mountain View, CA, USA


  October 3
October 4
  EU MozCamp 2009
  Prague, Czech Republic


  October 3
October 4
  T-DOSE 2009
  Eindhoven, The Netherlands


  October 7
October 9
  Jornadas
Regionales de Software Libre
  Santiago, Chile


  October 9
October 11
  Maemo
Summit 2009
  Amsterdam, The Netherlands


  October 10
  OSDN
Conference 2009
  Kiev, Ukraine


  October 10
October 12
  Gnome
Boston Summit
  Cambridge, MA, USA


  October 12
October 14
  Qt
Developer Days
  Munich, Germany


  October 15
October 16
  Embedded Linux
Conference Europe 2009
  Grenoble, France


  October 16
October 17
  Pycon Poland 2009
  Ustron, Poland


  October 16
October 18
  German Ubuntu
conference
  Göttingen, Germany


  October 16
October 18
  Pg Conference West 09
  Seattle, WA, USA


  October 18
October 20
  2009
Kernel Summit
  Tokyo, Japan


  October 19
October 22
  ZendCon 2009
  San Jose, CA, USA


  October 21
October 23
  Japan
Linux Symposium
  Tokyo, Japan


  October 22
October 24
  Décimo
Encuentro Linux 2009
  Valparaiso, Chile


  October 24
  Florida
Linux Show 2009
  Orlando, Florida, USA


  October 24
October 25
  FOSS.my 2009
  Kuala Lumpur, Malaysia


  October 26
October 28
  Pacific
Northwest Software Quality Conference
  Portland, OR, USA


  October 26
October 28
  GitTogether
'09
  Mountain View, CA, USA


  October 26
October 28
  Techno
Forensics and Digital Investigations Conference
  Gaithersburg, MD, USA


  October 27
October 30
  Linux-Kongress
2009
  Dresden, Germany


  October 28
October 30
  
no:sql(east).
  Atlanta, USA


  October 28
October 30
  Hack.lu
2009
  , Luxembourg


  October 29
  NLUUG autumn
conference: The Open Web
  Ede, The Netherlands


  November 1
November 6
  23rd
Large Installation System Administration Conference 
  Baltimore, MD, USA


  November 2
November 6
  Ubuntu
Open Week
  Internet, Internet


  November 2
November 6
  ApacheCon
2009
  Oakland, CA, USA


  November 3
November 6
  OpenOffice.org
Conference
  Orvieto, Italy


  November 4
November 5
  Linux World
NL
  Utrecht, The Netherlands


  November 5
  Government Open
Source Conference
  Washington, DC, USA


  November 6
November 8
  WineConf
2009
  Enschede, Netherlands


  November 6
November 10
  CHASE
2009
  Lahore, Pakistan


  November 7
November 8
  Kiwi PyCon 2009
  Christchurch, New Zealand


  November 7
November 8
  OpenRheinRuhr
  Bottrop, Germany


  November 9
November 13
  ACM CCS
2009
  Chicago, IL, USA


  November 12
November 13
  European
Conference on Computer Network Defence
  Milan, Italy


  November 16
November 19
  Web
2.0 Expo
  New York, NY, USA


  November 16
November 20
  INTEROP
  New York, NY, USA


  November 17
November 20
  DeepSec IDSC
  Vienna, Austria


  November 19
November 20
  CONFIdence
2009
  Warsaw, Poland


  November 19
November 22
  Piksel 09
  Bergen, Norway


  November 21
  Baltic
Perl Workshop 2009
  Riga, Latvia


  November 27
November 29
  Ninux Day 2009
  Rome, Italy

  






[linuxkernelnewbies] Japan Linux Symposium: The Newest Linux Foundation Conference in Asia | Sessions

2009-09-24 Thread peter teoh





http://events.linuxfoundation.org/events/japan-linux-symposium/sessions

The
Enhanced Socket Buffer Accounting Mechanism
Hideo
Aoki, Hitachi, Ltd.
Per Backing Device Dirty Data Writeback Replaces
pdflush Driven Writeback in an Attempt to Speed Up This Operation
Jens
Axboe, Oracle
Measuring Function Duration with Ftrace
Tim
Bird, Sony Corporation

How to Contribute to the Linux Kernel and Why It
Makes Economic Sense
James
Bottomley, Novell

The Kernel Report
Jon
Corbet, LWN.net
How to Work With the Kernel Development Community
Jon
Corbet, LWN.net
Upstream In-House Board Support Package
Magnus
Damm
Saving Battery Life With Linux Power Management
Brad
Dixon, MontaVista Software
Addressing the Top 5 Pains in Linux Build and Design
Brad
Dixon, MontaVista Software
Through the Looking Glass: Open Source From a
Teenage Perspective
Elizabeth
Garbee
Understanding Debian
Bdale
Garbee, Hewlett-Packard
Marketing Linux to the Next Generation of Japanese
(and Asian) End-Users
Christine
L.E.V. Hansen, Le CIEL
Kernel Development: Drawing Lessons From "Mistakes"
Toshiharu
Harada, NTT DATA Corporation

Evaluation and Implementation of Data Link Flow
Control in Linux
Mitsuo
Hayasaka, Hitachi Ltd.

The KVM/qemu Storage Stack
Christoph
Hellwig
How Fast is Linux Networking
Stephen
Hemminger, Vyatta
How Not to Get Your Linux Driver Sent to the Staging
Area Penalty Box
Stephen
Hemminger, Vyatta

Not So C Constructs in the Linux Kernel
Tejun
Heo, Novell/SUSE

Linux IPv6 - Where We Are, Where We Will Be
Yoshifuji
Hideaki
Reliable Delivery of Oops Information
Dirk
Hohndel, Intel Corporation
oFono - Open Source Telephony
Marcel
Holtmann, Intel Corporation
Update on Large NAND Flash File System Evaluation
Toru
Homma, Toshiba
Multi-Function PCI Pass-Through for Xen
Simon
Horman, VA Linux Systems Japan K.K.
I/O Controller: State of the Art
Munehiro
Ikeda, NEC Corporation
Flight-Record
Zhao
Lei, Fujitsu & Lai Jiangshan, Fujitsu

Development and
Maintenance of the Fedora kernel
Dave Jones, Red Hat
Improvement of I/O Error Handling on Ext3 Filesystem
Hidehiro
Kawai, Hitachi, Ltd.
Memory Cgroup: Summary and Upcoming Enhancements
Hiroyuki
Kamezawa
Hot-Plugging VESPER into QEMU to Trace HA Cluster of
KVM Guests
Sungho
Kim
LAPP/SELinux - A Secure Web Application Stack
Powered by SELinux
KaiGai
Kohei, NEC Corporation

The nilfs2 Filesystem: Review and Challenge
Ryusuke
Konishi, NTT Corporation

Migrating to Linux From Proprietary Unix in a Large
Enterprise Environment
Vinod
Kutty, CME Group
Network Magic: Multicasting, UDP and IGMP 
Christoph
Lameter
Petitboot, A Kexec-Based Bootloader
Geoff
Levand, Sony Corporation

Btrfs:
Filesystem Status and Future Plans
Chris Mason, Oracle

Reducing
SuperH to Linux Commonplace
Hisao
Munakata, Renesas Solutions Corp.
Development of OSS Model Curriculum and Start of Edu
Wiki Project
Hiroshi
Miura, NTT DATA Corporation
Scaling the VFS
Nick
Piggin, Novell/SUSE


Real-Time
Linux Failure
Frank
Rowand, Sony Corporation
The Moblin SDK
Mikio
Sakemoto, Intel K.K. SSG
Statistics of Linux Kernel Development
Tsugikazu
Shibata, NEC

KVM
- Scaling to Infinity and Beyond
Jes
Sorensen


Video4Linux:
Past, Present and Future
Hans
Verkuil, Tandberg Telecom AS
ZABBIX - An Enterprise-Class Open-Source Distributed
Monitoring Solution
Kodai
Terashima, Miracle Linux

Analyzing Kernel Behavior With SystemTap
Atsushi
Tsuji, NEC & Noboru
Obata, Hitachi, Ltd

A Disk IO Bandwidth Controller - Implemented as a
Device-Mapper Module
Ryo
Tsuruta, VA Linux Systems Japan K.K.
Pluggable
Real-Time Performance Monitoring
Dag
Wieers
Generic Receive Offload: How to Receive 10Gb/s and
Have Cycles to Spare
Herbert
Xu, Red Hat Inc.
FSIJ USB Token for GnuPG
Niibe
Yutaka, National Institute of Advanced Industrial Science and Technology


  

  Tutorials

  


Extending
the Vyatta Open Router
Stephen
Hemminger, Vyatta

Learning, Analyzing and Protecting Android with
TOMOYO Linux
Guiseppe
La Tona, NTT DATA Corporation & Daisuke
Numaguchi, NTT DATA Corporation
Introduction to Android Development
Diego
Torres Milano, COD Technologies Ltd.



  

  BoFs

  


Embedded
Linux
Tim
Bird, Sony Corporation

Test Projects
Hisashi
Hashimoto, Hitachi, Ltd.

Enhanced Securities: Where Should We Go Next
KaiGai
Kohei & Toshiharu Harada, NTT DATA Corporation

Why
Are We Hesitating to Join the Community
Satoru
Ueda

Virtualization
Isaku
Yamahata, VA Linux Systems Japan K.K. 





[linuxkernelnewbies] "A space-efficient flash translation layer for CompactFlash systems" - Google Search

2009-09-24 Thread peter teoh






  [PDF]
A
space-efficient flash translation layer for compactflash systems ...
File Format: PDF/Adobe
Acrobat - View
as HTML
A Space-Efficient Flash Translation Layer for compactflash
Systems. 37 1. D. Atomicity of Host Requests. Flash-memory-based
storage subsystems such as Com- ...
altair.snu.ac.kr/newhome/kr/course/system.../Advanced%20FTL.pdf
- Similar
by J Kim - 2002 - Cited
by 158 - Related
articles - All
8 versions

  [PPT]
A
Space-Efficient Flash Translation Layer for Compactflash Systems
File Format: Microsoft
Powerpoint - View
as HTML
Introduction
to Flash Memory. 2006. 11. 15. Mobile Embedded System Lab. Kiseok,
Choi. Table of Contents. Stateless PC. Flash Memory Basics. NAND vs. ...
altair.snu.ac.kr/newhome/kr/.../1115_2_flashmemory_kschoi.ppt
- Similar

  
Welcome
to IEEE Xplore 2.0: A space-efficient flash translation ...
A space-efficient flash translation layer for
CompactFlash systems. Jesung Kim Jong Min Kim Noh, S.H. Sang Lyul
Min Yookun Cho Sch. of Comput. Sci. ...
ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1010143 - Similar
by J Kim - 2002 - Cited
by 163 - Related
articles

  [PDF]
Slide
1 - University of Minnesota
File Format: PDF/Adobe
Acrobat - View
as HTML
10 Mar 2009 ... A space-efficient flash translation layer
for compactflash systems, Jesung Kim, JongMin Kim, SamH. Noh,
SangLyul Min, and Yookun Cho, ...
www-users.itlabs.umn.edu/classes/Spring-2009/.../Group-12.pdf
- Similar
by D Park

  
File:A
Space-Efficient Flash Translation Layer for Compactflash ...
17 Jun 2009 ... File:A Space-Efficient
Flash Translation Layer for Compactflash Systems.pdf. Go to page.
File:A Space-Efficient Flash Translation Layer for ...
dislab.hufs.ac.kr/.../File:A_Space-Efficient_Flash_Translation_Layer_for_Compactflash_Systems.pdf
- Cached
- Similar

  
A log
buffer-based flash translation layer using fully-associative ...
A space-efficient flash translation layer for
compactflash systems. IEEE Transactions on Consumer Electronics
48, 366--375. ...
portal.acm.org/citation.cfm?id=1275990 - Similar
by SW Lee - 2007 - Cited
by 56 - Related
articles - All
7 versions

  
DFTL
A Space-Efficient
Flash Translation Layer for Compactflash Systems. IEEE
Transactions on Consumer Electronics, 48(2):366--375, 2002. ...
portal.acm.org/citation.cfm?id=1508271 - Similar
by A Gupta - 2009 - Cited
by 9 - Related
articles



  

  
  

  





  
A
survey of Flash Translation Layer - Elsevier
[9] Jesung Kim, Jong Min Kim, Sam H. Noh, Sang Lyul
Min and Yookun Cho, A space-efficient flash
translation layer for compactflash systems, ...
linkinghub.elsevier.com/retrieve/pii/S1383762109000356 - Similar
by TS Chung - 2009 -
Related
articles

  
Configurable
Flash-Memory Management: Performance versus Overheads
[14] J. Kim, J.M. Kim, S.H. Noh, S.L. Min, and Y.
Cho, “A Space-Efficient Flash Translation Layer for CompactFlash
Systems,” IEEE Trans. ...
www2.computer.org/portal/web/csdl/doi/10.../TC.2008.61 - Cached
- Similar

  
Flash memory
systems utilizing direct data file storage - Patent ...
Kim et al., "A Space-Efficient Flash Translation
Layer for CompactFlash Systems," IEEE Transactions on
Consumer Electronics, vol. 48, No. 2, May 2002, pp. ...
www.patentgenius.com/patent/7590795.html - Cached
- Similar
  





[linuxkernelnewbies] A new suspend/hibernate infrastructure [LWN.net]

2009-09-24 Thread peter teoh





http://lwn.net/Articles/274008/


A new suspend/hibernate infrastructure


 By Jonathan Corbet
March 19, 2008 
While attending conferences, your editor has, for some years, made a
point
of seeing just how many other attendees have some sort of suspend and
resume functionality working on their laptops. There is, after all,
obvious value in being able to sit down in a lecture hall, open the
lid,
and immediately start heckling the speaker via IRC without having to
wait
for the entire bootstrap sequence to unfold. But, regardless of whether
one is talking about suspend-to-RAM ("suspend") or suspend-to-disk
("hibernation"), there are surprisingly few people using this
capability.
Despite the efforts which have been made by developers and
distributors,
suspend and hibernate still just do not work reliably for a lot of
people.
For your editor, suspend always works, but the success rate
of the
resume operation is about 95% - just enough to keep using it while
inspiring a fair amount of profanity in inopportune places.

Various approaches to fixing suspend and hibernation have been
proposed;
these include TuxOnIce and kexec jump. Another
possibility, though, is to simply fix the code which is in the kernel
now.
There is a lot that has to be done to make that goal a reality,
including
making the whole process more robust and separating the suspend and
hibernation cases which, as Linus has stated rather strongly several
times,
are really two different problems. To that end, Rafael Wysocki has
posted
a new suspend and hibernation
infrastructure for devices which has the potential to improve the
situation - but at a cost of creating no less than 20 separate device
callbacks.

For the (relatively) simple suspend case, there are four basic
callbacks
which should be provided in the new pm_ops structure by each
bus
and, eventually, by every device:


int (*prepare)(struct device *dev);
int (*suspend)(struct device *dev);

int (*resume)(struct device *dev);
void (*complete)(struct device *dev);


When the system is suspending, each device will first see a call to its
prepare() callback. This call can be seen as a sort of warning
that the suspend is coming, and that any necessary preparation work
should
be done. This work includes preventing the addition of any new child
devices and anything which might require the involvement of user space.
Any significant memory allocations should also be done at this time;
the
system is still functional at this point and, if necessary, I/O can be
performed to make memory available. What should not happen in
prepare() is actually putting the device into a low-power
state;
it needs to remain functional and available.

As usual, a return value of zero indicates that the preparation was
successful, while a negative error code indicates failure. In cases
where
the failure is temporary (a race with the addition of a new child
device is
one possibility), the callback should return -EAGAIN, which
will
cause a repeat attempt later in the process.

At a later point, suspend() will be called to actually
power down
the device. With the current patch, each device will see a
prepare() call quickly followed by suspend(). Future
versions are likely to change things so that all devices get a
prepare() call before any of them are suspended; that way,
even
the last prepare() callback can count on the availability of
a
fully-functioning system.

The resume process calls resume() to wake the device up,
restore
it to its previous state, and generally make it ready to operate. Once
the
resume process is done, complete() is called to clean up
anything
left over from prepare(). A call to complete()
could
also be made directly after prepare() (without an intervening
suspend) if the suspend process fails somewhere else in the system.
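
To make the sequence concrete, here is a sketch of how a driver might fill
in the four suspend-side callbacks described above. The callback names and
signatures follow the listing quoted earlier; the foo_* helpers and the
way the structure gets hooked up to a bus or driver are invented purely
for illustration:

/* Hypothetical driver sketch for the proposed suspend callbacks. */
static int foo_prepare(struct device *dev)
{
	/* Refuse new children and allocate anything needed later; the
	 * device itself must stay fully functional at this point. */
	if (foo_block_new_children(dev))
		return -EAGAIN;         /* temporary failure, core may retry */
	return 0;
}

static int foo_suspend(struct device *dev)
{
	foo_save_state(dev);            /* stash registers, stop queues */
	foo_power_down(dev);            /* now actually enter low power */
	return 0;
}

static int foo_resume(struct device *dev)
{
	foo_power_up(dev);
	foo_restore_state(dev);
	return 0;
}

static void foo_complete(struct device *dev)
{
	/* Undo prepare(), whether or not suspend() was ever called. */
	foo_allow_new_children(dev);
}

static struct pm_ops foo_pm_ops = {
	.prepare  = foo_prepare,
	.suspend  = foo_suspend,
	.resume   = foo_resume,
	.complete = foo_complete,
};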

The hibernation process is more complicated, in that there are more
intermediate states. In this case, too, the process begins with a call
to
prepare(). Then calls are made to:


int (*freeze)(struct device *dev);
int (*poweroff)(struct device *dev);


The freeze() callback happens before the hibernation image
(the
system image which is written to persistent store) is created; it
should
put the device into a quiescent state but leave it operational. Then,
after the hibernation image has been saved and another call to
prepare() made, poweroff() is called
to shut things down.

When the system is powered back up, the process is reversed through
calls
to:


int (*quiesce)(struct device *dev);
int (*restore)(struct device *dev);


The call to quiesce() will happen early in the resume
process, after the hibernation image has been loaded from disk, but
before it has
been used to recreate the pre-hibernation system's memory. This
callback
should quiet the device so that memory can be reassembled without being
corrupted by device operations. A call to complete() will
follow,
then a call to restore(), which should put the device back
into a
fully-functional 

[linuxkernelnewbies] Publications - RTwiki

2009-09-25 Thread Peter Teoh





http://rt.wiki.kernel.org/index.php/Publications

Publications
From RTwiki


  

  
  
FIXME: The table needs some of the fields filled in, but more importantly, we need a lot more links to Real-Time related articles.
  

  


  

   Title 
   Author(s) 
   Date
  


   Approaches to realtime Linux 
   Jonathan Corbet 
   2004-10-12
  


   Realtime preemption, part 2 
   Jonathan Corbet 
   2004-10-20
  


   A realtime preemption overview 
   Paul McKenney 
   2005-08-10
  


   Read-copy-update for realtime 
   Jonathan Corbet 
   2006-09-26
  


High Resolution Timer
Design Notes 
   Thomas Gleixner 
   -
  


   Next
Generation Hard Realtime on POSIX based Linux 
Robert Schwebel 
   2006-06
  


   The State of RT and Common Mistakes 
   slides of BOF at OLS 2006
by Steve Rostedt and Klaas van Gend 
   2006-07-21
  


   hrtimers - and beyond 
   slides to the talk at OLS 2006
by Thomas Gleixner and Douglas Niehaus 
   2006-07-20
  


   OLS 2006 proceedings 
including paper "hrtimers - and beyond" 
   paper of the talk at OLS 2006
by Thomas Gleixner and Douglas Niehaus 
   2006-07-20
  


   SMP and Embedded Real Time 
   Paul E. McKenney 
   Linux Journal
January 2007
  


   Real
time in embedded Linux systems 
   Michael Opdenacker 
   January 2007
  


   Native mainline Linux: fit for embedded and real-time
systems, boards & solutions (PDF) 
   Carsten Emde and Thomas Gleixner 
   March 2007
  


   Real-time Java, Part 1 - 6 
   Mark Stoodley, Mike Fulton, Michael Dawson, Ryan
Sciampacone, John Kacur 
   April - July 2007
  


   Proceedings
of the 2007 Linux Symposium  
   Various Authors [includes "Internals of the RT Patch" by
Steven Rostedt ] 
   June 2007
  


   The design of preemptible RCU  
   Paul E. McKenney 
   October 2007
  


   The read-copy-update mechanism for supporting real-time
applications on shared-memory multiprocessor systems with Linux  
   Dinakar Guniguntala, Paul E. McKenney, Josh Triplett, and
Jonathan Walpole 
   May 2008
  


   Real-time Linux Software Quick Reference Guide 
   - 
   -
  


   Realtime Linux foundation 
   Real Time Linux Foundation, Inc. 
   -
  


   RTLinux-GPL
   - 
   -
  


   RTAI-Real Time Application
Interface 
   Politecnico di Milano - Dipartimento di Ingegneria
Aerospaziale 
   -
  


   OCERA 
   OCERA Project 
   -
  


   OSADL Project: RT-Preempt patches 
   Open Source Automation
Development Lab 
   Updated continuously
  


   ftrace tutorial  
   Stephen Rostedt 
   OLS 2008 
  
  


   Real Time vs. Real Fast  Presentation  
   Paul E. McKenney 
   OLS 2008 
  
  

  









[linuxkernelnewbies] [3/3] Support piping into commands in /proc/sys/kernel/core_pattern [LWN.net]

2009-09-25 Thread Peter Teoh





http://lwn.net/Articles/195310/

Signed-off-by: Andi Kleen 
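
The patch below makes do_coredump() treat a core_pattern beginning with
'|' as a command to pipe the dump into (via call_usermodehelper_pipe()).
For reference, a user-space helper on the receiving end of such a pipe
only has to read the core image from stdin; a minimal hypothetical example
(the program name and output path are made up):

/* corecatcher.c - hypothetical helper for core_pattern = "|/sbin/corecatcher" */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
	char buf[4096];
	ssize_t n;
	int fd = open("/var/tmp/core.saved", O_WRONLY | O_CREAT | O_TRUNC, 0600);

	if (fd < 0)
		return 1;
	while ((n = read(STDIN_FILENO, buf, sizeof(buf))) > 0)
		if (write(fd, buf, n) != n)
			return 1;
	close(fd);
	return 0;
}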

Index: linux/fs/exec.c
===
--- linux.orig/fs/exec.c
+++ linux/fs/exec.c
@@ -58,7 +58,7 @@
 #endif
 
 int core_uses_pid;
-char core_pattern[65] = "core";
+char core_pattern[128] = "core";
 int suid_dumpable = 0;
 
 EXPORT_SYMBOL(suid_dumpable);
@@ -1475,6 +1475,7 @@ int do_coredump(long signr, int exit_cod
 	int retval = 0;
 	int fsuid = current->fsuid;
 	int flag = 0;
+	int ispipe = 0;
 
 	binfmt = current->binfmt;
 	if (!binfmt || !binfmt->core_dump)
@@ -1516,22 +1517,34 @@ int do_coredump(long signr, int exit_cod
  	lock_kernel();
 	format_corename(corename, core_pattern, signr);
 	unlock_kernel();
-	file = filp_open(corename, O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE | flag, 0600);
+ 	if (corename[0] == '|') {
+		/* SIGPIPE can happen, but it's just never processed */
+ 		if(call_usermodehelper_pipe(corename+1, NULL, NULL, &file)) {
+ 			printk(KERN_INFO "Core dump to %s pipe failed\n",
+			   corename);
+ 			goto fail_unlock;
+ 		}
+		ispipe = 1;
+ 	} else
+ 		file = filp_open(corename,
+ O_CREAT | 2 | O_NOFOLLOW | O_LARGEFILE, 0600);
 	if (IS_ERR(file))
 		goto fail_unlock;
 	inode = file->f_dentry->d_inode;
 	if (inode->i_nlink > 1)
 		goto close_fail;	/* multiple links - don't dump */
-	if (d_unhashed(file->f_dentry))
+	if (!ispipe && d_unhashed(file->f_dentry))
 		goto close_fail;
 
-	if (!S_ISREG(inode->i_mode))
+	/* AK: actually i see no reason to not allow this for named pipes etc.,
+	   but keep the previous behaviour for now. */
+	if (!ispipe && !S_ISREG(inode->i_mode))
 		goto close_fail;
 	if (!file->f_op)
 		goto close_fail;
 	if (!file->f_op->write)
 		goto close_fail;
-	if (do_truncate(file->f_dentry, 0, 0, file) != 0)
+	if (!ispipe && do_truncate(file->f_dentry, 0, 0, file) != 0)
 		goto close_fail;
 
 	retval = binfmt->core_dump(signr, regs, file);
Index: linux/fs/binfmt_elf.c
===
--- linux.orig/fs/binfmt_elf.c
+++ linux/fs/binfmt_elf.c
@@ -1153,11 +1153,23 @@ static int dump_write(struct file *file,
 
 static int dump_seek(struct file *file, loff_t off)
 {
-	if (file->f_op->llseek) {
-		if (file->f_op->llseek(file, off, 0) != off)
+	if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
+		if (file->f_op->llseek(file, off, 1) != off)
 			return 0;
-	} else
-		file->f_pos = off;
+	} else {
+		char *buf = (char *)get_zeroed_page(GFP_KERNEL);
+		if (!buf)
+			return 0;
+		while (off > 0) {
+			unsigned long n = off;
+			if (n > PAGE_SIZE)
+n = PAGE_SIZE;
+			if (!dump_write(file, buf, n))
+return 0;
+			off -= n;
+		}
+		free_page((unsigned long)buf);
+	}
 	return 1;
 }
 
@@ -1205,30 +1217,32 @@ static int notesize(struct memelfnote *e
 	return sz;
 }
 
-#define DUMP_WRITE(addr, nr)	\
-	do { if (!dump_write(file, (addr), (nr))) return 0; } while(0)
-#define DUMP_SEEK(off)	\
-	do { if (!dump_seek(file, (off))) return 0; } while(0)
+#define DUMP_WRITE(addr, nr, foffset)	\
+	do { if (!dump_write(file, (addr), (nr))) return 0; *foffset += (nr); } while(0)
 
-static int writenote(struct memelfnote *men, struct file *file)
+static int alignfile(struct file *file, unsigned long *foffset)
 {
-	struct elf_note en;
+	char buf[4] = { 0, };
+	DUMP_WRITE(buf, roundup(*foffset, 4) - *foffset, foffset);
+	return 1;
+}
 
+static int writenote(struct memelfnote *men, struct file *file, unsigned long *foffset)
+{
+	struct elf_note en;
 	en.n_namesz = strlen(men->name) + 1;
 	en.n_descsz = men->datasz;
 	en.n_type = men->type;
 
-	DUMP_WRITE(&en, sizeof(en));
-	DUMP_WRITE(men->name, en.n_namesz);
-	/* XXX - cast from long long to long to avoid need for libgcc.a */
-	DUMP_SEEK(roundup((unsigned long)file->f_pos, 4));	/* XXX */
-	DUMP_WRITE(men->data, men->datasz);
-	DUMP_SEEK(roundup((unsigned long)file->f_pos, 4));	/* XXX */
+	DUMP_WRITE(&en, sizeof(en), foffset);
+	DUMP_WRITE(men->name, en.n_namesz, foffset);
+	if (!alignfile(file, foffset)) return 0;
+	DUMP_WRITE(men->data, men->datasz, foffset);
+	if (!alignfile(file, foffset)) return 0;
 
 	return 1;
 }
 #undef DUMP_WRITE
-#undef DUMP_SEEK
 
 #define DUMP_WRITE(addr, nr)	\
 	if ((size += (nr)) > limit || !dump_write(file, (addr), (nr))) \
@@ -1428,7 +1442,7 @@ static int elf_core_dump(long signr, str
 	int i;
 	struct vm_area_struct *vma;
 	struct elfhdr *elf = NULL;
-	off_t offset = 0, dataoff;
+	off_t offset = 0, dataoff, foffset;
 	unsigned long limit = current->signal->rlim[RLIMIT_CORE].rlim_cur;
 	int numnote;
 	struct memelfnote *notes = NULL;
@@ -1572,7 +1586,8 @@ static int elf_core_dump(long signr, str
 		DUMP_WRITE(&phdr, sizeof(phdr));
 	}
 
-	/* Page-align dumped data */
+	foffset = offset;
+
 	dataoff = offset = roundup(offset, ELF_EXEC_PAGESIZE);
 
 	/* Write program headers for segments dump */
@@ -1597,6 +1612,7 @@ static int elf_core_dump(long signr, str
 		phdr.p_align = ELF_EXEC_PAGESIZE;
 
 		DUMP_WRITE(&phdr, sizeof(phdr));

[linuxkernelnewbies] lockdep validation and how it works

2009-09-26 Thread Peter Teoh






Runtime locking correctness validator
=

started by Ingo Molnar 
additions by Arjan van de Ven 

Lock-class
--

The basic object the validator operates upon is a 'class' of locks.

A class of locks is a group of locks that are logically the same with
respect to locking rules, even if the locks may have multiple (possibly
tens of thousands of) instantiations. For example a lock in the inode
struct is one class, while each inode has its own instantiation of that
lock class.

The validator tracks the 'state' of lock-classes, and it tracks
dependencies between different lock-classes. The validator maintains a
rolling proof that the state and the dependencies are correct.

Unlike a lock instantiation, the lock-class itself never goes away: when
when
a lock-class is used for the first time after bootup it gets registered,
and all subsequent uses of that lock-class will be attached to this
lock-class.

State
-

The validator tracks lock-class usage history into 4n + 1 separate
state bits:

- 'ever held in STATE context'
- 'ever held as readlock in STATE context'
- 'ever held with STATE enabled'
- 'ever held as readlock with STATE enabled'

Where STATE can be either one of (kernel/lockdep_states.h)
 - hardirq
 - softirq
 - reclaim_fs

- 'ever used'  [ == !unused ]

When locking rules are violated, these state bits are presented in the
locking error messages, inside curlies. A contrived example:

   modprobe/2287 is trying to acquire lock:
    (&sio_locks[i].lock){-.-...}, at: []
mutex_lock+0x21/0x24

   but task is already holding lock:
    (&sio_locks[i].lock){-.-...}, at: []
mutex_lock+0x21/0x24


The bit position indicates STATE, STATE-read, for each of the states
listed
above, and the character displayed in each indicates:

   '.'  acquired while irqs disabled and not in irq context
   '-'  acquired in irq context
   '+'  acquired with irqs enabled
   '?'  acquired in irq context with irqs enabled.

Unused mutexes cannot be part of the cause of an error.


Single-lock state rules:


A softirq-unsafe lock-class is automatically hardirq-unsafe as well. The
following states are exclusive, and only one of them is allowed to be
set for any lock-class:

  <hardirq-safe> and <hardirq-unsafe>
  <softirq-safe> and <softirq-unsafe>

The validator detects and reports lock usage that violate these
single-lock state rules.

Multi-lock dependency rules:


The same lock-class must not be acquired twice, because this could lead
to lock recursion deadlocks.

Furthermore, two locks may not be taken in different order:

  <L1> -> <L2>
  <L2> -> <L1>

because this could lead to lock inversion deadlocks. (The validator
finds such dependencies in arbitrary complexity, i.e. there can be any
other locking sequence between the acquire-lock operations, the
validator will still track all dependencies between locks.)

Furthermore, the following usage based lock dependencies are not allowed
between any two lock-classes:

  <hardirq-safe>  ->  <hardirq-unsafe>
  <softirq-safe>  ->  <softirq-unsafe>

The first rule comes from the fact that a hardirq-safe lock could be
taken by a hardirq context, interrupting a hardirq-unsafe lock - and
thus could result in a lock inversion deadlock. Likewise, a softirq-safe
lock could be taken by a softirq context, interrupting a softirq-unsafe
lock.

The above rules are enforced for any locking sequence that occurs in the
kernel: when acquiring a new lock, the validator checks whether there is
any rule violation between the new lock and any of the held locks.

When a lock-class changes its state, the following aspects of the above
dependency rules are enforced:

- if a new hardirq-safe lock is discovered, we check whether it
  took any hardirq-unsafe lock in the past.

- if a new softirq-safe lock is discovered, we check whether it took
  any softirq-unsafe lock in the past.

- if a new hardirq-unsafe lock is discovered, we check whether any
  hardirq-safe lock took it in the past.

- if a new softirq-unsafe lock is discovered, we check whether any
  softirq-safe lock took it in the past.

(Again, we do these checks too on the basis that an interrupt context
could interrupt _any_ of the irq-unsafe or hardirq-unsafe locks, which
could lead to a lock inversion deadlock - even if that lock scenario did
not trigger in practice yet.)

Exception: Nested data dependencies leading to nested locking
--------------------------------------------------------------

There are a few cases where the Linux kernel acquires more than one
instance of the same lock-class. Such cases typically happen when there
is some sort of hierarchy within objects of the same type. In these
cases there is an inherent "natural" ordering between the two objects
(defined by the properties of the hierarchy), and the kernel grabs the
locks in this fixed order on each of the objects.
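
A rough sketch of how such a fixed-order acquisition of two instances of
the same class is annotated so the validator accepts it, assuming the
mutex_lock_nested()/SINGLE_DEPTH_NESTING annotation (the struct below is
invented for illustration):

#include <linux/mutex.h>

struct node {
        struct mutex lock;
        struct node *parent;
};

static void lock_node_and_parent(struct node *child)
{
        mutex_lock(&child->parent->lock);       /* outer object in the hierarchy first */
        mutex_lock_nested(&child->lock, SINGLE_DEPTH_NESTING);
                                                /* second instance of the same class,
                                                 * marked as intentional nesting */
}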

An example of such an object hierarchy that results in "nested locking"
is that of a "whole disk" block-dev object and a "partition" block-dev
object; the

[linuxkernelnewbies] ftrace stacktrace analysis of mmap() calls

2009-09-26 Thread Peter Teoh





#!/bin/bash
set -x

echo 0 > /debug/tracing/tracing_enabled              # stop tracing while we set up
cat mmap.list1 > /debug/tracing/set_ftrace_filter    # trace only the mmap*-related functions listed below
echo function > /debug/tracing/current_tracer        # use the function tracer
echo func_stack_trace > /debug/tracing/trace_options # record a stack trace for each traced call
echo 1 > /debug/tracing/tracing_enabled              # start tracing
/root/a.out 1000                                     # run the mmap() test program (see below)
cat /debug/tracing/stack_trace                       # dump the stack tracer's deepest-stack record
echo 0 > /proc/sys/kernel/stack_tracer_enabled       # turn the stack tracer off
echo 0 > /debug/tracing/tracing_enabled              # stop tracing
cat /debug/tracing/trace | tee ftrace/${0}$$.log     # save the trace output


where mmap.list1 contains a few of the mmap*-related kernel functions (just an
example):

mmap
mmap_init
mmap_kmem
mmap_mem
mmap_mem_ops
mmap_min_addr
mmap_min_addr_handler
mmap_region
mmap_rnd
mmap_zero
mon_bin_mmap
node_memmap_size_bytes


Where a.out 1000 is the following program:

 #include <sys/mman.h>
 #include <stdio.h>
 #include <stdlib.h>
 #include <string.h>
 #include <fcntl.h>
 #include <unistd.h>

 #define handle_error(msg) \
 do { perror(msg); exit(EXIT_FAILURE); } while (0)

 int
 main(int argc, char *argv[])
 {
    int length1 = atoi(argv[1]);
    unsigned char *addr;

    if ((addr = mmap(NULL, (size_t) length1 * 1024 * 1024, PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0)) == MAP_FAILED) {
       perror("mmap()");
       abort();
    } else {
       printf("good one\n");
    }
    memset(addr, 'c', length1 * 1024 * 1024);
    return 0;
 } /* main */


And the results:

 gnome-panel-2909  [000]  2869.687263:
security_file_mmap <-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.687264: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687267: cap_file_mmap
<-security_file_mmap
 gnome-panel-2909  [000]  2869.687267: 
 => security_file_mmap
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687271: mmap_region
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.687271: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687276: generic_file_mmap
<-mmap_region
 gnome-panel-2909  [000]  2869.687278: 
 => mmap_region
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687284: __perf_event_mmap
<-mmap_region
 gnome-panel-2909  [000]  2869.687284: 
 => mmap_region
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687297: sys_mmap
<-system_call_fastpath
 gnome-panel-2909  [000]  2869.687298: 
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687301: do_mmap_pgoff <-sys_mmap
 gnome-panel-2909  [000]  2869.687301: 
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687304: security_file_mmap
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.687305: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687308: cap_file_mmap
<-security_file_mmap
 gnome-panel-2909  [000]  2869.687308: 
 => security_file_mmap
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687312: mmap_region
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.687312: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.687323: generic_file_mmap
<-mmap_region
 gnome-panel-2909  [000]  2869.687323: 
 => mmap_region
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695637: sys_mmap
<-system_call_fastpath
 gnome-panel-2909  [000]  2869.695638: 
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695642: do_mmap_pgoff <-sys_mmap
 gnome-panel-2909  [000]  2869.695643: 
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695646: security_file_mmap
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.695647: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695650: cap_file_mmap
<-security_file_mmap
 gnome-panel-2909  [000]  2869.695650: 
 => security_file_mmap
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695654: mmap_region
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.695654: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695762: sys_mmap
<-system_call_fastpath
 gnome-panel-2909  [000]  2869.695762: 
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695766: do_mmap_pgoff <-sys_mmap
 gnome-panel-2909  [000]  2869.695766: 
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695772: security_file_mmap
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.695773: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695776: cap_file_mmap
<-security_file_mmap
 gnome-panel-2909  [000]  2869.695776: 
 => security_file_mmap
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695780: mmap_region
<-do_mmap_pgoff
 gnome-panel-2909  [000]  2869.695780: 
 => do_mmap_pgoff
 => sys_mmap
 => system_call_fastpath
 gnome-panel-2909  [000]  2869.695785: generic_file_mmap
<-mmap_r

[linuxkernelnewbies] LKML: Steven Rostedt: [PATCH 2/2] ftrace: add stack trace to function tracer

2009-09-27 Thread Peter Teoh





http://lkml.org/lkml/2009/1/15/725


  

  
  

  

  


Date: Thu, 15 Jan 2009 19:40:52 -0500
From: Steven Rostedt <>
Subject: [PATCH 2/2] ftrace: add stack trace to function tracer

From: Steven Rostedt 

Impact: new feature to stack trace any function

Chris Mason asked about being able to pick and choose a function
and get a stack trace from it. This feature enables his request.

 # echo io_schedule > /debug/tracing/set_ftrace_filter
 # echo function > /debug/tracing/current_tracer
 # echo func_stack_trace > /debug/tracing/trace_options
Produces the following in /debug/tracing/trace:

   kjournald-702   [001]   135.673060: io_schedule <-sync_buffer
   kjournald-702   [002]   135.673671:
 <= sync_buffer
 <= __wait_on_bit
 <= out_of_line_wait_on_bit
 <= __wait_on_buffer
 <= sync_dirty_buffer
 <= journal_commit_transaction
 <= kjournald

Note, be careful about turning this on without filtering the functions.
You may find that you have a 10 second lag between typing and seeing
what you typed. This is why the stack trace for the function tracer
does not use the same stack_trace flag as the other tracers use.

Signed-off-by: Steven Rostedt 
---
 kernel/trace/trace.c   |   26 
 kernel/trace/trace.h   |7 +++
 kernel/trace/trace_functions.c |   84 
 3 files changed, 108 insertions(+), 9 deletions(-)
diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index dcb757f..3c54cb1 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -835,10 +835,10 @@ ftrace(struct trace_array *tr, struct trace_array_cpu *data,
 		trace_function(tr, data, ip, parent_ip, flags, pc);
 }
 
-static void ftrace_trace_stack(struct trace_array *tr,
-			   struct trace_array_cpu *data,
-			   unsigned long flags,
-			   int skip, int pc)
+static void __ftrace_trace_stack(struct trace_array *tr,
+ struct trace_array_cpu *data,
+ unsigned long flags,
+ int skip, int pc)
 {
 #ifdef CONFIG_STACKTRACE
 	struct ring_buffer_event *event;
@@ -846,9 +846,6 @@ static void ftrace_trace_stack(struct trace_array *tr,
 	struct stack_trace trace;
 	unsigned long irq_flags;
 
-	if (!(trace_flags & TRACE_ITER_STACKTRACE))
-		return;
-
 	event = ring_buffer_lock_reserve(tr->buffer, sizeof(*entry),
 	 &irq_flags);
 	if (!event)
@@ -869,12 +866,23 @@ static void ftrace_trace_stack(struct trace_array *tr,
 #endif
 }
 
+static void ftrace_trace_stack(struct trace_array *tr,
+			   struct trace_array_cpu *data,
+			   unsigned long flags,
+			   int skip, int pc)
+{
+	if (!(trace_flags & TRACE_ITER_STACKTRACE))
+		return;
+
+	__ftrace_trace_stack(tr, data, flags, skip, pc);
+}
+
 void __trace_stack(struct trace_array *tr,
 		   struct trace_array_cpu *data,
 		   unsigned long flags,
-		   int skip)
+		   int skip, int pc)
 {
-	ftrace_trace_stack(tr, data, flags, skip, preempt_count());
+	__ftrace_trace_stack(tr, data, flags, skip, pc);
 }
 
 static void ftrace_trace_userstack(struct trace_array *tr,
diff --git a/kernel/trace/trace.h b/kernel/trace/trace.h
index 79c8721..bf39a36 100644
--- a/kernel/trace/trace.h
+++ b/kernel/trace/trace.h
@@ -457,6 +457,11 @@ void update_max_tr(struct trace_array *tr, struct task_struct *tsk, int cpu);
 void update_max_tr_single(struct trace_array *tr,
 			  struct task_struct *tsk, int cpu);
 
+void __trace_stack(struct trace_array *tr,
+		   struct trace_array_cpu *data,
+		   unsigned long flags,
+		   int skip, int pc);
+
 extern cycle_t ftrace_now(int cpu);
 
 #ifdef CONFIG_FUNCTION_TRACER
@@ -467,6 +472,8 @@ void tracing_stop_function_trace(void);
 # define tracing_stop_function_trace()		do { } while (0)
 #endif
 
+extern int ftrace_function_enabled;
+
 #ifdef CONFIG_CONTEXT_SWITCH_TRACER
 typedef void
 (*tracer_switch_func_t)(void *private,
diff --git a/kernel/trace/trace_functions.c b/kernel/trace/trace_functions.c
index 9236d7e..3a5fa08 100644
--- a/kernel/trace/trace_functions.c
+++ b/kernel/trace/trace_functions.c
@@ -16,6 +16,8 @@
 
 #include "trace.h"
 
+static struct trace_array	*func_trace;
+
 static void start_function_trace(struct trace_array *tr)
 {
 	tr->cpu = get_cpu();
@@ -34,6 +36,7 @@ static void stop_function_trace(struct trace_array *tr)
 
 static int function_trace_init(struct trace_array *tr)
 {
+	func_trace = tr;
 	start_function_trace(tr);
 	return 0;
 }
@@ -48,12 +51,93 @@ static void function_trace_start(struct trace_array *tr)
 	tracing_reset_online_cpus(tr);
 }
 
+static void
+function_stack_trace_call(unsigned long ip, unsigned long parent_ip)
+{
+	struct trace_array *tr = func_trace;
+	struct trace_array_cpu *data;
+	unsigned long flags;
+	long disabled;
+	int cpu;
+	int pc;
+
+	if (unlikely(!ftrace_function_enabled))
+		return;
+
+	/*
+	 * Need to use raw, since this must be called before the
+	 * recursive protection is performed.
+	 */
+	

[linuxkernelnewbies] cache line ping-p...@everything2.com

2009-09-27 Thread peter teoh





http://everything2.com/index.pl?node_id=1382347

cache line ping-pong



  

(idea) by cebix, Sat Nov 02 2002 at 13:45:57



One way of maintaining cache coherence in multiprocessing designs with
CPUs that have local caches is to ensure that single cache lines are
never held by more than one CPU at a time. With write-through caches,
this is easily implemented by having the CPUs invalidate cache lines on
snoop hits.

However, if multiple CPUs are working on the same set of data from
main memory, this can lead to the following scenario:


  1. CPU #1 reads a cache line from memory.
  2. CPU #2 reads the same line; CPU #1 snoops the access and
     invalidates its local copy.
  3. CPU #1 needs the data again and has to re-read the entire
     cache line, invalidating the copy in CPU #2 in the process.
  4. CPU #2 now also re-reads the entire line, invalidating the copy
     in CPU #1.
  5. Lather, rinse, repeat.


The result is a dramatic performance loss because the CPUs keep fetching
the same data over and over again from slow main memory.

Possible solutions include:


  - Use a smarter cache coherence protocol, such as MESI.
  - Mark the address space in question as cache-inhibited. Most CPUs
    will then resort to single-word accesses which should be faster than
    reloading entire cache lines (usually 32 or 64 bytes).
  - If the data set is small, make one copy in memory for each CPU.
  - If the data set is large and processed sequentially, have each CPU
    work on a different part of it (one starting at the beginning, one at
    the middle, etc.).
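
A small, hypothetical C program (not from the article) that provokes the
ping-pong described above through false sharing, and shows the effect of
the padding fix implied by the last two points: two threads hammer
counters that either share one cache line or sit on separate lines. The
64-byte line size, the iteration count and all names are assumptions;
time the two runs on a multi-core machine to see the difference.

/* build with something like: gcc -O2 -pthread ping_pong_demo.c */
#include <pthread.h>
#include <stdio.h>

#define ITERATIONS 100000000UL
#define CACHE_LINE 64                   /* assumed cache line size in bytes */

struct shared_pair {                    /* both counters in one cache line */
        volatile unsigned long a;
        volatile unsigned long b;
} shared_counters;

struct padded_pair {                    /* counters pushed onto separate lines */
        volatile unsigned long a;
        char pad[CACHE_LINE - sizeof(unsigned long)];
        volatile unsigned long b;
} padded_counters;

static void *bump(void *p)              /* increment one counter many times */
{
        volatile unsigned long *c = p;
        unsigned long i;

        for (i = 0; i < ITERATIONS; i++)
                (*c)++;
        return NULL;
}

static void run(const char *name, volatile unsigned long *x,
                volatile unsigned long *y)
{
        pthread_t t1, t2;

        pthread_create(&t1, NULL, bump, (void *)x);
        pthread_create(&t2, NULL, bump, (void *)y);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        printf("%s: done\n", name);
}

int main(void)
{
        /* time each run (e.g. with the "time" command): the shared-line
         * run bounces the cache line between CPUs on every write */
        run("shared line ", &shared_counters.a, &shared_counters.b);
        run("padded lines", &padded_counters.a, &padded_counters.b);
        return 0;
}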





[linuxkernelnewbies] CRiSP Weblog: December 2009 Archives (porting dtrace tool to Linux)

2009-09-27 Thread peter teoh




http://www.crisp.demon.co.uk/blog/archives/2009-06.html


rtdb framework

 I'm busy at the moment trying to get the rtld/rtdb functions
to work. It's a difficult decision - do I drag in more and more
Sun/Solaris code, so that there is a one-to-one mapping of
functions and intent, or do I stop here, and start
writing my own code.
The rtdb functions are interfaces to the runtime linker (ld.so.1),
and, although very nice, rely on intimate behavior of the Solaris
linker; the corresponding functions don't exist on Linux.
So, copying the code into dtrace means copying more and more
dependencies (avlist, linked list, msg locales and other stuff),
for little benefit.

dtrace uses these functions in a very specific way: get the symtab
of the target process we are tracing, along with the symtab for the
loaded shared libraries.

I am going to draw a line and see how much I can do without
dragging it in. (I dragged it in and have kicked it out again, as
I just spend more and more time porting Solaris to Linux, which
isn't the end goal).

The end goal is making the PID provider and user space stack
traces "as they should be".

This will likely take a while, so I will update periodically if
I feel what I have is no worse than before.


 
Posted by Paul Fox | Permalink

Sun Jun 28 12:07:05 BST 2009

dtrace progress - symtabs

 I have put out a new release which is better at handling
stacks for 32+64b platforms and whether they are compiled
with/without frame pointers. It's not perfect - the later your
kernel, the more trustworthy the stack will be, since in the
worst case, we have to examine the stack, word-by-word, to find
likely-looking return addresses (the same as the kernel does),
since GCC over-optimises frame pointers.
I am currently looking at this:

$ dtrace -n pidXXX::: -p XXX

I tried this on my MacOS system, and was intrigued by the fact
that for a sample Perl app, tens of thousands of new probes sprang into
life. It looks to me that you can DOS attack a kernel with these
privs, since if you do this on lots of processes, you can
eat the probe memory that dtrace will set aside, and either run out,
or affect performance of a system.
At the moment I am knee deep in more ELF/dynamic stuff, so that
we can get the symtab of a running process so that the PID provider
is more usable. 


 
Posted by Paul Fox | Permalink

Thu Jun 25 23:16:47 BST 2009

SDT probes - what?

 SDT - static probes are high-level probes in the kernel, in the
sense that they add value compared to FBT. FBT probes can go on
any function - you know the function got entered or returned.
But finding key data structures, such as the current "proc" or
"timer" or "packet", isn't easy without playing around
with stack arguments and type casts to a known type.
That's how I read SDT: SDT can provide a probe like
"received_packet" and provide an argument which represents the
packet so you can dissect it.

But, the question is - are they useful?!

I don't really understand the probes despite staring at the code
for a while. I understand lots of the technicalities, but not
the rationale. Is my first paragraph spot on?
Feel free to send me feedback about why they are a *must*.

Why?

Well, many of the probes in Solaris relate to Solaris internals.
The concepts of scheduling on Solaris don't match the Linux kernel.
Solaris has a process and an lwp (lightweight kernel thread). In Linux,
all threads are really processes.

So, if you have a D script written for Solaris, it won't work on Linux,
unless I provide as close an emulation as possible. I have found
that FBT is more than enough to keep me entertained, but I am
trying to find out if we need SDT.

There are a lot of values exposed in /proc such as statistic
counters.
And there is a lot of code in the kernel which increments those
counters.
But the counters on their own are not directly interesting (you can put
an FBT on the functions that manipulate those counters). So, maybe I am
missing something, like, with dtrace/linux today, you cannot easily
inspect processes, io, vm, packets, etc. 


 
Posted by Paul Fox | Permalink

Tue Jun 23 23:56:25 BST 2009

fixed the 32b problems?

 Just uploaded a new release -- which may fix the problem.
Found that if I disable the GPF interrupt hook, the
reliability problems disappear. I don't understand how/why - the
race conditions that could happen should be very small...
but it seems to work.
I will have to analyse this more to see why that hook (which
shouldn't fire, and we do put it back on rmmod) causes a problem.


 
Posted by Paul Fox | Permalink

Tue Jun 23 22:35:39 BST 2009

32b drat

 I have had a bug report that builds since 20090617 for 32b
kernels are failing to load. Strange, because it worked for me,
but I don't have every permutation of kernel and modules.
After trying a few experiments, it appears that reloading the
dtrace driver will panic/crash/reboot the 32b kernel. (After 3
times for my test machine, and in vmware, a re

[linuxkernelnewbies] Windows x64 Watch List

2009-09-27 Thread peter teoh




http://www.osnews.com/story/20330/Windows_x64_Watch_List

   
Windows x64 Watch List
posted by David Handlos on Thu 25th Sep 2008 18:07 UTC

A Windows developer and Sysadmin has compiled a "Watch List" of the small
but annoyingly important things to keep in mind when moving from 32 bit
Windows to Windows x64.

Introduction


Like many others in the IT world, I tend to wear a lot of hats in my
job. Often, I'm both an application developer and a system
administrator. I'll develop an application and then optimize the
operating system for it.
And again, like many others in IT, I like to use new technology when I
can, especially if it can save me time down the road. So, once I had
the opportunity to look into the 64-bit editions of Windows (also known
as "x64"), I jumped at the chance. 

In order to take advantage of the benefits of these x64-based Windows
environments, I've begun to look closely at the differences between
them and the traditional 32-bit Windows systems. And frankly, what I've
found so far has blown BOTH
of my hats clean off. While most of the differences between the two
look pretty subtle, they are significant. Whether you are trying to
develop 64-bit applications for the x64 world, or just trying to
migrate your existing 32-bit applications or scripts over, there are
several things that need to be taken into account before you make the
64-bit plunge. 

Background: 64-bit Windows...why should we care?


For those of you who are new to the terms "x64" or "64-bit", what it
boils down to is a way of utilizing more memory. Traditionally,
applications on 32-bit Windows systems can only address 2 gigabytes of
system memory. No matter if you have a system with 4GB of memory, 2GB
is all the memory any single application can use. 
With IIS, Microsoft's built-in web server, that limit is even lower.
At the most, an IIS process can utilize only 800MB
of memory on a given 32-bit system. Again, regardless of how much
memory is free on the system, 800MB is the most IIS can utilize on a
32-bit system, and remain stable.


64-bit operating systems can utilize a vastly larger amount
of memory, changing things drastically.
As a result, any application built to run on one of these x64
environments can address up to 8 terabytes. While you aren't likely to
find anything that has 8TB of memory nowadays, it's great to know
there's plenty of room to expand.

The Watch List


In order to ease my transition to the 64-bit world, I compiled my own
"Watch List" of the small, but annoyingly important things to keep in
mind when moving up to Windows x64.




  64-bit applications cannot access 32-bit libraries, or vice
versa 
Although you can run your old 32-bit applications on a
64-bit machine, they'll run on a separate layer called WOW64 (Windows
On Windows 64). Windows x64 is architected to keep 32 and 64-bit code
separate. If you did actually attempt to merge the two, application
crashes would be in your immediate future.

  
  
  
  There are now separate system file sections for both 32-bit
and 64-bit code
Windows x64's architecture keeps all 32-bit system files in
a directory named "C:\WINDOWS\SysWOW64", and 64-bit system files are
placed in the oddly-named "C:\WINDOWS\system32" directory. For most
applications, this doesn't matter, as Windows will re-direct all 32-bit
files to use "SysWOW64" automatically to avoid conflicts.
  However, anyone (like us system admins) who depends on VBScripts to
accomplish tasks may have to directly reference "SysWOW64" files if
needed, since re-direction doesn't apply as smoothly.

  
  
  There are now separate registry sections for both 32-bit and
64-bit code
With x64, the registry has separate sections as well.
The
"HKEY_LOCAL_MACHINE/SOFTWARE" key is used to contain registry entries
for 64-bit programs, and 32-bit programs are re-directed to use
"HKEY_LOCAL_MACHINE/SOFTWARE/Wow6432Node" instead.
  In most cases, this shouldn't be an issue...unless someone has
both 32
and 64-bit applications that depend on the same registry settings. If
that happens, at least one application may fail, since it is looking
for registry data that it can't find.
  

  
  
  The default ODBC Data Source Administrator is 64-bit only
A wide variety of scripts, programs, and web
applications
use ODBC settings set up by the ODBC Data Source Administrator (located
in "Control Panel -> Administrative Tools") to connect to a given
database.
  Oddly enough, if you set up an ODBC connection with this tool
on an x64
box, that connection will only work for 64-bit applications. If you
want to set up an ODBC connection for a 32-bit program, you have to set
up the connection using the identical-looking, but somewhat hidden,
32-bit ODBC manager, which is here: C:\WINDOWS\SysWOW64\odbcad32.exe.
  

  
  
  IIS can run either as a 64-bit or 32-bit process
I mentioned earlier that running on 64-bit moves your
memory limit with Micr

[linuxkernelnewbies] Re: cache line ping-p...@everything2.com

2009-09-27 Thread Peter Teoh

More info on the ping-pong effect, as well as L1 vs L2 cache:

http://morecores.cn/publication/pdf/Computing%20PI%20to%20Understand%20Key%20Issues%20in%20Multi.pdf



[linuxkernelnewbies] kernel initialization kernel_init() to initcalls execution

2009-09-28 Thread Peter Teoh






Inside init/main.c:kernel_init():

854 
855 static int __init kernel_init(void * unused)
856 {
857 lock_kernel();
858 
859 /*
860  * init can allocate pages on any node
861  */
862 set_mems_allowed(node_possible_map);
863 /*
864  * init can run on any cpu.
865  */
866 set_cpus_allowed_ptr(current, cpu_all_mask);
867 /*
868  * Tell the world that we're going to be the grim
869  * reaper of innocent orphaned children.
870  *
871  * We don't want people to have to make incorrect
872  * assumptions about where in the task array this
873  * can be found.
874  */
875 init_pid_ns.child_reaper = current;
876 
877 cad_pid = task_pid(current);
878 
879 smp_prepare_cpus(setup_max_cpus);
880 
881 do_pre_smp_initcalls();
882 start_boot_trace();
883 
884 smp_init();
885 sched_init_smp();
886 
887 do_basic_setup();
888 

901 
902 /*
903  * Ok, we have completed the initial bootup, and
904  * we're essentially up and running. Get rid of the
905  * initmem segments and start the user-mode stuff..
906  */
907 
908 init_post();
909 return 0;
910 }


here do_basic_setup() (above) is executed:


775 /*
776  * Ok, the machine is now initialized. None of the devices
777  * have been touched yet, but the CPU subsystem is up and
778  * running, and memory and process management works.
779  *
780  * Now we can finally start doing some real work..
781  */
782 static void __init do_basic_setup(void)
783 {
784 rcu_init_sched(); /* needed by module_init stage. */
785 init_workqueues();
786 cpuset_init_smp();
787 usermodehelper_init();
788 driver_init();
789 init_irq_proc();
790 do_ctors();
791 do_initcalls();
792 }
793 

From above, we can see do_initcalls() is executed:

764 static void __init do_initcalls(void)
765 {
766 initcall_t *call;
767 
768 for (call = __early_initcall_end; call < __initcall_end;
call++)
769 do_one_initcall(*call);
770 
771 /* Make sure there is no pending stuff from the initcall
sequence */
772 flush_scheduled_work();
773 }
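
For reference, a minimal, hypothetical example of how a function ends up
in the list that do_initcalls() walks - an initcall macro records a
pointer to it in the .initcall* sections at link time (this sketch is
not part of init/main.c):

/* Hypothetical built-in driver: device_initcall() makes demo_init()
 * one of the pointers between __early_initcall_end and __initcall_end,
 * so do_one_initcall() invokes it during boot.  For built-in code,
 * module_init() expands to device_initcall(); rootfs_initcall(), used
 * by populate_rootfs() below, is the same mechanism at an earlier
 * initcall level. */
#include <linux/init.h>
#include <linux/kernel.h>

static int __init demo_init(void)
{
        printk(KERN_INFO "demo: called from do_initcalls()\n");
        return 0;
}
device_initcall(demo_init);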

and among the initcalls to be made is populate_rootfs():

567 
568 static int __init populate_rootfs(void)
569 {
570 char *err = unpack_to_rootfs(__initramfs_start,
571  __initramfs_end - __initramfs_start);
572 if (err)
573 panic(err); /* Failed to decompress INTERNAL
initramfs */
574 if (initrd_start) {
575 #ifdef CONFIG_BLK_DEV_RAM
576 int fd;
577 printk(KERN_INFO "Trying to unpack rootfs image as
initramfs...\n");
578 err = unpack_to_rootfs((char *)initrd_start,
579 initrd_end - initrd_start);
580 if (!err) {
581 free_initrd();
582 return 0;
583 } else {
584 clean_rootfs();
585 unpack_to_rootfs(__initramfs_start,
586  __initramfs_end -
__initramfs_start);
587 }
588 printk(KERN_INFO "rootfs image is not initramfs
(%s)"
589 "; looks like an initrd\n", err);
590 fd = sys_open("/initrd.image", O_WRONLY|O_CREAT,
0700);
591 if (fd >= 0) {
592 sys_write(fd, (char *)initrd_start,
593 initrd_end - initrd_start);
594 sys_close(fd);
595 free_initrd();
596 }
597 #else
598 printk(KERN_INFO "Unpacking initramfs...\n");
599 err = unpack_to_rootfs((char *)initrd_start,
600 initrd_end - initrd_start);
601 if (err)
602 printk(KERN_EMERG "Initramfs unpacking
failed: %s\n", err);
603 free_initrd();
604 #endif
605 }
606 return 0;
607 }
608 rootfs_initcall(populate_rootfs);

Here the rootfs is set up; the initrd image is freed afterwards (unless
retain_initrd is specified).

And note that after do_basic_setup() comes init_post():

901 
902 /*
903  * Ok, we have completed the initial bootup, and
904  * we're essentially up and running. Get rid of the
905  * initmem segments and start the user-mode stuff..
906  */
907 
908 init_post();
909 return 0;
910 }

where the user-mode init program is executed:

836 /*
837  * We try each of these until one succeeds.
838  *
839  * The Bourne shell can be used instead of init if we are
840  * trying to recover a really broken machine.
841  */
842 if (execute_

[linuxkernelnewbies] kdump howto/internals

2009-09-28 Thread Peter Teoh







Documentation for Kdump - The kexec-based Crash Dumping Solution


This document includes overview, setup and installation, and analysis
information.

Overview
========

Kdump uses kexec to quickly boot to a dump-capture kernel whenever a
dump of the system kernel's memory needs to be taken (for example, when
the system panics). The system kernel's memory image is preserved across
the reboot and is accessible to the dump-capture kernel.

You can use common commands, such as cp and scp, to copy the
memory image to a dump file on the local disk, or across the network to
a remote system.

Kdump and kexec are currently supported on the x86, x86_64, ppc64 and
ia64
architectures.

When the system kernel boots, it reserves a small section of memory for
the dump-capture kernel. This ensures that ongoing Direct Memory Access
(DMA) from the system kernel does not corrupt the dump-capture kernel.
The kexec -p command loads the dump-capture kernel into this reserved
memory.

On x86 machines, the first 640 KB of physical memory is needed to boot,
regardless of where the kernel loads. Therefore, kexec backs up this
region just before rebooting into the dump-capture kernel.

Similarly, on PPC64 machines the first 32KB of physical memory is needed
for booting regardless of where the kernel is loaded, and to support a
64K page size kexec backs up the first 64KB of memory.

All of the necessary information about the system kernel's core image is
encoded in the ELF format, and stored in a reserved area of memory
before a crash. The physical address of the start of the ELF header is
passed to the dump-capture kernel through the elfcorehdr= boot
parameter.

With the dump-capture kernel, you can access the memory image, or "old
memory," in two ways:

- Through a /dev/oldmem device interface. A capture utility can read the
  device file and write out the memory in raw format. This is a raw dump
  of memory. Analysis and capture tools must be intelligent enough to
  determine where to look for the right information.

- Through /proc/vmcore. This exports the dump as an ELF-format file that
  you can write out using file copy commands such as cp or scp. Further,
  you can use analysis tools such as the GNU Debugger (GDB) and the
Crash
  tool to debug the dump file. This method ensures that the dump pages
are
  correctly ordered.
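
As a rough userspace sketch (not part of this document) of the second
method: the capture environment can copy the ELF image out of
/proc/vmcore with any file-copy tool, and the program below does the
same with plain read()/write() calls (the output path is just an
illustrative argument; "cp /proc/vmcore <file>" is equivalent).

#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(int argc, char *argv[])
{
        static char buf[1 << 16];
        ssize_t n;
        int in, out;

        if (argc != 2) {
                fprintf(stderr, "usage: %s <output-file>\n", argv[0]);
                return 1;
        }
        in = open("/proc/vmcore", O_RDONLY);
        out = open(argv[1], O_WRONLY | O_CREAT | O_TRUNC, 0600);
        if (in < 0 || out < 0) {
                perror("open");
                return 1;
        }
        while ((n = read(in, buf, sizeof(buf))) > 0) {
                if (write(out, buf, (size_t)n) != n) {  /* copy each chunk out */
                        perror("write");
                        return 1;
                }
        }
        close(in);
        close(out);
        return 0;
}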


Setup and Installation
======================

Install kexec-tools
-------------------

1) Login as the root user.

2) Download the kexec-tools user-space package from the following URL:

http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/kexec-tools.tar.gz

This is a symlink to the latest version.

The latest kexec-tools git tree is available at:

git://git.kernel.org/pub/scm/linux/kernel/git/horms/kexec-tools.git
or
http://www.kernel.org/git/?p=linux/kernel/git/horms/kexec-tools.git

More information about kexec-tools can be found at
http://www.kernel.org/pub/linux/kernel/people/horms/kexec-tools/README.html

3) Unpack the tarball with the tar command, as follows:

   tar xvpzf kexec-tools.tar.gz

4) Change to the kexec-tools directory, as follows:

   cd kexec-tools-VERSION

5) Configure the package, as follows:

   ./configure

6) Compile the package, as follows:

   make

7) Install the package, as follows:

   make install


Build the system and dump-capture kernels
-----------------------------------------
There are two possible methods of using Kdump.

1) Build a separate custom dump-capture kernel for capturing the
   kernel core dump.

2) Or use the system kernel binary itself as the dump-capture kernel,
   in which case there is no need to build a separate dump-capture
   kernel. This is possible only with the architectures which support a
   relocatable kernel. As of today, the i386, x86_64, ppc64 and ia64
   architectures support a relocatable kernel.

Building a relocatable kernel is advantageous from the point of view
that
one does not have to build a second kernel for capturing the dump. But
at the same time one might want to build a custom dump capture kernel
suitable to his needs.

Following are the configuration settings required for system and
dump-capture kernels to enable kdump support.

System kernel config options
----------------------------

1) Enable "kexec system call" in "Processor type and features."

   CONFIG_KEXEC=y

2) Enable "sysfs file system support" in "Filesystem" -> "Pseudo
   filesystems." This is usually enabled by default.

   CONFIG_SYSFS=y

   Note that "sysfs file system support" might not appear in the "Pseudo
   filesystems" menu if "Configure standard kernel features (for small
   systems)" is not enabled in "General Setup." In this case, check the
   .config file itself to ensure that sysfs is turned on, as follows:

   grep 'CONFIG_SYSFS' .config

3) Enable "Compile the kernel with debug info" in "Kernel hacking."

   CONFIG_DEBUG_INFO=Y


[linuxkernelnewbies] Linux Crash HOWTO

2009-09-28 Thread Peter Teoh





http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html



Linux Crash HOWTO
Norman Patten


nepat...@us.ibm.com


2002-01-30



  

  Revision History
  Revision 1.0  2002-01-30  Revised by: NM
  Initial release.

  




This document describes the installation and usage of the LKCD
(Linux Kernel Crash Dump) package. 





  Table of Contents
  1. Introduction
     1.1. Copyright and License
  2. How LKCD Works
     2.1. What You Need
  3. Installation of lkcd
     3.1. Installing From Source Code
     3.2. Building and Installing LKCD Utilities
     3.3. What Gets Installed
     3.4. Installing LKCD Utilities From RPM
     3.5. Patching the Kernel
     3.6. Build and Install the Kernel
  4. Setup, Test, and Running crash
     4.1. Setting up crash dump
     4.2. Testing crash
     4.3. Running crash

  




1. Introduction
 The LKCD (Linux Kernel Crash Dump) project is a
set of kernel patches and utilities to allow a copy of the kernel
memory to be saved in the event of a kernel panic. The saved kernel
image makes forensics on the kernel panic possible with utilities
included in the package. Most commercial Unix operating systems come
with similar crash utilities, but this package is fairly new to Linux
and has to be added on manually. The LKCD utility is not designed to
gather helpful information in the case of a hardware caused panic or a
segment violation. The complete LKCD package is available for download
at http://lkcd.sourceforge.net/.



1.1. Copyright and License
This document is copyrighted (c) 2002 by Norman Patten. Permission
is granted to copy, distribute and/or modify this document under the
terms of the GNU Free Documentation License, Version 1.1 or any later
version published by the Free Software Foundation; with no Invariant
Sections, with no Front-Cover Texts, and with no Back-Cover Texts. A
copy of the license is available at 
http://www.gnu.org/copyleft/fdl.html. 
Linux is a registered trademark of Linus Torvalds. lkcd is
distributed under the copyright of Silicon Graphics Inc.
Send feedback to nepat...@us.ibm.com.




2. How LKCD Works
 When a kernel encounters certain errors it calls the "panic"
function, which results from an unrecoverable error. This panic results
in LKCD initiating a kernel dump where kernel memory is copied out to
the pre-designated dump area. The dump device is configured as primary
swap by default. The kernel is not completely functional at this point,
but there is enough functionality to copy memory to disk. After dump
finishes copying memory to disk, the system re-boots. When the system
boots back up, it checks for a new crash dump. If a new crash dump is
found it is copied from the dump location to the file system, "/var/log/dump" directory by default. After
copying the image, the system continues to boot normally and forensics
can be performed at a later date. 


2.1. What You Need
 - lkcd-kernelxxx.diff file for patching the kernel. The kernel version
   supported will change routinely.
 - lkcdutils-xx.src.rpm - this is the utilities source and scripts you
   will need to set up and read a crash. At the time of this writing
   there is an i386 binary rpm available from lkcd.sourceforge.net, but
   you will still need the patches for the startup scripts from the
   source rpm.




3. Installation of lkcd

3.1. Installing From Source Code
 Get the lkcdutils-xxx.src.rpm and
install it using rpm -i lkcdutils-xxx.src.rpm.
This will place a file called lkcdutils-xxx.tar.gz
in the /usr/src/redhat/SOURCES directory.
This file is a compressed tar image of the lkcd source tree. Unwind the
source in a directory of your choice like "/usr/src"
with tar -zxvf lkcdutils-xxx.tar.gz. This will
create a directory called "lkcdutils-xxx"
which will contain the LKCD utilities source.



3.2. Building and Installing LKCD
Utilities
 LKCD uses the standard GCC compiler and make files. To build the
suite, cd to the LKCD src directory and run ./configure
to build the configuration files. The next step is to run make to build
the utilities, and finally run make install to install the utilities and
man pages.



3.3. What Gets Installed

  

  
  /etc/sysconfig/dump             # Configuration file for dump
  /sbin/lcrash                    # The crash utility
  /sbin/lkcd                      # Script to configure and save a crash
  /sbin/lkcd_config               # Configuration utility for dump
  /sbin/lkcd_ksyms                # Utility for reconstructing kernel symbols
  /usr/include/sial_api.h         # Header file for the SIAL API
  /usr/lib/libsial.a              # Simple Image Access Language library
  /usr/man/man1/lcrash.1          # man page for lcrash
  /usr/man/man1/lkcd_config.1     # man page for lkcd_config
  /usr/man/man1/lkcd_ksyms.1      # man page for lkcd_ksyms
  /usr/share/sial/lcrash/ps.sial  # ps command implementation of SIAL
  
  

[linuxkernelnewbies] LKCD - Linux Kernel Crash Dump

2009-09-28 Thread Peter Teoh





http://lkcd.sourceforge.net/

Introduction - Linux Kernel Crash Dump
 The Linux Kernel Crash Dump (LKCD) project is designed to meet the
needs of customers and system administrators wanting a reliable method
of detecting, saving and examining system crashes. While more mature
operating systems have provided these capabilities by default for
years, Linux has yet to evolve to such a state. LKCD is an attempt to
move Linux towards greater supportability. 
 Kernel Crash Dump Requires Four Components: 

  Kernel Support:
Kernel code for configuring dump parameters, catching error conditions,
and executing a kernel memory dump. Kernel.org kernels need to be
patched with the LKCD dump modules.
  
  Dump Configuration:
Facilities for integrating system crash dump capabilities into the
operating system. These facilities are in the form of user-level
applications to configure and enable crash dumps and various system
scripts necessary for incorporating LKCD into the operating system.
  
  Dump Recovery:
User-level commands to retrieve a dump saved by the kernel and transfer
it to a user accessible location.
  
  Dump Analysis:
A debugger that can operate on the saved dump image. The lkcdutils
package provides the lcrash command for dump analysis.

 LKCD provides all of the components (kernel and user-level
code) designed to: 

   Save the kernel memory image when the system dies due to a
software failure;
   Recover the kernel memory image when the system is rebooted;
   Analyze the memory image to determine what happened when the
failure occurred.

 The memory image is stored into a dump device, which is represented
by one of the disk partitions on the system. That dump is recovered
with an application called lcrash (Linux Crash) once the system
boots back up, before the swap partitions are mounted. A report is
generated and saved into /var/log/dump. 





[linuxkernelnewbies] Linux kernel: Triggering crash dumps on non-responsive systems using NMI

2009-09-28 Thread Peter Teoh





http://publib.boulder.ibm.com/infocenter/lnxinfo/v3r0m0/index.jsp?topic=/liaai/crashdump/liaaicrashdumptrignmi.htm





Triggering crash dumps on non-responsive
systems using NMI

The previous section describes how the Magic SysRq keys
can be used to trigger a crash dump. However, they only work if the
system still responds to console input. In failures where the system
hangs and normal interrupts are disabled, you can use non-maskable
interrupts (NMIs) to trigger a panic and hence a crash dump. There
are two ways to do it, and they are incompatible.
Only one of them can be enabled at a time: 



  Using the NMI Watchdog to detect hangs
This mechanism works on most hardware. When the NMI watchdog
is enabled, the system hardware is programmed to periodically generate
an NMI. Each NMI invokes a handler in the Linux kernel to check the
count of certain
interrupts. If the handler detects that this count has not increased
over a certain period of time, it assumes the system is hung. It
then invokes the panic routine. If Kdump is enabled, the routine
also saves a crash dump. 
  Generating an NMI manually on a hang
An alternative to the NMI watchdog is to generate an NMI
manually. This section describes how to configure the Linux kernel to
call the panic routine when
it receives an NMI with an unknown code. In many cases it is possible
to generate such an NMI on a hung system to cause a panic and hence
a crash dump. 









[linuxkernelnewbies] Linux kernel Crashdump HOWTO/Analysis

2009-09-28 Thread Peter Teoh






http://www.docstoc.com/docs/DownloadDoc.aspx?doc_id=712673

http://cateee.net/lkddb/web-lkddb/CRASH_DUMP.html

http://www.faqs.org/docs/Linux-HOWTO/Linux-Crash-HOWTO.html






[linuxkernelnewbies] Linux Bangalore/2004: Talk Details: LKCD - Linux Kernel Crash Dump

2009-09-28 Thread Peter Teoh




http://linux-bangalore.org/2004/schedules/talkdetails.php?talkcode=F1100032


  

Speaker: Harish K
Company: Motorola Inc.
Scope: Technical
Track: Kernel Programming
Talk Title: LKCD - Linux Kernel Crash Dump


  Synopsis

Often when a Linux system fails, it is necessary to preserve an image of
system memory so that a post analysis of the failure may be performed.
Once the preserved image (called a crash dump) is saved to disk, the
system can be returned to production.

Linux Kernel Crash Dump (LKCD) is a set of kernel and application code
to configure, implement and analyze system crash dumps. LKCD is one of
the primary objectives for RAS (reliability, availability,
serviceability) initiatives in Linux and in Carrier Grade Linux for
carrier grade applications.

The presentation will cover a high-level view of the kernel side of
LKCD with a brief introduction to the user-level analysis tool.

Kernel Side Of LKCD

This section covers the following topics in brief:

1. Kernel design considerations
2. Initiating the dump process
3. Kernel hooks for executing a crash dump
4. Kernel dump execution
5. Kernel dump layout
6. Kernel /proc tunables to define the user-desired characteristics

Introduction to Lcrash - User-level analysis tool

Lcrash is a Linux system crash dump analysis tool. It provides access
to kernel data in LKCD crash dumps or live system memory and displays
detailed information about a system crash. It can be used interactively
to generate system crash dump reports.

This section covers some of the interactive commands to generate a kernel
crash report and a kernel stack trace, as it was at the time of the crash.


  Speaker Profile
Harish is working for Motorola - Embedded Communications Computing Group
as a software engineer in the Linux team. Basically they are into
embedded Linux porting for the latest and advanced boards.


  Download Slides
  Available in OpenOffice and PowerPoint format.

  






