date:20070122

On Mon, 22 Jan 2007 01:53:21 +0059
Jiri Slaby [EMAIL PROTECTED] wrote:

 ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
   7 Seek_Error_Rate         0x000f   083   060   030    Pre-fail  Always       -       204305750
   1 Raw_Read_Error_Rate     0x000f   059   049   006    Pre-fail  Always       -       215927244
 195 Hardware_ECC_Recovered  0x001a   059   049   000    Old_age   Always       -       215927244
  
  Wow! that HDD is really in a bad condition.
 
 I don't think so, this seems to be normal for Seagate drives...

I agree.

For Chr: I don't think these big raw numbers are counters. Look at the
normalized values instead, and note that they are greater than the THRESH
values (so they are good).

The meaning of raw-numbers is vendor specific.

-- 
Paolo Ornati
Linux 2.6.20-rc5 on x86_64
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

[RFC] Asynchronous Messaging

2007-01-22 Thread Wink Saville


I have implemented a technique which allows a kernel-space thread
or ISR to communicate with user-space or kernel-space threads
asynchronously and without having to copy data (zero copy).

The solution I came up with I call ACE, Atomic Code Execution. As the
name implies once code starts executing within the ACE environment,
that code is guaranteed to complete before any other code will run.

This is accomplished by allocating a page (or more) of memory which
is executable and mapped into every thread's address space. Also, all
ISR entry points are modified to detect if the code that was interrupted
was executing within the ACE page. If it was then the ACE code is
allowed to complete before the ISR continues. This then provides
the guarantee of atomic execution.

Another way to look at it is that it gives user space programs the
capability to disable/enable interrupts thus allowing user space code
to execute the equivalent of spin_lock_irqsave() and
spin_unlock_irqrestore().

I then implemented asynchronous messaging with zero copy by implementing
linked-list operations within the ACE page, allocating the messages
and auxiliary memory globally using vmalloc, and adding the notion of an
mproc (message processor), which encapsulates a thread
and a queue.

I believe the ACE technique and the mproc idea could be used for several
purposes beyond my desire to write event driven applications. In particular
I could see it as a means of implementing device drivers written in user space
as well as a possible technique for communicating with virtual machines such
as Xen or KVM.

Currently, the proof of concept code runs on a Core 2 Duo. For those that
are interested the code is available as a patch against 2.6.19
at http://www.saville.com/linux/async.

I have been using asynchronous messaging for 4+ years and have found that
it provides very interesting properties, but is hindered because it is not
directly supported by operating systems. I am very interested in getting
feedback on the idea of including asynchronous messaging within the kernel.

Thank you,

Wink Saville

[patch] md: bitmap read_page error

2007-01-22 Thread yang yin


If the bitmap file (the super block plus the bitmap itself) is smaller than
one page and the inode's i_blkbits is small, the read_page call that loads
the sb_page may return an error. For example, a device with 12800 chunks has
a bitmap file of about 1.6KB including the bitmap super block. If the
i_blkbits value of the bitmap file's inode is 10, read_page submits 4 bh to
load the sb_page. Because the bitmap is only 1.6KB, the bmap operation in the
while loop fails for block 2 and returns 0. The bitmap then cannot be
initialized because reading the sb page failed.

The second error is in the bitmap_init_from_disk function. The count value
calculated before calling read_page does not include the size of the super
block. When the bitmap alone needs exactly one page, two pages must be read
once the super block is added. But on the second read the count value is set
to 0, so the part of the bitmap that lies in the second page is never read
from disk.

I propose the following patch:

-
diff -Nur linux-2.6.19.2.orig/drivers/md/bitmap.c linux-2.6.19.2/drivers/md/bitmap.c
--- linux-2.6.19.2.orig/drivers/md/bitmap.c 2007-01-11 03:10:37.0 +0800
+++ linux-2.6.19.2/drivers/md/bitmap.c  2007-01-20 20:45:32.0 +0800
@@ -352,6 +352,7 @@
   struct inode *inode = file->f_dentry->d_inode;
   struct buffer_head *bh;
   sector_t block;
+   loff_t read_size = 0;

   PRINTK("read bitmap file (%dB @ %Lu)\n", (int)PAGE_SIZE,
   (unsigned long long)index << PAGE_SHIFT);
@@ -371,7 +372,7 @@
   attach_page_buffers(page, bh);
   block = index << (PAGE_SHIFT - inode->i_blkbits);
   while (bh) {
-   if (count == 0)
+   if (count == 0 || (read_size >= (inode->i_size - (index << PAGE_SHIFT))))
   bh->b_blocknr = 0;
   else {
   bh->b_blocknr = bmap(inode, block);
@@ -394,6 +395,7 @@
   set_buffer_mapped(bh);
   submit_bh(READ, bh);
   }
+   read_size += (1 << inode->i_blkbits);
   block++;
   bh = bh->b_this_page;
   }
@@ -877,7 +879,7 @@
   int count;
   /* unmap the old page, we're done with it */
   if (index == num_pages-1)
-   count = bytes - index * PAGE_SIZE;
+   count = bytes + sizeof(bitmap_super_t) - index * PAGE_SIZE;
   else
   count = PAGE_SIZE;
   if (index == 0) {


yinyang


Tel: (86)10-62600547
Fax: (86)10-6265-7255
Mailing: P. O. Box 2704# Beijing
Postcode: 100080
National Research Centre for High Performance Computer
Institute of Computing Technology,Chinese Academy of Sciences
6,South Kexueyuan Road,
Haidian District, Beijing, China

Re: SATA exceptions triggered by XFS (since 2.6.18)

On Mon, 22 Jan 2007 11:46:01 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

  I don't know. It's a two years old ST380817AS.
  
  # smartctl -a -d ata /dev/sda
  
  smartctl version 5.36 [x86_64-pc-linux-gnu] Copyright (C) 2002-6 Bruce Allen
  Home page is http://smartmontools.sourceforge.net/
  
  === START OF INFORMATION SECTION ===
  Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
  Device Model: ST380817AS
 
 I'll blacklist it.  Thanks.

Ok. It will be better if someone else with the same HD could confirm.

It looks so strange that an HD that works fine, and should support NCQ,
has such big trouble that I can freeze it in less than a second by
using XFS (while with ext3 I cannot, or at least it's very hard).

-- 
Paolo Ornati
Linux 2.6.20-rc5 on x86_64

Re: PROBLEM: KB-KiB, MB - MiB, ... (IEC 60027-2)

2007-01-22 Thread Benny Amorsen

 DS == David Schwartz [EMAIL PROTECTED] writes:

DS If you are right, a 512MB RAM stick is mislabelled and is more
DS correctly labelled as 536.8MB. (With 512MiB being equally
DS correct.)

DS Isn't that obviously not just wrong but borderline crazy?

No. It is not obvious to me what is wrong with that. RAM is the only
thing using binary units, everything else is decimal. It is about time
that RAM switched too.


/Benny



Re: PROBLEM: KB-KiB, MB - MiB, ... (IEC 60027-2)

2007-01-22 Thread Roland Kuhn


Hi Jan!

On 21 Jan 2007, at 22:12, Jan Engelhardt wrote:


How fast is your Ethernet port?  100Mbps or 95.37Mbps?


Same lie like with harddrives. It's around 80, not 100.
But it depends on how you look at it. 80 for Layer3, possibly
a little more for Layer2/1.

Nope, I get consistently 12e6 bytes/sec, which is 96e6 bits/sec  
across 100Mbps ethernet, fitting nicely with the frame overhead (some  
50 bytes out of 1500, without TCP options). So no lie here. With  
gigabit I'm not completely sure yet, still have to see the advertised  
125e6 symbols/sec (got only as far as 115e6 up to now).


Ciao,
Roland

--
TU Muenchen, Physik-Department E18, James-Franck-Str., 85748 Garching
Telefon 089/289-12575; Telefax 089/289-12570
--
CERN office: 892-1-D23 phone: +41 22 7676540 mobile: +41 76 487 4482
--
Any society that would give up a little liberty to gain a little
security will deserve neither and lose both.  - Benjamin Franklin
-BEGIN GEEK CODE BLOCK-
Version: 3.12
GS/CS/M/MU d-(++) s:+ a- C+++ UL P+++ L+++ E(+) W+ !N K- w--- M 
+ !V Y+

PGP++ t+(++) 5 R+ tv-- b+ DI++ e+++ h y+++
--END GEEK CODE BLOCK--





Re: status of: tasklet_unlock_wait() causes soft lockup with -rt and ieee1394 audio

2007-01-22 Thread Ingo Molnar


* Pieter Palmers [EMAIL PROTECTED] wrote:

 Dear all,
 
 What is the status with respect to this problem? I see that in the 
 current -rt patch the problematic code piece is different. I 
 personally haven't tried to reproduce this myself on a more recent 
 kernel, but I just got a report from one of our users who experienced 
 the same problem with 2.6.19-rt15 and RT preemption (desktop 
 preemption works fine).
 
 Should the latest -rt patches be fixed with respect to this issue? If 
 so I'll try and test them, otherwise I omit the effort.

it's not fixed yet. Could you try the patch below?

Ingo

---
 include/linux/interrupt.h |    6 ++----
 kernel/softirq.c          |   20 ++++++++++++++++++++
 2 files changed, 22 insertions(+), 4 deletions(-)

Index: linux/include/linux/interrupt.h
===================================================================
--- linux.orig/include/linux/interrupt.h
+++ linux/include/linux/interrupt.h
@@ -328,10 +328,8 @@ static inline void tasklet_unlock(struct
 	clear_bit(TASKLET_STATE_RUN, &(t)->state);
 }
 
-static inline void tasklet_unlock_wait(struct tasklet_struct *t)
-{
-	while (test_bit(TASKLET_STATE_RUN, &(t)->state)) { barrier(); }
-}
+extern void tasklet_unlock_wait(struct tasklet_struct *t);
+
 #else
 # define tasklet_trylock(t) 1
 # define tasklet_tryunlock(t) 1
Index: linux/kernel/softirq.c
===================================================================
--- linux.orig/kernel/softirq.c
+++ linux/kernel/softirq.c
@@ -20,6 +20,7 @@
 #include <linux/mm.h>
 #include <linux/notifier.h>
 #include <linux/percpu.h>
+#include <linux/delay.h>
 #include <linux/cpu.h>
 #include <linux/kthread.h>
 #include <linux/rcupdate.h>
@@ -656,6 +657,25 @@ void __init softirq_init(void)
 	open_softirq(HI_SOFTIRQ, tasklet_hi_action, NULL);
 }
 
+#if defined(CONFIG_SMP) || defined(CONFIG_PREEMPT_RT)
+
+void tasklet_unlock_wait(struct tasklet_struct *t)
+{
+	while (test_bit(TASKLET_STATE_RUN, &(t)->state)) {
+		/*
+		 * Hack for now to avoid this busy-loop:
+		 */
+#ifdef CONFIG_PREEMPT_RT
+		msleep(1);
+#else
+		barrier();
+#endif
+	}
+}
+EXPORT_SYMBOL(tasklet_unlock_wait);
+
+#endif
+
 static int ksoftirqd(void * __data)
 {
 	struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO/2 };

sigaction's ucontext_t with incorrect stack reference when SA_SIGINFO is being used ?

2007-01-22 Thread Xavier Roche

Hi folks,

I have a probably lousy question regarding sigaction() behaviour when an
alternate signal stack is used: it seems that I cannot get the user
stack reference in the ucontext_t stack context; i.e. the uc_stack
member contains a reference to the alternate signal stack, not the stack
that was used before the crash.

Is this normal behaviour? Is there a way to retrieve the original
user's stack inside the signal callback?

The example given below demonstrates the issue:
top of stack==0x7f3d7000, alternative_stack==0x501010
SEGV==0x7f3d6ff8; sp==0x501010; current stack is the alternate stack

It is obvious that the SEGV was a stack overflow: the si_addr address is
just on the page below the stack limit.
/* gcc -g [ -D_REENTRANT ] stacktest.c [ -lpthread ] -o stacktest */

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/resource.h>
#include <sys/ucontext.h>

#ifdef _REENTRANT
#include <pthread.h>
#endif

/* the alternative stack reference */
static stack_t ss;

/* this function does nasty things */
static void overflow(void) { overflow(); }

/* test entry point */
static void* threadEntry(void* parg) {
  struct rlimit rlim;
  /* setup alternative stack for the current thread */
  ss.ss_flags = 0;
  ss.ss_size = SIGSTKSZ;
  ss.ss_sp = malloc(ss.ss_size);
  if (ss.ss_sp == NULL) {
    abort();
  }
  if (sigaltstack(&ss, NULL) == -1) {
    abort();
  }
  /* print current stack limit */
  if (getrlimit(RLIMIT_STACK, &rlim) == 0) {
    const unsigned long page_size = (unsigned long) sysconf(_SC_PAGE_SIZE);
    const unsigned long stack_bottom =
      (((unsigned long)&rlim-rlim.rlim_cur+page_size-1)/page_size)*page_size;
    printf("bottom of stack==%p, alternative_stack==%p\n", (void*)stack_bottom,
           (void*)ss.ss_sp);
  }
  /* do something very nasty */
  overflow();
  /* we may not reach this point */
  return NULL;
}

/* SEGV handler */
static void saHandler(int code, siginfo_t *si, void *sc_) {
  /* address of a local: tells us which stack the handler runs on */
  char *kenny = (char*) &code;
  ucontext_t * const sc = (ucontext_t*) sc_;
  printf("SEGV==%p; sp==%p; current stack is the %s\n", (void*)si->si_addr,
         (void*)sc->uc_stack.ss_sp,
         ( kenny >= (char*)ss.ss_sp && kenny < (char*)ss.ss_sp + SIGSTKSZ )
         ? "alternate stack" : "original stack");
  abort();
}

/* main entry point */
int main(void) {
  /* catch SEGV with SA_ONSTACK enabled */
  struct sigaction s;
  memset(&s, 0, sizeof(s));
  sigemptyset(&s.sa_mask);
  s.sa_flags = SA_SIGINFO | SA_ONSTACK;
  s.sa_sigaction = saHandler;
  if(sigaction (SIGSEGV, &s, NULL)) {
    abort();
  }

#ifdef _REENTRANT
  /* threaded version */
  {
    pthread_t t;
    pthread_create(&t, NULL, threadEntry, NULL);
    pause();  /* wait (almost) endlessly */
  }
#else
  /* single threaded version */
  (void) threadEntry(NULL);
#endif

  /* not reached */
  abort();
  return 0;
}

Re: SATA exceptions triggered by XFS (since 2.6.18)

2007-01-22 Thread Tejun Heo


Paolo Ornati wrote:

=== START OF INFORMATION SECTION ===
Model Family: Seagate Barracuda 7200.7 and 7200.7 Plus family
Device Model: ST380817AS

I'll blacklist it.  Thanks.


Ok. It will be better if someone else with the same HD could confirm.

It looks so strange that an HD that works fine, and should support NCQ,
has such big trouble that I can freeze it in less than a second by
using XFS (while with ext3 I cannot, or at least it's very hard).


Yeap, certainly.  I'll ask people first before actually proceeding with 
the blacklisting.  I'm just getting a bit tired of tides of NCQ firmware 
problems.


Anyways, for the time being, you can easily turn off NCQ using sysfs. 
Please take a look at http://linux-ata.org/faq.html


--
tejun

Re: XFS or Kernel Problem / Bug

2007-01-22 Thread Stefan Priebe - FH


Hi!

I've another idea... could it be, that it is a barrier problem? Since 
barriers are enabled by default from 2.6.17 on ...


Stefan

David Chinner schrieb:

On Mon, Jan 22, 2007 at 08:51:10AM +0100, Stefan Priebe - FH wrote:

Hi!

I'm not sure, but perhaps it isn't an XFS bug.

Here is what I found out:

We have about 300 servers at the moment, and 5 of them are old Intel
Pentium 4 machines with a DFI PM-12 mainboard with a VIA chipset. It only
happens on THESE machines.


Hmmm - that points more to a hardware problem than a software problem;
crashes in generic_file_buffered_write() are relatively uncommon, and
to have them all isolated to a specific type of hardware is suspicious

Wasn't there a major update of the IDE layer in 2.6.18? or was that
2.6.19 that I'm thinking of? BTW, have you run memtest86 on these
boxes to rule out dodgy memory?

Cheers,

Dave.




Re: [PATCH 3/4] atl1: Main C file for Attansic L1 driver

2007-01-22 Thread Arjan van de Ven

On Sun, 2007-01-21 at 15:06 -0600, Jay Cliburn wrote:
 +
 + /* PCI config space info */
 + pci_read_config_byte(pdev, PCI_REVISION_ID, &hw->revision_id);
 + pci_read_config_word(pdev, PCI_COMMAND, &hw->pci_cmd_word);

I'm highly suspicious of drivers that use the PCI_COMMAND word...
thankfully this seems to be a write only variable in this driver :)

 + if (adapter->pci_using_64) {
 + /* test whether HIDWORD dma buffer is not cross boundary */
 + if (unlikely(((ring_header->dma & 0xffffffff00000000ULL) >> 32)
 + != (((ring_header->dma + size) & 0xffffffff00000000ULL) >>


this is not needed; this is never ever supposed to happen..
what is more, you allocated consistent DMA memory, without setting the
consistent DMA mask to anything other than 32 bit... so you'll never
even go outside of the 32 bit region..

 + if (tpd_ring->buffer_info)
 + kfree(tpd_ring->buffer_info);

no need for the if(), kfree(NULL) is perfectly fine


 +static void atl1_clear_phy_int(struct atl1_adapter *adapter)
 +{
 + u16 phy_data;
 + spin_lock(&adapter->lock);
 + atl1_read_phy_reg(&adapter->hw, 19, &phy_data);
 + spin_unlock(&adapter->lock);

are you sure this lock doesn't need to be irq safe?


 +/**
 + * atl1_irq_disable - Mask off interrupt generation on the NIC
 + * @adapter: board private structure
 + **/
 +void atl1_irq_disable(struct atl1_adapter *adapter)
 +{
 + atomic_inc(&adapter->irq_sem);
 + iowrite32(0, adapter->hw.hw_addr + REG_IMR);
 + synchronize_irq(adapter->pdev->irq);
 +}

doesn't this want a PCI posting flush?
I'm also a bit sceptical about irq_sem ...


 +/**
 + * When ACPI resume on some VIA MotherBoard, the Interrupt Disable bit/0x400
 + * on PCI Command register is disable.
 + * The function enable this bit.
 + * Brackett, 2006/03/15
 + */
 +static void atl1_via_workaround(struct atl1_adapter *adapter)
 +{
 + unsigned long value;
 +
 + value = ioread16(adapter->hw.hw_addr + PCI_COMMAND);
 + if (value & PCI_COMMAND_INTX_DISABLE)
 + value &= ~PCI_COMMAND_INTX_DISABLE;
 + iowrite32(value, adapter->hw.hw_addr + PCI_COMMAND);
 +}

hmm I wonder if this shouldn't be a more generic PCI level quirk, not so
much a driver level quirk...





Re: wake_up() takes long time to return

2007-01-22 Thread kalash nainwal


On 1/20/07, Arjan van de Ven [EMAIL PROTECTED] wrote:

On Sat, 2007-01-20 at 15:54 +0530, kalash nainwal wrote:
 Hi there,

 We've a kernel (n/w) module, which sits over ethernet. Whenever a pkt
 is received (in softirq), after doing some minimal processing,
 wake_up() is called to wake up another kernel thread which does rest
 (bulk) of the processing.

 We notice that this wake_up() call is sometimes taking as long as 48
 milliseconds to return. This happens around 10 times out of 10M. We
 earlier thought it was possibly because of contention on rq->lock,
 but we see the same phenomenon even on a uniprocessor box. So obviously
 that's not the case.

 We can't figure out any other reason for wake_up() to take this much
 time? As this call comes directly in our (receive) hotpath, we're very
 concerned. Any help would be greatly appreciated.


Hi,

unfortunately you didn't provide your driver code or a link to it, so
people who want to help you would have to guess in the dark... could you
reply to this email with the pointer to the code?

Greetings,
   Arjan van de Ven
--
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via 
http://www.linuxfirmwarekit.org




Hi Arjan,

I won't pretend I'm working on an open-source driver. I personally
would be more than happy to share the driver code, but doing so would
probably cost me my job :)

and so...I won't expect anyone to help me with my code either.

Just wanted to know if wake_up is known to take this long to return?
(some known Linux quirk, maybe?) If so, then under what conditions? Or
is it _definitely_ my code that's screwing up?

I'm using do_gettimeofday() before and after wake_up() to measure this time.

Thanks and regards,
-Kalash

Re: SATA exceptions triggered by XFS (since 2.6.18)

On Mon, 22 Jan 2007 18:35:05 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 Yeap, certainly.  I'll ask people first before actually proceeding with 
 the blacklisting.  I'm just getting a bit tired of tides of NCQ firmware 
 problems.
 
 Anyways, for the time being, you can easily turn off NCQ using sysfs. 
 Please take a look at http://linux-ata.org/faq.html

ok

-- 
Paolo Ornati
Linux 2.6.20-rc5 on x86_64

Re: [PATCH] binfmt_elf: core dump masking support

2007-01-22 Thread Pavel Machek

On Mon 2007-01-22 11:29:40, Kawai, Hidehiro wrote:
 Hi Pavel,
 
 The /proc/<pid>/ approach doesn't have these demerits, and it
 has an advantage that users can change the bitmask of any process
 at anytime.
 
 Well... not sure if it is advantage. 
 
 For example, consider the following case:
   a process forks many children and system administrator wants to
   allow only one of these processes to dump shared memory.
 
 This is accomplished as follows:
 
  $ echo 1 > /proc/self/coremask
  $ ./some_program
  (fork children)
  $ echo 0 > /proc/<a child's pid>/coremask
 
 With the /proc/<pid>/ interface, we don't need to modify the
 user program.  In contrast, with the ulimit or setrlimit interface,
 the administrator can't do it without modifying the user program
 to call setrlimit.  This will not be preferred.
  
  Yep, otoh process coremask setting can change while it is running,
  that is not expected. Hmm, it can also change while it is dumping
  core, are you sure it is not racy?
 
 Good point, thanks.  I never thought of that.
 We can change the coremask setting while dumping the process's
 memory, and it is problematic.
 
 The maydump() function, which decides whether a given VMA may be dumped,
 is invoked twice per VMA: once when writing the program
 header for the VMA, and once when writing its contents.
 If the coremask setting differs between the two calls, the program
 header will point to the wrong place in the core file for its contents.
 
  
  (run echo 1 > coremask, echo 0 > coremask in a loop while dumping
  core. Do you have enough locking to make it work as expected?)
 
 Currently, no lock is acquired.  But I think the kernel only
 has to preserve the coremask setting in a local variable at the
 beginning of core dumping.  I'm going to do this in the next version.

No, I do not think that is enough. At minimum, you'd need an atomic_t
variable. But I'd recommend against it. Playing with locking is tricky.

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

[KORG] Linux history trees

2007-01-22 Thread Jean Delvare

Hi Linus, Thomas, all,

It appears that kernel.org is hosting two git repositories with the
history of the linux kernel development, up to 2.6.12-rc2, which was
originally in bitkeeper. The first one is owned by Linus:
http://www2.kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=summary

The second one is owned by Thomas:
http://www2.kernel.org/git/?p=linux/kernel/git/tglx/history.git;a=summary

As both trees serve the same purpose, I was thinking that we could have
a single copy. I see two benefits in doing so:
* Thomas' version is better as far as I can see (it has the author
  names which are missing from Linus' version for example) but I
  suspect most people don't know about it and use Linus' version,
  as I have been doing myself until very recently.
* It might help lower the load on the kernel.org servers (by increasing
  the cache hits.)

So I suggest that Linus deletes his old-2.6-bkcvs tree. What do you
think?

Thanks,
-- 
Jean Delvare

Re: SATA exceptions triggered by XFS (since 2.6.18)

On Mon, 22 Jan 2007 18:35:05 +0900
Tejun Heo [EMAIL PROTECTED] wrote:

 Yeap, certainly.  I'll ask people first before actually proceeding with 
 the blacklisting.  I'm just getting a bit tired of tides of NCQ firmware 
 problems.

Another interesting thing: it seems that I'm unable to reproduce the
problem mounting XFS with nobarrier (using sda queue_depth = 31).

So it looks like a problem with NCQ combined with cache flush command...

-- 
Paolo Ornati
Linux 2.6.20-rc5 on x86_64

Re: [PATCH 2.6.20-rc5 1/1] MM: enhance Linux swap subsystem

2007-01-22 Thread Pavel Machek

Hi!

 My patch is based on my new idea to Linux swap subsystem, you can find more 
 in
 Documentation/vm_pps.txt which isn't only patch illustration but also file
 changelog. In brief, SwapDaemon should scan and reclaim pages on
 UserSpace::vmalist other than current zone::active/inactive. The change will
 conspicuously enhance swap subsystem performance by

No, this is not the way to submit a major rewrite of the swap subsystem.

You need to (at minimum, making fundamental changes _is_ hard):

1) Fix your mailer not to wordwrap.

2) Get some testing. Identify workloads it improves.

3) Get some _external_ testing. You are retransmitting a word-wrapped
patch. That means no one other than you is actually using it.

4) Don't cc me; I'm not mm expert, and I tend to read l-k, anyway.

Pavel

 + Pure Private Page System (pps)
 + Copyright by Yunfeng Zhang on GFDL 1.2

I am not sure GFDL is GPL compatible.

 +// Purpose ([{

You certainly have an interesting heading style. What is this markup?
 +
 +// The prototype of the function is fit with the func of int
 +// smp_call_function (void (*func) (void *info), void *info, int retry, int
 +// wait); of include/linux/smp.h of 2.6.16.29. Call it with NULL.
 +void timer_flush_tlb_tasks(void* data /* = NULL */);

I thought I told you to read the CodingStyle in some previous mail?

Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) 
http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: PROBLEM: KB-KiB, MB - MiB, ... (IEC 60027-2)

2007-01-22 Thread Bernd Petrovitsch

On Mon, 2007-01-22 at 02:56 +0100, Krzysztof Halasa wrote:
 Jan Engelhardt [EMAIL PROTECTED] writes:
 
  Bleh. Except for storage, base 1024 was used for almost everything
  I remember. 4 MB memory meant 4096 KB, and that's still the case today.
  Most likely the same for transfer rates.
 
 Nope, transfer rates were initially 1000-based: 9.6 kbps = 9600 bps,
 28.8 kbps = 28800 bps, 64 kbps = 64000 bps. Then it went 128, 256,
 512 kbps = 512000 bps and 1 Mbps = 2 * 512 kbps = 1024000 bps.

ACK. But this and harddisk sizes are really the only areas.

 But it's limited mostly to serial interfaces. Other networks use
 10, 1000 etc. because they have nothing natural in (powers of) 2
 so 1 Mbps may be 100 bps as well.
 
  It's just that storage vendors broke the computer rule and went with 1000.
 
 1024 etc. is (should be) natural to disks because the sector size
 is 512 B, 2048 B or something like that.

Yes, but it sounds in commercials better if there is a larger number
there. And you can raise the result of a fraction if you lower the
divisor.

Bernd
-- 
Firmix Software GmbH   http://www.firmix.at/
mobil: +43 664 4416156 fax: +43 1 7890849-55
  Embedded Linux Development and Services


Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-22 Thread Nick Piggin


Robert P. J. Day wrote:


by adding (temporarily) the definitions of TRUE and FALSE to types.h,
you should then (theoretically) be able to delete over 100 instances
of those same macros being *defined* throughout the source tree.
you're not going to be deleting the hundreds and hundreds of *uses* of
TRUE and FALSE (not yet, anyway) but, at the very least, by adding two
lines to types.h, you can delete all those redundant *definitions* and
make sure that nothing breaks.  (it shouldn't, of course, but it's
always nice to be sure.)


Doesn't seem very worthwhile, and it legitimises this definition we're
trying to get rid of.


*now*, once that's done, you can start going through the tree and
doing the conversion from upper case to lower case, little by little,
subsystem by subsystem.


I don't see why your patch is needed before the individual conversions?


the predictable response will be, you really should do that all at
once.


You don't need to do it all at once.

--
SUSE Labs, Novell Inc.



Re: PROBLEM: KB-KiB, MB - MiB, ... (IEC 60027-2)

2007-01-22 Thread Andreas Schwab

Krzysztof Halasa [EMAIL PROTECTED] writes:

 Jan Engelhardt [EMAIL PROTECTED] writes:

 It's just that storage vendors broke the computer rule and went with 1000.

 1024 etc. is (should be) natural to disks because the sector size
 is 512 B, 2048 B or something like that.

But other than the sector size there is no natural power of 2 connected to
disk size.  A disk can have any odd number of sectors.

Andreas.

-- 
Andreas Schwab, SuSE Labs, [EMAIL PROTECTED]
SuSE Linux Products GmbH, Maxfeldstraße 5, 90409 Nürnberg, Germany
PGP key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.

Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-22 Thread Robert P. J. Day

On Mon, 22 Jan 2007, Nick Piggin wrote:

> Robert P. J. Day wrote:
>
> > by adding (temporarily) the definitions of TRUE and FALSE to
> > types.h, you should then (theoretically) be able to delete over
> > 100 instances of those same macros being *defined* throughout the
> > source tree. you're not going to be deleting the hundreds and
> > hundreds of *uses* of TRUE and FALSE (not yet, anyway) but, at the
> > very least, by adding two lines to types.h, you can delete all
> > those redundant *definitions* and make sure that nothing breaks.
> > (it shouldn't, of course, but it's always nice to be sure.)
>
> Doesn't seem very worthwhile, and it legitimises this definition
> we're trying to get rid of.

h ... apparently, you totally missed my use of the important
word "temporarily":

  $ grep -r "temporary hack" . | wc -l
  16

rday

Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-22 Thread Christoph Hellwig

On Mon, Jan 22, 2007 at 02:09:17AM -0500, Theodore Ts'o wrote:
 
> Hi folks,
>
>   It's time to start kicking off the 2007 Kernel Summit planning
> process.  This year, the Kernel Summit will be held in Cambridge,
> England, at the DeVere University Arms Hotel, September 5-6 (with a
> welcome reception on the 4th).  The decision to move the Kernel Summit
> to England is a one-year experiment based on the very strong request of
> last year's kernel summit attendees to try a location outside of Ottawa,
> and especially from the roughly 1/3rd of the attendees that come from
> the UK or Europe.  So the plan is for us to book the Ottawa Congress
> Center space for July 2008 (which we will need to do by mid-year 2007),

Very strong "please no" from me.  Please move it around to different
venues, if needed in North America again.  The kernel summit shouldn't
be a marketing add-on but something on its own.

While we're at it, it would be nice to get rid of all that usenix
and "sponsors that get a seat" baggage as well, especially as we've
proven that all small on-topic conferences without that overhead
are a lot more productive.


Re: Serial port blues


Serial port latency is heavily dependent on the HZ rate for data bits
and input side stuff and you can set the low latency flag to improve upon
that. Beyond that if you are using the modem control ioctls then it
depends a lot on the hardware. USB has some implicit queuing on the bus
but generic uarts have very little on the whole.

You should be able to get much better results by using
mlockall(MCL_FUTURE) on the actual test process and setting the priority
into the real time range, in combination with turning on low latency on
the motherboard ports. 
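
A rough userspace sketch of the three steps above, in C. The function
names are made up for this example; using MCL_CURRENT in addition to
MCL_FUTURE and picking SCHED_FIFO as the real-time policy are
assumptions of the sketch, not something stated in this message. The
low latency flag is set through the TIOCGSERIAL/TIOCSSERIAL ioctls,
which only work on drivers that implement them.

```c
#include <errno.h>
#include <sched.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <linux/serial.h>

/*
 * Lock the process's pages in RAM and move it into the real-time
 * priority range.  Returns 0 on success or a negative errno value.
 * Usually needs root or suitable rlimits/capabilities.
 */
int enter_realtime(int priority)
{
	struct sched_param sp = { .sched_priority = priority };

	if (mlockall(MCL_CURRENT | MCL_FUTURE) < 0)
		return -errno;
	if (sched_setscheduler(0, SCHED_FIFO, &sp) < 0)
		return -errno;
	return 0;
}

/*
 * Turn on the low latency flag for an already opened serial port.
 * Returns 0 on success or a negative errno value.
 */
int set_low_latency(int fd)
{
	struct serial_struct ss;

	if (ioctl(fd, TIOCGSERIAL, &ss) < 0)
		return -errno;
	ss.flags |= ASYNC_LOW_LATENCY;
	if (ioctl(fd, TIOCSSERIAL, &ss) < 0)
		return -errno;
	return 0;
}
```

A test program would call enter_realtime() once at startup and
set_low_latency() on each port's file descriptor before timing anything.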

The current -mm kernels also support arbitrary baud rates (well 45 or 50
rather than 45.5), although this hasn't yet been enabled for all
platforms or pushed into the base kernel for i386 yet. It will be soon
however and then glibc can be tweaked to use it by default.

Alan


Re: [patch 03/26] Dynamic kernel command-line - arm

2007-01-22 Thread Alon Bar-Lev


On 1/18/07, Bodo Eggert [EMAIL PROTECTED] wrote:

> Alon Bar-Lev [EMAIL PROTECTED] wrote:
> > On 1/18/07, Russell King [EMAIL PROTECTED] wrote:
> > > On Thu, Jan 18, 2007 at 01:58:52PM +0100, Bernhard Walle wrote:
> > > > 2. Set command_line as __initdata.
> > >
> > > You can't.
> > >
> > > > -static char command_line[COMMAND_LINE_SIZE];
> > > > +static char __initdata command_line[COMMAND_LINE_SIZE];
> > >
> > > Uninitialised data is placed in the BSS.  Adding __initdata to BSS
> > > data causes grief.
> >
> > There are many places in kernel that uses __initdata for uninitialized
> > variables.
> >
> > For example:
> >
> > static int __initdata is_chipset_set[MAX_HWIFS];
> >
> > So all these current places are wrong?
> > If I initialize the data will it be OK.
>
> objdump -t vmlinux |grep -3 is_chipset_set suggests that it's placed
> into .init.data here, not into .bss.


Russell ?
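
The objdump observation can be reproduced in userspace with a small C
sketch. It assumes GCC or Clang with a GNU-style linker; the section
name "demo_init" and all identifiers are made up for the example and
only stand in for the kernel's __initdata and .init.data. The sketch
shows where the object is placed, not whether a given architecture's
startup code copes with it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for __initdata: an explicit section name. */
#define __demo_initdata __attribute__((section("demo_init"), used))

/* No initialiser, yet the section attribute keeps it out of .bss. */
static char command_line[1024] __demo_initdata;

/*
 * GNU ld provides __start_<name>/__stop_<name> symbols for sections
 * whose names are valid C identifiers.
 */
extern char __start_demo_init[];
extern char __stop_demo_init[];

/* True if p points into the demo_init section. */
bool in_demo_init(const void *p)
{
	uintptr_t a = (uintptr_t)p;

	return a >= (uintptr_t)__start_demo_init &&
	       a <  (uintptr_t)__stop_demo_init;
}

const void *command_line_addr(void)
{
	return command_line;
}
```

An ordinary uninitialised static array declared without the attribute
lands in .bss and falls outside those bounds.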

Re: [PATCH 2/4] atl1: Header files for Attansic L1 driver

2007-01-22 Thread Francois Romieu

> diff --git a/drivers/net/atl1/atl1_hw.h b/drivers/net/atl1/atl1_hw.h
> new file mode 100644
> index 000..0450b77
> --- /dev/null
> +++ b/drivers/net/atl1/atl1_hw.h
[...]
> +/*  MII definition */
> +/* PHY Common Register */
> +#define MII_BMCR      0x00
> +#define MII_BMSR      0x01
> +#define MII_PHYSID1   0x02
> +#define MII_PHYSID2   0x03
> +#define MII_ADVERTISE 0x04
> +#define MII_LPA       0x05
> +#define MII_EXPANSION 0x06
[snip]

It duplicates a lot of #define already available in include/linux/mii.h.
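
As an illustration only, a small C sketch that takes the register
numbers from the shared header instead of redefining them. It is built
against the userspace copy of linux/mii.h, which is an assumption of
the example (a driver would include the in-tree header), and
mii_reg_name() is a hypothetical helper, not an existing kernel
function.

```c
#include <stddef.h>
#include <linux/mii.h>

/* Map a generic MII register number to a short name, or NULL. */
const char *mii_reg_name(int reg)
{
	switch (reg) {
	case MII_BMCR:      return "BMCR";
	case MII_BMSR:      return "BMSR";
	case MII_PHYSID1:   return "PHYSID1";
	case MII_PHYSID2:   return "PHYSID2";
	case MII_ADVERTISE: return "ADVERTISE";
	case MII_LPA:       return "LPA";
	case MII_EXPANSION: return "EXPANSION";
	default:            return NULL;
	}
}
```

The values provided by the header match the ones quoted from
atl1_hw.h above, so the driver-local copies add nothing.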

-- 
Ueimor

[PATCH 2.6.20-rc5 1/7] ehea: Fixed wrong dereferencation

Not only check the pointer against 0 but also the dereferenced value

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea.h  |2 +-
 drivers/net/ehea/ehea_main.c |6 --
 2 files changed, 5 insertions(+), 3 deletions(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea.h patched_kernel/drivers/net/ehea/ehea.h
--- linux-2.6.20-rc5/drivers/net/ehea/ehea.h	2007-01-12 19:54:26.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea.h	2007-01-19 13:56:41.0 +0100
@@ -39,7 +39,7 @@
 #include <asm/io.h>
 
 #define DRV_NAME	"ehea"
-#define DRV_VERSION	"EHEA_0043"
+#define DRV_VERSION	"EHEA_0044"
 
 #define EHEA_MSG_DEFAULT (NETIF_MSG_LINK | NETIF_MSG_TIMER \
 	| NETIF_MSG_RX_ERR | NETIF_MSG_TX_ERR)
diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-12 19:54:26.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 13:58:01.0 +0100
@@ -2471,14 +2471,16 @@ static int __devinit ehea_probe(struct i
 
 	adapter_handle = (u64*)get_property(dev->ofdev.node, "ibm,hea-handle",
 					    NULL);
-	if (!adapter_handle) {
+	if (adapter_handle)
+		adapter->handle = *adapter_handle;
+
+	if (!adapter->handle) {
 		dev_err(&dev->ofdev.dev, "failed getting handle for adapter"
 			" '%s'\n", dev->ofdev.node->full_name);
 		ret = -ENODEV;
 		goto out_free_ad;
 	}
 
-	adapter->handle = *adapter_handle;
 	adapter->pd = EHEA_PD_ID;
 
 	dev->ofdev.dev.driver_data = adapter;


[PATCH 2.6.20-rc5 2/7] ehea: Fixing firmware queue config issue

Fix to use exactly one queue for incoming packets in all
firmware configurations

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea_main.c |2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-19 13:59:07.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 14:01:38.0 +0100
@@ -998,7 +998,7 @@ static int ehea_configure_port(struct eh
 		      | EHEA_BMASK_SET(PXLY_RC_JUMBO_FRAME, 1);
 
 	for (i = 0; i < port->num_def_qps; i++)
-		cb0->default_qpn_arr[i] = port->port_res[i].qp->init_attr.qp_nr;
+		cb0->default_qpn_arr[i] = port->port_res[0].qp->init_attr.qp_nr;
 
 	if (netif_msg_ifup(port))
 		ehea_dump(cb0, sizeof(*cb0), "ehea_configure_port");



[PATCH 2.6.20-rc5 3/7] ehea: Modified initial autoneg state determination

Logical partitions are not allowed to (try to) set the autonegotiation status.
This patch removes the respective function call from the port setup function.

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea_main.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-19 14:02:20.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 14:11:30.0 +0100
@@ -642,6 +642,8 @@ int ehea_sense_port_attr(struct ehea_por
 		break;
 	}
 
+	port->autoneg = 1;
+
 	/* Number of default QPs */
 	port->num_def_qps = cb0->num_default_qps;
 
@@ -2334,8 +2336,6 @@ static int ehea_setup_single_port(struct
 
 	INIT_LIST_HEAD(&port->mc_list->list);
 
-	ehea_set_portspeed(port, EHEA_SPEED_AUTONEG);
-
 	ret = ehea_sense_port_attr(port);
 	if (ret)
 		goto out;


[PATCH 2.6.20-rc5 4/7] ehea: New method to determine number of available ports

Count OFDT nodes to determine the number of available ports
instead of using the possibly outdated value from the hypervisor

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea_main.c |   15 ++-
 1 files changed, 14 insertions(+), 1 deletion(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-19 14:12:31.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 14:15:53.0 +0100
@@ -2269,6 +2269,8 @@ static void ehea_tx_watchdog(struct net_
 int ehea_sense_adapter_attr(struct ehea_adapter *adapter)
 {
 	struct hcp_query_ehea *cb;
+	struct device_node *lhea_dn = NULL;
+	struct device_node *eth_dn = NULL;
 	u64 hret;
 	int ret;
 
@@ -2285,7 +2287,18 @@ int ehea_sense_adapter_attr(struct ehea_
 		goto out_herr;
 	}
 
-	adapter->num_ports = cb->num_ports;
+	/* Determine the number of available logical ports
+	 * by counting the child nodes of the lhea OFDT entry
+	 */
+	adapter->num_ports = 0;
+	lhea_dn = of_find_node_by_name(lhea_dn, "lhea");
+	do {
+		eth_dn = of_get_next_child(lhea_dn, eth_dn);
+		if (eth_dn)
+			adapter->num_ports++;
+	} while ( eth_dn );
+	of_node_put(lhea_dn);
+
 	adapter->max_mc_mac = cb->max_mc_mac - 1;
 	ret = 0;
 


[PATCH 2.6.20-rc5 5/7] ehea: Improved logging of permission issues

Disabled dump of hcall regs on some permission issues and
fixed appropriate misleading logmessages

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea_main.c |   16 +++-
 drivers/net/ehea/ehea_phyp.c |   10 --
 2 files changed, 15 insertions(+), 11 deletions(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-19 14:16:35.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 14:22:42.0 +0100
@@ -730,10 +730,7 @@ int ehea_set_portspeed(struct ehea_port 
 		}
 	} else {
 		if (hret == H_AUTHORITY) {
-			ehea_info("Hypervisor denied setting port speed. Either"
-				  " this partition is not authorized to set "
-				  "port speed or another partition has modified"
-				  " port speed first.");
+			ehea_info("Hypervisor denied setting port speed");
 			ret = -EPERM;
 		} else {
 			ret = -EIO;
@@ -1487,11 +1484,12 @@ out:
 
 static void ehea_promiscuous_error(u64 hret, int enable)
 {
-	ehea_info("Hypervisor denied %sabling promiscuous mode.%s",
-		  enable == 1 ? "en" : "dis",
-		  hret != H_AUTHORITY ? "" : " Another partition owning a "
-		  "logical port on the same physical port might have altered "
-		  "promiscuous mode first.");
+	if (hret == H_AUTHORITY)
+		ehea_info("Hypervisor denied %sabling promiscuous mode",
+			  enable == 1 ? "en" : "dis");
+	else
+		ehea_error("failed %sabling promiscuous mode",
+			   enable == 1 ? "en" : "dis");
 }
 
 static void ehea_promiscuous(struct net_device *dev, int enable)
diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_phyp.c patched_kernel/drivers/net/ehea/ehea_phyp.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_phyp.c	2007-01-12 19:54:26.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_phyp.c	2007-01-19 14:23:31.0 +0100
@@ -94,6 +94,7 @@ static long ehea_plpar_hcall9(unsigned l
 {
 	long ret;
 	int i, sleep_msecs;
+	u8 cb_cat;
 
 	for (i = 0; i < 5; i++) {
 		ret = plpar_hcall9(opcode, outs,
@@ -106,7 +107,13 @@ static long ehea_plpar_hcall9(unsigned l
 			continue;
 		}
 
-		if (ret < H_SUCCESS)
+		cb_cat = EHEA_BMASK_GET(H_MEHEAPORT_CAT, arg2);
+
+		if ((ret < H_SUCCESS) && !(((ret == H_AUTHORITY)
+		    && (opcode == H_MODIFY_HEA_PORT))
+		    && (((cb_cat == H_PORT_CB4) && ((arg3 == H_PORT_CB4_JUMBO)
+		    || (arg3 == H_PORT_CB4_SPEED))) || ((cb_cat == H_PORT_CB7)
+		    && (arg3 == H_PORT_CB7_DUCQPN)))))
 			ehea_error("opcode=%lx ret=%lx"
 				   " arg1=%lx arg2=%lx arg3=%lx arg4=%lx"
 				   " arg5=%lx arg6=%lx arg7=%lx arg8=%lx"
@@ -120,7 +127,6 @@ static long ehea_plpar_hcall9(unsigned l
 				   outs[0], outs[1], outs[2], outs[3],
 				   outs[4], outs[5], outs[6], outs[7],
 				   outs[8]);
-
 		return ret;
 	}
 


[PATCH 2.6.20-rc5 6/7] ehea: Added logging of associated errors

Added logging of error events associated with a specific queue pair

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea_main.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-19 14:25:38.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 14:31:34.0 +0100
@@ -558,12 +558,12 @@ static irqreturn_t ehea_qp_aff_irq_handl
 	u32 qp_token;
 
 	eqe = ehea_poll_eq(port->qp_eq);
-	ehea_debug("eqe=%p", eqe);
+
 	while (eqe) {
-		ehea_debug("*eqe=%lx", *(u64*)eqe);
-		eqe = ehea_poll_eq(port->qp_eq);
 		qp_token = EHEA_BMASK_GET(EHEA_EQE_QP_TOKEN, eqe->entry);
-		ehea_debug("next eqe=%p", eqe);
+		ehea_error("QP aff_err: entry=0x%lx, token=0x%x",
+			   eqe->entry, qp_token);
+		eqe = ehea_poll_eq(port->qp_eq);
 	}
 
 	return IRQ_HANDLED;


[PATCH 2.6.20-rc5 7/7] ehea: Fixed possible nullpointer access

Fixed possible nullpointer access in event queue processing

Signed-off-by: Thomas Klein [EMAIL PROTECTED]
---


 drivers/net/ehea/ehea_main.c |5 +++--
 1 files changed, 3 insertions(+), 2 deletions(-)


diff -Nurp -X dontdiff linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c patched_kernel/drivers/net/ehea/ehea_main.c
--- linux-2.6.20-rc5/drivers/net/ehea/ehea_main.c	2007-01-19 14:33:04.0 +0100
+++ patched_kernel/drivers/net/ehea/ehea_main.c	2007-01-19 14:36:05.0 +0100
@@ -575,8 +575,9 @@ static struct ehea_port *ehea_get_port(s
 	int i;
 
 	for (i = 0; i < adapter->num_ports; i++)
-		if (adapter->port[i]->logical_port_id == logical_port)
-			return adapter->port[i];
+		if (adapter->port[i])
+			if (adapter->port[i]->logical_port_id == logical_port)
+				return adapter->port[i];
 	return NULL;
 }
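
The guarded lookup can be tried outside the kernel with a small C
sketch. All demo_* names and the array size are hypothetical stand-ins
for the ehea structures, which are not reproduced here.

```c
#include <stddef.h>

#define DEMO_MAX_PORTS 4	/* made-up size for the example */

struct demo_port {
	int logical_port_id;
};

struct demo_adapter {
	int num_ports;
	struct demo_port *port[DEMO_MAX_PORTS];	/* slots may be NULL */
};

/* Return the port with the given logical id, skipping empty slots. */
struct demo_port *demo_get_port(struct demo_adapter *adapter,
				int logical_port)
{
	int i;

	for (i = 0; i < adapter->num_ports; i++) {
		struct demo_port *p = adapter->port[i];

		if (p && p->logical_port_id == logical_port)
			return p;
	}
	return NULL;
}
```

Without the check on p, the first empty slot in the array would be
dereferenced before any later, valid port is reached.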
 


Re: Suspend to RAM generates oops and general protection fault

2007-01-22 Thread Rafael J. Wysocki

Hi,

On Monday, 22 January 2007 03:34, Jean-Marc Valin wrote:
> Hi,
>
> I just encountered the following oops and general protection fault
> trying to suspend/resume my laptop. I've got a Dell D820 laptop with a 2
> GHz Core 2 Duo CPU. It usually suspends/resumes fine but not always. The
> relevant errors are below but the full dmesg log is at
> http://people.xiph.org/~jm/suspend_resume_oops.txt and my config is in
> http://people.xiph.org/~jm/config-2.6.20-rc5.txt
>
> This happens when I'm running 2.6.20-rc5. The previous kernel version I
> was using is 2.6.19-rc6 and was much more broken (second attempt
> *always* failed), so it's probably not a regression.

This is a shot against the odds, but could you please check if the attached
patch has any effect?

Rafael


-- 
If you don't have the time to read,
you don't have the time or the tools to write.
- Stephen King
Both process_zones() and drain_node_pages() check for populated zones before
touching pagesets. However, __drain_pages() does not do so.

This may result in a NULL pointer dereference for pagesets in unpopulated
zones if a NUMA setup is combined with cpu hotplug.

Initially the unpopulated zone has the pcp pointers pointing to the boot
pagesets.  Since the zone is not populated the boot pageset pointers will
not be changed during page allocator and slab bootstrap.

If a cpu is later brought down (first call to __drain_pages()) then the pcp
pointers for cpus in unpopulated zones are set to NULL since __drain_pages
does not first check for an unpopulated zone.

If the cpu is then brought up again then we call process_zones() which will ignore
the unpopulated zone. So the pageset pointers will still be NULL.

If the cpu is then again brought down then __drain_pages will attempt to drain
pages by following the NULL pageset pointer for unpopulated zones.

Signed-off-by: Christoph Lameter [EMAIL PROTECTED]

---
 mm/page_alloc.c |3 +++
 1 file changed, 3 insertions(+)

Index: linux-2.6.20-rc4/mm/page_alloc.c
===
--- linux-2.6.20-rc4.orig/mm/page_alloc.c
+++ linux-2.6.20-rc4/mm/page_alloc.c
@@ -714,6 +714,9 @@ static void __drain_pages(unsigned int c
 	for_each_zone(zone) {
 		struct per_cpu_pageset *pset;
 
+		if (!populated_zone(zone))
+			continue;
+
 		pset = zone_pcp(zone, cpu);
 		for (i = 0; i < ARRAY_SIZE(pset->pcp); i++) {
 			struct per_cpu_pages *pcp;
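
The failure mode described in the changelog can be modelled in a few
lines of userspace C. This is only a sketch: the demo_* types are
hypothetical simplifications, with a single page count standing in for
the per-cpu pagesets, and a NULL pointer standing in for the state an
unpopulated zone is left in after the first drain.

```c
#include <stdbool.h>
#include <stddef.h>

#define DEMO_NR_ZONES 3

struct demo_pageset {
	int count;			/* pages currently held */
};

struct demo_zone {
	unsigned long present_pages;	/* 0 means unpopulated */
	struct demo_pageset *pageset;	/* NULL for unpopulated zones */
};

static bool demo_populated_zone(const struct demo_zone *z)
{
	return z->present_pages != 0;
}

/* Drain every populated zone; returns the number of pages released. */
int demo_drain_pages(struct demo_zone *zones, size_t nr)
{
	int drained = 0;

	for (size_t i = 0; i < nr; i++) {
		struct demo_zone *z = &zones[i];

		if (!demo_populated_zone(z))
			continue;	/* never follow a NULL pageset */

		drained += z->pageset->count;
		z->pageset->count = 0;
	}
	return drained;
}
```

Dropping the demo_populated_zone() check makes the loop dereference the
NULL pageset of the unpopulated zone, which is the crash the patch
avoids.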

Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-22 Thread Nick Piggin


Robert P. J. Day wrote:
> On Mon, 22 Jan 2007, Nick Piggin wrote:
>
> > Robert P. J. Day wrote:
> >
> > > by adding (temporarily) the definitions of TRUE and FALSE to
> > > types.h, you should then (theoretically) be able to delete over
> > > 100 instances of those same macros being *defined* throughout the
> > > source tree. you're not going to be deleting the hundreds and
> > > hundreds of *uses* of TRUE and FALSE (not yet, anyway) but, at the
> > > very least, by adding two lines to types.h, you can delete all
> > > those redundant *definitions* and make sure that nothing breaks.
> > > (it shouldn't, of course, but it's always nice to be sure.)
> >
> > Doesn't seem very worthwhile, and it legitimises this definition
> > we're trying to get rid of.
>
> h ... apparently, you totally missed my use of the important
> word "temporarily":

No, I didn't.

--
SUSE Labs, Novell Inc.



Re: SATA exceptions with 2.6.20-rc5

2007-01-22 Thread Chr

On Monday, 22. January 2007 03:39, Tejun Heo wrote:
> Hello,
>
> Chr wrote:
> > Ok, you won't believe this... I opened my case and rewired my drives...
> > And guess what, my second (aka the good) HDD is now failing!
> > I guess, my mainboard has a (but maybe two, or three :( ) bad
> > sata-port(s)!
>
> Or, you have power related problem.  Try to rewire the power lines or
> connect harddrives to a separate powersupply.  It's often useful to
> change one component at a time and watch which change the problem
> follows.  Anyways, you seem to be suffering transmission failures, not a
> driver problem.
>
> Thanks.

Yes and no, it's probably not a power problem, I've tried another
PSU with the same result :( . Furthermore, the RAID0 setup makes
it impossible to try only one drive alone :(.

Anyway, the WD2500KS is known to have some strange bugs in the FW,
e.g. it reports 255°C right after a cold start
( http://www.bugtrack.almico.com/view.php?id=468 ).

Thanks,
Chr.
 

Re: configfs: return value for drop_item()/make_item()?

2007-01-22 Thread Michael Noisternig


Thanks for your reply again! See comments inline...

Joel Becker wrote:
> > I fully agree with the idea of configfs not being allowed to destroy
> > user-created objects. OTOH, while configfs is described as a filesystem
> > for user-created objects under user control, compared to sysfs as a
> > filesystem for kernel-created objects under kernel control, configfs
> > _does_ permit kernel-created objects in a limited way (by filling in
> > struct config_group->default_groups), and these objects can only be
> > destroyed again by the kernel, and not by the user.
>
> They are not created by a kernel action.  They are created as a
> direct result of a user action.  The user must mkdir(2) the parent in
> the chain.  Only then do these default_groups appear.  Contrast sysfs,
> where filesystem structures can be added at any time, from any codepath,
> via the sysfs in-kernel API.

Sure, but what I meant to say was that the user, when creating a
directory, did not request creation of such sub-directories, so I see
them as created by the kernel.

If you argue that they are in fact created by the user because they are
a direct result of a user action, then I can apply the same argument to
this one example:

> > For another example, and directly related to above link, suppose
> > having an object with a number of attributes, one of them being
> > called 'type'. Dependent on the value of 'type' this object may
> > contain a particular type of sub-object (with type-dependent
> > attributes). E.g. 'type' may be empty | 'a' | 'b' | 'ab', then
> > dependent on this there should be 0 or 1 directories called 'a',
> > and 0 or 1 directories called 'b'. Doing it this way means that
> > while the user decides what sub-directory/-ies the object has, he
> > does not create them (directly) - it is the kernel which creates
> > the object, and as such it is also the kernel only which is
> > permitted to destroy the object again - by the user writing a
> > different value to the 'type' attribute (or clearing it). sysconfs
> > could solve this.
>
> This is precisely what configfs is designed to forbid. The kernel
> does not, ever, create configfs objects on its own. It does it as a
> result of userspace action.

No. The sub-directory only appears as a direct result of the user
writing a value into the 'type' attribute. ;-)

> If you want the following:
>
> # cd mydir
> # ls -l
> -rw-r--r-- 1 root root 0 2006-12-28 07:11 type
> # echo 'ab' > type
> # ls -l mydir
> drwxr-xr-x 2 root root 4096 2007-01-08 14:21 ab
> -rw-r--r-- 1 root root 2 2007-01-08 14:21 type
> # echo '' > type
> # ls -l mydir
> -rw-r--r-- 1 root root 0 2007-01-08 14:22 type
>
> you're never going to get it from configfs. You should be using
> sysfs.

Hardly. sysfs doesn't allow the user creating directories. :

> > As such I don't understand fully why one doesn't want to merge sysfs and
> > configfs (let's call it sysconfs for now) in such a way that it allows
> > _both_ user- and kernel-created objects, with user-objects only allowed
> > to be destroyed by the user and kernel-objects only by the kernel.
>
> The programming interface is very, very different.  Check out
> historical messages on this topic:
>
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg95708.html
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg95711.html
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg95714.html
> http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg95717.html

Well, you could still use type (user object/kernel object) dependent
structure pointers?

> > Often however, what you want is that an object may contain 0 or 1 other
> > objects. If ->make_item()/make_group() would allow returning a
> > meaningful error code the kernel could deny creation of a 2nd object
> > other than by pretending to be out of memory.
>
> You make a reasonable case that ENOMEM isn't always the error
> you want, but perhaps we can add a better umbrella error code?  I'm wary
> of introducing PTR_ERR() or any other complexity if we don't _need_ it.
> I'm all for thoughts on possible compromises.

> > I was thinking about
> > ssize_t make_item(struct config_group *group, const char *name, struct
> > config_item **new_item)
> > with return value 0 meaning no-error.
>
> Sure, it's another way to go, but it's effectively the same
> thing.


Well, you don't need PTR_ERR().
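
To make the proposal concrete, here is a userspace C sketch of the
out-parameter style. It is not the configfs API: the demo_* types are
hypothetical stand-ins for config_group and config_item, the return
type is int rather than the ssize_t suggested above, and -EEXIST is
just one plausible error code for "this group already has its one
child".

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

struct demo_item {
	const char *name;
};

struct demo_group {
	struct demo_item *child;	/* holds 0 or 1 children */
};

/*
 * Create the single child of a group.  Returns 0 and stores the new
 * item in *new_item on success, or a negative errno value without
 * touching *new_item on failure.
 */
int demo_make_item(struct demo_group *group, const char *name,
		   struct demo_item **new_item)
{
	struct demo_item *item;

	if (group->child)
		return -EEXIST;		/* a real reason, not fake ENOMEM */

	item = malloc(sizeof(*item));
	if (!item)
		return -ENOMEM;

	item->name = name;
	group->child = item;
	*new_item = item;
	return 0;
}
```

The caller only ever compares the return value with 0, so no pointer
has to be decoded to find out why creation was refused.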


> > I was thinking about having symlinks in a directory and deriving the
> > order by the symlinks' filenames, too. I dismissed it originally for two
> > reasons. First, I didn't see how to keep the order when some link gets
> > deleted, e.g. there's 1,2,3 and then link 2 gets deleted. Now, thinking
> > about it again, I can simply keep an ordered linked list internally, and
> > therefrom remove the node for link 2. But it's still not perfect,
> > because how do I insert a link between filenames 1 and 2? Ok, I have to
> > delete all symlinks first and then rebuild them, and in the end it's
> > like rewriting a params_list attribute file... except that it's not
> > atomic. Second, a simple params_list file seems a lot more easy to
> > handle from the user perspective... simply open

Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-22 Thread Alan Cox

On Mon, Jan 22, 2007 at 12:07:11PM +0100, Christoph Hellwig wrote:
> > process.  This year, the Kernel Summit will be held in Cambridge,
> > England, at the DeVere University Arms Hotel, September 5-6 (with a
> > welcome reception on the 4th).  The decision to move the Kernel Summit
> > to England is a one-year experiment based on the very strong request of
> > last year's kernel summit attendees to try a location outside of Ottawa,
> > and especially from the roughly 1/3rd of the attendees that come from
> > the UK or Europe.  So the plan is for us to book the Ottawa Congress
> > Center space for July 2008 (which we will need to do by mid-year 2007),

Ditto..

Definitely disagree with that. I'd like to see the conference somewhere
else different this time - perhaps Czech Republic, or somewhere else more
easterly and Linux active (or even Finland...)

> While we're at it it would be nice to get rid of all that usenix

Well if you want to organise and fund it yourself 8)


Re: [ANNOUNCE] System Inactivity Monitor v1.0

2007-01-22 Thread Alessandro Di Marco

Pavel Machek [EMAIL PROTECTED] writes:

> > +if [ ! -d /proc/sin ]; then
> > +echo "/proc/sin not found, has sinmod been loaded?"
> > +exit
> > +fi
>
> No new /proc files, please.

This was merely a prototype realized in a hurry, not a production
driver. Really, I didn't think it could be interesting for anybody.

Would /sys be ok?

> > +cat <<EOF
> > +
> > +SIN wakes up periodically and checks for user activity occurred in the
> > +meantime; this options lets you to specify how much frequently SIN should be
> > +woken-up. Its value is expressed in tenth of seconds.
>
> Heh. We'll waste power trying to save it.

Well, not just a power saver. For example I use SIN to auto-logoff my bash
session as well (detaching the screen session.)

> If you have to hook it into kernel, can you at least do it properly?

Of course. You can find attached a patch fixing this. Now SIN wakes up just
when it expects to do something: if in the meantime the user interacts with the
system, SIN simply recalculates the next wake-up time on the basis of the last
user's activity date and goes to sleep again.
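
The rescheduling idea can be sketched in userspace C as follows. This
is an illustration, not code from the attached patch:
demo_next_wakeup() and its parameters are hypothetical, and all times
are assumed to be in tenths of a second, like the rule targets.

```c
#include <stdint.h>

/*
 * Given the inactivity targets of all rules, the current time and the
 * time of the last user activity, return how long to sleep until the
 * next rule can trigger, or -1 if every rule has already triggered.
 */
int64_t demo_next_wakeup(const uint32_t *targets, int n,
			 uint64_t now, uint64_t last_activity)
{
	uint64_t idle = now - last_activity;	/* inactivity so far */
	int64_t best = -1;

	for (int i = 0; i < n; i++) {
		if (targets[i] > idle) {
			int64_t delay = (int64_t)(targets[i] - idle);

			if (best < 0 || delay < best)
				best = delay;
		}
	}
	return best;
}
```

With rules at 600 and 3000 ticks and 200 ticks of inactivity so far,
the next wake-up is 400 ticks away; any user activity in between just
moves last_activity forward and the delay is computed again.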

Best,

---
 gentable |   72 +-
 procfs.c |2 +-
 sin.c|   68 
 sin.h|   36 -
 table.c  |  132 --
 table.h  |   21 +-
 6 files changed, 176 insertions(+), 155 deletions(-)

diff --git a/gentable b/gentable
index 44b4f77..3a322df 100755
--- a/gentable
+++ b/gentable
@@ -31,23 +31,9 @@ fi

 cat <<EOF

-SIN wakes up periodically and checks whether user activity has occurred
-since it last ran; the next option lets you to specify how frequently
-SIN should wake up. Its value is expressed in tenth of seconds.
-
-EOF
-
-input "Pace ticks?" pace
-
-if [ -z ${pace} ]; then
-pace=10
-fi
-
-cat <<EOF
-
-Asleep or not, SIN constantly monitors the input devices watching for user
-activity. The next option lets you choose which device have to be monitored.
-You must specify at least one device and must not specify duplicates.
+SIN constantly monitors the input devices watching for user activity. This
+option lets you choose which device have to be monitored. You must specify at
+least one device and must not specify duplicates.

 EOF

@@ -65,8 +51,8 @@ devices=(${devs})

 cat <<EOF

-SIN produces ACPI events depending on the user activity. You must
-specify a suitable handler that will be used as originator.
+SIN produces ACPI events depending on the user activity. You must specify a
+suitable handler that will be used as originator.

 EOF

@@ -83,18 +69,17 @@ fi
 cat <<EOF

 SIN produces events based on rules. Each rule is a triple composed by a
-counter, a type, and a data value. When SIN awakens, a global counter
-is increased if SIN detects no user activity and reset to zero, otherwise.
-When this global counter reaches the value specified in the counter field
-of a rule, an event is generated with the corresponding type and data.
-Different rules should have different type and data fields to convey
-different signals to the user space daemon.
+target, a type, and a data value. The target field is a timeout in
+tenth of seconds specifying the minimum period of user inactivity needed to
+trigger the rule. When a rule triggers, an event is generated with the
+corresponding type and data.  Different rules should have different type
+and data fields to convey different signals to the user space daemon.

-For example, the rule 60 1 19 produces the ACPI event  0001
-0019 when SIN recognizes one minute of user inactivity (assuming pace=10.)
+For example, the rule 600 1 19 produces the ACPI event  0001
+0019 when SIN recognizes one minute of user inactivity.

-Please specify each rule as a space-separated triple on a separate line;
-when finished, just press enter.
+Please specify each rule as a space-separated triple on a separate line; when
+finished, just press enter.

 EOF

@@ -114,9 +99,9 @@ fi

 cat <<EOF

-A special event has been provided to simplify using SIN
-as a screen-blanker. It will be generated as soon as some user activity is
-detected, but only after one or more rules have been triggered.
+A special event has been provided to simplify using SIN as a screen-blanker. It
+will be generated as soon as some user activity is detected, but only after one
+or more rules have been triggered.

 EOF

@@ -128,15 +113,14 @@ fi

 cat <<EOF

-Often an SIN event results in suspending or hibernating the system,
-hibernate, requiring user interaction to wake-up the system. Unfortunately
-that interaction occurs when SIN, as well as the kernel, cannot capture
-it. As a consequence, no event will ever be generated and
-the system will remain in the state associated with the next-to-last rule
-(e.g. blanked screen, wireless powered off, etc.). The next option
-allows you to request a special event, resetting the global
-counter to an arbitrary value, so to restart the

Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-22 Thread Theodore Tso

On Mon, Jan 22, 2007 at 07:45:02AM -0500, Alan Cox wrote:
 
 Definitely disagree with that. I'd like to see the conference somewhere
 else different this time - perhaps Czech Republic, or somewhere else more
 easterly and Linux active (or even Finland...)
 

Understand that one piece of feedback I get from the keepers of the
corporate travel budgets is that money for sending employees to exotic
locations is finite --- which is why we haven't tried pairing the
kernel summit with linux.conf.au.  Cambridge works out because there
are relatively cheap flights to Amsterdam and then you can take a
cheap Ryanair flight to Stansted.  Still, the fact that it isn't
paired with another conference means that we are getting some
expressions of unhappiness from other Kernel Summit stakeholders.
It's for that reason that (a) I'm trying to line up some folks who
might be interested in trying to put together a relatively small,
2-day technical conference after the Kernel Summit, which can
hopefully serve as a seed for something like OLS and LCA in UK/Europe,
and (b) I've told folks that moving it away from Cambridge is a
one-time experiment, after which point we will re-evaluate.

I understand that if it were only up to us developers, we'd want to
have the conference in Honolulu, or perhaps in Australia or New
Zealand.  Unfortunately there are other stakeholders and other financial
realities involved.

  While we're at it it would be nice to get rid of all that usenix
 
 Well if you want to organise and fund it yourself 8)

The sponsors help pay for the conference venue, as well as travel
scholarships for those people who don't have corporate affiliations,
or whose companies refuse to pay their travel, and for whom it is
important that they be there.  One of my concerns is that if we have too
many kernel developers whose employers refuse to pay travel, we won't have
enough travel scholarship money.

It's a somewhat tricky balancing act.

- Ted
-
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

 hopefully serve as a seed for something like OLS and LCA in UK/Europe,
 and (b) I've told folks that the moving it away from Cambridge is a
 one-time experiment, after which point we will re-evaluate.

Perhaps that will work out for the best; it may be that the right answer
long term is to alternate anyway?


serial console problem in linux-2.6-20

2007-01-22 Thread Suresh Chandra Mannava


Hi All,

I am working on porting the linux-2.6.20-rc2 (DENX) kernel to our board. It
consists of a PowerPC MPC7410, an IBM CPC700 system controller and a couple of
AMD 79C972 network chips.
I am using the gcc version 4.0.0 (DENX ELDK 4.0 4.0.0) cross compiler for this
task.
I followed the IBM Spruce port, which also uses the CPC700. The CPC700 serial
port is 16550 compatible.
I can see printk output on the serial console up to "Freeing unused kernel
memory", which happens just before init starts.
I enabled the debug statements in 8250.c and found messages like
"serial8250_interrupt(3)...end." and then the kernel freezes (I have attached
the serial console messages). ttyS0 is using interrupt 3.

I assume it is not a tool chain or ramdisk image problem because I ported
linux-2.4 (DENX) with the same tool chain and ramdisk image.
Serial console is working fine in linux-2.4.

Could you please provide some pointers on this?

Thanks,
Suresh


Total memory = 128MB; using 256kB for hash table (at c028)
Linux version 2.6.20-rc5 ([EMAIL PROTECTED]) (gcc version 4.0.0 (DENX ELDK 4.0 4.0.0)) #28 Sat Jan 20 21:26:52 IST 2007
System Identification: Cornet CSVG4 Linux Boot
Zone PFN ranges:
 DMA 0 -32768
 Normal  32768 -32768
early_node_map[1] active PFN ranges
   0:0 -32768
Built 1 zonelists.  Total pages: 32512
Kernel command line: console=ttyS0,57600 root=/dev/ram0 rw
PID hash table entries: 512 (order: 9, 2048 bytes)
Console: colour dummy device 80x25
Dentry cache hash table entries: 16384 (order: 4, 65536 bytes)
Inode-cache hash table entries: 8192 (order: 3, 32768 bytes)
Memory: 126492k available (1796k kernel code, 480k data, 112k init, 0k highmem)
Calibrating delay loop... 731.13 BogoMIPS (lpj=1462272)
Mount-cache hash table entries: 512
NET: Registered protocol family 16
PCI: Probing PCI hardware
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 4096 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 1, 8192 bytes)
TCP: Hash tables configured (established 4096 bind 2048)
TCP reno registered
checking if image is initramfs...it isn't (no cpio magic); looks like an initrd
Freeing initrd memory: 637k freed
io scheduler noop registered
io scheduler anticipatory registered
io scheduler deadline registered
io scheduler cfq registered (default)
Serial: 8250/16550 driver $Revision: 1.90 $ 2 ports, IRQ sharing disabled
ttyS0: autoconf (0x, 0xff600300): [garbled serial output] type=16550A
serial8250: ttyS0 at MMIO 0x0 (irq = 3) is a 16550A
ttyS1: autoconf (0x, 0xff600400): iir=3 iir1=6 iir2=6 type=16550A
serial8250: ttyS1 at MMIO 0x0 (irq = 4) is a 16550A
RAMDISK driver initialized: 16 RAM disks of 4096K size 1024 blocksize
nbd: registered device at major 43
pcnet32.c:v1.33 27.Jun.2006 [EMAIL PROTECTED]
pcnet32: PCnet/FAST+ 79C972 at 0x3ffefe0, 00 00 00 00 00 00
   tx_start_pt(0x0c00):~220 bytes, BCR18(9861):BurstWrEn BurstRdEn NoUFlow
   SRAMSIZE=0x, SRAM_BND=0x, assigned IRQ 22.
eth0: registered as PCnet/FAST+ 79C972
pcnet32: PCnet/FAST+ 79C972 at 0x3ffefc0, 00 00 00 00 00 00
   tx_start_pt(0x0c00):~220 bytes, BCR18(9861):BurstWrEn BurstRdEn NoUFlow
   SRAMSIZE=0x, SRAM_BND=0x, assigned IRQ 23.
eth1: registered as PCnet/FAST+ 79C972
pcnet32: 2 cards_found.
mice: PS/2 mouse device common for all mice
IPv4 over IPv4 tunneling driver
GRE over IPv4 tunneling driver
TCP cubic registered
Initializing XFRM netlink socket
NET: Registered protocol family 1
NET: Registered protocol family 10
lo: Disabled Privacy Extensions
IPv6 over IPv4 tunneling driver
NET: Registered protocol family 17
NET: Registered protocol family 15
ieee80211: 802.11 data/management/control stack, git-1.1.13
ieee80211: Copyright (C) 2004-2005 Intel Corporation [EMAIL PROTECTED]
RAMDISK: Compressed image found at block 0
VFS: Mounted root (ext2 filesystem).
Freeing unused kernel memory: 112k init
serial8250_interrupt(3)...end.
serial8250_interrupt(3)...end.
serial8250_interrupt(3)...end.
serial8250_interrupt(3)...end.

Re: SATA ahci Bug in 2.6.19.x

2007-01-22 Thread Stephen Evanchik


On 1/22/07, Stefan Priebe - FH [EMAIL PROTECTED] wrote:


I have an Asus A8V mainboard which works wonderfully with a 2.6.18.x kernel.
But I cannot use the SATA controller with a 2.6.19.x kernel.


I also have an Asus A8V motherboard that cannot boot a newer kernel
because the SATA controller does not come up properly. I have tried
kernels 2.6.19.2 and 2.6.20-rc5 with no luck. It looks like later
kernels don't recognize the proper IRQ of the device as compared to
the 2.6.18 boot logs.


ACPI: PCI Interrupt :00:0f.0[B] -> GSI 21 (level, low) -> IRQ 21
ahci :00:0f.0: AHCI 0001. 32 slots 4 ports 3 Gbps 0xf impl IDE mode
ahci :00:0f.0: flags: 64bit ncq pm led clo pmp pio slum part 
ata1: SATA max UDMA/133 cmd 0xC2004D00 ctl 0x0 bmdma 0x0 irq 1277
ata2: SATA max UDMA/133 cmd 0xC2004D80 ctl 0x0 bmdma 0x0 irq 1277
ata3: SATA max UDMA/133 cmd 0xC2004E00 ctl 0x0 bmdma 0x0 irq 1277
ata4: SATA max UDMA/133 cmd 0xC2004E80 ctl 0x0 bmdma 0x0 irq 1277


Similar output as above.


Does anyone have any ideas?


Stephen

Re: i810fb fails to load

2007-01-22 Thread Thomas Hellström


Andrew Morton wrote:

On Mon, 15 Jan 2007 00:52:36 +0100 Tilman Schmidt [EMAIL PROTECTED] wrote:
With kernel 2.6.20-rc4-mm1 and all hotfixes, i810fb fails to load on my
Dell Optiplex GX110. Here's an excerpt of the diff between the boot logs
of 2.6.20-rc5 (working) and 2.6.20-rc4-mm1 (non-working):

@@ -4,7 +4,7 @@
 No module symbols loaded - kernel modules not enabled.

 klogd 1.4.1, log source = ksyslog started.
-5Linux version 2.6.20-rc5-noinitrd ([EMAIL PROTECTED]) (gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)) #2 PREEMPT Sun Jan 14 23:37:12 CET 2007
+5Linux version 2.6.20-rc4-mm1-noinitrd ([EMAIL PROTECTED]) (gcc version 4.0.2 20050901 (prerelease) (SUSE Linux)) #3 PREEMPT Sun Jan 14 21:08:56 CET 2007
 6BIOS-provided physical RAM map:
 4sanitize start
 4sanitize end
@@ -188,7 +192,6 @@
 6ACPI: Interpreter enabled
 6ACPI: Using PIC for interrupt routing
 6ACPI: PCI Root Bridge [PCI0] (:00)
-7PCI: Probing PCI hardware (bus 00)
 6ACPI: Assume root bridge [\_SB_.PCI0] bus is 0
 7Boot video device is :00:01.0
 4PCI quirk: region 0800-087f claimed by ICH4 ACPI/GPIO/TCO
@@ -238,20 +241,15 @@
 6isapnp: No Plug & Play device found
 6Real Time Clock Driver v1.12ac
 6Intel 82802 RNG detected
-6Linux agpgart interface v0.101 (c) Dave Jones
+6Linux agpgart interface v0.102 (c) Dave Jones
 6agpgart: Detected an Intel i810 E Chipset.
 6agpgart: detected 4MB dedicated video ram.
 6agpgart: AGP aperture is 64M @ 0xf800
 4ACPI: PCI Interrupt Link [LNKA] enabled at IRQ 9
 7PCI: setting IRQ 9 as level-triggered
 6ACPI: PCI Interrupt :00:01.0[A] -> Link [LNKA] -> GSI 9 (level, low) -> IRQ 9
-4i810-i2c: Probe DDC1 Bus
-4i810fb_init_pci: DDC probe successful
-4Console: switching to colour frame buffer device 160x64
-4I810FB: fb0 : Intel(R) 810E Framebuffer Device v0.9.0
-4I810FB: Video RAM   : 4096K
-4I810FB: Monitor : H: 30-83 KHz V: 55-75 Hz
-4I810FB: Mode: [EMAIL PROTECTED]
+4i810fb_alloc_fbmem: can't bind framebuffer memory
+4i810fb: probe of :00:01.0 failed with error -16
 6Serial: 8250/16550 driver $Revision: 1.90 $ 4 ports, IRQ sharing enabled
 6serial8250: ttyS0 at I/O 0x3f8 (irq = 4) is a 16550A
 6serial8250: ttyS1 at I/O 0x2f8 (irq = 3) is a 16550A

Please let me know if you need more information.




Don't know.  But I bet someone on the Cc does...
  

Tilman,
Thanks for reporting.
Can you try the attached patch to see if it fixes the problem?

Regards,
Thomas Hellström


diff --git a/drivers/char/agp/generic.c b/drivers/char/agp/generic.c
index 91c1f36..6ef0960 100644
--- a/drivers/char/agp/generic.c
+++ b/drivers/char/agp/generic.c
@@ -190,6 +190,7 @@ struct agp_memory *agp_create_memory(int
 		return NULL;
 	}
 	new->num_scratch_pages = scratch_pages;
+	new->type = AGP_NORMAL_MEMORY;
 	return new;
 }
 EXPORT_SYMBOL(agp_create_memory);
diff --git a/drivers/char/agp/intel-agp.c b/drivers/char/agp/intel-agp.c
index b8896c8..5a0713c 100644
--- a/drivers/char/agp/intel-agp.c
+++ b/drivers/char/agp/intel-agp.c
@@ -260,6 +260,7 @@ static int intel_i810_insert_entries(str
 		readl(intel_i810_private.registers+I810_PTE_BASE+((i-1)*4));
 		break;
 	case AGP_PHYS_MEMORY:
+	case AGP_NORMAL_MEMORY:
 		if (!mem->is_flushed)
 			global_cache_flush();
 		for (i = 0, j = pg_start; i < mem->page_count; i++, j++) {

Re: [PATCH] select: fix sys_select to not leak ERESTARTNOHAND to userspace

On Tue, 16 Jan 2007 15:13:32 -0500
Neil Horman [EMAIL PROTECTED] wrote:

 As it is currently written, sys_select checks its return code to convert
 ERESTARTNOHAND to EINTR.  However, the check is within an if (tvp) clause, and
 so if select is called from userspace with a NULL timeval, then it is possible
 for the ERESTARTNOHAND errno to leak into userspace, which is incorrect.  This
 patch moves that check outside of the conditional, and prevents the errno 
 leak.

The ERESTARTNOHAND thing is handled in arch-specific signal code;
syscalls can return -ERESTARTNOHAND as much as they want (and your
change breaks the current behaviour of select()).

For example:

arch/x86_64/kernel/signal.c

	/* Are we from a system call? */
	if ((long)regs->orig_rax >= 0) {
		/* If so, check system call restarting.. */
		switch (regs->rax) {
			case -ERESTART_RESTARTBLOCK:
			case -ERESTARTNOHAND:
				regs->rax = -EINTR;
				break;
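
To illustrate the point in isolation, here is a small userspace model of that
conversion. It is only a sketch: fixup_syscall_return is a hypothetical name,
the ERESTART* values are copied here as assumptions from the kernel-internal
errno definitions (they are not exported to userspace), and the other restart
codes that the real switch handles are left out. It models only the case where
a handler is about to run.

```c
#include <errno.h>

/* Kernel-internal restart codes; assumed values, never meant to reach userspace. */
#ifndef ERESTARTNOHAND
#define ERESTARTNOHAND         514
#endif
#ifndef ERESTART_RESTARTBLOCK
#define ERESTART_RESTARTBLOCK  516
#endif

/*
 * Model of the fix-up done when a signal handler is about to run:
 * a system call that returned one of these restart codes is made to
 * return -EINTR instead. Any other value is passed through untouched.
 */
long fixup_syscall_return(long ret)
{
    switch (ret) {
    case -ERESTART_RESTARTBLOCK:
    case -ERESTARTNOHAND:
        return -EINTR;
    default:
        return ret;
    }
}
```

So a select() that returns -ERESTARTNOHAND internally is seen by the caller as
failing with EINTR once the handler has run, without sys_select itself doing
the translation.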

-- 
Paolo Ornati
Linux 2.6.20-rc5 on x86_64

Re: [RFC] [PATCH] Power S3 Resume Optimization Patch. Request for Comment

2007-01-22 Thread Pavel Machek

Hi!

 My initial idea was to execute only block device resume on the separate
 thread, as it take almost 80% of the total device resume time ( I did

If you do this in one block driver that is slow for you (sata?), then it is
probably acceptable. (Maintainer decides.) I'd encourage that option.

If you want to do it for _all_ block devices, you'll probably have to
audit all of them. _Lot_ of work.
Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html

Re: 2.6.19.2, cp 18gb_file 18gb_file.2 = OOM killer, 100% reproducible (multi-threaded USB no go)

2007-01-22 Thread Justin Piszcz



On Sun, 21 Jan 2007, Greg KH wrote:

 On Sun, Jan 21, 2007 at 12:29:51PM -0500, Justin Piszcz wrote:
  
  
  On Sun, 21 Jan 2007, Justin Piszcz wrote:
  
   
   

Good luck,
Jurriaan
-- 
 What does ELF stand for (in respect to Linux?)
ELF is the first rock group that Ronnie James Dio performed with back in
the early 1970's.  In contrast, a.out is a misspelling of the French word
for the month of August.  What the two have in common is beyond me, but
Linux users seem to use the two words together.
seen on c.o.l.misc
Debian (Unstable) GNU/Linux 2.6.20-rc5 2x2011 bogomips load 0.83
the Jack Vance Integral Edition: http://www.integralarchive.org

   
   Thanks, I'll give it another go in a bit!
   
   Justin.
   -
  
  Running 2.6.20-rc5 now, the multi-threaded USB probing causes my UPS not 
  to work, probably because udev has problems or something, it is also the 
  only USB device I have plugged into the machine.
 
 multi-threaded USB is about to go away as it caused too many problems
 for people, and they didn't read the Kconfig help entry about it :(
 
 thanks,
 
 greg k-h
 

Ah-- ok-- still experiencing the copy bug though.  When I copy an 18gb 
file to 18gbfile.2 on the same volume, havoc ensues.

Justin.

Re: change strip_cache_size freeze the whole raid

2007-01-22 Thread Justin Piszcz



On Mon, 22 Jan 2007, kyle wrote:

 Hi,
 
 Yesterday I tried to increase the value of stripe_cache_size to see whether I
 could get better performance. I increased the value from 2048 to something
 like 16384. After I did that, the raid5 array froze. Any process reading from
 or writing to it got stuck in D state. I tried to change it back to 2048, read
 stripe_cache_active, cat /proc/mdstat, mdadm stop, etc. None of them returned.
 I could not even shut down the machine. In the end I had to press the reset
 button to regain control.
 
 Kernel is 2.6.17.8 x86-64, running at AMD Athlon3000+, 2GB Ram, 8 x Seagate
 8200.10 250GB HDD, nvidia chipset.
 
 cat /proc/mdstat (after reboot):
 Personalities : [raid1] [raid5] [raid4]
 md1 : active raid1 hdc2[1] hda2[0]
  6144768 blocks [2/2] [UU]
 
 md2 : active raid5 sdf1[7] sde1[6] sdd1[5] sdc1[4] sdb1[3] sda1[2] hdc4[1] hda4[0]
  1664893440 blocks level 5, 512k chunk, algorithm 2 [8/8] []
 
 md0 : active raid1 hdc1[1] hda1[0]
  104320 blocks [2/2] [UU]
 
 Kyle
 
 

Yes, I noticed this bug too. If you change it too many times, or change it
at the 'wrong' time, it hangs when you echo number >
/proc/stripe_cache_size.

Basically, don't run it more than once and don't run it at the 'wrong' time,
and it works.  Not sure where the bug lies, but yeah, I've seen that on 3
different machines!
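
As a side note on the numbers involved (this is not an explanation of the
hang), a commonly cited estimate of the memory taken by the raid5 stripe cache
is page size times number of member devices times stripe_cache_size. The
helper below is a sketch under that assumption; stripe_cache_bytes is a
hypothetical name and the 4096-byte page size is assumed.

```c
/*
 * Estimated memory used by the raid5 stripe cache, in bytes.
 * Assumption: one page per member device per cache entry.
 */
unsigned long long stripe_cache_bytes(unsigned long entries,
                                      unsigned int nr_disks,
                                      unsigned long page_size)
{
    return (unsigned long long)entries * nr_disks * page_size;
}
```

For the eight-disk array quoted above, and assuming 4096-byte pages, this
gives 64 MiB at 2048 entries and 512 MiB at 16384 entries.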

Justin.


Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-22 Thread Santiago Garcia Mantinan

Hi again!

I tried to replicate the problem at home during the weekend with my laptop,
but I couldn't get it to show links with previous kernels, so I guess I had
something different on my samba server or similar. I'm at the real machines
now, so I have done the real tests and they look promising. I'm getting
completely different results from Grant's, which seems really weird.

I applied just this patch:

  --- kernel-source-2.4.27.orig/fs/smbfs/proc.c  2007-01-19 17:53:57.247695476 -0700
  +++ kernel-source-2.4.27/fs/smbfs/proc.c   2007-01-19 17:49:07.480161733 -0700
  @@ -1997,7 +1997,7 @@
      fattr->f_mode = (server->mnt->dir_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) | S_IFDIR;
     else if ( (server->mnt->flags & SMB_MOUNT_FMODE) &&
               !(S_ISDIR(fattr->f_mode)) )
  -   fattr->f_mode = (server->mnt->file_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) | S_IFREG;
  +   fattr->f_mode = (server->mnt->file_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) | (fattr->f_mode & S_IFMT);
   
   }

This was applied to an otherwise unpatched 2.4.34. The client is an IBM
NetworkStation 1000 (a PowerPC based thin client), and the server is a normal
amd64 based PC running 2.6.19.1. Both run Debian: the client runs Sarge and
the server Etch. I'm describing this to see if differences in the
architectures could be causing the differences in behaviour between my tests
and Grant's.

  client running 2.4.34 with above patch, server is running 2.6.19.2 to 
  eliminate it from the problem space (hopefully ;) :
  [EMAIL PROTECTED]:/home/other$ uname -r
  2.4.34b
  [EMAIL PROTECTED]:/home/other$ ls -l
  total 9
  drwxr-xr-x 1 grant wheel 4096 2007-01-21 11:44 dir/
  drwxr-xr-x 1 grant wheel 4096 2007-01-21 11:44 dirlink/
  -rwxr-xr-x 1 grant wheel   15 2007-01-21 11:43 file*
  -rwxr-xr-x 1 grant wheel   15 2007-01-21 11:43 filelink*
 
 It seems to me that there is a difference, because filelink now appears the
 same size as file. It's just as if we had hard links instead of symlinks.

Here is what I did, I mounted the remote filesystem on /mnt on my client,
the share on the server has a normal Debian Sarge PowerPC filesystem on it.

$ pwd
/mnt/usr
$ ls -l
total 0
drwxr-xr-x  1 root root  0 Feb 15  2005 X11R6
drwxr-xr-x  1 root root  0 Jan 16  2007 bin
drwxr-xr-x  1 root root  0 Jan 16  2007 doc
drwxr-xr-x  1 root root  0 Feb 10  2005 games
drwxr-xr-x  1 root root  0 Jan 16  2007 include
lrwxr-xr-x  1 root root 10 Jan 16  2007 info -> share/info
drwxr-xr-x  1 root root  0 Jan 16  2007 lib
drwxr-xr-x  1 root root  0 Feb 10  2005 local
drwxr-xr-x  1 root root  0 Jan 16  2007 sbin
drwxr-xr-x  1 root root  0 Jan  5  2006 share
drwxr-xr-x  1 root root  0 Dec 15  2004 src
$ ls -l info/
total 249856
-rwxr-xr-x  1 root root 150109 Jul 16  2004 coreutils.info.gz
-rwxr-xr-x  1 root root   1299 Jan 16  2007 dir
-rwxr-xr-x  1 root root   1299 Jan 16  2007 dir.old
-rwxr-xr-x  1 root root  28019 Mar 20  2005 find.info.gz
-rwxr-xr-x  1 root root  26136 Nov 22  2004 grep.info.gz
-rwxr-xr-x  1 root root  12914 Sep 16  2006 gzip.info.gz
-rwxr-xr-x  1 root root  12316 Sep 18  2005 ipc.info.gz
-rwxr-xr-x  1 root root  21432 Jan 23  2005 rl5userman.info.gz
-rwxr-xr-x  1 root root  26647 Dec  1  2004 sed.info.gz
-rwxr-xr-x  1 root root 123382 Dec  1  2006 tar.info.gz
-rwxr-xr-x  1 root root  54876 May 23  2005 wget.info.gz
$ cd ../bin
$ ls -l sh
lrwxr-xr-x  1 root root 4 Jan 16  2007 sh -> bash
$ dd if=sh bs=1 count=6
ELF6+0 records in
6+0 records out
6 bytes transferred in 0.001432 seconds (4190 bytes/sec)

As you can see I now can see the symbolic links perfectly and they work as
expected.

In fact, this patch is working so well that it poses a security risk, as now
the devices on my /mnt/dev directory are not only seen as devices (like they
were seen on 2.4.33) but they also work (which didn't happen on 2.4.33).

So... for me now the remote filesystem works as if it was a local
filesystem, without any difference of behaviour, not even on special files
like devices or whatever.

As I said before... this behaviour of having the remote device files work...
seems a security problem and I don't think is desirable, other than that it
seems to work well on my PowerPC, I'll try to run the tests on a normal x86
client and report back.

Regards...
-- 
Santiago García Mantiñán

Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-22 Thread Willy Tarreau

Hi Santiago !

On Mon, Jan 22, 2007 at 09:54:00AM +0100, Santiago Garcia Mantinan wrote:
 Hi again!
 
 I tried to replicate the problem at home during the weekend with my laptop,
 but I couldn't get it to show links with previous kernels, so I guess I had
 something different on my samba server or similar, I'm at the real machines
 now so I have done the real tests and they look promising. I'm getting
 completely different results than those of Grant, which seems really weird.
 
 I applied just this patch:
 
   --- kernel-source-2.4.27.orig/fs/smbfs/proc.c  2007-01-19 17:53:57.247695476 -0700
   +++ kernel-source-2.4.27/fs/smbfs/proc.c   2007-01-19 17:49:07.480161733 -0700
   @@ -1997,7 +1997,7 @@
       fattr->f_mode = (server->mnt->dir_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) | S_IFDIR;
      else if ( (server->mnt->flags & SMB_MOUNT_FMODE) &&
                !(S_ISDIR(fattr->f_mode)) )
   -   fattr->f_mode = (server->mnt->file_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) | S_IFREG;
   +   fattr->f_mode = (server->mnt->file_mode & (S_IRWXU | S_IRWXG | S_IRWXO)) | (fattr->f_mode & S_IFMT);

}
 
 To an unpatched 2.4.34, the client is an IBM NetworkStation 1000 (a PowerPC
 based thin client), and the server is a normal amd64 based PC running
 2.6.19.1, both running Debian, the client runs Sarge and the Server Etch.
 I'm describing this to see if differences on the architectures could be
 causing the differences on behaviour between my tests and Grant's.
 
   client running 2.4.34 with above patch, server is running 2.6.19.2 to 
   eliminate it from the problem space (hopefully ;) :
   [EMAIL PROTECTED]:/home/other$ uname -r
   2.4.34b
   [EMAIL PROTECTED]:/home/other$ ls -l
   total 9
   drwxr-xr-x 1 grant wheel 4096 2007-01-21 11:44 dir/
   drwxr-xr-x 1 grant wheel 4096 2007-01-21 11:44 dirlink/
   -rwxr-xr-x 1 grant wheel   15 2007-01-21 11:43 file*
   -rwxr-xr-x 1 grant wheel   15 2007-01-21 11:43 filelink*
  
  It seems to me that there is a difference, because filelink now appears the
  same size as file. It's just as if we had hard links instead of symlinks.
 
 Here is what I did, I mounted the remote filesystem on /mnt on my client,
 the share on the server has a normal Debian Sarge PowerPC filesystem on it.
 
 $ pwd
 /mnt/usr
 $ ls -l
 total 0
 drwxr-xr-x  1 root root  0 Feb 15  2005 X11R6
 drwxr-xr-x  1 root root  0 Jan 16  2007 bin
 drwxr-xr-x  1 root root  0 Jan 16  2007 doc
 drwxr-xr-x  1 root root  0 Feb 10  2005 games
 drwxr-xr-x  1 root root  0 Jan 16  2007 include
 lrwxr-xr-x  1 root root 10 Jan 16  2007 info -> share/info
 drwxr-xr-x  1 root root  0 Jan 16  2007 lib
 drwxr-xr-x  1 root root  0 Feb 10  2005 local
 drwxr-xr-x  1 root root  0 Jan 16  2007 sbin
 drwxr-xr-x  1 root root  0 Jan  5  2006 share
 drwxr-xr-x  1 root root  0 Dec 15  2004 src
 $ ls -l info/
 total 249856
 -rwxr-xr-x  1 root root 150109 Jul 16  2004 coreutils.info.gz
 -rwxr-xr-x  1 root root   1299 Jan 16  2007 dir
 -rwxr-xr-x  1 root root   1299 Jan 16  2007 dir.old
 -rwxr-xr-x  1 root root  28019 Mar 20  2005 find.info.gz
 -rwxr-xr-x  1 root root  26136 Nov 22  2004 grep.info.gz
 -rwxr-xr-x  1 root root  12914 Sep 16  2006 gzip.info.gz
 -rwxr-xr-x  1 root root  12316 Sep 18  2005 ipc.info.gz
 -rwxr-xr-x  1 root root  21432 Jan 23  2005 rl5userman.info.gz
 -rwxr-xr-x  1 root root  26647 Dec  1  2004 sed.info.gz
 -rwxr-xr-x  1 root root 123382 Dec  1  2006 tar.info.gz
 -rwxr-xr-x  1 root root  54876 May 23  2005 wget.info.gz
 $ cd ../bin
 $ ls -l sh
 lrwxr-xr-x  1 root root 4 Jan 16  2007 sh -> bash
 $ dd if=sh bs=1 count=6
 ELF6+0 records in
 6+0 records out
 6 bytes transferred in 0.001432 seconds (4190 bytes/sec)
 
 As you can see I now can see the symbolic links perfectly and they work as
 expected.
 
 In fact, this patch is working so well that it poses a security risk, as now
 the devices on my /mnt/dev directory are not only seen as devices (like they
 were seen on 2.4.33) but they also work (which didn't happen on 2.4.33).

Why do you consider this a security problem ? Is any user able to create a
device entry with enough permissions ? As a general rule of thumb, networked
file systems should be mounted with the nodev option.
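
As a quick way to verify this for a given mount, its option list can be
checked for nodev. The helper below is a hypothetical sketch (the name
has_mount_option is made up); it expects a comma-separated option string such
as the fourth field of a line in /proc/mounts.

```c
#include <stddef.h>
#include <string.h>

/*
 * Return 1 if `name` appears as a complete entry in the comma-separated
 * option list `opts` (for example "rw,nosuid,nodev"), 0 otherwise.
 */
int has_mount_option(const char *opts, const char *name)
{
    size_t len = strlen(name);
    const char *p = opts;

    if (len == 0)
        return 0;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t n = end ? (size_t)(end - p) : strlen(p);

        /* Compare whole entries only, so "nodevice" does not match "nodev". */
        if (n == len && strncmp(p, name, len) == 0)
            return 1;
        if (end == NULL)
            break;
        p = end + 1;
    }
    return 0;
}
```

For example, has_mount_option("rw,nosuid,nodev", "nodev") returns 1, while an
option string of "rw,dev" returns 0 and would mean device nodes on the share
are honoured.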

 So... for me now the remote filesystem works as if it was a local
 filesystem, without any difference of behaviour, not even on special files
 like devices or whatever.
 
 As I said before... this behaviour of having the remote device files work...
 seems a security problem and I don't think is desirable, other than that it
 seems to work well on my PowerPC, I'll try to run the tests on a normal x86
 client and report back.

Thanks very much for your tests.

Grant, just to be sure, are you really certain that you tried the fixed kernel ?
It is possible that you booted a wrong kernel during one of your tests. I'm
intrigued by the fact that it changed nothing for you and that it fixed the
problem for Santiago.

Best regards,
Willy


Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-22 Thread Santiago Garcia Mantinan

  As you can see I now can see the symbolic links perfectly and they work as
  expected.
  
  In fact, this patch is working so well that it poses a security risk, as now
  the devices on my /mnt/dev directory are not only seen as devices (like they
  were seen on 2.4.33) but they also work (which didn't happen on 2.4.33).
 
 Why do you consider this a security problem ? Is any user able to create a
 device entry with enough permissions ? As a general rule of thumb, networked
 file systems should be mounted with the nodev option.

You are completely right on that. I thought those devices didn't work on
2.4.33, but I just retested and they work OK there too. They were only
failing for me on the PC I tested the other day, and that was because of a
nodev option :-) just that.

So... I have finished my tests. I have also tested an x86 client, on which it
worked OK, just like on the PowerPC client; both work perfectly, just like
they used to on 2.4.33.

 Grant, just to be sure, are you really certain that you tried the fixed kernel ?
 It is possible that you booted a wrong kernel during one of your tests. I'm
 intrigued by the fact that it changed nothing for you and that it fixed the
 problem for Santiago.

Maybe he had also applied some of the earlier patches you had sent and that
I did not apply to mine?

Just to clear things up a bit, I'm sure I'm running the 2.4.34 kernel and...
it is a pristine kernel with just this latest patch applied, the one
that changes S_IFREG to (fattr->f_mode & S_IFMT).

Regards...
-- 
Santiago García Mantiñán

Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-22 Thread Grant Coady

On Mon, 22 Jan 2007 10:18:16 +0100, Willy Tarreau [EMAIL PROTECTED] wrote:

Grant, just to be sure, are you really certain that you tried the fixed kernel 
?
It is possible that you booted a wrong kernel during one of your tests. I'm
intrigued by the fact that it changed nothing for you and that it fixed the
problem for Santiago.

Closest I get to Santiago's results are with the 2.4.33.7 plus the patch, 
with 'use default NLS' option turned on, as well as the unix extensions.

2.4.34 was a no go for me.  Changing the default NLS made no difference, 
now trying with unix extensions turned on. . .  Yeah, that works, apart 
from the test file gaining execute bits, compared to operation under 
2.4.33.3, this is 2.4.34 + patch + default NLS and unix extensions:

[EMAIL PROTECTED]:/home/other$ cat dirlink/filelink
this is a test
[EMAIL PROTECTED]:/home/other$ echo this is a test > testfile
[EMAIL PROTECTED]:/home/other$ ls -l
total 4096
drwxr-xr-x 1 grant wheel  0 2007-01-21 11:44 dir/
lrwxr-xr-x 1 grant wheel  3 2007-01-21 11:43 dirlink -> dir/
-rwxr-xr-x 1 grant wheel 15 2007-01-21 11:43 file*
lrwxr-xr-x 1 grant wheel  4 2007-01-21 11:44 filelink -> file*
drwxr-xr-x 1 grant wheel  0 2007-01-22 10:45 test/
-rwxr-xr-x 1 grant wheel 15 2007-01-22 21:31 testfile*
lrwxr-xr-x 1 grant wheel  4 2007-01-22 21:29 testlink -> test/
[EMAIL PROTECTED]:/home/other$ ln -s testfile testfilelink
[EMAIL PROTECTED]:/home/other$ cat testfilelink
this is a test


The fix depends on the smbfs configuration?  Is this the requirement?
I ask as the other mode of operation (unix extensions turned off): do 
not display symlinks, prevent creation of symlinks, seems to be logically 
self-consistent, as well as matching what I see from a 'doze box.

Grant.

Re: problems with latest smbfs changes on 2.4.34 and security backports

2007-01-22 Thread Grant Coady

On Mon, 22 Jan 2007 10:36:30 +0100, Santiago Garcia Mantinan [EMAIL 
PROTECTED] wrote:

  As you can see I now can see the symbolic links perfectly and they work as
  expected.
  
  In fact, this patch is working so well that it poses a security risk, as 
  now
  the devices on my /mnt/dev directory are not only seen as devices (like 
  they
  were seen on 2.4.33) but they also work (which didn't happen on 2.4.33).
 
 Why do you consider this a security problem ? Is any user able to create a
 device entry with enough permissions ? As a general rule of thumb, networked
 file systems should be mounted with the nodev option.

You are completely right on that, it is just that I thought those devices
didn't work on 2.4.33, but I just retested again and they work ok, only that
they were not working to me on the PC I tested the other day and it was
because of a nodev option :-) just that.

So... I have finished with my tests, I have tested an x86 client on which it
worked ok, just like on the PowerPC client, both working perfectly just like
they used to do on 2.4.33.

 Grant, just to be sure, are you really certain that you tried the fixed 
 kernel ?
 It is possible that you booted a wrong kernel during one of your tests. I'm
 intrigued by the fact that it changed nothing for you and that it fixed the
 problem for Santiago.

Maybe he had also applied some of the earlier patches you had sent and that
I did not apply to mine?

Just to clear things up a bit, I'm sure I'm with the 2.4.34 kernel and...
I'm running a pristine kernel with just this latest patch applied, the one
that changes S_IFREG for (fattr->f_mode & S_IFMT).

Same kernel + patch here for latest results posting :)  We seem to get 
similar results now -- though I query the file execute bits coming up.

Grant.

Regards...


Re: [PATCH] Make CARDBUS_MEM_SIZE and CARDBUS_IO_SIZE customizable

2007-01-22 Thread Éric Piel


01/19/2007 04:57 AM, Atsushi Nemoto wrote/a écrit:

On Fri, 19 Jan 2007 12:19:10 +0900 (JST), Atsushi Nemoto [EMAIL PROTECTED] 
wrote:

OK, here is a revised patch which uses pci= option instead of config
parameters.


Sorry, this patch would cause build failure if setup-bus.c was not
built into kernel.  Revised again.


Subject: [PATCH] Make CARDBUS_MEM_SIZE and CARDBUS_IO_SIZE customizable

CARDBUS_MEM_SIZE was increased to 64MB on 2.6.20-rc2, but larger size
might result in allocation failure for the reserving itself on some
platforms (for example typical 32bit MIPS).  Make it (and
CARDBUS_IO_SIZE too) customizable by pci= option for such platforms.

:


diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 25d2985..ace7a9a 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1259,6 +1259,12 @@ and is between 256 and 4096 characters. 
 This sorting is done to get a device

order compatible with older (<= 2.4) kernels.
nobfsortDon't sort PCI devices into breadth-first order.
+   cbiosize=nn[KMG]A fixed amount of bus space is
+   reserved for CardBus bridges.
+   The default value is 256 bytes.
+   cbmemsize=nn[KMG]   A fixed amount of bus space is
+   reserved for CardBus bridges.
+   The default value is 64 megabytes.
Hi, I've got the feeling that those two parameters don't do the same 
things, although they have the same description ;-) Maybe the texts 
could be:
* The fixed amount of bus space which is reserved for the CardBus 
bridges IO window.
* The fixed amount of bus space which is reserved for the CardBus 
bridges memory window.


See you,
Eric

Re: [PATCH] Make CARDBUS_MEM_SIZE and CARDBUS_IO_SIZE customizable

2007-01-22 Thread Atsushi Nemoto

On Mon, 22 Jan 2007 14:57:46 +0100, Éric Piel [EMAIL PROTECTED] wrote:
  +   cbiosize=nn[KMG]A fixed amount of bus space is
  +   reserved for CardBus bridges.
  +   The default value is 256 bytes.
  +   cbmemsize=nn[KMG]   A fixed amount of bus space is
  +   reserved for CardBus bridges.
  +   The default value is 64 megabytes.
 Hi, I've got the feeling that those two parameters don't do the same 
 things, although they have the same description ;-) Maybe the texts 
 could be:
 * The fixed amount of bus space which is reserved for the CardBus 
 bridges IO window.
 * The fixed amount of bus space which is reserved for the CardBus 
 bridges memory window.

Thanks for your comment.  Updated.
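
The nn[KMG] syntax of the two options comes from memparse(), which the patch
calls on the text after "cbiosize=" and "cbmemsize=". For readers who want to
experiment outside the kernel, here is a rough userspace sketch of that style
of parsing. parse_size() is a made-up name and this is only an approximation
built on strtoull(), not the kernel function.

```c
#include <stdlib.h>

/*
 * Userspace sketch of kernel-style size parsing: a number followed by an
 * optional K, M or G suffix (powers of 1024).  On return *end points just
 * past the text that was consumed, so a caller can continue with the next
 * comma-separated option.
 */
static unsigned long long parse_size(const char *s, char **end)
{
    unsigned long long v = strtoull(s, end, 0);

    switch (**end) {
    case 'G': case 'g':
        v <<= 10;
        /* fall through */
    case 'M': case 'm':
        v <<= 10;
        /* fall through */
    case 'K': case 'k':
        v <<= 10;
        (*end)++;
        break;
    default:
        break;
    }
    return v;
}
```

Under these assumptions a boot line such as pci=cbiosize=4K,cbmemsize=32M would
yield 4096 for the IO window and 32 * 1024 * 1024 for the memory window.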


Subject: [PATCH] Make CARDBUS_MEM_SIZE and CARDBUS_IO_SIZE customizable

CARDBUS_MEM_SIZE was increased to 64MB on 2.6.20-rc2, but larger size
might result in allocation failure for the reserving itself on some
platforms (for example typical 32bit MIPS).  Make it (and
CARDBUS_IO_SIZE too) customizable by pci= option for such platforms.

Signed-off-by: Atsushi Nemoto [EMAIL PROTECTED]
---
 Documentation/kernel-parameters.txt |6 ++
 drivers/pci/pci.c   |6 ++
 drivers/pci/setup-bus.c |   27 +++
 include/linux/pci.h |3 +++
 4 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 25d2985..dc39989 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1259,6 +1259,12 @@ and is between 256 and 4096 characters.
This sorting is done to get a device
order compatible with older (<= 2.4) kernels.
nobfsortDon't sort PCI devices into breadth-first order.
+   cbiosize=nn[KMG]The fixed amount of bus space which is
+   reserved for the CardBus bridges IO window.
+   The default value is 256 bytes.
+   cbmemsize=nn[KMG]   The fixed amount of bus space which is
+   reserved for the CardBus bridges memory window.
+   The default value is 64 megabytes.
 
pcmv=   [HW,PCMCIA] BadgePAD 4
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 206c834..639069a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -1168,6 +1168,12 @@ static int __devinit pci_setup(char *str
                 if (*str && (str = pcibios_setup(str)) && *str) {
                         if (!strcmp(str, "nomsi")) {
                                 pci_no_msi();
+                        } else if (!strncmp(str, "cbiosize=", 9)) {
+                                pci_cardbus_io_size =
+                                        memparse(str + 9, &str);
+                        } else if (!strncmp(str, "cbmemsize=", 10)) {
+                                pci_cardbus_mem_size =
+                                        memparse(str + 10, &str);
                         } else {
                                 printk(KERN_ERR "PCI: Unknown option `%s'\n",
                                                 str);
diff --git a/drivers/pci/setup-bus.c b/drivers/pci/setup-bus.c
index 89f3036..1dfc288 100644
--- a/drivers/pci/setup-bus.c
+++ b/drivers/pci/setup-bus.c
@@ -40,8 +40,11 @@
  * FIXME: IO should be max 256 bytes.  However, since we may
  * have a P2P bridge below a cardbus bridge, we need 4K.
  */
-#define CARDBUS_IO_SIZE(256)
-#define CARDBUS_MEM_SIZE   (64*1024*1024)
+#define DEFAULT_CARDBUS_IO_SIZE(256)
+#define DEFAULT_CARDBUS_MEM_SIZE   (64*1024*1024)
+/* pci=cbmemsize=nnM,cbiosize=nn can override this */
+unsigned long pci_cardbus_io_size = DEFAULT_CARDBUS_IO_SIZE;
+unsigned long pci_cardbus_mem_size = DEFAULT_CARDBUS_MEM_SIZE;
 
 static void __devinit
 pbus_assign_resources_sorted(struct pci_bus *bus)
@@ -415,12 +418,12 @@ pci_bus_size_cardbus(struct pci_bus *bus
 * Reserve some resources for CardBus.  We reserve
 * a fixed amount of bus space for CardBus bridges.
 */
-   b_res[0].start = CARDBUS_IO_SIZE;
-   b_res[0].end = b_res[0].start + CARDBUS_IO_SIZE - 1;
+   b_res[0].start = pci_cardbus_io_size;
+   b_res[0].end = b_res[0].start + pci_cardbus_io_size - 1;
b_res[0].flags |= IORESOURCE_IO;
 
-   b_res[1].start = CARDBUS_IO_SIZE;
-   b_res[1].end = b_res[1].start + CARDBUS_IO_SIZE - 1;
+   b_res[1].start = pci_cardbus_io_size;
+   b_res[1].end = b_res[1].start + pci_cardbus_io_size - 1;
b_res[1].flags |= IORESOURCE_IO;
 
/*
@@ -440,16 +443,16 @@ pci_bus_size_cardbus(struct pci_bus *bus
 * twice the size.
 */
if (ctrl & PCI_CB_BRIDGE_CTL_PREFETCH_MEM0) {
-

Re: [RFC] Asynchronous Messaging

 This is accomplished by allocating a page (or more) of memory which
 is executable and mapped into every threads address space. Also, all
 ISR entry points are modified to detect if the code that was interrupted
 was executing within the ACE page. If it was then the ACE code is
 allowed to complete before the ISR continues. This then provides
 the guarantee of atomic execution.

What if you enter the ISR, pass the point of the check and then another
CPU core hits the ACE space ?

Also how do you handle the case where the code gets stuck in your atomic
pages ?

Alan

Re: [Ksummit-2006-discuss] 2007 Linux Kernel Summit

2007-01-22 Thread Steven Whitehouse

Hi,

On Mon, Jan 22, 2007 at 08:14:17AM -0500, Theodore Tso wrote:
 On Mon, Jan 22, 2007 at 07:45:02AM -0500, Alan Cox wrote:
  
  Definitely disagree with that. I'd like to see the conference somewhere
  else different this time - perhaps Czech Republic, or somewhere else more
  easterly and Linux active (or even Finland...)
  
 
 Understand that one of the feedback that I get from the keepers of the
 corporate travel budgets is that money for sending employees to exotic
 locations is finite --- which is why we haven't tried pairing the
 kernel summit with linux.conf.au.  Cambridge works out because there
 are relatively cheap flights to Amsterdam and then you can take a
 cheap Ryanair flight to Stansted.  Still, the fact that it isn't
 paired with another conference means that we are getting some
 expressions of unhappiness from other Kernel Summit stakeholders.
 It's for that reason that (a) I'm trying to line up some folks who
 might be interested in trying to put together a relatively small,
 2-day technical conference after the Kernel Summit, which can
 hopefully serve as a seed for something like OLS and LCA in UK/Europe,
 and (b) I've told folks that the moving it away from Cambridge is a
 one-time experiment, after which point we will re-evaluate.


Regarding point (a), UKUUG are moving their UK-based summer Linux conference
to coincide with the kernel summit. Normally it's in the July/August
time frame. The location will probably be Cambridge, though last I heard from
Alasdair Kergon that was not certain.

Steve.

Re: [PATCH] select: fix sys_select to not leak ERESTARTNOHAND to userspace

2007-01-22 Thread Neil Horman

On Mon, Jan 22, 2007 at 02:59:56PM +0100, Paolo Ornati wrote:
 On Tue, 16 Jan 2007 15:13:32 -0500
 Neil Horman [EMAIL PROTECTED] wrote:
 
  As it is currently written, sys_select checks its return code to convert
  ERESTARTNOHAND to EINTR.  However, the check is within an if (tvp) clause, 
  and
  so if select is called from userspace with a NULL timeval, then it is 
  possible
  for the ERESTARTNOHAND errno to leak into userspace, which is incorrect.  
  This
  patch moves that check outside of the conditional, and prevents the errno 
  leak.
 
 the ERESTARTNOHAND thing is handled in arch specific signal code,

In the signal handling path yes.
Not always in the case of select, though.  Check core_sys_select:

        if (!ret) {
                ret = -ERESTARTNOHAND;
                if (signal_pending(current))
                        goto out;
                ret = 0;
        }
        ...

out:
        if (bits != stack_fds)
                kfree(bits);
out_nofds:
        return ret;

It's possible for core_sys_select to return ERESTARTNOHAND to sys_select, which
will in turn (as it's currently written), return that value back to user space.

 syscalls can return -ERESTARTNOHAND as much as they want (and your
 change breaks the current behaviour of select()).
 

It doesn't break it, it fixes it.  select isn't meant to ever return
ERESTARTNOHAND to user space:
http://www.opengroup.org/onlinepubs/009695399/functions/select.html

And ERESTARTNOHAND isn't defined in the userspace errno.h, so this causes all
sorts of confusion for apps that don't expect it.

Neil

 For example:
 
 arch/x86_64/kernel/signal.c
 
 /* Are we from a system call? */
 if ((long)regs->orig_rax >= 0) {
         /* If so, check system call restarting.. */
         switch (regs->rax) {
         case -ERESTART_RESTARTBLOCK:
         case -ERESTARTNOHAND:
                 regs->rax = -EINTR;
                 break;
 
 -- 
   Paolo Ornati
   Linux 2.6.20-rc5 on x86_64

Re: [PATCH] Make CARDBUS_MEM_SIZE and CARDBUS_IO_SIZE customizable

2007-01-22 Thread Sergei Shtylyov


Hello.

Atsushi Nemoto wrote:


Subject: [PATCH] Make CARDBUS_MEM_SIZE and CARDBUS_IO_SIZE customizable

CARDBUS_MEM_SIZE was increased to 64MB on 2.6.20-rc2, but larger size
might result in allocation failure for the reserving itself on some
platforms (for example typical 32bit MIPS).  Make it (and
CARDBUS_IO_SIZE too) customizable by pci= option for such platforms.


   Sorry for the grammatical nitpicking. :-)


diff --git a/Documentation/kernel-parameters.txt 
b/Documentation/kernel-parameters.txt
index 25d2985..dc39989 100644
--- a/Documentation/kernel-parameters.txt
+++ b/Documentation/kernel-parameters.txt
@@ -1259,6 +1259,12 @@ and is between 256 and 4096 characters.
This sorting is done to get a device
order compatible with older (<= 2.4) kernels.
nobfsortDon't sort PCI devices into breadth-first order.
+   cbiosize=nn[KMG]The fixed amount of bus space which is
+   reserved for the CardBus bridges IO window.


   It should be bridge's...


+   The default value is 256 bytes.
+   cbmemsize=nn[KMG]   The fixed amount of bus space which is
+   reserved for the CardBus bridges memory window.


   Ditto.


+   The default value is 64 megabytes.
 


MBR, Sergei

Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-22 Thread Mike Galbraith

On Mon, 2007-01-22 at 06:02 -0500, Robert P. J. Day wrote:
 On Mon, 22 Jan 2007, Nick Piggin wrote:
 
  Robert P. J. Day wrote:
 
   by adding (temporarily) the definitions of TRUE and FALSE to
   types.h, you should then (theoretically) be able to delete over
   100 instances of those same macros being *defined* throughout the
   source tree. you're not going to be deleting the hundreds and
   hundreds of *uses* of TRUE and FALSE (not yet, anyway) but, at the
   very least, by adding two lines to types.h, you can delete all
   those redundant *definitions* and make sure that nothing breaks.
   (it shouldn't, of course, but it's always nice to be sure.)
 
  Doesn't seem very worthwhile, and it legitimises this definition
  we're trying to get rid of.
 
 h ... apparently, you totally missed my use of the important
 word "temporarily":
 
   $ grep -r "temporary hack" . | wc -l
   16

That's a pretty good argument _against_ adding another one :)  I wonder
how old those temporary hacks are (the ones you missed as well).

To make TRUE/FALSE go away, you or someone will have to visit them all,
which will take time.  Why add an intermediate step where you or others
can end up getting interrupted (indefinitely), leaving the temporary
definition lying around for folks to use?

-Mike


Re: [PATCH] Undo some of the pseudo-security madness

2007-01-22 Thread Valdis . Kletnieks

On Mon, 22 Jan 2007 02:23:30 +0300, Samium Gromoff said:

 not core-dumps but core files, in the lispspeak, but anyway.
 
 the reason is trivial -- if i can write programs enjoying setuid
 privileges in C, i want to be able to do the same in Lisp.

Go read up on how the XEmacs crew designed their portable dumper,
specifically to get around a lot of these sorts of problems because the
old Emacs 'unexec' code was incredibly fragile.

 the only way to achieve this i see, is to directly setuid root
 the lisp system executable itself -- because the lisp code
 is read, compiled and executed in the process of the lisp
 system executable.

If that's the only way you can see to do it, maybe you should think a
bit harder before making kernel hacks to do something.







[PATCH] Fix race in efi variable delete code.

2007-01-22 Thread Prarit Bhargava

Fix race when deleting an EFI variable and issuing another EFI command on the
same variable.  The removal of the variable from the efivars_list should be
done in efivar_delete and not delayed until the kprobes release.

Signed-off-by: Prarit Bhargava [EMAIL PROTECTED]

diff --git a/drivers/firmware/efivars.c b/drivers/firmware/efivars.c
index 5ab5e39..bf2ca97 100644
--- a/drivers/firmware/efivars.c
+++ b/drivers/firmware/efivars.c
@@ -385,10 +385,8 @@ static struct sysfs_ops efivar_attr_ops = {
 
 static void efivar_release(struct kobject *kobj)
 {
-       struct efivar_entry *var = container_of(kobj, struct efivar_entry, kobj);
-       spin_lock(&efivars_lock);
-       list_del(&var->list);
-       spin_unlock(&efivars_lock);
+       struct efivar_entry *var = container_of(kobj, struct efivar_entry,
+                                               kobj);
        kfree(var);
 }
 
@@ -537,6 +535,9 @@ efivar_delete(struct subsystem *sub, const char *buf, size_t count)
                spin_unlock(&efivars_lock);
                return -EIO;
        }
+
+       list_del(&search_efivar->list);
+
        /* We need to release this lock before unregistering. */
        spin_unlock(&efivars_lock);
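
To make the ordering argument concrete, here is a small userspace sketch of the
same pattern: the entry is unlinked from the shared list while the lock is held
in the delete path, and the release function only frees memory. Everything in
it (struct entry, entry_delete(), the toy spin lock, the plain int reference
count) is invented for illustration and is not efivars code.

```c
#include <stdatomic.h>
#include <stdlib.h>
#include <string.h>

struct entry {
    struct entry *next;
    int refs;               /* stand-in for the kobject reference count */
    char name[32];
};

static struct entry *entries;                       /* list head */
static atomic_flag entries_lock = ATOMIC_FLAG_INIT; /* toy spin lock */

static void lock(void)   { while (atomic_flag_test_and_set(&entries_lock)) { } }
static void unlock(void) { atomic_flag_clear(&entries_lock); }

/* Add a named entry; the list itself holds one reference. */
static struct entry *entry_add(const char *name)
{
    struct entry *e = calloc(1, sizeof(*e));

    if (!e)
        return NULL;
    strncpy(e->name, name, sizeof(e->name) - 1);
    e->refs = 1;
    lock();
    e->next = entries;
    entries = e;
    unlock();
    return e;
}

/* Release path: the entry is already off the list, so only free it. */
static void entry_put(struct entry *e)
{
    if (--e->refs == 0)
        free(e);
}

/* Delete path: unlink under the lock, then drop the list's reference. */
static int entry_delete(const char *name)
{
    struct entry **pp, *e = NULL;

    lock();
    for (pp = &entries; *pp; pp = &(*pp)->next) {
        if (strcmp((*pp)->name, name) == 0) {
            e = *pp;
            *pp = e->next;  /* no later lookup can find it */
            break;
        }
    }
    unlock();
    if (!e)
        return -1;
    entry_put(e);           /* freed now, or when the last holder lets go */
    return 0;
}

/* Lookup used by every other operation on the list. */
static int entry_exists(const char *name)
{
    struct entry *e;
    int found = 0;

    lock();
    for (e = entries; e; e = e->next) {
        if (strcmp(e->name, name) == 0) {
            found = 1;
            break;
        }
    }
    unlock();
    return found;
}
```

Even if some other holder still has a reference when entry_delete() returns, a
second operation on the same name fails the lookup right away instead of
finding an entry that is on its way out.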
 

Re: [PATCH] Introduce simple TRUE and FALSE boolean macros.

2007-01-22 Thread Robert P. J. Day

On Mon, 22 Jan 2007, Mike Galbraith wrote:

 On Mon, 2007-01-22 at 06:02 -0500, Robert P. J. Day wrote:
  On Mon, 22 Jan 2007, Nick Piggin wrote:
 
   Robert P. J. Day wrote:
  
by adding (temporarily) the definitions of TRUE and FALSE to
types.h, you should then (theoretically) be able to delete over
100 instances of those same macros being *defined* throughout the
source tree. you're not going to be deleting the hundreds and
hundreds of *uses* of TRUE and FALSE (not yet, anyway) but, at the
very least, by adding two lines to types.h, you can delete all
those redundant *definitions* and make sure that nothing breaks.
(it shouldn't, of course, but it's always nice to be sure.)
  
   Doesn't seem very worthwhile, and it legitimises this definition
   we're trying to get rid of.
 
  h ... apparently, you totally missed my use of the important
  word "temporarily":
 
$ grep -r "temporary hack" . | wc -l
16

 That's a pretty good argument _against_ adding another one :)  I
 wonder how old those temporary hacks are (the ones you missed as
 well).

 To make TRUE/FALSE go away, you or someone will have to visit them
 all, which will take time.  Why add an intermediate step where you
 or others can end up getting interrupted (indefinitely), leaving the
 temporary definition lying around for folks to use?

as opposed to the 100+ *other* definitions currently cluttering up the
tree, which this patch would allow to be deleted *immediately*.

forget it.  i can see this argument is going nowhere and that, six
months from now, some poor sucker is going to post, asking, "hey, you
know all these TRUE/FALSE things?  wouldn't it be great if we could,
you know, clean those up?  whaddya say?"

and groundhog day will begin all over again ...

rday

Re: PROBLEM: KB-KiB, MB - MiB, ... (IEC 60027-2)

2007-01-22 Thread Lennart Sorensen

On Sun, Jan 21, 2007 at 10:12:55PM +0100, Jan Engelhardt wrote:
 Same lie like with harddrives. It's around 80, not 100.
 But it depends on how you look at it. 80 for Layer3, possibly
 a little more for Layer2/1.

Strange, I tend to get about 95 for layer 3.

--
Len Sorensen

Re: O_DIRECT question

2007-01-22 Thread Phillip Susi


Denis Vlasenko wrote:

What will happen if we just make open ignore O_DIRECT? ;)

And then anyone who feels sad about is advised to do it
like described here:

http://lkml.org/lkml/2002/5/11/58


Then database and other high performance IO users will be broken.  Most 
of Linus's rant there is being rehashed now in this thread, and it has 
been pointed out that using mmap instead is unacceptable because it is 
inherently _synchronous_ and the app can not tolerate the page faults on 
read, and handling IO errors during the page fault is impossible/highly 
problematic.
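
For reference, this is roughly what the O_DIRECT path looks like from the
application side: an aligned buffer and an ordinary read() whose failure comes
back as a return value the caller can act on. It is a sketch under assumptions,
not taken from any database: dio_alloc(), dio_read() and the 4096-byte
alignment are made up here, and the alignment a given filesystem actually
requires may differ.

```c
#define _GNU_SOURCE             /* for O_DIRECT on Linux/glibc */
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define DIO_ALIGN 4096          /* assumed buffer/length alignment */

/* Allocate a buffer suitably aligned for direct IO; NULL on failure. */
static void *dio_alloc(size_t len)
{
    void *p = NULL;

    if (posix_memalign(&p, DIO_ALIGN, len) != 0)
        return NULL;
    return p;
}

/*
 * Read up to len bytes from the start of path, bypassing the page cache.
 * Returns the byte count, or -1 with errno set, so an IO error is an
 * ordinary return value rather than a fault taken in the middle of a
 * memory access.
 */
static ssize_t dio_read(const char *path, void *buf, size_t len)
{
    int fd = open(path, O_RDONLY | O_DIRECT);
    ssize_t n;

    if (fd < 0)
        return -1;
    n = read(fd, buf, len);
    close(fd);
    return n;
}
```

Some filesystems (tmpfs, for one) may refuse O_DIRECT at open time, so callers
need a fallback either way.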




Re: PROBLEM: KB-KiB, MB - MiB, ... (IEC 60027-2)

2007-01-22 Thread Lennart Sorensen

On Sun, Jan 21, 2007 at 12:10:00PM +0100, Eduard Bloch wrote:
 And I cannot seriously believe that you are capable of reading his
 examples. Megabananas are a ridiculous demonstration because of the
 object being counted itself, but if you take stuff from real life then
 I doubt that you expect a kilometer to be 1024 meters. Same for
 kilogram. And a megatonne is not 1048576 tonnes, even not 104857600 kg,
 and not 107374182400 grams. Wanna more stupid examples created by
 abusing decimal units?

The computer world has a long history of borrowing and abusing terms.
Probably the majority of computer terms came to be that way.  Why should
we change any of them now?  Should we stop calling it booting because
some people might be confused and think it means kicking the computer?
Should we rename threads because people might think it has something to
do with sewing stuff together?

 You talk for everybody, or is it just your (and only your) mind refusing
 to accept new terms? For my taste, kib and mib are even easier to
 pronounce, easier than {KiLoBytE} resp. {MeGaBytE} or KaaaBe / eMmmBe.

There is too much legacy code and too many legacy systems around for it to
ever be unambiguous.  It is too late to fix it, and the units that this
standard came up with just sound too stupid to be taken seriously.

You also don't pronounce units just because it looks like you can.  So
KiB is not easier than KB.  Heck, most people in speech would just call
them Ks (kays or something like that).  And MBs just become Megs.  Same
for Gigs.

Whoever wasted their time coming up with this standard, well they simply
wasted their time.  It will NEVER catch on, and it will never replace
the common usage.  It's about 50 or 60 years too late for that.

--
Len Sorensen

Re: [RFC] Asynchronous Messaging

2007-01-22 Thread Wink Saville


On 1/22/07, Alan [EMAIL PROTECTED] wrote:

 This is accomplished by allocating a page (or more) of memory which
 is executable and mapped into every threads address space. Also, all
 ISR entry points are modified to detect if the code that was interrupted
 was executing within the ACE page. If it was then the ACE code is
 allowed to complete before the ISR continues. This then provides
 the guarantee of atomic execution.

What if you enter the ISR, pass the point of the check and then another
CPU core hits the ACE space ?


If CPU A has passed the point of the check then by definition the lock in
the ACE space that it was holding will have been released and be available
to CPU B, thus there will be no contention and CPU B will proceed to
execute the code within the ACE space.


Also how do you handle the case where the code gets stuck in your atomic
pages ?


The code in the ACE space must execute quickly and must never get stuck, the
same rules as any code which holds spin locks. As I envision it the
ACE space is micro-code provided by only the kernel and thus is bug
free.

Of course shit happens; for example, I use ACE to manipulate shared linked lists.
What happens if a pointer passed to the ACE code causes a page fault?
This will cause the ISR to be reentered and is definitely a problem. But this
can be detected and fixed up, i.e. release the spin lock and mark the
faulting code to be killed and not rescheduled.

My proof of concept code does not handle this situation but I believe it
can be handled.

A similar problem might occur if buggy or malicious code were to begin
executing in the middle of the ACE space rather than at one of its entry
points. Protection will need to be put in place to handle this also. For
instance, if N ISRs in a row detect that the ACE space code has never stopped
executing, then kill the erroneous thread. Another idea would be to only
allow approved code to use ACE.
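
To make the two checks discussed in this thread concrete, here is a rough
sketch in plain C of (1) the test an ISR entry point would make on the
interrupted program counter and (2) the "N ISRs in a row" watchdog. It is not
the proof of concept code; struct ace_region, struct ace_watch and both
function names are hypothetical, and a real implementation would live in the
architecture's interrupt entry path.

```c
#include <stdbool.h>
#include <stdint.h>

/* Address range of the ACE page(s); hypothetical layout. */
struct ace_region {
    uintptr_t start;
    uintptr_t len;
};

/* Per-CPU watchdog state for code that never leaves the ACE space. */
struct ace_watch {
    unsigned consecutive;   /* interrupts in a row that landed in ACE */
    unsigned limit;         /* N: how many in a row are tolerated */
};

/* True if the interrupted program counter lies inside the ACE region. */
static bool pc_in_ace(const struct ace_region *r, uintptr_t pc)
{
    /* Unsigned subtraction wraps for pc < start, so one compare suffices. */
    return pc - r->start < r->len;
}

/*
 * Called once per interrupt with the interrupted pc.  Returns true when
 * the interrupted thread has been found inside the ACE region `limit`
 * times in a row and should be treated as stuck.
 */
static bool ace_note_interrupt(struct ace_watch *w,
                               const struct ace_region *r, uintptr_t pc)
{
    if (!pc_in_ace(r, pc)) {
        w->consecutive = 0;
        return false;
    }
    return ++w->consecutive >= w->limit;
}
```

Note that this only counts consecutive hits; it does not tell a stuck thread
apart from a different thread that happens to be interrupted in the ACE space
each time, which a real version would have to track.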

Wink

H. Peter Anvin wrote:
Ralf Baechle wrote:
On Sun, Jan 21, 2007 at 06:37:24PM +0200, S.Çağlar Onur wrote:
On Sunday, 21 Jan 2007, you wrote:
RSS feed of the git tree:
http://www.kernel.org/git/?p=linux/kernel/git/stable/linux-2.6.16.y.git;a=r

I already mailed webmaster _at_ kernel.org 2 days ago, but all
RSS feeds still give Internal Server Error

kernel.org is not in quite the best shape currently due to the machines'
massive overload, so this may take a little while to get fixed.

Do note that www2.kernel.org has a load that is usually 1/20th of
www1.kernel.org; apparently due to Microsoft DNS braindamage (which
affects anyone whose ISP uses MS-DNS.) Using www2.kernel.org explicitly
is likely to give you better performance. HOWEVER, performance is going
to suck due to the measures we've had to take on the servers regardless,
and it's entirely likely git-rss is totally broken. Again, we should
have a dedicated git server in operation in about a month.

It's rather sad to see kernel.org ppl resort to ms tactics to fix sw problems
by throwing more hardware into the mess.

My hunch, it's a scheduler problem. Try spa_no-frills with xfs.

Thanks!

--
Al


Re: O_DIRECT question

2007-01-22 Thread Al Boldi

Andrea Arcangeli wrote:
 Linus may be right that perhaps one day the CPU will be so much faster
 than disk that such a copy will not be measurable and then O_DIRECT
 could be downgraded to O_STREAMING or an fadvise. If such a day will
 come by, probably that same day Dr. Tanenbaum will be finally right
 about his OS design too.

Dr. T. is probably right with his OS design, it's just people aren't ready 
for it, yet.


Thanks!

--
Al


Re: [PATCH] select: fix sys_select to not leak ERESTARTNOHAND to userspace

2007-01-22 Thread Linus Torvalds



On Mon, 22 Jan 2007, Neil Horman wrote:

 On Mon, Jan 22, 2007 at 02:59:56PM +0100, Paolo Ornati wrote:
  
  the ERESTARTNOHAND thing is handled in arch specific signal code,
 
 In the signal handling path yes.

Right.

 Not always in the case of select, though.  Check core_sys_select:

No, even in the case of select().

 if (!ret) {
 ret = -ERESTARTNOHAND;
 if (signal_pending(current))
 goto out;
 ret = 0;

Since we have signal_pending(current) being true, we _know_ that the 
signal handling path will be triggered, so the ERESTARTNOHAND will be 
changed into the appropriate error return (or restart) by the signal 
handling code.

 It's possible for core_sys_select to return ERESTARTNOHAND to sys_select, which
 will in turn (as it's currently written), return that value back to user space.

No. Exactly because sys_select() will always return through the system 
call handling path, and that will turn the ERESTARTNOHAND into something 
else.

NOTE! If you use ptrace(), you may see the internal errors. But that's a 
ptrace-only thing, and may have fooled you into thinking that the actual 
_application_ sees those internal errors. It won't.
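
A quick way to see what the application itself gets back, without ptrace or
strace in the picture, is a test along these lines. It is only a sketch
(Linux/glibc assumed, function names made up): it blocks in select() with a
NULL timeout, lets SIGALRM interrupt it, and returns the errno seen in
userspace.

```c
#define _XOPEN_SOURCE 700
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>

/* Empty handler: its only job is to make the signal "handled". */
static void on_alarm(int sig)
{
    (void)sig;
}

/*
 * Block in select() with a NULL timeout and no descriptors, let SIGALRM
 * interrupt it, and return the errno the application sees (0 if select()
 * did not fail, -1 if the setup itself failed).
 */
static int errno_from_interrupted_select(void)
{
    struct sigaction sa;
    struct itimerval it;
    int rc, saved;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;

    /* Fire after 50 ms and keep firing, in case the first one is early. */
    memset(&it, 0, sizeof(it));
    it.it_value.tv_usec = 50000;
    it.it_interval.tv_usec = 50000;
    if (setitimer(ITIMER_REAL, &it, NULL) != 0)
        return -1;

    rc = select(0, NULL, NULL, NULL, NULL);
    saved = errno;

    memset(&it, 0, sizeof(it));
    setitimer(ITIMER_REAL, &it, NULL);      /* disarm the timer */

    return rc == -1 ? saved : 0;
}
```

If the signal path does the conversion described above, the value returned
here is EINTR; anything in the 512 range showing up in the application would
point at a real bug.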

Of course, we could have some signal-handling bug here, but if so, it 
would affect a lot more than just select(). Have you actually seen 
ERESTARTNOINTR in the app (not just ptrace?)

Linus

Re: SATA exceptions with 2.6.20-rc5

2007-01-22 Thread Björn Steinbrink

On 2007.01.21 18:17:01 -0600, Robert Hancock wrote:
 Björn Steinbrink wrote:
 On 2007.01.21 13:58:01 -0600, Robert Hancock wrote:
 Björn Steinbrink wrote:
 All kernels were bad using that approach. So back to square 1. :/
 
 Björn
 
 OK guys, here's a new patch to try against 2.6.20-rc5:
 
 Right now when switching between ADMA mode and legacy mode (i.e. when 
 going from doing normal DMA reads/writes to doing a FLUSH CACHE) we just 
 set the ADMA GO register bit appropriately and continue with no delay. 
 It looks like in some cases the controller doesn't respond to this 
 immediately, it takes some nanoseconds for the controller's status 
 registers to reflect the change that was made. It's possible that if we 
 were trying to issue commands during this time, the controller might not 
 react properly. This patch adds some code to wait for the status 
 register to change to the state we asked for before continuing.
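
The kind of wait described here can be sketched in userspace C as a bounded polling loop. This is not the patch itself: struct fake_ctl, read_status() and wait_for_status() are hypothetical stand-ins for the controller and its memory-mapped status register, and a real driver would read the register with the usual MMIO accessors and put a short delay between polls.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a controller whose status lags behind a mode change. */
struct fake_ctl {
	uint16_t status;         /* value currently visible in the status register */
	uint16_t pending;        /* value the register will show once it catches up */
	int reads_until_update;  /* number of reads before the change becomes visible */
};

/* Stand-in for a register read; the pending value appears after a few reads. */
static uint16_t read_status(struct fake_ctl *c)
{
	if (c->reads_until_update > 0 && --c->reads_until_update == 0)
		c->status = c->pending;
	return c->status;
}

/*
 * Poll until (status & mask) == want, giving up after max_polls reads.
 * Returns true if the requested state was observed in time.
 */
static bool wait_for_status(struct fake_ctl *c, uint16_t mask, uint16_t want,
			    int max_polls)
{
	int i;

	for (i = 0; i < max_polls; i++) {
		if ((read_status(c) & mask) == want)
			return true;
	}
	return false;
}
```

The bound on the number of polls matters: if the controller never reaches the requested state, the caller gets a failure it can log instead of spinning forever.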
 
 Just got two exceptions with your patch, none of the debug messages were
 issued.
 
 Björn
 
 Hmm, another miss, apparently.. Has anyone tried removing these lines
 from nv_host_intr in 2.6.20-rc5 sata_nv.c and see what that does?
 
 /* bail out if not our interrupt */
 if (!(irq_stat & NV_INT_DEV))
 return 0;

I'm running a kernel with the return statement replaced by a line that prints
the irq_stat instead.

Currently I'm seeing lots of 0x10 on ata1 and 0x0 on ata2.
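
For reference, the experiment amounts to something like the sketch below, written as plain userspace C so it can be compiled on its own. The bit value 0x01 for NV_INT_DEV is an assumption on my part and should be checked against sata_nv.c; would_bail_out() and format_irq_stat() are names invented for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed value of the "device interrupt" bit; verify against sata_nv.c. */
#define NV_INT_DEV_ASSUMED 0x01

/* True if the original check would have returned early for this irq_stat. */
static bool would_bail_out(uint8_t irq_stat)
{
	return !(irq_stat & NV_INT_DEV_ASSUMED);
}

/* Formats the debug line printed in place of the early return. */
static int format_irq_stat(char *buf, size_t len, int port, uint8_t irq_stat)
{
	return snprintf(buf, len, "ata%d: irq_stat 0x%x%s", port, irq_stat,
			would_bail_out(irq_stat) ? " (device bit clear)" : "");
}
```

Under that assumption, both 0x10 and 0x0 have the device bit clear, so the unmodified code would have taken the early return for every one of these interrupts.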

Björn

Re: [PATCH] nfs: fix congestion control -v3

2007-01-22 Thread Trond Myklebust

On Sat, 2007-01-20 at 08:01 +0100, Peter Zijlstra wrote:
 Subject: nfs: fix congestion control
 
 The current NFS client congestion logic is severely broken: it marks the backing
 device congested during each nfs_writepages() call but doesn't mirror this in
 nfs_writepage(), which makes for deadlocks. Also, it implements its own waitqueue.
 
 Replace this with a more regular congestion implementation that puts a cap on the
 number of active writeback pages and uses the bdi congestion waitqueue.
 
 Signed-off-by: Peter Zijlstra [EMAIL PROTECTED]
 Cc: Trond Myklebust [EMAIL PROTECTED]
 ---
  fs/nfs/super.c  |4 -
  fs/nfs/sysctl.c |8 +++
  fs/nfs/write.c  |  116 
 
  include/linux/backing-dev.h |1 
  include/linux/nfs_fs.h  |1 
  include/linux/nfs_fs_sb.h   |1 
  mm/backing-dev.c|   16 ++
  7 files changed, 104 insertions(+), 43 deletions(-)
 
 Index: linux-2.6-git/fs/nfs/write.c
 ===
 --- linux-2.6-git.orig/fs/nfs/write.c 2007-01-20 07:20:10.0 +0100
 +++ linux-2.6-git/fs/nfs/write.c  2007-01-20 07:20:12.0 +0100
 @@ -89,8 +89,6 @@ static struct kmem_cache *nfs_wdata_cach
  static mempool_t *nfs_wdata_mempool;
  static mempool_t *nfs_commit_mempool;
  
 -static DECLARE_WAIT_QUEUE_HEAD(nfs_write_congestion);
 -
  struct nfs_write_data *nfs_commit_alloc(void)
  {
   struct nfs_write_data *p = mempool_alloc(nfs_commit_mempool, GFP_NOFS);
 @@ -245,6 +243,39 @@ static int wb_priority(struct writeback_
  }
  
  /*
 + * NFS congestion control
 + */
 +
 +int nfs_congestion_kb;
 +
 +#define NFS_CONGESTION_ON_THRESH	(nfs_congestion_kb >> (PAGE_SHIFT-10))
 +#define NFS_CONGESTION_OFF_THRESH	\
 +	(NFS_CONGESTION_ON_THRESH - (NFS_CONGESTION_ON_THRESH >> 2))
 +
 +static inline void nfs_set_page_writeback(struct page *page)
 +{
 +	if (!test_set_page_writeback(page)) {
 +		struct inode *inode = page->mapping->host;
 +		struct nfs_server *nfss = NFS_SERVER(inode);
 +
 +		if (atomic_inc_return(&nfss->writeback) > NFS_CONGESTION_ON_THRESH)
 +			set_bdi_congested(&nfss->backing_dev_info, WRITE);
 +	}
 +}
 +
 +static inline void nfs_end_page_writeback(struct page *page)
 +{
 +	struct inode *inode = page->mapping->host;
 +	struct nfs_server *nfss = NFS_SERVER(inode);
 +
 +	end_page_writeback(page);
 +	if (atomic_dec_return(&nfss->writeback) < NFS_CONGESTION_OFF_THRESH) {
 +		clear_bdi_congested(&nfss->backing_dev_info, WRITE);
 +		congestion_end(WRITE);
 +	}
 +}
 +
 +/*
   * Find an associated nfs write request, and prepare to flush it out
   * Returns 1 if there was no write request, or if the request was
   * already tagged by nfs_set_page_dirty.Returns 0 if the request
 @@ -281,7 +312,7 @@ static int nfs_page_mark_flush(struct pa
  	spin_unlock(req_lock);
  	if (test_and_set_bit(PG_FLUSHING, &req->wb_flags) == 0) {
  		nfs_mark_request_dirty(req);
 -		set_page_writeback(page);
 +		nfs_set_page_writeback(page);
  	}
  	ret = test_bit(PG_NEED_FLUSH, &req->wb_flags);
  	nfs_unlock_request(req);
 @@ -336,13 +367,8 @@ int nfs_writepage(struct page *page, str
   return err; 
  }
  
 -/*
 - * Note: causes nfs_update_request() to block on the assumption
 - *that the writeback is generated due to memory pressure.
 - */
  int nfs_writepages(struct address_space *mapping, struct writeback_control *wbc)
  {
 -	struct backing_dev_info *bdi = mapping->backing_dev_info;
  	struct inode *inode = mapping->host;
  	int err;
  
 @@ -351,11 +377,6 @@ int nfs_writepages(struct address_space 
  	err = generic_writepages(mapping, wbc);
  	if (err)
  		return err;
 -	while (test_and_set_bit(BDI_write_congested, &bdi->state) != 0) {
 -		if (wbc->nonblocking)
 -			return 0;
 -		nfs_wait_on_write_congestion(mapping, 0);
 -	}
  	err = nfs_flush_mapping(mapping, wbc, wb_priority(wbc));
  	if (err < 0)
  		goto out;
 @@ -369,9 +390,6 @@ int nfs_writepages(struct address_space 
  	if (err > 0)
  		err = 0;
  out:
 -	clear_bit(BDI_write_congested, &bdi->state);
 -	wake_up_all(&nfs_write_congestion);
 -	congestion_end(WRITE);
  	return err;
  }
  
 @@ -401,7 +419,7 @@ static int nfs_inode_add_request(struct 
  }
  
  /*
 - * Insert a write request into an inode
 + * Remove a write request from an inode
   */
  static void nfs_inode_remove_request(struct nfs_page *req)
  {
 @@ -585,8 +603,8 @@ static inline int nfs_scan_commit(struct
  
  static int nfs_wait_on_write_congestion(struct address_space *mapping, int intr)
  {
 +	struct inode *inode = mapping->host;
  	struct backing_dev_info *bdi = mapping->backing_dev_info;
 - DEFINE_WAIT(wait);
   int ret = 0;
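
The on/off thresholds in the quoted patch form a simple hysteresis: congestion is set once the writeback count exceeds the ON threshold and cleared only after it drops below three quarters of that value. The following is a minimal userspace model of that arithmetic, not kernel code. struct congestion_model and its helpers are hypothetical names, the counter is a plain long instead of an atomic, and PAGE_SHIFT is assumed to be 12 (4 KiB pages).

```c
#include <stdbool.h>

#define PAGE_SHIFT_ASSUMED 12	/* assumption: 4 KiB pages */

/* Hypothetical stand-in for the per-server writeback accounting. */
struct congestion_model {
	long writeback;   /* pages currently under writeback */
	long on_thresh;   /* mark congested above this many pages */
	long off_thresh;  /* clear congestion below this many pages */
	bool congested;
};

/* Derive both thresholds from a limit in KiB, as the macros in the patch do. */
static void model_init(struct congestion_model *m, long congestion_kb)
{
	m->writeback = 0;
	m->on_thresh = congestion_kb >> (PAGE_SHIFT_ASSUMED - 10);
	m->off_thresh = m->on_thresh - (m->on_thresh >> 2);
	m->congested = false;
}

/* A page enters writeback. */
static void model_start_writeback(struct congestion_model *m)
{
	if (++m->writeback > m->on_thresh)
		m->congested = true;
}

/* A page finishes writeback. */
static void model_end_writeback(struct congestion_model *m)
{
	if (--m->writeback < m->off_thresh)
		m->congested = false;
}
```

With a 64 KiB limit this gives an ON threshold of 16 pages and an OFF threshold of 12, so the flag is set at the 17th page and stays set until the count falls to 11. The gap between the two thresholds keeps the flag from flapping on every page.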

Re: O_DIRECT question

2007-01-22 Thread Phillip Susi


Denis Vlasenko wrote:

The difference is that you block exactly when you try to access
data which is not there yet, not sooner (potentially much sooner).

If an application (e.g. a database) needs to know whether data is _really_ there,
it should use aio_read (or something better, something which doesn't use
signals). Do we have this 'something'? I honestly don't know.


The application _IS_ using aio, which is why it can go and perform other 
work while it waits to be told that the read has completed.  This is not 
possible with mmap because the task is blocked while faulting in pages, 
and unless it tries to access the pages, they won't be faulted in.



In some cases, even this is not needed because you don't have any other
things to do, so you just do read() (which returns early), and chew on
data. If your CPU is fast enough and processing of data is light enough
so that it outruns disk - big deal, you block in page fault handler
whenever a page is not read for you in time.
If CPU isn't fast enough, your CPU and disk subsystem are nicely working
in parallel.


Being blocked in the page fault handler means the cpu is now idle 
because you can't go chew on data that _IS_ in core.  The aio + O_DIRECT 
allows you to control when IO is started rather than rely on the kernel 
to decide when is a good time for readahead, and to KNOW when that IO is 
done so you can chew on the data.
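
As a rough illustration of that control, here is a minimal sketch using POSIX AIO: the read is started explicitly, the caller does something else while it is in flight, and completion is detected with aio_error() and aio_return(). It deliberately leaves out O_DIRECT, which would additionally need suitably aligned buffers and filesystem support. aio_read_overlapped() is a made-up helper name, the busy counter stands in for real work, and older glibc versions need -lrt at link time.

```c
#include <aio.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: start an asynchronous read of up to len bytes from
 * the beginning of path, count "work units" while the read is in flight,
 * and return the number of bytes read (or -1 on error).
 */
static ssize_t aio_read_overlapped(const char *path, char *buf, size_t len,
				   unsigned long *work_units)
{
	struct aiocb cb;
	ssize_t n;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0)
		return -1;

	memset(&cb, 0, sizeof(cb));
	cb.aio_fildes = fd;
	cb.aio_buf = buf;
	cb.aio_nbytes = len;
	cb.aio_offset = 0;

	/* The application decides when the I/O starts. */
	if (aio_read(&cb) < 0) {
		close(fd);
		return -1;
	}

	/* Stand-in for useful work on data that is already in core. */
	while (aio_error(&cb) == EINPROGRESS)
		(*work_units)++;

	/* The application knows exactly when this request has finished. */
	n = aio_return(&cb);
	close(fd);
	return n;
}
```

A real application would process other buffers in that loop, or block in aio_suspend() once it runs out of work, instead of spinning on aio_error().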



With O_DIRECT, you alternate:
CPU is idle, disk is working / CPU is working, disk is idle.


You have this completely backwards.  With mmap this is what you get 
because you chew data, page fault... chew data... page fault...



What do you want to do on I/O error? I guess you cannot do much -
any sensible db will shutdown itself. When your data storage
starts to fail, it's pointless to continue running.


Ever hear of error recovery?  A good db will be able to cope with one or 
two bad blocks, or at the very least continue operating the other tables 
or databases it is hosting, or flush transactions and switch to read 
only mode, or any number of things other than abort().
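
With explicit reads, an I/O error is just a return value attached to one specific request, which is what makes this kind of recovery straightforward. A minimal sketch, assuming a hypothetical struct io_request and run_requests() helper (neither comes from any real database):

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-request record: inputs plus the outcome of the read. */
struct io_request {
	int fd;
	off_t offset;
	void *buf;
	size_t len;
	ssize_t result;  /* bytes read, or -1 on failure */
	int error;       /* errno of the failed read, 0 on success */
};

/*
 * Run every request in turn. A failed read is recorded against that
 * request only; the remaining requests are still serviced.
 * Returns the number of requests that failed.
 */
static size_t run_requests(struct io_request *reqs, size_t n)
{
	size_t failed = 0;
	size_t i;

	for (i = 0; i < n; i++) {
		struct io_request *r = &reqs[i];

		r->result = pread(r->fd, r->buf, r->len, r->offset);
		r->error = (r->result < 0) ? errno : 0;
		if (r->result < 0)
			failed++;
	}
	return failed;
}
```

The caller can then fail just the affected transaction, mark a table read-only, or retry, while everything else keeps running.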



You do not need to know which read() exactly failed due to bad disk.
Filename and offset from the start is enough. Right?

So, SIGIO/SIGBUS can provide that, and if your handler is of
void (*sa_sigaction)(int, siginfo_t *, void *);
style, you can get fd, memory address of the fault, etc.
Probably kernel can even pass file offset somewhere in siginfo_t...


Sure... now what does your signal handler have to do in order to handle 
this error in such a way as to allow the one request to be failed and 
the task to continue handling other requests?  I don't think this is 
even possible, let alone clean.



You can still be multithreaded. The point is, with O_DIRECT
you _are forced_ to _be_ multithreaded, or else performance will suck.


Or use aio.  Simple read/write with the kernel trying to outsmart the 
application is nice for very simple applications, but it does not 
provide very good performance.  This is why we have aio and O_DIRECT; 
because the application can manage the IO better than the kernel because 
it actually knows what it needs and when.


Yes, the application ends up being more complex, but that is the price 
you pay.  You simply can't get it perfect in a general purpose kernel 
that has to guess what the application is really trying to do.



You think Oracle. But this application may very well be
not Oracle, but diff, or dd, or KMail. I don't want to care.
I want all big writes to be efficient, not just those done by Oracle.
*Including* single threaded ones.


Then redesign those applications to use aio and O_DIRECT.  Incidentally 
I have hacked up dd to do just that and have some very nice performance 
numbers as a result.



Well, I too currently work with Oracle.
Apparently the people who wrote the damn thing have a very, eh, Oracle-centric
world-view. We want direct writes to the disk. Period. Why? Does it
make sense? Are there better ways? - nothing. They think they know better.


Nobody has shown otherwise to date.



Re: [PATCH] select: fix sys_select to not leak ERESTARTNOHAND to userspace

2007-01-22 Thread Neil Horman

On Mon, Jan 22, 2007 at 08:03:53AM -0800, Linus Torvalds wrote:
 
 
 On Mon, 22 Jan 2007, Neil Horman wrote:
 
  On Mon, Jan 22, 2007 at 02:59:56PM +0100, Paolo Ornati wrote:
   
   the ERESTARTNOHAND thing is handled in arch specific signal code,
  
  In the signal handling path yes.
 
 Right.
 
  Not always in the case of select, though.  Check core_sys_select:
 
 No, even in the case of select().
 
  if (!ret) {
  ret = -ERESTARTNOHAND;
  if (signal_pending(current))
  goto out;
  ret = 0;
 
 Since we have signal_pending(current) being true, we _know_ that the 
 signal handling path will be triggered, so the ERESTARTNOHAND will be 
 changed into the appropriate error return (or restart) by the signal 
 handling code.
 
  It's possible for core_sys_select to return ERESTARTNOHAND to sys_select, which
  will in turn (as it's currently written), return that value back to user space.
 
 No. Exactly because sys_select() will always return through the system 
 call handling path, and that will turn the ERESTARTNOHAND into something 
 else.
 
 NOTE! If you use ptrace(), you may see the internal errors. But that's a 
 ptrace-only thing, and may have fooled you into thinking that the actual 
 _application_ sees those internal errors. It won't.
 
 Of course, we could have some signal-handling bug here, but if so, it 
 would affect a lot more than just select(). Have you actually seen 
 ERESTARTNOINTR in the app (not just ptrace?)
 
The error was reported to me second hand.  I'm expecting a reproducer (although
to date, I'm still waiting for it, so I may have jumped the gun here).  In fact,
I see what you're saying now: down in the assembly glue for our arches (x86 in
this case) we jump to do_notify_resume since we have a pending signal, and
inside do_signal from there we fix up ERESTARTNOHAND to be something sane for
userspace.  Ok, I withdraw this patch.  I'll repost when/if I get my hands on
the reproducer and see that something is actually slipping through.
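
For the archive, my understanding of that fix-up can be modelled roughly as below. This is a simplified userspace sketch of the decision, not the arch code: the numeric values are the kernel-internal ones as I remember them from include/linux/errno.h, and enum fixup and signal_fixup() are names made up for this illustration.

```c
#include <stdbool.h>

/* Kernel-internal restart codes (assumed values); never meant for user space. */
#define K_ERESTARTSYS    512
#define K_ERESTARTNOINTR 513
#define K_ERESTARTNOHAND 514

enum fixup {
	FIXUP_NONE,     /* not a restart code: return value is passed through */
	FIXUP_EINTR,    /* user space sees -1 with errno == EINTR */
	FIXUP_RESTART   /* the system call is transparently restarted */
};

/*
 * Model of what the signal delivery path does with a system call's
 * return value. handler_runs says whether a user handler is about to
 * be invoked; sa_restart says whether it was installed with SA_RESTART.
 */
static enum fixup signal_fixup(int ret, bool handler_runs, bool sa_restart)
{
	switch (ret) {
	case -K_ERESTARTNOHAND:
		/* Restart only if no handler runs; otherwise report EINTR. */
		return handler_runs ? FIXUP_EINTR : FIXUP_RESTART;
	case -K_ERESTARTSYS:
		/* A handler without SA_RESTART turns this into EINTR. */
		return (handler_runs && !sa_restart) ? FIXUP_EINTR : FIXUP_RESTART;
	case -K_ERESTARTNOINTR:
		/* Always restarted, handler or not. */
		return FIXUP_RESTART;
	default:
		return FIXUP_NONE;
	}
}
```

In every branch the internal code is replaced before returning to user space, which is why select() can hand -ERESTARTNOHAND back up the call chain whenever signal_pending(current) is true.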

Neil

   Linus

Re: [RFC 3/6] bidi support: bidirectional request

2007-01-22 Thread James Bottomley

On Mon, 2007-01-22 at 01:25 +0200, Boaz Harrosh wrote:
 - Instantiate another request_io_part in request for bidi_read.
 - Define & Implement new API for accessing bidi parts.
 - API to Build bidi requests and map to sglists.
 - Define new end_that_request_block() function to end a complete request.

Actually, this approach looks to be a bit too narrow.  You seem to be
predicating on the idea that the bidirectional will transfer in and out
of the same area.  For some of the frame in/frame out stuff, we probably
need the read and write areas for the bidirectional request to be
separated.

James



Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-22 Thread Douglas Gilbert

Benny Halevy wrote:
 Douglas Gilbert wrote:
 Boaz Harrosh wrote:
 - Introduce a new enum dma_data_direction data_dir member in struct request.
   and remove the RW bit from request->cmd_flag
 - Add new API to query request direction.
 - Adjust existing API and implementation.
 - Cleanup wrong use of DMA_BIDIRECTIONAL

Perhaps the right use of DMA_BIDIRECTIONAL needs to be
defined.

Could it be used with a XDWRITE(10) SCSI command
defined in sbc3r07.pdf at http://www.t10.org ? I suspect
using two scatter gather lists would be a better approach.

 - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc
   requests were used and bzero'ed.
 With a bi-directional transfer is it always unambiguous
 which transfer occurs first (or could they occur at
 the same time)?
 
 The bidi transfers can occur in any order and in parallel.

Then it is not sufficient for modern SCSI transports in which
certain bidirectional commands (probably most) have a well
defined order.

So DMA_BIDIRECTIONAL looks PCI specific and it may have
been a mistake to replace other subsystem's direction flags
with it. RDMA might be an interesting case.

Doug Gilbert



Re: [2.6 patch] SCSI seagate.c: remove SEAGATE_USE_ASM

On Sun, 21 Jan 2007 20:13:00 +0100
Adrian Bunk [EMAIL PROTECTED] wrote:

 Using assembler code for performance in drivers might have been a good 
 idea 15 years ago when this code was written, but with today's compilers 
 that's unlikely to be an advantage.
 
 Besides this, it also hurts the readability.
 
 Simply use the C code that was already there as an alternative.
 
 Signed-off-by: Adrian Bunk [EMAIL PROTECTED]
stosb\n\t

NAK

The C codepaths are essentially untested on this driver.

Alan

Re: [RFC 1/6] bidi support: request dma_data_direction

2007-01-22 Thread James Bottomley

On Mon, 2007-01-22 at 10:05 -0500, Douglas Gilbert wrote:
 Perhaps the right use of DMA_BIDIRECTIONAL needs to be
 defined.
 
 Could it be used with a XDWRITE(10) SCSI command
 defined in sbc3r07.pdf at http://www.t10.org ? I suspect
 using two scatter gather lists would be a better approach.
 
  - Introduce new blk_rq_init_unqueued_req() and use it in places ad-hoc
requests were used and bzero'ed.
  With a bi-directional transfer is it always unambiguous
  which transfer occurs first (or could they occur at
  the same time)?
  
  The bidi transfers can occur in any order and in parallel.

 Then it is not sufficient for modern SCSI transports in which
 certain bidirectional commands (probably most) have a well
 defined order.

Right, that's why I think bi-directional needs to be one way op followed
by one way op ... even if it is to the same buffer.  That should be a
general enough paradigm for everything.
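
To make that concrete, here is a toy userspace model of a request built from two one-way parts, each with its own buffer, executed strictly in order. It is only a sketch of the idea and not a proposal for the block layer API: struct xfer_part, struct bidi_req and run_bidi() are hypothetical names, and the "device" is a byte array that XORs incoming data into itself.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical: one direction of a bidirectional request. */
struct xfer_part {
	uint8_t *buf;
	size_t len;
};

/* Hypothetical: a bidi request is an out part followed by an in part. */
struct bidi_req {
	struct xfer_part out;  /* host to device */
	struct xfer_part in;   /* device to host, separate buffer */
};

/*
 * Toy device: phase 1 XORs the out data into the media, phase 2 copies
 * the resulting media contents back through the in part. The phases
 * never overlap. Returns 0 on success, -1 if the lengths do not fit.
 */
static int run_bidi(struct bidi_req *rq, uint8_t *media, size_t media_len)
{
	size_t i;

	if (rq->out.len > media_len || rq->in.len > rq->out.len)
		return -1;

	/* Phase 1: one-way transfer to the device. */
	for (i = 0; i < rq->out.len; i++)
		media[i] ^= rq->out.buf[i];

	/* Phase 2: one-way transfer from the device. */
	for (i = 0; i < rq->in.len; i++)
		rq->in.buf[i] = media[i];

	return 0;
}
```

Nothing stops the two parts from pointing at the same memory, so the same-buffer case falls out of the same structure.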

 So DMA_BIDIRECTIONAL looks PCI specific and it may have
 been a mistake to replace other subsystem's direction flags
 with it. RDMA might be an interesting case.

It's bus specific ... it means that the bus must be programmed to expect
the device to transfer both to and from the memory buffer.  There are a
very few drivers which do this when they don't know the actual transfer
direction, so it might be reasonably tested on architectures ... but
we'd probably have to check.

James



Re: [2.6 patch] SCSI seagate.c: remove SEAGATE_USE_ASM