Re: [PATCH] FUTEX : new PRIVATE futexes

2007-04-05 Thread Eric Dumazet

Nick Piggin wrote:

Hi Eric,

Thanks for doing this... It's looking good, I just have some minor
comments:


Hi Nick, thanks for reviewing.



Eric Dumazet wrote:

  */
-int get_futex_key(void __user *uaddr, union futex_key *key)
+int get_futex_key(void __user *uaddr, union futex_key *key,
+struct rw_semaphore *shared)


Can we pass in something other than the rw_semaphore here? Seeing as
it only actually gets used as a flag, it might be nicer just to pass
a 0 or 1? And all through the call stack...

Did the whole thing just turn out neater when you passed the rwsem?
We always know to use current->mm->mmap_sem, so it doesn't seem like
a boolean flag would hurt?


That's a good question.

Computing current->mm->mmap_sem once is a win in itself, because accessing
current is not cheap.
It also performs part of the pointer chain's memory accesses in advance,
before the value is used. It is a prefetch() equivalent for free: if current->mm is
not in the CPU cache, the CPU won't stall, because the next instructions don't depend on it.


This means fewer CPU stalls when current->mm is not in the CPU cache. That's
difficult to benchmark, but you can trust me.


A flag means:

if (flag)
	up_read(&current->mm->mmap_sem);

This generates rather bad code.

if (ptr)
	up_read(ptr);

generates *much* better code.

So this is a cleanup and a runtime optimization.

I did a similar optimization in commit 163da958ba5282cbf85e8b3dc08e4f51f8b01c5e

I invite you to check it:

http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=163da958ba5282cbf85e8b3dc08e4f51f8b01c5e






 {
 unsigned long address = (unsigned long)uaddr;
 struct mm_struct *mm = current->mm;
@@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un
 address -= key->both.offset;
 
 /*

+ * PROCESS_PRIVATE futexes are fast.
+ * As the mm cannot disappear under us and the 'key' only needs
+ * virtual address, we don't even have to find the underlying vma.
+ * Note : We do have to check 'address' is a valid user address,
+ *        but access_ok() should be faster than find_vma()
+ * Note : At this point, address points to the start of page,
+ *        not the real futex address, this is ok.
+ */
+if (!shared) {
+if (!access_ok(VERIFY_WRITE, address, sizeof(int)))
+return -EFAULT;


Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it
should depend on the size of the operation. Maybe the access_ok check
should go outside get_futex_key?


If you check again, you'll see that address points to the start of the PAGE, 
not the real u32/u64 futex address. This checks the PAGE. We can use char, 
short, int, long, or char[PAGE_SIZE] as long as we know a futex cannot span 
two pages.
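The argument relies on alignment: assuming the futex address is naturally aligned and the futex size is a power of two no larger than the page, the whole futex lies in the page that holds its first byte, so checking that page is enough. A sketch of that invariant, with a hypothetical 4096-byte `SKETCH_PAGE_SIZE`:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed page size for the sketch */

/* Start of the page containing addr (the address that gets checked). */
static uintptr_t page_start(uintptr_t addr)
{
	return addr & ~(uintptr_t)(SKETCH_PAGE_SIZE - 1);
}

/*
 * A naturally aligned object whose size is a power of two dividing the
 * page size never straddles a page boundary: its first and last bytes
 * are always in the same page.
 */
static int same_page(uintptr_t addr, uintptr_t size)
{
	assert(size != 0 && (size & (size - 1)) == 0);
	assert(addr % size == 0);	/* futexes must be aligned */
	return page_start(addr) == page_start(addr + size - 1);
}
```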




  */
 key->shared.inode = vma->vm_file->f_path.dentry->d_inode;
-key->both.offset++; /* Bit 0 of offset indicates inode-based key. */
+key->both.offset += FUT_OFF_INODE; /* inode-based key. */
 if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
 key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
  + vma->vm_pgoff);


I like |= for adding flags, it seems less ambiguous. But I guess that's
a matter of opinion. Hugh seems to like +=, and I can't argue with him
about style issues ;)



The previous code was doing offset++, which means offset += 1;
I didn't want to hurt Hugh :)
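Here `+=` and `|=` give the same result because the page offset of an aligned futex has its low bits clear, so adding bit 0 can never carry. A sketch, assuming a 4096-byte page and a 32-bit futex; `OFF_INODE` is a hypothetical stand-in for the patch's FUT_OFF_INODE, taken as 1 to match the old `offset++`:

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL
#define OFF_INODE 1UL	/* hypothetical stand-in for FUT_OFF_INODE */

/* Offset of an aligned u32 futex within its page: low 2 bits are 0. */
static unsigned long futex_offset(uintptr_t addr)
{
	unsigned long off = addr % SKETCH_PAGE_SIZE;

	assert(off % 4 == 0);	/* natural alignment of a u32 futex */
	return off;
}

/* With the flag bit known to be clear, += and |= agree. */
static int add_equals_or(uintptr_t addr)
{
	unsigned long off = futex_offset(addr);

	return (off + OFF_INODE) == (off | OFF_INODE);
}
```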


 EXPORT_SYMBOL_GPL(drop_futex_key_refs);


I wonder if it would be worthwhile inlining and likely()ing the
private fastpath? Might make it pretty compact... I guess that's
something to worry about after glibc gets support.


Yes, in a future patch, in about one year :)
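Should that follow-up happen, one possible shape is sketched below, assuming GCC or Clang. `likely()` is defined the same way as in the kernel's include/linux/compiler.h; `struct key_sketch`, `get_key()` and `get_key_slow()` are hypothetical simplifications, not the futex code.

```c
#include <assert.h>
#include <stddef.h>

/* As defined for GCC in the kernel's include/linux/compiler.h. */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/* Hypothetical, simplified futex key. */
struct key_sketch {
	unsigned long addr;
	const void *mm;		/* NULL for the shared (slow-path) case */
};

/* Out-of-line slow path; real code would look up the vma here. */
static int get_key_slow(unsigned long addr, struct key_sketch *key)
{
	key->addr = addr;
	key->mm = NULL;
	return 0;
}

/* Private case handled inline and marked as the expected branch. */
static inline int get_key(unsigned long addr, const void *mm,
			  int private_futex, struct key_sketch *key)
{
	assert(key != NULL);
	if (likely(private_futex)) {
		key->addr = addr;
		key->mm = mm;
		return 0;
	}
	return get_key_slow(addr, key);
}
```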


+
+if (!(vma = find_vma(mm, address)) ||
+vma->vm_start > address || !(vma->vm_flags & VM_WRITE))
+ret = -EFAULT;
+
+else
+switch (handle_mm_fault(mm, vma, address, 1)) {
+case VM_FAULT_MINOR:
+current->min_flt++;
+break;
+case VM_FAULT_MAJOR:
+current->maj_flt++;
+break;
+default:
+ret = -EFAULT;
+}
+if (!shared)
+up_read(&mm->mmap_sem);
+return ret;
 }
 
 /*


You've got an extra space after the if (maybe for clarity?). In this
situation I prefer putting braces around both the if and the else, and
if you get rid of that blank line, it doesn't cost you anything more ;)


Oh well...




@@ -1598,6 +1656,8 @@ static int futex_wait(unsigned long __us
 restart->arg1 = val;
 restart->arg2 = (unsigned long)abs_time;
 restart->arg3 = (unsigned long)futex64;
+if (shared)
+restart->arg3 |= 2;


Could you make this into a proper flags argument and use #define 
CONSTANTs for it?


Yes, but I'm not sure it will improve readability.
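One way the suggestion could look, as a sketch only: `RESTART_FUTEX64` and `RESTART_SHARED` are hypothetical names for bit 0 (the old futex64 value, assumed to be 0 or 1) and bit 1 (the patch's `|= 2`), with small helpers to pack and test them.

```c
#include <assert.h>

/* Hypothetical names for the bits packed into restart->arg3. */
#define RESTART_FUTEX64	0x01UL	/* bit 0: 64-bit futex */
#define RESTART_SHARED	0x02UL	/* bit 1: shared futex ("|= 2") */

static unsigned long pack_restart_flags(int futex64, int shared)
{
	unsigned long flags = 0;

	assert(futex64 == 0 || futex64 == 1);	/* assumed boolean */
	if (futex64)
		flags |= RESTART_FUTEX64;
	if (shared)
		flags |= RESTART_SHARED;
	return flags;
}

static int restart_is_futex64(unsigned long arg3)
{
	return (arg3 & RESTART_FUTEX64) != 0;
}

static int restart_is_shared(unsigned long arg3)
{
	return (arg3 & RESTART_SHARED) != 0;
}
```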




@@ -2377,23 +2455,24 @@ sys_futex64(u64 __user *uaddr, int op, u
 struct timespec ts;
 ktime_t t, *tp = NULL;
 u64 val2 = 0;
+int opm = op & FUTEX_CMD_MASK;


What's opm stand for?
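For context, mainline <linux/futex.h> as merged in 2.6.22 defines FUTEX_PRIVATE_FLAG as 128 and FUTEX_CMD_MASK as its complement, so `op & FUTEX_CMD_MASK` is the bare command with the modifier bit stripped. A sketch of that decoding using local `SK_` copies of the constants, which are assumptions here rather than definitions taken from this patch:

```c
#include <assert.h>

/* Local copies; values as in mainline <linux/futex.h> (2.6.22). */
#define SK_FUTEX_WAIT		0
#define SK_FUTEX_WAKE		1
#define SK_FUTEX_PRIVATE_FLAG	128
#define SK_FUTEX_CMD_MASK	(~SK_FUTEX_PRIVATE_FLAG)

struct futex_op {
	int cmd;		/* bare command, e.g. SK_FUTEX_WAIT */
	int private_futex;	/* nonzero if the private flag was set */
};

static struct futex_op decode_op(int op)
{
	struct futex_op d;

	d.cmd = op & SK_FUTEX_CMD_MASK;	/* strip the modifier bit */
	d.private_futex = (op & SK_FUTEX_PRIVATE_FLAG) != 0;
	assert(d.cmd >= 0);
	return d;
}
```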



[PATCH 2.6.21-rc6] mm/page_alloc.c: removal of an unused definition of 'setup_nr_node_ids'

2007-04-05 Thread Patrick Ringl
Remove an empty and thus unused definition of 'setup_nr_node_ids' (in 
case of MAX_NUMNODES < 1) in order to resolve a compiler warning.


Signed-off-by: Patrick Ringl <[EMAIL PROTECTED]>
---

--- linux-2.6.20-o/mm/page_alloc.c  2007-03-22 23:11:25.0 +0100
+++ linux-2.6.20/mm/page_alloc.c2007-04-06 07:19:38.0 +0200
@@ -680,8 +680,6 @@ static void __init setup_nr_node_ids(voi
   highest = node;
   nr_node_ids = highest + 1;
}
-#else
-static void __init setup_nr_node_ids(void) {}
#endif

#ifdef CONFIG_NUMA



-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: RAID1 "out of memory" error, was Re: 2.6.21-rc5-mm4

2007-04-05 Thread Dan Williams

On 4/5/07, Andrew Morton <[EMAIL PROTECTED]> wrote:

On Fri, 06 Apr 2007 02:33:03 +1000
Reuben Farrelly <[EMAIL PROTECTED]> wrote:

> Hi,
>
> On 3/04/2007 3:47 PM, Andrew Morton wrote:
> > ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.21-rc5/2.6.21-rc5-mm4/
> >
> > - The oops in git-net.patch has been fixed, so that tree has been restored.
> >   It is huge.
> >
> > - Added the device-mapper development tree to the -mm lineup (Alasdair
> >   Kergon).  It is a quilt tree, living at
> >   ftp://ftp.kernel.org/pub/linux/kernel/people/agk/patches/2.6/editing/.
> >
> > - Added davidel's signalfd stuff.
>
> Looks like some damage, or maybe intolerance to on-disk damage, to RAID-1.
>
> md1 is the first array on the disk, and it refuses to start up on boot, or after
> boot.
>
> ...
>
> tornado ~ # mdadm --assemble /dev/md1 /dev/sda1 /dev/sdc1
> mdadm: device /dev/md1 already active - cannot assemble it
> tornado ~ # mdadm --run /dev/md1
> mdadm: failed to run array /dev/md1: Cannot allocate memory
> tornado ~ #
>
> and looking at a dmesg, this is logged:
>
> md: bind
> md: bind
> raid1: raid set md1 active with 2 out of 2 mirrors
> md1: bitmap initialized from disk: read 0/1 pages, set 0 bits, status: -12
> md1: failed to create bitmap (-12)
> md: pers->run() failed ...


Is this the dmesg from boot or the dmesg after running the mdadm --run command?


>
> tornado ~ # uname -a
> Linux tornado 2.6.21-rc5-mm4 #1 SMP Thu Apr 5 23:47:42 EST 2007 x86_64 Intel(R) Pentium(R) 4 CPU 3.00GHz GenuineIntel GNU/Linux
> tornado ~ #
>
> The last known version that worked was 2.6.21-rc3-mm1 - I haven't been testing
> out the -mm releases so much lately.

OK.  I assume that bitmap->chunks in bitmap_init_from_disk() has some
unexpectedly large value.

I don't _think_ there's anything in -mm which would have triggered this.
Does mainline do the same thing?

I guess it's possible that the code in git-md-accel.patch accidentally
broke things.  Perhaps try disabling CONFIG_DMA_ENGINE?



git-md-accel.patch does not touch anything in the raid1 path, but I
guess stranger things have happened.
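For readers matching the two logs: the bitmap messages report "status: -12", and mdadm prints "Cannot allocate memory". A sketch of the usual mapping, assuming Linux, where ENOMEM is 12 and kernel code returns failures as negative errno values; `describe_status()` is a hypothetical helper:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Turn a kernel-style status (0 or -errno) into a message. */
static const char *describe_status(int status)
{
	assert(status <= 0);
	return status == 0 ? "success" : strerror(-status);
}
```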

--
Dan


Re: Reiser4. BEST FILESYSTEM EVER.

2007-04-05 Thread Valdis . Kletnieks
On Thu, 05 Apr 2007 18:34:48 PDT, [EMAIL PROTECTED] said:

> If they are accurate, THEN they are obviously very relevant.

Erm. No. They're not "obviously" very relevant.

I could hypothetically create a benchmark, that's accurate and repeatable,
that shows that reiser4 is able to wash a herd of elephants exactly 11.458%
faster than ext3.  And you would, of course, say "But elephants have nothing to
do with file systems", because they aren't relevant to file systems.

Similarly, we've seen benchmarks that show some patch improves NUMA performance
by 5% - and those aren't relevant on my laptop because my laptop doesn't do
NUMA.  And a benchmark of file system performance is only as relevant as it
reflects *your* application's use of the filesystem - how fast it can create
and remove tiny files isn't relevant if your use of the filesystem is to store
large files with long sequential read/write patterns.  And the level of
compression isn't very relevant if you're using the partition to store
already-compressed audio or video.

I know somebody who defines a "relevance index" for things, and the measure
is "how many cubicles do I have to go to find somebody who actually cares
about ABC?" - and for him, that's itself a relevant index, because if it's
0, *he* cares, and if it's 1, his immediate neighbors care and will cause him
grief if ABC is a problem.   People who are 5 or 6 cubicles away are less
likely to give him a hard time, and the people who are 15 to 20 cubicles away
are in an entirely separate building. :)




Re: Reiser4. BEST FILESYSTEM EVER.

2007-04-05 Thread johnrobertbanks
Hi Peter,

You say that the results may be accurate, but "Whether or not they're
*relevant* is a totally different ball of wax." and

"Whether or not they're relevant depends on how well they happen to
reflect your particular usage pattern."

Well, surprise, surprise... everyone knows that.

Have a look at the (summary) of the results: 

.----------------------------------.
| FILESYSTEM   |  TIME  |   DISK   |
| TYPE         | (secs) |  USAGE   |
.----------------------------------.
| REISER4 lzo  |  1938  |   278    |
| REISER4 gzip |  2295  |   213    |
| REISER4      |  3462  |   692    |
| EXT2         |  4092  |   816    |
| JFS          |  4225  |   806    |
| EXT4         |  4408  |   816    |
| EXT3         |  4421  |   816    |
| XFS          |  4625  |   779    |
| REISER3      |  6178  |   793    |
| FAT32        | 12342  |   988    |
| NTFS-3g      | 10414  |   772    |
.----------------------------------.


for the full results see:
http://linuxhelp.150m.com/resources/fs-benchmarks.htm 

Don't you agree, that "If they are accurate, THEN they are obviously
very relevant."

I have set up a Reiser4 partition with gzip compression, here is the
difference in disk usage of a typical Debian installation on two 10GB
partitions, one with Reiser3 and the other with Reiser4.

debian:/# df
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/sda3 10490104   6379164   4110940  61% /3
/dev/sda7  9967960   2632488   7335472  27% /7

Partitions 3 and 7 have exactly the same data on them (the typical
Debian install).

The partitions are exactly the same size (although df records different
sizes).

Partition 3 is Reiser3 -- uses 6.4 GB.
Partition 7 is Reiser4 -- uses 2.6 GB.

So Reiser4 uses 2.6 GB to store the (typical) data that it takes Reiser3
6.4 GB to store (note it would take ext2/3/4 some 7 GB to store the same
info).
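Working from the df "Used" figures above (1K-blocks), the Reiser3 partition uses about 2.42 times the space of the Reiser4 one, a saving of roughly 58.7%. A small C check of that arithmetic; the macro and function names are only for illustration:

```c
#include <assert.h>

/* "Used" column from the df output above, in 1K-blocks. */
#define REISER3_USED_KB 6379164L	/* /dev/sda3 */
#define REISER4_USED_KB 2632488L	/* /dev/sda7, gzip compression */

/* How many times more space a uses than b. */
static double usage_ratio(long a_kb, long b_kb)
{
	assert(b_kb > 0);
	return (double)a_kb / (double)b_kb;
}

/* Percentage of a's space that b saves. */
static double percent_saved(long a_kb, long b_kb)
{
	assert(a_kb > 0);
	return 100.0 * (double)(a_kb - b_kb) / (double)a_kb;
}
```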

Don't you think this result is significant in itself?

Following your hint I have booted /dev/sda7 and all the programs seem to
work fine. They do not seem to be any faster than when using Reiser3.

The whole system seems about as responsive as always.

For fun, I ran bonnie++. Here are the results:

debian:/# ./bonnie++ -u root
Using uid:0, gid:0.
Writing a byte at a time...done
Writing intelligently...done
Rewriting...done
Reading a byte at a time...done
Reading intelligently...done
start 'em...done...done...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...done.
Create files in random order...done.
Stat files in random order...done.
Delete files in random order...done.
Version 1.93c       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
debian           1G   121  99 86524  21 63297  41   920  99 187762  80  1782 233
Latency             82484us     386ms     438ms   26758us     110ms     398ms
Version 1.93c       ------Sequential Create------ --------Random Create--------
debian              -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
              files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
                 16 +++++ +++ +++++ +++ 18509  92 17776  86 +++++ +++ 19495  91
Latency               210us    5475us    5525us    5777us    5522us    5839us

I particularly liked the 233%CP for Random-Seeks.

John.



On Thu, 05 Apr 2007 21:07:28 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]>
said:
> [EMAIL PROTECTED] wrote:
> > Hi Peter,
> > 
> > You say that the results may be accurate, but not relevant.
> > 
> 
> NO, I said that whether they're accurate is another matter.
> 
> > If they are accurate, THEN they are obviously very relevant.
> 
> Crap-o-la.  Whether or not they're relevant depends on how well they 
> happen to reflect your particular usage pattern.
> 
> There are NO benchmarks which are relevant to all users.  Understanding 
> whether or not a benchmark is relevant to one's particular application 
> is one of the trickiest things about benchmarks.
> 
>   -hpa
-- 
  
  [EMAIL PROTECTED]



Re: Reiser4. BEST FILESYSTEM EVER.

2007-04-05 Thread H. Peter Anvin

[EMAIL PROTECTED] wrote:

Hi Peter,

You say that the results may be accurate, but not relevant.



NO, I said that whether they're accurate is another matter.


If they are accurate, THEN they are obviously very relevant.


Crap-o-la.  Whether or not they're relevant depends on how well they 
happen to reflect your particular usage pattern.


There are NO benchmarks which are relevant to all users.  Understanding 
whether or not a benchmark is relevant to one's particular application 
is one of the trickiest things about benchmarks.


-hpa


[PATCH] USB gadget rndis: fix struct rndis_packet_msg_type unaligned bug

2007-04-05 Thread Wu, Bryan
[PATCH] usb gadget rndis: 
skb_push() may return a pointer that is not aligned as required
by struct rndis_packet_msg_type. Use the packed attribute to fix this bug.

Signed-off-by: Roy Huang <[EMAIL PROTECTED]>
Signed-off-by: Jie Zhang <[EMAIL PROTECTED]>
Signed-off-by: Bryan Wu <[EMAIL PROTECTED]>
---
 drivers/usb/gadget/rndis.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/gadget/rndis.h b/drivers/usb/gadget/rndis.h
index 4c3c725..397b149 100644
--- a/drivers/usb/gadget/rndis.h
+++ b/drivers/usb/gadget/rndis.h
@@ -195,7 +195,7 @@ struct rndis_packet_msg_type
__le32  PerPacketInfoLength;
__le32  VcHandle;
__le32  Reserved;
-};
+} __attribute__ ((packed));
 
 struct rndis_config_parameter
 {
-- 
1.5.0.5
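Why the attribute helps: without packed, the compiler may assume every 32-bit field is 4-byte aligned and emit plain 32-bit loads, which can fault on CPUs without misaligned-access support when skb_push() returns an unaligned pointer. A userspace sketch, assuming GCC or Clang; the two-field structs are hypothetical simplifications of struct rndis_packet_msg_type, and a memcpy() read is shown as the usual alternative:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, simplified message layouts. */
struct msg_natural {
	uint32_t MessageType;
	uint32_t MessageLength;
};

struct msg_packed {
	uint32_t MessageType;
	uint32_t MessageLength;
} __attribute__ ((packed));

/* All members are 32-bit, so packed does not change size or layout. */
static_assert(sizeof(struct msg_packed) == sizeof(struct msg_natural),
	      "packed must not change the layout");

/* Packed: the compiler assumes 1-byte alignment and reads safely. */
static uint32_t read_length(const unsigned char *buf)
{
	const struct msg_packed *m = (const struct msg_packed *)buf;

	return m->MessageLength;
}

/* Alternative without packed: copy the field out byte-wise. */
static uint32_t read_length_memcpy(const unsigned char *buf)
{
	uint32_t v;

	memcpy(&v, buf + offsetof(struct msg_natural, MessageLength),
	       sizeof(v));
	return v;
}
```

Because every member of struct rndis_packet_msg_type is a __le32, adding packed only lowers the alignment the compiler may assume; it does not move any field.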



Re: [PATCH] Lguest launcher, child starving parent

2007-04-05 Thread Rusty Russell
On Thu, 2007-04-05 at 16:40 -0400, Steven Rostedt wrote:
> Glauber noticed long delays between hitting a key, and seeing data come
> up on the virtual console.  Looking into this, I found that the
> wake_parent routine that reads from all devices was actually starving
> out the parent after sending the parent a signal to wake up.
> 
> The thing is, the child which takes the console input is recognized by
> the scheduler as an interactive process.  The parent doesn't do so
> much, so it is recognized more as a CPU hog. So the child easily gets a
> higher priority than the parent.

Hmm, I changed the prio of the waker from "nice(19)" to "nice(5)" after
Andi complained (he still isn't happy tho).  I'll change it back for the
moment.

Unfortunately we need to keep sending signals to the parent, in order to
avoid the race between unblocking SIGUSR1 and the read() on /dev/lguest.
This is the nature of Unix signals, unfortunately.

I've been pondering restoring the original /dev/lguest interface, which
handed an fd directly into the kernel.  Then the child would just use
this fd and not send signals.  It could well improve performance, too...

Thanks for the bug report,
Rusty.



Re: set up new kernel with grub

2007-04-05 Thread WANG Cong
On Thu, Apr 05, 2007 at 12:28:03PM -0500, Michael wrote:
>Hi, Dick,
>
>Your steps work beautifully. Thanks.
>
>If you could explain a little about what happens in each step, that 
>would be even better.
>
>> # cd /usr/src/linux-2.6.20.3
>> If your current kernel is 2.6.20.3, edit the Makefile to
>> add some character after "EXTRAVERSION" as EXTRAVERSION= 3x
>> # cp .config ..

Save your existing config file in the parent directory.

>> # make distclean

Remove the files generated by the last build. Note that this also deletes
.config, which is why it was copied to the parent directory first.

>> # cp ../.config .

Copy your .config back here.

>> # make oldconfig

"The make oldconfig command causes the kernel configuration process to read in 
your existing configuration information and then prompt you for a value for any 
kernel configuration variables that were not set in the existing kernel 
configuration file."

>> # make

Rebuild whatever has changed and do the final kernel image link (on 2.6
kernels this builds the modules too).

>> # make modules_install

Install the newly compiled modules under /lib/modules/<kernel version>/.

>> # make install

Copy the kernel image and System.map to /boot. Depending on the
distribution's installkernel script, this may also update
/boot/grub/menu.lst (or lilo.conf).





Re: missing madvise functionality

2007-04-05 Thread Nick Piggin

Ulrich Drepper wrote:

Nick Piggin wrote:


Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's
kernels using down_write(mmap_sem) for MADV_DONTNEED is better than
mmap/mprotect, which have more fundamental locking requirements, more
overhead and no benefits (except debugging, I suppose).



It's a tiny bit faster, see

  http://people.redhat.com/drepper/dontneed.png

I just ran it once so the graph is not smooth.  This is on a UP dual
core machine.  Maybe tomorrow I'll turn on the big 4p machine.


Hmm, I saw an improvement, but that was just on a raw syscall test
with a single page chunk. Real-world use I guess will get progressively
less dramatic as other overheads start being introduced.

Multi-thread performance probably won't get a whole lot better (it does
eliminate 1 down_write(mmap_sem), but one remains) until you use my
madvise patch.



I would have to see dramatically different results on the big machine to
make me change the libc code.  The reason is that there is a big drawback.

So far, when we allocate a new arena, we allocate address space with
PROT_NONE and only when we need memory the protection is changed to
PROT_READ|PROT_WRITE.  This is the advantage of catching wild pointer
accesses.


Sure, yes. And I guess you'd always want to keep that option around as
a debugging aid.

--
SUSE Labs, Novell Inc.


Re: missing madvise functionality

2007-04-05 Thread Ulrich Drepper
Nick Piggin wrote:
> Cool. According to my thinking, madvise(MADV_DONTNEED) even in today's
> kernels using down_write(mmap_sem) for MADV_DONTNEED is better than
> mmap/mprotect, which have more fundamental locking requirements, more
> overhead and no benefits (except debugging, I suppose).

It's a tiny bit faster, see

  http://people.redhat.com/drepper/dontneed.png

I just ran it once so the graph is not smooth.  This is on a UP dual
core machine.  Maybe tomorrow I'll turn on the big 4p machine.

I would have to see dramatically different results on the big machine to
make me change the libc code.  The reason is that there is a big drawback.

So far, when we allocate a new arena, we allocate address space with
PROT_NONE and only when we need memory the protection is changed to
PROT_READ|PROT_WRITE.  This is the advantage of catching wild pointer
accesses.
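The arena scheme described here, and the MADV_DONTNEED alternative being compared, can be sketched in userspace C, assuming Linux; the `arena_*` helpers are hypothetical and are not glibc's malloc internals.

```c
#define _DEFAULT_SOURCE		/* MAP_ANONYMOUS, MADV_DONTNEED */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

static size_t page_size(void)
{
	long ps = sysconf(_SC_PAGESIZE);

	assert(ps > 0);
	return (size_t)ps;
}

/* Reserve address space only; any access faults until committed. */
static void *arena_reserve(size_t len)
{
	void *p = mmap(NULL, len, PROT_NONE,
		       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

	return p == MAP_FAILED ? NULL : p;
}

/* Commit part of the arena by making it readable and writable. */
static int arena_commit(void *p, size_t len)
{
	return mprotect(p, len, PROT_READ | PROT_WRITE);
}

/*
 * Release, madvise style: the pages are dropped but stay accessible;
 * on Linux, private anonymous pages read back as zero afterwards.
 */
static int arena_release_dontneed(void *p, size_t len)
{
	return madvise(p, len, MADV_DONTNEED);
}

/*
 * Release, protection style: drop the pages and revoke access, so a
 * wild pointer into freed memory faults instead of reading zeroes.
 */
static int arena_release_protect(void *p, size_t len)
{
	if (madvise(p, len, MADV_DONTNEED) != 0)
		return -1;
	return mprotect(p, len, PROT_NONE);
}
```

The trade-off in the thread maps onto the last two helpers: the madvise-only release is cheaper, while the protection-style release keeps faulting on stray accesses.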

-- 
➧ Ulrich Drepper ➧ Red Hat, Inc. ➧ 444 Castro St ➧ Mountain View, CA ❖





Linux 2.6.21-rc6

2007-04-05 Thread Linus Torvalds

Ok,
 I don't think there really is anything very interesting here, but we're 
hopefully whittling down the list of regressions, and fixing various 
random other small issues while at it.

Some smallish MIPS updates, networking (and network driver) fixes, removal 
of a long obsolete framebuffer driver, etc etc. The shortlog really tells 
the story.

We should be getting close to a 2.6.21 release, so please update any 
regression reports you've done.

Linus

---
Adrian Bunk (6):
  [DCCP]: make dccp_write_xmit_timer() static again
  9p: make struct v9fs_cached_file_operations static
  drivers/spi/: fix section mismatches
  drivers/eisa/pci_eisa.c:pci_eisa_init() should be init
  drivers/mfd/sm501.c: fix an off-by-one
  net/sunrpc/svcsock.c: fix a check

Alan Cox (2):
  tty: minor merge correction
  pata_pdc202xx_old: LBA48 bug

Alan Stern (1):
  UHCI: Fix problem caused by lack of terminating QH

Albert Lee (5):
  pdc202xx_new: Enable ATAPI DMA
  libata: reorder HSM_ST_FIRST for easier decoding (take 3)
  libata: Clear tf before doing request sense (take 3)
  libata: Limit max sector to 128 for TORiSAN DVD drives (take 3)
  libata: Limit ATAPI DMA to R/W commands only for TORiSAN DVD drives (take 3)

Alexey Dobriyan (1):
  [NET]: Correct accept(2) recovery after sock_attach_fd()

Alexey Kuznetsov (1):
  [NET]: Fix neighbour destructor handling.

Andi Kleen (3):
  x86-64: Disable local APIC timer use on AMD systems with C1E
  x86-64: Let oprofile reserve MSR on all CPUs
  x86-64: Increase NMI watchdog probing timeout

Andreas Oberritter (2):
  V4L/DVB (5495): Tda10086: fix DiSEqC message length
  V4L/DVB (5496): Pluto2: fix incorrect TSCR register setting

Andrew Morton (4):
  proc: fix linkage with CONFIG_SYSCTL=y, CONFIG_PROC_SYSCTL=n
  revert "retries in ext3_prepare_write() violate ordering requirements"
  revert "retries in ext4_prepare_write() violate ordering requirements"
  remove protection of LANANA-reserved majors

Andrew Victor (1):
  [ARM] 4289/1: AT91: SAM9260 NAND flash timing

Arnaldo Carvalho de Melo (1):
  [DCCP] getsockopt: Fix DCCP_SOCKOPT_[SEND,RECV]_CSCOV

Avi Kivity (1):
  KVM: Prevent system selectors leaking into guest on real->protected mode transition on vmx

Ayaz Abdulla (2):
  forcedeth: fix nic poll
  forcedeth: fix tx timeout

Bartlomiej Zolnierkiewicz (2):
  ide: revert "ide: fix drive side 80c cable check, take 2" for now
  ide: fix locking for manual DMA enable/disable ("hdparm -d")

Bill Helfinstine (1):
  b44: fix IFF_ALLMULTI handling of CAM slots

Brian Pomerantz (1):
  fix page leak during core dump

Brice Goglin (1):
  myri10ge: correctly detect when TSO should be used

Bruce Fields (2):
  knfsd: nfsd4: fix inheritance flags on v4 ace derived from posix default ace
  knfsd: nfsd4: demote "clientid in use" printk to a dprintk

Carsten Otte (1):
  mm: fix xip issue with /dev/zero

Chris Dearman (2):
  [MIPS] lockdep: Handle interrupts in R3000 style c0_status register.
  [MIPS] lockdep: Deal with interrupt disable hazard in TRACE_IRQFLAGS

Chris Snook (1):
  atl1: save mac address on remove

Chuck Meade (1):
  [POWERPC] qe: Fix QUICC Engine SDMA setup errors

Conke Hu (1):
  ahci.c: walkaround for SB600 SATA internal error issue

Cornelia Huck (2):
  [S390] cio: Device status validity.
  [S390] cio: Fix handling of interrupt for csch().

Cyrill V. Gorcunov (1):
  SUN3/3X Lance trivial fix improved

Daniel Drake (1):
  generic_serial: fix decoding of baud rate

David Brownell (4):
  USB: omap_udc: workaround dma_free_coherent() bogosity
  USB: fix usb-serial/generic build warning
  USB: fix usb-serial/ftdi build warning
  rtc-cmos lockdep fix, irq updates

David Howells (1):
  SLAB: Mention slab name when listing corrupt objects

David S. Miller (4):
  [IPV6]: Fix routing round-robin locking.
  [DRM]: Delete sparc64 FFB driver code that never gets built.
  [VIDEO] ffb: Fix two DAC handling bugs.
  [SCSI]: Fix scsi_send_eh_cmnd scatterlist handling

David Wilder (1):
  [S390] kprobes: Align probe address.

David Woodhouse (1):
  bcm43xx: Fix machine check on PPC for version 1 PHY

Divy Le Ray (4):
  cxgb3 - Safeguard TCAM size usage
  cxgb3 - detect NIC only adapters
  cxgb3 - Tighten xgmac workaround
  cxgb3 - Firwmare update

Dmitriy Monakhov (1):
  splice: partial write fix

Erez Zilber (1):
  IB/iser: Handle aborting a command after it is sent

Eric W. Biederman (4):
  MSI-X: fix resume crash
  pid: Properly detect orphaned process groups in exit_notify
  msi: synchronously mask and unmask msi-x irqs.
  net: Ignore sysfs network device rename bugs.

Francois Romieu (3):
  sis190: new PHY support
  r8169: issue request_irq after the private data are completely initialized
 

Re: [-mm3 PATCH] (Retry) Check the return value of kobject_add and etc.

2007-04-05 Thread WANG Cong
On Thu, Apr 05, 2007 at 06:00:16PM +0200, Cornelia Huck wrote:
>On Thu, 5 Apr 2007 23:27:32 +0800,
>WANG Cong <[EMAIL PROTECTED]> wrote:
>
>> Thank you very much! I know. So I should replace all kfree with kobject_put, 
>> like this one:
>> 
>> -sysfs_create_link(&p->kobj, _subsys.kset.kobj, "subsystem");
>> +if (sysfs_create_link(&p->kobj, _subsys.kset.kobj, "subsystem")) {
>> +kobject_uevent(&p->kobj, KOBJ_REMOVE);
>> +kobject_del(&p->kobj);
>> +kobject_put(&p->kobj);
>> +return;
>> +}
>> 
>> Is that all right?
>> 
>
>Yes, or use kobject_unregister().

OK, then I'll send it again. Hopefully it will be accepted this time. ;-p


Signed-off-by: WANG Cong <[EMAIL PROTECTED]>
---

--- linux-2.6.21-rc5-mm4/fs/partitions/check.c.orig 2007-04-05 12:48:29.0 +0800
+++ linux-2.6.21-rc5-mm4/fs/partitions/check.c  2007-04-05 23:15:41.0 +0800
@@ -385,10 +385,18 @@ void add_partition(struct gendisk *disk,
p->kobj.parent = >kobj;
p->kobj.ktype = _part;
kobject_init(&p->kobj);
-   kobject_add(&p->kobj);
+   if (kobject_add(&p->kobj)) {
+   kobject_put(&p->kobj);
+   return;
+   }
if (!disk->part_uevent_suppress)
kobject_uevent(&p->kobj, KOBJ_ADD);
-   sysfs_create_link(&p->kobj, _subsys.kset.kobj, "subsystem");
+   if (sysfs_create_link(&p->kobj, _subsys.kset.kobj, "subsystem")) {
+   kobject_uevent(&p->kobj, KOBJ_REMOVE);
+   kobject_del(&p->kobj);
+   kobject_put(&p->kobj);
+   return;
+   }
if (flags & ADDPART_FLAG_WHOLEDISK) {
static struct attribute addpartattr = {
.name = "whole_disk",
@@ -396,7 +404,13 @@ void add_partition(struct gendisk *disk,
.owner = THIS_MODULE,
};
 
-   sysfs_create_file(&p->kobj, &addpartattr);
+   if (sysfs_create_file(&p->kobj, &addpartattr)) {
+   sysfs_remove_link(&p->kobj, "subsystem");
+   kobject_uevent(&p->kobj, KOBJ_REMOVE);
+   kobject_del(&p->kobj);
+   kobject_put(&p->kobj);
+   return;
+   }
}
partition_sysfs_add_subdir(p);
disk->part[part-1] = p;
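The error paths in this patch repeat the same teardown calls in each branch. A common kernel idiom is to undo completed steps in reverse order through goto labels, ending in a single put that frees the object only when its last reference is dropped. A userspace sketch of that idiom; `struct obj`, `obj_new()`, `obj_put()` and `obj_setup()` are hypothetical stand-ins, not the kobject API:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted object standing in for a kobject. */
struct obj {
	int refcount;
	int registered;	/* like kobject_add() having succeeded */
	int linked;	/* like the "subsystem" symlink existing */
};

static int freed_count;	/* lets callers observe the final put */

static struct obj *obj_new(void)
{
	struct obj *o = calloc(1, sizeof(*o));

	if (o)
		o->refcount = 1;	/* like kobject_init() */
	return o;
}

/* Drop a reference; free only when the last one goes. */
static void obj_put(struct obj *o)
{
	assert(o->refcount > 0);
	if (--o->refcount == 0) {
		freed_count++;
		free(o);
	}
}

/*
 * Set up in order; on failure, undo completed steps in reverse.
 * fail_at selects a step to fail (0 = none) so each path can be tried.
 */
static int obj_setup(struct obj *o, int fail_at)
{
	if (fail_at == 1)
		goto err_put;
	o->registered = 1;

	if (fail_at == 2)
		goto err_unregister;
	o->linked = 1;

	if (fail_at == 3)
		goto err_unlink;
	return 0;

err_unlink:
	o->linked = 0;
err_unregister:
	o->registered = 0;
err_put:
	obj_put(o);
	return -1;
}
```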




Re: Questions about porting perfmon2 to powerpc

2007-04-05 Thread Kevin Corry
On Thu April 5 2007 6:04 pm, Benjamin Herrenschmidt wrote:
> On Thu, 2007-04-05 at 14:55 -0500, Kevin Corry wrote:
> > First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h for
> > a function called smp_call_function_single(). However, this routine is
> > only implemented on i386, x86_64, ia64, and mips. Perfmon2 apparently
> > needs to call this to run a function on a specific CPU. Powerpc provides
> > an smp_call_function() routine to run a function on all active CPUs, so I
> > used that as a basis to add an smp_call_function_single() routine. I've
> > included the patch below and was wondering if it looked like a sane
> > approach.
>
> We should do better... it will require some backend work for the various
> supported PICs though. I've always wanted to look into doing a 
> smp_call_function_cpumask in fact :-)

I was actually wondering about that myself today. It would seem like an 
smp_call_function() that takes a CPU mask would be much more flexible than 
either the current version or the new one that I proposed. However, that was 
a little more hacking than I was willing to do today on powerpc architecture 
code. :)

> > Next, we ran into a problem related to Perfmon2 initialization and sysfs.
> > The problem turned out to be that the powerpc version of topology_init()
> > is defined as an __initcall() routine, but Perfmon2's initialization is
> > done as a subsys_initcall() routine. Thus, Perfmon2 tries to initialize
> > its sysfs information before some of the powerpc cpu information has been
> > initialized. However, on all other architectures, topology_init() is
> > defined as a subsys_initcall() routine, so this problem was not seen on
> > any other platforms. Changing the powerpc version of topology_init() to a
> > subsys_initcall() seems to have fixed the bug. However, I'm not sure if
> > that is going to cause problems elsewhere in the powerpc code. I've
> > included the patch below (after the smp-call-function-single patch). Does
> > anyone know if this change is safe, or if there was a specific reason
> > that topology_init() was left as an __initcall() on powerpc?
>
> It would make sense to follow what other archs do. Note that if both
> perfmon and topology_init are subsys_initcall, that is on the same
> level, it's still a bit hairy to expect one to be called before the
> other...

I wondered that as well, but based on what Arnd posted earlier (presumably 
about the kernel linking order), the topology_init() call, which is in the 
arch/ top-level directory, should occur before pfm_init(), which is in 
perfmon/, even if both are in the same initcall level.
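The ordering question can be modelled in a few lines. In include/linux/init.h, subsys_initcall() is level 4 and __initcall() maps to device_initcall() at level 6; within a level, calls run in link order. The kernel does not sort anything at boot (the linker lays out the calls), but sorting by (level, link position) reproduces the resulting order. The entries and link positions are hypothetical and follow the assumption above that arch/ is linked before perfmon/:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Subset of initcall levels; lower levels run first. */
enum { LVL_SUBSYS = 4, LVL_DEVICE = 6 };	/* __initcall() == level 6 */

struct initcall {
	const char *name;
	int level;
	int link_pos;	/* position in link order */
};

static int cmp_initcall(const void *a, const void *b)
{
	const struct initcall *x = a, *y = b;

	if (x->level != y->level)
		return x->level - y->level;
	return x->link_pos - y->link_pos;
}

/* Reproduce run order: by level, then by link order within a level. */
static void sort_initcalls(struct initcall *calls, size_t n)
{
	assert(calls != NULL || n == 0);
	qsort(calls, n, sizeof(*calls), cmp_initcall);
}

/* Index of name in the sorted table, or -1 if absent. */
static int find_pos(const struct initcall *calls, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(calls[i].name, name) == 0)
			return (int)i;
	return -1;
}
```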

Thanks,
-- 
Kevin Corry
[EMAIL PROTECTED]
http://www.ibm.com/linux/


Re: Questions about porting perfmon2 to powerpc

2007-04-05 Thread Kevin Corry
On Thu April 5 2007 3:32 pm, Kevin Corry wrote:
> On Thu April 5 2007 3:08 pm, Arnd Bergmann wrote:
> > On Thursday 05 April 2007, Kevin Corry wrote:
> > > First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h
> > > for a function called smp_call_function_single(). However, this routine
> > > is only implemented on i386, x86_64, ia64, and mips. Perfmon2
> > > apparently needs to call this to run a function on a specific CPU.
> > > Powerpc provides an smp_call_function() routine to run a function on
> > > all active CPUs, so I used that as a basis to add an
> > > smp_call_function_single() routine. I've included the patch below and
> > > was wondering if it looked like a sane approach.
> >
> > The function itself looks good, but since it's very similar to the
> > existing smp_call_function(), you should probably try to share some of
> > the code, e.g. by making a helper function that gets an argument to
> > decide whether to run on a specific CPU or on all CPUs.
>
> Ok. I'll see what I can come up with and post another patch today or
> tomorrow.

Here's a new version that adds smp_call_function_single(), and moves the
code that's shared with smp_call_function() to __smp_call_function().

Thanks,
-- 
Kevin Corry
[EMAIL PROTECTED]
http://www.ibm.com/linux/


Add an smp_call_function_single() to the powerpc architecture. Since this
is very similar to the existing smp_call_function() routine, the common
portions have been split out into __smp_call_function(). Since the
spin_lock(_lock) was moved to __smp_call_function(),
smp_call_function() now explicitly calls preempt_disable() before getting
the count of online CPUs.
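The refactoring shape can be sketched in userspace: one shared helper takes a target and the number of acknowledgements to expect, and two thin wrappers choose those values. Everything here is hypothetical (the SIM_* names, the "all but self" encoding, and delivering the "IPI" as a direct call instead of an interrupt plus a timeout loop).

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace model, not powerpc code.  "IPIs" are delivered by
 * calling the handler directly, so no timeout loop is needed. */
#define SIM_NR_CPUS        4
#define SIM_ALL_BUT_SELF (-1)   /* assumed encoding for "all other CPUs" */

static const int sim_self_cpu = 0;  /* the CPU we pretend to run on */
static atomic_int sim_started;      /* plays the role of data.started */

/* Deliver func to the targeted CPU(s); each "CPU" acknowledges first. */
static void sim_message_pass(int target, void (*func)(void *), void *info)
{
	for (int cpu = 0; cpu < SIM_NR_CPUS; cpu++) {
		int skip = (target == SIM_ALL_BUT_SELF) ? (cpu == sim_self_cpu)
							: (cpu != target);
		if (skip)
			continue;
		atomic_fetch_add(&sim_started, 1);
		func(info);
	}
}

/* Shared helper, analogous to __smp_call_function(): callers decide the
 * target and how many acknowledgements to expect. */
static int sim_call_common(void (*func)(void *), void *info,
			   int target_cpu, int num_cpus)
{
	atomic_store(&sim_started, 0);
	sim_message_pass(target_cpu, func, info);
	return atomic_load(&sim_started) == num_cpus ? 0 : -1;
}

static int sim_call_function(void (*func)(void *), void *info)
{
	int cpus = SIM_NR_CPUS - 1;   /* all online CPUs but ourselves */
	if (!cpus)
		return 0;
	return sim_call_common(func, info, SIM_ALL_BUT_SELF, cpus);
}

static int sim_call_function_single(int cpu, void (*func)(void *), void *info)
{
	if (cpu == sim_self_cpu || cpu < 0 || cpu >= SIM_NR_CPUS)
		return -1;   /* this sketch simply rejects these cases */
	return sim_call_common(func, info, cpu, 1);
}

/* Example callback: count how many CPUs ran it. */
static void count_call(void *info)
{
	atomic_fetch_add((atomic_int *)info, 1);
}
```

How the real smp_call_function_single() treats the calling CPU differs between architectures; the sketch does not try to match any of them.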

Signed-off-by: Kevin Corry <[EMAIL PROTECTED]>

Index: linux-2.6.20-arnd3-perfmon/arch/powerpc/kernel/smp.c
===
--- linux-2.6.20-arnd3-perfmon.orig/arch/powerpc/kernel/smp.c
+++ linux-2.6.20-arnd3-perfmon/arch/powerpc/kernel/smp.c
@@ -198,26 +198,11 @@ static struct call_data_struct {
 /* delay of at least 8 seconds */
 #define SMP_CALL_TIMEOUT   8
 
-/*
- * This function sends a 'generic call function' IPI to all other CPUs
- * in the system.
- *
- * [SUMMARY] Run a function on all other CPUs.
- *  The function to run. This must be fast and non-blocking.
- *  An arbitrary pointer to pass to the function.
- *  currently unused.
- *  If true, wait (atomically) until function has completed on other CPUs.
- * [RETURNS] 0 on success, else a negative status code. Does not return until
- * remote CPUs are nearly ready to execute <> or are or have executed.
- *
- * You must not call this function with disabled interrupts or from a
- * hardware interrupt handler or from a bottom half handler.
- */
-int smp_call_function (void (*func) (void *info), void *info, int nonatomic,
-  int wait)
-{ 
+static int __smp_call_function(void (*func)(void *info), void *info,
+  int wait, int target_cpu, int num_cpus)
+{
struct call_data_struct data;
-   int ret = -1, cpus;
+   int ret = -1;
u64 timeout;
 
/* Can deadlock when called with interrupts disabled */
@@ -234,40 +219,33 @@ int smp_call_function (void (*func) (voi
atomic_set(, 0);
 
spin_lock(_lock);
-   /* Must grab online cpu count with preempt disabled, otherwise
-* it can change. */
-   cpus = num_online_cpus() - 1;
-   if (!cpus) {
-   ret = 0;
-   goto out;
-   }
 
call_data = 
smp_wmb();
/* Send a message to all other CPUs and wait for them to respond */
-   smp_ops->message_pass(MSG_ALL_BUT_SELF, PPC_MSG_CALL_FUNCTION);
+   smp_ops->message_pass(target_cpu, PPC_MSG_CALL_FUNCTION);
 
timeout = get_tb() + (u64) SMP_CALL_TIMEOUT * tb_ticks_per_sec;
 
/* Wait for response */
-   while (atomic_read() != cpus) {
+   while (atomic_read() != num_cpus) {
HMT_low();
if (get_tb() >= timeout) {
-   printk("smp_call_function on cpu %d: other cpus not "
-  "responding (%d)\n", smp_processor_id(),
-  atomic_read());
+   printk("%s on cpu %d: other cpus not "
+  "responding (%d)\n", __FUNCTION__,
+  smp_processor_id(), atomic_read());
debugger(NULL);
goto out;
}
}
 
if (wait) {
-   while (atomic_read() != cpus) {
+   while (atomic_read() != num_cpus) {
HMT_low();
if (get_tb() >= timeout) {
-   printk("smp_call_function on cpu %d: other "
-  "cpus not finishing (%d/%d)\n",
-  smp_processor_id(),
+   printk("%s on cpu %d: other 

Re: missing madvise functionality

2007-04-05 Thread Nick Piggin

Ulrich Drepper wrote:

In case somebody wants to play around with Rik's patch or another
madvise-based patch, I have x86-64 glibc binaries which can use it:

  http://people.redhat.com/drepper/rpms

These are based on the latest Fedora rawhide version.  They should work
on older systems, too, but you will screw up your updates.  Use them only
if you know what you are doing.

By default madvise(MADV_DONTNEED) is used.  With the environment variable


Cool. By my reasoning, madvise(MADV_DONTNEED) is better than
mmap/mprotect even in today's kernels, which take down_write(mmap_sem)
for MADV_DONTNEED: mmap/mprotect have more fundamental locking
requirements, more overhead and no benefits (except debugging, I suppose).

MADV_DONTNEED is twice as fast in single threaded performance, and an
order of magnitude faster for multiple threads, when MADV_DONTNEED only
takes mmap_sem for read.
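For reference, the userspace side of the madvise approach: on a private anonymous mapping, MADV_DONTNEED drops the pages while the range stays mapped, and the next access faults in a zero-filled page, with no VMA change as with mmap/mprotect. A minimal sketch assuming Linux semantics; the helper name is made up for illustration.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Write a page, discard it with MADV_DONTNEED, and return its first byte
 * as seen afterwards (-1 on failure). */
static int byte_after_dontneed(void)
{
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	memset(p, 0xab, page);
	if (madvise(p, page, MADV_DONTNEED) != 0) {
		munmap(p, page);
		return -1;
	}
	int v = p[0];   /* refaults a fresh zero-filled page */
	munmap(p, page);
	return v;
}
```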

Do you plan to include this change in general glibc releases? Maybe it
will make google malloc obsolete? ;) (I don't suppose you'd be able to
get any tests done, Andrew?)

--
SUSE Labs, Novell Inc.


Re: [PATCH] Define EFLAGS_IF

2007-04-05 Thread Rusty Russell
On Thu, 2007-04-05 at 18:06 -0700, H. Peter Anvin wrote:
> Andi Kleen wrote:
> > 
> > No, processor.h is such a hodgepodge of unrelated stuff that any
> > splitting up is a good thing.
> > 
> 
> Fair enough.  However, I'd still like to see the X86_CR* constants 
> moved, too (and constants added for at least CR0 as well.)

Agreed.  This was on the theory of minimum damage, but since it seems to
have received a warm reception, I'd say moving the rest to
processor-flags.h would be a welcome addition.
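A sketch of the kind of constants such a processor-flags.h could carry, including the CR0 bits hpa asks for. The bit positions follow the Intel architecture manuals; the macro names and the helper are assumptions for illustration, not the contents of any merged header.

```c
#include <assert.h>

/* EFLAGS: interrupt enable flag, bit 9 */
#define X86_EFLAGS_IF 0x00000200

/* CR0 control bits */
#define X86_CR0_PE 0x00000001 /* protection enable */
#define X86_CR0_MP 0x00000002 /* monitor coprocessor */
#define X86_CR0_EM 0x00000004 /* emulation */
#define X86_CR0_TS 0x00000008 /* task switched */
#define X86_CR0_ET 0x00000010 /* extension type */
#define X86_CR0_NE 0x00000020 /* numeric error */
#define X86_CR0_WP 0x00010000 /* write protect, bit 16 */
#define X86_CR0_AM 0x00040000 /* alignment mask */
#define X86_CR0_NW 0x20000000 /* not write-through */
#define X86_CR0_CD 0x40000000 /* cache disable */
#define X86_CR0_PG 0x80000000 /* paging, bit 31 */

/* Hypothetical helper: are interrupts enabled in a saved EFLAGS value? */
static inline int irqs_enabled_in(unsigned long eflags)
{
	return (eflags & X86_EFLAGS_IF) != 0;
}
```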

Cheers,
Rusty.





Re: [PATCH] USB gadget rndis: fix bug skb_push function may return an unaligned pointer bug

2007-04-05 Thread Wu, Bryan
On Thu, 2007-04-05 at 14:29 -0700, David Brownell wrote:
> On Tuesday 03 April 2007 11:28 pm, Wu, Bryan wrote:
> > USB gadget rndis: skb_push function may return a pointer which is not
> > aligned as required by struct rndis_packet_msg_type.
> 
> Can you instead try to update the declaration of that struct
> so that it's "__attribute__((packed))"?  That's less invasive,
> and will address similar issues elsewhere ...
> 
> - Dave

OK, Jie and Roy will try to use this __attribute__ method and test it on
blackfin platform. Sorry for missing their "Signed-off-by". I will
resend a patch later for review.
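A sketch of Dave's suggestion: marking the struct packed drops its alignment requirement to 1, so the compiler emits alignment-safe loads even when skb_push() leaves the header at an odd address. The struct below is hypothetical, with eleven 32-bit fields named after the RNDIS packet message; the kernel's real definition uses __le32 and needs le32_to_cpu(), which is omitted here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout in the spirit of struct rndis_packet_msg_type. */
struct sketch_rndis_packet_msg {
	uint32_t MessageType;
	uint32_t MessageLength;
	uint32_t DataOffset;
	uint32_t DataLength;
	uint32_t OOBDataOffset;
	uint32_t OOBDataLength;
	uint32_t NumOOBDataElements;
	uint32_t PerPacketInfoOffset;
	uint32_t PerPacketInfoLength;
	uint32_t VcHandle;
	uint32_t Reserved;
} __attribute__((packed));

/* Read MessageLength from a header starting at an arbitrary offset. */
static uint32_t msg_length_at(const unsigned char *buf, size_t off)
{
	const struct sketch_rndis_packet_msg *m =
		(const struct sketch_rndis_packet_msg *)(buf + off);
	return m->MessageLength; /* packed: compiler may not assume alignment */
}
```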

Thanks
-Bryan



Re: [PATCH] Unified lguest launcher

2007-04-05 Thread Rusty Russell
On Thu, 2007-04-05 at 11:43 -0300, Glauber de Oliveira Costa wrote:
> and here's the new patch, merging rusty's suggestions and some more on my own.
> 
> May I upload this, or does Rusty (or anyone else) have more suggestions?

This looks excellent!

There are a couple of extra spaces floating around, but that's trivial.
You use "errno = ESRCH; err()" where you could use "errx()".
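To illustrate Rusty's point: err() appends strerror(errno), so setting errno by hand just to get a message is a detour; when no system call failed, errx() prints the message alone and exits. The helper below is hypothetical, not lguest launcher code; err.h is a BSD/glibc extension rather than POSIX.

```c
#include <assert.h>
#include <err.h>
#include <string.h>

/* Hypothetical lookup.  Nothing here sets errno, so the failure path uses
 * errx(): it prints "progname: message" and exits with status 1, without
 * the ": strerror(errno)" suffix that err() would add. */
static int find_index(const char *const *names, int n, const char *want)
{
	for (int i = 0; i < n; i++)
		if (strcmp(names[i], want) == 0)
			return i;
	errx(1, "no such entry: %s", want);
}
```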

Please merge it straight in. No need for a separate patch in the tree
for this I think, unless you plan more work?

Thanks!
Rusty.



Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-05 Thread Andrew Burgess

On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> From: Andrew Burgess <[EMAIL PROTECTED]>
> Date: Thu, 5 Apr 2007 15:13:27 -0700
> 
> > David, do you see any other problems with scsi_send_eh_cmnd?
> > 
> > I've switched back to 2.6.18 which seems to not oops 
> > and am happy to try patches.
> 
> Does 2.6.20 with my patch OOPS too?  Does reverting my patch
> make the oops go away?
> 
> If reverting my patch makes the OOPS go away, we need to
> verify if page_address() is returning crap for some reason
> or the length is wrong.

2.6.20.4 with your patch dies in the memcpy (as does 21-gitN)

2.6.20.4 without your patch dies in the subsequent __free_page
with a null pointer ref at 000...008

James should I try your posted patch? On which kernel?

This machine will die during boot on these kernels until I power
cycle it (which somehow fixes the disk/controller for a
while); 2.6.18 continues to work (it gets the SCSI errors and
continues).




AIC79xx: scsi0: device overrun (status a)

2007-04-05 Thread Wakko Warner
Can anyone tell me what this means:
Apr  5 22:11:56 vegeta kernel: [ 1265.267700] scsi0: device overrun (status a) on 0:1:0

Kernel is 2.6.20.

I set up a RAID1 between 2 hard disks (on partition #2); as soon as it
started to sync the array, my log was flooded with the above entry.

The scsi adapter is an onboard controller on a supermicro x5da8.  The hard
disks are SEAGATE ST318404LW drives on channel 0 (no other devices on this
channel).

From what I can tell, the sync is going fairly quickly: ~29 MB/sec.

This is really strange to me, since I dd'd from the first disk to the second
without any messages in the log (that was with an older kernel, 2.6.17).

If there's any other information needed, ask.  I'm not sure what else is
needed.

-- 
 Lab tests show that use of micro$oft causes cancer in lab animals
 Got Gas???


Re: init's children list is long and slows reaping children.

2007-04-05 Thread Eric W. Biederman
Linus Torvalds <[EMAIL PROTECTED]> writes:

> On Thu, 5 Apr 2007, Chris Snook wrote:
>
>> Linus Torvalds wrote:
>>
>> > Another thing we could do is to just make sure that kernel threads simply
>> > don't end up as children of init. That whole thing is silly, they're really
>> > not children of the user-space init anyway. Comments?
>> 
>> Does anyone remember why we started doing this in the first place?  I'm sure
>> there are some tools that expect a process tree, rather than a forest, and
>> making it a forest could make them unhappy.
>
> I'm not sure anybody would really be unhappy with pptr pointing to some 
> magic and special task that has pid 0 (which makes it clear to everybody 
> that the parent is something special), and that has SIGCHLD set to SIG_IGN 
> (which should make the exit case not even go through the zombie phase).
>
> I can't even imagine *how* you'd make a tool unhappy with that, since even 
> tools like "ps" (and even more "pstree" won't read all the process states 
> atomically, so they invariably will see parent pointers that don't even 
> exist any more, because by the time they get to the parent, it has exited 
> already.

Right.  pid == 1 being missing might cause some confusion, but
having ppid == 0 should be fine.  Heck, pid == 1 already has
ppid == 0, so it is a value user space has had to deal with for a
while.

In addition there was a period in 2.6 where most kernel threads
and init had a pgid == 0 and a session  == 0, and nothing seemed
to complain.

We should probably make all of the kernel threads children of
init_task, the initial idle thread on the first CPU, which is the
parent of pid == 1.  That will give the ppid == 0 naturally because
the idle thread has pid == 0.

>> The support angel on my shoulder says we should just put all the kernel
>> threads under a kthread subtree to shorten init's child list and minimize
>> impact.
>
> A number are already there, of course, since they use the kthread 
> infrastructure to get there. 

Almost everything should be using kthread by now.  I do admit that there
are a handful of kernel threads that still use kthread_create but it
is a relatively short list.

Looking, we apparently have a couple of processes started by
kthread_create that are not under kthread.  They all have pid numbers
lower than kthread, so I'm guessing it is some startup ordering issue.

Currently it looks like daemonize is reparenting everything to init;
changing that to init_task and making the threads self-reaping
should be trivial.


I'm a little nervous that we exceeded our default pid max just booting
the kernel.  32768 is a lot of kernel threads.  That sounds like 32
kernel threads per cpu.  That seems to be more than I have on any
of my little machines.


There is no defined order for reaping of child processes and in fact
I can't even see anything in the kernel right now that would even
accidentally give user space the idea we had a defined order.

So I think we have some options once we get the kernel threads out
of the way.  Getting the kernel threads out of the way would seem
to be the first priority.

Eric


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Zachary Amsden wrote:
  

Do you mean kmap_atomic_pfn?



Yes.

  

  kunmap_atomic can stay lazy (at least for VMI), actually, but it
doesn't help since it happens outside the spin lock.



May as well be consistent.  Or do you mean you can't flush outside the
spinlock, even if there's nothing pending?
  


Consistency is good.  Flush is always fine, just an extra function call.

Zach



Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Do you mean kmap_atomic_pfn?

Yes.

>   kunmap_atomic can stay lazy (at least for VMI), actually, but it
> doesn't help since it happens outside the spin lock.

May as well be consistent.  Or do you mean you can't flush outside the
spinlock, even if there's nothing pending?

J


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Zachary Amsden wrote:
  

Throw it in the queue; I'll slide in after it.



I've pushed it up.  I added a few missing cases to the patch
(kmap_atomic_pte, kunmap_atomic).
  


Do you mean kmap_atomic_pfn?  kunmap_atomic can stay lazy (at least for 
VMI), actually, but it doesn't help since it happens outside the spin lock.




Re: Reiser4. BEST FILESYSTEM EVER.

2007-04-05 Thread johnrobertbanks
Hi Peter,

You say that the results may be accurate, but not relevant.

+--------------+--------+-------+
| FILESYSTEM   |  TIME  | DISK  |
| TYPE         | (secs) | USAGE |
+--------------+--------+-------+
| REISER4 lzo  |  1938  |  278  |
| REISER4 gzip |  2295  |  213  |
| REISER4      |  3462  |  692  |
| EXT2         |  4092  |  816  |
| JFS          |  4225  |  806  |
| EXT4         |  4408  |  816  |
| EXT3         |  4421  |  816  |
| XFS          |  4625  |  779  |
| REISER3      |  6178  |  793  |
| FAT32        | 12342  |  988  |
| NTFS-3g      | 10414  |  772  |
+--------------+--------+-------+

If they are accurate, THEN they are obviously very relevant.

Trying to follow http://linuxhelp.150m.com/resources/fs-benchmarks.htm 

I have set up a Reiser4 partition with gzip compression, here is the
difference in disk usage of a typical Debian installation on two 10GB
partitions, one with Reiser3 and the other with Reiser4.

debian:/# df
Filesystem   1K-blocks  Used Available Use% Mounted on
/dev/sda3 10490104   6379164   4110940  61% /3
/dev/sda7  9967960   2632488   7335472  27% /7

Partitions 3 and 7 have exactly the same data on them (the typical
Debian install).

The partitions are exactly the same size (although df reports different
sizes).

Partition 3 is Reiser3 -- uses 6.4 GB.
Partition 7 is Reiser4 -- uses 2.6 GB.

So Reiser4 uses 2.6 GB to store the (typical) data that it takes Reiser3
6.4 GB to store (note it would take ext2/3/4 some 7 GB to store the same
info).

This seems very relevant to me.

John.



On Thu, 05 Apr 2007 17:39:58 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]> said:
> [EMAIL PROTECTED] wrote:
> > Yeap, I guess that will probably work. 
> > 
> > And here I was trying to compile old versions of GRUB from namesys.com.
> > 
> > By the way, do you think the benchmarks from:
> > 
> > http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
> > http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm
> > 
> > are accurate?
> > 
> 
> Accurate, probably.  Whether or not they're *relevant* is a totally 
> different ball of wax.
> 
>   -hpa
-- 
  
  [EMAIL PROTECTED]




Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Throw it in the queue; I'll slide in after it.

I've pushed it up.  I added a few missing cases to the patch
(kmap_atomic_pte, kunmap_atomic).

J


Re: init's children list is long and slows reaping children.

2007-04-05 Thread Linus Torvalds


On Thu, 5 Apr 2007, Chris Snook wrote:

> Linus Torvalds wrote:
>
> > Another thing we could do is to just make sure that kernel threads simply
> > don't end up as children of init. That whole thing is silly, they're really
> > not children of the user-space init anyway. Comments?
> 
> Does anyone remember why we started doing this in the first place?  I'm sure
> there are some tools that expect a process tree, rather than a forest, and
> making it a forest could make them unhappy.

I'm not sure anybody would really be unhappy with pptr pointing to some 
magic and special task that has pid 0 (which makes it clear to everybody 
that the parent is something special), and that has SIGCHLD set to SIG_IGN 
(which should make the exit case not even go through the zombie phase).
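The userspace counterpart of "SIGCHLD set to SIG_IGN skips the zombie phase" is specified by POSIX: such children are never turned into zombies, and a later waitpid() blocks until they are gone and then fails with ECHILD. A minimal sketch of that behaviour (the helper name is hypothetical; the in-kernel handling for a pid 0 parent would be separate code).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Fork a child that exits at once while SIGCHLD is SIG_IGN, then return
 * waitpid()'s errno (0 if it unexpectedly succeeded, -1 on setup failure). */
static int wait_errno_with_sigchld_ignored(void)
{
	struct sigaction sa, old;
	memset(&sa, 0, sizeof(sa));
	sa.sa_handler = SIG_IGN;
	sigemptyset(&sa.sa_mask);
	if (sigaction(SIGCHLD, &sa, &old) != 0)
		return -1;

	int result = -1;
	pid_t pid = fork();
	if (pid == 0)
		_exit(0);          /* child: terminate immediately */
	if (pid > 0) {
		int status;
		/* No zombie is left: waitpid() waits for the child to go
		 * away, then fails with ECHILD. */
		result = waitpid(pid, &status, 0) < 0 ? errno : 0;
	}
	sigaction(SIGCHLD, &old, NULL);  /* restore previous disposition */
	return result;
}
```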

I can't even imagine *how* you'd make a tool unhappy with that, since even 
tools like "ps" (and even more "pstree" won't read all the process states 
atomically, so they invariably will see parent pointers that don't even 
exist any more, because by the time they get to the parent, it has exited 
already.
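A sketch of why a forest should not surprise such tools: a tree builder reading /proc one process at a time must already treat a missing parent as a root, so a ppid of 0 simply falls into the same case. The snapshot and helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical (pid, ppid) snapshot as a ps/pstree-like tool might read it. */
struct proc_entry {
	int pid;
	int ppid;
};

static int pid_present(const struct proc_entry *v, size_t n, int pid)
{
	for (size_t i = 0; i < n; i++)
		if (v[i].pid == pid)
			return 1;
	return 0;
}

/* Count tree roots: entries whose parent is pid 0 or is absent from the
 * snapshot (it may have exited between reads). */
static size_t count_roots(const struct proc_entry *v, size_t n)
{
	size_t roots = 0;
	for (size_t i = 0; i < n; i++)
		if (v[i].ppid == 0 || !pid_present(v, n, v[i].ppid))
			roots++;
	return roots;
}
```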

> The support angel on my shoulder says we should just put all the kernel
> threads under a kthread subtree to shorten init's child list and minimize
> impact.

A number are already there, of course, since they use the kthread 
infrastructure to get there. 

Linus


Re: missing madvise functionality

2007-04-05 Thread Nick Piggin

Rik van Riel wrote:

Nick Piggin wrote:


Oh, also: something like this patch would help out MADV_DONTNEED, as it
means it can run concurrently with page faults. I think the locking will
work (but needs forward porting).



Ironically, your patch decreases throughput on my quad core
test system, with Jakub's test case.

MADV_DONTNEED, my patch, 1 loops  (14k context switches/second)

real    0m34.890s
user    0m17.256s
sys     0m29.797s


MADV_DONTNEED, my patch & your patch, 1 loops  (50 context switches/second)


real    1m8.321s
user    0m20.840s
sys     1m55.677s

I suspect it's moving the contention onto the page table lock,
in zap_pte_range().  I guess that the thread private memory
areas must be living right next to each other, in the same
page table lock regions :)

For more real world workloads, like the MySQL sysbench one,
I still suspect that your patch would improve things.


I think it definitely would, because the app will be wanting to
do other things with mmap_sem as well (like futexes *grumble*).

Also, the test case is allocating and freeing 512K chunks, which
I think would be on the high side of typical.

You have 32 threads for 4 CPUs, so then it would actually make
sense to context switch on mmap_sem write lock rather than spin
on ptl. But the kernel doesn't know that.

Testing with a small chunk size or thread == CPUs I think would
show a swing toward my patch.
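Jakub's test case is not included in the thread. The following is a hypothetical stand-in for experimenting with thread count and chunk size: each thread repeatedly faults in its own 512 KiB private region (page faults take mmap_sem for read) and drops it with MADV_DONTNEED. Separately mmapped regions may well sit next to each other and share page-table pages, as Rik suspects. Time it externally (e.g. with `time`) to get real/user/sys figures.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#define CHUNK (512 * 1024)   /* chunk size mentioned in the thread */
#define MAX_THREADS 64

struct worker_arg { int iterations; int failed; };

static void *worker(void *p)
{
	struct worker_arg *a = p;
	unsigned char *buf = mmap(NULL, CHUNK, PROT_READ | PROT_WRITE,
				  MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED) {
		a->failed = 1;
		return NULL;
	}
	for (int i = 0; i < a->iterations; i++) {
		memset(buf, i, CHUNK);                        /* fault pages in */
		if (madvise(buf, CHUNK, MADV_DONTNEED) != 0) { /* drop them again */
			a->failed = 1;
			break;
		}
	}
	munmap(buf, CHUNK);
	return NULL;
}

/* Run nthreads workers; return 0 if all of them succeeded. */
static int run_dontneed_stress(int nthreads, int iterations)
{
	pthread_t tid[MAX_THREADS];
	struct worker_arg args[MAX_THREADS];
	if (nthreads < 1 || nthreads > MAX_THREADS)
		return -1;
	int started = 0;
	for (; started < nthreads; started++) {
		args[started] = (struct worker_arg){ iterations, 0 };
		if (pthread_create(&tid[started], NULL, worker, &args[started]) != 0)
			break;
	}
	int rc = (started == nthreads) ? 0 : -1;
	for (int i = 0; i < started; i++) {
		pthread_join(tid[i], NULL);
		if (args[i].failed)
			rc = -1;
	}
	return rc;
}
```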

--
SUSE Labs, Novell Inc.


Re: [patch 1/3] epoll cleanups - epoll include diet ...

2007-04-05 Thread Andrew Morton
On Thu, 5 Apr 2007 18:12:58 -0700 (PDT)
Davide Libenzi  wrote:

> On Thu, 5 Apr 2007, Andrew Morton wrote:
> 
> > epoll uses signal stuff and might need signal.h.  It implements syscalls
> > and it certainly needs to have those syscall's prototypes in scope.  It
> > surely uses stuff from mm.h (doesn't everything??)
> 
> Ack about signal.h, I forgot about the pwait code :(
> Why syscalls.h? The eventpoll.c file exports syscalls, but it doesn't use 
> anything declared in there.

So that the compiler can verify that our declarations of sys_epoll_foo()
match our definitions of them.
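A minimal illustration of Andrew's point, with a hypothetical syscall name: once the header's prototype is in scope in the defining .c file, any drift between declaration and definition becomes a compile error ("conflicting types") instead of a silent ABI mismatch.

```c
#include <assert.h>

/* What a header such as <linux/syscalls.h> provides: the declaration.
 * Hypothetical name, for illustration only. */
long sys_example_wait(int fd, int maxevents);

/* The definition.  Changing a parameter here (say, to 'long maxevents')
 * while the prototype above is in scope makes the build fail. */
long sys_example_wait(int fd, int maxevents)
{
	return (long)fd + maxevents;
}
```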

> What does eventpoll.c use *directly* from mm.h? If eventpoll.c uses, let's 
> say sched.h, and sched.h needs mm.h, it is sched.h's responsibility to 
> include the mm.h file, not eventpoll.c's.
> 

Sure.  But if epoll.c _does_ use something from mm.h (or uses something
from a header which mm.h includes) then if we later remove the #include
mm.h from sched.h, eventpoll.c will break.

The general rule is: include in .c the header files which provide the stuff
which that .c file uses.  Now, it may be that eventpoll.c indeed uses nothing
which mm.h provides, and nothing which mm.h's includees provide.  But it is
non-trivial to prove that.  Once added, includes are hard to remove :(


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Zachary Amsden wrote:
  

Yes, thought about several solutions, and this seems the best.  But it
requires a new paravirt-op.



Not with the power of multiplexing.  Something like this, perhaps?
  


Ok, I tried that and I got a nice clean fix.  For 2.6.22.  Backporting 
this to 2.6.21 creates havoc, as a number of cleanup patches as well as 
changes to highmem code get in the way.


Andi, do you really want to deal with the conflicts this will create for 
the paravirt queue for 2.6.22, or would you rather apply the dumb yet 
simple and non-confrontational workaround I have been trying to get 
applied to 2.6.21?


Zach


Re: [PATCH] FUTEX : new PRIVATE futexes

2007-04-05 Thread Nick Piggin

Hi Eric,

Thanks for doing this... It's looking good, I just have some minor
comments:

Eric Dumazet wrote:


Signed-off-by: Eric Dumazet <[EMAIL PROTECTED]>



--- linux-2.6.21-rc5-mm4/kernel/futex.c
+++ linux-2.6.21-rc5-mm4-ed/kernel/futex.c
@@ -16,6 +16,9 @@
  *  Copyright (C) 2006 Red Hat, Inc., Ingo Molnar <[EMAIL PROTECTED]>
  *  Copyright (C) 2006 Timesys Corp., Thomas Gleixner <[EMAIL PROTECTED]>
  *
+ *  PRIVATE futexes by Eric Dumazet
+ *  Copyright (C) 2007 Eric Dumazet <[EMAIL PROTECTED]>
+ *
  *  Thanks to Ben LaHaise for yelling "hashed waitqueues" loudly
  *  enough at me, Linus for the original (flawed) idea, Matthew
  *  Kirkwood for proof-of-concept implementation.
@@ -199,9 +202,12 @@ static inline int match_futex(union fute
  * Returns: 0, or negative error code.
  * The key words are stored in *key on success.
  *
- * Should be called with >mm->mmap_sem but NOT any spinlocks.
+ * shared is NULL for PROCESS_PRIVATE futexes
+ * For other futexes, it points to >mm->mmap_sem and
+ * caller must have taken the reader lock. but NOT any spinlocks.
  */
-int get_futex_key(void __user *uaddr, union futex_key *key)
+int get_futex_key(void __user *uaddr, union futex_key *key,
+   struct rw_semaphore *shared)


Can we pass in something other than the rw_semaphore here? Seeing as
it only actually gets used as a flag, it might be nicer just to pass
a 0 or 1? And all through the call stack...

Did the whole thing just turn out neater when you passed the rwsem?
We always know to use current->mm->mmap_sem, so it doesn't seem like
a boolean flag would hurt?


 {
unsigned long address = (unsigned long)uaddr;
struct mm_struct *mm = current->mm;
@@ -218,6 +224,22 @@ int get_futex_key(void __user *uaddr, un
address -= key->both.offset;
 
 	/*

+* PROCESS_PRIVATE futexes are fast.
+* As the mm cannot disappear under us and the 'key' only needs
+* virtual address, we dont even have to find the underlying vma.
+* Note : We do have to check 'address' is a valid user address,
+*but access_ok() should be faster than find_vma()
+* Note : At this point, address points to the start of page,
+*not the real futex address, this is ok.
+*/
+   if (!shared) {
+   if (!access_ok(VERIFY_WRITE, address, sizeof(int)))
+   return -EFAULT;


Shouldn't that be sizeof(long) to handle 64 bit futexes? Or strictly, it
should depend on the size of the operation. Maybe the access_ok check
should go outside get_futex_key?
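A simplified model of why the checked size matters: a range check passes for 4 bytes at an address where an 8-byte access would cross the limit. `range_ok()` is hypothetical and is not the kernel's access_ok(); it is written so `addr + size` cannot overflow.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Does [addr, addr + size) lie entirely below limit? */
static int range_ok(uintptr_t addr, size_t size, uintptr_t limit)
{
	return size <= limit && addr <= limit - size;
}
```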



+   key->private.mm = mm;
+   key->private.address = address;
+   return 0;
+   }
+   /*
 * The futex is hashed differently depending on whether
 * it's in a shared or private mapping.  So check vma first.
 */
@@ -244,6 +266,7 @@ int get_futex_key(void __user *uaddr, un
 * mappings of _writable_ handles.
 */
if (likely(!(vma->vm_flags & VM_MAYSHARE))) {
+   key->both.offset += FUT_OFF_MMSHARED; /* reference taken on mm */
key->private.mm = mm;
key->private.address = address;
return 0;
@@ -253,7 +276,7 @@ int get_futex_key(void __user *uaddr, un
 * Linear file mappings are also simple.
 */
key->shared.inode = vma->vm_file->f_path.dentry->d_inode;
-   key->both.offset++; /* Bit 0 of offset indicates inode-based key. */
+   key->both.offset += FUT_OFF_INODE; /* inode-based key. */
if (likely(!(vma->vm_flags & VM_NONLINEAR))) {
key->shared.pgoff = (((address - vma->vm_start) >> PAGE_SHIFT)
 + vma->vm_pgoff);


I like |= for adding flags, it seems less ambiguous. But I guess that's
a matter of opinion. Hugh seems to like +=, and I can't argue with him
about style issues ;)
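A small sketch of the += versus |= question and of decoding the flags as get_futex_key_refs() does. The flag values 1 and 2 are assumed here (the excerpt does not show their definitions), and the offset's two low bits are assumed clear because futex words are 32-bit aligned. Under that assumption both forms give the same result, but applying += twice silently turns the inode bit into the mm bit, while |= is idempotent.

```c
#include <assert.h>
#include <string.h>

#define FUT_OFF_INODE    1   /* assumed value */
#define FUT_OFF_MMSHARED 2   /* assumed value */

static unsigned long set_flag_add(unsigned long off, unsigned long flag)
{
	return off + flag;
}

static unsigned long set_flag_or(unsigned long off, unsigned long flag)
{
	return off | flag;
}

/* Mirrors the switch in get_futex_key_refs(): which reference to take. */
static const char *key_kind(unsigned long off)
{
	switch (off & (FUT_OFF_INODE | FUT_OFF_MMSHARED)) {
	case FUT_OFF_INODE:
		return "inode";
	case FUT_OFF_MMSHARED:
		return "mm";
	default:
		return "none";   /* private key: no reference taken */
	}
}
```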


@@ -281,17 +304,19 @@ EXPORT_SYMBOL_GPL(get_futex_key);
  * Take a reference to the resource addressed by a key.
  * Can be called while holding spinlocks.
  *
- * NOTE: mmap_sem MUST be held between get_futex_key() and calling this
- * function, if it is called at all.  mmap_sem keeps key->shared.inode valid.
  */
 inline void get_futex_key_refs(union futex_key *key)
 {
-   if (key->both.ptr != 0) {
-   if (key->both.offset & 1)
+   if (key->both.ptr == 0)
+   return;
+   switch (key->both.offset & (FUT_OFF_INODE|FUT_OFF_MMSHARED)) {
+   case FUT_OFF_INODE:
atomic_inc(>shared.inode->i_count);
-   else
+   break;
+   case FUT_OFF_MMSHARED:
atomic_inc(>private.mm->mm_count);
-   }
+   break;
+   }
 }
 EXPORT_SYMBOL_GPL(get_futex_key_refs);
 
@@ -301,11 +326,15 @@ EXPORT_SYMBOL_GPL(get_futex_key_refs);

  */
 void drop_futex_key_refs(union futex_key *key)
 {
-   if (key->both.ptr != 0) {
-  

Re: [PATCH] x86_64/acpi: make kernel to be compiled when CONFIG_ACPI_NUMA is set and power management with acpi is not enabled

2007-04-05 Thread Andrew Morton
On Tue, 3 Apr 2007 21:02:03 -0700
"Yinghai Lu" <[EMAIL PROTECTED]> wrote:

> [PATCH] x86_64/acpi: make kernel to be compiled when  CONFIG_ACPI_NUMA is set 
> and power management with acpi is not enabled
> 
> When CONFIG_ACPI_NUMA is set and power management with ACPI is not used, the
> kernel cannot be compiled.
> So use CONFIG_ACPI_POWER and CONFIG_ACPI_SYSTEM to compile out the power
> set/get and event functions.
> 
> Signed-off-by: Yinghai Lu <[EMAIL PROTECTED]> 
> 
> diff --git a/drivers/acpi/bus.c b/drivers/acpi/bus.c
> index dd49ea0..4d06885 100644
> --- a/drivers/acpi/bus.c
> +++ b/drivers/acpi/bus.c
> @@ -121,6 +121,7 @@ int acpi_bus_get_status(struct acpi_device *device)
>  
>  EXPORT_SYMBOL(acpi_bus_get_status);
>  
> +#ifdef CONFIG_ACPI_POWER
>  /* --
>   Power Management
> 
> -- */
> @@ -269,7 +270,9 @@ int acpi_bus_set_power(acpi_handle handle, int state)
>  }
>  
>  EXPORT_SYMBOL(acpi_bus_set_power);
> +#endif
>  
> +#ifdef CONFIG_ACPI_SYSTEM
>  /* --
>  Event Management
> 
> -- */
> @@ -358,6 +361,7 @@ int acpi_bus_receive_event(struct acpi_bus_event *event)
>  }
>  
>  EXPORT_SYMBOL(acpi_bus_receive_event);
> +#endif
>  
>  /* --
>   Notification Handling
> diff --git a/drivers/net/e1000/e1000_param.c b/drivers/net/e1000/e1000_param.c
> diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
> index a064f36..c2a1ac9 100644
> --- a/drivers/pci/pci-acpi.c
> +++ b/drivers/pci/pci-acpi.c
> @@ -255,7 +255,7 @@ static int acpi_pci_choose_state(struct pci_dev *pdev, pm_message_t state)
>  
>   return -ENODEV;
>  }
> -
> +#ifdef CONFIG_ACPI_POWER
>  static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>  {
>   acpi_handle handle = DEVICE_ACPI_HANDLE(>dev);
> @@ -272,7 +272,7 @@ static int acpi_pci_set_power_state(struct pci_dev *dev, pci_power_t state)
>   return -ENODEV;
>   return acpi_bus_set_power(handle, acpi_state);
>  }
> -
> +#endif
>  
>  /* ACPI bus type */
>  static int acpi_pci_find_device(struct device *dev, acpi_handle *handle)
> @@ -321,7 +321,9 @@ static int __init acpi_pci_init(void)
>   if (ret)
>   return 0;
>   platform_pci_choose_state = acpi_pci_choose_state;
> +#ifdef CONFIG_ACPI_POWER
>   platform_pci_set_power_state = acpi_pci_set_power_state;
> +#endif
>   return 0;
>  }
>  arch_initcall(acpi_pci_init);

This is a rather unpleasing patch from a maintainability point of view -
all those ifdefs do cause various problems.

I wonder if the situation could be improved by something like:

- Move acpi_bus_set_power() and acpi_bus_get_power() into power.c, which
  is only compiled if CONFIG_ACPI_POWER.

- Move acpi_bus_generate_event() and acpi_bus_receive_event() and their
  associated global variables into event.c, which is only compiled if
  CONFIG_ACPI_SYSTEM.

- Move acpi_pci_set_power_state() into power.c

- Move the initialisation of platform_pci_set_power_state into
  acpi_power_init() (this will have runtime effects - changed startup
  ordering)

Of course, making these changes might require some adjustments elsewhere -
some symbols might need to be made global, others maybe can become newly
static, etc.

The primary aim should be to keep the code _logical_.  If we think that the
above code motion reduces ifdefs, but makes the overall code layout less
logical, then we shouldn't do it.  But if the code remains at least equally
logical afterwards, and we can reduce the ifdeffing then we should do it.
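One idiom that often accompanies this kind of code motion is to declare the function in a header when the option is enabled and provide a static inline stub when it is not, so call sites need no #ifdef at all. A minimal sketch, assuming CONFIG_ACPI_POWER is left undefined; acpi_handle_stub and set_device_power() are hypothetical names, and acpi_bus_set_power() here is only a simplified stand-in for the function named above:

```c
#include <errno.h>

/* CONFIG_ACPI_POWER is intentionally left undefined for this sketch. */

typedef void *acpi_handle_stub;	/* hypothetical stand-in for acpi_handle */

#ifdef CONFIG_ACPI_POWER
/* The real body would live in power.c, which is built only with the option. */
int acpi_bus_set_power(acpi_handle_stub handle, int state);
#else
/* Stub used when power.c is not built: callers compile unchanged. */
static inline int acpi_bus_set_power(acpi_handle_stub handle, int state)
{
	(void)handle;
	(void)state;
	return -ENODEV;
}
#endif

/* Hypothetical caller with no #ifdef of its own. */
static int set_device_power(acpi_handle_stub handle, int state)
{
	if (!handle)
		return -ENODEV;
	return acpi_bus_set_power(handle, state);
}
```

The ifdef then exists once, in the header, instead of at every caller.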

-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [patch 1/3] epoll cleanups - epoll include diet ...

2007-04-05 Thread Davide Libenzi
On Thu, 5 Apr 2007, Andrew Morton wrote:

> epoll uses signal stuff and might need signal.h.  It implements syscalls
> and it certainly needs to have those syscall's prototypes in scope.  It
> surely uses stuff from mm.h (doesn't everything??)

Ack about signal.h, I forgot about the pwait code :(
Why syscalls.h? The eventpoll.c file exports syscalls, but it doesn't use
anything declared in there.
What does eventpoll.c use *directly* from mm.h? If eventpoll.c uses, let's
say, sched.h, and sched.h needs mm.h, it is sched.h's responsibility to
include mm.h, not eventpoll.c's.



- Davide




Re: [PATCH] Define EFLAGS_IF

2007-04-05 Thread H. Peter Anvin

Andi Kleen wrote:


No, processor.h is such a hodgepodge of unrelated stuff that any
splitting up is a good thing.



Fair enough.  However, I'd still like to see the X86_CR* constants 
moved, too (and constants added for at least CR0 as well.)


-hpa


Re: init's children list is long and slows reaping children.

2007-04-05 Thread Chris Snook

Chris Snook wrote:

Linus Torvalds wrote:


On Thu, 5 Apr 2007, Robin Holt wrote:

For testing, Jack Steiner created the following patch.  All it does
is move tasks which are transitioning to the zombie state from where
they are in the children list to the head of the list.  In this way,
they will be the first found and reaping does speed up.  We will still
do a full scan of the list once the rearranged tasks are all removed.
This does not seem to be a significant problem.


I'd almost prefer to just put the zombie children on a separate list. 
I wonder how painful that would be..


That would still make it expensive for people who use WUNTRACED to get 
stopped children (since they'd have to look at all lists), but maybe 
that's not a big deal.


Shouldn't be any worse than it already is.

Another thing we could do is to just make sure that kernel threads 
simply don't end up as children of init. That whole thing is silly, 
they're really not children of the user-space init anyway. Comments?


Linus


Does anyone remember why we started doing this in the first place?  I'm 
sure there are some tools that expect a process tree, rather than a 
forest, and making it a forest could make them unhappy.


The support angel on my shoulder says we should just put all the kernel 
threads under a kthread subtree to shorten init's child list and 
minimize impact.  The hacker devil on my other shoulder says that with 
usermode helpers, containers, etc. it's about time we treat it as a 
tree, and any tools that have a problem with that need to be fixed.


-- Chris


Err, that should have been "about time we treat it as a forest".

-- Chris
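The reordering Robin Holt describes in the quoted text (move a task to the head of its parent's children list when it becomes a zombie, so a wait()-style scan finds it first) can be modeled in user space roughly as follows. This is a hypothetical sketch, not the kernel's task_struct or list API; the list helpers only loosely follow list_head.

```c
#include <stddef.h>

/* Minimal intrusive circular list, loosely modeled on the kernel's list_head. */
struct list_node {
	struct list_node *prev, *next;
};

static void list_init(struct list_node *head)
{
	head->prev = head->next = head;
}

static void list_unlink(struct list_node *n)
{
	n->prev->next = n->next;
	n->next->prev = n->prev;
}

static void list_push_front(struct list_node *n, struct list_node *head)
{
	n->next = head->next;
	n->prev = head;
	head->next->prev = n;
	head->next = n;
}

/* Hypothetical, stripped-down task: only what the example needs. */
struct task {
	int pid;
	int zombie;
	struct list_node sibling;	/* link in the parent's children list */
};

#define task_of(n) ((struct task *)((char *)(n) - offsetof(struct task, sibling)))

/* On exit, mark the task a zombie and move it to the head of the list. */
static void task_exit(struct task *t, struct list_node *children)
{
	t->zombie = 1;
	list_unlink(&t->sibling);
	list_push_front(&t->sibling, children);
}

/* wait()-style scan: return the first zombie, counting nodes visited. */
static struct task *find_zombie(struct list_node *children, int *visited)
{
	struct list_node *n;

	*visited = 0;
	for (n = children->next; n != children; n = n->next) {
		(*visited)++;
		if (task_of(n)->zombie)
			return task_of(n);
	}
	return NULL;
}
```

With the move-to-head rule a freshly exited child is found after visiting one node, however long the list is; a full scan is still needed when there is no zombie.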


Ten percent test

2007-04-05 Thread Con Kolivas
On Thursday 05 April 2007 21:54, Ingo Molnar wrote:
>  - fiftyp.c:  noticeable, but a lot better than previously!

fiftyp.c seems to have been stumbled across by accident as having an effect 
when Xenofon was trying to recreate Mike's 50% x 3 test case. I suggest a ten 
percent version like the following would be more useful as a test for the 
harmful effect discovered in fiftyp.c. (/me throws in obligatory code style 
change).

Starts 15 processes that each sleep nine times as long as they run, so each
uses about 10% cpu. Change forks to 15 times the number of cpus you have and it
should work on any size hardware.

-- 
-ck
// gcc -O2 -o tenp tenp.c -lrt
// code from interbench.c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <time.h>
#include <errno.h>
#include <sys/types.h>
/*
 * Start $forks processes that run for 10% cpu time each. Set this to
 * 15 * number of cpus for best effect.
 */
int forks = 15;

/* run_us starts large; calibrate_loop() lowers it to the shortest ~1ms sleep */
unsigned long run_us = 1000000000, sleep_us;
unsigned long loops_per_ms;

void terminal_error(const char *name)
{
	fprintf(stderr, "\n");
	perror(name);
	exit (1);
}

unsigned long long get_nsecs(struct timespec *myts)
{
	if (clock_gettime(CLOCK_REALTIME, myts))
		terminal_error("clock_gettime");
	return (unsigned long long)myts->tv_sec * 1000000000ULL + myts->tv_nsec;
}

void burn_loops(unsigned long loops)
{
	unsigned long i;

	/*
	 * We need some magic here to prevent the compiler from optimising
	 * this loop away. Otherwise trying to emulate a fixed cpu load
	 * with this loop will not work.
	 */
	for (i = 0 ; i < loops ; i++)
	 asm volatile("" : : : "memory");
}

/* Use this many usecs of cpu time */
void burn_usecs(unsigned long usecs)
{
	unsigned long ms_loops;

	ms_loops = loops_per_ms / 1000 * usecs;
	burn_loops(ms_loops);
}

void microsleep(unsigned long long usecs)
{
	struct timespec req, rem;

	rem.tv_sec = rem.tv_nsec = 0;

	req.tv_sec = usecs / 1000000;
	req.tv_nsec = (usecs - (req.tv_sec * 1000000)) * 1000;
continue_sleep:
	if ((nanosleep(&req, &rem)) == -1) {
		if (errno == EINTR) {
			/* Interrupted: resume with whatever time remains */
			if (rem.tv_sec || rem.tv_nsec) {
				req.tv_sec = rem.tv_sec;
				req.tv_nsec = rem.tv_nsec;
				goto continue_sleep;
			}
			goto out;
		}
		terminal_error("nanosleep");
	}
out:
	return;
}

/*
 * In an unoptimised loop we try to benchmark how many meaningless loops
 * per second we can perform on this hardware to fairly accurately
 * reproduce certain percentage cpu usage
 */
void calibrate_loop(void)
{
	unsigned long long start_time, loops_per_msec, run_time = 0,
		min_run_us = run_us;
	unsigned long loops;
	struct timespec myts;
	int i;

	printf("Calibrating loop\n");
	loops_per_msec = 1000000;
redo:
	/* Calibrate to within 1% accuracy (1ms = 1000000ns) */
	while (run_time > 1010000 || run_time < 990000) {
		loops = loops_per_msec;
		start_time = get_nsecs(&myts);
		burn_loops(loops);
		run_time = get_nsecs(&myts) - start_time;
		loops_per_msec = (1000000 * loops_per_msec / run_time ? :
			loops_per_msec);
	}

	/* Rechecking after a pause increases reproducibility */
	microsleep(1);
	loops = loops_per_msec;
	start_time = get_nsecs(&myts);
	burn_loops(loops);
	run_time = get_nsecs(&myts) - start_time;

	/* Tolerate 5% difference on checking */
	if (run_time > 1050000 || run_time < 950000)
		goto redo;
	loops_per_ms=loops_per_msec;
	printf("Calibrating sleep interval\n");
	microsleep(1);
	/* Find the smallest time interval close to 1ms that we can sleep */
	for (i = 0; i < 100; i++) {
		start_time = get_nsecs(&myts);
		microsleep(1000);
		run_time = get_nsecs(&myts) - start_time;
		run_time /= 1000;
		if (run_time < run_us && run_us > 1000)
			run_us = run_time;
	}
	/* Then set run_us to that duration and sleep_us to 9 x that */
	sleep_us = run_us * 9;
	printf("Calibrating run interval\n");
	microsleep(1);
	/* Do a few runs to see what really gets us run_us runtime */
	for (i = 0; i < 100; i++) {
		start_time = get_nsecs(&myts);
		burn_usecs(run_us);
		run_time = get_nsecs(&myts) - start_time;
		run_time /= 1000;
		if (run_time < min_run_us && run_time > run_us)
			min_run_us = run_time;
	}
	if (min_run_us < run_us)
		run_us = run_us * run_us / min_run_us;
	printf("Each fork will run for %lu usecs and sleep for %lu usecs\n",
		run_us, sleep_us);
}

int main(void){
	int i;

	calibrate_loop();
	printf("starting %d forks\n", forks);
	for(i = 1; i < forks; i++){
		if(!fork())
			break;
	}
	while(1){
		burn_usecs(run_us);
		microsleep(sleep_us);
	}
	return 0;
}
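Rather than editing forks by hand to 15 times the CPU count, the value could be computed at startup. A small sketch, assuming sysconf(_SC_NPROCESSORS_ONLN) as available on Linux/glibc; default_forks() is a hypothetical helper, not part of the original program:

```c
#include <unistd.h>

/* Hypothetical helper: 15 forks per online CPU, as suggested for tenp.c.
 * Falls back to one CPU if sysconf() cannot report the count. */
static int default_forks(void)
{
	long ncpus = sysconf(_SC_NPROCESSORS_ONLN);

	if (ncpus < 1)
		ncpus = 1;
	return (int)(15 * ncpus);
}
```

In main() this would be used as `forks = default_forks();` before calling calibrate_loop().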


Re: [patch 1/3] epoll cleanups - epoll include diet ...

2007-04-05 Thread Andrew Morton
On Tue, 03 Apr 2007 18:35:06 -0700
Davide Libenzi  wrote:

> Remove some unneeded include files from epoll code.
> 

Our definitions of "unneeded" might differ.

> 
> Signed-off-by: Davide Libenzi 
> 
> 
> - Davide
> 
> 
> 
> Index: linux-2.6.21-rc5.mm4/fs/eventpoll.c
> ===
> --- linux-2.6.21-rc5.mm4.orig/fs/eventpoll.c  2007-04-03 17:59:54.0 -0700
> +++ linux-2.6.21-rc5.mm4/fs/eventpoll.c   2007-04-03 18:33:30.0 -0700
> @@ -1,6 +1,6 @@
>  /*
> - *  fs/eventpoll.c ( Efficent event polling implementation )
> - *  Copyright (C) 2001,...,2006   Davide Libenzi
> + *  fs/eventpoll.c (Efficent event notification implementation)
> + *  Copyright (C) 2001,...,2007   Davide Libenzi
>   *
>   *  This program is free software; you can redistribute it and/or modify
>   *  it under the terms of the GNU General Public License as published by
> @@ -17,30 +17,21 @@
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
>  #include 
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
>  #include 
>  #include 
>  #include 
> -#include 
> -#include 
> -#include 
>  #include 
> -#include 

epoll uses signal stuff and might need signal.h.  It implements syscalls
and it certainly needs to have those syscall's prototypes in scope.  It
surely uses stuff from mm.h (doesn't everything??)

I am suspecting that this patch relies upon accidental nested inclusions
from within other headers.  But that is super-fragile: change a config
item, switch to a different architecture and whoops, it doesn't compile any
more.

Maybe I'm wrong, and you somehow worked out that none of the things which
these headers define, and none of the things which these headers' includees
define, is used in epoll.c or in the headers which are included after these
headers, or in those headers' includees.  If so, how the heck did you do
that?




Re: [linux-usb-devel] [RFC] HID bus design overview.

2007-04-05 Thread Li Yu
Dmitry Torokhov wrote:
>> +static void hid_bus_release(struct device *dev)
>> +{
>> +}
>> +
>> +struct device hid_bus = {
>> +.bus_id   = "hidbus0",
>> +.release  = hid_bus_release
>> +};
>> +
>> +static void hid_dev_release(struct device *dev)
>> +{
>> +}
>> +
>> 
>
> That will for sure raise Greg KH's blood pressure ;)
>   

I understand what you mean now: the entire hid_bus device is useless. The
original hid bus code was copied from LDD3e, and it seems the API has
changed since the book went to press. In fact, the new kernel only works
silently without it; otherwise kref_get() warns us.

Also, I fixed the double hidinput_disconnect() problem last night. Its
cause is not an invalid memory access; it is the normal behavior of
hidinput_disconnect(). The fix is easy: we should move the inputs member
into hid_device instead of hid_driver, so that removing one hid_device
disconnects only that device, not every device its driver is bound to.

Now, usbhid works fine.
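The ownership change described above (inputs tracked per hid_device rather than per hid_driver) can be sketched as follows. The struct layouts and function names are hypothetical illustrations, not the actual HID code:

```c
#define MAX_INPUTS 4

struct input_stub {
	int registered;
};

/* Hypothetical device: each device owns its own list of inputs. */
struct hid_device_sketch {
	struct input_stub *inputs[MAX_INPUTS];
	int ninputs;
};

static void add_input(struct hid_device_sketch *hdev, struct input_stub *in)
{
	if (hdev->ninputs < MAX_INPUTS) {
		in->registered = 1;
		hdev->inputs[hdev->ninputs++] = in;
	}
}

/* Tears down only this device's inputs; other devices bound to the
 * same driver are untouched. */
static void hidinput_disconnect_sketch(struct hid_device_sketch *hdev)
{
	for (int i = 0; i < hdev->ninputs; i++)
		hdev->inputs[i]->registered = 0;
	hdev->ninputs = 0;
}
```

If the list lived in the shared driver structure instead, one disconnect would walk and unregister every bound device's inputs, which matches the double-disconnect symptom.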

Good luck.

- Li Yu




Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Zachary Amsden wrote:
  

Yes, thought about several solutions, and this seems the best.  But it
requires a new paravirt-op.



Not with the power of multiplexing.  Something like this, perhaps?
  


Throw it in the queue; I'll slide in after it.

Zach


Re: init's children list is long and slows reaping children.

2007-04-05 Thread Chris Snook

Linus Torvalds wrote:


On Thu, 5 Apr 2007, Robin Holt wrote:

For testing, Jack Steiner created the following patch.  All it does
is move tasks which are transitioning to the zombie state from where
they are in the children list to the head of the list.  In this way,
they will be the first found and reaping does speed up.  We will still
do a full scan of the list once the rearranged tasks are all removed.
This does not seem to be a significant problem.


I'd almost prefer to just put the zombie children on a separate list. I 
wonder how painful that would be..


That would still make it expensive for people who use WUNTRACED to get 
stopped children (since they'd have to look at all lists), but maybe 
that's not a big deal.


Shouldn't be any worse than it already is.

Another thing we could do is to just make sure that kernel threads simply 
don't end up as children of init. That whole thing is silly, they're 
really not children of the user-space init anyway. Comments?


Linus


Does anyone remember why we started doing this in the first place?  I'm sure 
there are some tools that expect a process tree, rather than a forest, and 
making it a forest could make them unhappy.


The support angel on my shoulder says we should just put all the kernel threads 
under a kthread subtree to shorten init's child list and minimize impact.  The 
hacker devil on my other shoulder says that with usermode helpers, containers, 
etc. it's about time we treat it as a tree, and any tools that have a problem 
with that need to be fixed.


-- Chris


Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-05 Thread James Bottomley
On Thu, 2007-04-05 at 17:15 -0700, David Miller wrote:
> This won't work I believe.
> 
> There are cases that use smaller sense buffers than the minimum
> specified by the SCSI layer.
> 
> One example is that do_sr_ioctl() stuff when the cgc passed
> in has a sense buffer.  That will only be as large as a
> "struct request_sense".
> 
> I'm pretty sure that's one of the reasons why we cons up a local sense
> buffer in this EH code.
> 
> So we could walk past the end of that and corrupt memory with
> your patch.

That should be fine ... the application copies the sense out of
scmnd->sense_buffer ... it can take as much or as little as it wants
(sense_buffer is actually a SCSI_SENSE_BUFFERSIZE array inside the
command). There was one thing I missed, which is that the sense buffer
size of the command is 252, whereas I need to set it back down to
sizeof(scmnd->sense_buffer).

This is another area where we "could do better" ... the request actually
gives us a sense buffer, but we use our own and later copy data out of
it back into the request.
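The disagreement turns on who bounds the copy. A minimal sketch of copying sense data without overrunning a smaller caller-supplied buffer (such as the struct request_sense-sized one David mentions); copy_sense() and SENSE_BUF_LEN are hypothetical names, and the value 96 is only illustrative:

```c
#include <stddef.h>
#include <string.h>

#define SENSE_BUF_LEN 96	/* illustrative stand-in for SCSI_SENSE_BUFFERSIZE */

/* Copy sense data into a caller-supplied buffer, never writing past
 * dst_len bytes; returns the number of bytes copied. */
static size_t copy_sense(unsigned char *dst, size_t dst_len,
			 const unsigned char *src, size_t src_len)
{
	size_t n = dst_len < src_len ? dst_len : src_len;

	memcpy(dst, src, n);
	return n;
}
```

Copying into the caller's buffer is safe only if the minimum of the two sizes is used; writing a full-size sense block directly into a short buffer is the corruption David is worried about.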

James




Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Yes, thought about several solutions, and this seems the best.  But it
> requires a new paravirt-op.

Not with the power of multiplexing.  Something like this, perhaps?

J

diff -r 5be4a5ff8e6b arch/i386/mm/highmem.c
--- a/arch/i386/mm/highmem.c Thu Apr 05 17:04:04 2007 -0700
+++ b/arch/i386/mm/highmem.c Thu Apr 05 17:50:46 2007 -0700
@@ -42,6 +42,8 @@ void *kmap_atomic_prot(struct page *page
 
vaddr = __fix_to_virt(FIX_KMAP_BEGIN + idx);
set_pte(kmap_pte-idx, mk_pte(page, prot));
+
+   arch_flush_lazy_mmu_mode();
 
return (void*) vaddr;
 }
diff -r 5be4a5ff8e6b include/asm-generic/pgtable.h
--- a/include/asm-generic/pgtable.h Thu Apr 05 17:04:04 2007 -0700
+++ b/include/asm-generic/pgtable.h Thu Apr 05 17:50:46 2007 -0700
@@ -180,6 +180,7 @@ static inline void ptep_set_wrprotect(st
 #ifndef __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 #define arch_enter_lazy_mmu_mode() do {} while (0)
 #define arch_leave_lazy_mmu_mode() do {} while (0)
+#define arch_flush_lazy_mmu_mode() do {} while (0)
 #endif
 
 /*
@@ -193,6 +194,7 @@ static inline void ptep_set_wrprotect(st
 #ifndef __HAVE_ARCH_ENTER_LAZY_CPU_MODE
 #define arch_enter_lazy_cpu_mode() do {} while (0)
 #define arch_leave_lazy_cpu_mode() do {} while (0)
+#define arch_flush_lazy_cpu_mode() do {} while (0)
 #endif
 
 /*
diff -r 5be4a5ff8e6b include/asm-i386/paravirt.h
--- a/include/asm-i386/paravirt.h   Thu Apr 05 17:04:04 2007 -0700
+++ b/include/asm-i386/paravirt.h   Thu Apr 05 17:50:46 2007 -0700
@@ -27,9 +27,10 @@ struct desc_struct;
 
 /* Lazy mode for batching updates / context switch */
 enum paravirt_lazy_mode {
-   PARAVIRT_LAZY_NONE = 0,
-   PARAVIRT_LAZY_MMU = 1,
-   PARAVIRT_LAZY_CPU = 2,
+   PARAVIRT_LAZY_NONE = 0, /* exit lazy mode */
+   PARAVIRT_LAZY_MMU = 1,  /* lazy mmu updates */
+   PARAVIRT_LAZY_CPU = 2,  /* lazy cpu state updates */
+   PARAVIRT_LAZY_FLUSH = 3,/* flush pending changes, if any */
 };
 
 struct paravirt_ops
@@ -1044,6 +1045,10 @@ static inline void arch_leave_lazy_cpu_m
 {
PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_NONE);
 }
+static inline void arch_flush_lazy_cpu_mode(void)
+{
+   PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_FLUSH);
+}
 
 #define  __HAVE_ARCH_ENTER_LAZY_MMU_MODE
 static inline void arch_enter_lazy_mmu_mode(void)
@@ -1053,6 +1058,10 @@ static inline void arch_leave_lazy_mmu_m
 static inline void arch_leave_lazy_mmu_mode(void)
 {
PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_NONE);
+}
+static inline void arch_flush_lazy_mmu_mode(void)
+{
+   PVOP_VCALL1(set_lazy_mode, PARAVIRT_LAZY_FLUSH);
 }
 
 void _paravirt_nop(void);



Re: [PATCH] Define EFLAGS_IF

2007-04-05 Thread Andi Kleen
On Thu, Apr 05, 2007 at 05:29:52PM -0700, H. Peter Anvin wrote:
> Jeremy Fitzhardinge wrote:
> >
> >That patch got dropped, and replaced by one which pulled all the flags
> >definitions out of 
> >
> 
> Saw that a little too late :)
> 
> In general, it would be nice if the various CPU constants were all 
> defined in one place, so I'd rather suggest protecting the appropriate 
> parts of asm/processor.h with #ifndef __ASSEMBLY__.

No, processor.h is such a hodgepodge of unrelated stuff that any
splitting up is a good thing.

-Andi


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Zachary Amsden wrote:
  

So the clean fix for this is still even further out.  I don't think I
want to hook kmap/unmap as paravirt-ops.



Yes, it seems like overkill.

How about something like adding PARAVIRT_LAZY_FLUSH as an argument to
set_lazy_mode?  It would be valid to use at any time, and it would flush
any pending work while still remaining in whatever lazy mode its
currently in.  That way kmap_atomic can flush anything pending without
having to muck around with the current lazy state.
  


Yes, thought about several solutions, and this seems the best.  But it 
requires a new paravirt-op.


Zach



Re: Reiser4. BEST FILESYSTEM EVER? I need help.

2007-04-05 Thread H. Peter Anvin

[EMAIL PROTECTED] wrote:
Yeap, I guess that will probably work. 


And here I was trying to compile old versions of GRUB from namesys.com.

By the way, do you think the benchmarks from:

http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

are accurate?



Accurate, probably.  Whether or not they're *relevant* is a totally 
different ball of wax.


-hpa


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> So the clean fix for this is still even further out.  I don't think I
> want to hook kmap/unmap as paravirt-ops.

Yes, it seems like overkill.

How about something like adding PARAVIRT_LAZY_FLUSH as an argument to
set_lazy_mode?  It would be valid to use at any time, and it would flush
any pending work while still remaining in whatever lazy mode it's
currently in.  That way kmap_atomic can flush anything pending without
having to muck around with the current lazy state.
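The proposed PARAVIRT_LAZY_FLUSH semantics (flush anything pending, but stay in whatever lazy mode is current) can be illustrated with a toy batching model. Only the enum values come from the thread; the queue, counters and function names are hypothetical, and a real backend would issue hypercalls rather than bump a counter.

```c
/* Toy model of batched ("lazy") updates. */
enum lazy_mode { LAZY_NONE = 0, LAZY_MMU = 1, LAZY_CPU = 2, LAZY_FLUSH = 3 };

#define MAX_PENDING 16

static enum lazy_mode cur_mode = LAZY_NONE;
static int pending[MAX_PENDING];	/* queued updates, e.g. PTE writes */
static int npending;
static int applied;			/* updates actually issued so far */

/* Stand-in for issuing one batched hypercall. */
static void issue_pending(void)
{
	applied += npending;
	npending = 0;
}

static void set_lazy_mode(enum lazy_mode mode)
{
	if (mode == LAZY_FLUSH) {
		/* Flush queued work, but stay in the current lazy mode. */
		issue_pending();
		return;
	}
	if (mode == LAZY_NONE)
		issue_pending();	/* leaving lazy mode flushes too */
	cur_mode = mode;
}

static void queue_update(int val)
{
	if (cur_mode == LAZY_NONE) {
		applied++;		/* not batching: issue immediately */
		return;
	}
	if (npending == MAX_PENDING)
		issue_pending();
	pending[npending++] = val;
}
```

The point of the flush value is visible in the mode check: kmap_atomic could force pending updates out without having to leave and re-enter MMU-lazy mode.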

J


Re: Reiser4. BEST FILESYSTEM EVER? I need help.

2007-04-05 Thread johnrobertbanks
Yeap, I guess that will probably work. 

And here I was trying to compile old versions of GRUB from namesys.com.

By the way, do you think the benchmarks from:

http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

are accurate?

.------------------------------------.
| FILESYSTEM   |  TIME  | DISK USAGE |
| TYPE         | (secs) |    (MB)    |
.------------------------------------.
| REISER4 lzo  |   1938 |        278 |
| REISER4 gzip |   2295 |        213 |
| REISER4      |   3462 |        692 |
| EXT2         |   4092 |        816 |
| JFS          |   4225 |        806 |
| EXT4         |   4408 |        816 |
| EXT3         |   4421 |        816 |
| XFS          |   4625 |        779 |
| REISER3      |   6178 |        793 |
| FAT32        |  12342 |        988 |
| NTFS-3g      |  10414 |        772 |
.------------------------------------.


The TIME column measures the time taken to complete the bonnie++
benchmarking test (run with the parameters bonnie++ -n128:128k:0).

The DISK USAGE column measures the amount of disk used to store 655MB
of raw data (which was 3 different copies of the Linux kernel sources).

Thanks for that, John.


On Thu, 05 Apr 2007 17:23:23 -0700, "H. Peter Anvin" <[EMAIL PROTECTED]>
said:
> [EMAIL PROTECTED] wrote:
> > 
> > Anyway, I have patched the 2.6.20 kernel and have a partition formatted
> > with Reiser4.
> > 
> > However, I am having trouble getting LILO or GRUB working (with
> > Reiser4).
> > 
> > Could you guys who know all about this, help me, or point me to some
> > help.
> > 
> 
> Make your /boot a separate partition and format it as conservatively as 
> possible (e.g. ext3, or even ext2.)
> 
> Problem solved.
> 
>   -hpa
-- 
  
  [EMAIL PROTECTED]




Re: [PATCH 1/2] VM throttling: Start writeback at dirty_writeback_start_ratio

2007-04-05 Thread Andrew Morton
On Tue, 03 Apr 2007 19:46:04 +0900
Tomoki Sekiyama <[EMAIL PROTECTED]> wrote:

> This patchset is to avoid the problem that write(2) can be blocked for a
> long time if a system has several disks with different speed and is
> under heavy I/O pressure.
> 
> -Description of the problem:
> While Dirty+Writeback pages get more than 40%(`dirty_ratio') of memory,
> generators of dirty pages are blocked in balance_dirty_pages() until
> they start writeback of a specific number (`write_chunk', typically=1536)
> of dirty pages on the disks they write to.
> 
> Under this rule, if a process writes to the disk which has only a few
> (less than 1536) dirty pages, that process will be blocked until
> writeback of the other disks is completed and % of Dirty+Writeback goes
> below 40%.
> 
> Thus, if a slow device (such as a USB disk) has many dirty pages, the
> processes which write small data to the other disks can be blocked for
> quite a long time.
> 
> -Solution:
> This patch introduces high/low-watermark algorithm in
> balance_dirty_pages() in order to throttle only the processes which
> write to disks with heavy load.
> 
> This patch adds `dirty_start_writeback_ratio' for the low-watermark,
> and modifies get_dirty_limits() to calculate and return the writeback
> starting level of dirty pages based on `dirty_start_writeback_ratio'.
> 
> If % of Dirty+Writeback > `dirty_writeback_start_ratio', generators of
> dirty pages start writeback of dirty pages by themselves. At that time,
> these processes are not blocked in balance_dirty_pages(), but they may
> be blocked if the write-requests-queue of the written disk is full
> (that is, the length of the queue > `nr_requests'). By this behavior,
> we can throttle only processes which write to the disks with heavy load,
> and can allow processes to write to the other disks without blocking.
> 
> If % of Dirty+Writeback > `dirty_ratio', generators of dirty pages
> are throttled as current Linux does, not to fill up memory with dirty
> pages.

Does this actually solve the problem?  If the request queue is sufficiently
large (relative to the various dirty-memory thresholds) then I'd expect
that a heavy-writer will be able to very quickly take the total
dirty+writeback memory up to the dirty_ratio (should be renamed
throttle_threshold, but it's too late for that).

I suspect the reason why this patch was successful in your testing was
because dirty_start_writeback_ratio happens to exceed the size of the disk
request queues, so the heavy writer is getting stuck on disk request queue
exhaustion.

But that won't work if we have a lot of processes writing to a lot of
disks, and it won't work if the request queue size is large, or if the
dirty-memory thresholds are small (relative to the request queue size).

Do the patches still work after
`echo 1 > /sys/block/sda/queue/nr_requests'?
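For reference, the two-threshold rule described in the quoted patch reduces to a decision like the one below. This is a sketch only: dirty_decision() and its parameters are hypothetical, the ratios are percentages of total pages, and it deliberately does not model the request-queue blocking that Andrew's question is about.

```c
enum dirty_action { DIRTY_OK, DIRTY_START_WRITEBACK, DIRTY_THROTTLE };

static enum dirty_action dirty_decision(unsigned long dirty_plus_wb,
					unsigned long total_pages,
					unsigned int start_ratio, /* low watermark, % */
					unsigned int dirty_ratio) /* high watermark, % */
{
	unsigned long start_thresh = total_pages * start_ratio / 100;
	unsigned long throttle_thresh = total_pages * dirty_ratio / 100;

	if (dirty_plus_wb > throttle_thresh)
		return DIRTY_THROTTLE;		/* block, as balance_dirty_pages() does today */
	if (dirty_plus_wb > start_thresh)
		return DIRTY_START_WRITEBACK;	/* writer submits I/O itself; blocks only on a full queue */
	return DIRTY_OK;
}
```

Between the two watermarks, whether a writer actually stalls depends entirely on its disk's request queue filling up, which is why shrinking nr_requests is a useful test of the design.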


Re: [PATCH] Define EFLAGS_IF

2007-04-05 Thread H. Peter Anvin

Jeremy Fitzhardinge wrote:


That patch got dropped, and replaced by one which pulled all the flags
definitions out of 



Saw that a little too late :)

In general, it would be nice if the various CPU constants were all 
defined in one place, so I'd rather suggest protecting the appropriate 
parts of asm/processor.h with #ifndef __ASSEMBLY__.
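A sketch of what such a dual-use header fragment could look like, assuming the usual kernel convention that assembler sources are preprocessed with -D__ASSEMBLY__. Plain integer constants stay visible to both C and assembly, while C-only syntax is hidden from the assembler. The flags_irqs_enabled() helper is hypothetical:

```c
/* Plain constant: usable from .S files as well as C. */
#define X86_EFLAGS_IF	0x00000200	/* bit 9: interrupt enable flag */

#ifndef __ASSEMBLY__
/* C-only helper: would be a syntax error if the assembler saw it. */
static inline int flags_irqs_enabled(unsigned long flags)
{
	return (flags & X86_EFLAGS_IF) != 0;
}
#endif /* __ASSEMBLY__ */
```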


-hpa


Re: Reiser4. BEST FILESYSTEM EVER? I need help.

2007-04-05 Thread H. Peter Anvin

[EMAIL PROTECTED] wrote:


Anyway, I have patched the 2.6.20 kernel and have a partition formatted
with Reiser4.

However, I am having trouble getting LILO or GRUB working (with
Reiser4).

Could you guys who know all about this, help me, or point me to some
help.



Make your /boot a separate partition and format it as conservatively as 
possible (e.g. ext3, or even ext2.)


Problem solved.

-hpa


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:

Zachary Amsden wrote:
  

No, they are totally dependent.  The reason interrupts are disabled is
to stop kmap_atomic in interrupt handlers.  With the kmap_atomic_pte
changes, the whole interrupt disable jibberish goes away. 



But kmap_atomic_pte is a special case of kmap_atomic for ptes. 
Interrupt routines can still use plain kmap_atomic for bouncebuffers and
so on.
  


Ah, yes.


A more general patch would be to make kmap/unmap_atomic pv_ops, and then
they can all be rolled together.  I.e: check the type to see if special
pte handling needs to happen, etc.
  


So the clean fix for this is still even further out.  I don't think I 
want to hook kmap/unmap as paravirt-ops.


Zach


Re: [PATCH] Define EFLAGS_IF

2007-04-05 Thread Jeremy Fitzhardinge
H. Peter Anvin wrote:
> Rusty Russell wrote:
>   
>> There is now more than one place where we use the fact that bit 9 of
>> eflags is the interrupt-enabled flag, so define EFLAGS_IF.  We make it
>> 512 so it can be used in asm, too.
>> 
>
> How about defining all the other EFLAGS in one place?
>   

That patch got dropped, and replaced by one which pulled all the flags
definitions out of 

J


Re: [PATCH] Define EFLAGS_IF

2007-04-05 Thread H. Peter Anvin

Rusty Russell wrote:

There is now more than one place where we use the fact that bit 9 of
eflags is the interrupt-enabled flag, so define EFLAGS_IF.  We make it
512 so it can be used in asm, too.


How about defining all the other EFLAGS in one place?
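Defining every EFLAGS bit in one place might look something like the sketch below. The X86_EFLAGS_* names and the eflags_name() helper are illustrative; the bit positions are the architectural ones, and IF (bit 9) comes out as 512, the value Rusty's patch uses. Hex literals keep the definitions usable from assembly.

```c
#include <stddef.h>

#define X86_EFLAGS_CF	0x00000001	/* bit 0: carry */
#define X86_EFLAGS_PF	0x00000004	/* bit 2: parity */
#define X86_EFLAGS_AF	0x00000010	/* bit 4: auxiliary carry */
#define X86_EFLAGS_ZF	0x00000040	/* bit 6: zero */
#define X86_EFLAGS_SF	0x00000080	/* bit 7: sign */
#define X86_EFLAGS_TF	0x00000100	/* bit 8: trap */
#define X86_EFLAGS_IF	0x00000200	/* bit 9: interrupt enable (= 512) */
#define X86_EFLAGS_DF	0x00000400	/* bit 10: direction */
#define X86_EFLAGS_OF	0x00000800	/* bit 11: overflow */

/* Name of a single flag bit, or NULL if it is not one of the above. */
static const char *eflags_name(unsigned long bit)
{
	static const struct { unsigned long mask; const char *name; } tbl[] = {
		{ X86_EFLAGS_CF, "CF" }, { X86_EFLAGS_PF, "PF" },
		{ X86_EFLAGS_AF, "AF" }, { X86_EFLAGS_ZF, "ZF" },
		{ X86_EFLAGS_SF, "SF" }, { X86_EFLAGS_TF, "TF" },
		{ X86_EFLAGS_IF, "IF" }, { X86_EFLAGS_DF, "DF" },
		{ X86_EFLAGS_OF, "OF" },
	};
	size_t i;

	for (i = 0; i < sizeof(tbl) / sizeof(tbl[0]); i++)
		if (tbl[i].mask == bit)
			return tbl[i].name;
	return NULL;
}
```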

-hpa


Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-05 Thread David Miller
From: James Bottomley <[EMAIL PROTECTED]>
Date: Thu, 05 Apr 2007 19:02:19 -0500

> On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> > From: Andrew Burgess <[EMAIL PROTECTED]>
> > Date: Thu, 5 Apr 2007 15:13:27 -0700
> > 
> > > David, do you see any other problems with scsi_send_eh_cmnd?
> > > 
> > > I've switched back to 2.6.18 which seems to not oops 
> > > and am happy to try patches.
> > 
> > Does 2.6.20 with my patch OOPS too?  Does reverting my patch
> > make the oops go away?
> > 
> > If reverting my patch makes the OOPS go away, we need to
> > verify if page_address() is returning crap for some reason
> > or the length is wrong.
> 
> Assuming this does turn out to be the problem, we should just junk the
> page allocation ... it's completely unnecessary; when the slab allocated
> commands were done, we made sure the actual sense_buffer is at the
> correct location, so this should be the final fix:

This won't work I believe.

There are cases that use smaller sense buffers than the minimum
specified by the SCSI layer.

One example is that do_sr_ioctl() stuff when the cgc passed
in has a sense buffer.  That will only be as large as a
"struct request_sense".

I'm pretty sure that's one of the reasons why we cons up a local sense
buffer in this EH code.

So we could walk past the end of that and corrupt memory with
your patch.
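To make the overflow concrete: the danger is copying a full-size sense buffer into a caller-supplied one that is smaller. A minimal userspace sketch, assuming the destination's real length is known to the copier, bounds the copy by the smaller of the two sizes. The function name and both sizes are hypothetical placeholders, not the SCSI layer's constants.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder sizes for illustration only. */
#define FULL_SENSE_SIZE  96 /* "full-size" sense buffer */
#define SMALL_SENSE_SIZE 64 /* smaller caller-supplied buffer */

/*
 * Copy sense data without writing past the end of dst.
 * Returns the number of bytes actually copied.
 */
static size_t copy_sense_bounded(unsigned char *dst, size_t dst_len,
				 const unsigned char *src, size_t src_len)
{
	size_t n = src_len < dst_len ? src_len : dst_len;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	return n;
}
```

With a 64-byte destination, an unbounded 96-byte copy would write 32 bytes past its end; the bounded version stops at 64.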


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> No, they are totally dependent.  The reason interrupts are disabled is
> to stop kmap_atomic in interrupt handlers.  With the kmap_atomic_pte
> changes, the whole interrupt disable jibberish goes away. 

But kmap_atomic_pte is a special case of kmap_atomic for ptes. 
Interrupt routines can still use plain kmap_atomic for bouncebuffers and
so on.

A more general patch would be to make kmap/unmap_atomic pv_ops, and then
they can all be rolled together.  I.e: check the type to see if special
pte handling needs to happen, etc.

J
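Jeremy's suggestion, roughly: route kmap_atomic through a replaceable function pointer and let a paravirt backend check the slot type to decide whether pte-specific handling is needed. The sketch below is a userspace model only; the `kmap_ops` structure, the `KM_SLOT_*` names, `page_stub`, and the backend functions are hypothetical and do not correspond to an existing kernel interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kmap slot types, loosely modelled on km_type. */
enum km_slot { KM_SLOT_USER, KM_SLOT_BOUNCE, KM_SLOT_PTE };

struct page_stub { unsigned long pfn; };

/* Hypothetical paravirt hook table for atomic kmaps. */
struct kmap_ops {
	void *(*kmap_atomic)(struct page_stub *page, enum km_slot slot);
	void (*kunmap_atomic)(void *addr, enum km_slot slot);
};

static unsigned char fake_window[4096]; /* stands in for a fixmap slot */
static int pte_maps;                    /* maps that needed pte handling */

static void *native_kmap_atomic(struct page_stub *page, enum km_slot slot)
{
	assert(page != NULL);
	(void)slot;
	return fake_window;
}

static void native_kunmap_atomic(void *addr, enum km_slot slot)
{
	(void)addr;
	(void)slot;
}

/* Hypervisor backend: extra work only when the mapped page holds ptes. */
static void *hv_kmap_atomic(struct page_stub *page, enum km_slot slot)
{
	if (slot == KM_SLOT_PTE)
		pte_maps++; /* e.g. tell the hypervisor about the pte page */
	return native_kmap_atomic(page, slot);
}

/* Native by default; a hypervisor would install its own hooks at boot. */
static struct kmap_ops kmap_ops = {
	.kmap_atomic   = native_kmap_atomic,
	.kunmap_atomic = native_kunmap_atomic,
};
```

Interrupt handlers mapping bounce buffers go through the same hook but skip the pte path, so they no longer need interrupts disabled around it.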



Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Jeremy Fitzhardinge wrote:
> Zachary Amsden wrote:
>> Well at this point, the "proper" fix is dependent on Jeremy's
>> kmap_atomic_pte changes, which are definitely too late to pull into
>> 2.6.21.  Can we just apply this patch please?
>
> Hm, I think they're independent aren't they?  Your fix is about making
> lazy_mmu disable interrupts; that's independent of how highpte pages get
> mapped.

No, they are totally dependent.  The reason interrupts are disabled is 
to stop kmap_atomic in interrupt handlers.  With the kmap_atomic_pte 
changes, the whole interrupt disable jibberish goes away.


Zach


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Jeremy Fitzhardinge
Zachary Amsden wrote:
> Well at this point, the "proper" fix is dependent on Jeremy's
> kmap_atomic_pte changes, which are definitely too late to pull into
> 2.6.21.  Can we just apply this patch please? 

Hm, I think they're independent aren't they?  Your fix is about making
lazy_mmu disable interrupts; that's independent of how highpte pages get
mapped.

J



Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range

2007-04-05 Thread Antonino A. Daplas
On Thu, 2007-04-05 at 11:44 +0100, Alan Hourihane wrote:
> Attached is a patch against 2.6.21-rc5 which adds the Intel Vermilion
> Range support.
> 
> Intel funded Tungsten Graphics to do this work.
> 
> If there's any problems or updates needed to be done to get accepted,
> please let me know.
> 

Preferably, add sparse annotations and compile with make C=1. I've
included possible sparse annotations (the only ones I can see) below.
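For readers unfamiliar with the annotation: `__iomem` only has an effect when sparse runs (`make C=1`, where `__CHECKER__` is defined); in a normal compile it expands to nothing. The userspace sketch below mirrors the kind of definition the kernel uses and applies it to a save/restore pair shaped like `crvml_sys_save()`/`crvml_sys_restore()`. The `stub_*` accessors, `struct cr_sys_stub`, and the `REG_CLOCK` offset are hypothetical stand-ins that operate on ordinary memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Annotations vanish unless the file is being checked by sparse. */
#ifdef __CHECKER__
# define __iomem __attribute__((noderef, address_space(2)))
# define __force __attribute__((force))
#else
# define __iomem
# define __force
#endif

/* Stub accessors working on plain memory instead of a real BAR. */
static uint32_t stub_ioread32(const void __iomem *addr)
{
	uint32_t v;

	memcpy(&v, (const void __force *)addr, sizeof(v));
	return v;
}

static void stub_iowrite32(uint32_t v, void __iomem *addr)
{
	memcpy((void __force *)addr, &v, sizeof(v));
}

#define REG_CLOCK 0x10 /* hypothetical register offset */

struct cr_sys_stub {
	void __iomem *mch_regs_base; /* annotated, as the review suggests */
	uint32_t saved_clock;
};

/* No plain (__u32 *) cast, so sparse can still track the address space. */
static uint32_t __iomem *clock_reg_of(const struct cr_sys_stub *s)
{
	assert(s->mch_regs_base != NULL);
	return (uint32_t __iomem *)((char __iomem *)s->mch_regs_base + REG_CLOCK);
}

static void stub_sys_save(struct cr_sys_stub *s)
{
	s->saved_clock = stub_ioread32(clock_reg_of(s));
}

static void stub_sys_restore(struct cr_sys_stub *s)
{
	uint32_t __iomem *clock_reg = clock_reg_of(s);

	stub_iowrite32(s->saved_clock, clock_reg);
	(void)stub_ioread32(clock_reg); /* read back, as the driver does */
}
```

The `char __iomem *` cast keeps the pointer arithmetic in standard C; the suggested kernel lines rely on GCC's void-pointer arithmetic instead.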
> 
> +
> +struct cr_sys {
> + struct vml_sys sys;
> + struct pci_dev *mch_dev;
> + struct pci_dev *lpc_dev;
> + __u32 mch_bar;
> + __u8 *mch_regs_base;

void __iomem *mch_regs_base; (sparse)

> + __u32 gpio_bar;
> + __u32 saved_panel_state;
> + __u32 saved_clock;
> +};
> +
> 
> +static void crvml_panel_on(const struct vml_sys *sys)
> +{
> + const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> + __u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> + __u32 cur = inl(addr);
> +
> + if (!(cur & CRVML_PANEL_ON)) {
> + /* Make sure LVDS controller is down. */
> + if (cur & 0x0001) {
> + cur &= ~CRVML_LVDS_ON;
> + outl(cur, addr);
> + }
> + /* Power up Panel */
> + schedule_timeout(HZ / 10);
> + cur |= CRVML_PANEL_ON;
> + outl(cur, addr);
> + }
> +
> + /* Power up LVDS controller */
> +
> + if (!(cur & CRVML_LVDS_ON)) {
> + schedule_timeout(HZ / 10);
> + outl(cur | CRVML_LVDS_ON, addr);
> + }
> +}
> +
> +static void crvml_panel_off(const struct vml_sys *sys)
> +{
> + const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> +
> + __u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> + __u32 cur = inl(addr);
> +
> + /* Power down LVDS controller first to avoid high currents */
> + if (cur & CRVML_LVDS_ON) {
> + cur &= ~CRVML_LVDS_ON;
> + outl(cur, addr);
> + }
> + if (cur & CRVML_PANEL_ON) {
> + schedule_timeout(HZ / 10);
> + outl(cur & ~CRVML_PANEL_ON, addr);
> + }
> +}
> +
> +static void crvml_backlight_on(const struct vml_sys *sys)
> +{
> + const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> + __u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> + __u32 cur = inl(addr);
> +
> + if (cur & CRVML_BACKLIGHT_OFF) {
> + cur &= ~CRVML_BACKLIGHT_OFF;
> + outl(cur, addr);
> + }
> +}
> +
> +static void crvml_backlight_off(const struct vml_sys *sys)
> +{
> + const struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> + __u32 addr = crsys->gpio_bar + CRVML_PANEL_PORT;
> + __u32 cur = inl(addr);
> +
> + if (!(cur & CRVML_BACKLIGHT_OFF)) {
> + cur |= CRVML_BACKLIGHT_OFF;
> + outl(cur, addr);
> + }
> +}
> 

Perhaps backlight_on/off and panel_on/off can be moved to the backlight
subsystem?

> +
> 
> +static int crvml_sys_restore(struct vml_sys *sys)
> +{
> + struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> + __u32 *clock_reg = (__u32 *) (crsys->mch_regs_base + CRVML_REG_CLOCK);

__u32 __iomem *clock_reg = crsys->mch_regs_base + CRVML_REG_CLOCK; (sparse)

> + __u32 cur = crsys->saved_panel_state;
> +
> + if (cur & CRVML_BACKLIGHT_OFF) {
> + crvml_backlight_off(sys);
> + } else {
> + crvml_backlight_on(sys);
> + }
> +
> + if (cur & CRVML_PANEL_ON) {
> + crvml_panel_on(sys);
> + } else {
> + crvml_panel_off(sys);
> + if (cur & CRVML_LVDS_ON) {
> + ;
> + /* Will not power up LVDS controller while panel is off */
> + }
> + }
> + iowrite32(crsys->saved_clock, clock_reg);
> + ioread32(clock_reg);
> +
> + return 0;
> +}
> +
> +static int crvml_sys_save(struct vml_sys *sys)
> +{
> + struct cr_sys *crsys = container_of(sys, struct cr_sys, sys);
> + __u32 *clock_reg = (__u32 *) (crsys->mch_regs_base + CRVML_REG_CLOCK);
> +

__u32 __iomem *clock_reg = crsys->mch_regs_base + CRVML_REG_CLOCK; (sparse)

> + crsys->saved_panel_state = inl(crsys->gpio_bar + CRVML_PANEL_PORT);
> + crsys->saved_clock = ioread32(clock_reg);
> +
> + return 0;
> +}
> +
> +static int crvml_nearest_index(const struct vml_sys *sys, int clock)
> +{
> +
> + int i;
> + int cur_index;
> + int cur_diff;
> + int diff;
> +
> + cur_index = 0;
> + cur_diff = clock - crvml_clocks[0];
> + cur_diff = (cur_diff < 0) ? -cur_diff : cur_diff;
> + for (i = 1; i < crvml_num_clocks; ++i) {
> + diff = clock - crvml_clocks[i];
> + diff = (diff < 0) ? -diff : diff;
> + if (diff < cur_diff) {
> + cur_index = i;
> + cur_diff = diff;
> + }
> + }
> + return cur_index;
> +}
> +
> +static int crvml_nearest_clock(const 

Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-05 Thread James Bottomley
On Thu, 2007-04-05 at 15:36 -0700, David Miller wrote:
> From: Andrew Burgess <[EMAIL PROTECTED]>
> Date: Thu, 5 Apr 2007 15:13:27 -0700
> 
> > David, do you see any other problems with scsi_send_eh_cmnd?
> > 
> > I've switched back to 2.6.18 which seems to not oops 
> > and am happy to try patches.
> 
> Does 2.6.20 with my patch OOPS too?  Does reverting my patch
> make the oops go away?
> 
> If reverting my patch makes the OOPS go away, we need to
> verify if page_address() is returning crap for some reason
> or the length is wrong.

Assuming this does turn out to be the problem, we should just junk the
page allocation ... it's completely unnecessary; when the slab allocated
commands were done, we made sure the actual sense_buffer is at the
correct location, so this should be the final fix:

James

diff --git a/drivers/scsi/scsi_error.c b/drivers/scsi/scsi_error.c
index adb40f2..997532b 100644
--- a/drivers/scsi/scsi_error.c
+++ b/drivers/scsi/scsi_error.c
@@ -18,12 +18,12 @@
 #include 
 #include 
 #include 
-#include 
 #include 
 #include 
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -641,16 +641,8 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
memcpy(scmd->cmnd, cmnd, cmnd_size);
 
if (copy_sense) {
-   gfp_t gfp_mask = GFP_ATOMIC;
-
-   if (shost->hostt->unchecked_isa_dma)
-   gfp_mask |= __GFP_DMA;
-
-   sgl.page = alloc_page(gfp_mask);
-   if (!sgl.page)
-   return FAILED;
-   sgl.offset = 0;
-   sgl.length = 252;
+   sg_init_one(&sgl, scmd->sense_buffer,
+   sizeof(scmd->sense_buffer));
 
scmd->sc_data_direction = DMA_FROM_DEVICE;
scmd->request_bufflen = sgl.length;
@@ -721,18 +713,6 @@ static int scsi_send_eh_cmnd(struct scsi_cmnd *scmd, unsigned char *cmnd,
 
 
/*
-* Last chance to have valid sense data.
-*/
-   if (copy_sense) {
-   if (!SCSI_SENSE_VALID(scmd)) {
-   memcpy(scmd->sense_buffer, page_address(sgl.page),
-  sizeof(scmd->sense_buffer));
-   }
-   __free_page(sgl.page);
-   }
-
-
-   /*
 * Restore original data
 */
scmd->request_buffer = old_buffer;




Re: Reiser4. BEST FILESYSTEM EVER? I need help.

2007-04-05 Thread johnrobertbanks
Hi Ignatich,

After seeing the following benchmarks at 

http://linuxhelp.150m.com/resources/fs-benchmarks.htm and
http://m.domaindlx.com/LinuxHelp/resources/fs-benchmarks.htm

The Reiser4 benchmarks are so good, I have decided to try the Reiser4
filesystem.

.-----------------------------.
| FILESYSTEM  | TIME  | DISK  |
| TYPE        | (secs)| USAGE |
.-----------------------------.
| REISER4 lzo |  1938 |  278  |
| REISER4 gzip|  2295 |  213  |
| REISER4     |  3462 |  692  |
| EXT2        |  4092 |  816  |
| JFS         |  4225 |  806  |
| EXT4        |  4408 |  816  |
| EXT3        |  4421 |  816  |
| XFS         |  4625 |  779  |
| REISER3     |  6178 |  793  |
| FAT32       | 12342 |  988  |
| NTFS-3g     | 10414 |  772  |
.-----------------------------.

The TIME column measures the time taken to complete the bonnie++
benchmarking test (run with the parameters bonnie++ -n128:128k:0).

The DISK USAGE column measures the disk space (in MB) used to store 655MB
of raw data (three different copies of the Linux kernel sources).

Anyway, I have patched the 2.6.20 kernel and have a partition formatted
with Reiser4.

However, I am having trouble getting LILO or GRUB working (with
Reiser4).

Could you guys who know all about this, help me, or point me to some
help.

Thanks a lot, John.


On Fri, 06 Apr 2007 02:42:35 +0400, "Ignatich" <[EMAIL PROTECTED]>
said:
> While trying to find the cause of problems with reiser4 in recent 
> kernels I came across this.
> 
> Incomplete write handling seems to be missing from reiser4_write_extent() 
> thanks to reiser4-temp-fix.patch. Strangely, there is a patch by Edward 
> Shishkin that should address that issue, but it is missing from -mm 
> tree. Please check.
> 
> Max
> 
-- 
  
  [EMAIL PROTECTED]




Re: 2.6.21-rc5 possible regression: KDE processes die silently (was: 2.6.21-rc3-mm2: KDE processes die while system is idle)

2007-04-05 Thread Rafael J. Wysocki
On Tuesday, 3 April 2007 01:06, Adrian Bunk wrote:
> On Sun, Apr 01, 2007 at 06:48:03PM +0200, Rafael J. Wysocki wrote:
> > On Sunday, 1 April 2007 17:21, Tilman Schmidt wrote:
> > > I'm sorry to say this has now happened with kernel 2.6.21-rc5, too.
> > > I started a kernel compilation in the evening and came back in the
> > > morning to find all KDE decorations gone. All processes normally
> > > running for a KDE session and labelled "[kinit]" in ps were gone
> > > but everything else was running fine, and the system was still
> > > usable via ssh. /var/log/kdm.log and /var/log/Xorg.0.log contained
> > > nothing remotely suspicious. /var/log/messages had two lines I
> > > never saw before:
> > > 
> > > Mar 31 02:27:36 gx110 kernel: [153577.891443] ReiserFS: hda3: warning: vs-8115: get_num_ver: not directory or indirect item
> > > Mar 31 02:27:36 gx110 kernel: [153577.891559] ReiserFS: hda3: warning: vs-8115: get_num_ver: not directory or indirect item
> > > 
> > > But those didn't appear on previous occurrences of the "dying KDE"
> > > problem so I guess they are not related.
> > > 
> > > This is SUSE LINUX 10.0 (i586) running on a Dell OptiPlex GX110
> > > (Intel P3, 933 MHz, i810 chipset, 512 MB RAM, 60 GB ATA disk)
> > > % uname -a
> > > Linux gx110 2.6.21-rc5-noinitrd #1 PREEMPT Sat Mar 31 02:15:19 CEST 2007 i686 i686 i386 GNU/Linux
> > > % cat /proc/cmdline
> > > root=/dev/hda3 selinux=0 x11i=vesa video=intelfb:[EMAIL PROTECTED] nmi_watchdog=2 lapic 5
> > > Kernel configuration mostly-modular, based on standard SuSE kernel's
> > > /proc/config.gz, just compiling into the kernel everything I need to
> > > boot without an initrd and omitting some parts I'm not interested in.
> > > (.config attached.) What else might be relevant?
> > > 
> > > Again, this is a Heisenbug, i.e. it's not reproducible and invariably
> > > happens when I'm away from the machine. (Probably Murphy at work.)
> > > It's pretty rare: I have seen it four times on 2.6.21-rc3-mm2 and
> > > once on 2.6.21-rc5, on a machine which spends about equal amounts
> > > of time running the latest stable, rc, and mm kernels. OTOH, so far
> > > it hasn't ever happened with any 2.6.20 or earlier kernel. Nor have
> > > I seen it with 2.6.21-rc[1-4] or 2.6.21-rc4-mm* - but for the -rc4
> > > and -rc4-mm releases that's not conclusive as those have only been
> > > running for a very short time.
> > 
> > I have a similar problem on x86_64 OpenSUSE 10.2, but it seems to happen
> > when a sound (eg. notification) is played while the display is suspended
> > (or "powered off").
> 
> Is it easily reproducible and still present with the latest -git?
> If yes, can you bisect?
> 
> > IMO it's a SUSE bug.
> 
> We also have a report of KDE crashes on Debian [1].
> And just a few days ago a kernel bug kwin ran into was fixed [2].
> 
> If the pattern is "works with 2.6.20 but does not work with 2.6.21-rc",
> then it's most likely a kernel regression.

Well, I'm not able to reproduce it with the current mainline, so let's hope
it's been fixed. :-)

Greetings,
Rafael


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Andi Kleen wrote:
> On Friday 06 April 2007 01:29:56 Zachary Amsden wrote:
>> I noticed this never got applied.  There was some feedback which I did 
>> not include in this patch because I think it is inappropriate to touch 
>> code outside vmi.c at this point for 2.6.21. 
>
> I think it is. That is why I didn't apply it.

Well at this point, the "proper" fix is dependent on Jeremy's 
kmap_atomic_pte changes, which are definitely too late to pull into 
2.6.21.  Can we just apply this patch please?


Thanks,

Zach


2.6.21-rc5-mm4 initramfs Make Error

2007-04-05 Thread Zan Lynx
I built a version of 2.6.21-rc5-mm4 with an initramfs and it built OK
the first time.

Then I made changes (applied a Reiser4 patch) and rebuilt, and got the
following error:

zephyr linux # make
  CHK     include/linux/version.h
  CHK     include/linux/utsrelease.h
  CALL    scripts/checksyscalls.sh
<stdin>:1356:2: warning: #warning syscall getcpu not implemented
<stdin>:1360:2: warning: #warning syscall epoll_pwait not implemented
<stdin>:1364:2: warning: #warning syscall lutimesat not implemented
<stdin>:1380:2: warning: #warning syscall revokeat not implemented
<stdin>:1384:2: warning: #warning syscall frevoke not implemented
  CHK     include/linux/compile.h
/usr/src/linux-2.6.21-rc5-mm4/usr/Makefile:41: *** target pattern contains no `%'.  Stop.
make: *** [usr] Error 2

I have this in the config:
CONFIG_INITRAMFS_SOURCE="/initramfs"

/initramfs is the directory where I build my initramfs, which is just a
busybox setup, very simple.

# rm usr/.initramfs_data.*
seems to make it go again.
-- 
Zan Lynx <[EMAIL PROTECTED]>




Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 16:34:43 -0700
Zachary Amsden <[EMAIL PROTECTED]> wrote:

> > I noticed this never got applied.  There was some feedback which I did 
> > not include in this patch because I think it is inappropriate to touch 
> > code outside vmi.c at this point for 2.6.21.  Please apply; this patch 
> > is needed as a bugfix in 2.6.21.  An updated version for 2.6.22 will 
> > come later which has a nicer interface.
> >

There was a big foodfight last time you sent this out and I'd assumed that
there were still unresolved issues.  Or at least a general aura of
unhappiness.

I guess we merge it now, then (forget to) fix up those issues later on.

> Erm, stale patch, sorry.  This one instead.

yeah, that's the patch which has been in -mm for a week.


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Andi Kleen
On Friday 06 April 2007 01:29:56 Zachary Amsden wrote:
> I noticed this never got applied.  There was some feedback which I did 
> not include in this patch because I think it is inappropriate to touch 
> code outside vmi.c at this point for 2.6.21. 

I think it is. That is why I didn't apply it.

-Andi


Re: [PATCH 1/2] Make page->private usable in compound pages V1

2007-04-05 Thread Andrew Morton
On Thu,  5 Apr 2007 15:36:51 -0700 (PDT)
Christoph Lameter <[EMAIL PROTECTED]> wrote:

> If we add a new flag so that we can distinguish between the
> first page and the tail pages, then we can avoid using page->private
> in the first page. page->private == page for the first page, so there
> is no real information in there.
> 
> Freeing up page->private makes the use of compound pages more transparent.
> They become more usable, like real pages. Right now we have to be careful, e.g.
> if we are going beyond PAGE_SIZE allocations in the slab on i386 because we
> can then no longer use the private field. This is one of the issues that
> cause us not to support debugging for page size slabs in SLAB.
> 
> Having page->private available for SLUB would allow more meta information
> in the page struct. I can probably avoid the 16 bit ints that I have in
> there right now.
> 
> Also if page->private is available then a compound page may be equipped
> with buffer heads. This may free up the way for filesystems to support
> larger blocks than page size.
> 
> We add PageTail as an alias of PageReclaim. Compound pages cannot
> currently be reclaimed. Because of the alias one needs to check
> PageCompound first.

So slub is using compound pages so that it can locate the head page in
higher-order pages, whereas slab uses per-object (or per-order-0-page?)
metadata for that?

I see four instances of

+   page = virt_to_page(p);
+
+   if (unlikely(PageCompound(page)))
+   page = page->first_page;

A new virt_to_head_page() is needed.
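A userspace model of the helper Andrew asks for, assuming the PageTail flag from Christoph's patch: the tail bit aliases another flag, so it only counts once PageCompound() is true, and only tail pages carry a pointer to the head. The page array, the flag values, `make_compound()`, and the address-to-page mapping are hypothetical; only the shape of `virt_to_head_page()` is the point.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SHIFT 12
#define NR_PAGES   16

/* Model-only flag bits. */
#define PG_COMPOUND (1u << 0)
#define PG_TAIL     (1u << 1) /* aliases PG_reclaim in the patch */

struct page {
	unsigned int flags;
	struct page *first_page; /* set only on tail pages in this model */
};

static struct page mem_map[NR_PAGES];

/* Model of virt_to_page(): "physical memory" starts at address 0. */
static struct page *virt_to_page(unsigned long addr)
{
	assert((addr >> PAGE_SHIFT) < NR_PAGES);
	return &mem_map[addr >> PAGE_SHIFT];
}

static int PageCompound(const struct page *page)
{
	return (page->flags & PG_COMPOUND) != 0;
}

/* In the patch the tail bit aliases PG_reclaim, so test PageCompound first. */
static int PageTail(const struct page *page)
{
	return PageCompound(page) && (page->flags & PG_TAIL) != 0;
}

/* Replaces the open-coded virt_to_page() + first_page pattern. */
static struct page *virt_to_head_page(unsigned long addr)
{
	struct page *page = virt_to_page(addr);

	if (PageTail(page))
		page = page->first_page;
	return page;
}

/* Model helper: turn pages [start, start + nr) into one compound page. */
static void make_compound(size_t start, size_t nr)
{
	assert(start + nr <= NR_PAGES);
	for (size_t i = start; i < start + nr; i++) {
		mem_map[i].flags = PG_COMPOUND | (i == start ? 0u : PG_TAIL);
		mem_map[i].first_page = (i == start) ? NULL : &mem_map[start];
	}
}
```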


Sigh.  We're seeing rather a lot of churn to accommodate slub.  Do we
actually have any justification for all this?  If we end up deciding to
merge slub and to deprecate then remove slab, what would our reasons have
been?


Re: [PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden

Zachary Amsden wrote:
> I noticed this never got applied.  There was some feedback which I did 
> not include in this patch because I think it is inappropriate to touch 
> code outside vmi.c at this point for 2.6.21.  Please apply; this patch 
> is needed as a bugfix in 2.6.21.  An updated version for 2.6.22 will 
> come later which has a nicer interface.

Erm, stale patch, sorry.  This one instead.


Critical bugfix; when using software RAID, potentially USB or AIO in
highmem configurations, drivers are allowed to use kmap_atomic from
interrupt context.  This is incompatible with the current implementation
of lazy MMU mode, and means the kmap will silently fail, causing either
memory corruption or kernel panics.

The fix is to disable interrupts on the CPU when entering a lazy MMU
state; this is totally safe, as preemption is already disabled, and
lazy update state can neither be nested nor overlapping.  Thus per-cpu
variables to track the state and flags can be used to disable interrupts
during this critical region.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

diff -r eee69881b2f9 arch/i386/kernel/vmi.c
--- a/arch/i386/kernel/vmi.c	Thu Apr 05 16:20:18 2007 -0700
+++ b/arch/i386/kernel/vmi.c	Thu Apr 05 16:31:12 2007 -0700
@@ -69,6 +69,7 @@ static struct {
void (*flush_tlb)(int);
void (*set_initial_ap_state)(int, int);
void (*halt)(void);
+   void (*set_lazy_mode)(int mode);
 } vmi_ops;
 
 /*
@@ -545,6 +546,31 @@ vmi_startup_ipi_hook(int phys_apicid, un
 }
 #endif
 
+static void vmi_set_lazy_mode(int new_mode)
+{
+   static DEFINE_PER_CPU(int, mode);
+   static DEFINE_PER_CPU(unsigned long, flags);
+   int cpu = smp_processor_id();
+
+   if (!vmi_ops.set_lazy_mode)
+   return;
+
+   /*
+* Modes do not nest or overlap, so we can simply disable
+* irqs when entering a mode and re-enable when leaving.
+*/
+   BUG_ON(per_cpu(mode, cpu) && new_mode);
+   BUG_ON(!new_mode && !per_cpu(mode, cpu));
+   
+   if (new_mode)
+   local_irq_save(per_cpu(flags, cpu));
+   else
+   local_irq_restore(per_cpu(flags, cpu));
+
+   vmi_ops.set_lazy_mode(new_mode);
+   per_cpu(mode, cpu) = new_mode;
+}
+
 static inline int __init check_vmi_rom(struct vrom_header *rom)
 {
struct pci_header *pci;
@@ -769,7 +795,7 @@ static inline int __init activate_vmi(vo
para_wrap(load_esp0, vmi_load_esp0, set_kernel_stack, UpdateKernelStack);
para_fill(set_iopl_mask, SetIOPLMask);
para_fill(io_delay, IODelay);
-   para_fill(set_lazy_mode, SetLazyMode);
+   para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);
 
/* user and kernel flush are just handled with different flags to FlushTLB */
para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);
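The locking idea in the patch description (per-CPU mode and saved flags, interrupts off for the whole lazy region, no nesting) can be modelled in userspace as below. The arrays stand in for the DEFINE_PER_CPU variables and a simulated interrupt flag stands in for local_irq_save()/local_irq_restore(); `model_set_lazy_mode()` and the other names are hypothetical, and the hypervisor call itself is omitted.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

/* Simulated per-CPU interrupt-enable state: 1 = enabled. */
static int irqs_on[NR_CPUS_MODEL] = { 1, 1, 1, 1 };

/* Stand-ins for the patch's per-CPU 'mode' and 'flags' variables. */
static int lazy_mode[NR_CPUS_MODEL];
static int saved_irqs[NR_CPUS_MODEL];

static void model_set_lazy_mode(int cpu, int new_mode)
{
	/* Same invariants as the BUG_ON()s: no nesting, no double exit. */
	assert(!(lazy_mode[cpu] && new_mode));
	assert(new_mode || lazy_mode[cpu]);

	if (new_mode) {
		saved_irqs[cpu] = irqs_on[cpu]; /* like local_irq_save() */
		irqs_on[cpu] = 0;
	} else {
		irqs_on[cpu] = saved_irqs[cpu]; /* like local_irq_restore() */
	}
	lazy_mode[cpu] = new_mode;
}
```

Because the saved state is restored rather than interrupts being unconditionally re-enabled, a lazy region entered with interrupts already off leaves them off, which is presumably why the patch uses the save/restore pair rather than a plain disable/enable.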


[PATCH] Bugfix for VMI paravirt ops

2007-04-05 Thread Zachary Amsden
I noticed this never got applied.  There was some feedback which I did 
not include in this patch because I think it is inappropriate to touch 
code outside vmi.c at this point for 2.6.21.  Please apply; this patch 
is needed as a bugfix in 2.6.21.  An updated version for 2.6.22 will 
come later which has a nicer interface.


Zach
Critical bugfix; when using software RAID, potentially USB or AIO in
highmem configurations, drivers are allowed to use kmap_atomic from
interrupt context.  This is incompatible with the current implementation
of lazy MMU mode, and means the kmap will silently fail, causing either
memory corruption or kernel panics.

The fix is to disable interrupts on the CPU when entering a lazy MMU
state; this is totally safe, as preemption is already disabled, and
lazy update state can neither be nested nor overlapping.  Thus per-cpu
variables to track the state and flags can be used to disable interrupts
during this critical region.

Signed-off-by: Zachary Amsden <[EMAIL PROTECTED]>

Index: ubuntu-2.6.20/arch/i386/kernel/vmi.c
===
--- ubuntu-2.6.20.orig/arch/i386/kernel/vmi.c	2007-03-29 21:17:47.0 -0700
+++ ubuntu-2.6.20/arch/i386/kernel/vmi.c	2007-03-30 00:01:20.0 -0700
@@ -69,6 +69,7 @@
void (fastcall *flush_tlb)(int);
void (fastcall *set_initial_ap_state)(int, int);
void (fastcall *halt)(void);
+   void (fastcall *set_lazy_mode)(int mode);
 } vmi_ops;
  
 /* XXX move this to alternative.h */
@@ -577,6 +578,31 @@
 }
 #endif
 
+static void vmi_set_lazy_mode(int new_mode)
+{
+   static DEFINE_PER_CPU(int, mode);
+   static DEFINE_PER_CPU(unsigned long, flags);
+   int cpu = smp_processor_id();
+
+   if (!vmi_ops.set_lazy_mode)
+   return;
+
+   /*
+* Modes do not nest or overlap, so we can simply disable
+* irqs when entering a mode and re-enable when leaving.
+*/
+   BUG_ON(per_cpu(mode, cpu) && new_mode);
+   BUG_ON(!new_mode && !per_cpu(mode, cpu));
+   
+   if (new_mode)
+   local_irq_save(per_cpu(flags, cpu));
+   else
+   local_irq_restore(per_cpu(flags, cpu));
+
+   vmi_ops.set_lazy_mode(new_mode);
+   per_cpu(mode, cpu) = new_mode;
+}
+
 static inline int __init check_vmi_rom(struct vrom_header *rom)
 {
struct pci_header *pci;
@@ -806,7 +832,7 @@
para_wrap(load_esp0, vmi_load_esp0, set_kernel_stack, UpdateKernelStack);
para_fill(set_iopl_mask, SetIOPLMask);
para_fill(io_delay, IODelay);
-   para_fill(set_lazy_mode, SetLazyMode);
+   para_wrap(set_lazy_mode, vmi_set_lazy_mode, set_lazy_mode, SetLazyMode);
 
/* user and kernel flush are just handled with different flags to FlushTLB */
para_wrap(flush_tlb_user, vmi_flush_tlb_user, flush_tlb, FlushTLB);


Re: [PATCH 12/12] mm: per BDI congestion feedback

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:21 +0200
[EMAIL PROTECTED] wrote:

> Now that we have per BDI dirty throttling it makes sense to also have per BDI
> congestion feedback; why wait on another device if the current one is not
> congested?

Similar comments apply.  congestion_wait() should be called
throttle_at_a_rate_proportional_to_the_speed_of_presently_uncongested_queues().

If a process is throttled in the page allocator waiting for pages to become
reclaimable, that process absolutely does not care whether those pages were
previously dirty against /dev/sda or against /dev/sdb.  It wants to be woken
up for writeout completion against any queue.


-   wbc.encountered_congestion = 0;
+   wbc.encountered_congestion = NULL;
wbc.nr_to_write = MAX_WRITEBACK_PAGES;
wbc.pages_skipped = 0;
writeback_inodes(&wbc);
min_pages -= MAX_WRITEBACK_PAGES - wbc.nr_to_write;
if (wbc.nr_to_write > 0 || wbc.pages_skipped > 0) {
/* Wrote less than expected */
-   congestion_wait(WRITE, HZ/10);
-   if (!wbc.encountered_congestion)
+   if (wbc.encountered_congestion)
+   congestion_wait(wbc.encountered_congestion,
+   WRITE, HZ/10);
+   else

Well that confused me.  You'd be needing to rename
wbc.encountered_congestion to congested_bdi or something.



Re: [PATCH 11/12] mm: accurate pageout congestion wait

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:20 +0200
[EMAIL PROTECTED] wrote:

> Only do the congestion wait when we actually encountered congestion.

The name congestion_wait() was accurate back in 2002, but it isn't accurate
any more, and you got misled.  It does not only wait for a queue to become
uncongested.

See clear_bdi_congested()'s callers.  As long as the queue is in an
uncongested state, we deliver wakeups to congestion_wait() blockers on
every IO completion.  As I said before, it is so that the MM's polling
operations poll at a higher frequency when the IO system is working faster.
(It is also to synchronise with end_page_writeback()'s feeding of clean
pages to us via rotate_reclaimable_page()).



Page reclaim can get into trouble without any request queue having entered
a congested state.  For example, think about a machine which has a single
disk, and the operator has increased that disk's request queue size to
100,000.  With your patch all the VM's throttling would be bypassed and we
go into a busy loop and declare OOM instantly.

There are probably other situations in which page reclaim gets into trouble
without a request queue being congested.

Minor point: bdi_congested() can be arbitrarily expensive - for DM stackups
it is roughly proportional to the number of subdevices in the device.  We
need to be careful about how frequently we call it.



Re: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support

2007-04-05 Thread David Miller
From: "Luck, Tony" <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 15:50:02 -0700

> Maybe a granule is not the right unit of allocation ... perhaps 4M
> would work better (4M/56 ~= 75000 pages ~= 1.1G)?  But if this is
> too small, then a hard-coded 16M would be better than a granule,
> because 64M is (IMHO) too big.

A 4MB chunk of page structs covers about 512MB of ram (I'm rounding up
to 64-bytes in my calculations and using an 8K page size, sorry :-).
So I think that is too small although on the sparc64 side that is the
biggest I have available on most processor models.

But I do agree that 64MB is way too big and 16MB is a good compromise
chunk size for this stuff.  That covers about 2GB of ram with the
above parameters, which should be about right.
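The coverage figures in this exchange follow from one formula: RAM covered = (chunk size / sizeof(struct page)) * page size. A small check of the numbers, using David's stated assumptions (64-byte page structs, 8K pages) and, for Tony's 56-byte figure, an assumed 16K page size (his message does not state one). `ram_covered()` is a name made up for this sketch.

```c
#include <assert.h>
#include <stdint.h>

#define MB (1ULL << 20)
#define GB (1ULL << 30)

/* Bytes of RAM described by one chunk of the virtual mem_map. */
static uint64_t ram_covered(uint64_t chunk_bytes, uint64_t page_struct_bytes,
			    uint64_t page_size)
{
	assert(page_struct_bytes != 0);
	/* page structs that fit in the chunk, times the RAM each describes */
	return (chunk_bytes / page_struct_bytes) * page_size;
}
```

Under these assumptions a 4MB chunk covers 512MB and a 16MB chunk covers 2GB, as David says; a 64MB chunk would cover 8GB. Tony's case comes to roughly 1170MB, about 1.14GB, in line with his "~= 1.1G".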


Re: Questions about porting perfmon2 to powerpc

2007-04-05 Thread Benjamin Herrenschmidt
On Thu, 2007-04-05 at 14:55 -0500, Kevin Corry wrote:
> Hello,
> 
> Carl Love and I have been working on getting the latest perfmon2 patches 
> (http://perfmon2.sourceforge.net/) working on Cell, and on powerpc in 
> general. We've come up with some powerpc-specific questions and we're hoping 
> to get some opinions from the powerpc kernel developers.
> 
> First, the stock 2.6.20 kernel has a prototype in include/linux/smp.h for a 
> function called smp_call_function_single(). However, this routine is only 
> implemented on i386, x86_64, ia64, and mips. Perfmon2 apparently needs to 
> call this to run a function on a specific CPU. Powerpc provides an 
> smp_call_function() routine to run a function on all active CPUs, so I used 
> that as a basis to add an smp_call_function_single() routine. I've included 
> the patch below and was wondering if it looked like a sane approach.

We should do better... it will require some backend work for the various
supported PICs though. I've always wanted to look into doing a
smp_call_function_cpumask in fact :-)
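One way to build smp_call_function_single() on top of a broadcast primitive, roughly the approach Kevin describes: broadcast a small wrapper and let only the target CPU run the real function, handling the calling CPU locally because smp_call_function() runs the function on all *other* CPUs. This is a single-threaded userspace model; the `model_*` functions, `this_cpu`, `struct single_call`, and the `mark_cpu()` demo callback are hypothetical, and the real kernel functions take additional arguments (such as a wait flag).

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS_MODEL 4

static int this_cpu; /* CPU the model is "currently" running on */

/* Model of smp_call_function(): run fn on every CPU except the caller. */
static void model_smp_call_function(void (*fn)(void *), void *arg)
{
	int caller = this_cpu;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++) {
		if (cpu == caller)
			continue;
		this_cpu = cpu; /* pretend we are now executing on 'cpu' */
		fn(arg);
	}
	this_cpu = caller;
}

struct single_call {
	int target;
	void (*func)(void *info);
	void *info;
};

/* Broadcast to everyone; only the target CPU runs the real function. */
static void single_call_filter(void *arg)
{
	struct single_call *sc = arg;

	if (this_cpu == sc->target)
		sc->func(sc->info);
}

static int model_smp_call_function_single(int cpu, void (*func)(void *),
					  void *info)
{
	struct single_call sc = { cpu, func, info };

	assert(cpu >= 0 && cpu < NR_CPUS_MODEL);
	if (cpu == this_cpu) {
		func(info); /* the broadcast skips the caller, so run locally */
		return 0;
	}
	model_smp_call_function(single_call_filter, &sc);
	return 0;
}

/* Demo callback: count which CPU it ran on. */
static int ran_on[NR_CPUS_MODEL];

static void mark_cpu(void *info)
{
	(void)info;
	ran_on[this_cpu]++;
}
```

The broadcast still interrupts every other CPU just to reach one, which is presumably why Ben would rather see a targeted backend (or a cpumask variant) in the PIC code.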

> Next, we ran into a problem related to Perfmon2 initialization and sysfs. The 
> problem turned out to be that the powerpc version of topology_init() is 
> defined as an __initcall() routine, but Perfmon2's initialization is done as 
> a subsys_initcall() routine. Thus, Perfmon2 tries to initialize its sysfs 
> information before some of the powerpc cpu information has been initialized. 
> However, on all other architectures, topology_init() is defined as a 
> subsys_initcall() routine, so this problem was not seen on any other 
> platforms. Changing the powerpc version of topology_init() to a 
> subsys_initcall() seems to have fixed the bug. However, I'm not sure if that 
> is going to cause problems elsewhere in the powerpc code. I've included the 
> patch below (after the smp-call-function-single patch). Does anyone know if 
> this change is safe, or if there was a specific reason that topology_init() 
> was left as an __initcall() on powerpc?

It would make sense to follow what other archs do. Note that if both
perfmon and topology_init are subsys_initcall, that is, on the same
level, it's still a bit hairy to expect one to be called before the
other, since within a level the order only depends on link order...

Ben.
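Ben's point about same-level ordering can be illustrated with a small userland model of initcall levels. This is a sketch under stated assumptions, not kernel code: the level numbers follow include/linux/init.h of this era (subsys_initcall at level 4, __initcall() expanding to device_initcall() at level 6), while simulate_initcalls() and run_position() are hypothetical helpers, and array order stands in for link order.

```c
#include <assert.h>
#include <string.h>

/* Initcall levels as ordered in include/linux/init.h of this era;
 * __initcall(fn) expands to device_initcall(fn), i.e. level 6. */
enum initcall_level {
	LEVEL_SUBSYS = 4,	/* subsys_initcall() */
	LEVEL_DEVICE = 6,	/* device_initcall() / __initcall() */
};

/* One registered initcall; array order stands in for link order. */
struct initcall {
	enum initcall_level level;
	const char *name;
};

#define MAX_CALLS 16
static const char *run_order[MAX_CALLS];
static int nr_run;

/* Run levels in ascending order; within a level, keep array (link) order. */
static void simulate_initcalls(const struct initcall *calls, int n)
{
	assert(n <= MAX_CALLS);
	nr_run = 0;
	for (int level = 0; level <= 7; level++)
		for (int i = 0; i < n; i++)
			if ((int)calls[i].level == level)
				run_order[nr_run++] = calls[i].name;
}

/* Position of a call in the simulated boot, or -1 if it never ran. */
static int run_position(const char *name)
{
	for (int i = 0; i < nr_run; i++)
		if (strcmp(run_order[i], name) == 0)
			return i;
	return -1;
}
```

With both calls at the subsys level, only the order of the entries (link order in a real kernel) decides which one runs first, which is why relying on it is fragile.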




Re: [PATCH v3] Stop pmac_zilog from abusing 8250's device numbers; optionally.

2007-04-05 Thread David Woodhouse
On Fri, 2007-04-06 at 08:53 +1000, Paul Mackerras wrote:
> Why would the numbers be prone to change, any more than they are
> already?

Because now 8250 ports can actually coexist with Zilog ports. Before my
fix, it was strictly one or the other.

-- 
dwmw2



Re: [PATCH 10/12] mm: page_alloc_wait

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:19 +0200
[EMAIL PROTECTED] wrote:

> Introduce a mechanism to wait on free memory.
> 
> Currently congestion_wait() is abused to do this.

Such a very small explanation for such a terrifying change.

> ...
>
> --- linux-2.6-mm.orig/mm/vmscan.c 2007-04-05 16:29:46.0 +0200
> +++ linux-2.6-mm/mm/vmscan.c  2007-04-05 16:29:49.0 +0200
> @@ -1436,6 +1436,7 @@ static int kswapd(void *p)
>   finish_wait(&pgdat->kswapd_wait, &wait);
>  
>   balance_pgdat(pgdat, order);
> + page_alloc_ok();
>   }
>   return 0;
>  }

For a start, we don't know that kswapd freed pages which are in a suitable
zone.  And we don't know that kswapd freed pages which are in a suitable
cpuset.

congestion_wait() is similarly ignorant of the suitability of the pages,
but the whole idea behind congestion_wait is that it will throttle page
allocators to some speed which is proportional to the speed at which the IO
systems can retire writes - view it as a variable-speed polling operation,
in which the polling frequency goes up when the IO system gets faster. 
This patch changes that philosophy fundamentally.  That's worth more than a
2-line changelog.

Also, there might be situations in which kswapd gets stuck in some dark
corner.  Perhaps the process which is waiting in the page allocator holds
filesystem locks which kswapd is blocked on.  Or kswapd might be blocked on
a particular request queue, or a dead NFS server or something.  The timeout
will save us, but things will be slow.

There could be other problems too, dunno - this stuff is tricky.  Why are
you changing it, what problems are being solved, etc?
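Andrew's two concerns, that a wakeup says nothing about which zone or cpuset the freed pages belong to, and that the timeout is the only backstop when kswapd is stuck, can be seen in a userland model of a "wait for free memory" queue. The patch's page_alloc_wait()/page_alloc_ok() bodies are not shown here, so everything below (struct mem_waitq, the generation counter, mem_waitq_wait()) is a hypothetical sketch using POSIX threads, not the patch's implementation.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical wait queue: allocators sleep until a reclaimer reports
 * progress, or until a timeout expires. */
struct mem_waitq {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	unsigned long generation;	/* bumped on every "pages were freed" event */
};

static void mem_waitq_init(struct mem_waitq *q)
{
	pthread_mutex_init(&q->lock, NULL);
	pthread_cond_init(&q->cond, NULL);
	q->generation = 0;
}

/* Allocator side: remember the generation before checking free memory. */
static unsigned long mem_waitq_snapshot(struct mem_waitq *q)
{
	unsigned long gen;

	pthread_mutex_lock(&q->lock);
	gen = q->generation;
	pthread_mutex_unlock(&q->lock);
	return gen;
}

/* Reclaimer side: some pages were freed, zone and cpuset unknown. */
static void mem_waitq_wake(struct mem_waitq *q)
{
	pthread_mutex_lock(&q->lock);
	q->generation++;
	pthread_cond_broadcast(&q->cond);
	pthread_mutex_unlock(&q->lock);
}

/* Returns 1 if a reclaim event happened since 'seen', 0 on timeout. */
static int mem_waitq_wait(struct mem_waitq *q, unsigned long seen, long timeout_ms)
{
	struct timespec deadline;
	int woken;

	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&q->lock);
	while (q->generation == seen)	/* loop also absorbs spurious wakeups */
		if (pthread_cond_timedwait(&q->cond, &q->lock, &deadline) == ETIMEDOUT)
			break;
	woken = (q->generation != seen);
	pthread_mutex_unlock(&q->lock);
	return woken;
}
```

A return value of 1 only means some reclaim happened; the caller still has to retry its allocation and may go back to sleep, and if the reclaimer is blocked every waiter pays the full timeout.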



Re: [PATCH v3] Stop pmac_zilog from abusing 8250's device numbers; optionally.

2007-04-05 Thread Paul Mackerras
David Woodhouse writes:

> Of course, the _numbers_ might change -- a given port might no longer be
> ttyS0 but ttyS1. But we're happy to overlook that one even though the
> effect on the user is identical, right?

Why would the numbers be prone to change, any more than they are
already?


Re: [OT] the shortest thread of LKML !

2007-04-05 Thread Bill Davidsen

Willy Tarreau wrote:

On Wed, Mar 28, 2007 at 01:02:10PM -0700, David Miller wrote:

Please nobody reply to his posting, I'm shit-canning this thread from
the start as it's nothing but flame fodder.


He forgot the most important thing: there are *many* "benevolent dictators",
all with their own domain of excellence ;-)

Good catch, David, you're like a spider on a web waiting for the naive
intruder !


Posted several days too early for April Fool...

--
Bill Davidsen <[EMAIL PROTECTED]>
  "We have more to fear from the bungling of the incompetent than from
the machinations of the wicked."  - from Slashdot


[PATCH] leave loglevel at 7 through sysrq output so you can actually read it

2007-04-05 Thread Martin Bligh

We carefully set loglevel to 7 and print the sysrq message saying
which action we're performing, but we can't actually see the
handler's output, because the loglevel is restored before the
handler is called rather than after.

Move the assignment down one line.

Signed-off-by: Martin J. Bligh <[EMAIL PROTECTED]>
diff -aurpN -X /home/mbligh/.diff.exclude linux-2.6.21-rc5-git10/drivers/char/sysrq.c linux-2.6.21-rc5-git10-loglevel/drivers/char/sysrq.c
--- linux-2.6.21-rc5-git10/drivers/char/sysrq.c 2007-04-03 11:23:54.0 -0700
+++ linux-2.6.21-rc5-git10-loglevel/drivers/char/sysrq.c 2007-04-05 15:49:40.0 -0700
@@ -421,8 +421,8 @@ void __handle_sysrq(int key, struct tty_
 */
if (!check_mask || sysrq_on_mask(op_p->enable_mask)) {
printk("%s\n", op_p->action_msg);
-   console_loglevel = orig_log_level;
op_p->handler(key, tty);
+   console_loglevel = orig_log_level;
} else {
printk("This sysrq operation is disabled.\n");
}
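The ordering bug fixed above can be reproduced with a small userland model of console log-level filtering, in which a message is shown only if its level is numerically below console_loglevel. The quiet default of 3, the handler and its messages are hypothetical; only the set/restore ordering mirrors __handle_sysrq().

```c
#include <assert.h>
#include <stdio.h>

/* Userland model of printk console filtering: a message reaches the
 * console only if its level is numerically lower than console_loglevel. */
static int console_loglevel = 3;	/* hypothetical quiet setting */
static int lines_shown;

static void model_printk(int level, const char *msg)
{
	if (level < console_loglevel) {
		puts(msg);
		lines_shown++;
	}
}

/* Hypothetical handler that reports at KERN_INFO (level 6). */
static void example_handler(void)
{
	model_printk(6, "handler output");
}

/* restore_after_handler != 0 models the patched ordering. */
static void model_handle_sysrq(int restore_after_handler)
{
	int orig_log_level = console_loglevel;

	console_loglevel = 7;
	model_printk(6, "SysRq : action message");
	if (!restore_after_handler)
		console_loglevel = orig_log_level;	/* old order: too early */
	example_handler();
	if (restore_after_handler)
		console_loglevel = orig_log_level;	/* patched order */
}
```

With the old order only the action message is printed; with the patched order the handler's output is visible too, and the original level is still restored afterwards.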


RE: [PATCH 4/4] IA64: SPARSE_VIRTUAL 16M page size support

2007-04-05 Thread Luck, Tony
> This implements granule page sized vmemmap support for IA64.

Christoph,

Your calculations here are all based on a granule size of 16M, but
it is possible to configure 64M granules.

With current sizeof(struct page) == 56, a 16M page will hold enough
page structures for about 4.5G of physical space (assuming 16K pages),
so a 64M page would cover 18G.

4.5G is possibly a bit wasteful (for a system with only a handful
of GBytes per node, and nodes that are not physically contiguous).
18G is definitely going to result in lots of wasted page structs
(that refer to non-existent physical memory around the edges of
each node).

Maybe a granule is not the right unit of allocation ... perhaps 4M
would work better (4M/56 ~= 75000 pages ~= 1.1G)?  But if this is
too small, then a hard-coded 16M would be better than a granule,
because 64M is (IMHO) too big.

-Tony

P.S. This patch breaks the build for tiger_defconfig, zx1_defconfig
etc.  But you may have hit on the "grand-unified theory" of mem_map
management ... so if the benchmarks come in favourably we could
drop all the other CONFIG options.
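The sizes quoted in this thread follow from one formula: a vmemmap chunk holds chunk / sizeof(struct page) entries, each describing one page. A minimal sketch, assuming the 56-byte struct page and 16K pages from Tony's mail, plus the 64-byte and 8K parameters from David Miller's sparc64 message at the top of this excerpt; the helper name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define MiB (1ULL << 20)
#define GiB (1ULL << 30)

/* Physical memory described by one vmemmap chunk: the chunk holds
 * chunk / sizeof(struct page) entries, each describing one page. */
static uint64_t vmemmap_coverage(uint64_t chunk_bytes, uint64_t struct_page_bytes,
				 uint64_t page_bytes)
{
	assert(struct_page_bytes != 0);
	return (chunk_bytes / struct_page_bytes) * page_bytes;
}
```

For example, 16M / 56 gives 299,593 entries, which at 16K each describes roughly 4.57 GiB; since coverage scales linearly with the chunk, a 64M granule covers four times as much (about 18.3 GiB) and 4M about 1.14 GiB.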


[PATCH RFC] x86: clear X86_FEATURE_MWAIT for AMD Fam10 CPU

2007-04-05 Thread Andreas Herrmann
Hi,

I'm sending this as an RFC because I won't manage to test it
before the end of Easter, but I want to reach a consensus on
what the final patch should look like.

Andi,
which do you prefer in the end?

(1) Something like the attached patch, or
(2) a version which keeps the MWAIT flag for Fam10 but
introduces an X86_FEATURE_MWAIT_DOESNT_SAVE_POWER flag as you
suggested.

An idle=mwait kernel parameter could (and should) be introduced
with both alternatives.

Meanwhile I think it would suffice to do (1) and issue another
cpuid if idle=mwait was used to select mwait_idle.


Regards,

Andreas

--


diff --git a/arch/i386/kernel/cpu/amd.c b/arch/i386/kernel/cpu/amd.c
index 2d47db4..4e01262 100644
--- a/arch/i386/kernel/cpu/amd.c
+++ b/arch/i386/kernel/cpu/amd.c
@@ -228,6 +228,9 @@ #define CBAR_KEY(0X00CB)
}
 
switch (c->x86) {
+   case 16:
+   clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+   break;
case 15:
set_bit(X86_FEATURE_K8, c->x86_capability);
break;
diff --git a/arch/x86_64/kernel/setup.c b/arch/x86_64/kernel/setup.c
index 3d98b69..f53ee6c 100644
--- a/arch/x86_64/kernel/setup.c
+++ b/arch/x86_64/kernel/setup.c
@@ -583,6 +583,10 @@ #endif
if (c->x86 == 15 && ((level >= 0x0f48 && level < 0x0f50) || level >= 0x0f58))
set_bit(X86_FEATURE_REP_GOOD, &c->x86_capability);
 
+   /* disable use of mwait on idle */
+   if (c->x86 == 16)
+   clear_bit(X86_FEATURE_MWAIT, c->x86_capability);
+
/* Enable workaround for FXSAVE leak */
if (c->x86 >= 6)
set_bit(X86_FEATURE_FXSAVE_LEAK, &c->x86_capability);
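Option (1) plus the idle=mwait fallback described in the mail can be sketched as a userland model of the capability bitmap. The FEATURE_MWAIT position mirrors the kernel's X86_FEATURE_MWAIT (capability word 4, bit 3, i.e. CPUID leaf 1 ECX bit 3); struct cpu_model, model_init_cpu() and model_select_idle() are hypothetical stand-ins for the real setup code and the mwait_idle/default_idle choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of option (1); bit layout mirrors X86_FEATURE_MWAIT. */
#define NCAPINTS	8
#define FEATURE_MWAIT	(4 * 32 + 3)

struct cpu_model {
	int family;			/* c->x86 */
	bool cpuid_has_mwait;		/* raw CPUID answer */
	unsigned int caps[NCAPINTS];	/* kernel-visible capabilities */
};

static void cap_set(struct cpu_model *c, int bit)
{
	c->caps[bit / 32] |= 1u << (bit % 32);
}

static void cap_clear(struct cpu_model *c, int bit)
{
	c->caps[bit / 32] &= ~(1u << (bit % 32));
}

static bool cap_test(const struct cpu_model *c, int bit)
{
	return (c->caps[bit / 32] >> (bit % 32)) & 1u;
}

/* Like the patch: hide MWAIT on family 16 after CPUID has set it. */
static void model_init_cpu(struct cpu_model *c)
{
	if (c->cpuid_has_mwait)
		cap_set(c, FEATURE_MWAIT);
	if (c->family == 16)
		cap_clear(c, FEATURE_MWAIT);
}

enum idle_routine { IDLE_HLT, IDLE_MWAIT };

/* idle=mwait re-checks the raw CPUID bit instead of the cleared flag. */
static enum idle_routine model_select_idle(const struct cpu_model *c,
					   bool idle_mwait_param)
{
	if (idle_mwait_param)
		return c->cpuid_has_mwait ? IDLE_MWAIT : IDLE_HLT;
	return cap_test(c, FEATURE_MWAIT) ? IDLE_MWAIT : IDLE_HLT;
}
```

The design trade-off is visible here: option (1) needs the extra raw CPUID check for idle=mwait, while option (2) would keep the MWAIT bit and add a second flag that the idle selection consults instead.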



Re: Question: half-duplex and full-duplex serial driver

2007-04-05 Thread Valdis . Kletnieks
On Thu, 05 Apr 2007 18:40:55 EDT, Bill Davidsen said:
> Mockern wrote:
> > Hi,
> > 
> > Could you help me, please: how can I make my serial driver work in
> > half-duplex and full-duplex mode?
> > 
> > Thank you
> 
> Since you don't seem to have gotten an answer, and while this is 
> probably the wrong list for your question, I can give you a pointer 
> which may help.

I got the impression that they were trying to write an in-kernel driver for
a serial card, and it was oopsing.  My first guess is "bad locking",
and my first suggestion is 'Linux Device Drivers, 3rd edition'

http://lwn.net/Kernel/LDD3 last I remember.




Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question

2007-04-05 Thread Justin Piszcz
My .config is attached.. I cannot reproduce this problem, it only happened 
once, but I want to find out how to make sure it does not happen again.



On Thu, 5 Apr 2007, Justin Piszcz wrote:





On Thu, 5 Apr 2007, Justin Piszcz wrote:


http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html



Here is the badblocks output:

p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Nothing wrong with the drive.  This problem concerns me greatly, as I am not
sure what I can do to fix this issue. How can I make sure it does not happen
again?


Justin.


config-2.6.20.4.bz2
Description: Binary data


Re: [PATCH 08/12] mm: fixup possible deadlock

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:17 +0200
[EMAIL PROTECTED] wrote:

> When the threshold is in the order of the per cpu inaccuracies we can
> deadlock by not receiving the updated count,

That explanation is a bit, umm, terse.

> introduce a more expensive
> but more accurate stat read function to use on low thresholds.

Looks like percpu_counter_sum().
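The trade-off behind percpu_counter_sum() can be shown with a userland model of a batched per-CPU counter in the style of lib/percpu_counter.c. The struct and function names are hypothetical and there is no locking; the point is that the cheap read can lag the true value by up to NR_CPUS * (batch - 1), so a waiter polling a threshold smaller than that drift may never see it crossed.

```c
#include <assert.h>

#define NR_CPUS_MODEL 4

/* Each CPU accumulates a local delta and folds it into the shared
 * count only when |delta| reaches the batch size. */
struct pcpu_counter {
	long count;			/* shared, approximate */
	long local[NR_CPUS_MODEL];	/* per-CPU deltas not yet folded in */
	long batch;
};

static void pcpu_add(struct pcpu_counter *c, int cpu, long amount)
{
	long v = c->local[cpu] + amount;

	if (v >= c->batch || v <= -c->batch) {
		c->count += v;
		c->local[cpu] = 0;
	} else {
		c->local[cpu] = v;
	}
}

/* Cheap read: may be off by up to NR_CPUS_MODEL * (batch - 1). */
static long pcpu_read(const struct pcpu_counter *c)
{
	return c->count;
}

/* Expensive but accurate read: walks every CPU's pending delta. */
static long pcpu_sum(const struct pcpu_counter *c)
{
	long sum = c->count;

	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		sum += c->local[cpu];
	return sum;
}
```

With batch = 32 and four CPUs, each CPU can hold 31 unfolded events, so pcpu_read() returns 0 while pcpu_sum() returns 124; a waiter for a threshold of 100 using only the cheap read would sleep forever.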


Re: [PATCH 09/12] mm: remove throttle_vm_writeback

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:18 +0200
[EMAIL PROTECTED] wrote:

> rely on accurate dirty page accounting to provide enough push back

I think we'd like to see a bit more justification than that, please.


Re: Question: half-duplex and full-duplex serial driver

2007-04-05 Thread Bill Davidsen

Mockern wrote:

Hi,

Could you help me, please: how can I make my serial driver work in half-duplex
and full-duplex mode?

Thank you


Since you don't seem to have gotten an answer, and while this is 
probably the wrong list for your question, I can give you a pointer 
which may help.


The communications program "kermit" can do this; google for the source,
or try kermit.columbia.edu first, and read the source to see how they do
it. I'm reasonably sure ioctl() is the answer, but that's choice three
for your research.


--
bill davidsen <[EMAIL PROTECTED]>
  CTO TMR Associates, Inc
  Doing interesting things with small computers since 1979


REISER4: fix for reiser4_write_extent

2007-04-05 Thread Ignatich
While trying to find the cause of problems with reiser4 in recent 
kernels I came across this.


Incomplete write handling seems to be missing from reiser4_write_extent()
thanks to reiser4-temp-fix.patch. Strangely, there is a patch by Edward 
Shishkin that should address that issue, but it is missing from -mm 
tree. Please check.


   Max

--
Subject: reiser4: fix write_extent
From: Edward Shishkin <[EMAIL PROTECTED]>

. Fix reiser4_write_extent():
   1) handling incomplete writes missed in reiser4-temp-fix.patch
   2) bugs in the case of returned errors


Signed-off-by: Andrew Morton <[EMAIL PROTECTED]>
---

 fs/reiser4/plugin/item/extent_file_ops.c |   64 -
 1 file changed, 37 insertions(+), 27 deletions(-)

diff -puN fs/reiser4/plugin/item/extent_file_ops.c~reiser4-fix-write_extent fs/reiser4/plugin/item/extent_file_ops.c
--- a/fs/reiser4/plugin/item/extent_file_ops.c~reiser4-fix-write_extent
+++ a/fs/reiser4/plugin/item/extent_file_ops.c
@@ -941,15 +941,15 @@ static int write_extent_reserve_space(st
  * reiser4_write_extent - write method of extent item plugin
  * @file: file to write to
  * @buf: address of user-space buffer
- * @write_amount: number of bytes to write
- * @off: position in file to write to
+ * @count: number of bytes to write
+ * @pos: position in file to write to
  *
  */
 ssize_t reiser4_write_extent(struct file *file, const char __user *buf,
 size_t count, loff_t *pos)
 {
int have_to_update_extent;
-   int nr_pages;
+   int nr_pages, nr_dirty;
struct page *page;
jnode *jnodes[WRITE_GRANULARITY + 1];
struct inode *inode;
@@ -958,7 +958,7 @@ ssize_t reiser4_write_extent(struct file
int i;
int to_page, page_off;
size_t left, written;
-   int result;
+   int result = 0;
 
inode = file->f_dentry->d_inode;
if (write_extent_reserve_space(inode))
@@ -972,10 +972,12 @@ ssize_t reiser4_write_extent(struct file
 
BUG_ON(get_current_context()->trans->atom != NULL);
 
+   left = count;
index = *pos >> PAGE_CACHE_SHIFT;
/* calculate number of pages which are to be written */
end = ((*pos + count - 1) >> PAGE_CACHE_SHIFT);
nr_pages = end - index + 1;
+   nr_dirty = 0;
assert("", nr_pages <= WRITE_GRANULARITY + 1);
 
/* get pages and jnodes */
@@ -983,22 +985,17 @@ ssize_t reiser4_write_extent(struct file
page = find_or_create_page(inode->i_mapping, index + i,
   reiser4_ctx_gfp_mask_get());
if (page == NULL) {
-   while(i --) {
-   unlock_page(jnode_page(jnodes[i]));
-   page_cache_release(jnode_page(jnodes[i]));
-   }
-   return RETERR(-ENOMEM);
+   nr_pages = i;
+   result = RETERR(-ENOMEM);
+   goto out;
}
-
jnodes[i] = jnode_of_page(page);
if (IS_ERR(jnodes[i])) {
unlock_page(page);
page_cache_release(page);
-   while (i --) {
-   jput(jnodes[i]);
-   page_cache_release(jnode_page(jnodes[i]));
-   }
-   return RETERR(-ENOMEM);
+   nr_pages = i;
+   result = RETERR(-ENOMEM);
+   goto out;
}
/* prevent jnode and page from disconnecting */
JF_SET(jnodes[i], JNODE_WRITE_PREPARED);
@@ -1009,7 +1006,6 @@ ssize_t reiser4_write_extent(struct file
 
have_to_update_extent = 0;
 
-   left = count;
page_off = (*pos & (PAGE_CACHE_SIZE - 1));
for (i = 0; i < nr_pages; i ++) {
to_page = PAGE_CACHE_SIZE - page_off;
@@ -1050,14 +1046,26 @@ ssize_t reiser4_write_extent(struct file
flush_dcache_page(page);
kunmap_atomic(kaddr, KM_USER0);
}
-
-   written = filemap_copy_from_user(page, page_off, buf, to_page);
+   written = filemap_copy_from_user_atomic(page, page_off, buf,
+   to_page);
+   if (written != to_page)
+   /* Do it the slow way */
+   written = filemap_copy_from_user_nonatomic(page,
+  page_off,
+  buf,
+  to_page);
+   if (unlikely(written != to_page)) {
+   unlock_page(page);
+   result = RETERR(-EFAULT);
+   
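The error-path change in this patch (set nr_pages = i, then goto out) replaces two hand-written unwinding loops with a single cleanup label. The out: label itself is cut off in this archive, so the sketch below only assumes it releases the first nr_pages entries; acquire(), release() and grab_all() are hypothetical userland stand-ins for the page and jnode handling.

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_RES 8

static int released;	/* counts releases, to check the unwind */

/* Hypothetical resource source that fails at index fail_at. */
static void *acquire(int i, int fail_at)
{
	return i == fail_at ? NULL : malloc(16);
}

static void release(void *p)
{
	free(p);
	released++;
}

/* Returns 0 on success, -1 if any acquisition fails; never leaks. */
static int grab_all(int nr_wanted, int fail_at)
{
	void *res[MAX_RES];
	int nr = nr_wanted;
	int result = 0;

	assert(nr_wanted <= MAX_RES);
	for (int i = 0; i < nr_wanted; i++) {
		res[i] = acquire(i, fail_at);
		if (res[i] == NULL) {
			nr = i;		/* only res[0..i-1] are valid */
			result = -1;
			goto out;
		}
	}
	/* ... use the resources ... */
out:
	for (int i = 0; i < nr; i++)
		release(res[i]);
	return result;
}
```

Keeping one exit path means the success and failure cases share the same release loop, which is what makes it easy to get the "bugs in the case of returned errors" right.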

optimizing sendfile

2007-04-05 Thread Yaar Schnitman

Hi,

How can I control the size of the block requests the sendfile() syscall
performs against the disk?

I'm using sendfile (on a 2.6.18 kernel) to copy 1M file chunks into a
socket. The socket send buffer size is 2MB, and I verify that it's empty
before making the call. Indeed, the 1M chunk is sent, but from iostat I can
tell that the average request size is around 128KB. Are there any kernel
configuration variables that could change that?


Help will be appreciated.
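For reference, a minimal sendfile() loop that handles short transfers is sketched below. The posix_fadvise(POSIX_FADV_SEQUENTIAL) call is an assumption added for illustration: it is only a readahead hint, and whether it changes the average request size reported by iostat on 2.6.18 would need to be measured. send_chunk() is a hypothetical helper name.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Send len bytes of in_fd, starting at offset off, to out_fd.
 * sendfile() may transfer less than asked, so loop until done. */
static ssize_t send_chunk(int out_fd, int in_fd, off_t off, size_t len)
{
	size_t done = 0;

	/* Advisory readahead hint; failure here is harmless. */
	(void)posix_fadvise(in_fd, off, (off_t)len, POSIX_FADV_SEQUENTIAL);

	while (done < len) {
		/* sendfile() advances 'off' by the number of bytes sent */
		ssize_t n = sendfile(out_fd, in_fd, &off, len - done);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return -1;
		}
		if (n == 0)	/* EOF before len bytes */
			break;
		done += (size_t)n;
	}
	return (ssize_t)done;
}
```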



Re: [PATCH 02/12] mm: scalable bdi statistics counters.

2007-04-05 Thread Andrew Morton
On Thu, 05 Apr 2007 19:42:11 +0200
[EMAIL PROTECTED] wrote:

> Provide scalable per backing_dev_info statistics counters modeled on the ZVC
> code.
> 
> Signed-off-by: Peter Zijlstra <[EMAIL PROTECTED]>
> ---
>  block/ll_rw_blk.c   |1 
>  drivers/block/rd.c  |2 
>  drivers/char/mem.c  |2 
>  fs/char_dev.c   |1 
>  fs/fuse/inode.c |1 
>  fs/nfs/client.c |1 
>  include/linux/backing-dev.h |   98 +
>  mm/backing-dev.c|  103 
> 

madness!  Quite duplicative of vmstat.h, yet all this infrastructure
is still only usable in one specific application.

Can we please look at generalising the vmstat.h stuff?

Or, the API in percpu_counter.h appears suitable to this application.
(The comment at line 6 is a total lie).


[PATCH 2/2] Optimize compound_head() by avoiding a shared page flag

2007-04-05 Thread Christoph Lameter
Unalias PG_tail for performance reasons

If PG_tail is an alias then we need to check PageCompound before PageTail.
This is particularly bad because the slab and others have to use these tests
in performance critical paths. 

This patch uses one of the freed up software suspend flags that is defined
next to PG_compound.

Excerpt from kfree (page = compound_head(page)) before patch:

r33 = pointer to page struct.

0xa00100170271 :  ld4.acq r14=[r33]
0xa00100170272 :  nop.i 0x0;;
0xa00100170280 :  [MIB]   nop.m 0x0
0xa00100170281 :  tbit.z p9,p8=r14,14
0xa00100170282 :(p09) br.cond.dptk.few 0xa001001702c0 

0xa00100170290 :  [MMI]   ld4.acq r9=[r33]
0xa00100170291 :  nop.m 0x0
0xa00100170292 :  adds r8=16,r33;;
0xa001001702a0 :  [MII]   nop.m 0x0
0xa001001702a1 :  tbit.z p10,p11=r9,17
0xa001001702a2 :  nop.i 0x0
0xa001001702b0 : [MMI]   nop.m 0x0;;
0xa001001702b1 :   (p11) ld8 r33=[r8]
0xa001001702b2 : nop.i 0x0;;
0xa001001702c0 : [MII]   ...

After patch:

r34 = pointer to page struct.

0xa0010016f541 :  ld4.acq r3=[r34]
0xa0010016f542 :  nop.i 0x0
0xa0010016f550 :  [MMI]   adds r2=16,r34;;
0xa0010016f551 :  nop.m 0x0
0xa0010016f552 :  tbit.z p10,p11=r3,13;;
0xa0010016f560 :  [MII] (p11) ld8 r34=[r2]

No branch anymore.
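The difference between the two listings comes down to how many flag tests compound_head() needs. A userland model, using the bit numbers involved here (PG_tail 13 and PG_compound 14 after the patch, PG_reclaim 17 in this tree, matching the tbit operands above); struct page is a stripped-down hypothetical stand-in.

```c
#include <assert.h>
#include <stddef.h>

#define PG_tail		13
#define PG_compound	14
#define PG_reclaim	17

/* Stripped-down stand-in for struct page. */
struct page {
	unsigned long flags;
	struct page *first_page;	/* valid only in tail pages */
};

static int test_flag(const struct page *p, int bit)
{
	return (p->flags >> bit) & 1;
}

/* Before: PG_tail aliased PG_reclaim, so the bit means "tail" only
 * when PG_compound is also set, which takes two tests. */
static struct page *compound_head_aliased(struct page *page)
{
	if (test_flag(page, PG_compound) && test_flag(page, PG_reclaim))
		return page->first_page;
	return page;
}

/* After: a dedicated PG_tail bit is sufficient on its own. */
static struct page *compound_head_unaliased(struct page *page)
{
	if (test_flag(page, PG_tail))
		return page->first_page;
	return page;
}
```

Both versions still contain an if in C; "no branch" in the listing refers to the compiler being able to emit a single predicated load once only one bit has to be tested.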

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/include/linux/page-flags.h
===
--- linux-2.6.21-rc5-mm4.orig/include/linux/page-flags.h 2007-04-05 15:18:33.0 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/page-flags.h 2007-04-05 15:18:39.0 -0700
@@ -82,6 +82,7 @@
 #define PG_private 11  /* If pagecache, has fs-private data */
 
 #define PG_writeback   12  /* Page is under writeback */
+#define PG_tail       13  /* Page is tail of a compound page */
 #define PG_compound   14  /* Part of a compound page */
 #define PG_swapcache   15  /* Swap page: swp_entry_t in private */
 
@@ -95,12 +96,6 @@
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 
-/*
- * Marks tail portion of a compound page. We currently do not reclaim
- * compound pages so we can reuse a flag only used for reclaim here.
- */
-#define PG_tail        PG_reclaim
-
 #if (BITS_PER_LONG > 32)
 /*
  * 64-bit-only flags build down from bit 31
@@ -220,10 +215,6 @@ static inline void SetPageUptodate(struc
 #define __SetPageCompound(page) __set_bit(PG_compound, &(page)->flags)
 #define __ClearPageCompound(page) __clear_bit(PG_compound, &(page)->flags)
 
-/*
- * Note: PG_tail is an alias of another page flag. The result of PageTail()
- * is only valid if PageCompound(page) is true.
- */
 #define PageTail(page) test_bit(PG_tail, &(page)->flags)
 #define __SetPageTail(page) __set_bit(PG_tail, &(page)->flags)
 #define __ClearPageTail(page)  __clear_bit(PG_tail, &(page)->flags)
Index: linux-2.6.21-rc5-mm4/mm/page_alloc.c
===
--- linux-2.6.21-rc5-mm4.orig/mm/page_alloc.c 2007-04-05 15:18:33.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/page_alloc.c 2007-04-05 15:18:39.0 -0700
@@ -500,18 +500,13 @@ static inline int free_pages_check(struc
1 << PG_private |
1 << PG_locked  |
1 << PG_active  |
+   1 << PG_reclaim |
1 << PG_slab|
1 << PG_swapcache |
1 << PG_writeback |
1 << PG_reserved |
1 << PG_buddy 
bad_page(page);
-   /*
-* PageReclaim == PageTail. It is only an error
-* for PageReclaim to be set if PageCompound is clear.
-*/
-   if (unlikely(!PageCompound(page) && PageReclaim(page)))
-   bad_page(page);
if (PageDirty(page))
__ClearPageDirty(page);
/*
Index: linux-2.6.21-rc5-mm4/mm/internal.h
===
--- linux-2.6.21-rc5-mm4.orig/mm/internal.h 2007-04-05 15:18:33.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/internal.h  2007-04-05 15:18:39.0 -0700
@@ -24,7 +24,7 @@ static inline void set_page_count(struct
  */
 static inline void set_page_refcounted(struct page *page)
 {
-   VM_BUG_ON(PageCompound(page) && PageTail(page));
+   VM_BUG_ON(PageTail(page));
VM_BUG_ON(atomic_read(&page->_count));
set_page_count(page, 1);
 }
Index: linux-2.6.21-rc5-mm4/include/linux/mm.h

Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-05 Thread David Miller
From: Andrew Burgess <[EMAIL PROTECTED]>
Date: Thu, 5 Apr 2007 15:13:27 -0700

> David, do you see any other problems with scsi_send_eh_cmnd?
> 
> I've switched back to 2.6.18 which seems to not oops 
> and am happy to try patches.

Does 2.6.20 with my patch OOPS too?  Does reverting my patch
make the oops go away?

If reverting my patch makes the OOPS go away, we need to
verify if page_address() is returning crap for some reason
or the length is wrong.


[PATCH 1/2] Make page->private usable in compound pages V1

2007-04-05 Thread Christoph Lameter
[PATCH] Free up page->private for compound pages

If we add a new flag so that we can distinguish between the
first page and the tail pages then we can avoid using page->private
in the first page. page->private == page for the first page, so there
is no real information in there.

Freeing up page->private makes the use of compound pages more transparent.
They become usable more like real pages. Right now we have to be careful, e.g.
if we are going beyond PAGE_SIZE allocations in the slab on i386, because we
can then no longer use the private field. This is one of the issues that
cause us not to support debugging for page size slabs in SLAB.

Having page->private available for SLUB would allow more meta information
in the page struct. I can probably avoid the 16 bit ints that I have in
there right now.

Also if page->private is available then a compound page may be equipped
with buffer heads. This may clear the way for filesystems to support
larger blocks than page size.

We add PageTail as an alias of PageReclaim. Compound pages cannot
currently be reclaimed. Because of the alias one needs to check
PageCompound first.

The RFC for this approach was discussed at
http://marc.info/?t=11757430281=1=2

Signed-off-by: Christoph Lameter <[EMAIL PROTECTED]>

Index: linux-2.6.21-rc5-mm4/include/linux/mm.h
===
--- linux-2.6.21-rc5-mm4.orig/include/linux/mm.h 2007-04-05 13:59:23.0 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/mm.h 2007-04-05 14:08:11.0 -0700
@@ -297,17 +297,28 @@ static inline int get_page_unless_zero(s
return atomic_inc_not_zero(&page->_count);
 }
 
+static inline struct page *compound_head(struct page *page)
+{
+   /*
+* We could avoid the PageCompound(page) check if
+* we would not overload PageTail().
+*
+* This check has to be done in several performance critical
+* paths of the slab etc. IMHO PageTail deserves its own flag.
+*/
+   if (unlikely(PageCompound(page) && PageTail(page)))
+   return page->first_page;
+   return page;
+}
+
 static inline int page_count(struct page *page)
 {
-   if (unlikely(PageCompound(page)))
-   page = (struct page *)page_private(page);
-   return atomic_read(&page->_count);
+   return atomic_read(&compound_head(page)->_count);
 }
 
 static inline void get_page(struct page *page)
 {
-   if (unlikely(PageCompound(page)))
-   page = (struct page *)page_private(page);
+   page = compound_head(page);
VM_BUG_ON(atomic_read(&page->_count) == 0);
atomic_inc(&page->_count);
 }
@@ -344,6 +355,18 @@ static inline compound_page_dtor *get_co
return (compound_page_dtor *)page[1].lru.next;
 }
 
+static inline int compound_order(struct page *page)
+{
+   if (!PageCompound(page) || PageTail(page))
+   return 0;
+   return (unsigned long)page[1].lru.prev;
+}
+
+static inline void set_compound_order(struct page *page, unsigned long order)
+{
+   page[1].lru.prev = (void *)order;
+}
+
 /*
  * Multiple processes may "see" the same page. E.g. for untouched
  * mappings of /dev/null, all processes see the same page full of
Index: linux-2.6.21-rc5-mm4/include/linux/page-flags.h
===
--- linux-2.6.21-rc5-mm4.orig/include/linux/page-flags.h 2007-04-05 13:59:23.0 -0700
+++ linux-2.6.21-rc5-mm4/include/linux/page-flags.h 2007-04-05 14:00:56.0 -0700
@@ -95,6 +95,12 @@
 /* PG_owner_priv_1 users should have descriptive aliases */
 #define PG_checked PG_owner_priv_1 /* Used by some filesystems */
 
+/*
+ * Marks tail portion of a compound page. We currently do not reclaim
+ * compound pages so we can reuse a flag only used for reclaim here.
+ */
+#define PG_tail        PG_reclaim
+
 #if (BITS_PER_LONG > 32)
 /*
  * 64-bit-only flags build down from bit 31
@@ -214,6 +220,14 @@ static inline void SetPageUptodate(struc
 #define __SetPageCompound(page) __set_bit(PG_compound, &(page)->flags)
 #define __ClearPageCompound(page) __clear_bit(PG_compound, &(page)->flags)
 
+/*
+ * Note: PG_tail is an alias of another page flag. The result of PageTail()
+ * is only valid if PageCompound(page) is true.
+ */
+#define PageTail(page) test_bit(PG_tail, &(page)->flags)
+#define __SetPageTail(page) __set_bit(PG_tail, &(page)->flags)
+#define __ClearPageTail(page)  __clear_bit(PG_tail, &(page)->flags)
+
 #ifdef CONFIG_SWAP
 #define PageSwapCache(page) test_bit(PG_swapcache, &(page)->flags)
 #define SetPageSwapCache(page) set_bit(PG_swapcache, &(page)->flags)
Index: linux-2.6.21-rc5-mm4/mm/internal.h
===
--- linux-2.6.21-rc5-mm4.orig/mm/internal.h 2007-04-05 13:59:24.0 -0700
+++ linux-2.6.21-rc5-mm4/mm/internal.h  2007-04-05 14:00:56.0 -0700
@@ -24,7 +24,7 @@ 

Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question

2007-04-05 Thread Justin Piszcz




On Thu, 5 Apr 2007, Justin Piszcz wrote:


http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html



Here is the badblocks output:

p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Nothing wrong with the drive.  This problem concerns me greatly, as I am
not sure what I can do to fix this issue. How can I make sure it does not
happen again?


Justin.


Re: [PATCH 1/4] Generic Virtual Memmap suport for SPARSEMEM V3

2007-04-05 Thread Christoph Lameter
On Thu, 5 Apr 2007, David Miller wrote:

> Hey Christoph, here is sparc64 support for this stuff.

Great!

> After implementing this and seeing more and more how it works, I
> really like it :-)
> 
> Thanks a lot for doing this work Christoph!

Thanks for the appreciation. CCing Andy Whitcroft, who will hopefully
merge all of this together into sparsemem, including the S/390
implementation.




Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question

2007-04-05 Thread Justin Piszcz



On Thu, 5 Apr 2007, Justin Piszcz wrote:




On Thu, 5 Apr 2007, Justin Piszcz wrote:


http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html

I have similar issues to this poster. I was wondering if anyone had an
idea of the root cause of this issue: is it a problem with the chipset, or
the BIOS revision?


Mobo: Intel DG965WHMKR
BIOS: 1666

Is it only Intel Chipsets that suffer from this problem?

... or is it a way the kernel handles ACPI/IO-APIC/etc?

Justin.



p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata 0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Not a single bad block found.  Does the ICH8 chipset have issues, or is it
the cards I am using and how they are routed?

Any suggestions as to what this is?

Justin.




http://www.linuxhq.com/kernel/v2.6/18/drivers/scsi/sata_sil24.c

+   [PORT_CERR_SEND]         = { AC_ERR_ATA_BUS, ATA_EH_SOFTRESET,
+                                "failed to transmit command FIS" },
+   [PORT_CERR_INCONSISTENT] = { AC_ERR_HSM, ATA_EH_SOFTRESET,
+                                "protocol mismatch" },

Is this a chipset problem, or a problem with the PCI-e x1 SiI dual-port SATA card?

Justin.


Re: Oops in scsi_send_eh_cmnd 2.6.21-rc5-git6,7,10,13

2007-04-05 Thread Andrew Burgess
Chuck Ebbert wrote:

>Andrew Burgess wrote:
>
>> Apr  5 03:45:16 cichlid kernel: 3w-: scsi2: Command failed: status = 
>> 0xc7, flags = 0x7f, unit #4.
>> Apr  5 03:45:20 cichlid kernel: 3w-: scsi2: Command failed: status = 
>> 0xc7, flags = 0x80, unit #4.
>> Apr  5 03:47:20 cichlid kernel: 3w-: scsi0: Command failed: status = 
>> 0xc7, flags = 0x80, unit #0.
>> Apr  5 03:47:20 cichlid kernel: 3w-: scsi0: Command failed: status = 
>> 0xc7, flags = 0x80, unit #1.
>> Apr  5 04:00:08 cichlid kernel: 3w-: scsi0: Command failed: status = 
>> 0xc7, flags = 0x80, unit #0.
>..
>> Apr  5 04:00:08 cichlid kernel: 
>> Apr  5 04:00:08 cichlid kernel: general protection fault:  [1] PREEMPT 
>> SMP 
>> Apr  5 04:00:08 cichlid kernel: CPU 1 
>> Apr  5 04:00:08 cichlid kernel: Modules linked in: dm_multipath multipath 
>> linear raid456 xor raid1 md_mod act_police sch_ingress sch_sfq sch_cbq 
>> ipt_TOS cls_u32 sch_htb ipt_MASQUERADE ipt_LOG xt_multiport nf_nat_ftp 
>> nf_conntrack_ftp iptable_mangle iptable_nat nf_nat emi26 w83627hf hwmon_vid 
>> i2c_isa sunrpc ipt_REJECT xt_tcpudp nf_conntrack_ipv4 xt_state nf_conntrack 
>> nfnetlink iptable_filter ip_tables x_tables freq_table sr_mod loop dm_mirror 
>> dm_mod video thermal sbs processor i2c_ec fan dock button battery asus_acpi 
>> ac parport_pc lp parport floppy nvram snd_usb_audio snd_ice1712 
>> snd_ice17xx_ak4xxx snd_via82xx sg gameport snd_seq_dummy pcspkr 
>> snd_ak4xxx_adda snd_cs8427 snd_via82xx_modem snd_seq_oss snd_ac97_codec 
>> sata_via snd_i2c snd_seq_midi_event snd_seq skge i2c_viapro snd_mpu401_uart 
>> i2c_core k8temp ac97_bus snd_pcm_oss snd_mixer_oss snd_pcm snd_timer 
>> snd_page_alloc dsbr100 snd_usb_lib snd_rawmidi compat_ioctl32 snd_seq_device 
>> videodev v4l2_common v4l1_compat snd_hwdep snd soundcore sisusbvga ftdi_sio usb_stor
>> Apr  5 04:00:08 cichlid kernel: ge serio_raw emi62 usbserial asix usbnet 
>> ata_piix 3w_ ata_generic pata_via libata sd_mod scsi_mod ext3 jbd 
>> ehci_hcd ohci_hcd uhci_hcd
>> Apr  5 04:00:08 cichlid kernel: Pid: 386, comm: scsi_eh_0 Tainted: G   M   
>> 2.6.21-rc5-git10-1-slab-debug #1
>> Apr  5 04:00:08 cichlid kernel: RIP: 0010:[memcpy_c+11/32]  [memcpy_c+11/32] 
>> memcpy_c+0xb/0x20
>> Apr  5 04:00:08 cichlid kernel: RIP: 0010:[]  
>> [] memcpy_c+0xb/0x20
>> Apr  5 04:00:08 cichlid kernel: RSP: :8100beebbce8  EFLAGS: 00010246
>> Apr  5 04:00:08 cichlid kernel: RAX: 8100b4978140 RBX: 2003 
>> RCX: 000c
>> Apr  5 04:00:08 cichlid kernel: RDX:  RSI: 6ddaa592 
>> RDI: 8100b4978140
>> Apr  5 04:00:08 cichlid kernel: RBP: 8100beebbe20 R08: 0002 
>> R09: 0001
>> Apr  5 04:00:08 cichlid kernel: R10:  R11:  
>> R12: 8100b4978140
>> Apr  5 04:00:08 cichlid kernel: R13: 8807a1ce R14: 8100b4978058 
>> R15: 8100beebc000
>> Apr  5 04:00:08 cichlid kernel: FS:  2b615f4dc240() 
>> GS:8100bffae4c8() knlGS:f795eb90
>> Apr  5 04:00:08 cichlid kernel: CS:  0010 DS: 0018 ES: 0018 CR0: 
>> 8005003b
>> Apr  5 04:00:08 cichlid kernel: CR2: 2b615f4b9000 CR3: a25ad000 
>> CR4: 06e0
>> Apr  5 04:00:08 cichlid kernel: Process scsi_eh_0 (pid: 386, threadinfo 
>> 8100beeba000, task 8100bedd0100)
>> Apr  5 04:00:08 cichlid kernel: Stack:  88055efc 2711 
>> 8100b49780b4 8100bef94508
>> Apr  5 04:00:08 cichlid kernel:  00020002 1a240001 
>> 810013404c78 
>> Apr  5 04:00:08 cichlid kernel:  8101 dead4ead 
>>  
>> Apr  5 04:00:08 cichlid kernel: Call Trace:
>> Apr  5 04:00:08 cichlid kernel:  [_end+123674792/2126102956] 
>> :scsi_mod:scsi_send_eh_cmnd+0x3fc/0x480
>> Apr  5 04:00:08 cichlid kernel:  [] 
>> :scsi_mod:scsi_send_eh_cmnd+0x3fc/0x480
>> Apr  5 04:00:08 cichlid kernel:  [thread_return+230/301] 
>> thread_return+0xe6/0x12d
>> Apr  5 04:00:08 cichlid kernel:  [] 
>> thread_return+0xe6/0x12d
>> Apr  5 04:00:08 cichlid kernel:  [_end+123678124/2126102956] 
>> :scsi_mod:scsi_error_handler+0x0/0x540
>> Apr  5 04:00:08 cichlid kernel:  [] 
>> :scsi_mod:scsi_error_handler+0x0/0x540
>> Apr  5 04:00:08 cichlid kernel:  [keventd_create_kthread+0/144] 
>> keventd_create_kthread+0x0/0x90
>> Apr  5 04:00:08 cichlid kernel:  [] 
>> keventd_create_kthread+0x0/0x90
>> Apr  5 04:00:08 cichlid kernel:  [kthread+218/272] kthread+0xda/0x110
>> Apr  5 04:00:08 cichlid kernel:  [] kthread+0xda/0x110
>> Apr  5 04:00:08 cichlid autoblacklist[9591]: src= proto= srcport= destport= 
>> srcname= destportname= srcportname= icmptype= icmpcode=
>> Apr  5 04:00:08 cichlid kernel:  [child_rip+10/18] child_rip+0xa/0x12
>> Apr  5 04:00:08 cichlid kernel:  [] child_rip+0xa/0x12
>> Apr  5 04:00:08 cichlid kernel:  [schedule_tail+124/240] 
>> schedule_tail+0x7c/0xf0
>> Apr  5 04:00:08 cichlid kernel:  [] 

Re: Any Intel folks on the list? Intel PCI-E bridge ACPI resource question

2007-04-05 Thread Justin Piszcz



On Thu, 5 Apr 2007, Justin Piszcz wrote:


http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html

I have similar issues to this poster. I was wondering whether anyone had an 
idea of the root cause of this issue: is it a problem with the chipset or 
with the BIOS revision?


Mobo: Intel DG965WHMKR
BIOS: 1666

Is it only Intel Chipsets that suffer from this problem?

... or is it the way the kernel handles ACPI/IO-APIC/etc?

Justin.



p34:~# /usr/bin/time badblocks -b 512 -s -v -w /dev/sdl
Checking for bad blocks in read-write mode
From block 0 to 293046768
Testing with pattern 0xaa: done
Reading and comparing: done
Testing with pattern 0x55: done
Reading and comparing: done
Testing with pattern 0xff: done
Reading and comparing: done
Testing with pattern 0x00: done
Reading and comparing: done
Pass completed, 0 bad blocks found.
1929.06user 467.89system 4:36:23elapsed 14%CPU (0avgtext+0avgdata
0maxresident)k
0inputs+0outputs (1major+257minor)pagefaults 0swaps
p34:~#

Not a single bad block found.  Is the problem the ICH8 chipset, or the
cards I am using and how they are routed?

Any suggestions as to what this is?

Justin.



Any Intel folks on the list? Intel PCI-E bridge ACPI resource question

2007-04-05 Thread Justin Piszcz

http://www.ussg.iu.edu/hypermail/linux/kernel/0701.3/0315.html

I have similar issues to this poster. I was wondering whether anyone had an 
idea of the root cause of this issue: is it a problem with the chipset or 
with the BIOS revision?


Mobo: Intel DG965WHMKR
BIOS: 1666

Is it only Intel Chipsets that suffer from this problem?

... or is it the way the kernel handles ACPI/IO-APIC/etc?

Justin.

(again, the dmesg output posted earlier (below))

[369143.916093] ata13.00: exception Emask 0x0 SAct 0x0 SErr 0x0 action 0x6 frozen
[369143.916100] ata13.00: (irq_stat 0x00020002, failed to transmit command FIS)
[369143.916107] ata13.00: cmd ca/00:00:97:1a:d5/00:00:00:00:00/e9 tag 0 cdb 0x0 data 131072 out
[369143.916109]  res 93/37:00:00:00:00/00:00:40:00:93/00 Emask 0x12 (ATA bus error)

[369143.916116] ata13: hard resetting port
[369146.145915] ata13: softreset failed (port not ready)
[369146.145922] ata13: follow-up softreset failed, retrying in 5 secs
[369151.146035] ata13: hard resetting port
[369153.376736] ata13: softreset failed (port not ready)
[369153.376743] ata13: follow-up softreset failed, retrying in 5 secs
[369158.376664] ata13: hard resetting port
[369160.608025] ata13: softreset failed (port not ready)
[369160.608033] ata13: reset failed, giving up
[369160.608036] ata13.00: disabled
[369160.608043] ata13: EH pending after completion, repeating EH (cnt=4)
[369160.718365] ata13: exception Emask 0x10 SAct 0x0 SErr 0x405 action 0x6 frozen
[369160.718370] ata13: (irq_stat 0x00060002, failed to transmit command FIS)

[369161.238432] ata13: waiting for device to spin up (8 secs)
[369168.715610] ata13: hard resetting port
[369170.946658] ata13: softreset failed (port not ready)
[369170.94] ata13: follow-up softreset failed, retrying in 5 secs
[369175.946249] ata13: hard resetting port
[369178.167644] ata13: softreset failed (port not ready)
[369178.167651] ata13: follow-up softreset failed, retrying in 5 secs
[369183.167742] ata13: hard resetting port
[369185.398497] ata13: softreset failed (port not ready)
[369185.398504] ata13: reset failed, giving up
[369185.398522] sd 12:0:0:0: SCSI error: return code = 0x0802
[369185.398526] sdl: Current [descriptor]: sense key: Aborted Command
[369185.398532] Additional sense: Scsi parity error
[369185.398539] Descriptor sense data with sense descriptors (in hex):
[369185.398544] 72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00
[369185.398572] 00 00 00 00
[369185.398581] end_request: I/O error, dev sdl, sector 164960919
[369185.398586] raid5: Disk failure on sdl1, disabling device. Operation continuing on 3 devices

[369185.398617] sd 12:0:0:0: rejecting I/O to offline device
[369185.398625] ata13: EH complete
[369185.398635] ata13.00: detaching (SCSI 12:0:0:0)
[369185.398676] sd 12:0:0:0: SCSI error: return code = 0x0001
[369185.398680] end_request: I/O error, dev sdl, sector 164961175
[369185.398702] raid5:md3: read error not correctable (sector 164962304 on sdl1).
[369185.398707] raid5:md3: read error not correctable (sector 164962312 on sdl1).
[369185.398711] raid5:md3: read error not correctable (sector 164962320 on sdl1).
[369185.398716] raid5:md3: read error not correctable (sector 164962328 on sdl1).

[369185.398760] Synchronizing SCSI cache for disk sdl:
[369185.398784] FAILED
[369185.398785]   status = 0, message = 00, host = 4, driver = 00
[369185.398786]   <3>scsi 12:0:0:0: rejecting I/O to dead device
[369185.404619] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404641] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404662] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404682] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404686] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404691] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404712] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404732] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404753] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404774] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404794] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404815] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404844] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404863] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404882] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404900] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404918] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404937] scsi 12:0:0:0: rejecting I/O to dead device
[369185.404956] scsi 12:0:0:0: rejecting I/O to dead device
[369185.413938] RAID5 conf printout:
[369185.413944]  --- rd:4 wd:3
[369185.413948]  disk 0, o:1, dev:sdi1
[369185.413950]  disk 1, o:1, dev:sdj1
[369185.413953]  disk 2, o:0, dev:sdl1
[369185.413956]  disk 3, o:1, dev:sdg1
[369185.418873] RAID5 conf printout:
[369185.418878]  --- rd:4 wd:3
[369185.418881]  disk 0, o:1, dev:sdi1
[369185.418884]  disk 1, o:1, dev:sdj1
[369185.418887]  disk 

Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range

2007-04-05 Thread Antonino A. Daplas
On Thu, 2007-04-05 at 22:42 +0100, Alan Hourihane wrote:
> On Thu, 2007-04-05 at 21:38 +0200, Arnd Bergmann wrote:
> > On Thursday 05 April 2007, Alan Hourihane wrote:
> > > @@ -0,0 +1,405 @@
> > > +/*  
> > > + * Copyright (c) Intel Corp. 2007.
> > > + * All Rights Reserved.
> > > + *
> > 
> > Saying 'All Rights Reserved' is usually considered the opposite of
> > licensing your code as GPL. I suppose you need to remove that.
> 
> Arnd, 
> 
> Thanks for your comments, and I'll review and make appropriate changes
> as you've suggested.
> 
> As for the above, I've noticed that drivers/video/epson1355fb.c also has
> this wording and is under the GPL. 
> 

"All Rights Reserved" is a written notice, a formality of copyright law.
Nowadays your works are protected by copyright law even without that
written notice, so the phrase can be omitted.

I don't think the phrase is incompatible with the GPL.

Tony 





Re: [PATCH 01/01] New FBDev driver for Intel Vermilion Range

2007-04-05 Thread Arnd Bergmann
On Thursday 05 April 2007, Alan Hourihane wrote:
> As for the above, I've noticed that drivers/video/epson1355fb.c also has
> this wording and is under the GPL.

Yes, many files have it, but that doesn't make it right ;-)

Arnd <><


Re: [PATCH] USB gadget rndis: fix bug: skb_push function may return an unaligned pointer

2007-04-05 Thread David Brownell
On Tuesday 03 April 2007 11:28 pm, Wu, Bryan wrote:
> USB gadget rndis: skb_push function may return a pointer which is not
> aligned as required by struct rndis_packet_msg_type.

Can you instead try to update the declaration of that struct
so that it's "__attribute__((packed))"?  That's less invasive,
and will address similar issues elsewhere ...

- Dave

